# Grid Problem Introduction

The premise of this problem is simple: we create a deterministic MDP with `n` distinct states and `m` distinct actions, where each state has a (not necessarily unique) reward. We want to see whether the splitting algorithm can recover the true states, starting only from an initial reward-based clustering and then observing contradictions in transitions.
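To make "contradictions in transitions" concrete, here is a minimal sketch (not the splitting algorithm itself). In a deterministic MDP, every point in a given true state that takes a given action must land in the same next state, so a cluster that sends one action to two different next clusters must be merging several states. The function name `find_contradictions` and the toy transitions are hypothetical and used only for illustration.

```python
from collections import defaultdict

def find_contradictions(transitions):
    """Return {(cluster, action): set_of_next_clusters} for every
    (cluster, action) pair that leads to more than one next cluster."""
    seen = defaultdict(set)
    for cluster, action, next_cluster in transitions:
        seen[(cluster, action)].add(next_cluster)
    return {key: nxt for key, nxt in seen.items() if len(nxt) > 1}

# Toy observations as (cluster, action, next_cluster) tuples.
transitions = [
    (0, 1, 2),
    (0, 1, 2),
    (0, 1, 3),  # same cluster and action, different destination
    (1, 0, 2),
    (1, 0, 2),
]

contradictions = find_contradictions(transitions)
print(contradictions)
```

Here cluster 0 under action 1 reaches both cluster 2 and cluster 3, which marks it as a candidate for splitting.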

## Dataset Creation

First, we load the relevant functions. Be sure to add the directory containing the MDP algorithm to the path with `sys.path.append`!

In [1]:
# Importing libraries and loading functions
import numpy as np
import matplotlib.pyplot as plt
import random

# Initializing file-path
import sys
sys.path.append('/Users/janiceyang/Dropbox (MIT)/ORC UROP/Opioids/Algorithm/')

from model import MDP_model
from MDPtools import Generate_random_MDP, sample_MDP_with_features_list
from clustering import initializeClusters, splitter, split_train_test_by_id
from testing import *
from grid_functions import * 

Now we create a random MDP and set the reward for each of its states. `n` is the number of states, `m` is the number of actions, and we can also specify whether the MDP is deterministic and whether the reward depends on the action. Finally, we arbitrarily assign overlapping reward values to different states, to see how well the algorithm distinguishes between states that initially look the same under reward-based clustering.

In [2]:
# Defining parameters
n = 15
m = 3
reward_dep_action = False
deterministic = True

# Generating the actual MDP
P, R = Generate_random_MDP(n,
                           m,
                           reward_dep_action=reward_dep_action,
                           deterministic=deterministic)

# Overwriting the rewards so that several states share the same value
for i in range(n):
    R[i] = i%6*0.2
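As a quick check of how much overlap this assignment creates, the sketch below groups the state indices by `i % 6`, the quantity that determines each reward. It is standalone and does not touch `P` or `R`.

```python
from collections import defaultdict

n = 15  # number of states, as in the cell above

# Group states by i % 6; the reward of state i is 0.2 times this index.
groups = defaultdict(list)
for i in range(n):
    groups[i % 6].append(i)

for k, states in sorted(groups.items()):
    print(f"reward {k * 0.2:.1f}: states {states}")
```

The 15 states fall into 6 reward groups of two or three states each, so a purely reward-based clustering cannot tell the states within a group apart.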

Now that we have our MDP, we can generate sample data by simulating paths through the system. Here, `N` is the number of actors we want to simulate, and `T` is the number of timesteps each one takes. Features are drawn from a normal distribution with covariance `sigma`, so that visually each state occupies its own region of an x-y grid (you will see below!).

In [3]:
# Defining parameters
pfeatures = 2
sigma = [[0.01, 0], [0, 0.01]]
N = 250
T = 5

# Generating the normal distribution based on sigma noise
normal_distributions = UnifNormal(n,
                                  pfeatures,
                                  sigma)

# Generating a list of samples based on the distribution 
samples = sample_MDP_with_features_list(P,
                                        R,
                                        normal_distributions,
                                        N,
                                        T)
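The internals of `UnifNormal` and `sample_MDP_with_features_list` are not shown here. As a rough sketch of the idea only, the code below assumes that state `s` is centered at the grid point `(s // 4, s % 4)` and that the two features are drawn independently with variance 0.01, matching the diagonal `sigma`. The grid layout is a guess that happens to agree with the sample rows in the dataframe output, not something the source confirms, and `sample_features` is a hypothetical helper.

```python
import random

def sample_features(state, variance=0.01, grid_width=4, rng=random):
    """Draw a noisy 2-D point around the assumed center of `state`."""
    center = (state // grid_width, state % grid_width)  # assumed layout
    std = variance ** 0.5  # diagonal covariance: coordinates are independent
    return tuple(rng.gauss(c, std) for c in center)

rng = random.Random(0)  # seeded for reproducibility
points = [sample_features(5, rng=rng) for _ in range(1000)]
mean_x = sum(p[0] for p in points) / len(points)
mean_y = sum(p[1] for p in points) / len(points)
print(round(mean_x, 2), round(mean_y, 2))
```

With a standard deviation of 0.1 and centers one unit apart, points from different states rarely overlap, which is why the states look like separate blobs on the grid.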

Finally, we transform this list of samples into a dataframe, which is what we will feed into the algorithm! The column `OG_CLUSTER` gives the state of the generated MDP that each point was in. We will compare this with the `CLUSTER` our algorithm finds to see how well it performs.

In [4]:
df = transformSamples(samples,
                      pfeatures)
df

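| | ID | TIME | FEATURE_1 | FEATURE_2 | ACTION | RISK | OG_CLUSTER |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0.995471 | 1.048488 | 1 | 1.0 | 5 |
| 1 | 0 | 1 | 3.179850 | 0.034873 | 2 | 0.0 | 12 |
| 2 | 0 | 2 | -0.065029 | 2.135088 | 0 | 0.4 | 2 |
| 3 | 0 | 3 | 1.994077 | 2.849567 | 2 | 1.0 | 11 |
| 4 | 0 | 4 | -0.008032 | 0.894757 | 2 | 0.2 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1245 | 249 | 0 | 1.057399 | 3.023545 | 1 | 0.2 | 7 |
| 1246 | 249 | 1 | 1.056206 | 0.900695 | 2 | 1.0 | 5 |
| 1247 | 249 | 2 | 0.016531 | 0.982024 | 0 | 0.2 | 1 |
| 1248 | 249 | 3 | 1.863872 | 1.026186 | 1 | 0.6 | 9 |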


## Model Fit

In [7]:
# Model Fitting Parameters
max_k = 25
classification = 'DecisionTreeClassifier'
split_classifier_params = {'random_state':0, 'max_depth':2}
clustering = 'Agglomerative'
n_clusters = None
distance_threshold = 0.5
precision_thresh = 1e-14
random_state = 0
pfeatures = 2
gamma = 1
actions = [0, 1, 2]
h = -1
cv = 5
th = 0
eta = 25

In [8]:
m = MDP_model()  # note: this rebinds `m`, which previously held the number of actions
m.fit(df, # df: dataframe in the format ['ID', 'TIME', ...features..., 'RISK', 'ACTION']
    pfeatures, # int: number of features
    h, # int: time horizon (# of actions we want to optimize)
    gamma, # discount factor
    max_k, # int: number of iterations
    distance_threshold, # clustering diameter for Agglomerative clustering
    cv, # number for cross validation
    th, # splitting threshold
    eta, # incoherence threshold
    precision_thresh, # precision threshold
    classification, # classification method
    split_classifier_params, # classification params
    clustering,# clustering method from Agglomerative, KMeans, and Birch
    n_clusters, # number of clusters for KMeans
    random_state, # random seed
    plot=False,
    optimize=True,
    verbose=False)

Splitting... |#Clusters:6:   0%|          | 0/19 [00:00<?, ?it/s]

Clusters Initialized


Splitting... |#Clusters:15:  47%|████▋     | 9/19 [00:11<00:12,  1.24s/it]


## Evaluating the Model

In [12]:
accuracy, df_accuracy = training_accuracy(m.df_trained)
print('Model Accuracy: ', accuracy*100, '%')

Model Accuracy:  100.0 %
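The source of `training_accuracy` is not shown here. One plausible way to compute such a score, offered as an assumption rather than the function's actual definition, is to label each learned `CLUSTER` with the most common `OG_CLUSTER` among its points and report the fraction of points that agree with their cluster's label. The helper `majority_label_accuracy` and the toy pairs are hypothetical.

```python
from collections import Counter, defaultdict

def majority_label_accuracy(pairs):
    """pairs: iterable of (learned_cluster, true_state) tuples."""
    by_cluster = defaultdict(Counter)
    for cluster, true_state in pairs:
        by_cluster[cluster][true_state] += 1
    # Points that agree with the most common true state of their cluster.
    correct = sum(c.most_common(1)[0][1] for c in by_cluster.values())
    total = sum(sum(c.values()) for c in by_cluster.values())
    return correct / total

# Cluster 0 mixes true states 5 and 11; cluster 1 is pure.
pairs = [(0, 5), (0, 5), (0, 11), (1, 2), (1, 2)]
print(majority_label_accuracy(pairs))
```

Under this definition a score of 100% means no learned cluster mixes points from different true states. It does not by itself penalize splitting one true state into several clusters, so it is worth reading alongside the final cluster count.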
