# Maze Experiment Replication

This notebook extends the previous `Maze Experience` notebook. Here, we run the full experiment on larger datasets and evaluate the algorithm's performance.

## Training Models

### Importing Modules

First, make sure the path passed to `sys.path.append` is the directory that contains the algorithm code.

In [None]:
import sys
sys.path.append('/Users/janiceyang/Dropbox (MIT)/ORC UROP/Opioids/Algorithm/')

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import random
import pandas as pd
import gym
import gym_maze
import pickle
import plotly.express as px

from model import MDP_model
from maze_functions import createSamples, opt_maze_trajectory, opt_model_trajectory, policy_accuracy, \
    get_maze_transition_reward, plot_paths, value_diff, get_maze_MDP, value_est, opt_path_value_diff
from MDPtools import SolveMDP
from testing import cluster_size, next_clusters, training_value_error, purity, plot_features, testing_value_error, \
    generalization_accuracy

mazes = {1: 'maze-v0',
         2: 'maze-sample-3x3-v0',
         3: 'maze-random-3x3-v0',
         4: 'maze-sample-5x5-v0',
         5: 'maze-random-5x5-v0',
         6: 'maze-sample-10x10-v0',
         7: 'maze-random-10x10-v0',
         8: 'maze-sample-100x100-v0',
         9: 'maze-random-100x100-v0',
         10: 'maze-random-10x10-plus-v0', # has portals 
         11: 'maze-random-20x20-plus-v0', # has portals 
         12: 'maze-random-30x30-plus-v0'} # has portals 

### Setting Parameters

In [None]:
# Dataset parameters
N = 200
T_max = 25
r = 0.4
maze = mazes[4]

In [None]:
# Model Fitting Parameters
max_k = 25
classification = 'DecisionTreeClassifier'
split_classifier_params = {'random_state':0, 'max_depth':2}
clustering = 'Agglomerative'
n_clusters = None
distance_threshold = 0.5
precision_thresh = 1e-14
random_state = 0
pfeatures = 2
gamma = 1
actions = [0, 1, 2, 3]
h = -1
cv = 5
th = 0
eta = 25

In [None]:
# Calculating optimal/true values for the maze
P, R = get_maze_MDP(maze)
f, rw = get_maze_transition_reward(maze)
true_v, true_pi = SolveMDP(P, R, prob='max', gamma=1, epsilon=1e-8)

Finally, we must decide how to split the paths in each dataset so that we can evaluate the algorithm's performance as we feed it incrementally more data. Make a list `Ns` of the path counts we want to plot, and make sure its last (largest) value equals the parameter `N` (total paths) above.

In [None]:
Ns = [10, 20, 30, 40, 50, 70, 90, 110, 130, 150, 170, 200]
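
A quick sanity check on `Ns` can catch a mismatch before any training starts. The sketch below uses a hypothetical helper, `check_Ns`, which is not part of the algorithm code; it only verifies that the list is strictly increasing and ends at `N`.

```python
def check_Ns(Ns, N):
    """Hypothetical helper: validate the list of path counts against N."""
    if not Ns:
        raise ValueError("Ns must contain at least one value")
    # Each entry must be larger than the previous one
    if any(a >= b for a, b in zip(Ns, Ns[1:])):
        raise ValueError("Ns must be strictly increasing")
    # The largest subset must use every path in the dataset
    if Ns[-1] != N:
        raise ValueError(f"last value of Ns ({Ns[-1]}) must equal N ({N})")
    return True


# Values copied from the cells above
N = 200
Ns = [10, 20, 30, 40, 50, 70, 90, 110, 130, 150, 170, 200]
check_Ns(Ns, N)
```

If the check raises, adjust either `Ns` or `N` before running the training loop.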

### (Optional) Creating New Datasets:

You can create and save new datasets for training here; afterwards, point the `path` variable in the Model Training section to the folder where you saved them. Alternatively, you can train the model on the pre-made datasets provided.

In [None]:
# choose number of sets to create
sets = 10

for n in range(sets):
    df = createSamples(N, T_max, maze, r, reseed=True)
    df.to_csv(f'set_{n}.csv')

### Model Training

Now we fit models to the datasets. Make sure the datasets are named `set_#.csv`, starting from `set_0.csv`, and that they follow the same column format as the samples provided. In the cell below, set the total number of datasets to train on and whether to save the trained models.

If you already have trained models, skip this step and simply load the models into the notebook. 

In [None]:
total_sets = 10
save_model = False

In [None]:
#path = '/Users/janiceyang/Dropbox (MIT)/ORC UROP/Opioids_Dropbox/Maze/Model Data/Datasets/Set 1 (N=200, T_max = 25, randomness=0.4)'
path = '/Users/janiceyang/Dropbox (MIT)/ORC UROP/Opioids_Dropbox/Maze/Model Data/Datasets/Set 2 (risk = -0.04)'
sys.path.append(path)

all_models = []
for set_num in range(total_sets):
    filename = f'set_{set_num}.csv'
    df = pd.read_csv(path+'/'+filename)

    # taking out extra ID col and changing actions back to integers
    df = df.iloc[:, 1:]
    df.loc[df['ACTION']=='None', 'ACTION'] = 4
    df['ACTION'] = pd.to_numeric(df['ACTION'], downcast='integer')
    df.loc[df['ACTION']==4, 'ACTION'] = 'None'
    
    df_full = df.copy()
    
    models=[]
    for n in Ns:
        df_small = df_full.loc[df_full['ID']<n]

        m = MDP_model()
        m.fit(df_small, # df: dataframe in the format ['ID', 'TIME', ...features..., 'RISK', 'ACTION']
            pfeatures, # int: number of features
            h, # int: time horizon (# of actions we want to optimize)
            gamma, # discount factor
            max_k, # int: number of iterations
            distance_threshold, # clustering diameter for Agglomerative clustering
            cv, # number for cross validation
            th, # splitting threshold
            eta, # incoherence threshold
            precision_thresh, # precision threshold
            classification, # classification method
            split_classifier_params, # classification params
            clustering,# clustering method from Agglomerative, KMeans, and Birch
            n_clusters, # number of clusters for KMeans
            random_state, # random seed
            plot=False,
            optimize=True,
            verbose=False)
        print('N=', n, ' completed')
        models.append(m)
        
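        if save_model:
            pickle.dump(m, open(f'round_{set_num}_model_N={n}.sav', 'wb'))

    # collect this dataset's models so the evaluation cells can iterate over all_models
    all_models.append(models)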


### Load Models (Optional)

Load your saved models into a list of lists: `all_models` holds one list per dataset, and each inner list holds that dataset's models in ascending order of the data sizes in `Ns`. Sample code below:

In [None]:
all_models = []

for i in range(total_sets): 
    models = []
    for n in Ns: 
        m = pickle.load(open(f'round_{i}_model_N={n}.sav', 'rb'))
        models.append(m)
    all_models.append(models)
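
To make the expected layout concrete, the sketch below round-trips stand-in objects through the same `round_{i}_model_N={n}.sav` naming scheme inside a temporary directory. The dictionaries are placeholders for trained `MDP_model` instances, and the small `total_sets` and `Ns` values are illustrative only.

```python
import os
import pickle
import tempfile

total_sets = 2       # illustrative; the notebook uses 10
Ns = [10, 20, 30]    # illustrative subset of the notebook's Ns

with tempfile.TemporaryDirectory() as tmp:
    # Save one stand-in "model" per (dataset, N) pair
    for i in range(total_sets):
        for n in Ns:
            stand_in = {'set': i, 'N': n}  # placeholder for a fitted MDP_model
            with open(os.path.join(tmp, f'round_{i}_model_N={n}.sav'), 'wb') as fh:
                pickle.dump(stand_in, fh)

    # Load them back into a list of lists, one inner list per dataset
    all_models = []
    for i in range(total_sets):
        models = []
        for n in Ns:
            with open(os.path.join(tmp, f'round_{i}_model_N={n}.sav'), 'rb') as fh:
                models.append(pickle.load(fh))
        all_models.append(models)

print(all_models[1][2])  # {'set': 1, 'N': 30}
```

After loading, `all_models[i][j]` is the model for dataset `i` trained on the first `Ns[j]` paths, which is the indexing the evaluation cells rely on.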

## Optimality Gap

The Optimality Gap measures the difference between `v_alg`, the value obtained by simulating a player that starts in the starting cell and takes `T_max` actions as prescribed by the trained model, and `v_opt`, the true value of the maze obtained by taking `T_max` optimal actions. `v_alg` is calculated by randomly generating `K` points in the starting cell and simulating these `T_max` steps in the maze for each point, summing the value of each step. The average of `|v_alg - v_opt|` across the `K` trials is returned.
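
As a rough illustration of the metric described above (not the actual `value_diff` implementation, whose internals are not shown in this notebook), the sketch below averages the absolute gap over a handful of made-up trial values and then normalizes by `v_opt`. The helper name `mean_abs_gap` and the trial values are hypothetical.

```python
def mean_abs_gap(v_alg_trials, v_opt):
    """Average |v_alg - v_opt| over the simulated trials."""
    return sum(abs(v - v_opt) for v in v_alg_trials) / len(v_alg_trials)


v_opt = 0.44                              # optimal value used for normalization in this notebook
v_alg_trials = [0.44, 0.40, -0.20, 0.44]  # made-up values from K = 4 trials

gap = mean_abs_gap(v_alg_trials, v_opt)   # mean absolute gap across trials
normalized_gap = gap / v_opt              # gap relative to the optimal value
print(round(gap, 4), round(normalized_gap, 4))
```

Dividing by `v_opt` expresses the gap relative to the optimal value, which is why the plot below labels its y-axis as a percentage-style error.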

In [None]:
K = 100
save_opt_gap = False

In [None]:
gap = np.zeros(len(Ns))

for i, model_set in enumerate(all_models):
    opt_gap = value_diff(model_set, Ns, K, T_max, P, R, f, rw, true_v, true_pi)
    gap += opt_gap
    
    if save_opt_gap: 
        pickle.dump(opt_gap, open(f'round_{i}_opt_gap.sav', 'wb'))
        
avg_opt_gap = gap/len(all_models)
opt_gap_norm = avg_opt_gap/0.44 #0.44 is v_opt

# plot
fig1, ax1 = plt.subplots()
ax1.plot(Ns, opt_gap_norm, 'bo-')
ax1.set_title('Optimality Gap')
ax1.set_xlabel('Number of trajectories N')
ax1.set_ylabel('MAPE')
ax1.set_ylim(0, 3.5)

## Value Estimate

The Value Estimate measures the difference between the value of a random starting point (in the first cell) under the optimal MDP clustering and the value of the same point when clustered into the model's MDP. This difference is truncated at a maximum of 1, i.e. `min(1, difference)` is taken, and `K` trials are run and averaged per model.
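
The effect of the truncation can be seen in a small sketch. It assumes the difference is taken as an absolute value, which the description does not state explicitly; the helper name `truncated_mean_diff` and the sample values are hypothetical and unrelated to the real `value_est` internals.

```python
def truncated_mean_diff(true_values, model_values, cap=1.0):
    """Average of min(cap, |true - model|) over all trials."""
    diffs = [min(cap, abs(t - m)) for t, m in zip(true_values, model_values)]
    return sum(diffs) / len(diffs)


true_values = [0.44, 0.44, 0.44]    # made-up values under the optimal MDP
model_values = [0.40, -1.00, 0.44]  # made-up values under the learned MDP

capped = truncated_mean_diff(true_values, model_values)
# A very large cap effectively disables the truncation
uncapped = truncated_mean_diff(true_values, model_values, cap=float('inf'))
print(round(capped, 4), round(uncapped, 4))
```

The second trial differs by 1.44 but contributes only 1 to the capped average, so a single badly clustered point cannot dominate the estimate.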

In [None]:
K = 100
save_val_est = False

In [None]:
gap = np.zeros(len(Ns))

for i, model_set in enumerate(all_models):
    val_est = value_est(model_set, Ns, K, P, R, f, rw, true_v, true_pi)
    gap += val_est
    
    if save_val_est: 
        pickle.dump(val_est, open(f'round_{i}_val_est.sav', 'wb'))
        
avg_est = gap/len(all_models)
avg_est_norm = avg_est/0.44 #0.44 is v_opt
    

fig1, ax1 = plt.subplots()
ax1.plot(Ns, avg_est_norm, 'bo-')
ax1.set_title('Value Estimation')
ax1.set_xlabel('Number of trajectories N')
ax1.set_ylabel('MAPE')
ax1.set_ylim(0, 3.5)


## Optimal Action Value Difference

The Optimal Action Value Difference calculates the difference in cumulative value between a point randomly generated in the starting cell that takes the true optimal sequence of actions for `T_max` steps through the real environment and the same point taking the same sequence of actions within the trained model's MDP environment. `K` randomly generated points are used, and the average of the difference (truncated at a maximum of 1) is returned.

In [None]:
K = 100
save_opt_act_gap = False

In [None]:
gap = np.zeros(len(Ns))

for i, model_set in enumerate(all_models):
    opt_act_gap = opt_path_value_diff(model_set, Ns, K, T_max, P, R, f, rw, true_v, true_pi)
    gap += opt_act_gap
    
    if save_opt_act_gap: 
        pickle.dump(opt_act_gap, open(f'round_{i}_opt_act_gap.sav', 'wb'))
    
opt_gap = gap/len(all_models)
opt_gap_norm = opt_gap/0.44
fig1, ax1 = plt.subplots()
ax1.plot(Ns, opt_gap_norm, 'bo-')
ax1.set_title('Optimal Action Value Difference')
ax1.set_xlabel('Number of trajectories N')
ax1.set_ylabel('MAPE')
ax1.set_ylim(0, 3.5)


## Averages

Finally, we calculate the classification accuracy of the model on the seen training data as well as on a randomly generated testing dataset. The classification accuracy is defined as the percentage of points that the model clusters correctly, based on the known original grid cells of the maze. The classification error, in turn, is defined as the percentage of points that are clustered incorrectly.
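
The notebook does not show how `generalization_accuracy` matches learned clusters to grid cells, so the sketch below is only one plausible reading of the definition: it assumes each cluster is labeled with the most common true cell among its points, and counts a point as correct when its own cell matches that label. The function name `clustering_accuracy` and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def clustering_accuracy(cluster_ids, true_cells):
    """Fraction of points whose cluster's majority cell equals their own cell."""
    cells_by_cluster = defaultdict(list)
    for cluster, cell in zip(cluster_ids, true_cells):
        cells_by_cluster[cluster].append(cell)
    # Assumption: a cluster "represents" its most common true grid cell
    majority = {cluster: Counter(cells).most_common(1)[0][0]
                for cluster, cells in cells_by_cluster.items()}
    correct = sum(majority[cluster] == cell
                  for cluster, cell in zip(cluster_ids, true_cells))
    return correct / len(true_cells)


cluster_ids = [0, 0, 0, 1, 1, 2]                                  # made-up model clusters
true_cells = [(0, 0), (0, 0), (0, 1), (0, 1), (0, 1), (1, 1)]     # made-up grid cells

accuracy = clustering_accuracy(cluster_ids, true_cells)
error = 1 - accuracy
print(round(accuracy, 4), round(error, 4))
```

Here five of the six points agree with their cluster's majority cell; the third point sits in cluster 0 but belongs to cell `(0, 1)`, so it counts as an error.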

In [None]:
save_averages = False

In [None]:
tr_acc = np.zeros(len(Ns))
test_acc = np.zeros(len(Ns))

for i, model_set in enumerate(all_models):
    df_test = createSamples(N, T_max, maze, r, reseed=True)
    training_acc, testing_acc = generalization_accuracy(model_set, df_test, Ns)
    
    tr_acc += training_acc
    test_acc += testing_acc
    
    if save_averages:
        pickle.dump(training_acc, open(f'round_{i}_training_acc.sav', 'wb'))
        pickle.dump(testing_acc, open(f'round_{i}_testing_acc.sav', 'wb'))

train_acc = tr_acc/len(all_models)
testing_acc = test_acc/len(all_models)
fig1, ax1 = plt.subplots()
ax1.plot(Ns, 1-np.array(testing_acc), 'bo--', label='Out-of-Sample')
ax1.set_title('Classification Error')
ax1.set_xlabel('Number of trajectories N')
ax1.set_ylabel('% Error')
ax1.legend()
ax1.set_ylim(0, 1)

fig4, ax4 = plt.subplots()
ax4.plot(Ns, 1-np.array(testing_acc), 'bo-', label='Out-of-Sample')
ax4.plot(Ns, 1-np.array(train_acc), 'mo--', label='In-Sample')
ax4.set_title('Classification Error')
ax4.set_xlabel('Number of trajectories N')
ax4.set_ylabel('% Error')
ax4.legend()
ax4.set_ylim(0, 1)

fig2, ax2 = plt.subplots()
ax2.plot(Ns, train_acc, 'mo-', label='Training Accuracy')
ax2.plot(Ns, np.array(testing_acc), 'bo--', label='Testing Accuracy')
ax2.set_title('Classification Accuracy')
ax2.set_xlabel('Number of trajectories N')
ax2.set_ylabel('% Accuracy')
ax2.legend()
ax2.set_ylim(0, 1)

fig3, ax3 = plt.subplots()
ax3.plot(Ns, train_acc, 'bo-', label='Training Accuracy')
ax3.plot(Ns, np.array(testing_acc), 'bo--', label='Testing Accuracy')
ax3.set_title('Classification Accuracy')
ax3.set_xlabel('Number of trajectories N')
ax3.set_ylabel('% Accuracy')
ax3.legend()
ax3.set_ylim(0, 1)
