## Training notebook for benchmarking the partial exploration variable
In this notebook we use PyDial to train and test Deep Q-Networks (DQN) with a partial and progressive exploration variable in three different environments:

- Cambridge Restaurants 
- San Francisco Restaurants
- Laptops 

Hyperparameters are defined in the Methodology section of the Independent Research Project. They are also listed in each environment's configuration file in configuration_files/. The configuration files are as follows:

#### Handcrafted Policies (4000 dialogs in total, batches of 1000 for training, test on 500 dialogs) 
- env1-hdc-CR.cfg / Handcrafted policy configuration file for Cambridge Restaurants Domain
- env1-hdc-SFR.cfg / Handcrafted policy configuration file for San Francisco Restaurants Domain
- env1-hdc-LAP.cfg / Handcrafted policy configuration file for Laptops Domain

#### DQN with partial exploration variable (4000 dialogs in total, batches of 1000 for training, test on 500 dialogs) 

- env-exploration-partial-CR.cfg / DQN with partial exploration variable enabled on top of DQNPolicy.py for Cambridge Restaurants Domain
- env-exploration-partial-sfr.cfg / DQN with partial exploration variable enabled on top of DQNPolicy.py for San Francisco Restaurants Domain
- env-exploration-partial-lap.cfg / DQN with partial exploration variable enabled on top of DQNPolicy.py for Laptops Domain

#### DQN with partial exploration variable (4000 dialogs in total, batches of 100 for training, test on 30 dialogs) 

- env-exploration-partial-CR-100.cfg / DQN with partial exploration variable enabled on top of DQNPolicy.py for Cambridge Restaurants Domain
- env-exploration-partial-sfr-100.cfg / DQN with partial exploration variable enabled on top of DQNPolicy.py for San Francisco Restaurants Domain
- env-exploration-partial-lap-100.cfg / DQN with partial exploration variable enabled on top of DQNPolicy.py for Laptops Domain
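
The hyperparameters stored in these files can be inspected before training. The sketch below assumes the .cfg files are INI-style and readable with Python's `configparser`; the helper name `dump_config` is made up for illustration, and the example path is relative to the notebook's directory.

```python
import configparser
from pathlib import Path


def dump_config(path):
    """Return {section: {option: value}} for an INI-style config file."""
    # interpolation=None keeps values containing '%' from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve the case of option names
    if not parser.read(path):
        raise FileNotFoundError(f"Could not read config file: {path}")
    return {section: dict(parser.items(section)) for section in parser.sections()}


# Example path (assumption: run from the directory that holds configuration_files/).
cfg_path = Path("configuration_files") / "env-exploration-partial-CR.cfg"
if cfg_path.exists():
    for section, options in dump_config(cfg_path).items():
        print(f"[{section}]")
        for name, value in options.items():
            print(f"  {name} = {value}")
else:
    print(f"{cfg_path} not found; adjust the path to your checkout.")
```

Printing every section avoids hard-coding section or option names, which are not listed in this notebook.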

----------------------------------------------------------------------------------------------------------------------

#### Preparing all necessary imports for this notebook

In [1]:
import os
from exploration_utils.exploration_utilities import ProgressionExploration, RiskIndexCalculator
import pandas as pd

In [2]:
# Find the notebook's directory; working_path is its parent. The PyDial repository should be in the same directory as the notebook.
absolute_path = os.path.abspath('')
working_path = os.path.dirname(absolute_path)
print(f"Notebook location: {working_path}")

Notebook location: /Users/crodrigues/Desktop/Submission_Masters


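Setting CUDA_VISIBLE_DEVICES to an empty string hides all CUDA GPUs from TensorFlow. Non-CUDA devices, such as the Apple Metal device reported in the training log below, are not affected.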

In [3]:
os.environ["CUDA_VISIBLE_DEVICES"] = ""

In [4]:
# Change directory to the pydial benchmark folder to run the benchmarks 
%cd pydial3-public

/Users/crodrigues/Desktop/Submission_Masters/dqn_exploration_methods/pydial3-public
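
The remaining cells use paths relative to pydial3-public, so it can be worth checking that they resolve after the directory change. This is a minimal sketch; the helper name `missing_paths` is hypothetical, and the list only contains the two directories that the commands in this notebook read from.

```python
from pathlib import Path

# Directories referenced by the commands in this notebook, relative to pydial3-public.
EXPECTED = ["../configuration_files", "../result_logs"]


def missing_paths(base, relative_paths=EXPECTED):
    """Return the entries of relative_paths that do not exist under base."""
    base = Path(base)
    return [rel for rel in relative_paths if not (base / rel).exists()]


missing = missing_paths(Path.cwd())
print("All expected paths found." if not missing else f"Missing: {missing}")
```

The logs_directory/ and policies_directory/ folders are left out on purpose, since the training log shows that PyDial creates them when they do not exist.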


# DQN with exploration variable - Training

### [DISCLAIMER] - The training process is long (it will take several hours on a CPU-based machine). For convenience, the results of the runs are provided in the result_logs/ folder. These logs contain all the information needed to plot the results.

#### Environment - Cambridge Restaurants with exploration variable - 4000/1000/500 - 9 seeds - DQN

In [None]:
%run pydial train ../configuration_files/env-exploration-partial-CR.cfg --seed=\(0,9\)         

*** Seed 0 ***
Policy dir ../policies_directory/ does not exist, creating it
Log dir ../logs_directory/ does not exist, creating it
*** logfile: env-exploration-partial-CR-seed0-00.1-4.train.log ***
RESULTS:: 00:09:29: root                                   pydial.py <train_command>853 :  List of domains: CamRestaurants
*** Training Iteration env-exploration-partial-CR-seed0-00.0->env-exploration-partial-CR-seed0-00.1: iter=0, error-rate=0, num-dialogs=1000 ***
RESULTS:: 00:09:29: root                                      pydial.py <trainBatch>445 :  *** Training Iteration env-exploration-partial-CR-seed0-00.0->env-exploration-partial-CR-seed0-00.1: iter=0, error-rate=0, num-dialogs=1000 ***


2023-03-01 00:09:30.937760: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-01 00:09:39.053006: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-01 00:09:39.055885: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-03-01 00:09:39.056399: I tensorflow/core/common_runtime/pluggable_d

Metal device set to: Apple M1 Pro
nothing loaded in first iteration
load from:  ../policies_directory/env-exploration-partial-CR-seed0-00.0
loaded replay size:  0


2023-03-01 00:09:39.597477: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-03-01 00:09:39.597510: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2023-03-01 00:09:39.610559: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.


no actions in store yet - first step
Saving deepq-network...
Saving deepq-network...
Saving deepq-network...


#### Environment - San Francisco Restaurants with exploration variable - 4000/1000/500 - 9 seeds - DQN

In [None]:
%run pydial train ../configuration_files/env-exploration-partial-sfr.cfg --seed=\(0,9\)         

#### Environment - Laptops with exploration variable - 4000/1000/500 - 9 seeds - DQN

In [None]:
%run pydial train ../configuration_files/env-exploration-partial-lap.cfg --seed=\(0,9\)         

#### Results table for the three environments above on pre-trained logs/policies (result_logs/)

In [None]:
%run pydial plot --noplot ../result_logs/env-exploration-partial-CR-seed*1-4.train.log
%run pydial plot --noplot ../result_logs/env-exploration-partial-sfr-seed*1-4.train.log
%run pydial plot --noplot ../result_logs/env-exploration-partial-lap-seed*1-4.train.log
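
The wildcard in these commands picks up one log per seed, so a quick way to see which seeds are actually covered is to list the matching files. The sketch below infers the file-name pattern from the single example in the training output (env-exploration-partial-CR-seed0-00.1-4.train.log); that pattern and the helper name `seeds_with_logs` are assumptions, not part of PyDial.

```python
import re
from pathlib import Path

# Pattern inferred from one example log name in the training output (an assumption).
LOG_NAME = re.compile(
    r"^(?P<config>.+)-seed(?P<seed>\d+)-\d+\.(?P<first>\d+)-(?P<last>\d+)\.train\.log$"
)


def seeds_with_logs(log_dir, config_name):
    """Return the sorted seed numbers that have a train log for config_name."""
    seeds = set()
    for path in Path(log_dir).glob(f"{config_name}-seed*.train.log"):
        match = LOG_NAME.match(path.name)
        if match and match.group("config") == config_name:
            seeds.add(int(match.group("seed")))
    return sorted(seeds)


for name in (
    "env-exploration-partial-CR",
    "env-exploration-partial-sfr",
    "env-exploration-partial-lap",
):
    print(name, seeds_with_logs("../result_logs", name))
```

An empty list for an environment means no matching log was found, in which case the corresponding plot command would have nothing to read.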

#### Results plots for the three environments above on pre-trained logs/policies (result_logs/)

In [None]:
%run pydial plot ../result_logs/env-exploration-partial-CR-seed*1-4.train.log
%run pydial plot ../result_logs/env-exploration-partial-sfr-seed*1-4.train.log
%run pydial plot ../result_logs/env-exploration-partial-lap-seed*1-4.train.log

#### Results table and plots for the three environments on freshly trained policies (Disclaimer: model training must fully complete before fresh results can be plotted)
For convenience, environments can be commented out and plotted one at a time.

In [None]:
%run pydial plot ../logs_directory/env-exploration-partial-CR-seed*1-4.train.log
%run pydial plot ../logs_directory/env-exploration-partial-sfr-seed*1-4.train.log
%run pydial plot ../logs_directory/env-exploration-partial-lap-seed*1-4.train.log
%run pydial plot --noplot ../logs_directory/env-exploration-partial-CR-seed*1-4.train.log
%run pydial plot --noplot ../logs_directory/env-exploration-partial-sfr-seed*1-4.train.log
%run pydial plot --noplot ../logs_directory/env-exploration-partial-lap-seed*1-4.train.log
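
As an alternative to commenting out individual lines, the same commands can be generated from a list of environments. This is only a sketch: the helper `plot_commands` and the `ENVIRONMENTS` list are hypothetical, and the commands are executed through IPython's `run_line_magic` only when the code runs inside IPython/Jupyter; otherwise they are just printed.

```python
ENVIRONMENTS = [
    "CR",
    "sfr",
    # "lap",  # comment an entry out to skip that environment
]


def plot_commands(environments, log_dir="../logs_directory", table_only=False):
    """Build the argument strings for '%run' for each selected environment."""
    flag = "--noplot " if table_only else ""
    return [
        f"pydial plot {flag}{log_dir}/env-exploration-partial-{env}-seed*1-4.train.log"
        for env in environments
    ]


try:
    ipython = get_ipython()  # defined only inside IPython/Jupyter
except NameError:
    ipython = None

for command in plot_commands(ENVIRONMENTS):
    if ipython is not None:
        ipython.run_line_magic("run", command)
    else:
        print(f"%run {command}")
```

Passing `table_only=True` produces the `--noplot` variants used for the results tables.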


# Handcrafted Policies 

Handcrafted policies are provided here for convenience and comparison. Training is skipped and the existing results are used for plotting. If necessary, the handcrafted policies can be run using the following commands:
- pydial train ../configuration_files/env1-hdc-CR.cfg --seed=\(0,9\)
- pydial train ../configuration_files/env1-hdc-SFR.cfg --seed=\(0,9\)
- pydial train ../configuration_files/env1-hdc-LAP.cfg --seed=\(0,9\)


#### Results and plots - Environment - Cambridge Restaurants with exploration variable - 4000/1000/500 - 9 seeds - Handcrafted

In [None]:
%run pydial plot ../result_logs/env1-hdc-CR-seed*.1-4.train.log
%run pydial plot --noplot ../result_logs/env1-hdc-CR-seed*.1-4.train.log


#### Results and plots - Environment - San Francisco Restaurants with exploration variable - 4000/1000/500 - 9 seeds - Handcrafted

In [None]:
%run pydial plot ../result_logs/env1-hdc-SFR-seed*.1-4.train.log
%run pydial plot --noplot ../result_logs/env1-hdc-SFR-seed*.1-4.train.log


#### Results and plots - Environment - Laptops with exploration variable - 4000/1000/500 - 9 seeds - Handcrafted

In [None]:
%run pydial plot ../result_logs/env1-hdc-LAP-seed*.1-4.train.log
%run pydial plot --noplot ../result_logs/env1-hdc-LAP-seed*.1-4.train.log


#### Results tables for the three DQN environments (4000/1000/500) on pre-trained logs/policies (result_logs/), written to files

In [None]:
%run pydial plot --noplot ../result_logs/env-exploration-partial-CR-seed*1-4.train.log > ../results_logs/table-exploration-partial-CR.tsv
%run pydial plot --noplot ../result_logs/env-exploration-partial-sfr-seed*1-4.train.log > ../results_logs/table-exploration-partial-sfr.tsv
%run pydial plot --noplot ../result_logs/env-exploration-partial-lap-seed*1-4.train.log > ../results_logs/table-exploration-partial-lap.tsv

# DQN with partial exploration variable (4000 dialogs in total, batches of 100 for training, test on 30 dialogs) 

To enable the analysis of safety and efficiency on shorter training iterations, we are also presenting the training of DQN with the partial exploration variable on batches of 100 dialogs (with testing on 30). 


#### Environment - Cambridge Restaurants with exploration variable - 4000/100/30  - 2 seeds, 40 iterations - DQN

In [None]:
%run pydial train ../configuration_files/env-exploration-partial-CR-100.cfg --seed=\(0,2\)         

#### Environment - San Francisco Restaurants with exploration variable - 4000/100/30  - 2 seeds, 40 iterations - DQN

In [None]:
%run pydial train ../configuration_files/env-exploration-partial-sfr-100.cfg --seed=\(0,2\)

#### Environment - Laptops with exploration variable - 4000/100/30  - 2 seeds, 40 iterations - DQN

In [None]:
%run pydial train ../configuration_files/env-exploration-partial-lap-100.cfg --seed=\(0,2\)

In [None]:
%run pydial plot ../result_logs/env-exploration-partial-lap-100-seed*.train.log
%run pydial plot ../result_logs/env-exploration-partial-CR-100-seed*.train.log
%run pydial plot ../result_logs/env-exploration-partial-sfr-100-seed*.train.log


# Calculate Risk Index 

In [None]:
usecols = ["NumDialogs", "Reward", "Success", "Turns"]
df_cr = pd.read_csv('../result_logs/table-exploration-partial-CR.csv', sep=';', usecols=usecols)
df_sfr = pd.read_csv('../result_logs/table-exploration-partial-sfr.csv', sep=';', usecols=usecols)
df_lap = pd.read_csv('../result_logs/table-exploration-partial-lap.csv', sep=';', usecols=usecols)

risk_calculator_cr = RiskIndexCalculator(df_cr, threshold=65)
risk_calculator_sfr = RiskIndexCalculator(df_sfr, threshold=65)
risk_calculator_lap = RiskIndexCalculator(df_lap, threshold=65)

print(f"Risk index in Cambridge Restaurants: {risk_calculator_cr.calculate_risk_index()}")
print(f"Risk index in San Francisco Restaurants: {risk_calculator_sfr.calculate_risk_index()}")
print(f"Risk index in Laptops: {risk_calculator_lap.calculate_risk_index()}")
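
Because the tables are read with a fixed column list, a missing or misnamed column makes the load fail. The sketch below is a small pre-check using only the standard library; it assumes the same ';' separator and column names as the cell above, and the helper name `missing_columns` is hypothetical.

```python
import csv
import os

REQUIRED_COLUMNS = ["NumDialogs", "Reward", "Success", "Turns"]


def missing_columns(table_path, required=REQUIRED_COLUMNS, delimiter=";"):
    """Return the required column names that are absent from the header row."""
    with open(table_path, newline="") as handle:
        header = next(csv.reader(handle, delimiter=delimiter), [])
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


for table in (
    "../result_logs/table-exploration-partial-CR.csv",
    "../result_logs/table-exploration-partial-sfr.csv",
    "../result_logs/table-exploration-partial-lap.csv",
):
    if os.path.exists(table):
        print(table, "missing columns:", missing_columns(table))
    else:
        print(table, "not found")
```

An empty list means the header contains every column the risk index calculation asks for.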
