# Deep reinforcement learning for the optimization of traffic light control with real-time data

## Hyperparameter gridsearch


### Instructions for running

Below is a description of the parameters accepted by the simulation class, the wrapper for training a DDQN agent in a preset SUMO environment.

It is followed by a self-explanatory example of how to run a gridsearch over the algorithm's hyperparameters.

### Parameters

#### DDQN parameters

**q_network** : (str) identifier of the Keras model used to predict the Q-values of the current state ('simple' or 'linear')

**gamma** : (float) discount factor applied to future rewards, between 0 and 1

**target_update_freq** : (int) number of steps between synchronizations of the target network with the online Q-network

**train_freq** : (int) how often, in steps, the Q-network is actually updated. Collecting several samples into the replay memory for every Q-network update can improve stability.

**num_burn_in** : (int) number of samples collected with the specified policy to fill the replay memory before training starts

**batch_size** : (int) number of samples per training batch

**optimizer** : (str) Keras optimizer identifier ('adam')

**max_ep_len** : (int) maximum episode length; the simulation of an episode stops after this many steps

**experiment_id** : (str) ID of simulation

**model_checkpoint** : (bool) whether to store Keras model checkpoints during training

**policy** : (str) policy used to choose actions ('epsGreedy', 'linDecEpsGreedy', 'greedy' or 'randUni')

**eps** : (float) exploration factor. If policy = 'linDecEpsGreedy', epsilon decays linearly from 1 to eps; if policy = 'epsGreedy', eps is the fixed exploration rate of the epsilon-greedy policy.
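
The two epsilon-based policies can be sketched as follows. This is a minimal NumPy illustration of standard epsilon-greedy action selection with linear annealing, not the project's implementation; the function names and the decay_steps parameter (how many steps the decay lasts) are hypothetical, since the length of the decay is not specified here.

```python
import numpy as np


def linear_decay_epsilon(step, decay_steps, eps_min, eps_start=1.0):
    """Linearly anneal epsilon from eps_start to eps_min over decay_steps steps,
    then keep it at eps_min. decay_steps is a hypothetical parameter."""
    fraction = min(step / decay_steps, 1.0)
    return eps_start + fraction * (eps_min - eps_start)


def eps_greedy_action(q_values, eps, rng):
    """With probability eps pick a uniformly random action, otherwise the greedy one."""
    if rng.random() < eps:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


rng = np.random.default_rng(0)
q = np.array([0.2, 1.5, -0.3])  # hypothetical Q-values for three signal phases
for step in (0, 5000, 10000, 20000):
    eps = linear_decay_epsilon(step, decay_steps=10000, eps_min=0.1)
    print(step, round(eps, 3), eps_greedy_action(q, eps, rng))
```

Under this reading, 'greedy' presumably corresponds to eps = 0 (always exploit) and 'randUni' to eps = 1 (uniformly random actions).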
    


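For reference, the standard Double DQN update involving gamma, train_freq, target_update_freq and num_burn_in can be sketched in NumPy as below. The online network selects the best next action and the target network, a periodically synchronized copy, evaluates it. The schedule helper assumes that training starts once num_burn_in steps have elapsed and that target_update_freq counts steps between target-network syncs. All names are hypothetical; this is not the project's code.

```python
import numpy as np


def double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma):
    """Double DQN targets: the online network selects the next action,
    the target network evaluates it."""
    best_next = np.argmax(next_q_online, axis=1)                       # action selection (online net)
    next_values = next_q_target[np.arange(len(best_next)), best_next]  # action evaluation (target net)
    return rewards + gamma * next_values * (1.0 - dones)               # no bootstrapping past terminal states


def update_schedule(step, train_freq, target_update_freq, num_burn_in):
    """Return (train_now, sync_target_now) for a given global step (assumed schedule)."""
    if step < num_burn_in:  # still filling the replay memory
        return False, False
    train_now = step % train_freq == 0
    sync_target_now = step % target_update_freq == 0
    return train_now, sync_target_now


rewards = np.array([1.0, -0.5])
dones = np.array([0.0, 1.0])                        # second transition is terminal
next_q_online = np.array([[0.1, 0.9], [0.4, 0.2]])
next_q_target = np.array([[0.3, 0.7], [0.6, 0.8]])
print(double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.99))
```

For the first sample the target is 1.0 + 0.99 × 0.7 = 1.693; the second transition is terminal, so its target is just its reward, -0.5.
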
#### Environment parameters

**network** : (str) network complexity ('simple' or 'complex')

**demand** : (str) demand scenario ('rush' or 'nominal')

**use_gui** : (bool) whether to launch the SUMO graphical user interface

**delta_time** : (int) simulated time between consecutive agent actions

**reward** : (str) type of reward ('balanced' or 'negative')

#### Memory buffer parameters

**max_size** : (int) capacity of the replay memory, i.e. the maximum number of stored samples
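
A replay memory with capacity max_size is commonly implemented as a FIFO buffer from which batch_size samples are drawn uniformly at random. The sketch below assumes that behavior; the ReplayMemory class is hypothetical and may differ from the project's memory buffer.

```python
import random
from collections import deque


class ReplayMemory:
    """Hypothetical fixed-capacity FIFO replay memory: once max_size transitions
    are stored, each new transition evicts the oldest one."""

    def __init__(self, max_size):
        self.buffer = deque(maxlen=max_size)

    def append(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling, without replacement within one batch
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)


memory = ReplayMemory(max_size=10000)
for t in range(12000):
    memory.append(state=t, action=t % 4, reward=0.0, next_state=t + 1, done=False)
batch = memory.sample(batch_size=30)
print(len(memory), len(batch))
```

After 12,000 insertions only the most recent 10,000 transitions remain. Indexing a deque is O(n) away from its ends, so for very large buffers a list used as a ring buffer samples faster.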

#### Additional parameters


**num_episodes** : (int) number of training episodes. This can also be changed in the train method.

**eval_fixed** : (bool) whether to also evaluate the fixed policy during training, for comparison in the plots

**monitoring** : (bool) whether to store per-episode logs in TensorBoard

**episode_recording** : (bool) whether to store intra-episode logs in TensorBoard

**seed** : (int) random seed

```python
# Imports

# %load_ext autoreload
# %autoreload 2

import glob
import json
import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# import tensorflow as tf
# tf.logging.set_verbosity(tf.logging.ERROR)

# Project modules
import plotting
import simulation
import tools

plt.rcParams['figure.figsize'] = (12, 12)
```

#### Define the parameter grid

```python
# Each key maps to a list of candidate values; the gridsearch covers every combination
param = {
    "experiment_id" : ["Test_gridsearch"],
    "batch_size" : [30, 50],
    "target_update_freq" : [5000, 10000],
    "gamma" : [0.99],
    "train_freq" : [1],
    "max_size" : [10000],
    "max_ep_length" : [1000],
    "policy" : ["epsGreedy", "linDecEpsGreedy"],
    "eps" : [0.1],
    "delta_time" : [10],
    "reward" : ["balanced"],
    "network" : ["simple"],
    "num_episodes" : [50],
}

# Logs of all runs of this experiment are stored under ./logs/<experiment_id>
log_path = os.path.join("./logs", param["experiment_id"][0])
```

#### Run gridsearch

```python
# Expand the parameter dictionary into individual configurations and run the gridsearch
param_grid = tools.iter_params(**param)
tools.gridsearch(list(param_grid), log_path)
```
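
A gridsearch over a dictionary of candidate lists typically expands it into the Cartesian product of all values. The standard-library sketch below shows that expansion; iter_params_sketch is a hypothetical stand-in, and tools.iter_params may differ in ordering or output format. Under this assumption, the grid defined above yields 2 × 2 × 2 = 8 runs (two batch sizes, two target update frequencies, two policies).

```python
import itertools


def iter_params_sketch(**param_lists):
    """Hypothetical stand-in: yield one dict per combination of candidate values."""
    keys = list(param_lists)
    for values in itertools.product(*(param_lists[k] for k in keys)):
        yield dict(zip(keys, values))


# Subset of the grid above; single-valued keys do not change the number of runs
param_subset = {
    "batch_size": [30, 50],
    "target_update_freq": [5000, 10000],
    "policy": ["epsGreedy", "linDecEpsGreedy"],
    "gamma": [0.99],
}
grid = list(iter_params_sketch(**param_subset))
print(len(grid))  # 2 * 2 * 2 * 1 = 8 configurations
print(grid[0])
```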

#### Monitoring progress in TensorBoard

```shell
# Run in a terminal; change the relative path if needed
tensorboard --logdir='./Scripts/logs'
```

#### Selecting the best performer

```python
# Collect the evaluation results of every run, sorted by mean delay (ascending: best run first)
res = tools.get_grid_search_results(log_path)
res.sort_values("RL_mean_delay")
```
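
Since sort_values sorts in ascending order by default, the run with the lowest RL_mean_delay appears first. Instead of hard-coding run, it could be picked programmatically as in this sketch. It assumes that each row of res is one run and that its index is the run number expected by tools.load_last_model_checkpoint, which is not confirmed here; best_run is a hypothetical helper and the toy values are made up purely for illustration.

```python
import pandas as pd


def best_run(results: pd.DataFrame, metric: str = "RL_mean_delay"):
    """Hypothetical helper: index label of the run with the lowest value of `metric`."""
    return results[metric].idxmin()


# Toy table with made-up values, only to demonstrate the call
toy = pd.DataFrame({"RL_mean_delay": [42.0, 35.5, 39.1]}, index=[0, 1, 2])
print(best_run(toy))
```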

#### Loading the last checkpoint of the selected performer

```python
run = 1  # run selected from the results table above
sumo_RL = tools.load_last_model_checkpoint(log_path, run)
```

#### Evaluating with the user interface

```python
# Run one evaluation with the SUMO graphical user interface enabled
sumo_RL.evaluate(runs=1, use_gui=True)
```