# Deep reinforcement learning for the optimization of traffic light control with real-time data

## Single run


### Instructions for running

This section describes the parameters accepted by the simulation class, a wrapper for training a DDQN in a preset SUMO environment.

It also includes a short example of how to train on a single scenario.

### Parameters

#### DDQN parameters

**q_network_type** : (str) type of Keras model used to predict Q-values for the current state ('simple' or 'linear')

**gamma** : (float) discount factor for rewards
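To illustrate what the discount factor does, here is a minimal sketch of a discounted return. The function name `discounted_return` is hypothetical and is not part of this project; it only shows how `gamma` weights later rewards less than earlier ones.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of rewards, each weighted by gamma**t for its time step t."""
    total = 0.0
    # Walk backwards so every earlier step applies one more factor of gamma.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Three steps with a reward of -1 each: -1 - 0.99 - 0.9801
example = discounted_return([-1.0, -1.0, -1.0], gamma=0.99)
```

With `gamma` close to 1 the agent weighs long-term effects of a signal change almost as much as immediate ones; with `gamma = 0` only the immediate reward counts.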

**target_update_freq** : (int) number of steps between updates of the target Q-network

**train_freq** : (int) number of steps between Q-network updates. Stability sometimes improves if several samples are collected into the replay memory for every Q-network update.
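The interplay of `train_freq` and `target_update_freq` can be sketched with simple step counters. This is an illustration only: it assumes the common DQN convention in which `target_update_freq` controls how often a target network is synchronised, and that steps are counted from 1. The helper `scheduled_updates` is hypothetical and does not exist in this project.

```python
def scheduled_updates(total_steps, train_freq=1, target_update_freq=5000):
    """Return the steps at which each kind of update would fire."""
    train_steps, target_sync_steps = [], []
    for step in range(1, total_steps + 1):
        # A gradient update of the Q-network every `train_freq` steps.
        if step % train_freq == 0:
            train_steps.append(step)
        # A target-network synchronisation every `target_update_freq` steps.
        if step % target_update_freq == 0:
            target_sync_steps.append(step)
    return train_steps, target_sync_steps


train_steps, target_sync_steps = scheduled_updates(
    20, train_freq=4, target_update_freq=10
)
# train_steps → [4, 8, 12, 16, 20]
# target_sync_steps → [10, 20]
```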

**num_burn_in** : (int) number of samples used to fill the replay memory before training starts, collected using a specified policy

**batch_size** : (int) size of batches to be used to train models

**optimizer** : (str) keras optimizer identifier ('adam')

**max_ep_length** : (int) maximum episode length; an episode stops after the specified number of steps

**experiment_id** : (str) ID of simulation

**model_checkpoint** : (bool) store keras model checkpoints during training

**policy** : (str) policy to choose actions ('epsGreedy', 'linDecEpsGreedy', 'greedy', 'randUni')

**eps** : (float) exploration factor
    if policy = 'linDecEpsGreedy' -> epsilon decays linearly from 1 to eps
    if policy = 'epsGreedy' -> eps is the exploration rate used by the policy
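The two epsilon-based policies can be sketched as follows. This is not the project's implementation: the function names and the `decay_steps` parameter are assumptions made for illustration, since the number of steps over which epsilon decays is not documented here.

```python
import random


def linear_decay_eps(step, eps_final=0.1, decay_steps=10000):
    """Epsilon falling linearly from 1.0 to eps_final over decay_steps."""
    # `decay_steps` is a hypothetical parameter, not a documented option.
    fraction = min(step / decay_steps, 1.0)
    return 1.0 + fraction * (eps_final - 1.0)


def eps_greedy_action(q_values, eps, rng=random):
    """With probability eps pick a uniformly random action, else the best one."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda action: q_values[action])
```

Setting `eps = 0` in `eps_greedy_action` reduces it to a purely greedy choice, and `eps = 1` to a uniformly random one, which matches the intent of the 'greedy' and 'randUni' options.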
    


#### Environment parameters

**network** : (str) network complexity ('simple' or 'complex')

**demand**: (str) demand scenario ('rush' or 'nominal')

**use_gui** : (bool) whether to use the graphical user interface

**delta_time** : (int) simulation time between actions

**reward** : (str) type of reward ('balanced' or 'negative')

#### Memory buffer parameters

 
**max_size** : (int) maximum capacity of the replay memory
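A bounded replay memory of this kind can be sketched with a `deque`. The class below is a hypothetical stand-in, not the project's memory buffer; it only shows how `max_size` caps the number of stored transitions and how `batch_size` samples are drawn from it.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-capacity buffer that discards the oldest transitions first."""

    def __init__(self, max_size):
        self.buffer = deque(maxlen=max_size)

    def append(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


memory = ReplayMemory(max_size=5)
for t in range(8):
    # (state, action, reward, next_state, done) with placeholder values
    memory.append((t, 0, -1.0, t + 1, False))
batch = memory.sample(batch_size=3)
```

After eight appends into a buffer of capacity five, only the five most recent transitions remain, which is the behaviour `max_size` is meant to bound.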

#### Additional parameters


**num_episodes** : (int) number of episodes to train the algorithm. This can also be changed in the train method.

**eval_fixed** : (bool) evaluate a fixed policy during training; used for plotting

**monitoring** : (bool) store episode logs in tensorboard

**episode_recording** : (bool) store intra episode logs in tensorboard

**seed** : (int) random seed

In [1]:
# IMPORTS
##########################

# %load_ext autoreload
# %autoreload 2


import simulation
import plotting
import tools
import glob
import multiprocessing
import pandas as pd
import os
import json

# import tensorflow as tf
# tf.logging.set_verbosity(tf.logging.ERROR)

import numpy as np
import matplotlib.pyplot as plt

plt.rcParams['figure.figsize'] = (12,12)

Using TensorFlow backend.


#### Instantiate

In [2]:
param = {
    "experiment_id": "Test",

    # DDQN parameters
    "q_network_type": "simple",   # or "linear"
    "gamma": 0.99,
    "target_update_freq": 5000,
    "train_freq": 1,
    "num_burn_in": 200,
    "batch_size": 30,
    "optimizer": "adam",

    "max_ep_length": 1000,
    "policy": "linDecEpsGreedy",  # or "epsGreedy"
    "eps": 0.1,

    # Environment parameters
    "network": "complex",         # or "simple"
    "demand": "rush",             # or "nominal"

    "use_gui": False,
    "delta_time": 10,

    "reward": "balanced",         # or "negative"

    # Memory buffer and additional parameters
    "max_size": 100000,
    "num_episodes": 50,

    "eval_fixed": False,
    "episode_recording": False,
    "model_checkpoint": True,
}


sumo_RL = simulation.simulator(**param)
log_path = "./logs/" + param["experiment_id"]

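Because several parameters only accept a small set of strings, a quick check of the dictionary before instantiating can catch typos early. The sketch below is a hypothetical helper, not part of the `simulation` module, and whether the simulator performs its own validation is not known here. The allowed values are taken from the parameter descriptions above, with the policy spelling "epsGreedy" assumed from the code cell.

```python
# Allowed string options, as listed in the parameter descriptions.
ALLOWED_VALUES = {
    "q_network_type": {"simple", "linear"},
    "policy": {"epsGreedy", "linDecEpsGreedy", "greedy", "randUni"},
    "network": {"simple", "complex"},
    "demand": {"rush", "nominal"},
    "reward": {"balanced", "negative"},
}


def check_params(params):
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    for key, allowed in ALLOWED_VALUES.items():
        if key in params and params[key] not in allowed:
            problems.append(f"{key}={params[key]!r} not in {sorted(allowed)}")
    # A discount factor outside [0, 1] is almost certainly a mistake.
    if "gamma" in params and not 0.0 <= params["gamma"] <= 1.0:
        problems.append("gamma should lie in [0, 1]")
    return problems
```

Calling `check_params(param)` before `simulation.simulator(**param)` and inspecting the returned list is one way to use it.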


#### Train

In [None]:
res = sumo_RL.train()
pd.DataFrame(res)

Filling experience replay memory...
...done filling replay memory
Run lonely_worker -- running episode 1 / 50
Run lonely_worker -- running episode 2 / 50
Run lonely_worker -- running episode 3 / 50
Run lonely_worker -- running episode 4 / 50
Run lonely_worker -- running episode 5 / 50
Run lonely_worker -- running episode 6 / 50
Run lonely_worker -- running episode 7 / 50
Run lonely_worker -- running episode 8 / 50
Run lonely_worker -- running episode 9 / 50
Run lonely_worker -- running episode 10 / 50


#### Monitoring training

In [None]:
# In your terminal:
# tensorboard --logdir='./Scripts/logs'  # change the relative path if needed

#### Evaluate

In [None]:
_ = sumo_RL.evaluate(runs=3, use_gui=False)

#### Plot results

In [None]:
plotting.plot_evaluation(log_path)