# OR Suite Demonstration

Reinforcement learning (RL) is a natural model for problems involving real-time sequential decision making. In these models, a principal interacts with a system having stochastic transitions and rewards and aims to control the system online (by exploring available actions using real-time feedback) or offline (by exploiting known properties of the system).

This project provides a unified landscape for scaling reinforcement learning algorithms to operations research domains.

In this notebook we walk through running several algorithms on the ambulance problem on the line $[0,1]$ and generating plots of the results.
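To make the problem concrete, here is a minimal toy sketch of a single ambulance on $[0,1]$. It is not the `or_suite` environment. The class `ToyAmbulanceLine`, the uniform arrival distribution, the weight `alpha` between relocation cost and response distance, the starting location, and the rule that the ambulance ends each step at the arrival's location are all assumptions made for illustration. Rewards are negative costs, so larger is better.

```python
import random


class ToyAmbulanceLine:
    """Toy single-ambulance problem on [0, 1] (hypothetical, not or_suite's env)."""

    def __init__(self, ep_len=5, alpha=0.25, seed=0):
        self.ep_len = ep_len
        self.alpha = alpha  # assumed weight on the relocation cost
        self.rng = random.Random(seed)

    def reset(self):
        self.t = 0
        self.location = 0.0  # assumed starting location
        return self.location

    def step(self, action):
        action = min(max(action, 0.0), 1.0)  # keep the ambulance on the line
        arrival = self.rng.random()  # assumed uniform arrivals
        cost = (self.alpha * abs(action - self.location)
                + (1 - self.alpha) * abs(action - arrival))
        self.location = arrival  # assumed: the ambulance ends at the call site
        self.t += 1
        return self.location, -cost, self.t >= self.ep_len


def rollout(env, policy):
    """Run one episode and return the total (undiscounted) reward."""
    state, total, done = env.reset(), 0.0, False
    while not done:
        state, reward, done = env.step(policy(state))
        total += reward
    return total


print(rollout(ToyAmbulanceLine(), lambda state: 0.5))
```

Every distance on $[0,1]$ is at most 1, so each step costs at most 1 here and an episode return lies between `-ep_len` and 0.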

### Step 1: Import Required Packages

The main package for ORSuite is `or_suite`. However, some environments and algorithms require additional packages. Here we include `stable-baselines3`, a package containing implementations of state-of-the-art deep RL algorithms, and `matplotlib` for plotting.

```python
import or_suite
import gym
import matplotlib.pyplot as plt
from stable_baselines3.common.monitor import Monitor
from stable_baselines3 import PPO
from stable_baselines3.ppo import MlpPolicy
import numpy as np
```
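Because some environments and algorithms need extra packages, it can help to check what is installed before running the rest of the notebook. This sketch uses only the standard library's `importlib.util.find_spec`; the helper name `missing_packages` is hypothetical, and the package list simply mirrors the imports above.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found on this system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Top-level modules imported in the cell above.
missing = missing_packages(["or_suite", "gym", "matplotlib", "stable_baselines3", "numpy"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```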

### Step 2: Pick problem parameters for the environment

Here we use the ambulance metric environment defined in `or_suite/envs/ambulance/ambulance_metric.py`. The package provides default specifications for all of the environments in `or_suite/envs/env_configs.py`, and we use the default configuration for the ambulance problem in a metric space.

In addition, we need to specify the number of episodes for learning, and the number of iterations (in order to plot average results with confidence intervals).

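```python
DEFAULT_CONFIG = or_suite.envs.env_configs.ambulance_metric_default_config
epLen = DEFAULT_CONFIG['epLen']  # episode length from the default config
nEps = 200                       # number of episodes per run
numIters = 5                     # number of independent runs, used for confidence intervals
```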
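Results are averaged across the iterations. The sketch below shows one common way to turn per-iteration episode rewards into a mean with a confidence interval, assuming a normal approximation (z = 1.96 for roughly 95%); the plotting code in `or_suite.plots` may compute its intervals differently. The helper name `mean_and_ci` is hypothetical and the reward values are illustrative only.

```python
import math
import statistics


def mean_and_ci(values, z=1.96):
    """Return (mean, half-width) of a normal-approximation confidence interval."""
    mean = statistics.mean(values)
    if len(values) < 2:
        return mean, 0.0  # no spread can be estimated from a single run
    half_width = z * statistics.stdev(values) / math.sqrt(len(values))
    return mean, half_width


# Illustrative rewards for one episode index across numIters = 5 iterations.
rewards = [-1.0, -2.0, -3.0, -2.0, -2.0]
mean, half_width = mean_and_ci(rewards)
print(f"{mean:.3f} +/- {half_width:.3f}")
```

With only five iterations, using the Student-t quantile for 4 degrees of freedom (about 2.776) in place of 1.96 gives a wider, more conservative interval.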

### Step 3: Pick simulation parameters

Next we specify parameters for the simulation: a random seed, the frequency at which metrics are recorded, the directory for saving data files, a `deBug` mode that prints the trajectory, and so on.

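```python
DEFAULT_SETTINGS = {'seed': 1,
                    'recFreq': 1,
                    'dirPath': '../data/ambulance/',  # overwritten per agent in Step 5
                    'deBug': False,
                    'nEps': nEps,
                    'numIters': numIters,
                    'saveTrajectory': True,
                    'epLen': epLen}  # keep consistent with the environment config

ambulance_env = gym.make('Ambulance-v0', config=DEFAULT_CONFIG)
# Monitor records episode statistics for stable-baselines3
mon_env = Monitor(ambulance_env)
```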
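Since `epLen` appears in both the environment config and the simulation settings, a mismatch between the two is an easy mistake. A small sanity check along these lines could be run before the simulations. `check_settings` and `REQUIRED_KEYS` are hypothetical names, and treating every key from the cell above as required is an assumption about what `or_suite.utils` expects.

```python
# Keys used in DEFAULT_SETTINGS above (assumed to all be required).
REQUIRED_KEYS = {"seed", "recFreq", "dirPath", "deBug",
                 "nEps", "numIters", "saveTrajectory", "epLen"}


def check_settings(settings, config):
    """Raise if settings lack a key or disagree with the environment config."""
    missing = REQUIRED_KEYS - set(settings)
    if missing:
        raise KeyError(f"missing settings: {sorted(missing)}")
    if settings["epLen"] != config["epLen"]:
        raise ValueError(
            f"settings epLen={settings['epLen']} but config epLen={config['epLen']}"
        )


example_settings = {"seed": 1, "recFreq": 1, "dirPath": "../data/ambulance/",
                    "deBug": False, "nEps": 200, "numIters": 5,
                    "saveTrajectory": True, "epLen": 5}
check_settings(example_settings, {"epLen": 5})  # passes silently
```

In the notebook this would be called as `check_settings(DEFAULT_SETTINGS, DEFAULT_CONFIG)`.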

### Step 4: Pick list of algorithms

Several heuristics are implemented for each environment, in addition to a `random` policy and some discretization-based RL algorithms. Here we pick the random policy, two heuristics (`stable` and `median`), and, as a test, the PPO implementation from stable-baselines3.
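To give a feel for what such heuristics do, here are toy versions for a single ambulance on $[0,1]$. These are illustrative sketches, not the `or_suite` agents: the class names and methods are hypothetical, and the behaviors (random picks a uniform location, stable keeps the ambulance where it is, median moves it to the median of past arrivals) are assumptions suggested by the agent names.

```python
import random
import statistics


class ToyRandomAgent:
    """Pick a location uniformly at random on [0, 1)."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def pick_action(self, state):
        return self.rng.random()


class ToyStableAgent:
    """Keep the ambulance where it currently is."""

    def pick_action(self, state):
        return state


class ToyMedianAgent:
    """Move the ambulance to the median of all arrivals seen so far."""

    def __init__(self):
        self.arrivals = []

    def observe(self, arrival):
        self.arrivals.append(arrival)

    def pick_action(self, state):
        if not self.arrivals:
            return state  # no data yet, so stay put
        return statistics.median(self.arrivals)


median_agent = ToyMedianAgent()
for arrival in [0.1, 0.9, 0.4]:
    median_agent.observe(arrival)
print(median_agent.pick_action(0.0))
```

For one ambulance and distance $|x - y|$, the median of the arrival distribution minimizes the expected response distance, which is why a median rule is a natural heuristic for this problem.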

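```python
# PPO from stable-baselines3: gamma=1 disables discounting, and n_steps=epLen
# collects one episode of experience per update (with a single environment).
agents = {'SB PPO': PPO(MlpPolicy, mon_env, gamma=1, verbose=0, n_steps=epLen),
          'Random': or_suite.agents.rl.random.randomAgent(),
          'Stable': or_suite.agents.ambulance.stable.stableAgent(DEFAULT_CONFIG['epLen']),
          'Median': or_suite.agents.ambulance.median.medianAgent(DEFAULT_CONFIG['epLen'])
          }
```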

```
We recommend using a `batch_size` that is a multiple of `n_steps * n_envs`.
Info: (n_steps=5 and n_envs=1)
```
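The warning above comes from stable-baselines3: PPO's default `batch_size` of 64 does not evenly divide the rollout buffer of `n_steps * n_envs = 5` transitions. A hypothetical helper (`compatible_batch_size` is not part of either library) can pick the largest mini-batch size that divides the buffer.

```python
def compatible_batch_size(n_steps, n_envs=1, preferred=64):
    """Largest batch size <= preferred that evenly divides n_steps * n_envs."""
    buffer_size = n_steps * n_envs
    for size in range(min(preferred, buffer_size), 0, -1):
        if buffer_size % size == 0:
            return size
    raise ValueError("n_steps, n_envs and preferred must be positive")


# With n_steps = epLen = 5 and one environment, the whole buffer forms one mini-batch.
print(compatible_batch_size(5))
```

The result could then be passed as `PPO(MlpPolicy, mon_env, gamma=1, verbose=0, n_steps=epLen, batch_size=compatible_batch_size(epLen))`. Recent stable-baselines3 versions require `batch_size > 1`, so this only helps when the buffer size has a divisor larger than 1 in range.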


### Step 5: Run simulations

```python
for agent in agents:
    print(agent)
    # Save each agent's results in its own directory
    DEFAULT_SETTINGS['dirPath'] = '../data/ambulance_metric_test_' + str(agent) + '/'
    if agent == 'SB PPO':
        # stable-baselines3 agents use a separate runner on the monitored environment
        or_suite.utils.run_single_sb_algo(mon_env, agents[agent], DEFAULT_SETTINGS)
    else:
        or_suite.utils.run_single_algo(ambulance_env, agents[agent], DEFAULT_SETTINGS)
```

```
SB PPO
**************************************************
Running experiment
**************************************************
**************************************************
Experiment complete
**************************************************
**************************************************
Saving data
**************************************************
     episode  iteration  epReward      time    memory
0        0.0        0.0 -3.566262 -1.704257  529390.0
1        1.0        0.0 -2.456220 -2.804247  529390.0
2        2.0        0.0 -1.423682 -3.522476  529390.0
3        3.0        0.0 -2.739193 -3.380563  529390.0
4        4.0        0.0 -2.986239 -3.323382  529390.0
..       ...        ...       ...       ...       ...
995    195.0        4.0 -1.339041 -3.505709   53762.0
996    196.0        4.0 -1.726129 -3.337362   53762.0
997    197.0        4.0 -1.750353 -3.351554   53762.0
998    198.0        4.0 -0.825988 -3.505701   53762.0
999    199.0        4.0 -1.874094 -3.55697

  self.data[index, 4] = np.log(((end_time) - (start_time)))


**************************************************
Experiment complete
**************************************************
**************************************************
Saving data
**************************************************
[[-1.00000000e+00  0.00000000e+00 -8.51879284e-01  3.58400000e+03
  -7.60002165e+00]
 [ 0.00000000e+00  0.00000000e+00 -1.27596690e+00  2.42000000e+03
             -inf]
 [ 1.00000000e+00  0.00000000e+00 -8.48246947e-01  2.16400000e+03
  -7.60192914e+00]
 ...
 [ 1.96000000e+02  4.00000000e+00 -7.72291988e-01  3.76400000e+03
  -7.60049818e+00]
 [ 1.97000000e+02  4.00000000e+00 -9.89294961e-01  1.27240000e+04
             -inf]
 [ 1.98000000e+02  4.00000000e+00 -7.74680257e-01  3.76400000e+03
  -7.60192914e+00]]
Writing to file data.csv
**************************************************
Data save complete
**************************************************
Median
**************************************************
Running experiment
*************************
```
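The loop above mutates the shared `DEFAULT_SETTINGS` dictionary, and Step 6 rebuilds the same directory strings by hand. A hypothetical helper that returns a fresh copy per agent keeps the two in sync and leaves the defaults untouched; the function name and the `root` default are illustrative choices based on the paths used in this notebook.

```python
import copy


def settings_for(base_settings, agent_name, root="../data/ambulance_metric_test_"):
    """Return a copy of base_settings whose dirPath points at this agent's folder."""
    settings = copy.deepcopy(base_settings)
    settings["dirPath"] = f"{root}{agent_name}/"
    return settings


base = {"seed": 1, "dirPath": "../data/ambulance/"}
print(settings_for(base, "Median")["dirPath"])
```

In Step 5 one would then call `or_suite.utils.run_single_algo(ambulance_env, agents[agent], settings_for(DEFAULT_SETTINGS, agent))`.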

### Step 6: Generate figures

```python
path_list_line = []   # data.csv files for the line plots
path_list_radar = []  # result directories for the radar plots
algo_list_line = []
algo_list_radar = []

for agent in agents:
    print(str(agent))
    path_list_line.append('../data/ambulance_metric_test_' + str(agent) + '/data.csv')
    algo_list_line.append(str(agent))
    # The SB PPO agent is left out of the radar plots
    if agent != 'SB PPO':
        path_list_radar.append('../data/ambulance_metric_test_' + str(agent) + '/')
        algo_list_radar.append(str(agent))

fig_path = '../figures/'
fig_name = 'test_ambulance_metric.pdf'

or_suite.plots.plot_line_plots(path_list_line, algo_list_line, fig_path, fig_name, int(nEps / 40) + 1)

# Extra radar-plot metric: mean response time, with |x - y| as the distance on [0, 1]
additional_metric = {'MRT': lambda traj: or_suite.utils.mean_response_time(traj, lambda x, y: np.abs(x - y))}

or_suite.plots.plot_radar_plots(path_list_radar, algo_list_radar, fig_path, fig_name, additional_metric)
```

```
SB PPO
Random
Stable
Median
```


The plotting functions render figure text with LaTeX through matplotlib. In the original run, this cell raised `RuntimeError: latex was not able to process the following string: b'lp'` because the LaTeX package `type1cm.sty` could not be found. MiKTeX then installed the `type1cm` and `cm-super` packages on the fly, and the same error report was printed a second time during the installation. If you hit this error, install `type1cm` and `cm-super` in your TeX distribution and rerun the cell.

*[Figure output: 1080x360, 3 axes. Figures are written to `../figures/`.]*
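The radar plots add a mean response time (`MRT`) metric built from `or_suite.utils.mean_response_time` with the distance $|x - y|$. The sketch below shows the idea under an assumed trajectory format: each step is a pair of ambulance locations and an arrival, and the response distance is measured from the nearest ambulance. The function name `toy_mean_response_time` and the trajectory format are hypothetical and need not match what `or_suite` stores.

```python
def toy_mean_response_time(steps, dist=lambda x, y: abs(x - y)):
    """Average distance from the closest ambulance to each arrival.

    steps: list of (ambulance_locations, arrival) pairs (hypothetical format).
    """
    if not steps:
        raise ValueError("trajectory is empty")
    total = sum(min(dist(loc, arrival) for loc in locations)
                for locations, arrival in steps)
    return total / len(steps)


trajectory = [([0.5], 0.75), ([0.2, 0.8], 0.7)]
print(toy_mean_response_time(trajectory))
```

Passing the distance as a function, as the notebook does with `lambda x, y : np.abs(x-y)`, lets the same metric be reused for other metric spaces.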