**Class 0: Introduction to Reinforcement Learning**

Contents of the class
1. [The mad hatter's casino](#gpi)
2. [References](#refs)
3. [Ruining the suspense with a general definition](#def)
4. [Examples of RL problems](#examples)
5. [Course syllabus](#syllabus)
6. [OpenAI Gym](#gym)

# <a id="gpi"></a>The mad hatter's casino

Getting the main intuitions in 30 minutes by playing a fun game!
<img src="images/madhatter.png"></img>

# <a id="refs"></a>References
<table>
<tr>
<td><img src="images/book_szepesvari.jpg" style="width: 200px;"></td>
<td><b>Algorithms for Reinforcement Learning</b><br>Csaba Szepesvari<br>2010.<br>The essentials in 104 pages. A bit mathematical.<br>PDF available <a href="https://sites.ualberta.ca/~szepesva/RLBook.html">here</a> (last update in 2017).</td>
</tr>
<tr>
<td><img src="images/book_sutton.jpg" style="width: 200px;"></td>
<td><b>Reinforcement Learning: An Introduction</b><br>Richard Sutton and Andrew Barto<br>1998.<br>The Reinforcement Learning bible. Both complete and didactic.<br>2nd edition scheduled for 2018, available as an online draft.<br><a href="http://incompleteideas.net/book/the-book.html">Online versions</a> of the <a href="http://incompleteideas.net/book/the-book-1st.html">1st</a> and <a href="http://incompleteideas.net/book/the-book-2nd.html">2nd</a> editions.</td>
</tr>
<tr>
<td><img src="images/web_silver.png" style="width: 200px;"></td>
<td><b>David Silver's UCL course on RL</b><br>10 video lectures + presentation PDFs.<br>2015.<br><a href="http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html">Available here</a>.</td>
</tr>
<tr>
<td><img src="images/web_sigaud.png" style="width: 200px;"></td>
<td><b>Olivier Sigaud's Sorbonne University course on RL</b><br>8 video lectures + presentation PDFs and notebooks.<br>2018.<br><a href="http://pages.isir.upmc.fr/~sigaud/teach/english.html">Available here</a>.</td>
</tr>
</table>


# <a id="def"></a>Ruining the suspense with a general definition

What is Reinforcement Learning about?

It is about controlling dynamic systems.
<img src="images/dynamic.png" style="width: 400px;"></img>
Why **dynamic**? Because the state $s$ and the observation $o$ evolve over time under the controls chosen by $\pi$.

Our object of study:<br>
We want to find a control policy $\pi$ (with $u = \pi(o)$) such that the system $\Sigma$ behaves as we desire.
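
To make the loop $o \rightarrow u = \pi(o) \rightarrow$ new state concrete, here is a minimal sketch. The system `ToySystem`, its dynamics $s_{t+1} = s_t + u_t$, the assumption that the state is observed exactly ($o = s$), and the proportional policy `pi` are all hypothetical choices made for illustration; they are not taken from the course material.

```python
from dataclasses import dataclass


@dataclass
class ToySystem:
    """Hypothetical scalar system: s_{t+1} = s_t + u_t, observed exactly (o = s)."""
    s: float = 1.0

    def observe(self) -> float:
        # Assumption: the observation is the full state.
        return self.s

    def step(self, u: float) -> None:
        # Apply the control u and move to the next state.
        self.s = self.s + u


def pi(o: float) -> float:
    """Hypothetical policy: push the observation halfway back towards 0."""
    return -0.5 * o


def run_closed_loop(system, policy, n_steps):
    """Alternate observation, control and state update; return the visited states."""
    states = [system.s]
    for _ in range(n_steps):
        o = system.observe()   # what the controller sees
        u = policy(o)          # u = pi(o)
        system.step(u)         # the system evolves under u
        states.append(system.s)
    return states


trajectory = run_closed_loop(ToySystem(s=1.0), pi, n_steps=5)
print(trajectory)
```

Here "behaving as we desire" would mean driving $s$ to 0, and this particular policy halves the state at every step. In a Reinforcement Learning problem, the policy is not given in advance but has to be learned.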

# <a id="examples"></a>Examples of RL problems

<table>
<tr>
  <td><img src="images/spiral.jpg" style="width: 200px;"></td>
  <td>Exiting a spiral</td>
</tr>
<tr>
  <td><img src="images/tests.jpg" style="width: 200px;"></td>
  <td>Dynamic treatment regimes for HIV patients</td>
</tr>
<tr>
  <td><img src="images/pend.png" style="width: 200px;"></td>
  <td>Cart-pole balancing</td>
</tr>
<tr>
  <td><img src="images/waiting.jpg" style="width: 200px;"></td>
  <td>Queueing problems</td>
</tr>
<tr>
  <td><img src="images/market.jpg" style="width: 200px;"></td>
  <td>Portfolio management</td>
</tr>
<tr>
  <td><img src="images/dam.jpg" style="width: 200px;"></td>
  <td>Hydroelectric production</td>
</tr>
</table>

But also:
- Elevator scheduling
- Ship steering
- Bioreactor control
- Aerobatic helicopter control
- Airport departure scheduling
- Airline scheduling
- Robocup soccer
- Video game playing (Quake, CS, Starcraft...)
- Game of Go
- ...

# <a id="syllabus"></a>Course syllabus

This course is organized as separate and almost independent notebooks.
- RL1 - Markov Decision Processes and model-based policy search
- RL2 - Online Value Function Prediction
- RL3 - Control Problems, model-free Policy Optimization
- RL4 - Deep Reinforcement Learning
- RL5 - Monte Carlo Tree Search

The final evaluation is based on a challenge that will be revealed around the middle of the course; you will be asked to hand in your commented code for it.

# <a id="gym"></a>OpenAI Gym

This class requires a recent version of Python 3 and scikit-learn (available in the <a href="https://www.anaconda.com/download">Anaconda distribution</a>).

You will need standard elements of Anaconda (numpy, matplotlib, scikit-learn, scikit-image) and graphviz.
```sh
conda install graphviz
conda install python-graphviz
```
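
A quick way to check that these packages are visible from your Python environment is sketched below. It uses only the standard library. The import names in `REQUIRED_MODULES` are the usual ones for these packages (`sklearn` for scikit-learn, `skimage` for scikit-image, `graphviz` for python-graphviz), and the helper `find_missing` is a hypothetical name introduced for this example.

```python
import importlib.util

# Import names assumed for the packages listed above
# (scikit-learn -> sklearn, scikit-image -> skimage, python-graphviz -> graphviz).
REQUIRED_MODULES = ["numpy", "matplotlib", "sklearn", "skimage", "graphviz"]


def find_missing(modules):
    """Return the module names that cannot be located in the current environment."""
    # find_spec only locates a top-level module; it does not import it.
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules were found.")
```

This only tells you whether each module can be found, not whether its version is recent enough.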

This class also requires that you install the <a href="https://github.com/openai/gym">OpenAI Gym</a> collection of Reinforcement Learning environments.

Installation instructions for Gym (11/2017) - note that you can also follow the steps in the link above.
```sh
pip install 'gym[all]'
conda install libgcc
```

If needed, upgrade any previous installation:
```sh
conda update anaconda
pip install 'gym[all]' --upgrade
conda update libgcc
```

Test your installation (if the code below runs fine, you're sorted).

```python
# This should display a 4x4 grid of letters and open a window of the Breakout game.
# Don't close the window yourself (it shouldn't work anyway).
import gym
env0 = gym.make('FrozenLake-v0')
env0.render()
env1 = gym.make('Breakout-v0')
env1.render()
```

Expected output (the starting cell `S` is highlighted in the terminal):

```
SFFF
FHFH
FFFH
HFFG

True
```

```python
# This should close the Breakout window
env1.close()
```

```python
4**16
```

```
4294967296
```