# Introduction

In this tutorial, we will implement synchronous value iteration and Q-learning adapted to the options framework.

For more background on the options framework before attempting these exercises, see the paper by Sutton et al.: http://www.sciencedirect.com/science/article/pii/S0004370299000521

# Notebook setup

## Instructions

- Import numpy and matplotlib
- Configure inline plots
- Import helper modules

In [10]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from pylab import *

# Import definitions of the environments.
import RL_worlds as worlds

# Import helper functions for plotting.
from plot_util import *

# Exercise 1: multi-room grid world

1. Implement a 4-room grid world like the one on page 194 of the article. You can do this by extending the class *world* from the RL_worlds module. Use only primitive (single-step) options (i.e., simple actions).
Tip: you can implement a new class called option to represent options in general, including single-step ones. This class will include an initiation set $I$, a termination condition $\beta$, and a policy $\pi$ associated with the option.

2. Modify the environment to also include multi-step options that can go from anywhere inside a room into one of that room's two hallways, as described in the paper.
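
As a starting point for the tip in item 1, here is a minimal sketch of what an option class could look like. It is not the interface of the RL_worlds module: the names `Option`, `primitive_option`, `run_option`, and the `step(state, action)` function are all hypothetical, and the environment is a toy five-cell corridor rather than the 4-room grid. The sketch only illustrates the triple $(I, \pi, \beta)$ and how executing an option yields a next state, a discounted reward, and a duration.

```python
import random
from dataclasses import dataclass
from typing import Callable, FrozenSet, Hashable

State = Hashable


@dataclass(frozen=True)
class Option:
    """An option as a triple: initiation set I, policy pi, termination beta."""
    name: str
    initiation_set: FrozenSet[State]
    policy: Callable[[State], int]          # pi: state -> primitive action
    termination: Callable[[State], float]   # beta: state -> termination probability

    def can_start(self, state: State) -> bool:
        return state in self.initiation_set


def primitive_option(action: int, states) -> Option:
    """Wrap a single action as an option that always terminates after one step."""
    return Option(
        name=f"primitive-{action}",
        initiation_set=frozenset(states),
        policy=lambda s: action,
        termination=lambda s: 1.0,
    )


def run_option(option, state, step, gamma, rng=random):
    """Follow the option's policy until it terminates.

    `step(state, action) -> (next_state, reward)` is an assumed environment
    interface. Returns (final_state, discounted_reward, number_of_steps).
    """
    if not option.can_start(state):
        raise ValueError(f"{option.name} cannot be initiated in state {state}")
    total, discount, k = 0.0, 1.0, 0
    while True:
        state, reward = step(state, option.policy(state))
        total += discount * reward
        discount *= gamma
        k += 1
        if rng.random() < option.termination(state):
            return state, total, k


# Toy corridor with cells 0..4; action +1 moves right, reward 1 on reaching cell 4.
def step(state, action):
    next_state = max(0, min(4, state + action))
    return next_state, (1.0 if next_state == 4 and state != 4 else 0.0)


# A multi-step option that walks right and terminates only in cell 4.
go_right = Option(
    name="go-to-cell-4",
    initiation_set=frozenset(range(4)),
    policy=lambda s: +1,
    termination=lambda s: 1.0 if s == 4 else 0.0,
)
move_right = primitive_option(+1, range(5))

final_state, discounted_reward, k = run_option(go_right, 0, step, gamma=0.9)
```

In the 4-room world, a hallway option would follow the same pattern, with the initiation set, policy, and termination condition defined as described in the paper.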

# Exercise 2: SMDP learning

1. Implement a learning method for the SMDP using modified Q-learning (page 195).

2. Replicate the results shown in the paper, using both multi-step and single-step options (Fig. 5).
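
The SMDP Q-learning backup in the paper updates $Q(s, o)$ after option $o$, started in $s$, terminates in $s'$ after $k$ steps with cumulative discounted reward $r$: $Q(s,o) \leftarrow Q(s,o) + \alpha \left[ r + \gamma^k \max_{o'} Q(s',o') - Q(s,o) \right]$. A minimal sketch of that single update follows. It assumes a tabular `Q` stored as a NumPy array indexed by integer state and option; the function name `smdp_q_update` and its arguments are illustrative, and exploration and the episode loop are left to the exercise.

```python
import numpy as np


def smdp_q_update(Q, s, o, r, k, s_next, alpha, gamma,
                  admissible_next=None, done=False):
    """Apply one SMDP Q-learning backup in place and return the new Q[s, o].

    r is the discounted reward accumulated while the option ran, k its
    duration in primitive steps. admissible_next optionally lists the option
    indices that can be initiated in s_next (default: all options).
    """
    if done:
        bootstrap = 0.0  # no future value after a terminal state
    else:
        q_next = Q[s_next] if admissible_next is None else Q[s_next, admissible_next]
        bootstrap = np.max(q_next)
    target = r + gamma ** k * bootstrap
    Q[s, o] += alpha * (target - Q[s, o])
    return Q[s, o]


# Tiny example: 3 states, 2 options.
Q = np.zeros((3, 2))
Q[2, 1] = 1.0
new_value = smdp_q_update(Q, s=0, o=0, r=0.5, k=3, s_next=2, alpha=0.5, gamma=0.9)
```

For a primitive option, $k = 1$ and the update reduces to ordinary one-step Q-learning.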

# Exercise 3: SMDP planning

1. Implement a planning method for the SMDP using synchronous value iteration (page 191).

2. Replicate the results shown in the paper, including the comparison between using only multi-step options vs. also including single-step options (Fig. 4).
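
Synchronous value iteration with options backs up every state at once using each option's model: $V_k(s) = \max_{o \in O_s} \left[ r^o_s + \sum_{s'} p^o_{ss'} V_{k-1}(s') \right]$, where $p^o_{ss'}$ already includes the discounting over the option's duration. The sketch below assumes the models are given as dense NumPy arrays and that every state admits at least one option; the array layout and the name `synchronous_value_iteration` are assumptions for illustration, and computing the models for the 4-room world is part of the exercise.

```python
import numpy as np


def synchronous_value_iteration(R, P, admissible, n_sweeps, V0=None):
    """Run n_sweeps synchronous backups over all states.

    R[o, s]      : expected discounted reward of option o started in s
    P[o, s, s2]  : discounted probability that o started in s ends in s2
    admissible[o, s] : True if option o can be initiated in s
    """
    n_states = R.shape[1]
    V = np.zeros(n_states) if V0 is None else np.asarray(V0, dtype=float).copy()
    for _ in range(n_sweeps):
        backed_up = R + P @ V                              # shape (options, states)
        backed_up = np.where(admissible, backed_up, -np.inf)
        V = backed_up.max(axis=0)                          # greedy over options
    return V


# Toy chain 0 -> 1 -> 2, reward 1 on entering state 2; state 2 is absorbing
# with zero value. Option 0 is the primitive "move right"; option 1 is a
# two-step option from state 0 straight to state 2.
gamma = 0.9
R = np.zeros((2, 3))
P = np.zeros((2, 3, 3))
R[0, 1] = 1.0
P[0, 0, 1] = gamma
P[0, 1, 2] = gamma
R[1, 0] = gamma * 1.0
P[1, 0, 2] = gamma ** 2

admissible = np.array([[True, True, True], [True, False, False]])
primitives_only = admissible.copy()
primitives_only[1] = False

V_primitive_1 = synchronous_value_iteration(R, P, primitives_only, n_sweeps=1)
V_primitive_2 = synchronous_value_iteration(R, P, primitives_only, n_sweeps=2)
V_options_1 = synchronous_value_iteration(R, P, admissible, n_sweeps=1)
```

In this toy chain, the multi-step option lets state 0 reach its final value in one sweep, whereas primitives alone need two. This is the kind of effect the comparison in item 2 examines on the 4-room world.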