# Reinforcement Learning CM50270 Lab - Session 0

Lecturer: Özgür Şimşek o.simsek@bath.ac.uk

Tutor: Jan Malte Lichtenberg j.m.lichtenberg@bath.ac.uk

# Learning Goals

* Learn how to navigate in a Jupyter Notebook
* Learn how to import libraries and other code files
* Basic usage of the 'numpy' library for scientific computing
* Basic usage of the 'matplotlib' library for visualization of results
* Learn how to define a class with methods and fields
* Preparation for RL CM50270 Lab - Session 1

# Introduction

This preliminary session will cover some basics of Python. We will cover the basics of the numpy library, which simplifies the manipulation of arrays, and should thus be helpful to code parts of the environment and the reinforcement learning agent.

We will then look at simple examples of visualizing results using the matplotlib.pyplot library. These should enable you to plot learning curves that show the performance of your agent.

Finally, we will show the very basics of object-oriented programming (OOP) in Python. We strongly advise you to use OOP for your environments and agents. Note that this is _not_ an introduction to the principles of object-oriented programming. If you need a refresher, have a look at, for example, [this tutorial](https://python.swaroopch.com/oop.html).

## Cells

After launching this notebook, try running/executing the next cell by pressing shift-enter on it. 

In [None]:
4*5

You can go into 'edit mode' by clicking on a cell. In order to exit the 'edit mode' and go into 'command mode', press escape or click somewhere outside of any cell. Once in 'command mode', you can, for example, create new cells by pressing 'a' for above or 'b' for below. Try it out! 

The cell you just executed above is called a 'code cell'. The kind of cell in which this text is written is called a 'markdown cell'. You can switch a cell between the two types using the dropdown menu in the toolbar or, in 'command mode', by pressing 'y' for code or 'm' for markdown.

## The Kernel

The Python session in which your cells are executed is called the 'kernel'. You can stop or restart the kernel (that is, start a new Python session) in the toolbar on the top of the page.

## Import libraries and other code

You can import libraries as follows. We will import the library `os`, which is useful for loading data and images. Execute the following cell!

In [None]:
import os

After executing this cell, your kernel will have access to everything inside the `os` library, which is a common library for interacting with the operating system. We'll need to use the import statement for all of the libraries that we include.

Sometimes you may want to write parts of your code (e.g., utility functions or class definitions) in a separate file. You can again use the `import` statement. We now import everything contained in the file `utils_session_0.py`.

In [None]:
import utils_session_0

We can then access the function defined in this file as follows.

In [None]:
utils_session_0.utils_test()
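If you only need a few names from a module, the `from ... import ...` form brings them directly into the notebook's namespace. The following is a minimal sketch using the standard-library `os.path` module; the file name `rewards.csv` is hypothetical and is only used to build a path string.

```python
import os
from os.path import join

# Both calls refer to the same function; only the way we name it differs.
path_a = os.path.join("data", "rewards.csv")  # hypothetical file name
path_b = join("data", "rewards.csv")

print(path_a)
print(path_a == path_b)
```

The prefixed form (`os.path.join`) makes it obvious where a function comes from, which is usually the safer choice in longer notebooks.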

# Numpy

The numpy library is a popular scientific computing library for Python. In a reinforcement learning context, we can use it to store numerical data such as, for example, a Q-table of a Q-learning agent or the rewards for different states in an environment. We can use the statement `import long_name_of_a_library as short_name` to make our lives easier. It is common practice to import `numpy` as `np`.

In [None]:
import numpy as np

You can now create arrays as follows.

In [None]:
# One-dimensional array
a = np.array([3, 4, 5, 6])
print("a =", a)

a_2 = np.arange(3, 7) 
print("a_2 =", a_2)

# Two-dimensional array
b = np.array([[1, 2],
              [3, 4]])
print("b ="); print(b)

# Variable-length array of zeros
num_rows = 3
num_cols = 5
c = np.zeros(shape=(num_rows, num_cols))
print("c ="); print(c)

# Variable length array of random integers between 0 (inclusive) and 6 (exclusive) 
d = np.random.randint(low = 0, high=6, size=(num_rows, num_cols))
print("d ="); print(d)

Note that the array `c` consists of floats instead of integers (as can be seen by the trailing '.' behind each 0). You can specify the type as follows.

In [None]:
# Variable-length array of integers
c_int = np.zeros(shape = (num_rows, num_cols), dtype = int)
print("c_int =")
print(c_int)
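The random array `d` created earlier changes every time its cell is executed. When comparing agents it is often convenient to get the same random numbers on every run. One way to do this, sketched below, is NumPy's `np.random.default_rng` with a fixed seed; the seed value 42 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# Two generators created with the same seed produce the same numbers.
rng_1 = np.random.default_rng(seed=42)
rng_2 = np.random.default_rng(seed=42)

# Random integers between 0 (inclusive) and 6 (exclusive).
d_1 = rng_1.integers(low=0, high=6, size=(3, 5))
d_2 = rng_2.integers(low=0, high=6, size=(3, 5))

print(d_1)
print(np.array_equal(d_1, d_2))
```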

**Indexing arrays.** Numpy has numerous ways to access elements of your array. Notice that indices start at 0! Let's look at a few of these. More info on indexing can be found [here](http://scipy-cookbook.readthedocs.io/items/Indexing.html).

In [None]:
# Indexing one-dimensional arrays.
second_element_of_a = a[1]
print("Second element of a:  a[1]=", second_element_of_a)

# Two-dimensional arrays
print("Element in the first row, second column of b:   b[0, 1]=", b[0, 1])
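Two further basics are often useful alongside the indexing shown above: negative indices count from the end of an array, and the `shape` attribute reports the size of each dimension. A short sketch, re-creating the arrays `a` and `b` with the same values so that the cell runs on its own:

```python
import numpy as np

a = np.array([3, 4, 5, 6])
b = np.array([[1, 2],
              [3, 4]])

# Negative indices count backwards from the end.
print("Last element of a:  a[-1] =", a[-1])
print("Bottom-right element of b:  b[-1, -1] =", b[-1, -1])

# shape is a tuple: (number of rows, number of columns).
print("Shape of b:  b.shape =", b.shape)
```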

**Slicing arrays.** A convenient method to access sub-arrays is to use the ':' operator. Intuitively, ':' says "choose all elements along this dimension" or, if used with a preceding or succeeding integer index, "choose all succeeding or preceding elements", respectively. For example, we can easily access rows or columns of a multi-dimensional array or "elements 5 to 10" of a one-dimensional array.

In [None]:
## Slicing one-dimensional arrays.
print("a =", a)
print("a[1:] =", a[1:])
print("a[2:] =", a[2:])
print("a[:2] =", a[:2])
print("a[1:3] =", a[1:3])

## Slicing two-dimensional arrays.
# First row of b.
print("b ="); print(b)
print("b[0, :] =", b[0, :])
# Second column of b.
print("b[:, 1] =", b[:, 1])
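One detail of slicing is worth knowing early: a basic slice is a *view* on the original array, not a copy, so writing to the slice also changes the original. The sketch below uses a separate array `m` (a name chosen here for illustration) so that `b` from the cells above stays unchanged.

```python
import numpy as np

m = np.array([[1, 2],
              [3, 4]])

first_row_view = m[0, :]         # a view: shares memory with m
first_row_copy = m[0, :].copy()  # an independent copy

first_row_view[0] = 100  # also changes m[0, 0]
first_row_copy[1] = -1   # leaves m untouched

print(m)
```

Use `.copy()` whenever you want to modify a sub-array without affecting the array it was taken from.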

Numpy provides a large variety of functions for array manipulation, statistics, and linear algebra. Have a look at the library by checking the documentation with `help(np)`. Another option to get information is to write `np.` and then press `<tab>`. This shows a dropdown of all available functions in this library:

In [None]:
# uncomment the lines to try them
# help(np)
# np.<tab>

Selecting a function from the dropdown and adding a `?` at the end will bring up the function's documentation.
Let's look at the mean function of `np`.

In [None]:
np.mean?

We can now calculate means of arrays. If we do not specify any further arguments, np.mean calculates the mean of all elements. Note also the two different ways of executing the same function.

In [None]:
print(np.mean(a))
print(a.mean())
print(np.mean(b))
print(b.mean())

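Sometimes we want to have column-wise or row-wise means or maxima. We can do so by specifying the `axis` argument of functions such as `np.mean()` and `np.max()`: `axis = 0` aggregates over the rows (one result per column), and `axis = 1` aggregates over the columns (one result per row).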

In [None]:
# Column-wise means
print(np.mean(b, axis = 0))

# Row-wise maxima
print(np.max(b, axis = 1))
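In a reinforcement learning setting, the same `axis` argument can be used on a Q-table. The sketch below assumes a made-up table with one row per state and one column per action; the layout and the values are illustrative only.

```python
import numpy as np

# Hypothetical Q-table: 3 states (rows) x 2 actions (columns).
q_table = np.array([[0.1, 0.5],
                    [0.7, 0.2],
                    [0.0, 0.3]])

# Index of the highest-valued action in each state (row-wise argmax).
greedy_actions = np.argmax(q_table, axis=1)

# Highest action value in each state (row-wise maximum).
state_values = np.max(q_table, axis=1)

print(greedy_actions)
print(state_values)
```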

**Updating array values.**

In [None]:
print(b)
# Assign a specific value
b[0, 1] = 9
print(b)
# Increase / decrease values.
b[1, 0] += 2
b[0, 0] -= 1
print(b)
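When updating values, keep the array's type in mind: assigning a float to an element of an integer array silently drops the fractional part. This matters for quantities such as action values, which are rarely whole numbers. A small sketch with illustrative variable names:

```python
import numpy as np

int_array = np.zeros(3, dtype=int)
float_array = np.zeros(3, dtype=float)

int_array[0] = 0.75    # the fractional part is dropped
float_array[0] = 0.75  # stored as given

print(int_array)
print(float_array)
```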

# Matplotlib

`matplotlib` is an incredibly powerful Python visualization library. In a reinforcement learning context, we can use it to plot learning curves or to visualize policy functions. We actually use the `matplotlib.pyplot` module, which provides a convenient plotting interface that works both in Jupyter notebooks and in ordinary Python scripts.

In [None]:
import matplotlib.pyplot as plt

We'll now tell matplotlib to "inline" plots using an ipython magic function:

In [None]:
%matplotlib inline

This is not Python, so it will not work inside a Python script file; it only works inside notebooks. It tells matplotlib to put every plot directly into the notebook instead of opening a popup window, which is the default behavior.

Let us now start visualizing some random data. You may use this code as a template to plot a learning curve for an agent. 

Specifically, we will create a sample of 50 random numbers drawn from a Gaussian random variable $X \sim \mathcal{N}(\mu, \sigma^2)$ with mean $\mu = 0$ and standard deviation $\sigma = 0.1$.

In [None]:
mu, sigma = 0, 0.1
sample = np.random.normal(mu, sigma, size=50)

We can create a simple line plot using the `plt.plot()` command. Note that pyplot automatically uses the array indices $x = [0, \dots, 49]$ as x-coordinates because we did not provide any x-values.

In [None]:
plt.plot(sample);

**Customizing your plot.** `matplotlib` lets you change almost every detail of your plot. The `plt.plot()` that we used in the last cell actually does many things at once: 
* It creates a **figure**-object, which keeps track of all 'axes'-objects (see below) and handles general attributes of the plot such as, for example, the figure size. 
* It creates one **axes**-object, which is what you think of as 'a plot', that is, the region where the data is visualized.
* It creates some essential **artist**-objects such as, for example, both the x-axis and the y-axis and their corresponding ticks.
* It uses the data to create the line, another 'artist'.
* Finally, it actually 'shows' the plot by drawing all the artists on the canvas.

We can take control of any of the steps shown above in order to modify aspects of the plot. 

In [None]:
# We create more data that we want to compare with the first sample.
# Adding np.linspace(0, 2, num=50) element-wise gives the new sample
# an upward trend that rises linearly from 0 to 2.
sample2 = np.random.normal(mu, sigma, size=50) + np.linspace(0, 2, num=50)

# We also specify the x-coordinates
x = np.arange(0, 50)

# Create one figure with two subplots that are positioned within one column.
fig, (ax1, ax2) = plt.subplots(nrows=2, ncols=1)

# Modify the first subplot.
ax1.plot(x, sample, color = "red")
ax1.set_title("Two random line plots")
ax1.set_xlabel('Time')
ax1.set_ylabel('y')

# Modify the second subplot.
ax2.plot(x, sample, color="red", label="XYZ")
ax2.plot(x, sample2, color="blue", label="GRLC")
ax2.set_xlabel('Time')
ax2.set_ylabel('y')

# Add a legend to the second subplot, using the labels given above.
ax2.legend()
# Show the whole figure including both subplots
plt.show()
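Learning curves are often noisy, so it can help to plot a moving average on top of the raw values. The sketch below uses made-up data (a noisy upward trend standing in for per-episode returns) and an arbitrary window of 20 episodes; smoothing with `np.convolve` is one simple option among several.

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in for per-episode returns: a noisy upward trend (made-up data).
rng = np.random.default_rng(seed=0)
num_episodes = 200
returns = np.linspace(0, 1, num=num_episodes) + rng.normal(0, 0.2, size=num_episodes)

# Moving average over a window of 20 episodes.
window = 20
smoothed = np.convolve(returns, np.ones(window) / window, mode="valid")

fig, ax = plt.subplots()
ax.plot(np.arange(num_episodes), returns, color="lightgray", label="Raw return")
# The first smoothed value averages episodes 0..window-1, so it is drawn at window-1.
ax.plot(np.arange(window - 1, num_episodes), smoothed, color="blue", label="Moving average")
ax.set_xlabel("Episode")
ax.set_ylabel("Return")
ax.legend()
plt.show()
```

With `mode="valid"`, the smoothed curve is `window - 1` points shorter than the raw data, which is why it gets its own x-coordinates.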

# Object-oriented programming in Python

We now define an Agent **class**, which, initially, has no functionality at all. We will then gradually improve its functionality by adding fields and methods.

In [None]:
class Agent:
    pass

agent_x = Agent()
print(agent_x)
agent_y = Agent()
print(agent_y)

We first defined an empty class. The `pass` statement is a null operation, which is useful as a placeholder when a statement is required syntactically, but no code needs to be executed.

We then created two (pretty useless) **instances** of our Agent class. We now redefine our class and add a **method** to the Agent class. A method is a function available to all instances of our class. Methods are defined within the class and differ from functions in that the first argument is always the instance itself, which by convention is called `self`.

In [None]:
class Agent:
    def who_am_i(self):
        print("I am an agent!")
        
agent_x = Agent()
agent_x.who_am_i()
agent_y = Agent()
agent_y.who_am_i()

We now define an **instance variable** or **field** called `name` to give the agents different identities. Instance variables are owned by each individual object/instance. They are not shared with other instances of the same class. Usually, all instance variables are defined within the `__init__()` method and then later updated, if needed.

The `__init__()` method is a special function that is run as soon as the object is instantiated, that is, when `Agent()` is called. We use the `__init__()` method to pass initial values to instance variables upon creation of the instance.


In [None]:
class Agent:
    def __init__(self, name):
        self.name = name
        
    def who_am_i(self):
        print("I am agent", self.name, "!")
        
agent_x = Agent("X")
agent_x.who_am_i()
agent_q = Agent("Q")
agent_q.who_am_i()

You can access instance variables the same way you access methods (if you did not declare them as private variables).

In [None]:
print(agent_x.name)
print(agent_q.name)
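Instance variables can also be updated by methods after the object has been created, and each instance keeps its own values. The following sketch illustrates this with a hypothetical `CountingAgent` class; the class name, the `total_reward` field, and the `receive_reward` method are made up for this example.

```python
class CountingAgent:
    def __init__(self, name):
        self.name = name
        self.total_reward = 0.0  # every instance starts with its own total

    def receive_reward(self, reward):
        # Update the field of this particular instance only.
        self.total_reward += reward


agent_a = CountingAgent("A")
agent_b = CountingAgent("B")
agent_a.receive_reward(1.0)
agent_a.receive_reward(0.5)

print(agent_a.name, agent_a.total_reward)
print(agent_b.name, agent_b.total_reward)
```

Only `agent_a` received rewards, so the total of `agent_b` is still at its initial value.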

If you want to know more about OOP in Python (for example, how to use inheritance), give this [tutorial](https://python.swaroopch.com/oop.html) a try.

You should now be able to get started in Session 1. Think about which fields and methods an RL agent would need and give it a try! 

Sources of this tutorial include:
* https://python.swaroopch.com/oop.html
* https://matplotlib.org/faq/usage_faq.html#usage
* https://docs.scipy.org/doc/numpy-dev/user/quickstart.html
* and the excellent https://github.com/pkmital/CADL
