# Gridworld

### Description

In this seminar we implement "gridworld", a simple example of the agent-environment interface used in reinforcement learning.

The world consists of an $n \times m$ grid of squares, indexed by $(i,j)$ with $i = 0, 1, \dots, n-1$ and $j = 0, 1, \dots, m-1$.

The state of the environment is the player's position, i.e. the square the player currently occupies.

The possible actions are steps in the directions "UP", "RIGHT", "DOWN", "LEFT".
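One way to encode these actions is as offsets added to the player's $(i, j)$ index. The sketch below assumes that $i$ is the row index (growing downwards) and $j$ is the column index; the description above does not fix this convention, and the names `ACTIONS` and `move` are hypothetical.

```python
# Hypothetical encoding of the four actions as (di, dj) offsets.
# Assumption: i is the row index (0 = top row), j is the column index.
ACTIONS = {
    "UP": (-1, 0),
    "RIGHT": (0, 1),
    "DOWN": (1, 0),
    "LEFT": (0, -1),
}


def move(pos, action):
    """Return the square reached from `pos` by `action`, ignoring walls and borders."""
    i, j = pos
    di, dj = ACTIONS[action]
    return (i + di, j + dj)


print(move((2, 3), "UP"))     # (1, 3)
print(move((2, 3), "RIGHT"))  # (2, 4)
```

Keeping the offsets in a single dictionary means that border checks, blocked squares and random effects can all be layered on top of one small lookup.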

The rewards and states after each action can depend on multiple factors:
- Some squares can give positive/negative rewards
- Squares might be "blocked" and not possible to step on
- Stepping off the board is impossible
- Invalid moves might give a negative reward as "punishment"
- There might be "portals" that take the player to a distant square, regardless of their action
- There might be deterministic or random effects that change the outcome of actions, e.g. "wind" or "ice".
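Several of the rules listed above can be combined in a single transition function. The following is a minimal sketch, not a prescribed design: the grid size, the square coordinates, the reward values and all names (`BLOCKED`, `REWARDS`, `PORTALS`, `transition`, ...) are illustrative assumptions, and invalid moves are assumed to leave the player where they are.

```python
# Hypothetical layout; all names and numbers below are illustrative assumptions.
N_ROWS, N_COLS = 4, 5
BLOCKED = {(1, 1)}              # squares that cannot be entered
REWARDS = {(3, 4): 1.0}         # reward for arriving at a square
PORTALS = {(0, 4): (3, 0)}      # portal square -> destination square
INVALID_MOVE_REWARD = -1.0      # "punishment" for an invalid move
OFFSETS = {"UP": (-1, 0), "RIGHT": (0, 1), "DOWN": (1, 0), "LEFT": (0, -1)}


def transition(pos, action):
    """Return (next_pos, reward) for taking `action` in square `pos`."""
    # A portal moves the player regardless of the chosen action.
    if pos in PORTALS:
        next_pos = PORTALS[pos]
        return next_pos, REWARDS.get(next_pos, 0.0)

    di, dj = OFFSETS[action]
    target = (pos[0] + di, pos[1] + dj)

    on_board = 0 <= target[0] < N_ROWS and 0 <= target[1] < N_COLS
    if not on_board or target in BLOCKED:
        # Invalid move: the player stays put and is penalised.
        return pos, INVALID_MOVE_REWARD

    return target, REWARDS.get(target, 0.0)


print(transition((0, 0), "UP"))     # off the board
print(transition((0, 1), "DOWN"))   # into a blocked square
print(transition((0, 4), "LEFT"))   # standing on a portal
print(transition((3, 3), "RIGHT"))  # onto a reward square
```

Writing the transition as a pure function of `(pos, action)` keeps it easy to test and to reuse for planning; a class-based implementation could call it from its step method.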

Variants of this gridworld can be used to illustrate a wide range of concepts and algorithms in reinforcement learning.
For instance, see the following in [Sutton & Barto](http://incompleteideas.net/book/the-book-2nd.html):
- Examples 3.5 and 3.8
- Figure 4.1
- Examples 6.5 and 6.6
- Figure 7.4
- Examples 8.1 and 8.3


You can implement this class either in a separate module (recommended) or inside a Jupyter cell; see `reuseCode.ipynb` for details.

### Implementation

Below is a suggested skeleton of a `GridWorld` class. Feel free to modify or rename anything.
If you want to implement this class in a separate module, create a new file `gridworld.py`, copy the code there, and delete this code cell.

In [15]:
class GridWorld:

    def __init__(self, height, width):
        # Store the height and width as attributes
        self.height = height
        self.width = width
        
        # Initialize the player position
        self.pos = (0, 0)
        
    # A method to perform an action:
    # def step(??):
    #     ...
    
    # A method to reset the gridworld:
    # def reset(??):
    #     ...
    
    # A method to output the world:
    # def drawWorld(??):
    #     ...
    
    # (!) More difficult:
    # A method to interactively "play" in the gridworld:
    # def play(??):
    #     ...
    

    # Any other method that might be useful to the user
    # ...

    # Any "helper" methods you use internally
    # ...
    

Once you have implemented the basic methods above, you should be able to walk around in an empty gridworld! 🎉
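For comparison, here is one minimal way the basic methods could look. It is only a sketch under assumptions the skeleton leaves open: the class name `SimpleGridWorld` is hypothetical (chosen so it does not clash with your own `GridWorld`), actions are passed as strings, $i$ is taken to be the row index, `step` returns a `(state, reward)` pair, and a move off the board leaves the player in place.

```python
class SimpleGridWorld:
    """Minimal empty gridworld; a sketch, not a reference solution."""

    # Assumption: i is the row index (0 = top row), j is the column index.
    OFFSETS = {"UP": (-1, 0), "RIGHT": (0, 1), "DOWN": (1, 0), "LEFT": (0, -1)}

    def __init__(self, height, width):
        self.height = height
        self.width = width
        self.reset()

    def reset(self):
        """Put the player back on the start square and return the state."""
        self.pos = (0, 0)
        return self.pos

    def step(self, action):
        """Apply `action` and return (new_state, reward)."""
        di, dj = self.OFFSETS[action]
        i, j = self.pos[0] + di, self.pos[1] + dj
        # Stepping off the board is impossible: stay put in that case.
        if 0 <= i < self.height and 0 <= j < self.width:
            self.pos = (i, j)
        return self.pos, 0.0  # an empty world has no rewards yet

    def __str__(self):
        rows = []
        for i in range(self.height):
            row = "".join("P" if (i, j) == self.pos else "." for j in range(self.width))
            rows.append(row)
        return "\n".join(rows)

    def drawWorld(self):
        """Print the grid, marking the player with 'P' and empty squares with '.'."""
        print(self)


gw = SimpleGridWorld(3, 4)
gw.step("RIGHT")
gw.step("DOWN")
gw.drawWorld()
```

Returning the new state and reward from `step` mirrors the agent-environment loop, but other return conventions work just as well for this exercise.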

To make things more interesting, implement for example:
- Positive rewards for reaching certain squares
- Negative rewards for "illegal" moves
- Blocked squares that cannot be entered
- Teleporting squares that move the player to another square. This helps avoid optimal policies that merely step back and forth between two squares, e.g. on and off a reward square.
- A `.previewMove()` method to simulate given actions from a given start square.
- A random effect (ice, wind, ...) that changes the effect of some actions.
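Two of these extensions, the move preview and a random effect, can be sketched as stand-alone functions. Everything here is an assumption made for illustration: the function names (`next_square`, `preview_move`, `icy_step`), the interpretation of "ice" as replacing the chosen action by a uniformly random one with probability `slip_prob`, and the default value of `slip_prob`.

```python
import random

# Assumption: i is the row index (0 = top row), j is the column index.
OFFSETS = {"UP": (-1, 0), "RIGHT": (0, 1), "DOWN": (1, 0), "LEFT": (0, -1)}


def next_square(pos, action, height, width):
    """Deterministic move; stepping off the board leaves the position unchanged."""
    di, dj = OFFSETS[action]
    i, j = pos[0] + di, pos[1] + dj
    return (i, j) if 0 <= i < height and 0 <= j < width else pos


def preview_move(start, actions, height, width):
    """Simulate a sequence of actions from `start` without touching any world state."""
    pos = start
    for action in actions:
        pos = next_square(pos, action, height, width)
    return pos


def icy_step(pos, action, height, width, slip_prob=0.2, rng=random):
    """With probability `slip_prob`, replace `action` by a uniformly random one."""
    if rng.random() < slip_prob:
        action = rng.choice(list(OFFSETS))
    return next_square(pos, action, height, width)


print(preview_move((0, 0), ["RIGHT", "RIGHT", "DOWN"], height=3, width=4))  # (1, 2)

# Passing a seeded generator makes the random effect reproducible.
rng = random.Random(0)
print(icy_step((1, 1), "UP", height=3, width=4, slip_prob=0.5, rng=rng))
```

Passing the random generator as an argument is a design choice that keeps experiments repeatable, which helps when comparing learning algorithms on the same world.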


### Testing

Below you can test individual aspects of your gridworld class with short code cells.

You can re-run cells or run them out of order, but the notebook should still work when you `Run All` in a fresh Jupyter session.

If you implemented an interactive `.play()` method, you might not be able to test it from inside a Jupyter notebook.

In [3]:
# Import the GridWorld class from `gridworld.py`
from gridworld import GridWorld

In [4]:
# Create a new gridworld instance
# gw = ...?

In [11]:
# Take some arbitrary actions
# gw.step(??)


In [12]:
# Test any other behaviour that you implemented...