# Gridworld

### Description

In this seminar we implement an example of the agent-environment interface used in reinforcement learning, called "gridworld".

The world consists of an $n \times m$ grid of squares, indexed by $(i,j)$ with $i = 0, 1, \dots, n-1$ and $j = 0, 1, \dots, m-1$.

The state of the environment consists of the player being in one of the squares.

The possible actions are steps in the directions "UP", "RIGHT", "DOWN", "LEFT".

The rewards and states after each action can depend on multiple factors:
- Some squares can give positive/negative rewards
- Squares might be "blocked" and not possible to step on
- Stepping off the board is impossible
- Invalid moves might give a negative reward as "punishment"
- There might be "portals" that take the player to a distant square, regardless of their action
- There might be deterministic or random effects that change the outcome of actions, e.g. "wind" or "ice".
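
The rules above can be condensed into a single transition function that maps a state and an action to the next state and a reward. The following is a minimal sketch, not the seminar's reference solution: the names `ACTIONS` and `transition`, the `(x, y)` tuple convention with `y` growing upwards, and the default penalty of `-1` are assumptions made for illustration. Portals and random effects are left out.

```python
# Hypothetical names and conventions, chosen only for this illustration.
ACTIONS = {
    "UP": (0, 1),
    "RIGHT": (1, 0),
    "DOWN": (0, -1),
    "LEFT": (-1, 0),
}


def transition(pos, action, width, height, blocked=frozenset(), rewards=None, penalty=-1.0):
    """Return (next_pos, reward) for one step from pos, an (x, y) tuple."""
    dx, dy = ACTIONS[action]
    target = (pos[0] + dx, pos[1] + dy)
    on_board = 0 <= target[0] < width and 0 <= target[1] < height
    if not on_board or target in blocked:
        # Invalid move: the player stays in place and is "punished"
        return tuple(pos), penalty
    # Valid move: collect the reward of the target square, if it has one
    reward = rewards.get(target, 0.0) if rewards else 0.0
    return target, reward


print(transition((0, 0), "DOWN", width=7, height=5))  # → ((0, 0), -1.0)
print(transition((0, 0), "UP", width=7, height=5))    # → ((0, 1), 0.0)
```

Keeping the transition free of side effects makes it easy to check single rules in isolation; the class-based skeleton further down stores the position as internal state instead.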

Variants of this gridworld can be used to illustrate a wide range of concepts and algorithms in reinforcement learning.
For instance, in [Sutton & Barto](http://incompleteideas.net/book/the-book-2nd.html) see:
- Example 3.5, 3.8
- Figure 4.1
- Example 6.5, 6.6
- Figure 7.4
- Example 8.1, 8.3


You can implement the gridworld class either in a separate module (recommended) or inside a Jupyter cell; see `reuseCode.ipynb` for details.

In [1]:
import numpy as np
import random

### Implementation

Below is a suggested skeleton of a `GridWorld` class. Feel free to modify or rename anything.
If you want to implement this class in a separate module, create a new file `gridworld.py`, copy the code there, and delete this code cell.

In [2]:
# Actions as [x, y] displacement vectors: x grows to the right, y grows upwards
RIGHT = np.array([1, 0])
LEFT = np.array([-1, 0])
UP = np.array([0, 1])
DOWN = np.array([0, -1])
DIRECTION = (RIGHT, LEFT, UP, DOWN)

class GridWorld:

    def __init__(self, height, width):
        # Store the height and width as attributes
        self.height = height
        self.width = width

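        # Teleport squares: origin -> destination, and origin -> reward for teleporting
        self.teleportation = dict()
        self.rewardPlace = dict()
        # Squares that cannot be entered: the border around the board plus blocked squares
        self.forbiddenPlace = list()
        self.blockedPlace = list()
        self.setBorder()
        # Reward accumulated so far
        self.totalReward = 0
        # Squares with ice or wind
        self.icePlace = list()
        self.windPlace = list()
        # Initialize the player position as [x, y]
        self.pos = np.array([0, 0])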
        
    # Mark the squares just outside the board as forbidden
    def setBorder(self):
        for y in range(self.height):
            self.forbiddenPlace.append(np.array([-1, y]))
            self.forbiddenPlace.append(np.array([self.width, y]))
        for x in range(self.width):
            self.forbiddenPlace.append(np.array([x, -1]))
            self.forbiddenPlace.append(np.array([x, self.height]))

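    # A method to perform an action (a displacement vector such as UP or LEFT).
    # With returnIfBlocked=True, a blocked move ends the step immediately;
    # this is used for the extra moves caused by ice and wind.
    def step(self, action, returnIfBlocked=False):
        print(self.pos + action)
        if not any((self.pos + action == place).all() for place in self.forbiddenPlace):
            self.pos += action
        else:  # blocked: stay in place and take a penalty
            self.totalReward += -1
            print(self.totalReward)
            if returnIfBlocked:
                return

        # Ice: keep sliding in the same direction
        for place in self.icePlace:
            if np.array_equal(self.pos, place):
                print("icy")
                self.step(action, True)
        # Wind: get pushed one square in a random direction
        for place in self.windPlace:
            if np.array_equal(self.pos, place):
                print("windy")
                self.step(random.choice(DIRECTION), True)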

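        # Teleport: jump to the destination and collect the reward
        for place in self.teleportation:
            if np.array_equal(self.pos, np.array(place)):
                print("teleport")
                # Copy the destination: step() updates self.pos in place,
                # which would otherwise move the stored destination as well
                self.pos = self.teleportation[place].copy()
                self.totalReward += self.rewardPlace[place]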

    # Methods to add special squares:
    def setForbidden(self, place):
        self.forbiddenPlace.append(np.array(place))
        self.blockedPlace.append(np.array(place))

    def setIce(self, place):
        self.icePlace.append(np.array(place))

    def setWind(self, place):
        self.windPlace.append(np.array(place))

    def setTeleportation(self, origin, destination, reward=0):
        origin = tuple(origin)
        destination = np.array(destination)
        self.teleportation[origin] = destination
        self.rewardPlace[origin] = reward

    # A method to reset the gridworld (player position and accumulated reward):
    def reset(self):
        self.pos = np.array([0, 0])
        self.totalReward = 0
    
    
    # A method to output the world:
    def drawWorld(self):
        print(self)
    
    # Convert a position [x, y] into [row, column] indices of the printed grid (row 0 is the top row)
    def giveIndex(self, position):
        x, y = position
        return [int(self.height - y - 1), int(x)]
    # String representation: one character per square, top row first
    def __str__(self) -> str:
        worldRepresentation = [["_" for x in range(self.width)] for y in range(self.height)]

        # Show special places
        for teleportStart, teleportFinish in self.teleportation.items():
            row, col = self.giveIndex(teleportStart)
            worldRepresentation[row][col] = "S"
            row, col = self.giveIndex(teleportFinish)
            worldRepresentation[row][col] = "F"

        for blockedPosition in self.blockedPlace:
            row, col = self.giveIndex(blockedPosition)
            worldRepresentation[row][col] = "B"

        for icePosition in self.icePlace:
            row, col = self.giveIndex(icePosition)
            worldRepresentation[row][col] = "I"
        for windPosition in self.windPlace:
            row, col = self.giveIndex(windPosition)
            worldRepresentation[row][col] = "W"

        # Draw the player last so that no other marker hides it
        row, col = self.giveIndex(self.pos)
        worldRepresentation[row][col] = "X"

        worldRepresentation = [" ".join(line) for line in worldRepresentation]
        return "\n".join(worldRepresentation)
    
    # (!) More difficult:
    # A method to interactively "play" in the gridworld:
    # def play(??):
    #     ...
    

    # Any other method that might be useful to the user
    # ...

    # Any "helper" methods you use internally
    # ...

Once you have implemented the basic methods above, you should be able to walk around in an empty gridworld! 🎉

To make things more interesting, implement for example:
- Positive rewards for reaching certain squares
- Negative rewards for "illegal" moves
- Blocked squares that cannot be entered
- Teleporting squares that move the player to another spot. This is useful to avoid optimal "back-and-forth" policies.
- A `.previewMove()` method to simulate given actions from a given start square.
- A random effect (ice, wind, ...) that changes the effect of some actions.
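
One possible way to approach the `.previewMove()` idea from the list above is to run the actions on a deep copy of the world, so that the real player never moves. This is only a sketch under assumptions: it is written as a standalone function rather than a method, it expects a world object with `pos`, `totalReward` and `step(action)` as in the skeleton, and `TinyWorld` is a hypothetical stand-in that exists only to make the example self-contained.

```python
import copy

import numpy as np


def previewMove(world, start, actions):
    """Simulate actions from start on a copy; return (final position, reward gained)."""
    sim = copy.deepcopy(world)  # the real world is left untouched
    sim.pos = np.array(start)
    rewardBefore = sim.totalReward
    for action in actions:
        sim.step(action)
    return sim.pos.copy(), sim.totalReward - rewardBefore


class TinyWorld:
    """Hypothetical stand-in: an empty board with a -1 penalty for leaving it."""

    def __init__(self, width, height):
        self.width, self.height = width, height
        self.pos = np.array([0, 0])
        self.totalReward = 0

    def step(self, action):
        target = self.pos + action
        if 0 <= target[0] < self.width and 0 <= target[1] < self.height:
            self.pos = target
        else:
            self.totalReward -= 1  # bumped into the border


world = TinyWorld(3, 3)
finalPos, gained = previewMove(world, [2, 0], [np.array([1, 0]), np.array([0, 1])])
print(finalPos, gained, world.pos)  # → [2 1] -1 [0 0]
```

If the world contains random effects such as wind, a preview of this kind returns just one sampled outcome, so repeated calls may disagree.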


### Testing

Below you can test individual aspects of your gridworld class with short code cells.

You can re-run cells or run them out of order, but it is recommended that the notebook still works if you `Run All` in a fresh Jupyter session.

If you implemented an interactive `.play()` method, you might not be able to test it from inside a Jupyter notebook.
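
One way around this limitation is to pass the source of key presses as an argument, so that a scripted sequence can replace the keyboard during testing. The sketch below is an assumption about how such a method could look, not part of the skeleton: `play`, `readKey`, `quitKey`, `KEYS` and the stand-in `RecordingWorld` are hypothetical names, and the world only needs a `step(action)` method.

```python
import numpy as np

# Hypothetical key bindings: WASD mapped to [x, y] displacement vectors
KEYS = {
    "w": np.array([0, 1]),
    "a": np.array([-1, 0]),
    "s": np.array([0, -1]),
    "d": np.array([1, 0]),
}


def play(world, readKey=input, quitKey="q"):
    """Read one key per turn until quitKey is entered; return the number of steps taken."""
    steps = 0
    while True:
        key = readKey()
        if key == quitKey:
            return steps
        if key in KEYS:
            world.step(KEYS[key])
            steps += 1
        else:
            print(key, "is not a direction, use WASD instead")


class RecordingWorld:
    """Hypothetical stand-in that only records the actions it receives."""

    def __init__(self):
        self.moves = []

    def step(self, action):
        self.moves.append(tuple(int(v) for v in action))


# Scripted input instead of the keyboard
scripted = iter(["d", "x", "w", "q"])
world = RecordingWorld()
steps = play(world, readKey=lambda: next(scripted))
print(steps, world.moves)  # → 2 [(1, 0), (0, 1)]
```

With the default `readKey=input`, the same function reads from the keyboard when run in a terminal.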

In [3]:
# Create a new gridworld instance (height 5, width 7)
gw = GridWorld(5, 7)
gw.setWind([1, 0])
gw.setWind([0, 1])
gw.setTeleportation([1, 1], [4, 4], 15)

In [4]:
# Take some arbitrary actions
T = 4
for _ in range(T):
    gw.step(random.choice(DIRECTION))
    gw.drawWorld()



[ 0 -1]
-1
_ _ _ _ F _ _
_ _ _ _ _ _ _
_ _ _ _ _ _ _
W S _ _ _ _ _
X W _ _ _ _ _
[ 0 -1]
-2
_ _ _ _ F _ _
_ _ _ _ _ _ _
_ _ _ _ _ _ _
W S _ _ _ _ _
X W _ _ _ _ _
[0 1]
windy
[1 1]
teleport
_ _ _ _ X _ _
_ _ _ _ _ _ _
_ _ _ _ _ _ _
W S _ _ _ _ _
_ W _ _ _ _ _
[4 5]
12
_ _ _ _ X _ _
_ _ _ _ _ _ _
_ _ _ _ _ _ _
W S _ _ _ _ _
_ W _ _ _ _ _


In [5]:
# Map the WASD keys to actions
inputDict = {"w": UP, "a": LEFT, "s": DOWN, "d": RIGHT}

In [10]:
# Input handler: read a string of WASD keys and perform one step per letter
path = input()
for letter in path:
    if letter in inputDict:
        gw.step(inputDict[letter])
        gw.drawWorld()
    else:
        print( letter, " is not a direction. use WASD instead")

q  is not a direction. use WASD instead
[7 4]
1
_ _ _ _ _ _ X
_ _ _ _ _ _ _
_ _ _ _ _ _ _
W S _ _ _ _ _
_ W _ _ _ _ _
e  is not a direction. use WASD instead
r  is not a direction. use WASD instead
