# AI-LAB SESSION 1: Tutorial

Welcome to the AI-LAB! This introductory tutorial will help you familiarize yourself with Jupyter Notebook and OpenAI Gym.

## OpenAI Gym environments

The environment **SmallMaze** is shown in the following figure:
![SmallMaze](images/maze.png)
The agent starts in cell $(0, 2)$ and has to reach the treasure in cell $(4, 3)$.

To use the environment, we first need to import the OpenAI Gym package. Because of the structure of this repository, we also need to add the parent directory to the Python path:

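```python
import os
import sys

# Make the repository root (the parent directory) importable
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)

import gym
import envs  # repository package providing the maze environments (e.g. SmallMaze-v0)
```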

### Free hints:
- You can press TAB while writing code in Jupyter Notebook to open autocompletion suggestions for the current statement
- CTRL + ENTER executes a cell
- SHIFT + ENTER executes a cell and moves to the next one
- CTRL + S saves the work. **Remember to do this from time to time!!!**
- SHIFT + TAB shows a function signature and docs

For other useful shortcuts, check the Help menu at the top.

Then we create a new **SmallMaze** environment and render it:

```python
env = gym.make("SmallMaze-v0")  # create the SmallMaze environment
env.render()                    # print the maze as a character matrix
```

```
[['C' 'C' 'S' 'C']
 ['C' 'C' 'W' 'C']
 ['C' 'C' 'C' 'C']
 ['C' 'W' 'W' 'W']
 ['C' 'C' 'C' 'G']]
```


The render is a matrix whose cells can be of different types:
* *S* - Start position
* *C* - Clear
* *W* - Wall
* *G* - Goal

An environment has some useful attributes:
* *action_space* - space of possible actions: usually a range of integers $[0, ..., n-1]$
* *observation_space* - space of possible observations (states): usually a range of integers $[0, ..., n-1]$
* *actions* - mapping between action ids and their descriptions
* *startstate* - start state (unique)
* *goalstate* - goal state (unique)
* *grid* - flattened grid (1-dimensional array)
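To see what "flattened" means here, the following minimal sketch flattens the rendered SmallMaze matrix row by row and scans it for the *S* and *G* cells. It assumes row-major flattening and one character per cell, as in the render above; the environment's real *grid* attribute may use a different element type.

```python
# SmallMaze layout copied from the render above (rows top to bottom).
rows = [
    ["C", "C", "S", "C"],
    ["C", "C", "W", "C"],
    ["C", "C", "C", "C"],
    ["C", "W", "W", "W"],
    ["C", "C", "C", "G"],
]

# Row-major flattening: cell (x, y) ends up at index x * n_cols + y.
grid = [cell for row in rows for cell in row]

start_id = grid.index("S")  # the (unique) start cell
goal_id = grid.index("G")   # the (unique) goal cell

print(len(grid), start_id, goal_id)  # 20 2 19
```

Under this assumption the indices of *S* and *G* coincide with the start and goal ids printed later in this tutorial.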

In **SmallMaze** there are 4 possible actions, numbered from 0 to 3:

```python
env.action_space.n
```

```
4
```

They are *Left, Right, Up, Down*:

```python
env.actions
```

```
{0: 'L', 1: 'R', 2: 'U', 3: 'D'}
```
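Rather than hard-coding ids such as `1` for *Right*, it can be convenient to invert this mapping. A minimal sketch, using a plain copy of the dictionary printed above; `action_ids` is a hypothetical helper name, not an attribute of the environment:

```python
# Action mapping as printed by env.actions above.
actions = {0: "L", 1: "R", 2: "U", 3: "D"}

# Inverse mapping: look up an action id by its one-letter name.
action_ids = {name: action_id for action_id, name in actions.items()}

print(action_ids["R"])  # 1
```

With the real environment, `env.sample(start, action_ids["R"])` would then be equivalent to `env.sample(start, 1)`.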

States are numbered from 0 to 19, one for each cell of the $5 \times 4$ grid:

```python
env.observation_space.n
```

```
20
```

There are also some methods:
* *render()* - renders the environment
* *sample(state, action)* - returns a new state sampled from those reachable from *state* by performing *action* (both given as ids)
* *pos_to_state(x, y)* - returns the state id given its position, where $x$ is the row and $y$ the column
* *state_to_pos(state)* - returns the coordinates $(x, y)$ given a state id

For example, to get the ids and positions of both the start and the goal state:

```python
start = env.startstate
goal = env.goalstate
print("Start id: {}\tGoal id: {}".format(start, goal))
print("Start position: {}\tGoal position: {}".format(env.state_to_pos(start), env.state_to_pos(goal)))
print("Id of state (3, 0): {}".format(env.pos_to_state(3, 0)))
```

```
Start id: 2	Goal id: 19
Start position: (0, 2)	Goal position: (4, 3)
Id of state (3, 0): 12
```
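These values are consistent with row-major numbering on the $5 \times 4$ grid, i.e. $\text{state} = x \cdot 4 + y$ with $x$ the row and $y$ the column. The sketch below reimplements the two conversions under that assumption; the environment's own *pos_to_state* and *state_to_pos* may be implemented differently, and the constants `N_ROWS`/`N_COLS` are hypothetical names.

```python
N_ROWS, N_COLS = 5, 4  # SmallMaze size, read off the render above

def pos_to_state(x, y):
    """Row-major id of the cell in row x, column y."""
    return x * N_COLS + y

def state_to_pos(state):
    """Inverse of pos_to_state: (row, column) of a state id."""
    return divmod(state, N_COLS)

print(N_ROWS * N_COLS)                                     # 20
print(pos_to_state(3, 0), state_to_pos(2), state_to_pos(19))  # 12 (0, 2) (4, 3)
```

The total number of cells, 20, also matches *observation_space.n*.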


Now, what if we want to move the agent *Right* (action 1) from its start position? Since the environment is deterministic, it always reaches state 3, i.e. position $(0, 3)$:

```python
env.sample(start, 1)
```

```
3
```

And what if we make it move *Up* or *Down* instead? The agent cannot move outside the grid or through walls, so it stays where it is: moving *Up* would leave the grid, and the cell below, $(1, 2)$, is a wall:

```python
print("Current position: {}\tMoving UP: {}\tMoving DOWN: {}".format(env.state_to_pos(start),
                                                                    env.state_to_pos(env.sample(start, 2)),
                                                                    env.state_to_pos(env.sample(start, 3))))
```

```
Current position: (0, 2)	Moving UP: (0, 2)	Moving DOWN: (0, 2)
```
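This blocking behaviour can be sketched as a deterministic transition on the grid: compute the target cell, and stay put if it lies outside the grid or on a wall. This is an illustrative reimplementation, not the environment's *sample*; it assumes the action ids printed earlier and that *Up* decreases the row index, and `move`/`DELTAS` are hypothetical names.

```python
# SmallMaze layout from the render above, one string per row.
ROWS = [
    "CCSC",
    "CCWC",
    "CCCC",
    "CWWW",
    "CCCG",
]

# Action id -> (row offset, column offset); ids follow env.actions above.
DELTAS = {0: (0, -1), 1: (0, 1), 2: (-1, 0), 3: (1, 0)}

def move(pos, action):
    """Return the cell reached from pos, or pos itself if the move is blocked."""
    x, y = pos
    dx, dy = DELTAS[action]
    nx, ny = x + dx, y + dy
    inside = 0 <= nx < len(ROWS) and 0 <= ny < len(ROWS[0])
    if not inside or ROWS[nx][ny] == "W":
        return pos  # border or wall: the agent stays where it is
    return (nx, ny)

start = (0, 2)
print([move(start, a) for a in range(4)])
# [(0, 1), (0, 3), (0, 2), (0, 2)]
```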


Let's do something more interesting: what are all the possible next states? (I bet you'll need this later on.) We need to sample every action from the current state. Remember that action ids lie in the range $[0, env.action\_space.n - 1]$, which is exactly what *range(env.action_space.n)* iterates over:

```python
for action in range(env.action_space.n):
    print("From state {} with action {} -> state {}".format(env.state_to_pos(start), env.actions[action],
                                                               env.state_to_pos(env.sample(start, action))))
```

```
From state (0, 2) with action L -> state (0, 1)
From state (0, 2) with action R -> state (0, 3)
From state (0, 2) with action U -> state (0, 2)
From state (0, 2) with action D -> state (0, 2)
```
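Since several actions may lead back to the same state (here *U* and *D* both leave the agent in place), it can be handy to collect only the distinct states that are actually new. A minimal sketch, assuming only that `sample(state, action)` behaves like *env.sample*; `successors` is a hypothetical helper, and the stand-in `fake_sample` simply replays the transitions printed above.

```python
def successors(sample, n_actions, state):
    """Distinct states reachable from `state` in one step, excluding `state` itself."""
    result = []
    for action in range(n_actions):
        nxt = sample(state, action)
        if nxt != state and nxt not in result:
            result.append(nxt)  # keep first-seen order, skip duplicates
    return result

# Stand-in for env.sample, replaying the transitions from state 2 shown above:
# L -> 1, R -> 3, U -> 2, D -> 2.
TRANSITIONS = {(2, 0): 1, (2, 1): 3, (2, 2): 2, (2, 3): 2}

def fake_sample(state, action):
    return TRANSITIONS[(state, action)]

print(successors(fake_sample, 4, 2))  # [1, 3]
```

With the real environment the call would be `successors(env.sample, env.action_space.n, start)`. Dropping self-loops is a design choice: whether to keep them depends on the search algorithm you implement.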


## The Fringe

The search algorithms you will be asked to implement make use of a **Fringe**. Several types of fringe are already available, each of them being a container for instances of **FringeNode**. Recall the important difference between a node of the fringe and a state of the environment: the former contains the latter, plus additional information (such as its parent node).

A **FringeNode** accepts the following arguments (which can also be accessed as attributes after initialization):
* *state* - state embedded in the node (its id)
* *parent* - parent **FringeNode** of the current node being constructed (optional)

To create a root **FringeNode** for the start state we can proceed as follows (no parent is specified since it is the root). Notice also the required import:

```python
from utils.fringe import FringeNode, QueueFringe, StackFringe

root = FringeNode(start)
```

The next step is to create two more **FringeNode** objects forming a small path that moves the agent *Left* twice:

```python
l_state = env.sample(start, 0)        # one step Left from the start
second = FringeNode(l_state, root)    # the parent is the root node
ll_state = env.sample(l_state, 0)     # another step Left
third = FringeNode(ll_state, second)  # the parent is the previous node, not its state id
print("State id of 'third': {}\tParent id of 'third': {}".format(third.state, third.parent.state))
```

```
State id of 'third': 0	Parent id of 'third': 1
```
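Keeping a reference to the parent *node*, not just its state id, is what lets a search recover the path it found: following the *parent* links from any node walks back to the root. The sketch below uses a minimal stand-in class `Node` (hypothetical, with only the *state* and *parent* fields described above) instead of the repository's **FringeNode**, plus a hypothetical helper `path_to`.

```python
class Node:
    """Minimal stand-in for FringeNode: a state id plus an optional parent node."""
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent

def path_to(node):
    """State ids from the root down to `node`, following parent links."""
    path = []
    while node is not None:
        path.append(node.state)
        node = node.parent
    return list(reversed(path))

# Same chain as above: start (2) -> Left (1) -> Left (0).
root = Node(2)
second = Node(1, root)
third = Node(0, second)
print(path_to(third))  # [2, 1, 0]
```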


Now we analyze the difference between two **Fringe** implementations, namely **QueueFringe** and **StackFringe**. The operations allowed include:
* *add(node)* - adds a **FringeNode** to the fringe
* *remove()* - removes a **FringeNode** from the fringe and returns it
* *is_empty()* - True if the fringe is empty, False otherwise
* *state **in** fringe* - True if a state id is contained in some node within the fringe, False otherwise
* *len(fringe)* - returns the length of the fringe (the number of nodes contained therein)
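To make the FIFO/LIFO distinction concrete, here is a minimal sketch of two containers offering these operations on top of *collections.deque*. It is a hypothetical reimplementation for illustration only: `SketchQueue`, `SketchStack` and the stand-in `Node` are not the classes in *utils.fringe*. The only difference between the two is which end *remove()* pops from.

```python
from collections import deque

class Node:
    """Minimal stand-in for FringeNode."""
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent

class SketchQueue:
    """FIFO fringe: remove() returns the oldest node."""
    def __init__(self):
        self._nodes = deque()

    def add(self, node):
        self._nodes.append(node)

    def remove(self):
        return self._nodes.popleft()

    def is_empty(self):
        return len(self._nodes) == 0

    def __contains__(self, state):
        # `state in fringe` compares against the embedded state ids
        return any(node.state == state for node in self._nodes)

    def __len__(self):
        return len(self._nodes)

class SketchStack(SketchQueue):
    """LIFO fringe: remove() returns the newest node."""
    def remove(self):
        return self._nodes.pop()

for fringe in (SketchQueue(), SketchStack()):
    for state in (2, 1, 0):
        fringe.add(Node(state))
    print(type(fringe).__name__, 1 in fringe,
          [fringe.remove().state for _ in range(len(fringe))])
# SketchQueue True [2, 1, 0]
# SketchStack True [0, 1, 2]
```

The removal orders match the **QueueFringe** and **StackFringe** outputs shown below.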

Let's see some examples with **QueueFringe**:

```python
q_fringe = QueueFringe()
q_fringe.add(root)
q_fringe.add(second)
print("Fringe length: {}".format(len(q_fringe)))
q_fringe.add(third)
```

```
Fringe length: 2
```


After the last *add*, the fringe contains 3 nodes. Pay attention to the order in which they are removed: a **QueueFringe** is a FIFO (a queue):

```python
while not q_fringe.is_empty():
    print("Removed: {}".format(q_fringe.remove().state))
print("Fringe length: {}".format(len(q_fringe)))
```

```
Removed: 2
Removed: 1
Removed: 0
Fringe length: 0
```


A **StackFringe**, instead, is a LIFO (a stack):

```python
s_fringe = StackFringe()
s_fringe.add(root)
s_fringe.add(second)
s_fringe.add(third)
print("Fringe length: {}".format(len(s_fringe)))

while not s_fringe.is_empty():
    print("Removed: {}".format(s_fringe.remove().state))
print("Fringe length: {}".format(len(s_fringe)))
```

```
Fringe length: 3
Removed: 0
Removed: 1
Removed: 2
Fringe length: 0
```


If you want to avoid inserting two **FringeNode** objects embedding the same state id, you can first check whether that state is already contained in the fringe:

```python
q_fringe = QueueFringe()
if root.state not in q_fringe:
    q_fringe.add(root)
root.state in q_fringe
```

```
True
```
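Checking *state in fringe* by scanning every node costs time proportional to the fringe length. One common design choice, sketched here as an assumption rather than a description of how *utils.fringe* works, is to keep a counter of state ids alongside the queue so the check takes constant time on average. `CountingQueue` is hypothetical, and it stores bare state ids instead of nodes to keep the example short.

```python
from collections import Counter, deque

class CountingQueue:
    """FIFO of state ids with O(1) average-time membership tests."""
    def __init__(self):
        self._items = deque()
        self._counts = Counter()  # state id -> number of copies in the queue

    def add(self, state):
        self._items.append(state)
        self._counts[state] += 1

    def remove(self):
        state = self._items.popleft()
        self._counts[state] -= 1
        if self._counts[state] == 0:
            del self._counts[state]  # keep only states actually present
        return state

    def __contains__(self, state):
        return state in self._counts

    def __len__(self):
        return len(self._items)

q = CountingQueue()
for s in (2, 1, 2):
    q.add(s)
q.remove()             # removes the first 2
print(2 in q, len(q))  # True 2
q.remove()             # removes 1
q.remove()             # removes the second 2
print(2 in q, len(q))  # False 0
```

A counter is used instead of a plain set because, without the check above, the fringe may hold several nodes for the same state: removing one of them must not make the state look absent while another copy is still queued.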