# Planning-Lab Lesson 1: Tutorial

Welcome to the Planning-Lab! This introductory tutorial will help you familiarize yourself with Jupyter Notebook and OpenAI Gym.

## OpenAI Gym environments

The environment **SmallMaze** is visible in the following figure
<img src="images/maze.png" width="300">

The agent starts in cell $(0, 2)$ and has to reach the treasure in $(4, 3)$

To use the environment, we first need to import the OpenAI Gym packages. Notice that, due to the structure of this repository, we need to add the ../tools directory to the module search path

In [2]:
import os
import sys
module_path = os.path.abspath(os.path.join('../tools'))
if module_path not in sys.path:
    sys.path.append(module_path)

import gym, envs

### Free hints:
- You can press TAB while writing code in Jupyter Notebook to open the intellisense with suggestions on how to complete your statement
- CTRL + ENTER executes a cell
- SHIFT + ENTER executes a cell and goes to the next one
- CTRL + S saves the work. **Remember to do this from time to time!!!**
- SHIFT + TAB shows a function signature and docs

For other useful shortcuts, check the Help menu at the top

Then we create a new **SmallMaze** environment and render it

In [3]:
env = gym.make("SmallMaze-v0")
env.render()

[['C' 'C' 'S' 'C']
 ['C' 'C' 'W' 'C']
 ['C' 'C' 'C' 'C']
 ['C' 'W' 'W' 'W']
 ['C' 'C' 'C' 'G']]


The render is a matrix whose cells have different types:
* *S* - Start position
* *C* - Clear
* *W* - Wall
* *G* - Goal

An environment has some useful variables:
* *action_space* - space of possible actions: usually a range of integers $[0, ..., n-1]$
* *observation_space* - space of possible observations (states): usually a range of integers $[0, ..., n-1]$
* *actions* - mapping between action ids and their descriptions
* *startstate* - start state (unique)
* *goalstate* - goal state (unique)
* *grid* - flattened grid (1-dimensional array)
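To make the *grid* variable more concrete, here is a minimal sketch that assumes the flattened grid is simply the rendered matrix read row by row. The list is reconstructed by hand from the render above, so the actual type and contents of `env.grid` may differ (it could, for instance, be a NumPy array).

```python
# Hypothetical reconstruction of the flattened grid, copied from the render;
# this is NOT read from the real environment.
ROWS, COLS = 5, 4
grid = [
    'C', 'C', 'S', 'C',
    'C', 'C', 'W', 'C',
    'C', 'C', 'C', 'C',
    'C', 'W', 'W', 'W',
    'C', 'C', 'C', 'G',
]

# Rebuild the 2-D view by slicing the flat list one row at a time
rows = [grid[r * COLS:(r + 1) * COLS] for r in range(ROWS)]
for row in rows:
    print(row)

start_id = grid.index('S')  # index of the start cell in the flat list
goal_id = grid.index('G')   # index of the goal cell in the flat list
print(start_id, goal_id)
```

Under this assumption, the index of a cell in the flat list doubles as its state id.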

In **SmallMaze** we have 4 different possible actions, numbered from 0 to 3

In [9]:
env.action_space.n

4

And they are *Left, Right, Up, Down*

In [10]:
env.actions

{0: 'L', 1: 'R', 2: 'U', 3: 'D'}

There are 20 states, numbered from 0 to 19

In [11]:
env.observation_space.n

20

There are also some methods:
* *render()* - renders the environment
* *sample(state, action)* - returns a new state sampled from the ones that can be reached from *state* by performing *action* both given as ids
* *pos_to_state(x, y)* - returns the state id given its position $(x, y)$, where $x$ is the row and $y$ is the column
* *state_to_pos(state)* - returns the coordinates $(x, y)$ given a state id

For example, we can look up the ids and positions of both the start and goal states

In [4]:
start = env.startstate
goal = env.goalstate
print("Start id: {}\tGoal id: {}".format(start, goal))
print("Start position: {}\tGoal position: {}".format(env.state_to_pos(start), env.state_to_pos(goal)))
print("Id of state (3, 0): {}\n".format(env.pos_to_state(3, 0)))
env.render()

Start id: 2	Goal id: 19
Start position: (0, 2)	Goal position: (4, 3)
Id of state (3, 0): 12

[['C' 'C' 'S' 'C']
 ['C' 'C' 'W' 'C']
 ['C' 'C' 'C' 'C']
 ['C' 'W' 'W' 'W']
 ['C' 'C' 'C' 'G']]
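The values printed above are consistent with a row-major numbering of the cells, i.e. id = row × number of columns + column. The following sketch reproduces that mapping with stand-alone functions. It is an assumption inferred from the outputs, not the environment's actual implementation, and the `cols` parameter is introduced here only for illustration.

```python
COLS = 4  # number of columns in SmallMaze, read off the rendered grid

def pos_to_state(x, y, cols=COLS):
    """Row-major id of the cell in row x, column y (assumed convention)."""
    return x * cols + y

def state_to_pos(state, cols=COLS):
    """Inverse mapping: (row, column) of a state id."""
    return divmod(state, cols)

print(pos_to_state(3, 0))                 # 12
print(state_to_pos(2), state_to_pos(19))  # (0, 2) (4, 3)
```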


Now, what if we want to move the agent *R* from its start position? Well, he reaches state 3 $(0, 3)$ since the environment is deterministic

In [23]:
print("current position: {} \t Moving RIGHT {}".format(env.state_to_pos(start),env.state_to_pos(env.sample(start, 1))))

current position: (0, 2) 	 Moving RIGHT (0, 3)


And what if we want to make him move *Up* or *Down* instead? Since the agent cannot move out of the borders or pass through walls, he stays where he is

In [17]:
print("Current position: {}\tMoving UP: {}\tMoving DOWN: {}".format(env.state_to_pos(start),
                                                                    env.state_to_pos(env.sample(start, 2)),
                                                                    env.state_to_pos(env.sample(start, 3))))

Current position: (0, 2)	Moving UP: (0, 2)	Moving DOWN: (0, 2)


Let's do something more interesting: what are all the possible next states (I bet you'll need this later on)? We need to sample every action from the current state. Remember that actions lie in the range $[0,\; env.action\_space.n - 1]$

In [24]:
for action in range(env.action_space.n):
    print("From state {} with action {} -> state {}".format(env.state_to_pos(start), env.actions[action],
                                                               env.state_to_pos(env.sample(start, action))))

From state (0, 2) with action L -> state (0, 1)
From state (0, 2) with action R -> state (0, 3)
From state (0, 2) with action U -> state (0, 2)
From state (0, 2) with action D -> state (0, 2)
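To see why *Up* and *Down* leave the agent in place, it can help to look at a stand-alone sketch of a deterministic transition function. Everything below is an assumption made for illustration: the maze is copied by hand from the render, `MOVES` and `successors` are hypothetical names, and the real `env.sample` may be implemented differently.

```python
# Stand-in maze copied from the rendered grid; not the real environment.
GRID = [
    "CCSC",
    "CCWC",
    "CCCC",
    "CWWW",
    "CCCG",
]
ROWS, COLS = len(GRID), len(GRID[0])
ACTIONS = {0: 'L', 1: 'R', 2: 'U', 3: 'D'}              # same ids as env.actions
MOVES = {0: (0, -1), 1: (0, 1), 2: (-1, 0), 3: (1, 0)}  # (row, column) offsets

def sample(state, action):
    """Deterministic move: stay put when hitting a border or a wall."""
    x, y = divmod(state, COLS)
    dx, dy = MOVES[action]
    nx, ny = x + dx, y + dy
    if not (0 <= nx < ROWS and 0 <= ny < COLS) or GRID[nx][ny] == 'W':
        return state
    return nx * COLS + ny

def successors(state):
    """Map each action label to the state id it leads to."""
    return {ACTIONS[a]: sample(state, a) for a in ACTIONS}

print(successors(2))  # {'L': 1, 'R': 3, 'U': 2, 'D': 2}
```

From the start state, *Up* would leave the grid and *Down* would enter the wall at $(1, 2)$, so both return the state unchanged.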


## Node and Node Queue

The search algorithms you will be asked to implement make use of a **Node**. Recall the important difference between a node and a state of the environment: the former is a container of the latter, plus additional information.

A **Node** accepts the following arguments (that can also be accessed as variables after initialization):
* *state* - state embedded in the node (its id)
* *parent* - parent **Node** of the current node being constructed (optional)

If we want to create a root **Node** for the start state we can do as follows (no parent is specified since it's the root). Also, notice the required import:

In [25]:
from utils.ai_lab_functions import *

start = env.startstate
root = Node(start)
print(root.state)

2


The next step is to create two more **Node** structures, forming a short path that moves the agent *Left* twice

In [11]:
# Take the start state (the start variable) and perform a Left move (action 0)
left_state = env.sample(start, 0)
print('left_state: {}'.format(left_state))
# Create a Node(state, parent) from the resulting state
second = Node(left_state, root)  # The parent is the root


leftleft_state = env.sample(left_state, 0)
print('leftleft_state: {}'.format(leftleft_state))
third = Node(leftleft_state, second)  # The parent is the previous node

print("State id of 'third': {}\tParent id of 'third': {}".format(third.state, third.parent.state))

left_state: 1
leftleft_state: 0
State id of 'third': 0	Parent id of 'third': 1
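Because every node keeps a reference to its parent, the chain of nodes can be walked backwards to recover the sequence of states that led to it. The sketch below illustrates the idea with a hypothetical stand-in class, `SketchNode`, which only mimics the two arguments described above. It is not the **Node** class from the lab utilities, and `path_to_root` is a name made up for this example.

```python
class SketchNode:
    """Hypothetical stand-in for Node: a state id plus an optional parent."""
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent

def path_to_root(node):
    """Collect state ids from the root down to `node` by following parents."""
    path = []
    while node is not None:
        path.append(node.state)
        node = node.parent
    return path[::-1]  # reverse so the root comes first

root = SketchNode(2)
second = SketchNode(1, root)
third = SketchNode(0, second)
print(path_to_root(third))  # [2, 1, 0]
```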


The next step shows how to get information out of a **Node** object. A node exposes several variables directly; for the first lesson, the only ones needed are:

In [12]:
print("Node state: {}".format(second.state))
print("Node position: {}".format(env.state_to_pos(second.state)))
print("Node parent (state): {}".format(second.parent.state))
print("Node parent (position): {}".format(env.state_to_pos(second.parent.state)))

Node state: 1
Node position: (0, 1)
Node parent (state): 2
Node parent (position): (0, 2)


Now we analyze the first type of *Node List* implementations, namely **NodeQueue**. This is a FIFO queue and the operations allowed include:
* *add(node)* - adds a **Node** at the end of the queue.
* *remove()* - removes the first **Node** from the queue and returns it.
* *is_empty()* - True if the list is empty, False otherwise.
* *state **in** list* - True if a state id is contained in some node of the list, False otherwise.
* *len(queue)* - returns the length of the list (the number of nodes contained therein).

Let's see some examples with **NodeQueue**:

In [13]:
node_queue = NodeQueue()
node_queue.add(root)
node_queue.add(second)
node_queue.add(third)

print("root in node_queue?", root.state in node_queue)
print("Queue length: {}".format(len(node_queue)))
    

root in node_queue? True
Queue length: 3


The list contains 3 nodes at the moment. Pay attention to the order in which they are removed: a **NodeQueue** is a FIFO queue

In [14]:
while not node_queue.is_empty():
    print("Removed Node with state: {}".format(node_queue.remove().state))
print("List length: {}".format(len(node_queue)))

Removed Node with state: 2
Removed Node with state: 1
Removed Node with state: 0
List length: 0
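For intuition, the FIFO behaviour seen above can be reproduced with a small container built on `collections.deque`. This is only a sketch under stated assumptions: `SketchQueue` and `SketchNode` are hypothetical names, and the actual **NodeQueue** in the lab utilities may store its nodes differently.

```python
from collections import deque

class SketchNode:
    """Hypothetical stand-in for Node: a state id plus an optional parent."""
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent

class SketchQueue:
    """Hypothetical FIFO container mirroring the NodeQueue operations."""
    def __init__(self):
        self._nodes = deque()

    def add(self, node):
        self._nodes.append(node)       # enqueue at the back

    def remove(self):
        return self._nodes.popleft()   # dequeue from the front

    def is_empty(self):
        return len(self._nodes) == 0

    def __contains__(self, state):
        # Membership is tested on state ids, not on node objects
        return any(n.state == state for n in self._nodes)

    def __len__(self):
        return len(self._nodes)

queue = SketchQueue()
for s in (2, 1, 0):
    queue.add(SketchNode(s))
print(2 in queue, len(queue))  # True 3

order = []
while not queue.is_empty():
    order.append(queue.remove().state)
print(order)  # [2, 1, 0]
```

A deque is used here because removing from the front of a plain Python list takes time proportional to its length, whereas `popleft()` is constant time.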


### Now you are ready to start the first assignment in lesson_1_problem.ipynb file!