# Practice 1: Snake Environment and Reactive Agent
### [Introducción a los Sistemas Inteligentes 2019-1](https://fagonzalezo.github.io/iis-2019-1/)
### Universidad Nacional de Colombia, Bogotá

Rubén Camilo Buelvas Villa

---

The following commands install the libraries and files needed by the notebook.

In [1]:
!rm -r snake-ai-reinforcement
!git clone https://github.com/YuriyGuts/snake-ai-reinforcement.git
!mv snake-ai-reinforcement/snakeai .
!ls


rm: cannot remove 'snake-ai-reinforcement': No such file or directory
Cloning into 'snake-ai-reinforcement'...
remote: Enumerating objects: 197, done.
remote: Total 197 (delta 0), reused 0 (delta 0), pack-reused 197
Receiving objects: 100% (197/197), 42.98 KiB | 369.00 KiB/s, done.
Resolving deltas: 100% (97/97), done.
 LICENSE    'Search agents.ipynb'   snakeai
 README.md  'Simple agents.ipynb'   snake-ai-reinforcement


We are going to build an agent capable of playing *Snake*:

<img src="https://cloud.githubusercontent.com/assets/2750531/24808769/cc825424-1bc5-11e7-816f-7320f7bda2cf.gif" alt="Snake snapshot" width="320"/>

For this we will build on this [project](https://github.com/YuriyGuts/snake-ai-reinforcement) developed by [Yuriy Guts](https://github.com/YuriyGuts).

First we define a class that lets us simulate the game:


In [2]:
from snakeai.gameplay.environment import Environment
from snakeai.gameplay.entities import Point  # used by get_observation below

class EnvironmentPO(Environment):
    """
    Partial observation environment. Same as base class environment, overloads 
    `get_observation` so that only the cells in front of the snake are returned. 
    (From Environment doc): Represents the RL environment for the Snake game that implements the game logic,
    provides rewards for the agent and keeps track of game statistics.
    """
    def __init__(self, config, verbose=0):
        super().__init__(config, verbose)

    @property
    def observation_shape(self):
        """ Get the shape of the state observed at each timestep. """
        return 3

    def get_observation(self):
        """ Observe the state of the environment. """
        if self.is_game_over:
            return (0, 0, 0)
        center = self.snake.head + self.snake.direction
        if self.snake.direction == Point(0,1):
            left = self.snake.head + Point(1,0)
            right = self.snake.head + Point(-1, 0)
        elif self.snake.direction == Point(0, -1):
            left = self.snake.head + Point(-1, 0)
            right = self.snake.head + Point(1, 0)
        elif self.snake.direction == Point(1, 0):
            left = self.snake.head + Point(0, -1)
            right = self.snake.head + Point(0, 1)
        else:
            left = self.snake.head + Point(0, 1)
            right = self.snake.head + Point(0, -1)
        return (self.field[left], self.field[center], self.field[right])
    
    def show_field(self):
        """ Return the field as a printable string. """
        return str(self.field)

This class extends the `Environment` class from the `snakeai` project:

```Python
class Environment(object):
    """
    Represents the RL environment for the Snake game that implements the game logic,
    provides rewards for the agent and keeps track of game statistics.
    """

    def __init__(self, config, verbose=1):
        """
        Create a new Snake RL environment.
        
        Args:
            config (dict): level configuration, typically found in JSON configs.  
            verbose (int): verbosity level:
                0 = do not write any debug information;
                1 = write a CSV file containing the statistics for every episode;
                2 = same as 1, but also write a full log file containing the state of each timestep.
        """
        self.field = Field(level_map=config['field'])
        self.snake = None
        self.fruit = None
        self.initial_snake_length = config['initial_snake_length']
        self.rewards = config['rewards']
        self.max_step_limit = config.get('max_step_limit', 1000)
        self.is_game_over = False

        self.timestep_index = 0
        self.current_action = None
        self.stats = EpisodeStatistics()
        self.verbose = verbose
        self.debug_file = None
        self.stats_file = None

    def seed(self, value):

    @property
    def observation_shape(self):
        """ Get the shape of the state observed at each timestep. """

    @property
    def num_actions(self):
        """ Get the number of actions the agent can take. """

    def new_episode(self):
        """ Reset the environment and begin a new episode. """
        
    def record_timestep_stats(self, result):
        """ Record environment statistics according to the verbosity level. """

    def get_observation(self):
        """ Observe the state of the environment. """

    def choose_action(self, action):
        """ Choose the action that will be taken at the next timestep. """

    def timestep(self):
        """ Execute the timestep and return the new observable state. """

    def generate_fruit(self, position=None):
        """ Generate a new fruit at a random unoccupied cell. """

    def has_hit_wall(self):
        """ True if the snake has hit a wall, False otherwise. """

    def has_hit_own_body(self):
        """ True if the snake has hit its own body, False otherwise. """

    def is_alive(self):
        """ True if the snake is still alive, False otherwise. """
```
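The left, center and right cells that `EnvironmentPO.get_observation` reads can also be obtained by rotating the direction vector instead of enumerating the four headings. The following is a standalone sketch, not part of `snakeai`: it uses a local `Point` namedtuple as a stand-in for the library's `Point`, the helper names are hypothetical, and it assumes screen coordinates in which `y` grows downward (consistent with the logged output, where a snake heading up has direction `(0, -1)`).

```python
from collections import namedtuple

# Stand-in for snakeai's Point so that this example runs on its own.
Point = namedtuple("Point", ["x", "y"])


def turn_left(d):
    """Rotate a direction 90 degrees to the snake's left (y grows downward)."""
    return Point(d.y, -d.x)


def turn_right(d):
    """Rotate a direction 90 degrees to the snake's right (y grows downward)."""
    return Point(-d.y, d.x)


def observed_cells(head, direction):
    """Return the positions of the cells to the left, in front and to the right of the head."""
    left, right = turn_left(direction), turn_right(direction)
    return (
        Point(head.x + left.x, head.y + left.y),
        Point(head.x + direction.x, head.y + direction.y),
        Point(head.x + right.x, head.y + right.y),
    )


# Head at (3, 3) heading up: left is (2, 3), front is (3, 2), right is (4, 3).
print(observed_cells(Point(3, 3), Point(0, -1)))
```

The two rotations reproduce the four branches of `get_observation`: for example, a snake heading down, `Point(0, 1)`, has `Point(1, 0)` on its left, as in the first branch of the class.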

We build an agent that plays Snake by extending the `AgentBase` class. The following agent chooses its actions at random:

In [3]:
import random

from snakeai.agent import AgentBase
from snakeai.gameplay.entities import ALL_SNAKE_ACTIONS

class RandomActionAgent(AgentBase):
    """ Represents a Snake agent that takes a random action at every step. """

    def __init__(self):
        pass

    def begin_episode(self):
        pass

    def act(self, observation, reward):
        return random.choice(ALL_SNAKE_ACTIONS)

    def end_episode(self):
        pass
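Because the random agent draws from the global `random` module, two runs of the notebook will generally produce different episodes. A minimal sketch of a reproducible variant follows. It is an illustration only: `SeededRandomPolicy` and `ACTIONS` are hypothetical names, and the integers in `ACTIONS` stand in for the entries of `ALL_SNAKE_ACTIONS` so that the example runs without the library.

```python
import random

# Hypothetical stand-ins for the three entries of ALL_SNAKE_ACTIONS.
ACTIONS = (0, 1, 2)


class SeededRandomPolicy:
    """Random policy that owns its generator, so a given seed always yields the same actions."""

    def __init__(self, seed):
        self.rng = random.Random(seed)

    def act(self, observation, reward):
        # The observation and reward are ignored, exactly as in a purely random agent.
        return self.rng.choice(ACTIONS)


first = SeededRandomPolicy(seed=7)
second = SeededRandomPolicy(seed=7)
same = all(first.act(None, 0) == second.act(None, 0) for _ in range(20))
print(same)  # True: both policies draw the same sequence
```

Keeping the generator inside the object avoids interference from other code that also uses the global `random` state.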

Finally we define a `play` function that lets us simulate the game:

In [4]:
from snakeai.gameplay.entities import ALL_SNAKE_ACTIONS, Point
import numpy as np
import random


def play(env, agent, num_episodes=1, verbose=1):
    """
    Play a set of episodes using the specified Snake agent.
    Use the non-interactive command-line interface and print the summary statistics afterwards.
    
    Args:
        env: an instance of Snake environment.
        agent: an instance of Snake agent.
        num_episodes (int): the number of episodes to run.
        verbose (int): if greater than 0, print the field, observation, head and direction at every step.
    """

    fruit_stats = []

    print()
    print('Playing:')

    for episode in range(num_episodes):
        timestep = env.new_episode()
        agent.begin_episode()
        game_over = False
        step = 0
        while not game_over:
            try:
                if verbose > 0:
                    print("------ Step ", step, " ------")
                    print (env.show_field())
                    print ("Observation:", env.get_observation())
                    print ("Head:", env.snake.head)
                    print ("Direction:", env.snake.direction)
                step += 1
                action = agent.act(timestep.observation, timestep.reward)
                env.choose_action(action)
                timestep = env.timestep()
                game_over = timestep.is_episode_end
            except Exception:
                # Any error raised by the agent or the environment ends the episode.
                game_over = True

        fruit_stats.append(env.stats.fruits_eaten)

        summary = '******* Episode {:3d} / {:3d} | Timesteps {:4d} | Fruits {:2d}'
        print(summary.format(episode + 1, num_episodes, env.stats.timesteps_survived, env.stats.fruits_eaten))

    print()
    print('Fruits eaten {:.1f} +/- stddev {:.1f}'.format(np.mean(fruit_stats), np.std(fruit_stats)))

We now have everything needed to simulate the game. We start with an initial board in which the snake is at the center. 
We specify this with a dictionary that holds the configuration. The fields we care about are `field`, `initial_snake_length` 
and `max_step_limit`; the other fields can be ignored for now:

In [5]:
inicial = {
  "field": [
    "#######",
    "#.....#",
    "#.....#",
    "#..S..#",
    "#.....#",
    "#.....#",
    "#######"
  ],

  "initial_snake_length": 2,
  "max_step_limit": 1000,

  "rewards": {
    "timestep": -0.01,
    "ate_fruit": 1,
    "died": -1
  }
}
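As a reading aid for the `field` entry, the sketch below locates the snake's starting cell in the list of strings. It assumes, from the boards printed in this notebook rather than from the library documentation, that `#` is a wall, `.` an empty cell and `S` the initial head, and that coordinates are `(x, y)` with `x` the column and `y` the row. `find_snake_start` is a hypothetical helper, not a `snakeai` function.

```python
def find_snake_start(field):
    """Return (x, y) of the 'S' marker, where x is the column and y is the row."""
    for y, row in enumerate(field):
        x = row.find("S")
        if x != -1:
            return (x, y)
    raise ValueError("field has no 'S' marker")


field = [
    "#######",
    "#.....#",
    "#.....#",
    "#..S..#",
    "#.....#",
    "#.....#",
    "#######",
]

print(find_snake_start(field))  # (3, 3)
```

This agrees with the `Head: Point(x=3, y=3)` line that the simulation prints at step 0.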

Let's see how the random agent behaves with this configuration:

In [6]:
env = EnvironmentPO(config=inicial, verbose=0)
agent = RandomActionAgent()
play(env, agent, num_episodes= 1, verbose=1)


Playing:
------ Step  0  ------
#######
#.....#
#.....#
#.OS..#
#..s..#
#.....#
#######
Observation: (1, 0, 0)
Head: Point(x=3, y=3)
Direction: Point(x=0, y=-1)
------ Step  1  ------
#######
#.....#
#.....#
#.OsS.#
#.....#
#.....#
#######
Observation: (0, 0, 0)
Head: Point(x=4, y=3)
Direction: Point(x=1, y=0)
------ Step  2  ------
#######
#.....#
#.....#
#.O.sS#
#.....#
#.....#
#######
Observation: (0, 4, 0)
Head: Point(x=5, y=3)
Direction: Point(x=1, y=0)
------ Step  3  ------
#######
#.....#
#....S#
#.O..s#
#.....#
#.....#
#######
Observation: (0, 0, 4)
Head: Point(x=5, y=2)
Direction: Point(x=0, y=-1)
******* Episode   1 /   1 | Timesteps    4 | Fruits  0

Fruits eaten 0.0 +/- stddev 0.0


## 1. Agent with a predetermined plan

We are going to build an agent that, starting from the following initial state: 


In [7]:
"""
#######
#.....#
#.###.#
#..S..#
##.s..#
#.....#
#######
"""

'\n#######\n#.....#\n#.###.#\n#..S..#\n##.s..#\n#.....#\n#######\n'

reaches the following state:

In [8]:
"""
#######
#.....#
#s###.#
#S....#
##....#
#.....#
#######
"""

'\n#######\n#.....#\n#s###.#\n#S....#\n##....#\n#.....#\n#######\n'

In [9]:
class PredefinedActionAgent(AgentBase):
    """ Represents a Snake agent that follows a predefined sequence of actions. """

    def __init__(self, actions):
        self.actions = actions
        self.step = 0

    def begin_episode(self):
        # Restart the plan at the beginning of every episode.
        self.step = 0

    def act(self, observation, reward):
        """
        The agent takes the next action in the list of actions. Increases 
        step by 1.
        """

        if self.step >= len(self.actions):
            action = ALL_SNAKE_ACTIONS[0]
        else:
            action = self.actions[self.step]
            self.step += 1
        return action

    def end_episode(self):
        pass

inicial1 = {
  "field": [
    "#######",
    "#.....#",
    "#.###.#",
    "#..S..#",
    "##....#",
    "#.....#",
    "#######"
  ],

  "initial_snake_length": 2,
  "max_step_limit": 11,

  "rewards": {
    "timestep": -0.01,
    "ate_fruit": 1,
    "died": -1
  }
}
env = EnvironmentPO(config=inicial1, verbose=1)
agent = PredefinedActionAgent((ALL_SNAKE_ACTIONS[2], 
                               ALL_SNAKE_ACTIONS[0],
                               ALL_SNAKE_ACTIONS[1],
                               ALL_SNAKE_ACTIONS[0],
                               ALL_SNAKE_ACTIONS[1],
                               ALL_SNAKE_ACTIONS[0],
                               ALL_SNAKE_ACTIONS[0],
                               ALL_SNAKE_ACTIONS[0],
                               ALL_SNAKE_ACTIONS[1],
                               ALL_SNAKE_ACTIONS[0]))

play(env, agent, num_episodes= 1, verbose=1)




Playing:
------ Step  0  ------
#######
#.....#
#.###.#
#..S.O#
##.s..#
#.....#
#######
Observation: (0, 4, 0)
Head: Point(x=3, y=3)
Direction: Point(x=0, y=-1)
------ Step  1  ------
#######
#.....#
#.###.#
#..sSO#
##....#
#.....#
#######
Observation: (4, 1, 0)
Head: Point(x=4, y=3)
Direction: Point(x=1, y=0)
------ Step  2  ------
#######
#.....#
#.###.#
#..ssS#
##....#
#...O.#
#######
Observation: (0, 4, 0)
Head: Point(x=5, y=3)
Direction: Point(x=1, y=0)
------ Step  3  ------
#######
#.....#
#.###S#
#...ss#
##....#
#...O.#
#######
Observation: (4, 0, 4)
Head: Point(x=5, y=2)
Direction: Point(x=0, y=-1)
------ Step  4  ------
#######
#....S#
#.###s#
#....s#
##....#
#...O.#
#######
Observation: (0, 4, 4)
Head: Point(x=5, y=1)
Direction: Point(x=0, y=-1)
------ Step  5  ------
#######
#...Ss#
#.###s#
#.....#
##....#
#...O.#
#######
Observation: (4, 0, 4)
Head: Point(x=4, y=1)
Direction: Point(x=-1, y=0)
------ Step  6  ------
#######
#..Sss#
#.###.#
#.....#
##....#
#...O.#
#######
O
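The plan passed to `PredefinedActionAgent` can be checked by hand by replaying it on paper. The sketch below does the same replay in plain Python, without the library. It assumes that `ALL_SNAKE_ACTIONS[0]`, `[1]` and `[2]` mean keep direction, turn left and turn right, respectively. This reading is inferred from the logged directions above, not taken from the library documentation. The function `replay` and the constant names are hypothetical, and the replay ignores fruit and the snake's body.

```python
# Assumed meaning of the indices into ALL_SNAKE_ACTIONS (inferred from the log).
MAINTAIN, TURN_LEFT, TURN_RIGHT = 0, 1, 2


def replay(head, direction, plan):
    """Return the list of head positions visited when following the plan."""
    x, y = head
    dx, dy = direction
    path = []
    for action in plan:
        if action == TURN_LEFT:
            dx, dy = dy, -dx   # rotate to the snake's left (y grows downward)
        elif action == TURN_RIGHT:
            dx, dy = -dy, dx   # rotate to the snake's right
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


field = [
    "#######",
    "#.....#",
    "#.###.#",
    "#..S..#",
    "##....#",
    "#.....#",
    "#######",
]

plan = [2, 0, 1, 0, 1, 0, 0, 0, 1, 0]
path = replay(head=(3, 3), direction=(0, -1), plan=plan)

hits_wall = any(field[y][x] == "#" for x, y in path)
print(path[-2], path[-1], hits_wall)  # (1, 2) (1, 3) False
```

Under these assumptions the head ends at `(1, 3)` after passing through `(1, 2)`, which matches the positions of `S` and `s` in the target board, and no step of the plan enters a wall.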

## 2. Reactive agent

The idea is to build a reactive agent, that is, an agent whose actions depend only on the current observation and that 
has no memory. The agent should try not to crash and to eat as many fruits as it can. Compare the behavior of this agent with that of the 
random agent. Simulate each agent for 100 episodes. Present the results and discuss them.



In [10]:
class ReactiveAgent(AgentBase):
    """ 
    Represents a reactive Snake agent that decides an action exclusively based on
    the current observation.
    """

    def __init__(self):
        pass

    def begin_episode(self):
        pass

    def act(self, observation, reward):
        """
        The agent analyzes the current observation and takes a consequent action.
        """
        # Use the observation passed in instead of querying a global `env`.
        # Cell codes: 1 = fruit, 3 = snake body, 4 = wall.
        # Actions: [0] = keep direction, [1] = turn left, [2] = turn right.
        left, center, right = observation
        # Eat the fruit
        if left == 1:
            action = ALL_SNAKE_ACTIONS[1]
        elif center == 1:
            action = ALL_SNAKE_ACTIONS[0]
        elif right == 1:
            action = ALL_SNAKE_ACTIONS[2]
        # Don't hit the wall
        elif center == 4:
            if right == 4:
                action = ALL_SNAKE_ACTIONS[1]
            elif left == 4:
                action = ALL_SNAKE_ACTIONS[2]
            # Don't hit itself
            elif left == 3:
                action = ALL_SNAKE_ACTIONS[2]
            elif right == 3:
                action = ALL_SNAKE_ACTIONS[1]
            else:
                action = ALL_SNAKE_ACTIONS[2]
        # Don't hit itself, again
        elif center == 3:
            if left == 3:
                action = ALL_SNAKE_ACTIONS[2]
            elif right == 3:
                action = ALL_SNAKE_ACTIONS[1]
            # Don't hit the wall, again
            elif left == 4:
                action = ALL_SNAKE_ACTIONS[2]
            elif right == 4:
                action = ALL_SNAKE_ACTIONS[1]
            else:
                action = ALL_SNAKE_ACTIONS[2]
        else:
            action = ALL_SNAKE_ACTIONS[0]

        return action

    def end_episode(self):
        pass
          
env = EnvironmentPO(config=inicial, verbose=1)
agent = ReactiveAgent()
play(env, agent, num_episodes= 1, verbose=1)


Playing:
------ Step  0  ------
#######
#.....#
#.....#
#..S..#
#..s..#
#..O..#
#######
Observation: (0, 0, 0)
Head: Point(x=3, y=3)
Direction: Point(x=0, y=-1)
------ Step  1  ------
#######
#.....#
#..S..#
#..s..#
#.....#
#..O..#
#######
Observation: (0, 0, 0)
Head: Point(x=3, y=2)
Direction: Point(x=0, y=-1)
------ Step  2  ------
#######
#..S..#
#..s..#
#.....#
#.....#
#..O..#
#######
Observation: (0, 4, 0)
Head: Point(x=3, y=1)
Direction: Point(x=0, y=-1)
------ Step  3  ------
#######
#..sS.#
#.....#
#.....#
#.....#
#..O..#
#######
Observation: (4, 0, 0)
Head: Point(x=4, y=1)
Direction: Point(x=1, y=0)
------ Step  4  ------
#######
#...sS#
#.....#
#.....#
#.....#
#..O..#
#######
Observation: (4, 4, 0)
Head: Point(x=5, y=1)
Direction: Point(x=1, y=0)
------ Step  5  ------
#######
#....s#
#....S#
#.....#
#.....#
#..O..#
#######
Observation: (4, 0, 0)
Head: Point(x=5, y=2)
Direction: Point(x=0, y=1)
------ Step  6  ------
#######
#.....#
#....s#
#....S#
#.....#
#..O..#
#######
Ob

******* Episode   1 /   1 | Timesteps  127 | Fruits 16

Fruits eaten 16.0 +/- stddev 0.0
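Since a reactive agent has no memory, its behavior is just a function from the observation tuple to an action, which makes it easy to test in isolation. The sketch below is a simplified variant of that idea, not the exact rule used by `ReactiveAgent`: go for a visible fruit, otherwise keep going if the cell ahead is free, otherwise turn toward a free side (right first). The constants are assumptions: cell codes 1 and 4 appear in the logged observations for fruit and wall, 3 is the body code used in the agent above, and the action indices follow the turn directions seen in the log.

```python
# Assumed cell codes and action indices (see the lead-in for where they come from).
EMPTY, FRUIT, SNAKE_BODY, WALL = 0, 1, 3, 4
MAINTAIN, TURN_LEFT, TURN_RIGHT = 0, 1, 2
BLOCKED = (SNAKE_BODY, WALL)


def reactive_policy(observation):
    """Map a (left, center, right) observation to an action index."""
    left, center, right = observation
    # 1. Go for a fruit if one is visible.
    if left == FRUIT:
        return TURN_LEFT
    if center == FRUIT:
        return MAINTAIN
    if right == FRUIT:
        return TURN_RIGHT
    # 2. Keep going while the cell ahead is free.
    if center not in BLOCKED:
        return MAINTAIN
    # 3. Otherwise turn toward a free side, preferring the right.
    if right not in BLOCKED:
        return TURN_RIGHT
    if left not in BLOCKED:
        return TURN_LEFT
    return MAINTAIN  # every visible cell is blocked; no safe move exists


for obs in [(0, 0, 0), (0, 4, 0), (4, 4, 0), (0, 4, 4), (1, 0, 0)]:
    print(obs, reactive_policy(obs))
```

For the observations that appear in the run above, such as `(0, 4, 0)` and `(4, 4, 0)`, this variant also turns right, which agrees with the direction changes recorded in the log.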


In [11]:
print("AGENTE ALEATORIO")
env = EnvironmentPO(config=inicial, verbose=1)
agent = RandomActionAgent()
play(env, agent, num_episodes= 100, verbose=0)



AGENTE ALEATORIO

Playing:
******* Episode   1 / 100 | Timesteps    8 | Fruits  1
******* Episode   2 / 100 | Timesteps    5 | Fruits  0
******* Episode   3 / 100 | Timesteps    3 | Fruits  0
******* Episode   4 / 100 | Timesteps    8 | Fruits  1
******* Episode   5 / 100 | Timesteps    5 | Fruits  1
******* Episode   6 / 100 | Timesteps    5 | Fruits  0
******* Episode   7 / 100 | Timesteps    5 | Fruits  0
******* Episode   8 / 100 | Timesteps    4 | Fruits  0
******* Episode   9 / 100 | Timesteps    6 | Fruits  1
******* Episode  10 / 100 | Timesteps    4 | Fruits  0
******* Episode  11 / 100 | Timesteps    6 | Fruits  0
******* Episode  12 / 100 | Timesteps   11 | Fruits  0
******* Episode  13 / 100 | Timesteps    9 | Fruits  0
******* Episode  14 / 100 | Timesteps    4 | Fruits  0
******* Episode  15 / 100 | Timesteps    4 | Fruits  1
******* Episode  16 / 100 | Timesteps    7 | Fruits  0
******* Episode  17 / 100 | Timesteps    4 | Fruits  0
******* Episode  18 / 100 | Timesteps 

In [12]:
print("AGENTE REACTIVO")
env = EnvironmentPO(config=inicial, verbose=1)
agent = ReactiveAgent()
play(env, agent, num_episodes= 100, verbose=0)

AGENTE REACTIVO

Playing:
******* Episode   1 / 100 | Timesteps 1000 | Fruits  9
******* Episode   2 / 100 | Timesteps   80 | Fruits 15
******* Episode   3 / 100 | Timesteps   87 | Fruits 10
******* Episode   4 / 100 | Timesteps   93 | Fruits 11
******* Episode   5 / 100 | Timesteps 1000 | Fruits  9
******* Episode   6 / 100 | Timesteps   47 | Fruits 14
******* Episode   7 / 100 | Timesteps 1000 | Fruits  3
******* Episode   8 / 100 | Timesteps  135 | Fruits 22
******* Episode   9 / 100 | Timesteps 1000 | Fruits  2
******* Episode  10 / 100 | Timesteps   73 | Fruits 10
******* Episode  11 / 100 | Timesteps   63 | Fruits 10
******* Episode  12 / 100 | Timesteps   75 | Fruits 10
******* Episode  13 / 100 | Timesteps   57 | Fruits  8
******* Episode  14 / 100 | Timesteps   81 | Fruits 12
******* Episode  15 / 100 | Timesteps   68 | Fruits 11
******* Episode  16 / 100 | Timesteps  101 | Fruits 13
******* Episode  17 / 100 | Timesteps  113 | Fruits 16
******* Episode  18 / 100 | Timesteps 1

**Results:**

The reactive agent is clearly more effective than the random agent: on average it eats far more fruit. In the episodes listed above, the random agent never ate more than 1 fruit and survived at most 11 timesteps, while the reactive agent ate between 2 and 22 fruits per episode. 

In no case did the reactive agent die without having eaten at least one fruit, unlike the random agent.
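The comparison can be made more concrete with summary statistics. The sketch below uses only the 17 episodes per agent that are visible in the truncated logs above, so its numbers describe that sample and not the full 100-episode runs.

```python
from statistics import mean, pstdev

# Fruits eaten in the 17 episodes visible in each truncated log.
random_fruits = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0]
reactive_fruits = [9, 15, 10, 11, 9, 14, 3, 22, 2, 10, 10, 10, 8, 12, 11, 13, 16]
reactive_timesteps = [1000, 80, 87, 93, 1000, 47, 1000, 135, 1000,
                      73, 63, 75, 57, 81, 68, 101, 113]

STEP_LIMIT = 1000  # max_step_limit in the `inicial` configuration


def summarize(name, fruits):
    # pstdev is the population standard deviation, which is what np.std computes by default.
    print(f"{name}: mean {mean(fruits):.2f}, stddev {pstdev(fruits):.2f}, "
          f"min {min(fruits)}, max {max(fruits)}")


summarize("random", random_fruits)
summarize("reactive", reactive_fruits)

capped = sum(1 for t in reactive_timesteps if t >= STEP_LIMIT)
print(f"reactive episodes that reached the step limit: {capped} of {len(reactive_timesteps)}")
```

In this sample, four reactive episodes last exactly 1000 timesteps, which is the configured `max_step_limit`. This suggests that in those episodes the agent stopped at the step limit rather than by crashing, possibly after settling into a loop that no longer reaches the fruit.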
