# Basic Grid World Reinforcement Learning

Welcome to the "Basic Grid World Reinforcement Learning" project notebook! In this project, we'll build a small 5x5 grid world and use reinforcement learning to train an agent to navigate it.

## Project Overview

- **Objective**: Train an agent to learn optimal strategies for moving through a grid world, avoiding obstacles, and reaching predefined goals.

- **Techniques**: We'll use Q-learning, a popular reinforcement learning algorithm, to achieve our goal.

- **Implementation**: The environment lives in a single class, `EnvGrid`, while action selection (`take_action`) and the training loop are kept separate, so each component is easy to understand and modify.
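
To make the Q-learning technique above concrete, here is a minimal sketch of a single tabular update step. The function name `q_learning_update` and its default parameters are illustrative assumptions, not part of the project code; the rule itself is the standard Q-learning update, Q(s, a) ← Q(s, a) + α [r + γ max Q(s', ·) − Q(s, a)].

```python
import numpy as np

def q_learning_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the TD error.

    Hypothetical helper for illustration; alpha is the learning rate and
    gamma the discount factor.
    """
    # Target: immediate reward plus the discounted best value of the next state.
    td_target = reward + gamma * np.max(Q[next_state])
    td_error = td_target - Q[state, action]
    Q[state, action] += alpha * td_error
    return td_error

# Toy usage: 25 states (a 5x5 grid) and 4 actions, all values starting at zero.
Q = np.zeros((25, 4))
q_learning_update(Q, state=20, action=0, reward=1.0, next_state=15)
print(Q[20, 0])  # 0.1
```

With an all-zero table, the first rewarded transition moves the estimate by exactly `alpha * reward`; later updates propagate that value backwards through the discounted maximum.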

## Notebook Sections

This notebook is divided into several sections:

1. **Environment Setup**: We'll define the grid world environment and the agent's actions.
2. **Agent Training**: We'll implement the Q-learning algorithm and train the agent.
3. **Visualization**: We'll visualize the agent's behavior and training progress.
4. **Customization**: You can customize the grid layout and experiment with different configurations.
5. **Conclusion**: We'll summarize the project's key takeaways and potential future improvements.
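
As a small preview of the environment setup, the sketch below shows how a grid position can be flattened into a single integer state, which is what lets a Q-table be indexed by state. The helper names `encode_state` and `decode_state` are assumptions made for illustration; the row-major formula `y * num_cols + x` is the one the environment's `get_state` method uses.

```python
def encode_state(y, x, num_cols):
    """Map a (row, column) position to a single row-major state index."""
    return y * num_cols + x

def decode_state(state, num_cols):
    """Invert encode_state: recover (row, column) from a state index."""
    return divmod(state, num_cols)

# The bottom-left cell of a 5x5 grid (row 4, column 0).
print(encode_state(4, 0, 5))  # 20
print(decode_state(20, 5))    # (4, 0)
```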

## Getting Started

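Before we begin, make sure you have the required dependencies installed. NumPy is the only third-party package the code uses; everything else comes from Python's standard library.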

Let's get started with the environment setup and training the agent!



In [69]:
import random  # needed for random.sample and random.uniform below

import numpy as np
from random import randint, seed

class EnvGrid(object):
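    """
    A 5x5 grid world. Cells hold 0 (empty), 1 (goal), or -1 (obstacle);
    the agent's position is tracked as (self.y, self.x) = (row, column).
    """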
    def __init__(self):
        super(EnvGrid, self).__init__()

        self.grid = [
            [0, 0, 1, 0, -1],
            [0, -1, 0, 0, 0],
            [0, 0, 0, -1, 0],
            [0, -1, 0, 0, 0],
            [1, 0, 0, -1, 0]
        ]

        # Starting position: the bottom-left corner of the grid (row 4, column 0)
        self.y = 4
        self.x = 0
        # Default goals and obstacles as (row, column) pairs. create_random_grid() below
        # can replace the goals with randomly placed ones.
        # Note: the default goal (4, 0) coincides with the start cell.
        self.goals = [(0, 2), (4, 0), (4, 4)]
        self.obstacles = [(0, 4), (1, 1), (2, 3), (3, 1), (4, 3)]
        # Dimensions of the grid
        self.num_rows = len(self.grid)
        self.num_cols = len(self.grid[0])
        # Possible actions
        self.actions = [
            [-1, 0],  # Up
            [1, 0],   # Down
            [0, -1],  # Left
            [0, 1]    # Right
        ]

    def reset(self):
        """
        Reset the agent's position and return the initial state.
        """
        self.y = 4
        self.x = 0
        return self.get_state()

    def get_state(self):
        """
        Convert the current position to a state representation.
        """
        return self.y * self.num_cols + self.x

    def step(self, action):
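        """
        Take an action in the environment and return the next state and reward.

        Moves that would leave the grid are clipped to the border. Obstacles do not
        block movement; entering one only yields a reward of -1.
        """
        self.y = max(0, min(self.y + self.actions[action][0], self.num_rows - 1))
        self.x = max(0, min(self.x + self.actions[action][1], self.num_cols - 1))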

        state = self.get_state()

        reward = 0
        if (self.y, self.x) in self.goals:
            reward = 1
        elif (self.y, self.x) in self.obstacles:
            reward = -1

        return state, reward

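    def create_random_grid(self, num_goals=2):
        """
        Replace the grid with one containing `num_goals` randomly placed goals.

        The obstacles stay where they are. Goals are sampled only from cells that
        are neither an obstacle nor the start cell, so an episode never begins on a goal.
        """
        start = (4, 0)  # must match the position set in reset()
        grid = [[0] * self.num_cols for _ in range(self.num_rows)]

        free_cells = [
            (i, j)
            for i in range(self.num_rows)
            for j in range(self.num_cols)
            if (i, j) not in self.obstacles and (i, j) != start
        ]
        self.goals = random.sample(free_cells, num_goals)

        for g in self.goals:
            grid[g[0]][g[1]] = 1

        for o in self.obstacles:
            grid[o[0]][o[1]] = -1
        self.grid = grid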


    def show(self):
        """
        Display the current state of the grid.
        """
        print("---------------------")
        for y in range(self.num_rows):
            for x in range(self.num_cols):
                print("%s\t" % ( "X" if y == self.y and x == self.x else "G" if (y, x) in self.goals else self.grid[y][x]), end="")
            print("")

    def is_finished(self):
        """
        Check if the agent has reached a goal state.
        """
        return (self.y, self.x) in self.goals

def take_action(st, Q, eps):
    """
    Choose an action using an epsilon-greedy strategy.
    """
    if random.uniform(0, 1) < eps:  # Explore: random action with probability eps
        action = randint(0, 3)
    else:  # Exploit: action with the highest Q-value in this state
        action = np.argmax(Q[st])
    return action



if __name__ == '__main__':
    # Uncomment the following line if you want to use a fixed random seed for reproducibility
    #seed(0)
    env = EnvGrid()
    env.create_random_grid()

    st = env.reset()

    num_states = env.num_rows * env.num_cols
    Q = np.zeros((num_states, 4))

    learning_rate = 0.1
    discount_factor = 0.9
    num_episodes = 10000

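    for episode in range(num_episodes):
        st = env.reset()
        done = False
        total_reward = 0  # Cumulative reward collected in this episode

        while not done:
            # Behavior policy: epsilon-greedy with 40% exploration
            at = take_action(st, Q, 0.4)
            stp1, r = env.step(at)
            # Target policy: greedy (eps = 0), which makes this the Q-learning update
            atp1 = take_action(stp1, Q, 0.0)
            Q[st][at] = Q[st][at] + learning_rate * (r + discount_factor * Q[stp1][atp1] - Q[st][at])
            st = stp1

            total_reward += r

            done = env.is_finished()

    # Training is done; put the agent back at the start for the demonstration run
    st = env.reset()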

    # Render the agent's behavior on the random grid
    while not env.is_finished():
        env.show()
        at = take_action(st, Q, 0.1)  # Mostly greedy, with 10% exploration
        stp1, r = env.step(at)
        st = stp1
    env.show()


---------------------
0	0	0	0	-1	
0	-1	G	0	0	
0	0	0	-1	G	
0	-1	0	0	0	
X	0	0	-1	0	
---------------------
0	0	0	0	-1	
0	-1	G	0	0	
0	0	0	-1	G	
X	-1	0	0	0	
0	0	0	-1	0	
---------------------
0	0	0	0	-1	
0	-1	G	0	0	
X	0	0	-1	G	
0	-1	0	0	0	
0	0	0	-1	0	
---------------------
0	0	0	0	-1	
0	-1	G	0	0	
0	X	0	-1	G	
0	-1	0	0	0	
0	0	0	-1	0	
---------------------
0	0	0	0	-1	
0	-1	G	0	0	
0	0	X	-1	G	
0	-1	0	0	0	
0	0	0	-1	0	
---------------------
0	0	0	0	-1	
0	-1	X	0	0	
0	0	0	-1	G	
0	-1	0	0	0	
0	0	0	-1	0	
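
Besides replaying a single run as above, it can help to look at the greedy action the Q-table prefers in every cell. The sketch below is one possible way to do that; `render_policy` and `ARROWS` are hypothetical names introduced here, and the demo Q-table is filled by hand for illustration rather than taken from a training run. The arrow order assumes the same action order as `EnvGrid.actions` (up, down, left, right).

```python
import numpy as np

# Assumed to follow the order of EnvGrid.actions: up, down, left, right.
ARROWS = ["^", "v", "<", ">"]

def render_policy(Q, num_rows, num_cols, goals=(), obstacles=()):
    """Return one string per grid row showing the greedy action in each cell."""
    rows = []
    for y in range(num_rows):
        cells = []
        for x in range(num_cols):
            if (y, x) in goals:
                cells.append("G")
            elif (y, x) in obstacles:
                cells.append("#")
            else:
                state = y * num_cols + x  # same row-major encoding as get_state()
                cells.append(ARROWS[int(np.argmax(Q[state]))])
        rows.append(" ".join(cells))
    return rows

# Toy 2x2 example with hand-picked values (not training results).
Q_demo = np.zeros((4, 4))
Q_demo[0, 3] = 1.0  # state 0 prefers Right
Q_demo[2, 0] = 1.0  # state 2 prefers Up
Q_demo[3, 0] = 1.0  # state 3 prefers Up
for line in render_policy(Q_demo, 2, 2, goals=[(0, 1)]):
    print(line)
# > G
# ^ ^
```

After training, a call such as `render_policy(Q, env.num_rows, env.num_cols, env.goals, env.obstacles)` would show the learned policy for the whole grid. Cells whose Q-values are all still zero print `^`, because `np.argmax` returns the first index on ties.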
