# Tutorial: Reinforcement Learning with Python

## Introduction

Reinforcement Learning (RL) is a type of machine learning where an agent learns to make decisions by interacting with an environment. The agent receives rewards or penalties for its actions, and the goal is to learn a policy that maximizes the cumulative reward over time. In this tutorial, we'll implement a basic RL algorithm called Q-Learning using Python.

## Prerequisites

1. Python (3.6 or higher)
2. Basic understanding of Python programming
3. NumPy installed (the Q-table in Step 2 is a NumPy array); prior familiarity with it is helpful but not required.

## Step 1: Setting up the Environment

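For this tutorial, we'll use a simple grid-world environment: a 6x6 grid in which each cell is a state, identified by its `(row, column)` coordinates. The agent starts at `(0, 0)` and can move up, down, left, or right; a move that would leave the grid keeps the agent where it is. Every step earns a reward of -0.1, and reaching the goal at `(5, 5)` earns 1.0 and ends the episode.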


In [3]:
# Define the environment
class GridWorld:
    def __init__(self):
        self.grid_size = (6, 6)
        self.start_state = (0, 0)
        self.goal_state = (5, 5)
        self.current_state = self.start_state
        self.actions = ['up', 'down', 'left', 'right']

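    def reset(self, start_state=None):
        # Default to the configured start state instead of a hard-coded (0, 0)
        self.current_state = self.start_state if start_state is None else start_state
        return self.current_state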

    def step(self, action):
        if action == 'up' and self.current_state[0] > 0:
            self.current_state = (self.current_state[0] - 1, self.current_state[1])
        elif action == 'down' and self.current_state[0] < self.grid_size[0] - 1:
            self.current_state = (self.current_state[0] + 1, self.current_state[1])
        elif action == 'left' and self.current_state[1] > 0:
            self.current_state = (self.current_state[0], self.current_state[1] - 1)
        elif action == 'right' and self.current_state[1] < self.grid_size[1] - 1:
            self.current_state = (self.current_state[0], self.current_state[1] + 1)

        if self.current_state == self.goal_state:
            return self.current_state, 1.0, True
        else:
            return self.current_state, -0.1, False
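
As a reference point for the rewards this environment hands out, the best return achievable in one episode can be worked out by hand. The sketch below assumes only the values set in `GridWorld` above (start, goal, step reward, goal reward); the variable names are illustrative and not part of the tutorial's code.

```python
# Values copied from the GridWorld class above
start, goal = (0, 0), (5, 5)
step_reward, goal_reward = -0.1, 1.0

# With no obstacles, the shortest path length is the Manhattan distance
shortest_steps = abs(goal[0] - start[0]) + abs(goal[1] - start[1])

# Every step except the last earns step_reward; the last one earns goal_reward
best_return = (shortest_steps - 1) * step_reward + goal_reward

print(shortest_steps, round(best_return, 2))  # 10 0.1
```

Any episode that wanders, or that takes exploratory moves, ends with a return below this value.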


## Step 2: Q-Learning Algorithm

Next, let's implement the Q-Learning algorithm.
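
For reference, the tabular update applied by the `update_q_table` method in the code that follows can be written as below. The symbols are notation introduced here rather than names from the code: $\alpha$ corresponds to `learning_rate`, $\gamma$ to `discount_factor`, $r$ to the reward returned by `env.step`, and $s'$ to `next_state`.

$$
Q(s, a) \leftarrow (1 - \alpha)\, Q(s, a) + \alpha \left( r + \gamma \max_{a'} Q(s', a') \right)
$$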


In [4]:
import random
import numpy as np

class QLearningAgent:
    def __init__(self, env, learning_rate=0.1, discount_factor=0.9, exploration_prob=0.1):
        self.env = env
        self.learning_rate = learning_rate
        self.discount_factor = discount_factor
        self.exploration_prob = exploration_prob
        # One Q-value per (action, row, column); all values start at zero
        self.q_table = np.zeros((len(env.actions), env.grid_size[0], env.grid_size[1]))

    def select_action(self, state):
        if random.uniform(0, 1) < self.exploration_prob:
            return random.choice(self.env.actions)
        else:
            action_values = self.q_table[:, state[0], state[1]]
            return self.env.actions[np.argmax(action_values)]

    def update_q_table(self, state, action, reward, next_state):
        action_index = self.env.actions.index(action)
        current_value = self.q_table[action_index, state[0], state[1]]
        next_max_value = np.max(self.q_table[:, next_state[0], next_state[1]])
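        # Blend the old estimate with the one-step target: reward plus discounted best next value
        target = reward + self.discount_factor * next_max_value
        updated_value = (1 - self.learning_rate) * current_value + self.learning_rate * target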
        self.q_table[action_index, state[0], state[1]] = updated_value

    def train(self, num_episodes):
        for episode in range(num_episodes):
            state = self.env.reset()
            done = False
            while not done:
                action = self.select_action(state)
                next_state, reward, done = self.env.step(action)
                self.update_q_table(state, action, reward, next_state)
                state = next_state
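
To see what a single call to `update_q_table` does numerically, the update can be traced by hand. The helper below is a standalone sketch: `q_update` is a hypothetical function that mirrors the arithmetic of `update_q_table`, using the default `learning_rate=0.1` and `discount_factor=0.9` from `QLearningAgent`.

```python
def q_update(current_value, reward, next_max_value,
             learning_rate=0.1, discount_factor=0.9):
    """One tabular Q-learning update, mirroring update_q_table."""
    target = reward + discount_factor * next_max_value
    return (1 - learning_rate) * current_value + learning_rate * target

# Ordinary step early in training: every Q-value is still 0
first_step = q_update(current_value=0.0, reward=-0.1, next_max_value=0.0)

# Step that enters the goal: reward 1.0. The goal cell's Q-values stay 0
# because no action is ever taken from it.
into_goal = q_update(current_value=0.0, reward=1.0, next_max_value=0.0)

# Step into the cell next to the goal, once that cell's best value is 0.1
one_back = q_update(current_value=0.0, reward=-0.1, next_max_value=into_goal)

print(round(first_step, 4), round(into_goal, 4), round(one_back, 4))  # -0.01 0.1 -0.001
```

The positive value near the goal reaches earlier cells only gradually, which is why training runs over many episodes.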

## Step 3: Training the Agent

In [5]:
# Create the environment and agent
env = GridWorld()
agent = QLearningAgent(env)

# Train the agent
num_episodes = 1000
agent.train(num_episodes)
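
The `train` method does not report progress, so it can be hard to tell whether learning is happening. One way to check is to record the total reward of each episode. The block below is a compact, self-contained sketch of the same algorithm on the same 6x6 grid, not the tutorial's classes: `step`, `train_with_history`, `MOVES`, and the `seed` parameter are names assumed here for illustration. The in-place update `q += lr * (target - q)` is algebraically the same as the `(1 - lr) * q + lr * target` form used in `update_q_table`.

```python
import random
import numpy as np

GRID, GOAL = 6, (5, 5)
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, a):
    """Apply move a; moves off the grid leave the agent in place."""
    r = min(max(state[0] + MOVES[a][0], 0), GRID - 1)
    c = min(max(state[1] + MOVES[a][1], 0), GRID - 1)
    nxt = (r, c)
    return (nxt, 1.0, True) if nxt == GOAL else (nxt, -0.1, False)

def train_with_history(num_episodes=1000, lr=0.1, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning that also returns the total reward of every episode."""
    rng = random.Random(seed)
    q = np.zeros((4, GRID, GRID))  # indexed [action, row, column]
    returns = []
    for _ in range(num_episodes):
        state, done, total = (0, 0), False, 0.0
        while not done:
            # Epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.randrange(4)
            else:
                a = int(np.argmax(q[:, state[0], state[1]]))
            nxt, reward, done = step(state, a)
            target = reward + gamma * np.max(q[:, nxt[0], nxt[1]])
            q[a, state[0], state[1]] += lr * (target - q[a, state[0], state[1]])
            total += reward
            state = nxt
        returns.append(total)
    return q, returns

q, returns = train_with_history()
print("mean return, first 10 episodes:", float(np.mean(returns[:10])))
print("mean return, last 100 episodes:", float(np.mean(returns[-100:])))
```

If learning works, the late-episode average should sit well above the early one, since early episodes involve long wandering at -0.1 per step.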

## Step 4: Evaluating the Agent

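You can now evaluate the agent by running it for a number of episodes and averaging the total reward per episode. Note that `select_action` still takes a random action with probability `exploration_prob`, so the average reflects the epsilon-greedy policy rather than a purely greedy one.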

In [19]:
def evaluate_agent(agent, num_episodes):
    env = agent.env  # use the agent's own environment instead of relying on a global
    total_rewards = 0
    for _ in range(num_episodes):
        state = env.reset()
        done = False
        while not done:
            action = agent.select_action(state)
            next_state, reward, done = env.step(action)
            total_rewards += reward
            state = next_state
    return total_rewards / num_episodes

# Evaluate the agent
num_eval_episodes = 100
avg_reward = evaluate_agent(agent, num_eval_episodes)
print(f'Average Reward: {avg_reward}')


Average Reward: 0.0019999999999999606
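
A complementary check is to read the greedy policy straight out of the Q-table, which removes exploration from the picture. The sketch below assumes the `[action, row, column]` layout used by `QLearningAgent.q_table`; `greedy_policy` and the small hand-filled `demo_q` table are hypothetical names introduced for illustration, not learned values.

```python
import numpy as np

def greedy_policy(q_table, actions):
    """Return a 2-D list holding the highest-valued action name for each cell."""
    best = np.argmax(q_table, axis=0)  # best action index per (row, column)
    return [[actions[i] for i in row] for row in best]

actions = ['up', 'down', 'left', 'right']

# Hand-filled 2x2 example table
demo_q = np.zeros((4, 2, 2))
demo_q[3, 0, 0] = 0.5  # 'right' is best in cell (0, 0)
demo_q[1, 0, 1] = 0.8  # 'down' is best in cell (0, 1)
demo_q[3, 1, 0] = 0.9  # 'right' is best in cell (1, 0)

for row in greedy_policy(demo_q, actions):
    print(row)
# ['right', 'down']
# ['right', 'up']
```

With the objects from this tutorial, the call would be `greedy_policy(agent.q_table, env.actions)`. Cells whose values are all equal, such as the never-updated goal cell, show the first action in the list because `np.argmax` returns the first maximum.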


In [25]:
def print_grid_world(env):
    print("")
    for i in range(env.grid_size[0]):
        for j in range(env.grid_size[1]):
            if (i, j) == env.current_state:
                print("A", end=" ")  # Agent's position
            elif (i, j) == env.goal_state:
                print("G", end=" ")  # Goal position
            else:
                print(".", end=" ")  # Empty cell
        print()

        
def play_trained_agent(agent, env):
    start_state = (0,0)
    state = env.reset(start_state)
    done = False
    
    while not done:
        print_grid_world(env)  # Print the grid world
        action = agent.select_action(state)
        next_state, _, done = env.step(action)
        state = next_state
        
        
# Usage:
play_trained_agent(agent, env)



A . . . . . 
. . . . . . 
. . . . . . 
. . . . . . 
. . . . . . 
. . . . . G 

. A . . . . 
. . . . . . 
. . . . . . 
. . . . . . 
. . . . . . 
. . . . . G 

. . A . . . 
. . . . . . 
. . . . . . 
. . . . . . 
. . . . . . 
. . . . . G 

. . . . . . 
. . A . . . 
. . . . . . 
. . . . . . 
. . . . . . 
. . . . . G 

. . . . . . 
. . . A . . 
. . . . . . 
. . . . . . 
. . . . . . 
. . . . . G 

. . . . . . 
. . . . . . 
. . . A . . 
. . . . . . 
. . . . . . 
. . . . . G 

. . . . . . 
. . . . . . 
. . . . . . 
. . . A . . 
. . . . . . 
. . . . . G 

. . . . . . 
. . . . . . 
. . . . . . 
. . . . A . 
. . . . . . 
. . . . . G 

. . . . . . 
. . . . . . 
. . . . . . 
. . . . . . 
. . . . A . 
. . . . . G 

. . . . . . 
. . . . . . 
. . . . . . 
. . . . . . 
. . . . . . 
. . . . A G 


## Conclusion

Congratulations! You've now implemented a basic Q-Learning agent in Python. This is just a starting point, and there are many extensions and improvements you can make to this algorithm. Further exploration could involve more complex environments, different RL algorithms, or applying RL to real-world problems. Happy coding!