# Reinforcement Learning with LionAGI

This notebook demonstrates lionagi's reinforcement learning (RL) support by training a Deep Q-Network (DQN) agent in a GridWorld environment.

In [None]:
import sys

# Make the local lionagi checkout (two directories up) importable from this notebook
sys.path.append("../..")

import numpy as np
import matplotlib.pyplot as plt
from lionagi import Branch
from lionagi.operations.rl.implementations.dqn import DQNAgent
from lionagi.operations.rl.implementations.gridworld import GridWorldEnv

## Create Environment

First, create an 8×8 GridWorld environment with 10 randomly placed obstacles and a limit of 100 steps per episode:

In [None]:
env = GridWorldEnv(
    size=(8, 8), max_steps=100, random_obstacles=True, n_obstacles=10
)

# Reset the environment and render the initial grid
state = await env.reset()
env.render()
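
The training and test cells in this notebook use only two environment methods: `reset()`, which returns the initial state, and `step(action)`, which returns a `(state, reward, done, info)` tuple. Both are awaited, so both are coroutines. To illustrate that contract, here is a hypothetical `ToyGridWorld` with no obstacles. Its action encoding (0=up, 1=down, 2=left, 3=right), goal position, and reward values are assumptions, not details of lionagi's `GridWorldEnv`. Inside a notebook, where an event loop is already running, replace `asyncio.run(demo())` with `await demo()`.

```python
import asyncio


class ToyGridWorld:
    """Hypothetical minimal grid world; NOT lionagi's GridWorldEnv."""

    # Assumed action encoding: index -> (row delta, col delta) for up, down, left, right
    MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]

    def __init__(self, size=(4, 4), max_steps=20):
        self.rows, self.cols = size
        self.goal = (self.rows - 1, self.cols - 1)  # assumed goal: bottom-right cell
        self.max_steps = max_steps

    async def reset(self):
        # Start every episode in the top-left corner
        self.pos = (0, 0)
        self.steps = 0
        return self.pos

    async def step(self, action):
        dr, dc = self.MOVES[action]
        # Clamp to the grid so moving into a wall leaves the agent in place
        r = min(max(self.pos[0] + dr, 0), self.rows - 1)
        c = min(max(self.pos[1] + dc, 0), self.cols - 1)
        self.pos = (r, c)
        self.steps += 1
        reached = self.pos == self.goal
        reward = 1.0 if reached else -0.01  # assumed reward scheme
        done = reached or self.steps >= self.max_steps
        return self.pos, reward, done, {"steps": self.steps}


async def demo():
    env = ToyGridWorld()
    state = await env.reset()
    # Move down three times, then right three times, to reach the goal at (3, 3)
    for action in [1, 1, 1, 3, 3, 3]:
        state, reward, done, info = await env.step(action)
    return state, reward, done, info


print(asyncio.run(demo()))  # ((3, 3), 1.0, True, {'steps': 6})
```

The `max_steps` cap guarantees every episode terminates, which is what lets an evaluation loop like `while not done:` finish even with a poorly trained agent.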

## Create Agent

Next, create a DQN agent whose state and action spaces match the environment:

In [None]:
agent = DQNAgent(
    name="DQNGridWorld",
    state_space=env.state_space,
    action_space=env.action_space,
    hidden_dim=128,
    learning_rate=0.001,
    memory_capacity=10000,
    batch_size=32,
)
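
DQN agents conventionally store past transitions in an experience replay buffer and train on random minibatches drawn from it; `memory_capacity` and `batch_size` suggest the same design here. The sketch below assumes a plain uniform-sampling buffer. The `ReplayBuffer` class is hypothetical and is not lionagi's internal implementation.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical fixed-capacity experience replay buffer (illustrative sketch)."""

    def __init__(self, capacity):
        # A deque with maxlen silently evicts the oldest transition once full
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks the correlation between
        # consecutive steps of the same episode
        batch = random.sample(self.memory, batch_size)
        # Transpose a list of transitions into lists of states, actions, ...
        return tuple(map(list, zip(*batch)))

    def __len__(self):
        return len(self.memory)


buffer = ReplayBuffer(capacity=5)
for t in range(8):
    buffer.push(t, t % 4, 0.0, t + 1, False)
print(len(buffer))  # 5: the three oldest transitions were evicted
states, actions, rewards, next_states, dones = buffer.sample(batch_size=3)
```

Bounding the buffer keeps memory use fixed, and uniform sampling is the simplest choice; prioritized replay, which samples high-error transitions more often, is a common alternative.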

## Create Branch and Train

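Now create a Branch to manage training. The run lasts at most 1000 episodes of up to 100 steps each, and stops early once the average reward exceeds the target of 0.95: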

In [None]:
branch = Branch(name="RLTraining")

# Train the agent
metrics = await branch.operate_rl(
    agent=agent,
    environment=env,
    max_episodes=1000,
    max_steps_per_episode=100,
    target_reward=0.95,  # Stop when average reward exceeds this
    log_interval=10,
    track_metrics=True,
    verbose=True,
)
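
In standard DQN, each sampled transition is trained toward a bootstrapped target, r + gamma * max over a' of Q_target(s', a'), with the future term dropped on terminal steps. The notebook does not show how `DQNAgent` computes its loss or which discount factor it uses, so this sketch assumes the textbook formulation; the `td_targets` function and its `gamma` argument are hypothetical.

```python
import numpy as np


def td_targets(rewards, next_q_values, dones, gamma=0.99):
    """Standard DQN regression targets: r + gamma * max_a' Q_target(s', a').

    rewards:       shape (batch,)
    next_q_values: shape (batch, n_actions), from the target network
    dones:         shape (batch,), 1.0 where the episode ended
    """
    max_next_q = next_q_values.max(axis=1)
    # Terminal transitions have no future value to bootstrap from
    return rewards + gamma * max_next_q * (1.0 - dones)


rewards = np.array([0.0, 1.0])
next_q = np.array([[0.5, 2.0, 1.0, 0.0],
                   [3.0, 0.1, 0.2, 0.3]])
dones = np.array([0.0, 1.0])
# First target: 0.0 + 0.9 * 2.0 = 1.8; second is terminal, so just its reward, 1.0
print(td_targets(rewards, next_q, dones, gamma=0.9))
```

The Q-network's prediction for the action actually taken is then regressed toward these targets, typically with a mean-squared-error or Huber loss.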

## Plot Training Progress

Let's visualize how the agent learned over time:

In [None]:
plt.figure(figsize=(12, 4))

# Plot episode rewards
plt.subplot(1, 2, 1)
plt.plot(metrics["episode_rewards"])
plt.title("Episode Rewards")
plt.xlabel("Episode")
plt.ylabel("Total Reward")

# Plot moving average
plt.subplot(1, 2, 2)
plt.plot(metrics["avg_rewards"])
plt.title("Average Reward (last 100 episodes)")
plt.xlabel("Episode")
plt.ylabel("Average Reward")

plt.tight_layout()
plt.show()
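
To cross-check the right-hand panel, or to compute it when `avg_rewards` is unavailable, the moving average can be recomputed from `episode_rewards` with NumPy. This sketch assumes a trailing 100-episode window, inferred from the plot title, and uses cumulative sums so each average costs constant time. `moving_average` and `first_episode_above` are hypothetical helpers.

```python
import numpy as np


def moving_average(rewards, window=100):
    """Trailing mean over the last `window` episodes (fewer at the start)."""
    rewards = np.asarray(rewards, dtype=float)
    # csum[k] holds the sum of the first k rewards
    csum = np.concatenate(([0.0], np.cumsum(rewards)))
    end = np.arange(1, len(rewards) + 1)   # exclusive end index per episode
    start = np.maximum(end - window, 0)    # inclusive start index per episode
    return (csum[end] - csum[start]) / (end - start)


def first_episode_above(avg, target):
    """Index of the first episode whose average exceeds target, or None."""
    hits = np.flatnonzero(np.asarray(avg) > target)
    return int(hits[0]) if hits.size else None


avg = moving_average([0, 0, 1, 1, 1], window=2)
print(avg)
print(first_episode_above(avg, target=0.95))  # 3
```

If the assumed window matches the one `operate_rl` uses, `plt.plot(moving_average(metrics["episode_rewards"]))` should reproduce the right-hand panel, and `first_episode_above` with `0.95` estimates when the average first crossed the target.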

## Test Trained Agent

Run one evaluation episode (training=False) and watch how the trained agent performs:

In [None]:
# Run a test episode
state = await env.reset()
done = False
total_reward = 0

while not done:
    env.render()
    # Evaluation mode: act without training updates
    action = await agent.act(state, training=False)
    state, reward, done, info = await env.step(action)
    total_reward += reward

env.render()  # show the final grid after the last step
print(f"\nTest episode completed with total reward: {total_reward:.2f}")
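
Passing `training=False` presumably switches the agent from exploration to pure exploitation. The conventional DQN scheme is epsilon-greedy: during training, pick a uniformly random action with probability epsilon and otherwise the action with the highest Q-value; during evaluation, always pick the argmax. The sketch below assumes `act` follows this convention; the `select_action` function and its parameters are hypothetical.

```python
import numpy as np


def select_action(q_values, epsilon, training, rng):
    """Epsilon-greedy during training, purely greedy at evaluation time."""
    if training and rng.random() < epsilon:
        # Explore: uniformly random action index
        return int(rng.integers(len(q_values)))
    # Exploit: action with the highest estimated Q-value
    return int(np.argmax(q_values))


rng = np.random.default_rng(0)
q = np.array([0.1, 0.7, 0.3, 0.2])
print(select_action(q, epsilon=1.0, training=False, rng=rng))  # 1: epsilon is ignored
```

Epsilon is usually decayed over training, so the agent explores heavily early on and relies increasingly on its learned Q-values.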

## Save Trained Agent

Finally, let's save the trained agent for later use:

In [None]:
agent.save("dqn_gridworld.pt")
print("Agent saved successfully!")