# Lunar Lander with Reinforcement Learning

This notebook demonstrates how to train an agent to land a lunar module on the moon's surface using reinforcement learning. We'll use the Proximal Policy Optimization (PPO) algorithm from the Stable-Baselines3 library.

## What is Reinforcement Learning?

Reinforcement Learning (RL) is a type of machine learning where an agent learns to make decisions by taking actions in an environment to maximize a reward. The agent learns through trial and error, receiving feedback in the form of rewards or penalties.
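
To make the trial-and-error idea concrete, here is a minimal sketch on a toy problem that is not part of this notebook: a two-armed bandit with an epsilon-greedy agent. The function name `run_bandit`, the win probabilities, and the exploration rate are all illustrative assumptions.

```python
import random

def run_bandit(steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a two-armed bandit (hypothetical toy problem)."""
    rng = random.Random(seed)
    win_probability = [0.2, 0.8]   # hidden from the agent: arm 1 pays more often
    value_estimate = [0.0, 0.0]    # agent's running estimate of each arm's reward
    pulls = [0, 0]                 # how often each arm has been tried

    for _ in range(steps):
        # Explore with probability epsilon, otherwise exploit the best estimate
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = max(range(2), key=lambda a: value_estimate[a])

        # The environment answers with a reward of 1 or 0
        reward = 1.0 if rng.random() < win_probability[action] else 0.0

        # Incremental mean: nudge the estimate toward the observed reward
        pulls[action] += 1
        value_estimate[action] += (reward - value_estimate[action]) / pulls[action]

    return value_estimate, pulls

value_estimate, pulls = run_bandit()
print("Estimated values:", value_estimate)
print("Pulls per arm:", pulls)
```

The agent is never told which arm is better. It finds out only from the rewards it collects, which is the same feedback loop the lunar lander agent uses on a much larger problem.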

## The Lunar Lander Environment

In this environment, the agent controls a lunar lander with four actions:
- Do nothing
- Fire left engine
- Fire main engine
- Fire right engine

The goal is to land safely between the flags without crashing. The agent receives:
- Positive rewards for moving toward the landing pad and for a safe landing
- Negative rewards for crashing or using fuel
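
The actions and the fuel-related part of the reward can be sketched as follows. The action indices and the per-frame penalties (0.3 for the main engine, 0.03 for a side engine) are taken from the Gymnasium documentation as I understand it, so treat them as assumptions. `LanderAction` and `fuel_penalty` are illustrative names, not part of any library.

```python
from enum import IntEnum

class LanderAction(IntEnum):
    """Discrete action indices (assumed to match Gymnasium's LunarLander ordering)."""
    DO_NOTHING = 0
    FIRE_LEFT_ENGINE = 1
    FIRE_MAIN_ENGINE = 2
    FIRE_RIGHT_ENGINE = 3

# Per-frame fuel penalties (assumed values, see the note above)
FUEL_PENALTY = {
    LanderAction.DO_NOTHING: 0.0,
    LanderAction.FIRE_LEFT_ENGINE: -0.03,
    LanderAction.FIRE_MAIN_ENGINE: -0.3,
    LanderAction.FIRE_RIGHT_ENGINE: -0.03,
}

def fuel_penalty(actions):
    """Sum the fuel-related part of the reward over a sequence of actions."""
    return sum(FUEL_PENALTY[LanderAction(a)] for a in actions)

# One idle frame, two main-engine frames, one frame for each side engine
actions = [0, 2, 2, 1, 3]
print(f"Fuel penalty: {fuel_penalty(actions):.2f}")
```

Because the main engine costs ten times as much as a side engine under these assumed values, an agent that hovers for a long time steadily loses reward.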

Let's get started!

## Step 1: Install Required Dependencies

First, we need to install the necessary libraries. Uncomment and run the cell below if this is your first time running the notebook.

In [None]:
### Uncomment and run the first time to install the dependencies...
#!pip3 install gymnasium stable_baselines3[extra] box2d ipywidgets ffmpeg imageio

## Step 2: Basic Training Setup

In this cell, we:
1. Import the necessary libraries
2. Create the Lunar Lander environment
3. Set up a PPO (Proximal Policy Optimization) agent
4. Train the agent for 100,000 timesteps
5. Save the trained model

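PPO is a popular on-policy policy-gradient algorithm. It limits how far each update can move the policy away from the one that collected the data, which tends to make training stable and comparatively forgiving of hyperparameter choices.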
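
PPO's central idea is a clipped surrogate objective, which fits in a few lines. The sketch below is a per-sample illustration only, not Stable-Baselines3's implementation. `clipped_surrogate` is a hypothetical helper, `ratio` stands for the new policy's probability of an action divided by the old policy's, and the default `clip_range=0.2` is assumed to match the library default.

```python
def clipped_surrogate(ratio, advantage, clip_range=0.2):
    """Per-sample PPO clipped objective (the quantity PPO tries to maximize)."""
    # Keep the probability ratio inside [1 - clip_range, 1 + clip_range]
    clipped_ratio = max(1.0 - clip_range, min(ratio, 1.0 + clip_range))
    # Take the more pessimistic of the clipped and unclipped estimates
    return min(ratio * advantage, clipped_ratio * advantage)

# A good action (positive advantage) whose probability already rose by 50%:
# the objective is capped, so there is no incentive to push the ratio further.
print(clipped_surrogate(ratio=1.5, advantage=1.0))

# A bad action (negative advantage) whose probability rose by 50%:
# the unclipped, more negative value is kept, so the mistake is fully penalized.
print(clipped_surrogate(ratio=1.5, advantage=-1.0))
```

Taking the minimum of the two terms is what keeps a single update from moving the policy too far on the strength of one batch of data.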

In [None]:
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

# Create environment
env = gym.make("LunarLander-v3", render_mode="rgb_array")

# Instantiate the agent
model = PPO(
    'MlpPolicy',
    env,
    verbose=1,
    ##### YOUR HYPERPARAMETERS HERE!!!!
    learning_rate=0.001,
    batch_size=32,
    )

# Train the agent and display a progress bar
model.learn(total_timesteps=100_000, progress_bar=True)

# Save the agent
model.save("ppo_lunar")
#del model  # delete trained model to demonstrate loading
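
To see what the settings in the cell above mean in practice, the following sketch works out how many rollouts and gradient steps a training run involves. It assumes the Stable-Baselines3 PPO defaults `n_steps=2048` and `n_epochs=10` with a single environment. `ppo_schedule` is a hypothetical helper written for this illustration.

```python
import math

def ppo_schedule(total_timesteps, n_steps=2048, n_envs=1, batch_size=32, n_epochs=10):
    """Rough bookkeeping for a PPO run (assumes the defaults named above)."""
    rollout_size = n_steps * n_envs                      # timesteps collected per rollout
    n_rollouts = math.ceil(total_timesteps / rollout_size)
    minibatches_per_epoch = rollout_size // batch_size   # minibatches in one pass over a rollout
    return {
        "rollouts": n_rollouts,
        "timesteps_collected": n_rollouts * rollout_size,
        "gradient_steps_per_rollout": minibatches_per_epoch * n_epochs,
    }

schedule = ppo_schedule(total_timesteps=100_000, batch_size=32)
for name, value in schedule.items():
    print(f"{name}: {value}")
```

Under these assumptions, training always collects whole rollouts, so the number of timesteps actually run can slightly exceed the requested total. A smaller `batch_size` means more, noisier gradient steps per rollout.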

## Step 3: Advanced Training with Custom Progress Tracking

This cell provides a more sophisticated training approach with:

1. A custom progress bar callback to track training progress
2. TensorBoard logging for visualizing training metrics
3. Evaluation of the trained agent's performance

The custom callback allows us to see both the progress bar and the training metrics simultaneously, giving us better insight into how the training is progressing.

In [None]:
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.callbacks import BaseCallback
from tqdm.notebook import tqdm

class CustomProgressBarCallback(BaseCallback):
    """
    Custom callback that combines progress bar with training metrics.
    """
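    def __init__(self, total_timesteps):
        super().__init__()  # BaseCallback already initializes n_calls
        self.pbar = None
        self.total_timesteps = total_timesteps

    def _on_training_start(self):
        self.pbar = tqdm(total=self.total_timesteps)

    def _on_step(self):
        # _on_step runs once per env.step() call, which advances every
        # environment by one timestep, so advance the bar by num_envs.
        # (locals['n_steps'] is the running step count within the current
        # rollout, so adding it on every step would overshoot the total.)
        self.pbar.update(self.training_env.num_envs)
        return True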

    def _on_training_end(self):
        self.pbar.close()
        self.pbar = None

# Create environment
env = gym.make("LunarLander-v3", render_mode="rgb_array")

# Instantiate the agent
model = PPO(
    'MlpPolicy',
    env,
    verbose=1,  # Keep verbose=1 to see the training metrics
    learning_rate=0.001,
    batch_size=32,
    tensorboard_log="./lunar_lander_tensorboard/"
)

# Total timesteps for training (a short run; Step 2 used 100,000)
total_timesteps = 10000

# Create and use the custom callback
callback = CustomProgressBarCallback(total_timesteps)

# Train the agent
model.learn(
    total_timesteps=total_timesteps,
    callback=callback,
    progress_bar=False  # Disable default progress bar to use our custom one
)

# Save the agent (this overwrites the "ppo_lunar" model saved in Step 2)
model.save("ppo_lunar")

# Evaluate the trained agent
mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=10)
print('\nFinal Evaluation:')
print(f'Mean reward: {mean_reward:.2f} +/- {std_reward:.2f}')
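
The two numbers printed above summarize a list of per-episode returns. The sketch below shows the same summary on made-up returns. The values in `episode_returns` are hypothetical, `summarize_returns` is an illustrative helper, and the 200-point "solved" threshold is the figure given in the Gymnasium documentation as I understand it.

```python
import statistics

def summarize_returns(episode_returns, solved_threshold=200.0):
    """Mean, population standard deviation, and a simple solved check."""
    mean_return = statistics.fmean(episode_returns)
    std_return = statistics.pstdev(episode_returns)   # population std
    return mean_return, std_return, mean_return >= solved_threshold

# Hypothetical returns from three evaluation episodes (not real results)
episode_returns = [100.0, 200.0, 300.0]
mean_return, std_return, solved = summarize_returns(episode_returns)
print(f"Mean reward: {mean_return:.2f} +/- {std_return:.2f}")
print("Meets the assumed 200-point threshold:", solved)
```

A large standard deviation relative to the mean usually means the agent lands well in some episodes and crashes in others, so it is worth looking at both numbers.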

## Step 4: Visualizing the Trained Agent

Now that we have a trained agent, let's see how it performs! This cell:

1. Loads the trained model
2. Runs the agent in the environment for 5 episodes
3. Captures frames from each episode
4. Saves the episodes as GIF animations

This visualization helps us understand how well our agent has learned to land the lunar module.

In [None]:
import os

import gymnasium as gym
import imageio
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

# Create the images directory if it doesn't exist
os.makedirs("images", exist_ok=True)

# Load the trained agent
env = gym.make("LunarLander-v3", render_mode="rgb_array")
model = PPO.load("ppo_lunar", env=env)

# Evaluate the agent
mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=10)
print('Mean reward:', mean_reward, 'Std. reward:', std_reward)

# Test the trained agent and save visualization
images = []
episodes = 0
episode_reward = 0.0  # Running total of rewards for the current episode
obs, _ = env.reset()  # Gymnasium's reset() returns (observation, info)

while episodes < 5:  # Limit to 5 episodes for reasonable file sizes
    img = env.render()
    images.append(img)

    action, _states = model.predict(obs, deterministic=True)
    # Gymnasium's step() returns separate terminated and truncated flags
    obs, reward, terminated, truncated, info = env.step(action)
    episode_reward += reward

    if terminated or truncated:
        episodes += 1
        print(f'Episode {episodes} finished with total reward {episode_reward:.2f}')

        # Save episode as GIF
        if len(images) > 0:
            print(f'Saving episode {episodes} animation...')
            imageio.mimsave(
                f'images/lunar_lander_episode_{episodes}.gif',
                images,
                fps=30
            )

        # Reset for next episode
        images = []
        episode_reward = 0.0
        obs, _ = env.reset()

env.close()
print("Done! Check the 'images' directory for the animation files.")
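
The loop above relies on the Gymnasium calling convention: `reset()` returns `(observation, info)` and `step()` returns a 5-tuple with separate `terminated` and `truncated` flags. The sketch below shows that shape on a tiny stand-in environment, so it runs without Box2D. `CountdownEnv` and `run_episode` are hypothetical names and the reward values are made up.

```python
class CountdownEnv:
    """Hypothetical stand-in that mimics the Gymnasium reset/step return shapes."""

    def __init__(self, goal=3, max_steps=5):
        self.goal = goal
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self):
        self.position = 0
        self.steps = 0
        return self.position, {}   # (observation, info)

    def step(self, action):
        self.position += action
        self.steps += 1
        # terminated: the task itself ended (here, the goal was reached)
        terminated = self.position >= self.goal
        # truncated: the episode was cut off by a time limit
        truncated = not terminated and self.steps >= self.max_steps
        reward = 1.0 if terminated else -0.1
        return self.position, reward, terminated, truncated, {}

def run_episode(env, policy):
    """Run one episode and return the sum of all rewards."""
    obs, _ = env.reset()
    total_reward, done = 0.0, False
    while not done:
        obs, reward, terminated, truncated, _ = env.step(policy(obs))
        total_reward += reward
        done = terminated or truncated
    return total_reward

reached_goal = run_episode(CountdownEnv(), lambda obs: 1)   # ends via terminated
timed_out = run_episode(CountdownEnv(), lambda obs: 0)      # ends via truncated
print(f"Reached goal: {reached_goal:.1f}, timed out: {timed_out:.1f}")
```

Either flag ends the episode, which is why the loop checks `terminated or truncated` before resetting.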

## Step 5: Displaying the Results

Finally, let's display the GIF animations we created to see our agent in action. This cell:

1. Checks if the images directory exists
2. Finds all the GIF files we created
3. Displays each animation in the notebook

This gives us a visual representation of how our agent performs after training.

In [None]:
# Load and show GIF animations from the images directory

from IPython.display import Image, display
import os
import glob

# Check if images directory exists and contains GIF files
if not os.path.exists('images'):
    print('Error: images directory not found. Please run the previous cell to generate the GIF files.')
else:
    gif_files = glob.glob('images/lunar_lander_episode_*.gif')
    if not gif_files:
        print('Error: No GIF files found in the images directory. Please run the previous cell to generate the GIF files.')
    else:
        print(f'Found {len(gif_files)} GIF files in the images directory.')
        
        # Display every GIF in sorted filename order
        for i, gif_file in enumerate(sorted(gif_files), 1):
            print(f'\nEpisode {i}:')
            display(Image(filename=gif_file))
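
One caveat if you record ten or more episodes: plain `sorted()` compares file names as text, so episode 10 would sort before episode 2. A minimal sketch of numeric ordering follows. `episode_number` is a hypothetical helper, and the file names assume the naming pattern used in the previous cell.

```python
import re

def episode_number(path):
    """Extract the integer episode index from a name like lunar_lander_episode_12.gif."""
    match = re.search(r'episode_(\d+)\.gif$', path)
    return int(match.group(1)) if match else -1

# Example file names following the pattern from the previous cell
gif_files = [
    'images/lunar_lander_episode_10.gif',
    'images/lunar_lander_episode_2.gif',
    'images/lunar_lander_episode_1.gif',
]

text_order = sorted(gif_files)                         # compares names as text
numeric_order = sorted(gif_files, key=episode_number)  # compares episode numbers

print("Text order:   ", text_order)
print("Numeric order:", numeric_order)
```

With the five episodes recorded in this notebook the two orderings agree, so the change only matters for longer recordings.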

## Conclusion

Congratulations! You've successfully:

1. Set up a reinforcement learning environment (Lunar Lander)
2. Trained an agent using the PPO algorithm
3. Evaluated the agent's performance
4. Visualized the agent's behavior

### Key Concepts Learned:

- **States**: The lander's position, velocity, angle, angular velocity, and whether each leg is touching the ground
- **Actions**: The four possible controls (do nothing, fire left/right/main engines)
- **Rewards**: Feedback based on landing position, fuel usage, and crash/success
- **Policy**: The strategy the agent learns to maximize rewards
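
As an illustration of the state concept, the observation can be unpacked into named fields. The sketch assumes the 8-value layout described in the Gymnasium documentation (x, y, x velocity, y velocity, angle, angular velocity, left leg contact, right leg contact). `LanderState` and `parse_observation` are hypothetical names, and the sample numbers are made up.

```python
from typing import NamedTuple, Sequence

class LanderState(NamedTuple):
    """Named view of one observation (field order is an assumption, see above)."""
    x: float
    y: float
    velocity_x: float
    velocity_y: float
    angle: float
    angular_velocity: float
    left_leg_contact: bool
    right_leg_contact: bool

def parse_observation(obs: Sequence[float]) -> LanderState:
    """Convert a raw 8-value observation into a LanderState."""
    if len(obs) != 8:
        raise ValueError(f"expected 8 values, got {len(obs)}")
    *continuous, left, right = obs
    return LanderState(*(float(v) for v in continuous), bool(left), bool(right))

# Made-up observation: slightly left of center, descending, both legs in the air
state = parse_observation([-0.1, 1.2, 0.0, -0.5, 0.02, 0.0, 0.0, 0.0])
print(state)
print("Both legs down:", state.left_leg_contact and state.right_leg_contact)
```

The policy receives the raw 8-value vector. Named fields like these are only a convenience for inspecting what the agent sees.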

### Next Steps:

- Try modifying the hyperparameters to see if you can improve performance
- Experiment with different RL algorithms (like A2C or DQN; SAC needs continuous actions, so use it with `gym.make("LunarLander-v3", continuous=True)`)
- Increase the training time to see if the agent can achieve better results
- Try more complex environments from the Gymnasium library
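
For the hyperparameter suggestion, one simple way to organize experiments is a small grid of settings. The sketch below only builds the configurations and does not train anything. The candidate values are arbitrary examples, not recommendations.

```python
from itertools import product

# Candidate values to try (arbitrary examples)
learning_rates = [1e-4, 3e-4, 1e-3]
batch_sizes = [32, 64]

# One dictionary per combination of settings
configs = [
    {"learning_rate": lr, "batch_size": bs}
    for lr, bs in product(learning_rates, batch_sizes)
]

for i, config in enumerate(configs, 1):
    print(f"Run {i}: {config}")
```

Each dictionary could then be passed to the agent as `PPO('MlpPolicy', env, **config)`, training and evaluating one model per configuration and comparing the mean rewards.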