# Reinforcement Learning Demo: Lunar Lander

## 🚀 Welcome to Reinforcement Learning!

In this interactive demo, we'll teach an AI agent to land a lunar module safely on the moon using **Reinforcement Learning (RL)**.

### What is Reinforcement Learning?
- An agent learns through **trial and error**
- Gets **rewards** for good actions, **penalties** for bad ones
- Gradually improves its **policy** (decision-making strategy)
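
To see these three ideas in a few lines, here is a minimal sketch on a much smaller problem than the lander. It is a toy two-armed bandit, not part of this demo's environment: `run_bandit`, the payout probabilities, and the epsilon-greedy rule are illustrative assumptions, chosen only to show trial and error, reward feedback, and a policy that improves with experience.

```python
import random

def run_bandit(steps=2000, epsilon=0.1, seed=0):
    """Toy epsilon-greedy agent on a two-armed bandit (illustrative only)."""
    rng = random.Random(seed)
    payout_prob = [0.2, 0.8]  # hypothetical win rates, hidden from the agent
    value = [0.0, 0.0]        # agent's running estimate of each action's reward
    pulls = [0, 0]            # how often each action has been tried

    for _ in range(steps):
        # Trial and error: explore at random with probability epsilon,
        # otherwise follow the current policy (pick the best-looking action).
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = max(range(2), key=lambda a: value[a])

        # Reward feedback: 1 for a win, 0 otherwise.
        reward = 1.0 if rng.random() < payout_prob[action] else 0.0

        # Policy improvement: move the estimate toward the observed reward
        # (incremental average of all rewards seen for this action).
        pulls[action] += 1
        value[action] += (reward - value[action]) / pulls[action]

    return value, pulls

value, pulls = run_bandit()
best_action = max(range(2), key=lambda a: value[a])
print(f"Estimated values: {value[0]:.2f}, {value[1]:.2f}")
print(f"Pulls per action: {pulls}, preferred action: {best_action}")
```

The lander has a far larger state space than this toy, but the same loop of acting, observing a reward, and updating applies.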

### The Lunar Lander Problem
- **Goal**: Land the spacecraft safely between the flags
- **Actions**: Do nothing, fire left engine, fire main engine, fire right engine
- **Rewards**: +100 for a safe landing, -100 for crashing, and a small penalty for every frame an engine fires (fuel cost)
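
To make the fuel penalty concrete, here is a rough sketch of how engine use eats into the landing bonus. The per-frame costs (0.3 for the main engine, 0.03 for a side engine) follow the Gymnasium documentation for Lunar Lander as we understand it, so check them against your installed version. `fuel_and_terminal_reward` is a hypothetical helper, and it deliberately ignores the other shaping terms the real environment applies (position, speed, tilt, and leg contact).

```python
# Per-frame engine costs as described in the Gymnasium Lunar Lander docs
# (assumption: verify against your installed version).
MAIN_ENGINE_COST = 0.3
SIDE_ENGINE_COST = 0.03

def fuel_and_terminal_reward(actions, landed):
    """Terminal bonus minus fuel cost; ignores all other shaping terms."""
    main_frames = sum(1 for a in actions if a == 2)       # action 2: main engine
    side_frames = sum(1 for a in actions if a in (1, 3))  # actions 1, 3: side engines
    fuel_cost = main_frames * MAIN_ENGINE_COST + side_frames * SIDE_ENGINE_COST
    terminal = 100.0 if landed else -100.0
    return terminal - fuel_cost

# Made-up episode: 100 frames of main engine, 10 of the left engine, 5 idle
episode_actions = [2] * 100 + [1] * 10 + [0] * 5
landing_score = fuel_and_terminal_reward(episode_actions, landed=True)
crash_score = fuel_and_terminal_reward(episode_actions, landed=False)
print(f"Safe landing: {landing_score:.2f}, crash: {crash_score:.2f}")
```

Even in this simplified view, an agent that burns the main engine constantly gives up a large share of the landing bonus, which is why fuel-efficient descents score higher.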

Let's dive in! 🌙

## Step 1: Setup and Environment

In [None]:
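# Install required packages (quotes keep the shell from expanding the brackets;
# building Box2D may also require `swig` on some systems)
!pip install "gymnasium[box2d]" numpy matplotlib seaborn pandas tqdm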

import gymnasium as gym
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from collections import defaultdict, deque
import random
from tqdm import tqdm
import pandas as pd

# Set style for better plots
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

print("✅ All packages imported successfully!")

## Step 2: Understanding the Environment

In [None]:
# Create the Lunar Lander environment.
# Gymnasium 1.0+ registers it as 'LunarLander-v3'; on older releases use 'LunarLander-v2'.
env = gym.make('LunarLander-v3', render_mode='rgb_array')

print("🌙 Lunar Lander Environment Created!")
print(f"📊 Observation Space: {env.observation_space}")
print(f"🎮 Action Space: {env.action_space}")
print(f"🎯 Number of Actions: {env.action_space.n}")

# Action meanings
actions = {
    0: "Do nothing",
    1: "Fire left orientation engine", 
    2: "Fire main engine",
    3: "Fire right orientation engine"
}

print("\n🚀 Available Actions:")
for action, description in actions.items():
    print(f"  {action}: {description}")
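
The observation space printed above is a vector of 8 numbers. The sketch below labels them, assuming the ordering given in the Gymnasium documentation (position, velocity, angle, angular velocity, then two leg-contact flags). `OBSERVATION_LABELS` and `describe_observation` are hypothetical names, and the sample vector is made up for illustration; in the notebook you would pass the `state` returned by `env.reset()`.

```python
# Assumed ordering of the 8 observation components (per the Gymnasium docs).
OBSERVATION_LABELS = [
    "x position", "y position",
    "x velocity", "y velocity",
    "angle", "angular velocity",
    "left leg contact", "right leg contact",
]

def describe_observation(observation):
    """Pair each observation component with a human-readable label."""
    values = [float(v) for v in observation]
    if len(values) != len(OBSERVATION_LABELS):
        raise ValueError(
            f"expected {len(OBSERVATION_LABELS)} values, got {len(values)}"
        )
    return dict(zip(OBSERVATION_LABELS, values))

# Made-up state: slightly right of center, high up, drifting right and falling
sample_state = [0.01, 1.40, 0.25, -0.30, -0.02, 0.05, 0.0, 0.0]
labeled = describe_observation(sample_state)
for name, value in labeled.items():
    print(f"  {name:>18}: {value:+.3f}")
```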

## Step 3: Random Agent (Before Learning)

In [None]:
def test_random_agent(episodes=5):
    """Test how a random agent performs"""
    scores = []
    
    for episode in range(episodes):
        state, _ = env.reset()
        total_reward = 0
        steps = 0
        
        while True:
            # Random action
            action = env.action_space.sample()
            state, reward, terminated, truncated, _ = env.step(action)
            total_reward += reward
            steps += 1
            
            if terminated or truncated:
                break
        
        scores.append(total_reward)
        print(f"Episode {episode + 1}: Score = {total_reward:.2f}, Steps = {steps}")
    
    avg_score = np.mean(scores)
    print(f"\n🎲 Random Agent Average Score: {avg_score:.2f}")
    return scores

print("Testing Random Agent (No Learning):")
random_scores = test_random_agent()
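
Five episodes give a noisy picture, so it can help to summarize the scores and compare them with the 200-point mark that the Gymnasium documentation treats as solving the task. This is a minimal sketch: `summarize_scores` and the example scores are hypothetical, and in the notebook you would pass `random_scores` instead.

```python
import statistics

SOLVED_THRESHOLD = 200.0  # assumption taken from the Gymnasium docs; verify for your version

def summarize_scores(scores, solved_threshold=SOLVED_THRESHOLD):
    """Return basic statistics for a non-empty list of episode scores."""
    scores = [float(s) for s in scores]
    solved = sum(1 for s in scores if s >= solved_threshold)
    return {
        "episodes": len(scores),
        "mean": statistics.mean(scores),
        "std": statistics.pstdev(scores),  # population standard deviation
        "min": min(scores),
        "max": max(scores),
        "solved_fraction": solved / len(scores),
    }

# Made-up scores for illustration only
example_scores = [-150.0, -200.0, -100.0, 250.0]
summary = summarize_scores(example_scores)
for key, value in summary.items():
    print(f"{key:>16}: {value}")
```

Keeping the same summary for the trained agent later makes the before-and-after comparison direct.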