
# LunarLander — REINFORCE (Demo Notebook)

This notebook **imports the training and evaluation functions** from
`src/reinforcement_learning/lunar_lander/`, trains a simple REINFORCE agent,
and **shows the results** (reward curve, quick stats).

> Expected repo layout:
> `src/reinforcement_learning/lunar_lander/{train.py, eval.py, models.py, reinforce.py, utils.py}`  
> Figures are saved under `docs/images/` by `train.py`.


In [None]:

from pathlib import Path
import sys

# Try to add the repo root (parent of 'src') to sys.path
cwd = Path.cwd()
root = cwd
# If notebook lives in repo/notebooks/, go one level up
if not (cwd / "src").exists() and cwd.name == "notebooks":
    root = cwd.parent

if not (root / "src").exists():
    # walk up until we find 'src' or give up
    for p in cwd.parents:
        if (p / "src").exists():
            root = p
            break

sys.path.insert(0, str(root))
print(f"Repo root resolved to: {root}")
print("sys.path[0] ->", sys.path[0])

# Optional: verify the expected package exists
expected = root / "src" / "reinforcement_learning" / "lunar_lander"
print("lunar_lander path exists:", expected.exists())


In [None]:

# Import the training and evaluation entry points
try:
    from src.reinforcement_learning.lunar_lander.train import train
    from src.reinforcement_learning.lunar_lander.eval import play
    print("Imported train/play from src.reinforcement_learning.lunar_lander")
except ImportError:
    print("Failed to import train/play. Please check that your files exist:")
    print("src/reinforcement_learning/lunar_lander/train.py and eval.py")
    raise


## Train the agent

In [None]:

# WARNING: training for many episodes can take time.
# Start small (e.g., 200–300 episodes) just to produce a visible reward curve.
returns = train(episodes=300, gamma=0.99, lr=3e-4, seed=42)
len(returns), returns[-5:]
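
The last five values alone say little about how training went. As a minimal sketch, assuming `returns` is a flat sequence of per-episode returns (the cell above only shows that it supports `len` and slicing), a few summary numbers can be computed as follows. The helper name `summarize_returns` and its `window` parameter are hypothetical and not part of the repo.

```python
from statistics import fmean


def summarize_returns(returns, window=50):
    """Summarize per-episode returns (hypothetical helper, not from the repo)."""
    values = [float(r) for r in returns]
    if not values:
        raise ValueError("returns is empty")
    tail = values[-window:]  # the most recent `window` episodes (or fewer)
    return {
        "episodes": len(values),
        "mean": fmean(values),
        "best": max(values),
        "last_window_mean": fmean(tail),
    }


# Made-up numbers for illustration only, not real training output.
stats = summarize_returns([-200.0, -150.0, -100.0, -50.0], window=2)
print(stats)
```

In the notebook you would call `summarize_returns(returns)` directly. Comparing `last_window_mean` with `mean` gives a rough sense of whether the policy improved over the run.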


## Plot training rewards

In [None]:

import matplotlib.pyplot as plt
import numpy as np

plt.figure()
plt.plot(np.arange(len(returns)), returns, linewidth=1.0)
plt.xlabel("Episode")
plt.ylabel("Return")
plt.title("LunarLander — REINFORCE (training run)")
plt.tight_layout()
plt.show()
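
Per-episode returns from a policy-gradient run tend to be noisy, so the raw curve can hide the trend. One common way to make it readable is a trailing moving average. The sketch below is an illustration only: `moving_average` and `window` are hypothetical names, and the input is assumed to be a flat sequence of numbers.

```python
def moving_average(values, window=20):
    """Mean over each run of `window` consecutive values.

    Returns len(values) - window + 1 points, or an empty list if there
    are fewer than `window` values. Hypothetical helper, not from the repo.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    values = [float(v) for v in values]
    if len(values) < window:
        return []
    running = sum(values[:window])
    out = [running / window]
    for i in range(window, len(values)):
        # Slide the window: add the newest value, drop the oldest.
        running += values[i] - values[i - window]
        out.append(running / window)
    return out


# Made-up numbers for illustration only.
smoothed = moving_average([1, 2, 3, 4, 5], window=3)
print(smoothed)
```

To overlay the result on the plot above, draw `moving_average(returns, window)` against `range(window - 1, len(returns))` so that each smoothed point sits at the last episode of its window.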


## Display saved figure (if generated by `train.py`)

In [None]:

img = root / "docs" / "images" / "lunarlander_rewards.png"
if img.exists():
    from IPython.display import Image, display
    display(Image(filename=str(img)))
else:
    print("No saved plot found at", img)



## Evaluate the policy (headless)

This cell calls `play` from your `eval.py`. If `eval.py` creates the environment
with `render_mode="human"`, it will try to open a desktop window, which a notebook cannot display inline.  
For a headless preview inside the notebook, you would need a version of `eval.py`
that creates the environment with `render_mode="rgb_array"` and returns the frames.
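
A frame-returning rollout could look roughly like the sketch below. It is not the code in your `eval.py`. It assumes a Gymnasium-style API, where `reset()` returns `(obs, info)`, `step()` returns five values, and `render()` returns an image array in `rgb_array` mode. The names `collect_frames` and `policy` are hypothetical, and `_StubEnv` is only a stand-in so the sketch runs without Box2D installed.

```python
def collect_frames(env, policy, max_steps=1000):
    """Run one episode and return (frames, total_reward).

    `policy` is any callable mapping an observation to an action.
    """
    frames = []
    total_reward = 0.0
    obs, _info = env.reset()
    for _ in range(max_steps):
        frames.append(env.render())  # an image array in rgb_array mode
        action = policy(obs)
        obs, reward, terminated, truncated, _info = env.step(action)
        total_reward += float(reward)
        if terminated or truncated:
            break
    return frames, total_reward


class _StubEnv:
    """Stand-in with a Gymnasium-like API, used only to exercise the helper."""

    def __init__(self, length=3):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return 0, {}

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.length
        return self.t, 1.0, terminated, False, {}

    def render(self):
        return f"frame-{self.t}"


frames, total = collect_frames(_StubEnv(), policy=lambda obs: 0)
print(len(frames), total)
```

With a real environment you would pass something like `gym.make("LunarLander-v3", render_mode="rgb_array")` (the ID is `LunarLander-v2` on older Gymnasium releases) and a policy wrapping your trained model. The returned frames can then be shown with `plt.imshow` or assembled into an animation.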


In [None]:

# Run evaluation for a few episodes. If play() opens a GUI window,
# consider switching the environment to render_mode="rgb_array" for notebook previews.
try:
    play(episodes=3)
    print("Evaluation run completed.")
except Exception as e:
    print("Evaluation failed (likely due to render mode). Error:")
    print(e)
