<a href="https://colab.research.google.com/github/luizguilhermedev/myportfolio/blob/main/lunar_lander.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Deep Reinforcement Learning agent 🤖 🧠

A lunar lander that will learn to land correctly on the Moon 🌝

### Environment

* We're going to use the `LunarLander-v2` environment
* `Stable-Baselines3` as the deep RL library


### Dependencies

* `gymnasium[box2d]`: Contains the LunarLander-v2 environment 🌛

* `stable-baselines3[extra]`: The deep reinforcement learning library.

* `huggingface_sb3`: Additional code for Stable-baselines3 to load and upload models from the Hugging Face 🤗 Hub.


## Problem

**Land the Lunar Lander correctly on the landing pad by controlling the left orientation engine, the right orientation engine and the main engine.**


## Evaluation

Obtain a mean reward of 200+ with our agent


In [None]:
!apt install swig cmake

In [None]:
!pip install -r https://raw.githubusercontent.com/huggingface/deep-rl-class/main/notebooks/unit1/requirements-unit1.txt

In [None]:
!sudo apt-get update
!sudo apt-get install -y python3-opengl
!apt install ffmpeg
!apt install xvfb
!pip3 install pyvirtualdisplay

In [None]:
# This cell forces the runtime to crash and restart.
# After it runs, reconnect and continue from the next cell.

import os
os.kill(os.getpid(),9)

In [None]:
# Virtual display

from pyvirtualdisplay import Display

virtual_display = Display(visible=0, size=(1400, 900))
virtual_display.start()

In [None]:
import gymnasium

from huggingface_sb3 import load_from_hub, package_to_hub
from huggingface_hub import notebook_login # To log to our Hugging Face account to be able to upload models to the Hub.

from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.monitor import Monitor

## Gymnasium 🤖 🏋

This library contains our environment.

The Gymnasium library provides two things:

* An interface that allows us to create **RL environments**
* A collection of environments (such as gym-control, atari, box2D...)

Steps:

* The Agent receives state (S0) from the environment
* Based on that state (S0), the Agent takes an action (A0)
* The Environment transitions to a new state (S1)
* The Environment gives some reward (R1) to the Agent
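
The loop above can be sketched without any RL library. The snippet below is a minimal illustration, not the real LunarLander: `CoinFlipEnv` and `run_episode` are hypothetical names, and the toy environment only mimics the `reset()` / `step()` contract that Gymnasium environments follow.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment mimicking the Gymnasium reset/step contract."""

    def __init__(self, max_steps=5, seed=0):
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.steps = 0
        self.state = 0

    def reset(self):
        # Return the initial state S0 and an (empty) info dict
        self.steps = 0
        self.state = self.rng.randint(0, 1)
        return self.state, {}

    def step(self, action):
        # Reward R(t+1): 1.0 if the action matches the current state, else 0.0
        reward = 1.0 if action == self.state else 0.0
        self.steps += 1
        # Transition to the next state S(t+1)
        self.state = self.rng.randint(0, 1)
        terminated = False                         # this toy task never "ends" on its own
        truncated = self.steps >= self.max_steps   # time limit reached
        return self.state, reward, terminated, truncated, {}


def run_episode(env, policy):
    """Run one episode: state -> action -> (next state, reward), until done."""
    state, info = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(state)
        state, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        done = terminated or truncated
    return total_reward


# A policy that simply copies the state it observes
episode_return = run_episode(CoinFlipEnv(max_steps=5), policy=lambda s: s)
print("Episode return:", episode_return)
```

A trained agent plays the role of `policy` here: it maps each state to an action, and the environment answers with the next state and a reward.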

Gymnasium:

* Create our environment using `gymnasium.make()`
* Reset the environment to its initial state with `observation, info = env.reset()`
* Get an action using our model
* `env.step(action)`

In [None]:
import gymnasium as gym

# Creating environment called LunarLander-v2

env = gym.make("LunarLander-v2")

# Reset this environment

observation, info = env.reset()

for _ in range(20):
  # Take a random action
  action = env.action_space.sample()
  print("Action taken:", action)

  # Do this action in the environment and get next_state, reward, terminated, truncated and info
  observation, reward, terminated, truncated, info = env.step(action)

  # if the game is terminated (landed, crashed) or truncated (timeout)

  if terminated or truncated:
    # reset the environment
    print("Environment is reset")
    observation, info = env.reset()

env.close()


Let's take a quick look at what our environment looks like:

In [None]:
env = gym.make("LunarLander-v2")
env.reset()
print("_____OBSERVATION SPACE_____ \n")
print("Observation Space Shape", env.observation_space.shape)
print("Sample observation", env.observation_space.sample()) # Get a random observation

In [None]:
print("\n _____ACTION SPACE_____ \n")
print("Action Space Size", env.action_space.n)
print("Action Space Sample", env.action_space.sample()) # Take a random action
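
To make the printed numbers easier to read, here is a small sketch that gives names to the 4 discrete actions and the 8 observation components, following the layout described in the Gymnasium documentation for LunarLander. `LanderAction` and `LanderObservation` are hypothetical helper names, not part of Gymnasium, and the sample values are made up.

```python
from enum import IntEnum
from typing import NamedTuple


class LanderAction(IntEnum):
    """Hypothetical names for the 4 discrete LunarLander actions."""
    NOOP = 0        # do nothing
    FIRE_LEFT = 1   # fire left orientation engine
    FIRE_MAIN = 2   # fire main engine
    FIRE_RIGHT = 3  # fire right orientation engine


class LanderObservation(NamedTuple):
    """Hypothetical names for the 8 components of an observation."""
    x: float
    y: float
    vx: float
    vy: float
    angle: float
    angular_velocity: float
    left_leg_contact: float   # 1.0 if the left leg touches the ground
    right_leg_contact: float  # 1.0 if the right leg touches the ground


# Made-up observation values, for illustration only
raw = [0.0, 1.4, 0.1, -0.3, 0.02, 0.0, 0.0, 0.0]
obs = LanderObservation(*raw)

print(LanderAction(2).name, "| vertical speed:", obs.vy)
```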

## Vectorized Environment

A vectorized environment stacks multiple independent environments into a single environment, so we collect more diverse experience at every training step.
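
As a rough illustration of the idea, the sketch below stacks three toy environments behind one `reset()` / `step()` pair. `CounterEnv` and `ToyVecEnv` are hypothetical classes with a simplified API (a single `done` flag, no info dict); they only mirror the behavior that a finished sub-environment is reset automatically while the others keep running.

```python
class CounterEnv:
    """Hypothetical toy environment: counts steps and ends after `horizon` steps."""

    def __init__(self, horizon):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return self.t, 1.0, done


class ToyVecEnv:
    """Hypothetical wrapper that steps several independent environments together."""

    def __init__(self, envs):
        self.envs = envs

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        observations, rewards, dones = [], [], []
        for env, action in zip(self.envs, actions):
            obs, reward, done = env.step(action)
            if done:
                # A finished sub-environment restarts on its own
                obs = env.reset()
            observations.append(obs)
            rewards.append(reward)
            dones.append(done)
        return observations, rewards, dones


vec_env = ToyVecEnv([CounterEnv(horizon=h) for h in (2, 3, 4)])
first_obs = vec_env.reset()
obs1, rew1, done1 = vec_env.step([0, 0, 0])
obs2, rew2, done2 = vec_env.step([0, 0, 0])
print(obs2, done2)
```

One call to `step` returns one observation, reward and done flag per sub-environment, which is why a single model can learn from all of them at once.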

In [None]:
env = make_vec_env('LunarLander-v2', n_envs=16)

## Creating our Model 🤖


* Deep RL library: Stable Baselines3 (SB3)
* SB3 is a set of reliable implementations of reinforcement learning algorithms in PyTorch.

SB3 basic steps:

1. Create an environment
2. Define the model we want to use: `model = PPO("MlpPolicy", env)`
3. Train the agent with `model.learn`




In [None]:
# Create environment

env = gym.make("LunarLander-v2")

# Instantiate the agent

model = PPO('MlpPolicy', env, verbose=1)

# Train the agent

# model.learn(total_timesteps=int(2e5))

In [None]:
# Tuned hyperparameters to speed up training

model = PPO(
    policy = 'MlpPolicy',
    env = env,
    n_steps = 1024,
    batch_size = 64,
    n_epochs = 4,
    gamma = 0.999,
    gae_lambda = 0.98,
    ent_coef = 0.01,
    verbose=1)
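
To get a feel for what these values mean, the arithmetic below works out how much data PPO collects before each update and how many gradient steps it takes on it. This is a back-of-the-envelope sketch: it assumes a single environment (`n_envs = 1`) and the `total_timesteps=200000` passed to `model.learn` in this notebook; the variable names are illustrative.

```python
import math

n_envs = 1                 # assumption: one non-vectorized environment
n_steps = 1024             # steps collected per environment before each update
batch_size = 64            # minibatch size
n_epochs = 4               # passes over the collected data per update
total_timesteps = 200_000  # training budget passed to model.learn

# Transitions gathered before each policy update
rollout_size = n_steps * n_envs

# Minibatches needed to cover the rollout once
minibatches_per_epoch = rollout_size // batch_size

# Gradient steps taken on each rollout
gradient_steps_per_rollout = minibatches_per_epoch * n_epochs

# Number of collect-then-update cycles needed to reach the budget
num_rollouts = math.ceil(total_timesteps / rollout_size)

print("rollout size:", rollout_size)
print("gradient steps per rollout:", gradient_steps_per_rollout)
print("number of rollouts:", num_rollouts)
```

With 16 vectorized environments instead of one, `rollout_size` would be 16 times larger, so each update would see more data and fewer updates would be needed for the same budget.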

In [None]:
# Let's train it for 200,000 timesteps
model.learn(total_timesteps=200000, progress_bar=True)

# and let's save the model

model_name = "ppo-LunarLander-v2"
model.save(model_name)

### Evaluating our Agent 🤖 🦾

* Wrap the evaluation environment in a `Monitor`
* SB3 provides a function to evaluate the agent: `evaluate_policy`
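
Conceptually, the evaluation boils down to running a few episodes and summarizing their total rewards with a mean and a standard deviation. The sketch below shows that summary on made-up episode returns; the numbers and the names `sketch_mean`, `sketch_std` and `solved` are illustrative only, and the population standard deviation is an assumption about how the spread is computed.

```python
from statistics import fmean, pstdev

# Made-up total rewards for 5 evaluation episodes (not real results)
episode_returns = [231.0, 198.5, 254.2, 210.3, 240.0]

sketch_mean = fmean(episode_returns)   # average return per episode
sketch_std = pstdev(episode_returns)   # population standard deviation

# Our target for this notebook: a mean reward of at least 200
solved = sketch_mean >= 200

print(f"Mean Reward = {sketch_mean:.2f} +/- {sketch_std:.2f} | solved: {solved}")
```

A large standard deviation means the agent lands well in some episodes and badly in others, so it is worth reading both numbers together.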


In [None]:
# Evaluate the agent
# NOTE: If you use wrappers with your environment that modify rewards,
#       this will be reflected here. To evaluate with original rewards,
#       wrap environment in a "Monitor" wrapper before other wrappers.

eval_env = Monitor(gym.make("LunarLander-v2"))
mean_reward, std_reward = evaluate_policy(
    model, eval_env, n_eval_episodes=10, deterministic=True)

print(f'Mean Reward = {mean_reward:.2f} +/- {std_reward}')

## WELL DONE!

Our agent trained for 1,500,000 timesteps and scored 227 points.

**Note:** It's possible to improve our model even more by trying different hyperparameters.

In [None]:
notebook_login()
!git config --global credential.helper store

In [None]:
import gymnasium as gym

from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3.common.env_util import make_vec_env

from huggingface_sb3 import package_to_hub

# Define the name of the environment
env_id = "LunarLander-v2"

# Define the model architecture we used
model_architecture = "PPO"

# Define the repo_id: the id of the model repository on the Hugging Face Hub,
# in the form {organization}/{repo_name} (for instance ThomasSimonini/ppo-LunarLander-v2).
# Change this to your own repo id.
repo_id = "luizguilherme/ppo-LunarLander-v2"

## Define the commit message
commit_message = "Upload PPO LunarLander-v2 trained agent"

# Create the evaluation env and set the render_mode="rgb_array"
eval_env = DummyVecEnv([lambda: gym.make(env_id, render_mode="rgb_array")])

# Package the trained model and upload it to the Hub
package_to_hub(model=model, # Our trained model
               model_name=model_name, # The name of our trained model
               model_architecture=model_architecture, # The model architecture we used: in our case PPO
               env_id=env_id, # Name of the environment
               eval_env=eval_env, # Evaluation environment
               repo_id=repo_id, # Id of the model repository on the Hugging Face Hub ({organization}/{repo_name})
               commit_message=commit_message)


In [None]:
!pip install shimmy

In [None]:
from huggingface_sb3 import load_from_hub
repo_id = "luizguilherme/ppo-LunarLander-v2" # The repo_id
filename = "ppo-LunarLander-v2.zip" # The model filename.zip

# Models trained on Python 3.8 are pickled with protocol 5,
# but Python 3.6 and 3.7 use protocol 4.
# To stay compatible across versions we need to:
# 1. Install pickle5 (only relevant on Python 3.6 / 3.7)
# 2. Pass custom objects to PPO.load() that replace the pickled schedules
custom_objects = {
            "learning_rate": 0.0,
            "lr_schedule": lambda _: 0.0,
            "clip_range": lambda _: 0.0,
}

checkpoint = load_from_hub(repo_id, filename)
model = PPO.load(checkpoint, custom_objects=custom_objects, print_system_info=True)
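
The `lambda _: 0.0` entries above are schedules: functions that receive the fraction of training still remaining (going from 1 at the start to 0 at the end) and return the value to use at that point. Since we only load the model to evaluate it, a constant 0.0 is enough. The sketch below shows the idea in plain Python; `constant_schedule` and `linear_schedule` are hypothetical helper names written for illustration.

```python
def constant_schedule(value):
    """Return a schedule that ignores training progress."""
    def schedule(progress_remaining):
        return value
    return schedule


def linear_schedule(initial_value):
    """Return a schedule that decays linearly from initial_value to 0."""
    def schedule(progress_remaining):
        # progress_remaining goes from 1.0 (start) down to 0.0 (end)
        return progress_remaining * initial_value
    return schedule


frozen = constant_schedule(0.0)   # same behavior as `lambda _: 0.0`
decaying = linear_schedule(3e-4)  # 3e-4 is an arbitrary example value

for progress in (1.0, 0.5, 0.0):
    print(progress, frozen(progress), decaying(progress))
```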

In [None]:
eval_env = Monitor(gym.make("LunarLander-v2"))
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")