<a href="https://colab.research.google.com/github/mohamedyosef101/101_learning_area/blob/area/Reinforcement%20Learning/01_land_on_the_moon.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

> *This project is part of the Hugging Face Deep Reinforcement Learning course. Check the [source code](https://colab.research.google.com/github/huggingface/deep-rl-class/blob/master/notebooks/unit1/unit1.ipynb#scrollTo=BE5JWP5rQIKf) if you want to learn more.*

# Step 0. Install & load **the libraries**.

In [None]:
!sudo apt-get update
!apt install -y swig cmake

!pip install stable-baselines3
!pip install swig
!pip install "gymnasium[box2d]"
!pip install huggingface_sb3

!apt install -y python3-opengl
!apt install -y ffmpeg
!apt install -y xvfb
!pip install pyvirtualdisplay
!pip install IPython --upgrade

In [13]:
# Virtual display
from pyvirtualdisplay import Display

virtual_display = Display(visible=0, size=(1400, 900))
virtual_display.start()

<pyvirtualdisplay.display.Display at 0x7f03befceec0>

### Import the **packages**

In [14]:
import gymnasium as gym
from huggingface_sb3 import load_from_hub, package_to_hub
from huggingface_hub import notebook_login

from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.monitor import Monitor

# Step 1. Work with **Gymnasium**

With Gymnasium:
1. We create our environment using `gymnasium.make()`
2. We reset the environment to its initial state with `observation, info = env.reset()`
<br>

At each step:
3. Get an action using our model (in our example we take a random action)
4. Using `env.step(action)`, we perform this action in the environment and get:
- `observation`: The new state ($s_{t+1}$)
- `reward`: The reward we get after executing the action.
- `terminated`: Indicates whether the episode has terminated (the agent reached a terminal state).
- `truncated`: Introduced with the newer step API, it indicates that the episode was cut short, for instance by a time limit or by the agent going out of the bounds of the environment.
- `info`: A dictionary that provides additional information (depends on the environment).

<br>

If the episode is terminated or truncated:
- We reset the environment to its initial state with `observation, info = env.reset()`
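
The `reset()`/`step()` contract described above can be sketched without installing Box2D. The `ToyEnv` class and the `run_episode` helper below are hypothetical and only mimic the signatures listed in this step. They are not part of Gymnasium.

```python
# Hypothetical toy environment: it only mimics the reset()/step() signatures
# described above and is not part of Gymnasium.
class ToyEnv:
    def __init__(self, goal=3, time_limit=5):
        self.goal = goal
        self.time_limit = time_limit

    def reset(self):
        self.position = 0
        self.steps = 0
        return self.position, {}  # (observation, info)

    def step(self, action):
        self.position += action  # action is 0 (stay) or 1 (move forward)
        self.steps += 1
        # terminated: the task itself is over (the goal was reached)
        terminated = self.position >= self.goal
        # truncated: the episode is cut short by the time limit
        truncated = (not terminated) and self.steps >= self.time_limit
        reward = 1.0 if terminated else 0.0
        return self.position, reward, terminated, truncated, {}


def run_episode(env, policy):
    """Run one episode and return (total_reward, terminated, truncated)."""
    observation, info = env.reset()
    total_reward = 0.0
    while True:
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        if terminated or truncated:
            return total_reward, terminated, truncated


always_move = run_episode(ToyEnv(), lambda obs: 1)
never_move = run_episode(ToyEnv(), lambda obs: 0)
print(always_move, never_move)
```

A policy that always moves ends the episode through `terminated`, while a policy that never moves ends it through `truncated`. This is why the loop checks both flags before resetting.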

---


💡 A good habit when you start to use an environment is to check its documentation

👉 https://gymnasium.farama.org/environments/box2d/lunar_lander/

---


In [15]:
# 1. Creating the environment
env = gym.make("LunarLander-v2")

# 2. Reset the environment
observation, info = env.reset()

for _ in range(20):
  # Take a random action
  action = env.action_space.sample()
  print(f"Action taken: {action}")

  # Perform the action and unpack the five values returned by step()
  observation, reward, terminated, truncated, info = env.step(action)

  # The episode is terminated when the lander lands or crashes,
  # and truncated when the time limit is reached.

  if terminated or truncated:
    # Reset the environment
    print("Environment is reset")
    observation, info = env.reset()
env.close()

Action taken: 0
Action taken: 2
Action taken: 1
Action taken: 0
Action taken: 0
Action taken: 1
Action taken: 3
Action taken: 3
Action taken: 2
Action taken: 3
Action taken: 3
Action taken: 2
Action taken: 3
Action taken: 3
Action taken: 3
Action taken: 2
Action taken: 2
Action taken: 2
Action taken: 2
Action taken: 2


There are four discrete actions available:

* **0:** do nothing
* **1:** fire left orientation engine
* **2:** fire main engine
* **3:** fire right orientation engine
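
To make logs such as `Action taken: 2` easier to read, the action ids can be given names. This is a minimal sketch: the `LanderAction` enum and the `describe` helper are made up for illustration and are not provided by Gymnasium.

```python
from enum import IntEnum


# Hypothetical names for the four action ids listed above.
class LanderAction(IntEnum):
    NOOP = 0
    FIRE_LEFT = 1
    FIRE_MAIN = 2
    FIRE_RIGHT = 3


def describe(action):
    """Map an action id (a Python or NumPy integer) to its name."""
    return LanderAction(int(action)).name


print(describe(2))
```

With the real environment, the same helper could be called on the value returned by `env.action_space.sample()`.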

### Create the **LunarLander** environment

In [16]:
# See what the environment looks like...
env = gym.make("LunarLander-v2")
env.reset()
print(f"""_________OBSERVATION SPACE____________
      \nObservation Space Shape {env.observation_space.shape}
      \nSample observations \n{env.observation_space.sample()}
""")

_________OBSERVATION SPACE____________
      
Observation Space Shape (8,)
      
Sample observations 
[-0.958858    0.12750159  2.553836   -4.3635683  -2.2697616   1.8023213
  0.83070457  0.12967934]



**The observation is a vector of size 8**, where each value contains different information about the lander:
* Horizontal position of the lander (x)
* Vertical position of the lander (y)
* Horizontal speed (x)
* Vertical speed (y)
* Angle
* Angular speed
* Whether the left leg is in contact with the ground (boolean)
* Whether the right leg is in contact with the ground (boolean)
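
The eight values can be unpacked into named fields so that later code does not rely on raw indices. This is only a sketch: the `LanderObservation` type, the `unpack` helper, and the example vector are hypothetical and do not come from the environment.

```python
from typing import NamedTuple


# Hypothetical container that follows the field order listed above.
class LanderObservation(NamedTuple):
    x: float
    y: float
    vx: float
    vy: float
    angle: float
    angular_velocity: float
    left_leg_contact: bool
    right_leg_contact: bool


def unpack(observation):
    """Convert an 8-element observation (list or array) to named fields."""
    x, y, vx, vy, angle, angular_velocity, left, right = observation
    return LanderObservation(
        float(x), float(y), float(vx), float(vy),
        float(angle), float(angular_velocity),
        bool(left), bool(right),
    )


# Made-up example vector: descending, only the right leg touching the ground.
obs = unpack([0.0, 1.4, 0.1, -0.5, 0.02, 0.0, 0.0, 1.0])
print(obs.vy, obs.left_leg_contact, obs.right_leg_contact)
```

Note that `env.observation_space.sample()` draws random values within the bounds of the space, so a sampled vector does not have to correspond to a physically reachable state.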


In [17]:
print(f"""_________ACTION SPACE____________
      \nAction Space Size {env.action_space.n}
      \nAction Space Sample {env.action_space.sample()}
""")  # sample() draws a random action

_________ACTION SPACE____________
      
Action Space Size 4
      
Action Space Sample 0



### Vectorized Environment

In [18]:
env = make_vec_env('LunarLander-v2', n_envs=16)

# Step 2. **Create** the model

In [19]:
# Create a single (non-vectorized) environment; this replaces the vectorized `env` created above
env = gym.make("LunarLander-v2")

# Instantiate the agent
model = PPO('MlpPolicy', env, verbose=0)

# Train the agent
model.learn(total_timesteps=int(2e5))

<stable_baselines3.ppo.ppo.PPO at 0x7f03be332dd0>

In [20]:
# Add some parameters to accelerate the training
model_0 = PPO(
    policy = "MlpPolicy",
    env = env,
    n_steps= 1024,
    batch_size = 64,
    n_epochs = 4,
    gamma = 0.999,
    gae_lambda = 0.98,
    ent_coef = 0.01,
    verbose=0,
)
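
As a rough sanity check on these settings, the arithmetic below shows how much data each PPO update sees. It is a sketch under stated assumptions: `n_envs = 1` is assumed because `env` was created with `gym.make()`, and `1 / (1 - gamma)` is only a common rule of thumb for the horizon implied by the discount factor.

```python
# Values copied from the PPO(...) call above.
n_steps = 1024
batch_size = 64
n_epochs = 4
gamma = 0.999

# Assumption: a single environment, as created with gym.make() earlier.
n_envs = 1

# Transitions collected before each policy update.
rollout_size = n_steps * n_envs
# Minibatches in one pass over the collected rollout.
minibatches_per_epoch = rollout_size // batch_size
# Gradient steps performed on each rollout.
gradient_steps_per_rollout = minibatches_per_epoch * n_epochs
# Rule-of-thumb horizon (in steps) implied by the discount factor.
effective_horizon = round(1 / (1 - gamma))

print(rollout_size, minibatches_per_epoch, gradient_steps_per_rollout, effective_horizon)
```

With a vectorized environment, `n_envs` would be larger and each rollout would contain proportionally more transitions for the same `n_steps`.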

In [21]:
# train the model for 1.5M timesteps
model_0.learn(total_timesteps=1500000)

<stable_baselines3.ppo.ppo.PPO at 0x7f03be330340>

In [22]:
# Evaluate the model
eval_env = Monitor(gym.make("LunarLander-v2"))
mean_reward, std_reward = evaluate_policy(model_0, eval_env,
                             n_eval_episodes=10,
                             deterministic=True)

print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")

mean_reward=217.47 +/- 79.07


# Step 3. **Save** & **publish** the model.

In [23]:
# Save the model that was trained for 1.5M timesteps and evaluated above
model_name = "mohamedyosef101/ppo-LunarLander-v2"
model_0.save(model_name)



In [24]:
# Log in to the Hugging Face Hub with an access token
notebook_login()
!git config --global credential.helper store


In [26]:
# Publish the model

repo_id = "MohamedYosef101/PPO-LunarLander-v2"
env_id = "LunarLander-v2"
model_architecture = "PPO"

commit_message = "Upload PPO LunarLander-v2 agent"

# Evaluation Environment
from stable_baselines3.common.vec_env import DummyVecEnv
eval_env = DummyVecEnv([lambda: Monitor(gym.make(
    env_id,
    render_mode="rgb_array"
))])


package_to_hub(
    model=model_0,  # publish the model that was evaluated above
    model_name=model_name,
    model_architecture=model_architecture,
    env_id=env_id,
    eval_env=eval_env,
    repo_id=repo_id,
    commit_message=commit_message,
)

ℹ This function will save, evaluate, generate a video of your agent,
create a model card and push everything to the hub. It might take up to 1min.
This is a work in progress: if you encounter a bug, please open an issue.


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Saving video to /tmp/tmp60z4jsk_/-step-0-to-step-1000.mp4
Moviepy - Building video /tmp/tmp60z4jsk_/-step-0-to-step-1000.mp4.
Moviepy - Writing video /tmp/tmp60z4jsk_/-step-0-to-step-1000.mp4





Moviepy - Done !
Moviepy - video ready /tmp/tmp60z4jsk_/-step-0-to-step-1000.mp4
ℹ Pushing repo MohamedYosef101/PPO-LunarLander-v2 to the Hugging Face Hub



ℹ Your model is pushed to the Hub. You can view your model here:
https://huggingface.co/MohamedYosef101/PPO-LunarLander-v2/tree/main/


CommitInfo(commit_url='https://huggingface.co/MohamedYosef101/PPO-LunarLander-v2/commit/078e8546fb61c70c65fae423ae2e26f457c48aa4', commit_message='Upload PPO LunarLander-v2 agent', commit_description='', oid='078e8546fb61c70c65fae423ae2e26f457c48aa4', pr_url=None, pr_revision=None, pr_num=None)

# Done, praise be to God