# Lunar Lander

## Install Packages

In [1]:
%pip install tensorflow stable-baselines3 gymnasium 'gymnasium[box2d]'

Note: you may need to restart the kernel to use updated packages.


## Import Dependencies

In [2]:
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy


## Test Random Environment

In [3]:
environment_name = 'LunarLander-v2'
env = gym.make(environment_name, render_mode="human")
observation, info = env.reset(seed=42)

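# Take a few random actions to confirm the environment steps and renders.
for _ in range(5):
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)

    # Start a new episode once the current one has ended or hit the time limit.
    if terminated or truncated:
        observation, info = env.reset()

env.close()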

## Build Model

In [4]:
vec_env = make_vec_env(environment_name, n_envs=4)

model = PPO("MlpPolicy", vec_env, verbose=1)
model.learn(total_timesteps=200000)

Using cpu device
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 91.4     |
|    ep_rew_mean     | -182     |
| time/              |          |
|    fps             | 1609     |
|    iterations      | 1        |
|    time_elapsed    | 5        |
|    total_timesteps | 8192     |
---------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 95.7         |
|    ep_rew_mean          | -165         |
| time/                   |              |
|    fps                  | 919          |
|    iterations           | 2            |
|    time_elapsed         | 17           |
|    total_timesteps      | 16384        |
| train/                  |              |
|    approx_kl            | 0.0086719375 |
|    clip_fraction        | 0.0557       |
|    clip_range           | 0.2          |
|    entropy_loss         | -1.38        |
|    explained_variance   | 0.00227      |
...

<stable_baselines3.ppo.ppo.PPO at 0x7fe7ade3bd40>
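
The numbers in the training log can be sanity-checked by hand. The sketch below assumes PPO's default `n_steps=2048` (the notebook does not set it) and the 4 discrete actions of LunarLander; the helper names are made up for illustration. It reproduces the 8192 timesteps logged per iteration, estimates how many iterations `total_timesteps=200000` takes, and compares the logged `entropy_loss` of -1.38 with the entropy of a uniformly random policy.

```python
import math

# Assumption: SB3 PPO default rollout length per environment (not set in the notebook).
N_STEPS = 2048
N_ENVS = 4                 # from make_vec_env(..., n_envs=4)
TOTAL_TIMESTEPS = 200_000  # from model.learn(total_timesteps=200000)
N_ACTIONS = 4              # LunarLander: no-op, left engine, main engine, right engine


def rollout_size(n_steps: int, n_envs: int) -> int:
    """Transitions collected per PPO iteration across all parallel environments."""
    return n_steps * n_envs


def iterations_needed(total_timesteps: int, n_steps: int, n_envs: int) -> int:
    """Full rollouts needed to reach at least total_timesteps."""
    return math.ceil(total_timesteps / rollout_size(n_steps, n_envs))


def uniform_entropy(n_actions: int) -> float:
    """Entropy (in nats) of a uniform distribution over n_actions."""
    return math.log(n_actions)


per_iter = rollout_size(N_STEPS, N_ENVS)
iters = iterations_needed(TOTAL_TIMESTEPS, N_STEPS, N_ENVS)

print(per_iter)                                # 8192, matches total_timesteps at iteration 1
print(iters)                                   # 25
print(iters * per_iter)                        # 204800, slightly above the 200000 requested
print(round(-uniform_entropy(N_ACTIONS), 2))   # -1.39, close to the logged entropy_loss
```

An `entropy_loss` near -ln(4) at iteration 2 suggests the policy is still almost uniformly random, which is consistent with the strongly negative `ep_rew_mean` at that point.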

## Evaluate Model

In [5]:
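# evaluate_policy returns (mean_reward, std_reward) over the evaluation episodes;
# capture and print them, otherwise the result is discarded.
mean_reward, std_reward = evaluate_policy(model, vec_env, n_eval_episodes=10, render=True)
print(f"mean reward: {mean_reward:.2f} +/- {std_reward:.2f}")
vec_env.close()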

## Save Model

In [6]:
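# Stable-Baselines3 saves a zip archive (PyTorch weights), not a Keras file.
# With no extension given, this writes lunar_lander.zip.
model.save("lunar_lander")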