# Lunar Lander

## Install Packages

In [1]:
%pip install tensorflow stable-baselines3 gymnasium 'gymnasium[box2d]'

Note: you may need to restart the kernel to use updated packages.


## Import Dependencies

In [2]:
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy

2024-05-14 23:11:35.347958: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


## Test Random Environment

In [3]:
environment_name = 'LunarLander-v2'
env = gym.make(environment_name, render_mode="human")
observation, info = env.reset(seed=42)

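# Take a few random actions to confirm the environment steps and renders
for _ in range(5):
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)

    # Start a new episode once the current one ends
    if terminated or truncated:
        observation, info = env.reset()

env.close()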

## Build Model

In [4]:
# Train PPO with an MLP policy on four vectorized copies of the environment
vec_env = make_vec_env(environment_name, n_envs=4)

model = PPO("MlpPolicy", vec_env, verbose=1)
model.learn(total_timesteps=25000)

Using cpu device
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 90.3     |
|    ep_rew_mean     | -168     |
| time/              |          |
|    fps             | 1514     |
|    iterations      | 1        |
|    time_elapsed    | 5        |
|    total_timesteps | 8192     |
---------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 96.7        |
|    ep_rew_mean          | -170        |
| time/                   |             |
|    fps                  | 995         |
|    iterations           | 2           |
|    time_elapsed         | 16          |
|    total_timesteps      | 16384       |
| train/                  |             |
|    approx_kl            | 0.006410094 |
|    clip_fraction        | 0.0324      |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.38       |
|    explained_variance   | -0.000416   |
|    learning

<stable_baselines3.ppo.ppo.PPO at 0x7f9f986378c0>
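
The log above reports `total_timesteps` of 8192 after the first iteration and 16384 after the second. That matches 4 environments times 2048 steps per environment, where 2048 is the default `n_steps` for PPO in Stable-Baselines3. Because training proceeds in whole rollouts, a request for 25000 timesteps should end up collecting somewhat more. The sketch below works through that arithmetic; `rollout_schedule` is a hypothetical helper written for this illustration and is not part of Stable-Baselines3.

```python
import math


def rollout_schedule(total_timesteps: int, n_steps: int = 2048, n_envs: int = 4):
    """Return (timesteps per iteration, iterations, timesteps collected).

    Assumes training stops after the first whole rollout that reaches
    or passes total_timesteps.
    """
    steps_per_iteration = n_steps * n_envs
    iterations = math.ceil(total_timesteps / steps_per_iteration)
    return steps_per_iteration, iterations, iterations * steps_per_iteration


per_iteration, iterations, collected = rollout_schedule(25_000)
print(per_iteration, iterations, collected)  # 8192 4 32768
```

Under these assumptions the run above would take four iterations and stop at 32768 timesteps rather than exactly 25000.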

## Evaluate Model

In [5]:
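# evaluate_policy returns the mean and standard deviation of the episode reward
mean_reward, std_reward = evaluate_policy(model, vec_env, n_eval_episodes=10, render=True)
print(f"mean reward: {mean_reward:.1f} +/- {std_reward:.1f}")
vec_env.close()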

## Save Model

In [6]:
# Stable-Baselines3 writes a zip archive, not a Keras model; with no extension given it appends ".zip"
model.save("lunar_lander")