# Stable Baselines3 - PyBullet: Normalizing Features and Reward

GitHub repo: [https://github.com/DLR-RM/stable-baselines3](https://github.com/DLR-RM/stable-baselines3)


[RL Baselines3 Zoo](https://github.com/DLR-RM/rl-baselines3-zoo) is a collection of pre-trained Reinforcement Learning agents using Stable-Baselines3.

It also provides basic scripts for training and evaluating agents, tuning hyperparameters, and recording videos.

Documentation is available online: [https://stable-baselines3.readthedocs.io/](https://stable-baselines3.readthedocs.io/)

PyBullet source code: [https://github.com/bulletphysics/bullet3/tree/master/examples/pybullet/](https://github.com/bulletphysics/bullet3/tree/master/examples/pybullet/)

## Install Dependencies and Stable Baselines Using Pip


```
pip install stable-baselines3[extra]
```

```
!pip install stable-baselines3[extra] pybullet
```

```
Successfully installed pybullet-3.2.0 stable-baselines3-1.3.0
```


## Import policy, RL agent, Wrappers

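```python
import os

# Importing pybullet_envs registers the PyBullet environments (e.g. HalfCheetahBulletEnv-v0) with Gym
import pybullet_envs

from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import VecNormalize
```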

## Create and wrap the environment with `VecNormalize`

Normalizing input features may be essential to successfully training an RL agent, for instance on [PyBullet](https://github.com/bulletphysics/bullet3/) environments (by default, image observations are scaled, but other types of input are not). The `VecNormalize` wrapper does this by maintaining a running mean and standard deviation of the input features and normalizing each observation with them. It can also normalize rewards: in that case, rewards are scaled by a running estimate of the standard deviation of the discounted return.
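The observation-normalization idea can be sketched in a few lines of NumPy. This is a minimal illustration, not SB3's code: `RunningMeanStd` below is a standalone re-implementation (SB3 ships its own class of that name in `stable_baselines3.common.running_mean_std`; this sketch is not it), `normalize_obs` is a hypothetical helper, statistics are merged batch by batch with the standard parallel mean/variance formula, and the defaults `clip_obs=10.0` and `epsilon=1e-8` are assumptions chosen to match the `clip_obs` value used in this notebook.

```python
import numpy as np


class RunningMeanStd:
    """Illustrative running mean/variance, updated one batch at a time."""

    def __init__(self, shape=(), epsilon=1e-4):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        # Small pseudo-count so the first merge never divides by zero
        self.count = epsilon

    def update(self, batch):
        batch = np.asarray(batch, dtype=np.float64)
        batch_mean = batch.mean(axis=0)
        batch_var = batch.var(axis=0)
        batch_count = batch.shape[0]

        # Merge the batch statistics into the running statistics
        delta = batch_mean - self.mean
        total = self.count + batch_count
        new_mean = self.mean + delta * batch_count / total
        m2 = (
            self.var * self.count
            + batch_var * batch_count
            + delta**2 * self.count * batch_count / total
        )
        self.mean, self.var, self.count = new_mean, m2 / total, total


def normalize_obs(obs, rms, clip_obs=10.0, epsilon=1e-8):
    """Center and scale observations with the running stats, then clip."""
    scaled = (np.asarray(obs, dtype=np.float64) - rms.mean) / np.sqrt(rms.var + epsilon)
    return np.clip(scaled, -clip_obs, clip_obs)


# Usage: feed batches of fake 3-dimensional observations
rng = np.random.default_rng(0)
rms = RunningMeanStd(shape=(3,))
for _ in range(100):
    rms.update(rng.normal(loc=5.0, scale=2.0, size=(64, 3)))

print(rms.mean, rms.var)  # close to 5.0 and 4.0 in every dimension
print(normalize_obs(np.array([5.0, 7.0, 1e6]), rms))  # the last entry is clipped to 10.0
```

Clipping keeps rare, extreme observations from producing very large network inputs while the statistics are still being estimated.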

More information about `VecNormalize`:
- [Documentation](https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html#stable_baselines3.common.vec_env.VecNormalize)
- [Discussion](https://github.com/hill-a/stable-baselines/issues/698)

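```python
# Create a vectorized environment containing a single HalfCheetah instance
env = make_vec_env("HalfCheetahBulletEnv-v0", n_envs=1)

# Normalize observations and rewards with running statistics; clip normalized observations to [-10, 10]
env = VecNormalize(env, norm_obs=True, norm_reward=True, clip_obs=10.)
```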
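With `norm_reward=True`, rewards are not centered the way observations are: they are divided by a running estimate of the standard deviation of the discounted return. The sketch below illustrates that idea for several parallel environments. It is a hypothetical, simplified stand-in rather than SB3's implementation: the `RewardScaler` name, the scalar Welford update, the fallback variance of 1.0 before two samples exist, and the defaults `gamma=0.99` and `clip_reward=10.0` are assumptions made for illustration.

```python
import numpy as np


class RewardScaler:
    """Hypothetical helper: scale rewards by the running std of the discounted return."""

    def __init__(self, n_envs, gamma=0.99, clip_reward=10.0, epsilon=1e-8):
        self.gamma = gamma
        self.clip_reward = clip_reward
        self.epsilon = epsilon
        self.returns = np.zeros(n_envs)  # discounted return accumulated per env
        # Welford's running statistics over every discounted-return value seen so far
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0

    @property
    def var(self):
        # Fall back to 1.0 until at least two samples have been seen
        return self.m2 / self.count if self.count > 1 else 1.0

    def _update(self, values):
        for x in values:
            self.count += 1
            delta = x - self.mean
            self.mean += delta / self.count
            self.m2 += delta * (x - self.mean)

    def __call__(self, rewards, dones):
        rewards = np.asarray(rewards, dtype=np.float64)
        # Update the discounted return, then its running statistics
        self.returns = self.returns * self.gamma + rewards
        self._update(self.returns)
        # Scale (without centering) and clip the rewards
        scaled = np.clip(
            rewards / np.sqrt(self.var + self.epsilon), -self.clip_reward, self.clip_reward
        )
        # Environments whose episode just ended start a fresh discounted return
        self.returns[np.asarray(dones, dtype=bool)] = 0.0
        return scaled


scaler = RewardScaler(n_envs=2)
print(scaler([1.0, -2.0], dones=[False, True]))  # approximately [0.667, -1.333]
```

Because rewards are only rescaled, their sign is preserved, which is one reason this kind of scaling is usually preferred over subtracting a reward mean.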



### Train the agent

```python
model = PPO('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=2000)
```

```
Using cuda device
----------------------------------
| rollout/           |           |
|    ep_len_mean     | 1e+03     |
|    ep_rew_mean     | -1.23e+03 |
| time/              |           |
|    fps             | 313       |
|    iterations      | 1         |
|    time_elapsed    | 6         |
|    total_timesteps | 2048      |
----------------------------------
```
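The log reports `total_timesteps` = 2048 even though `learn()` was called with 2000. PPO collects whole rollouts of `n_steps` steps per environment (2048 by default) and keeps collecting until the requested budget is reached, so the count is rounded up to a multiple of `n_steps * n_envs`; here that is exactly one rollout, matching `iterations = 1`. The hypothetical helper below, written only for illustration, makes the arithmetic explicit:

```python
import math


def collected_timesteps(total_timesteps, n_steps=2048, n_envs=1):
    """Timesteps actually collected when whole rollouts of n_steps * n_envs are gathered."""
    rollout_size = n_steps * n_envs
    n_rollouts = math.ceil(total_timesteps / rollout_size)
    return n_rollouts * rollout_size


print(collected_timesteps(2000))               # 2048, as in the log above
print(collected_timesteps(10_000, n_envs=4))   # 16384 (two rollouts of 8192 steps)
```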



### Save the agent and the normalization

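```python
# Don't forget to save the VecNormalize statistics when saving the agent:
# the policy was trained on normalized observations and needs the same statistics at test time
log_dir = "/tmp/"
model.save(os.path.join(log_dir, "ppo_halfcheetah"))
stats_path = os.path.join(log_dir, "vec_normalize.pkl")
env.save(stats_path)
```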

### Test model: load the saved agent and normalization

```python
# Load the agent
model = PPO.load(os.path.join(log_dir, "ppo_halfcheetah"))

# Recreate the environment and load the saved normalization statistics into it
env = make_vec_env("HalfCheetahBulletEnv-v0", n_envs=1)
env = VecNormalize.load(stats_path, env)
# Do not update the running statistics at test time
env.training = False
# Reward normalization is not needed at test time
env.norm_reward = False
```



```python
from stable_baselines3.common.evaluation import evaluate_policy
```

```python
# Evaluate over 10 episodes (the default) using deterministic actions (also the default)
mean_reward, std_reward = evaluate_policy(model, env)

print(f"Mean reward = {mean_reward:.2f} +/- {std_reward:.2f}")
```

```
Mean reward = -1218.67 +/- 93.55
```
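`evaluate_policy` runs several episodes and reports the mean and standard deviation of the episode returns. A simplified hand-written loop is sketched below, assuming a single-environment VecEnv whose `reset()` returns the observation and whose `step()` returns `(obs, rewards, dones, infos)` and resets automatically when an episode ends, as SB3 VecEnvs do. `manual_evaluate` is a hypothetical name; unlike `evaluate_policy`, it does not look for episode statistics recorded by a `Monitor` wrapper. Because `env.norm_reward = False` above, the rewards it accumulates are the original, unnormalized ones.

```python
import numpy as np


def manual_evaluate(model, env, n_episodes=10):
    """Hypothetical helper: mean/std of episode returns on a single-env VecEnv."""
    episode_returns = []
    current_return = 0.0
    obs = env.reset()
    while len(episode_returns) < n_episodes:
        # Deterministic actions, matching evaluate_policy's default
        action, _ = model.predict(obs, deterministic=True)
        obs, rewards, dones, _ = env.step(action)
        current_return += float(rewards[0])
        if dones[0]:
            # The VecEnv has already reset itself; just record the finished episode
            episode_returns.append(current_return)
            current_return = 0.0
    return float(np.mean(episode_returns)), float(np.std(episode_returns))
```

With the model and environment loaded above, `manual_evaluate(model, env)` should give figures comparable to `evaluate_policy`, up to the randomness of the episodes themselves.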
