# Training PPO Model on CarRacing-v3 Environment

This notebook demonstrates how to train a Proximal Policy Optimization (PPO) agent with a CNN policy on the CarRacing-v3 environment from Gymnasium using vectorized environments for faster training.

## 1. Import Required Libraries

Import the necessary libraries, including Gymnasium, Stable-Baselines3, and other dependencies for vectorized environments and PPO training.

In [2]:
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, SubprocVecEnv, VecFrameStack
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.callbacks import CheckpointCallback, EvalCallback
import os
import numpy as np
import torch

print(f"Gymnasium version: {gym.__version__}")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

Gymnasium version: 1.2.2
PyTorch version: 2.6.0+cu124
CUDA available: True


## 2. Create Vectorized CarRacing-v3 Environments

Use Stable-Baselines3's make_vec_env helper to create multiple parallel instances of the Gymnasium CarRacing-v3 environment for faster training.

In [3]:
# Configuration parameters
NUM_ENVS = 6  # Number of parallel environments
ENV_ID = "CarRacing-v3"

# Training configuration
TOTAL_TIMESTEPS = 2_000_000  # Total number of timesteps to train
SAVE_FREQ = 50_000  # Save model every N steps

# Create vectorized environments
# Using SubprocVecEnv for true parallelization (separate processes)
# Use DummyVecEnv if you encounter issues with SubprocVecEnv
env = make_vec_env(
    ENV_ID,
    n_envs=NUM_ENVS,
    vec_env_cls=SubprocVecEnv,
    seed=42
)

# Optional: Stack frames for temporal information
# env = VecFrameStack(env, n_stack=4)

print(f"Created {NUM_ENVS} vectorized environments")
print(f"Observation space: {env.observation_space}")
print(f"Action space: {env.action_space}")

Created 6 vectorized environments
Observation space: Box(0, 255, (96, 96, 3), uint8)
Action space: Box([-1.  0.  0.], 1.0, (3,), float32)
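
The optional `VecFrameStack` line in the cell above changes the observation shape the policy sees. The sketch below only works out that shape arithmetic in plain Python. It assumes frames are stacked along the channel axis of a channels-last image and that the `VecTransposeImage` wrapper later moves channels to the front; `stacked_shape` and `channels_first` are hypothetical helper names, not Stable-Baselines3 functions.

```python
def stacked_shape(frame_shape, n_stack):
    """Shape of a channels-last image observation after stacking n_stack frames."""
    height, width, channels = frame_shape
    # Assumption: frames are concatenated along the channel axis
    return (height, width, channels * n_stack)


def channels_first(shape):
    """Move the channel axis to the front, the layout a PyTorch CNN consumes."""
    height, width, channels = shape
    return (channels, height, width)


single_frame = (96, 96, 3)  # CarRacing-v3 observation shape printed above
stacked = stacked_shape(single_frame, n_stack=4)

print(stacked)                  # (96, 96, 12)
print(channels_first(stacked))  # (12, 96, 96)
```

With four stacked RGB frames the CNN would receive 12 input channels instead of 3, which is worth keeping in mind when loading a checkpoint trained without stacking.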


## 3. Initialize PPO Model with CNN Policy

Create a PPO model from Stable-Baselines3 with a CnnPolicy, which is suitable for processing the visual observations from the CarRacing environment.

In [5]:
# Create directories for saving models and logs
os.makedirs("base_models", exist_ok=True)
os.makedirs("logs", exist_ok=True)

# Default PPO hyperparameters
model = PPO(
    policy="CnnPolicy",
    env=env,
    learning_rate=3e-4,
    n_steps=2048,  # Number of steps to run for each environment per update
    batch_size=64,  # Minibatch size
    n_epochs=10,  # Number of epochs when optimizing the surrogate loss
    gamma=0.99,  # Discount factor
    verbose=1,
    tensorboard_log="./logs/",
    device="cuda" if torch.cuda.is_available() else "cpu"
)

print(f"Model initialized with device: {model.device}")
print("Policy architecture: CnnPolicy")

Using cuda device
Wrapping the env in a VecTransposeImage.
Model initialized with device: cuda
Policy architecture: CnnPolicy
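
To make the hyperparameters above concrete, the following sketch derives how much data each PPO update consumes. It is plain arithmetic rather than a call into Stable-Baselines3, and it assumes that one update collects `n_steps` transitions from every environment and then makes `n_epochs` passes over that data in minibatches of `batch_size`. The function name `ppo_update_budget` is hypothetical.

```python
import math


def ppo_update_budget(n_steps, n_envs, batch_size, n_epochs, total_timesteps):
    """Derive per-update sizes from PPO's rollout settings."""
    rollout_size = n_steps * n_envs  # transitions collected per update
    minibatches = math.ceil(rollout_size / batch_size)  # minibatches per epoch
    gradient_steps = minibatches * n_epochs  # optimizer steps per update
    rollouts = math.ceil(total_timesteps / rollout_size)  # updates to reach the target
    return {
        "rollout_size": rollout_size,
        "minibatches_per_epoch": minibatches,
        "gradient_steps_per_update": gradient_steps,
        "rollouts_needed": rollouts,
    }


budget = ppo_update_budget(
    n_steps=2048, n_envs=6, batch_size=64, n_epochs=10, total_timesteps=2_000_000
)
for name, value in budget.items():
    print(f"{name}: {value}")
# rollout_size: 12288
# minibatches_per_epoch: 192
# gradient_steps_per_update: 1920
# rollouts_needed: 163
```

Because whole rollouts are collected, a run with these settings would be expected to end slightly past the 2,000,000-step target rather than exactly on it.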


## 4. Train the PPO Model

Train the PPO model on the vectorized environments using the learn() method with specified timesteps and logging parameters.

In [6]:


# Setup callbacks
checkpoint_callback = CheckpointCallback(
    save_freq=SAVE_FREQ // NUM_ENVS,  # Counted in env.step() calls; each call advances NUM_ENVS timesteps
    save_path="./base_models/checkpoints/",
    name_prefix="ppo_carracing"
)

# Optional: Create evaluation environment for monitoring progress
eval_env = make_vec_env(ENV_ID, n_envs=1, seed=100)
eval_callback = EvalCallback(
    eval_env,
    best_model_save_path="./base_models/best_model/",
    log_path="./logs/eval/",
    eval_freq=max(SAVE_FREQ // NUM_ENVS, 1),  # Evaluate roughly every SAVE_FREQ timesteps
    n_eval_episodes=5,
    deterministic=True,
    render=False
)
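
The division by `NUM_ENVS` in the callbacks above means checkpoints do not land on exact multiples of `SAVE_FREQ`. The sketch below works through the arithmetic, assuming the callback fires every `save_freq` calls to `env.step()` and that each call advances the timestep counter by the number of environments. `checkpoint_timesteps` is a hypothetical helper and applies to a fresh run that starts at zero timesteps.

```python
def checkpoint_timesteps(save_freq_calls, n_envs, count):
    """Timestep counts at which the first `count` checkpoints would be written."""
    return [save_freq_calls * n_envs * k for k in range(1, count + 1)]


SAVE_FREQ = 50_000
NUM_ENVS = 6

calls_between_saves = SAVE_FREQ // NUM_ENVS  # 8333 calls to env.step()
steps = checkpoint_timesteps(calls_between_saves, NUM_ENVS, count=25)

print(calls_between_saves)  # 8333
print(steps[0])             # 49998
print(steps[-1])            # 1249950
```

The integer division loses two timesteps per interval with six environments, so the 25th checkpoint would carry the step count 1249950 in its filename instead of 1250000.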




In [None]:

# Train the model
print("Starting training...")
model.learn(
    total_timesteps=TOTAL_TIMESTEPS,
    callback=[checkpoint_callback, eval_callback],
    progress_bar=True
)

print("Training completed!")

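## 4.2 Restore a checkpoint

Resume training from the most recent checkpoint, optionally with a different number of parallel environments, and train only the timesteps that remain to reach TOTAL_TIMESTEPS.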
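
The cell in this section picks the newest checkpoint by file modification time. Modification times can be misleading if files were copied or restored, so one alternative is to rank checkpoints by the step count in their filenames. The sketch below does that and also estimates where a resumed run would stop. It assumes that PPO always finishes a full rollout of `n_steps * n_envs` transitions before checking the timestep target. The example paths and the helper names `checkpoint_step`, `latest_by_step`, and `plan_resume` are hypothetical.

```python
import math
import re

# Matches the naming scheme used by the checkpoint callback in this notebook
STEP_PATTERN = re.compile(r"ppo_carracing_(\d+)_steps\.zip$")


def checkpoint_step(path):
    """Return the step count encoded in a checkpoint filename, or None."""
    match = STEP_PATTERN.search(path)
    return int(match.group(1)) if match else None


def latest_by_step(paths):
    """Pick the checkpoint with the highest step count, skipping unparsable names."""
    parsed = [(checkpoint_step(p), p) for p in paths]
    parsed = [(step, p) for step, p in parsed if step is not None]
    return max(parsed)[1] if parsed else None


def plan_resume(current_steps, target_steps, n_steps, n_envs):
    """Return (remaining steps, rollouts needed, expected final step count)."""
    remaining = max(target_steps - current_steps, 0)
    rollout_size = n_steps * n_envs
    iterations = math.ceil(remaining / rollout_size)
    return remaining, iterations, current_steps + iterations * rollout_size


# Hypothetical directory listing
paths = [
    "./base_models/checkpoints/ppo_carracing_49998_steps.zip",
    "./base_models/checkpoints/ppo_carracing_1249950_steps.zip",
    "./base_models/checkpoints/ppo_carracing_999960_steps.zip",
    "./base_models/checkpoints/notes.zip",
]

latest = latest_by_step(paths)
remaining, iterations, final_steps = plan_resume(
    checkpoint_step(latest), target_steps=2_000_000, n_steps=2048, n_envs=4
)

print(latest)       # ./base_models/checkpoints/ppo_carracing_1249950_steps.zip
print(remaining)    # 750050
print(iterations)   # 92
print(final_steps)  # 2003614
```

Under these assumptions a run resumed with four environments would overshoot the target by a few thousand steps, because the last rollout is completed in full.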

In [7]:
import glob
import os
import re

# Find the latest checkpoint in the checkpoints directory
checkpoint_dir = "./base_models/checkpoints/"
checkpoint_files = glob.glob(os.path.join(checkpoint_dir, "ppo_carracing_*.zip"))

if checkpoint_files:
    # Sort by modification time to get the latest
    latest_checkpoint = max(checkpoint_files, key=os.path.getmtime)
    print(f"Loading checkpoint: {latest_checkpoint}")
    
    # Extract the step number from the checkpoint filename
    # Format: ppo_carracing_XXXXXX_steps.zip
    match = re.search(r'ppo_carracing_(\d+)_steps\.zip', latest_checkpoint)
    if match:
        current_steps = int(match.group(1))
        print(f"Checkpoint is at step: {current_steps}")
        
        # Calculate remaining steps to reach TOTAL_TIMESTEPS
        remaining_steps = TOTAL_TIMESTEPS - current_steps
        
        if remaining_steps <= 0:
            print(f"Training already completed! Current steps ({current_steps}) >= Target ({TOTAL_TIMESTEPS})")
        else:
            print(f"Remaining steps to train: {remaining_steps}")
            
            # Create NEW vectorized environment with different number of workers
            NUM_ENVS_RESUME = 4  # Different from original NUM_ENVS
            env_resume = make_vec_env(
                ENV_ID,
                n_envs=NUM_ENVS_RESUME,
                vec_env_cls=SubprocVecEnv,
                seed=42
            )
            print(f"Created {NUM_ENVS_RESUME} vectorized environments for resumed training")
            
            # Load the model from checkpoint with NEW environment
            model = PPO.load(latest_checkpoint, env=env_resume)
            print(f"Model loaded successfully from: {latest_checkpoint}")
            print(f"Resuming training with {NUM_ENVS_RESUME} workers...")
            
            # Update callbacks for new number of environments
            checkpoint_callback_resume = CheckpointCallback(
                save_freq=SAVE_FREQ // NUM_ENVS_RESUME,
                save_path="./base_models/checkpoints/",
                name_prefix="ppo_carracing"
            )
            
            eval_callback_resume = EvalCallback(
                eval_env,
                best_model_save_path="./base_models/best_model/",
                log_path="./logs/eval/",
                eval_freq=max(SAVE_FREQ // NUM_ENVS_RESUME, 1),
                n_eval_episodes=5,
                deterministic=True,
                render=False
            )
            
            # Continue training from this checkpoint for REMAINING steps only
            model.learn(
                total_timesteps=remaining_steps,  # Train only remaining steps
                callback=[checkpoint_callback_resume, eval_callback_resume],
                progress_bar=True,
                reset_num_timesteps=False  # Continue counting timesteps from checkpoint
            )
            
            print(f"Training completed! Total steps reached: {TOTAL_TIMESTEPS}")
            
            # Close the resumed environment
            env_resume.close()
    else:
        print("Could not extract step number from checkpoint filename.")
else:
    print("No checkpoint files found. Please train from scratch first.")


Loading checkpoint: ./models/checkpoints\ppo_carracing_1249950_steps.zip
Checkpoint is at step: 1249950
Remaining steps to train: 750050
Created 4 vectorized environments for resumed training
Wrapping the env in a VecTransposeImage.
Model loaded successfully from: ./models/checkpoints\ppo_carracing_1249950_steps.zip
Resuming training with 4 workers...
Logging to ./logs/PPO_2





---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1e+03    |
|    ep_rew_mean     | 5.05     |
| time/              |          |
|    fps             | 412      |
|    iterations      | 1        |
|    time_elapsed    | 19       |
|    total_timesteps | 1258142  |
---------------------------------


---------------------------------------
| rollout/                |           |
|    ep_len_mean          | 1e+03     |
|    ep_rew_mean          | 12.6      |
| time/                   |           |
|    fps                  | 335       |
|    iterations           | 2         |
|    time_elapsed         | 48        |
|    total_timesteps      | 1266334   |
| train/                  |           |
|    approx_kl            | 40.659527 |
|    clip_fraction        | 0.966     |
|    clip_range           | 0.2       |
|    entropy_loss         | 3.88      |
|    explained_variance   | 0.942     |
|    learning_rate        | 0.0003    |
|    loss                 | 0.189     |
|    n_updates            | 1020      |
|    policy_gradient_loss | 0.0833    |
|    std                  | 0.066     |
|    value_loss           | 0.734     |
---------------------------------------


---------------------------------------
| rollout/                |           |
|    ep_len_mean          | 1e+03     |
|    ep_rew_mean          | 17.4      |
| time/                   |           |
|    fps                  | 316       |
|    iterations           | 3         |
|    time_elapsed         | 77        |
|    total_timesteps      | 1274526   |
| train/                  |           |
|    approx_kl            | 19.640833 |
|    clip_fraction        | 0.942     |
|    clip_range           | 0.2       |
|    entropy_loss         | 3.88      |
|    explained_variance   | 0.955     |
|    learning_rate        | 0.0003    |
|    loss                 | 0.0909    |
|    n_updates            | 1030      |
|    policy_gradient_loss | 0.0705    |
|    std                  | 0.0655    |
|    value_loss           | 1.4       |
---------------------------------------


--------------------------------------
| rollout/                |          |
|    ep_len_mean          | 1e+03    |
|    ep_rew_mean          | 16.5     |
| time/                   |          |
|    fps                  | 301      |
|    iterations           | 4        |
|    time_elapsed         | 108      |
|    total_timesteps      | 1282718  |
| train/                  |          |
|    approx_kl            | 18.74691 |
|    clip_fraction        | 0.953    |
|    clip_range           | 0.2      |
|    entropy_loss         | 3.91     |
|    explained_variance   | 0.967    |
|    learning_rate        | 0.0003   |
|    loss                 | 0.143    |
|    n_updates            | 1040     |
|    policy_gradient_loss | 0.0649   |
|    std                  | 0.0646   |
|    value_loss           | 0.739    |
--------------------------------------


---------------------------------------
| rollout/                |           |
|    ep_len_mean          | 1e+03     |
|    ep_rew_mean          | 14.3      |
| time/                   |           |
|    fps                  | 284       |
|    iterations           | 5         |
|    time_elapsed         | 143       |
|    total_timesteps      | 1290910   |
| train/                  |           |
|    approx_kl            | 28.652756 |
|    clip_fraction        | 0.908     |
|    clip_range           | 0.2       |
|    entropy_loss         | 3.97      |
|    explained_variance   | 0.964     |
|    learning_rate        | 0.0003    |
|    loss                 | 0.26      |
|    n_updates            | 1050      |
|    policy_gradient_loss | 0.0545    |
|    std                  | 0.0631    |
|    value_loss           | 1.01      |
---------------------------------------


---------------------------------------
| rollout/                |           |
|    ep_len_mean          | 1e+03     |
|    ep_rew_mean          | 17.2      |
| time/                   |           |
|    fps                  | 273       |
|    iterations           | 6         |
|    time_elapsed         | 179       |
|    total_timesteps      | 1299102   |
| train/                  |           |
|    approx_kl            | 41.767063 |
|    clip_fraction        | 0.981     |
|    clip_range           | 0.2       |
|    entropy_loss         | 4         |
|    explained_variance   | 0.919     |
|    learning_rate        | 0.0003    |
|    loss                 | 0.148     |
|    n_updates            | 1060      |
|    policy_gradient_loss | 0.18      |
|    std                  | 0.0634    |
|    value_loss           | 0.19      |
---------------------------------------


---------------------------------------
| eval/                   |           |
|    mean_ep_length       | 1e+03     |
|    mean_reward          | 55.8      |
| time/                   |           |
|    total_timesteps      | 1299950   |
| train/                  |           |
|    approx_kl            | 17.070412 |
|    clip_fraction        | 0.94      |
|    clip_range           | 0.2       |
|    entropy_loss         | 4.04      |
|    explained_variance   | 0.879     |
|    learning_rate        | 0.0003    |
|    loss                 | 0.137     |
|    n_updates            | 1070      |
|    policy_gradient_loss | 0.0551    |
|    std                  | 0.0617    |
|    value_loss           | 0.36      |
---------------------------------------


---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1e+03    |
|    ep_rew_mean     | 25.7     |
| time/              |          |
|    fps             | 222      |
|    iterations      | 7        |
|    time_elapsed    | 257      |
|    total_timesteps | 1307294  |
---------------------------------


---------------------------------------
| rollout/                |           |
|    ep_len_mean          | 1e+03     |
|    ep_rew_mean          | 18.1      |
| time/                   |           |
|    fps                  | 222       |
|    iterations           | 8         |
|    time_elapsed         | 294       |
|    total_timesteps      | 1315486   |
| train/                  |           |
|    approx_kl            | 13.815698 |
|    clip_fraction        | 0.939     |
|    clip_range           | 0.2       |
|    entropy_loss         | 4.1       |
|    explained_variance   | 0.937     |
|    learning_rate        | 0.0003    |
|    loss                 | 0.157     |
|    n_updates            | 1080      |
|    policy_gradient_loss | 0.0367    |
|    std                  | 0.0606    |
|    value_loss           | 0.584     |
---------------------------------------


---------------------------------------
| rollout/                |           |
|    ep_len_mean          | 1e+03     |
|    ep_rew_mean          | 8.39      |
| time/                   |           |
|    fps                  | 223       |
|    iterations           | 9         |
|    time_elapsed         | 330       |
|    total_timesteps      | 1323678   |
| train/                  |           |
|    approx_kl            | 13.420033 |
|    clip_fraction        | 0.919     |
|    clip_range           | 0.2       |
|    entropy_loss         | 4.14      |
|    explained_variance   | 0.954     |
|    learning_rate        | 0.0003    |
|    loss                 | 0.0631    |
|    n_updates            | 1090      |
|    policy_gradient_loss | 0.0488    |
|    std                  | 0.0601    |
|    value_loss           | 0.764     |
---------------------------------------


--------------------------------------
| rollout/                |          |
|    ep_len_mean          | 1e+03    |
|    ep_rew_mean          | 3.54     |
| time/                   |          |
|    fps                  | 216      |
|    iterations           | 10       |
|    time_elapsed         | 379      |
|    total_timesteps      | 1331870  |
| train/                  |          |
|    approx_kl            | 8.402934 |
|    clip_fraction        | 0.907    |
|    clip_range           | 0.2      |
|    entropy_loss         | 4.16     |
|    explained_variance   | 0.956    |
|    learning_rate        | 0.0003   |
|    loss                 | 0.111    |
|    n_updates            | 1100     |
|    policy_gradient_loss | 0.0503   |
|    std                  | 0.0596   |
|    value_loss           | 0.599    |
--------------------------------------


--------------------------------------
| rollout/                |          |
|    ep_len_mean          | 1e+03    |
|    ep_rew_mean          | -2.68    |
| time/                   |          |
|    fps                  | 210      |
|    iterations           | 11       |
|    time_elapsed         | 427      |
|    total_timesteps      | 1340062  |
| train/                  |          |
|    approx_kl            | 12.30982 |
|    clip_fraction        | 0.931    |
|    clip_range           | 0.2      |
|    entropy_loss         | 4.2      |
|    explained_variance   | 0.95     |
|    learning_rate        | 0.0003   |
|    loss                 | 0.00421  |
|    n_updates            | 1110     |
|    policy_gradient_loss | 0.0589   |
|    std                  | 0.059    |
|    value_loss           | 0.275    |
--------------------------------------


--------------------------------------
| rollout/                |          |
|    ep_len_mean          | 1e+03    |
|    ep_rew_mean          | -11.6    |
| time/                   |          |
|    fps                  | 212      |
|    iterations           | 12       |
|    time_elapsed         | 462      |
|    total_timesteps      | 1348254  |
| train/                  |          |
|    approx_kl            | 4.773471 |
|    clip_fraction        | 0.905    |
|    clip_range           | 0.2      |
|    entropy_loss         | 4.19     |
|    explained_variance   | 0.881    |
|    learning_rate        | 0.0003   |
|    loss                 | -0.0123  |
|    n_updates            | 1120     |
|    policy_gradient_loss | 0.126    |
|    std                  | 0.0592   |
|    value_loss           | 0.118    |
--------------------------------------


---------------------------------------
| eval/                   |           |
|    mean_ep_length       | 1e+03     |
|    mean_reward          | -16.5     |
| time/                   |           |
|    total_timesteps      | 1349950   |
| train/                  |           |
|    approx_kl            | 11.290407 |
|    clip_fraction        | 0.874     |
|    clip_range           | 0.2       |
|    entropy_loss         | 4.2       |
|    explained_variance   | 0.917     |
|    learning_rate        | 0.0003    |
|    loss                 | 0.0211    |
|    n_updates            | 1130      |
|    policy_gradient_loss | 0.129     |
|    std                  | 0.0595    |
|    value_loss           | 0.0985    |
---------------------------------------


---------------------------------
| rollout/           |          |
|    ep_len_mean     | 998      |
|    ep_rew_mean     | -14.7    |
| time/              |          |
|    fps             | 197      |
|    iterations      | 13       |
|    time_elapsed    | 539      |
|    total_timesteps | 1356446  |
---------------------------------


---------------------------------------
| rollout/                |           |
|    ep_len_mean          | 998       |
|    ep_rew_mean          | -19       |
| time/                   |           |
|    fps                  | 199       |
|    iterations           | 14        |
|    time_elapsed         | 574       |
|    total_timesteps      | 1364638   |
| train/                  |           |
|    approx_kl            | 19.482695 |
|    clip_fraction        | 0.856     |
|    clip_range           | 0.2       |
|    entropy_loss         | 4.19      |
|    explained_variance   | 0.86      |
|    learning_rate        | 0.0003    |
|    loss                 | 0.143     |
|    n_updates            | 1140      |
|    policy_gradient_loss | 0.0635    |
|    std                  | 0.0593    |
|    value_loss           | 3.69      |
---------------------------------------


---------------------------------------
| rollout/                |           |
|    ep_len_mean          | 998       |
|    ep_rew_mean          | -28.5     |
| time/                   |           |
|    fps                  | 201       |
|    iterations           | 15        |
|    time_elapsed         | 609       |
|    total_timesteps      | 1372830   |
| train/                  |           |
|    approx_kl            | 2.0217662 |
|    clip_fraction        | 0.703     |
|    clip_range           | 0.2       |
|    entropy_loss         | 4.24      |
|    explained_variance   | 0.817     |
|    learning_rate        | 0.0003    |
|    loss                 | -0.0165   |
|    n_updates            | 1150      |
|    policy_gradient_loss | -0.0024   |
|    std                  | 0.0583    |
|    value_loss           | 0.23      |
---------------------------------------


--------------------------------------
| rollout/                |          |
|    ep_len_mean          | 998      |
|    ep_rew_mean          | -36.9    |
| time/                   |          |
|    fps                  | 198      |
|    iterations           | 16       |
|    time_elapsed         | 661      |
|    total_timesteps      | 1381022  |
| train/                  |          |
|    approx_kl            | 2.044558 |
|    clip_fraction        | 0.676    |
|    clip_range           | 0.2      |
|    entropy_loss         | 4.31     |
|    explained_variance   | 0.817    |
|    learning_rate        | 0.0003   |
|    loss                 | 0.0548   |
|    n_updates            | 1160     |
|    policy_gradient_loss | -0.00665 |
|    std                  | 0.0569   |
|    value_loss           | 0.129    |
--------------------------------------


---------------------------------------
| rollout/                |           |
|    ep_len_mean          | 998       |
|    ep_rew_mean          | -40.9     |
| time/                   |           |
|    fps                  | 199       |
|    iterations           | 17        |
|    time_elapsed         | 698       |
|    total_timesteps      | 1389214   |
| train/                  |           |
|    approx_kl            | 1.9611211 |
|    clip_fraction        | 0.677     |
|    clip_range           | 0.2       |
|    entropy_loss         | 4.4       |
|    explained_variance   | 0.886     |
|    learning_rate        | 0.0003    |
|    loss                 | 0.0336    |
|    n_updates            | 1170      |
|    policy_gradient_loss | -0.012    |
|    std                  | 0.0553    |
|    value_loss           | 0.0918    |
---------------------------------------


---------------------------------------
| rollout/                |           |
|    ep_len_mean          | 998       |
|    ep_rew_mean          | -43.7     |
| time/                   |           |
|    fps                  | 198       |
|    iterations           | 18        |
|    time_elapsed         | 741       |
|    total_timesteps      | 1397406   |
| train/                  |           |
|    approx_kl            | 3.0225925 |
|    clip_fraction        | 0.712     |
|    clip_range           | 0.2       |
|    entropy_loss         | 4.43      |
|    explained_variance   | 0.942     |
|    learning_rate        | 0.0003    |
|    loss                 | 0.108     |
|    n_updates            | 1180      |
|    policy_gradient_loss | 0.0326    |
|    std                  | 0.0545    |
|    value_loss           | 0.0516    |
---------------------------------------


---------------------------------------
| eval/                   |           |
|    mean_ep_length       | 1e+03     |
|    mean_reward          | -79.3     |
| time/                   |           |
|    total_timesteps      | 1399950   |
| train/                  |           |
|    approx_kl            | 1.9046829 |
|    clip_fraction        | 0.689     |
|    clip_range           | 0.2       |
|    entropy_loss         | 4.47      |
|    explained_variance   | 0.954     |
|    learning_rate        | 0.0003    |
|    loss                 | 0.0318    |
|    n_updates            | 1190      |
|    policy_gradient_loss | 0.0238    |
|    std                  | 0.0544    |
|    value_loss           | 0.0619    |
---------------------------------------


---------------------------------
| rollout/           |          |
|    ep_len_mean     | 998      |
|    ep_rew_mean     | -46.2    |
| time/              |          |
|    fps             | 192      |
|    iterations      | 19       |
|    time_elapsed    | 809      |
|    total_timesteps | 1405598  |
---------------------------------


---------------------------------------
| rollout/                |           |
|    ep_len_mean          | 998       |
|    ep_rew_mean          | -58.2     |
| time/                   |           |
|    fps                  | 192       |
|    iterations           | 20        |
|    time_elapsed         | 849       |
|    total_timesteps      | 1413790   |
| train/                  |           |
|    approx_kl            | 1.8652523 |
|    clip_fraction        | 0.661     |
|    clip_range           | 0.2       |
|    entropy_loss         | 4.48      |
|    explained_variance   | 0.961     |
|    learning_rate        | 0.0003    |
|    loss                 | -0.0472   |
|    n_updates            | 1200      |
|    policy_gradient_loss | 0.0289    |
|    std                  | 0.0539    |
|    value_loss           | 0.0588    |
---------------------------------------


--------------------------------------
| rollout/                |          |
|    ep_len_mean          | 998      |
|    ep_rew_mean          | -66.3    |
| time/                   |          |
|    fps                  | 195      |
|    iterations           | 21       |
|    time_elapsed         | 879      |
|    total_timesteps      | 1421982  |
| train/                  |          |
|    approx_kl            | 2.111186 |
|    clip_fraction        | 0.637    |
|    clip_range           | 0.2      |
|    entropy_loss         | 4.55     |
|    explained_variance   | 0.975    |
|    learning_rate        | 0.0003   |
|    loss                 | -0.0385  |
|    n_updates            | 1210     |
|    policy_gradient_loss | 0.0129   |
|    std                  | 0.0527   |
|    value_loss           | 0.0429   |
--------------------------------------


--------------------------------------
| rollout/                |          |
|    ep_len_mean          | 998      |
|    ep_rew_mean          | -68      |
| time/                   |          |
|    fps                  | 196      |
|    iterations           | 22       |
|    time_elapsed         | 916      |
|    total_timesteps      | 1430174  |
| train/                  |          |
|    approx_kl            | 1.425946 |
|    clip_fraction        | 0.737    |
|    clip_range           | 0.2      |
|    entropy_loss         | 4.55     |
|    explained_variance   | 0.983    |
|    learning_rate        | 0.0003   |
|    loss                 | -0.00807 |
|    n_updates            | 1220     |
|    policy_gradient_loss | 0.0439   |
|    std                  | 0.0528   |
|    value_loss           | 0.0461   |
--------------------------------------


---------------------------------------
| rollout/                |           |
|    ep_len_mean          | 998       |
|    ep_rew_mean          | -67.6     |
| time/                   |           |
|    fps                  | 198       |
|    iterations           | 23        |
|    time_elapsed         | 947       |
|    total_timesteps      | 1438366   |
| train/                  |           |
|    approx_kl            | 1.2212979 |
|    clip_fraction        | 0.74      |
|    clip_range           | 0.2       |
|    entropy_loss         | 4.51      |
|    explained_variance   | 0.987     |
|    learning_rate        | 0.0003    |
|    loss                 | 0.0657    |
|    n_updates            | 1230      |
|    policy_gradient_loss | 0.0543    |
|    std                  | 0.0533    |
|    value_loss           | 0.0365    |
---------------------------------------


---------------------------------------
| rollout/                |           |
|    ep_len_mean          | 998       |
|    ep_rew_mean          | -67.3     |
| time/                   |           |
|    fps                  | 198       |
|    iterations           | 24        |
|    time_elapsed         | 991       |
|    total_timesteps      | 1446558   |
| train/                  |           |
|    approx_kl            | 0.7374873 |
|    clip_fraction        | 0.726     |
|    clip_range           | 0.2       |
|    entropy_loss         | 4.46      |
|    explained_variance   | 0.976     |
|    learning_rate        | 0.0003    |
|    loss                 | 0.0559    |
|    n_updates            | 1240      |
|    policy_gradient_loss | 0.0669    |
|    std                  | 0.0553    |
|    value_loss           | 0.0332    |
---------------------------------------


---------------------------------------
| eval/                   |           |
|    mean_ep_length       | 1e+03     |
|    mean_reward          | -78.4     |
| time/                   |           |
|    total_timesteps      | 1449950   |
| train/                  |           |
|    approx_kl            | 0.8719834 |
|    clip_fraction        | 0.708     |
|    clip_range           | 0.2       |
|    entropy_loss         | 4.37      |
|    explained_variance   | 0.977     |
|    learning_rate        | 0.0003    |
|    loss                 | -0.00255  |
|    n_updates            | 1250      |
|    policy_gradient_loss | 0.0425    |
|    std                  | 0.0566    |
|    value_loss           | 0.0342    |
---------------------------------------


---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1e+03    |
|    ep_rew_mean     | -71.4    |
| time/              |          |
|    fps             | 191      |
|    iterations      | 25       |
|    time_elapsed    | 1068     |
|    total_timesteps | 1454750  |
---------------------------------


---------------------------------------
| rollout/                |           |
|    ep_len_mean          | 1e+03     |
|    ep_rew_mean          | -73.1     |
| time/                   |           |
|    fps                  | 191       |
|    iterations           | 26        |
|    time_elapsed         | 1111      |
|    total_timesteps      | 1462942   |
| train/                  |           |
|    approx_kl            | 1.9946144 |
|    clip_fraction        | 0.716     |
|    clip_range           | 0.2       |
|    entropy_loss         | 4.3       |
|    explained_variance   | 0.985     |
|    learning_rate        | 0.0003    |
|    loss                 | -0.0198   |
|    n_updates            | 1260      |
|    policy_gradient_loss | 0.0405    |
|    std                  | 0.0583    |
|    value_loss           | 0.0299    |
---------------------------------------


----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1e+03      |
|    ep_rew_mean          | -73.8      |
| time/                   |            |
|    fps                  | 192        |
|    iterations           | 27         |
|    time_elapsed         | 1148       |
|    total_timesteps      | 1471134    |
| train/                  |            |
|    approx_kl            | 0.94442075 |
|    clip_fraction        | 0.69       |
|    clip_range           | 0.2        |
|    entropy_loss         | 4.24       |
|    explained_variance   | 0.976      |
|    learning_rate        | 0.0003     |
|    loss                 | 0.00232    |
|    n_updates            | 1270       |
|    policy_gradient_loss | 0.0353     |
|    std                  | 0.0587     |
|    value_loss           | 0.0365     |
----------------------------------------


--------------------------------------
| rollout/                |          |
|    ep_len_mean          | 1e+03    |
|    ep_rew_mean          | -74.3    |
| time/                   |          |
|    fps                  | 194      |
|    iterations           | 28       |
|    time_elapsed         | 1177     |
|    total_timesteps      | 1479326  |
| train/                  |          |
|    approx_kl            | 1.694422 |
|    clip_fraction        | 0.711    |
|    clip_range           | 0.2      |
|    entropy_loss         | 4.22     |
|    explained_variance   | 0.988    |
|    learning_rate        | 0.0003   |
|    loss                 | 0.0887   |
|    n_updates            | 1280     |
|    policy_gradient_loss | 0.0249   |
|    std                  | 0.0593   |
|    value_loss           | 0.0285   |
--------------------------------------


---------------------------------------
| rollout/                |           |
|    ep_len_mean          | 1e+03     |
|    ep_rew_mean          | -74.4     |
| time/                   |           |
|    fps                  | 196       |
|    iterations           | 29        |
|    time_elapsed         | 1207      |
|    total_timesteps      | 1487518   |
| train/                  |           |
|    approx_kl            | 1.0228643 |
|    clip_fraction        | 0.7       |
|    clip_range           | 0.2       |
|    entropy_loss         | 4.18      |
|    explained_variance   | 0.981     |
|    learning_rate        | 0.0003    |
|    loss                 | 0.0136    |
|    n_updates            | 1290      |
|    policy_gradient_loss | 0.0317    |
|    std                  | 0.06      |
|    value_loss           | 0.0321    |
---------------------------------------


---------------------------------------
| rollout/                |           |
|    ep_len_mean          | 1e+03     |
|    ep_rew_mean          | -74.6     |
| time/                   |           |
|    fps                  | 198       |
|    iterations           | 30        |
|    time_elapsed         | 1237      |
|    total_timesteps      | 1495710   |
| train/                  |           |
|    approx_kl            | 1.6046503 |
|    clip_fraction        | 0.717     |
|    clip_range           | 0.2       |
|    entropy_loss         | 4.16      |
|    explained_variance   | 0.98      |
|    learning_rate        | 0.0003    |
|    loss                 | -0.0238   |
|    n_updates            | 1300      |
|    policy_gradient_loss | 0.0332    |
|    std                  | 0.0604    |
|    value_loss           | 0.0358    |
---------------------------------------


----------------------------------------
| eval/                   |            |
|    mean_ep_length       | 1e+03      |
|    mean_reward          | -78.5      |
| time/                   |            |
|    total_timesteps      | 1499950    |
| train/                  |            |
|    approx_kl            | 0.67103827 |
|    clip_fraction        | 0.602      |
|    clip_range           | 0.2        |
|    entropy_loss         | 4.17       |
|    explained_variance   | 0.98       |
|    learning_rate        | 0.0003     |
|    loss                 | 0.0347     |
|    n_updates            | 1310       |
|    policy_gradient_loss | 0.0156     |
|    std                  | 0.0597     |
|    value_loss           | 0.0208     |
----------------------------------------


---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1e+03    |
|    ep_rew_mean     | -75.1    |
| time/              |          |
|    fps             | 192      |
|    iterations      | 31       |
|    time_elapsed    | 1319     |
|    total_timesteps | 1503902  |
---------------------------------


---------------------------------------
| rollout/                |           |
|    ep_len_mean          | 1e+03     |
|    ep_rew_mean          | -75.8     |
| time/                   |           |
|    fps                  | 192       |
|    iterations           | 32        |
|    time_elapsed         | 1362      |
|    total_timesteps      | 1512094   |
| train/                  |           |
|    approx_kl            | 1.4344342 |
|    clip_fraction        | 0.707     |
|    clip_range           | 0.2       |
|    entropy_loss         | 4.2       |
|    explained_variance   | 0.971     |
|    learning_rate        | 0.0003    |
|    loss                 | -0.028    |
|    n_updates            | 1320      |
|    policy_gradient_loss | 0.0256    |
|    std                  | 0.0589    |
|    value_loss           | 0.0283    |
---------------------------------------


---------------------------------------
| rollout/                |           |
|    ep_len_mean          | 1e+03     |
|    ep_rew_mean          | -75.9     |
| time/                   |           |
|    fps                  | 192       |
|    iterations           | 33        |
|    time_elapsed         | 1402      |
|    total_timesteps      | 1520286   |
| train/                  |           |
|    approx_kl            | 0.7649287 |
|    clip_fraction        | 0.697     |
|    clip_range           | 0.2       |
|    entropy_loss         | 4.2       |
|    explained_variance   | 0.97      |
|    learning_rate        | 0.0003    |
|    loss                 | -0.0254   |
|    n_updates            | 1330      |
|    policy_gradient_loss | 0.0341    |
|    std                  | 0.0596    |
|    value_loss           | 0.0328    |
---------------------------------------


----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1e+03      |
|    ep_rew_mean          | -76.2      |
| time/                   |            |
|    fps                  | 192        |
|    iterations           | 34         |
|    time_elapsed         | 1449       |
|    total_timesteps      | 1528478    |
| train/                  |            |
|    approx_kl            | 0.51876044 |
|    clip_fraction        | 0.677      |
|    clip_range           | 0.2        |
|    entropy_loss         | 4.16       |
|    explained_variance   | 0.988      |
|    learning_rate        | 0.0003     |
|    loss                 | -0.00343   |
|    n_updates            | 1340       |
|    policy_gradient_loss | 0.0395     |
|    std                  | 0.0607     |
|    value_loss           | 0.0241     |
----------------------------------------


----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1e+03      |
|    ep_rew_mean          | -76.7      |
| time/                   |            |
|    fps                  | 191        |
|    iterations           | 35         |
|    time_elapsed         | 1497       |
|    total_timesteps      | 1536670    |
| train/                  |            |
|    approx_kl            | 0.55271184 |
|    clip_fraction        | 0.656      |
|    clip_range           | 0.2        |
|    entropy_loss         | 4.13       |
|    explained_variance   | 0.977      |
|    learning_rate        | 0.0003     |
|    loss                 | 0.0328     |
|    n_updates            | 1350       |
|    policy_gradient_loss | 0.0295     |
|    std                  | 0.0602     |
|    value_loss           | 0.0317     |
----------------------------------------


----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1e+03      |
|    ep_rew_mean          | -76.7      |
| time/                   |            |
|    fps                  | 191        |
|    iterations           | 36         |
|    time_elapsed         | 1538       |
|    total_timesteps      | 1544862    |
| train/                  |            |
|    approx_kl            | 0.36869538 |
|    clip_fraction        | 0.644      |
|    clip_range           | 0.2        |
|    entropy_loss         | 4.13       |
|    explained_variance   | 0.986      |
|    learning_rate        | 0.0003     |
|    loss                 | 0.123      |
|    n_updates            | 1360       |
|    policy_gradient_loss | 0.0345     |
|    std                  | 0.0611     |
|    value_loss           | 0.0235     |
----------------------------------------


----------------------------------------
| eval/                   |            |
|    mean_ep_length       | 1e+03      |
|    mean_reward          | -76.2      |
| time/                   |            |
|    total_timesteps      | 1549950    |
| train/                  |            |
|    approx_kl            | 0.35467952 |
|    clip_fraction        | 0.636      |
|    clip_range           | 0.2        |
|    entropy_loss         | 4.09       |
|    explained_variance   | 0.984      |
|    learning_rate        | 0.0003     |
|    loss                 | 0.00571    |
|    n_updates            | 1370       |
|    policy_gradient_loss | 0.0236     |
|    std                  | 0.0615     |
|    value_loss           | 0.029      |
----------------------------------------


---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1e+03    |
|    ep_rew_mean     | -76.5    |
| time/              |          |
|    fps             | 185      |
|    iterations      | 37       |
|    time_elapsed    | 1637     |
|    total_timesteps | 1553054  |
---------------------------------


--------------------------------------
| rollout/                |          |
|    ep_len_mean          | 1e+03    |
|    ep_rew_mean          | -76.2    |
| time/                   |          |
|    fps                  | 184      |
|    iterations           | 38       |
|    time_elapsed         | 1685     |
|    total_timesteps      | 1561246  |
| train/                  |          |
|    approx_kl            | 1.561044 |
|    clip_fraction        | 0.686    |
|    clip_range           | 0.2      |
|    entropy_loss         | 4.09     |
|    explained_variance   | 0.979    |
|    learning_rate        | 0.0003   |
|    loss                 | 0.0114   |
|    n_updates            | 1380     |
|    policy_gradient_loss | 0.00895  |
|    std                  | 0.0611   |
|    value_loss           | 0.0316   |
--------------------------------------


----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1e+03      |
|    ep_rew_mean          | -76.1      |
| time/                   |            |
|    fps                  | 185        |
|    iterations           | 39         |
|    time_elapsed         | 1726       |
|    total_timesteps      | 1569438    |
| train/                  |            |
|    approx_kl            | 0.42855942 |
|    clip_fraction        | 0.666      |
|    clip_range           | 0.2        |
|    entropy_loss         | 4.11       |
|    explained_variance   | 0.986      |
|    learning_rate        | 0.0003     |
|    loss                 | 0.00283    |
|    n_updates            | 1390       |
|    policy_gradient_loss | 0.0197     |
|    std                  | 0.0611     |
|    value_loss           | 0.0318     |
----------------------------------------


----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1e+03      |
|    ep_rew_mean          | -76.3      |
| time/                   |            |
|    fps                  | 185        |
|    iterations           | 40         |
|    time_elapsed         | 1767       |
|    total_timesteps      | 1577630    |
| train/                  |            |
|    approx_kl            | 0.96496356 |
|    clip_fraction        | 0.676      |
|    clip_range           | 0.2        |
|    entropy_loss         | 4.11       |
|    explained_variance   | 0.989      |
|    learning_rate        | 0.0003     |
|    loss                 | 5.51e-05   |
|    n_updates            | 1400       |
|    policy_gradient_loss | 0.0194     |
|    std                  | 0.0608     |
|    value_loss           | 0.0327     |
----------------------------------------


----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1e+03      |
|    ep_rew_mean          | -75.8      |
| time/                   |            |
|    fps                  | 185        |
|    iterations           | 41         |
|    time_elapsed         | 1808       |
|    total_timesteps      | 1585822    |
| train/                  |            |
|    approx_kl            | 0.44161856 |
|    clip_fraction        | 0.663      |
|    clip_range           | 0.2        |
|    entropy_loss         | 4.11       |
|    explained_variance   | 0.989      |
|    learning_rate        | 0.0003     |
|    loss                 | -0.00997   |
|    n_updates            | 1410       |
|    policy_gradient_loss | 0.022      |
|    std                  | 0.0611     |
|    value_loss           | 0.0362     |
----------------------------------------


---------------------------------------
| rollout/                |           |
|    ep_len_mean          | 1e+03     |
|    ep_rew_mean          | -75.6     |
| time/                   |           |
|    fps                  | 185       |
|    iterations           | 42        |
|    time_elapsed         | 1859      |
|    total_timesteps      | 1594014   |
| train/                  |           |
|    approx_kl            | 0.3758245 |
|    clip_fraction        | 0.632     |
|    clip_range           | 0.2       |
|    entropy_loss         | 4.09      |
|    explained_variance   | 0.987     |
|    learning_rate        | 0.0003    |
|    loss                 | -0.0381   |
|    n_updates            | 1420      |
|    policy_gradient_loss | 0.0221    |
|    std                  | 0.0617    |
|    value_loss           | 0.0434    |
---------------------------------------


---------------------------------------
| eval/                   |           |
|    mean_ep_length       | 1e+03     |
|    mean_reward          | -78.7     |
| time/                   |           |
|    total_timesteps      | 1599950   |
| train/                  |           |
|    approx_kl            | 0.7258016 |
|    clip_fraction        | 0.6       |
|    clip_range           | 0.2       |
|    entropy_loss         | 4.11      |
|    explained_variance   | 0.984     |
|    learning_rate        | 0.0003    |
|    loss                 | -0.0239   |
|    n_updates            | 1430      |
|    policy_gradient_loss | 0.0178    |
|    std                  | 0.0607    |
|    value_loss           | 0.0371    |
---------------------------------------


---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1e+03    |
|    ep_rew_mean     | -75.4    |
| time/              |          |
|    fps             | 180      |
|    iterations      | 43       |
|    time_elapsed    | 1951     |
|    total_timesteps | 1602206  |
---------------------------------


---------------------------------------
| rollout/                |           |
|    ep_len_mean          | 1e+03     |
|    ep_rew_mean          | -75.3     |
| time/                   |           |
|    fps                  | 180       |
|    iterations           | 44        |
|    time_elapsed         | 1994      |
|    total_timesteps      | 1610398   |
| train/                  |           |
|    approx_kl            | 0.6266417 |
|    clip_fraction        | 0.596     |
|    clip_range           | 0.2       |
|    entropy_loss         | 4.13      |
|    explained_variance   | 0.976     |
|    learning_rate        | 0.0003    |
|    loss                 | -0.0329   |
|    n_updates            | 1440      |
|    policy_gradient_loss | 0.00637   |
|    std                  | 0.0605    |
|    value_loss           | 0.0308    |
---------------------------------------


--------------------------------------
| rollout/                |          |
|    ep_len_mean          | 1e+03    |
|    ep_rew_mean          | -74.8    |
| time/                   |          |
|    fps                  | 181      |
|    iterations           | 45       |
|    time_elapsed         | 2033     |
|    total_timesteps      | 1618590  |
| train/                  |          |
|    approx_kl            | 0.615461 |
|    clip_fraction        | 0.592    |
|    clip_range           | 0.2      |
|    entropy_loss         | 4.17     |
|    explained_variance   | 0.983    |
|    learning_rate        | 0.0003   |
|    loss                 | 0.0125   |
|    n_updates            | 1450     |
|    policy_gradient_loss | 0.0188   |
|    std                  | 0.0591   |
|    value_loss           | 0.0361   |
--------------------------------------


---------------------------------------
| rollout/                |           |
|    ep_len_mean          | 1e+03     |
|    ep_rew_mean          | -74.4     |
| time/                   |           |
|    fps                  | 181       |
|    iterations           | 46        |
|    time_elapsed         | 2071      |
|    total_timesteps      | 1626782   |
| train/                  |           |
|    approx_kl            | 0.8152946 |
|    clip_fraction        | 0.642     |
|    clip_range           | 0.2       |
|    entropy_loss         | 4.21      |
|    explained_variance   | 0.978     |
|    learning_rate        | 0.0003    |
|    loss                 | 0.104     |
|    n_updates            | 1460      |
|    policy_gradient_loss | 0.0101    |
|    std                  | 0.0585    |
|    value_loss           | 0.0428    |
---------------------------------------


----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1e+03      |
|    ep_rew_mean          | -73.6      |
| time/                   |            |
|    fps                  | 181        |
|    iterations           | 47         |
|    time_elapsed         | 2117       |
|    total_timesteps      | 1634974    |
| train/                  |            |
|    approx_kl            | 0.55781996 |
|    clip_fraction        | 0.611      |
|    clip_range           | 0.2        |
|    entropy_loss         | 4.26       |
|    explained_variance   | 0.984      |
|    learning_rate        | 0.0003     |
|    loss                 | -0.0256    |
|    n_updates            | 1470       |
|    policy_gradient_loss | 0.0105     |
|    std                  | 0.0578     |
|    value_loss           | 0.0379     |
----------------------------------------


---------------------------------------
| rollout/                |           |
|    ep_len_mean          | 1e+03     |
|    ep_rew_mean          | -73.3     |
| time/                   |           |
|    fps                  | 182       |
|    iterations           | 48        |
|    time_elapsed         | 2154      |
|    total_timesteps      | 1643166   |
| train/                  |           |
|    approx_kl            | 0.7003558 |
|    clip_fraction        | 0.679     |
|    clip_range           | 0.2       |
|    entropy_loss         | 4.26      |
|    explained_variance   | 0.984     |
|    learning_rate        | 0.0003    |
|    loss                 | -0.0312   |
|    n_updates            | 1480      |
|    policy_gradient_loss | 0.0268    |
|    std                  | 0.0577    |
|    value_loss           | 0.0446    |
---------------------------------------


---------------------------------------
| eval/                   |           |
|    mean_ep_length       | 1e+03     |
|    mean_reward          | -74.4     |
| time/                   |           |
|    total_timesteps      | 1649950   |
| train/                  |           |
|    approx_kl            | 0.4639774 |
|    clip_fraction        | 0.613     |
|    clip_range           | 0.2       |
|    entropy_loss         | 4.3       |
|    explained_variance   | 0.989     |
|    learning_rate        | 0.0003    |
|    loss                 | -0.00535  |
|    n_updates            | 1490      |
|    policy_gradient_loss | 0.0075    |
|    std                  | 0.0572    |
|    value_loss           | 0.0273    |
---------------------------------------


---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1e+03    |
|    ep_rew_mean     | -72.9    |
| time/              |          |
|    fps             | 178      |
|    iterations      | 49       |
|    time_elapsed    | 2245     |
|    total_timesteps | 1651358  |
---------------------------------


---------------------------------------
| rollout/                |           |
|    ep_len_mean          | 1e+03     |
|    ep_rew_mean          | -72.9     |
| time/                   |           |
|    fps                  | 178       |
|    iterations           | 50        |
|    time_elapsed         | 2289      |
|    total_timesteps      | 1659550   |
| train/                  |           |
|    approx_kl            | 1.0419631 |
|    clip_fraction        | 0.638     |
|    clip_range           | 0.2       |
|    entropy_loss         | 4.29      |
|    explained_variance   | 0.991     |
|    learning_rate        | 0.0003    |
|    loss                 | 0.0256    |
|    n_updates            | 1500      |
|    policy_gradient_loss | 0.0243    |
|    std                  | 0.057     |
|    value_loss           | 0.0275    |
---------------------------------------


----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1e+03      |
|    ep_rew_mean          | -73.1      |
| time/                   |            |
|    fps                  | 179        |
|    iterations           | 51         |
|    time_elapsed         | 2332       |
|    total_timesteps      | 1667742    |
| train/                  |            |
|    approx_kl            | 0.46963954 |
|    clip_fraction        | 0.636      |
|    clip_range           | 0.2        |
|    entropy_loss         | 4.31       |
|    explained_variance   | 0.981      |
|    learning_rate        | 0.0003     |
|    loss                 | -0.0297    |
|    n_updates            | 1510       |
|    policy_gradient_loss | 0.0113     |
|    std                  | 0.0568     |
|    value_loss           | 0.0309     |
----------------------------------------


----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1e+03      |
|    ep_rew_mean          | -72.9      |
| time/                   |            |
|    fps                  | 179        |
|    iterations           | 52         |
|    time_elapsed         | 2369       |
|    total_timesteps      | 1675934    |
| train/                  |            |
|    approx_kl            | 0.52952784 |
|    clip_fraction        | 0.672      |
|    clip_range           | 0.2        |
|    entropy_loss         | 4.31       |
|    explained_variance   | 0.986      |
|    learning_rate        | 0.0003     |
|    loss                 | -0.0189    |
|    n_updates            | 1520       |
|    policy_gradient_loss | 0.0235     |
|    std                  | 0.0573     |
|    value_loss           | 0.0364     |
----------------------------------------


----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1e+03      |
|    ep_rew_mean          | -73        |
| time/                   |            |
|    fps                  | 180        |
|    iterations           | 53         |
|    time_elapsed         | 2405       |
|    total_timesteps      | 1684126    |
| train/                  |            |
|    approx_kl            | 0.44384828 |
|    clip_fraction        | 0.641      |
|    clip_range           | 0.2        |
|    entropy_loss         | 4.29       |
|    explained_variance   | 0.989      |
|    learning_rate        | 0.0003     |
|    loss                 | 0.0405     |
|    n_updates            | 1530       |
|    policy_gradient_loss | 0.0148     |
|    std                  | 0.0577     |
|    value_loss           | 0.0319     |
----------------------------------------


---------------------------------------
| rollout/                |           |
|    ep_len_mean          | 1e+03     |
|    ep_rew_mean          | -73.1     |
| time/                   |           |
|    fps                  | 180       |
|    iterations           | 54        |
|    time_elapsed         | 2452      |
|    total_timesteps      | 1692318   |
| train/                  |           |
|    approx_kl            | 0.5414933 |
|    clip_fraction        | 0.625     |
|    clip_range           | 0.2       |
|    entropy_loss         | 4.29      |
|    explained_variance   | 0.986     |
|    learning_rate        | 0.0003    |
|    loss                 | -0.0173   |
|    n_updates            | 1540      |
|    policy_gradient_loss | 0.0139    |
|    std                  | 0.0575    |
|    value_loss           | 0.0257    |
---------------------------------------


----------------------------------------
| eval/                   |            |
|    mean_ep_length       | 1e+03      |
|    mean_reward          | -67        |
| time/                   |            |
|    total_timesteps      | 1699950    |
| train/                  |            |
|    approx_kl            | 0.35730487 |
|    clip_fraction        | 0.645      |
|    clip_range           | 0.2        |
|    entropy_loss         | 4.25       |
|    explained_variance   | 0.991      |
|    learning_rate        | 0.0003     |
|    loss                 | -0.0198    |
|    n_updates            | 1550       |
|    policy_gradient_loss | 0.0201     |
|    std                  | 0.0582     |
|    value_loss           | 0.0253     |
----------------------------------------


---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1e+03    |
|    ep_rew_mean     | -73.2    |
| time/              |          |
|    fps             | 177      |
|    iterations      | 55       |
|    time_elapsed    | 2534     |
|    total_timesteps | 1700510  |
---------------------------------


----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1e+03      |
|    ep_rew_mean          | -72.7      |
| time/                   |            |
|    fps                  | 178        |
|    iterations           | 56         |
|    time_elapsed         | 2575       |
|    total_timesteps      | 1708702    |
| train/                  |            |
|    approx_kl            | 0.48260885 |
|    clip_fraction        | 0.659      |
|    clip_range           | 0.2        |
|    entropy_loss         | 4.24       |
|    explained_variance   | 0.99       |
|    learning_rate        | 0.0003     |
|    loss                 | -0.0595    |
|    n_updates            | 1560       |
|    policy_gradient_loss | 0.0214     |
|    std                  | 0.0584     |
|    value_loss           | 0.0289     |
----------------------------------------


----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1e+03      |
|    ep_rew_mean          | -72.4      |
| time/                   |            |
|    fps                  | 179        |
|    iterations           | 57         |
|    time_elapsed         | 2605       |
|    total_timesteps      | 1716894    |
| train/                  |            |
|    approx_kl            | 0.38696375 |
|    clip_fraction        | 0.673      |
|    clip_range           | 0.2        |
|    entropy_loss         | 4.22       |
|    explained_variance   | 0.993      |
|    learning_rate        | 0.0003     |
|    loss                 | 0.067      |
|    n_updates            | 1570       |
|    policy_gradient_loss | 0.0185     |
|    std                  | 0.0594     |
|    value_loss           | 0.0249     |
----------------------------------------


----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1e+03      |
|    ep_rew_mean          | -72        |
| time/                   |            |
|    fps                  | 180        |
|    iterations           | 58         |
|    time_elapsed         | 2636       |
|    total_timesteps      | 1725086    |
| train/                  |            |
|    approx_kl            | 0.43742186 |
|    clip_fraction        | 0.649      |
|    clip_range           | 0.2        |
|    entropy_loss         | 4.17       |
|    explained_variance   | 0.984      |
|    learning_rate        | 0.0003     |
|    loss                 | -0.0451    |
|    n_updates            | 1580       |
|    policy_gradient_loss | 0.0227     |
|    std                  | 0.0601     |
|    value_loss           | 0.0325     |
----------------------------------------


----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1e+03      |
|    ep_rew_mean          | -72.1      |
| time/                   |            |
|    fps                  | 181        |
|    iterations           | 59         |
|    time_elapsed         | 2666       |
|    total_timesteps      | 1733278    |
| train/                  |            |
|    approx_kl            | 0.29346514 |
|    clip_fraction        | 0.626      |
|    clip_range           | 0.2        |
|    entropy_loss         | 4.15       |
|    explained_variance   | 0.989      |
|    learning_rate        | 0.0003     |
|    loss                 | -0.0256    |
|    n_updates            | 1590       |
|    policy_gradient_loss | 0.00756    |
|    std                  | 0.0602     |
|    value_loss           | 0.0324     |
----------------------------------------


----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1e+03      |
|    ep_rew_mean          | -72.5      |
| time/                   |            |
|    fps                  | 181        |
|    iterations           | 60         |
|    time_elapsed         | 2705       |
|    total_timesteps      | 1741470    |
| train/                  |            |
|    approx_kl            | 0.47461584 |
|    clip_fraction        | 0.616      |
|    clip_range           | 0.2        |
|    entropy_loss         | 4.17       |
|    explained_variance   | 0.987      |
|    learning_rate        | 0.0003     |
|    loss                 | -0.0392    |
|    n_updates            | 1600       |
|    policy_gradient_loss | 0.00543    |
|    std                  | 0.0595     |
|    value_loss           | 0.0265     |
----------------------------------------


----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1e+03      |
|    ep_rew_mean          | -72.4      |
| time/                   |            |
|    fps                  | 181        |
|    iterations           | 61         |
|    time_elapsed         | 2747       |
|    total_timesteps      | 1749662    |
| train/                  |            |
|    approx_kl            | 0.46981454 |
|    clip_fraction        | 0.65       |
|    clip_range           | 0.2        |
|    entropy_loss         | 4.16       |
|    explained_variance   | 0.989      |
|    learning_rate        | 0.0003     |
|    loss                 | -0.0153    |
|    n_updates            | 1610       |
|    policy_gradient_loss | 0.0144     |
|    std                  | 0.0599     |
|    value_loss           | 0.0286     |
----------------------------------------


----------------------------------------
| eval/                   |            |
|    mean_ep_length       | 1e+03      |
|    mean_reward          | -70.5      |
| time/                   |            |
|    total_timesteps      | 1749950    |
| train/                  |            |
|    approx_kl            | 0.56474996 |
|    clip_fraction        | 0.661      |
|    clip_range           | 0.2        |
|    entropy_loss         | 4.18       |
|    explained_variance   | 0.987      |
|    learning_rate        | 0.0003     |
|    loss                 | -0.0141    |
|    n_updates            | 1620       |
|    policy_gradient_loss | 0.0135     |
|    std                  | 0.0596     |
|    value_loss           | 0.0383     |
----------------------------------------


---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1e+03    |
|    ep_rew_mean     | -72.3    |
| time/              |          |
|    fps             | 179      |
|    iterations      | 62       |
|    time_elapsed    | 2824     |
|    total_timesteps | 1757854  |
---------------------------------


--------------------------------------
| rollout/                |          |
|    ep_len_mean          | 1e+03    |
|    ep_rew_mean          | -71.5    |
| time/                   |          |
|    fps                  | 180      |
|    iterations           | 63       |
|    time_elapsed         | 2866     |
|    total_timesteps      | 1766046  |
| train/                  |          |
|    approx_kl            | 9.164274 |
|    clip_fraction        | 0.947    |
|    clip_range           | 0.2      |
|    entropy_loss         | 4.22     |
|    explained_variance   | 0.985    |
|    learning_rate        | 0.0003   |
|    loss                 | -0.059   |
|    n_updates            | 1630     |
|    policy_gradient_loss | 0.0574   |
|    std                  | 0.0582   |
|    value_loss           | 0.044    |
--------------------------------------


----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1e+03      |
|    ep_rew_mean          | -71.7      |
| time/                   |            |
|    fps                  | 180        |
|    iterations           | 64         |
|    time_elapsed         | 2900       |
|    total_timesteps      | 1774238    |
| train/                  |            |
|    approx_kl            | 0.34272176 |
|    clip_fraction        | 0.592      |
|    clip_range           | 0.2        |
|    entropy_loss         | 4.28       |
|    explained_variance   | 0.987      |
|    learning_rate        | 0.0003     |
|    loss                 | -0.0503    |
|    n_updates            | 1640       |
|    policy_gradient_loss | -0.00309   |
|    std                  | 0.0574     |
|    value_loss           | 0.0366     |
----------------------------------------


----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1e+03      |
|    ep_rew_mean          | -71.4      |
| time/                   |            |
|    fps                  | 181        |
|    iterations           | 65         |
|    time_elapsed         | 2941       |
|    total_timesteps      | 1782430    |
| train/                  |            |
|    approx_kl            | 0.30744994 |
|    clip_fraction        | 0.616      |
|    clip_range           | 0.2        |
|    entropy_loss         | 4.31       |
|    explained_variance   | 0.989      |
|    learning_rate        | 0.0003     |
|    loss                 | -0.0215    |
|    n_updates            | 1650       |
|    policy_gradient_loss | 0.00264    |
|    std                  | 0.057      |
|    value_loss           | 0.0322     |
----------------------------------------


----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1e+03      |
|    ep_rew_mean          | -71.3      |
| time/                   |            |
|    fps                  | 181        |
|    iterations           | 66         |
|    time_elapsed         | 2975       |
|    total_timesteps      | 1790622    |
| train/                  |            |
|    approx_kl            | 0.45994008 |
|    clip_fraction        | 0.652      |
|    clip_range           | 0.2        |
|    entropy_loss         | 4.29       |
|    explained_variance   | 0.99       |
|    learning_rate        | 0.0003     |
|    loss                 | 0.000559   |
|    n_updates            | 1660       |
|    policy_gradient_loss | 0.00572    |
|    std                  | 0.0573     |
|    value_loss           | 0.038      |
----------------------------------------


----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1e+03      |
|    ep_rew_mean          | -70.7      |
| time/                   |            |
|    fps                  | 182        |
|    iterations           | 67         |
|    time_elapsed         | 3011       |
|    total_timesteps      | 1798814    |
| train/                  |            |
|    approx_kl            | 0.37656483 |
|    clip_fraction        | 0.657      |
|    clip_range           | 0.2        |
|    entropy_loss         | 4.3        |
|    explained_variance   | 0.985      |
|    learning_rate        | 0.0003     |
|    loss                 | -0.0438    |
|    n_updates            | 1670       |
|    policy_gradient_loss | 0.00807    |
|    std                  | 0.0573     |
|    value_loss           | 0.0407     |
----------------------------------------


----------------------------------------
| eval/                   |            |
|    mean_ep_length       | 1e+03      |
|    mean_reward          | -67.1      |
| time/                   |            |
|    total_timesteps      | 1799950    |
| train/                  |            |
|    approx_kl            | 0.42035627 |
|    clip_fraction        | 0.691      |
|    clip_range           | 0.2        |
|    entropy_loss         | 4.28       |
|    explained_variance   | 0.991      |
|    learning_rate        | 0.0003     |
|    loss                 | -0.0608    |
|    n_updates            | 1680       |
|    policy_gradient_loss | 0.0148     |
|    std                  | 0.0578     |
|    value_loss           | 0.0367     |
----------------------------------------


---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1e+03    |
|    ep_rew_mean     | -70.7    |
| time/              |          |
|    fps             | 180      |
|    iterations      | 68       |
|    time_elapsed    | 3094     |
|    total_timesteps | 1807006  |
---------------------------------


----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1e+03      |
|    ep_rew_mean          | -70.4      |
| time/                   |            |
|    fps                  | 180        |
|    iterations           | 69         |
|    time_elapsed         | 3140       |
|    total_timesteps      | 1815198    |
| train/                  |            |
|    approx_kl            | 0.35457107 |
|    clip_fraction        | 0.655      |
|    clip_range           | 0.2        |
|    entropy_loss         | 4.26       |
|    explained_variance   | 0.987      |
|    learning_rate        | 0.0003     |
|    loss                 | -0.00918   |
|    n_updates            | 1690       |
|    policy_gradient_loss | 0.013      |
|    std                  | 0.0581     |
|    value_loss           | 0.0426     |
----------------------------------------


---------------------------------------
| rollout/                |           |
|    ep_len_mean          | 1e+03     |
|    ep_rew_mean          | -70.9     |
| time/                   |           |
|    fps                  | 180       |
|    iterations           | 70        |
|    time_elapsed         | 3174      |
|    total_timesteps      | 1823390   |
| train/                  |           |
|    approx_kl            | 0.3224172 |
|    clip_fraction        | 0.623     |
|    clip_range           | 0.2       |
|    entropy_loss         | 4.29      |
|    explained_variance   | 0.992     |
|    learning_rate        | 0.0003    |
|    loss                 | -0.0212   |
|    n_updates            | 1700      |
|    policy_gradient_loss | 0.00153   |
|    std                  | 0.0571    |
|    value_loss           | 0.0309    |
---------------------------------------


---------------------------------------
| rollout/                |           |
|    ep_len_mean          | 1e+03     |
|    ep_rew_mean          | -70.8     |
| time/                   |           |
|    fps                  | 181       |
|    iterations           | 71        |
|    time_elapsed         | 3209      |
|    total_timesteps      | 1831582   |
| train/                  |           |
|    approx_kl            | 0.2761214 |
|    clip_fraction        | 0.608     |
|    clip_range           | 0.2       |
|    entropy_loss         | 4.32      |
|    explained_variance   | 0.992     |
|    learning_rate        | 0.0003    |
|    loss                 | -0.0182   |
|    n_updates            | 1710      |
|    policy_gradient_loss | 0.00296   |
|    std                  | 0.0567    |
|    value_loss           | 0.0264    |
---------------------------------------


----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1e+03      |
|    ep_rew_mean          | -70.6      |
| time/                   |            |
|    fps                  | 181        |
|    iterations           | 72         |
|    time_elapsed         | 3244       |
|    total_timesteps      | 1839774    |
| train/                  |            |
|    approx_kl            | 0.33264872 |
|    clip_fraction        | 0.629      |
|    clip_range           | 0.2        |
|    entropy_loss         | 4.34       |
|    explained_variance   | 0.982      |
|    learning_rate        | 0.0003     |
|    loss                 | -0.0204    |
|    n_updates            | 1720       |
|    policy_gradient_loss | 0.0102     |
|    std                  | 0.0563     |
|    value_loss           | 0.0418     |
----------------------------------------


----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1e+03      |
|    ep_rew_mean          | -70.6      |
| time/                   |            |
|    fps                  | 182        |
|    iterations           | 73         |
|    time_elapsed         | 3273       |
|    total_timesteps      | 1847966    |
| train/                  |            |
|    approx_kl            | 0.28999624 |
|    clip_fraction        | 0.654      |
|    clip_range           | 0.2        |
|    entropy_loss         | 4.35       |
|    explained_variance   | 0.985      |
|    learning_rate        | 0.0003     |
|    loss                 | -0.0247    |
|    n_updates            | 1730       |
|    policy_gradient_loss | 0.013      |
|    std                  | 0.056      |
|    value_loss           | 0.0461     |
----------------------------------------


---------------------------------------
| eval/                   |           |
|    mean_ep_length       | 1e+03     |
|    mean_reward          | -69.9     |
| time/                   |           |
|    total_timesteps      | 1849950   |
| train/                  |           |
|    approx_kl            | 0.2826835 |
|    clip_fraction        | 0.592     |
|    clip_range           | 0.2       |
|    entropy_loss         | 4.38      |
|    explained_variance   | 0.985     |
|    learning_rate        | 0.0003    |
|    loss                 | -0.0348   |
|    n_updates            | 1740      |
|    policy_gradient_loss | 0.000923  |
|    std                  | 0.0554    |
|    value_loss           | 0.0387    |
---------------------------------------


---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1e+03    |
|    ep_rew_mean     | -70.9    |
| time/              |          |
|    fps             | 181      |
|    iterations      | 74       |
|    time_elapsed    | 3346     |
|    total_timesteps | 1856158  |
---------------------------------


--------------------------------------
| rollout/                |          |
|    ep_len_mean          | 1e+03    |
|    ep_rew_mean          | -71.2    |
| time/                   |          |
|    fps                  | 182      |
|    iterations           | 75       |
|    time_elapsed         | 3375     |
|    total_timesteps      | 1864350  |
| train/                  |          |
|    approx_kl            | 0.282882 |
|    clip_fraction        | 0.614    |
|    clip_range           | 0.2      |
|    entropy_loss         | 4.41     |
|    explained_variance   | 0.979    |
|    learning_rate        | 0.0003   |
|    loss                 | -0.0253  |
|    n_updates            | 1750     |
|    policy_gradient_loss | 0.0133   |
|    std                  | 0.055    |
|    value_loss           | 0.0509   |
--------------------------------------


----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1e+03      |
|    ep_rew_mean          | -71.2      |
| time/                   |            |
|    fps                  | 182        |
|    iterations           | 76         |
|    time_elapsed         | 3410       |
|    total_timesteps      | 1872542    |
| train/                  |            |
|    approx_kl            | 0.44329727 |
|    clip_fraction        | 0.627      |
|    clip_range           | 0.2        |
|    entropy_loss         | 4.42       |
|    explained_variance   | 0.983      |
|    learning_rate        | 0.0003     |
|    loss                 | -0.0193    |
|    n_updates            | 1760       |
|    policy_gradient_loss | 0.0083     |
|    std                  | 0.0551     |
|    value_loss           | 0.032      |
----------------------------------------


---------------------------------------
| rollout/                |           |
|    ep_len_mean          | 1e+03     |
|    ep_rew_mean          | -71.3     |
| time/                   |           |
|    fps                  | 183       |
|    iterations           | 77        |
|    time_elapsed         | 3438      |
|    total_timesteps      | 1880734   |
| train/                  |           |
|    approx_kl            | 0.3415745 |
|    clip_fraction        | 0.634     |
|    clip_range           | 0.2       |
|    entropy_loss         | 4.42      |
|    explained_variance   | 0.981     |
|    learning_rate        | 0.0003    |
|    loss                 | -0.017    |
|    n_updates            | 1770      |
|    policy_gradient_loss | 0.0133    |
|    std                  | 0.055     |
|    value_loss           | 0.0382    |
---------------------------------------


----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1e+03      |
|    ep_rew_mean          | -71.5      |
| time/                   |            |
|    fps                  | 184        |
|    iterations           | 78         |
|    time_elapsed         | 3467       |
|    total_timesteps      | 1888926    |
| train/                  |            |
|    approx_kl            | 0.32696256 |
|    clip_fraction        | 0.628      |
|    clip_range           | 0.2        |
|    entropy_loss         | 4.44       |
|    explained_variance   | 0.984      |
|    learning_rate        | 0.0003     |
|    loss                 | 0.0561     |
|    n_updates            | 1780       |
|    policy_gradient_loss | 0.00846    |
|    std                  | 0.0541     |
|    value_loss           | 0.0393     |
----------------------------------------


----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1e+03      |
|    ep_rew_mean          | -71.6      |
| time/                   |            |
|    fps                  | 185        |
|    iterations           | 79         |
|    time_elapsed         | 3495       |
|    total_timesteps      | 1897118    |
| train/                  |            |
|    approx_kl            | 0.33196044 |
|    clip_fraction        | 0.621      |
|    clip_range           | 0.2        |
|    entropy_loss         | 4.47       |
|    explained_variance   | 0.983      |
|    learning_rate        | 0.0003     |
|    loss                 | 0.0333     |
|    n_updates            | 1790       |
|    policy_gradient_loss | 0.00304    |
|    std                  | 0.0538     |
|    value_loss           | 0.0321     |
----------------------------------------


----------------------------------------
| eval/                   |            |
|    mean_ep_length       | 1e+03      |
|    mean_reward          | -73.7      |
| time/                   |            |
|    total_timesteps      | 1899950    |
| train/                  |            |
|    approx_kl            | 0.34463716 |
|    clip_fraction        | 0.615      |
|    clip_range           | 0.2        |
|    entropy_loss         | 4.52       |
|    explained_variance   | 0.983      |
|    learning_rate        | 0.0003     |
|    loss                 | -0.0428    |
|    n_updates            | 1800       |
|    policy_gradient_loss | 0.00486    |
|    std                  | 0.0531     |
|    value_loss           | 0.0351     |
----------------------------------------


---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1e+03    |
|    ep_rew_mean     | -72.3    |
| time/              |          |
|    fps             | 183      |
|    iterations      | 80       |
|    time_elapsed    | 3562     |
|    total_timesteps | 1905310  |
---------------------------------


--------------------------------------
| rollout/                |          |
|    ep_len_mean          | 1e+03    |
|    ep_rew_mean          | -72.5    |
| time/                   |          |
|    fps                  | 184      |
|    iterations           | 81       |
|    time_elapsed         | 3591     |
|    total_timesteps      | 1913502  |
| train/                  |          |
|    approx_kl            | 0.531317 |
|    clip_fraction        | 0.592    |
|    clip_range           | 0.2      |
|    entropy_loss         | 4.56     |
|    explained_variance   | 0.982    |
|    learning_rate        | 0.0003   |
|    loss                 | -0.0374  |
|    n_updates            | 1810     |
|    policy_gradient_loss | -0.00377 |
|    std                  | 0.0522   |
|    value_loss           | 0.0336   |
--------------------------------------


---------------------------------------
| rollout/                |           |
|    ep_len_mean          | 1e+03     |
|    ep_rew_mean          | -72.2     |
| time/                   |           |
|    fps                  | 185       |
|    iterations           | 82        |
|    time_elapsed         | 3627      |
|    total_timesteps      | 1921694   |
| train/                  |           |
|    approx_kl            | 0.5851048 |
|    clip_fraction        | 0.728     |
|    clip_range           | 0.2       |
|    entropy_loss         | 4.5       |
|    explained_variance   | 0.98      |
|    learning_rate        | 0.0003    |
|    loss                 | -0.0611   |
|    n_updates            | 1820      |
|    policy_gradient_loss | 0.0392    |
|    std                  | 0.054     |
|    value_loss           | 0.055     |
---------------------------------------


----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1e+03      |
|    ep_rew_mean          | -71.4      |
| time/                   |            |
|    fps                  | 185        |
|    iterations           | 83         |
|    time_elapsed         | 3665       |
|    total_timesteps      | 1929886    |
| train/                  |            |
|    approx_kl            | 0.42724806 |
|    clip_fraction        | 0.644      |
|    clip_range           | 0.2        |
|    entropy_loss         | 4.46       |
|    explained_variance   | 0.985      |
|    learning_rate        | 0.0003     |
|    loss                 | -0.0134    |
|    n_updates            | 1830       |
|    policy_gradient_loss | 0.00757    |
|    std                  | 0.0546     |
|    value_loss           | 0.0357     |
----------------------------------------


---------------------------------------
| rollout/                |           |
|    ep_len_mean          | 1e+03     |
|    ep_rew_mean          | -71       |
| time/                   |           |
|    fps                  | 185       |
|    iterations           | 84        |
|    time_elapsed         | 3700      |
|    total_timesteps      | 1938078   |
| train/                  |           |
|    approx_kl            | 0.3869899 |
|    clip_fraction        | 0.651     |
|    clip_range           | 0.2       |
|    entropy_loss         | 4.48      |
|    explained_variance   | 0.987     |
|    learning_rate        | 0.0003    |
|    loss                 | 0.0144    |
|    n_updates            | 1840      |
|    policy_gradient_loss | 0.00217   |
|    std                  | 0.0537    |
|    value_loss           | 0.039     |
---------------------------------------


---------------------------------------
| rollout/                |           |
|    ep_len_mean          | 1e+03     |
|    ep_rew_mean          | -70.7     |
| time/                   |           |
|    fps                  | 186       |
|    iterations           | 85        |
|    time_elapsed         | 3730      |
|    total_timesteps      | 1946270   |
| train/                  |           |
|    approx_kl            | 0.3664308 |
|    clip_fraction        | 0.614     |
|    clip_range           | 0.2       |
|    entropy_loss         | 4.53      |
|    explained_variance   | 0.988     |
|    learning_rate        | 0.0003    |
|    loss                 | -0.0345   |
|    n_updates            | 1850      |
|    policy_gradient_loss | -0.00402  |
|    std                  | 0.0525    |
|    value_loss           | 0.042     |
---------------------------------------


----------------------------------------
| eval/                   |            |
|    mean_ep_length       | 1e+03      |
|    mean_reward          | -69.9      |
| time/                   |            |
|    total_timesteps      | 1949950    |
| train/                  |            |
|    approx_kl            | 0.47037923 |
|    clip_fraction        | 0.604      |
|    clip_range           | 0.2        |
|    entropy_loss         | 4.6        |
|    explained_variance   | 0.99       |
|    learning_rate        | 0.0003     |
|    loss                 | -0.063     |
|    n_updates            | 1860       |
|    policy_gradient_loss | -0.00816   |
|    std                  | 0.0515     |
|    value_loss           | 0.0353     |
----------------------------------------


---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1e+03    |
|    ep_rew_mean     | -70.1    |
| time/              |          |
|    fps             | 185      |
|    iterations      | 86       |
|    time_elapsed    | 3797     |
|    total_timesteps | 1954462  |
---------------------------------


---------------------------------------
| rollout/                |           |
|    ep_len_mean          | 1e+03     |
|    ep_rew_mean          | -69.5     |
| time/                   |           |
|    fps                  | 186       |
|    iterations           | 87        |
|    time_elapsed         | 3826      |
|    total_timesteps      | 1962654   |
| train/                  |           |
|    approx_kl            | 0.8379854 |
|    clip_fraction        | 0.717     |
|    clip_range           | 0.2       |
|    entropy_loss         | 4.64      |
|    explained_variance   | 0.988     |
|    learning_rate        | 0.0003    |
|    loss                 | 0.0466    |
|    n_updates            | 1870      |
|    policy_gradient_loss | 0.0196    |
|    std                  | 0.0512    |
|    value_loss           | 0.0463    |
---------------------------------------


---------------------------------------
| rollout/                |           |
|    ep_len_mean          | 1e+03     |
|    ep_rew_mean          | -68.9     |
| time/                   |           |
|    fps                  | 186       |
|    iterations           | 88        |
|    time_elapsed         | 3862      |
|    total_timesteps      | 1970846   |
| train/                  |           |
|    approx_kl            | 0.5603474 |
|    clip_fraction        | 0.694     |
|    clip_range           | 0.2       |
|    entropy_loss         | 4.64      |
|    explained_variance   | 0.986     |
|    learning_rate        | 0.0003    |
|    loss                 | 0.0257    |
|    n_updates            | 1880      |
|    policy_gradient_loss | 0.0224    |
|    std                  | 0.0512    |
|    value_loss           | 0.0522    |
---------------------------------------


---------------------------------------
| rollout/                |           |
|    ep_len_mean          | 1e+03     |
|    ep_rew_mean          | -68.4     |
| time/                   |           |
|    fps                  | 187       |
|    iterations           | 89        |
|    time_elapsed         | 3898      |
|    total_timesteps      | 1979038   |
| train/                  |           |
|    approx_kl            | 0.4053773 |
|    clip_fraction        | 0.632     |
|    clip_range           | 0.2       |
|    entropy_loss         | 4.68      |
|    explained_variance   | 0.991     |
|    learning_rate        | 0.0003    |
|    loss                 | -0.0254   |
|    n_updates            | 1890      |
|    policy_gradient_loss | -1.41e-05 |
|    std                  | 0.0499    |
|    value_loss           | 0.0351    |
---------------------------------------


---------------------------------------
| rollout/                |           |
|    ep_len_mean          | 1e+03     |
|    ep_rew_mean          | -67.7     |
| time/                   |           |
|    fps                  | 187       |
|    iterations           | 90        |
|    time_elapsed         | 3934      |
|    total_timesteps      | 1987230   |
| train/                  |           |
|    approx_kl            | 0.5226428 |
|    clip_fraction        | 0.678     |
|    clip_range           | 0.2       |
|    entropy_loss         | 4.72      |
|    explained_variance   | 0.991     |
|    learning_rate        | 0.0003    |
|    loss                 | -0.039    |
|    n_updates            | 1900      |
|    policy_gradient_loss | 0.00792   |
|    std                  | 0.0498    |
|    value_loss           | 0.0355    |
---------------------------------------


----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1e+03      |
|    ep_rew_mean          | -67.1      |
| time/                   |            |
|    fps                  | 188        |
|    iterations           | 91         |
|    time_elapsed         | 3963       |
|    total_timesteps      | 1995422    |
| train/                  |            |
|    approx_kl            | 0.45317417 |
|    clip_fraction        | 0.673      |
|    clip_range           | 0.2        |
|    entropy_loss         | 4.73       |
|    explained_variance   | 0.99       |
|    learning_rate        | 0.0003     |
|    loss                 | 0.0244     |
|    n_updates            | 1910       |
|    policy_gradient_loss | 0.00997    |
|    std                  | 0.0495     |
|    value_loss           | 0.0384     |
----------------------------------------


----------------------------------------
| eval/                   |            |
|    mean_ep_length       | 1e+03      |
|    mean_reward          | -72        |
| time/                   |            |
|    total_timesteps      | 1999950    |
| train/                  |            |
|    approx_kl            | 0.60438097 |
|    clip_fraction        | 0.675      |
|    clip_range           | 0.2        |
|    entropy_loss         | 4.76       |
|    explained_variance   | 0.989      |
|    learning_rate        | 0.0003     |
|    loss                 | -0.0345    |
|    n_updates            | 1920       |
|    policy_gradient_loss | 0.00926    |
|    std                  | 0.0489     |
|    value_loss           | 0.0435     |
----------------------------------------


---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1e+03    |
|    ep_rew_mean     | -66.1    |
| time/              |          |
|    fps             | 186      |
|    iterations      | 92       |
|    time_elapsed    | 4030     |
|    total_timesteps | 2003614  |
---------------------------------


Training completed! Total steps reached: 2000000
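
The logger tables above are easier to compare once they are turned into numbers. The following is a minimal sketch, not part of the original notebook, that parses tables in this layout into flat dictionaries keyed as `section/name`. The function name `parse_sb3_tables` and the `SAMPLE` string are illustrative assumptions; the sample rows are copied from the last two tables of the output above.

```python
def parse_sb3_tables(text):
    """Parse Stable-Baselines3-style logger tables into a list of flat dicts.

    Keys are joined as "section/name", e.g. "rollout/ep_rew_mean".
    """
    tables, current, section = [], {}, ""
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("---"):
            # A dashed rule opens or closes a table; store whatever was collected.
            if current:
                tables.append(current)
            current, section = {}, ""
            continue
        parts = [p.strip() for p in line.strip("|").split("|")]
        if len(parts) != 2:
            continue  # not a table row (blank line, plain message, ...)
        key, value = parts
        if not value:
            section = key  # section header such as "rollout/"
            continue
        try:
            current[section + key] = float(value)
        except ValueError:
            current[section + key] = value  # keep non-numeric cells as text
    if current:
        tables.append(current)  # table without a closing rule (truncated log)
    return tables


# Sample rows copied from the last two tables of the training output.
SAMPLE = """
----------------------------------------
| eval/                   |            |
|    mean_ep_length       | 1e+03      |
|    mean_reward          | -72        |
| time/                   |            |
|    total_timesteps      | 1999950    |
| train/                  |            |
|    approx_kl            | 0.60438097 |
----------------------------------------

---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1e+03    |
|    ep_rew_mean     | -66.1    |
| time/              |          |
|    total_timesteps | 2003614  |
---------------------------------
"""

tables = parse_sb3_tables(SAMPLE)
for t in tables:
    if "eval/mean_reward" in t:
        print(int(t["time/total_timesteps"]), t["eval/mean_reward"], t["train/approx_kl"])
```

Applied to the full captured output, the same function would give one dictionary per table, which makes it straightforward to list `eval/mean_reward` or `train/approx_kl` against `time/total_timesteps`.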


## 5. Save the Trained Model

Save the trained PPO model to disk for later evaluation or continued training.

In [8]:
# Save the final trained model
model_path = "./base_models/ppo_carracing_final"
model.save(model_path)
print(f"Model saved to: {model_path}")

# Save the vectorized environment statistics (if using VecNormalize)
# env.save("./models/vec_normalize.pkl")

# Close environments
env.close()
eval_env.close()

print("All environments closed successfully!")

Model saved to: ./models/ppo_carracing_final
All environments closed successfully!


## Next Steps

- To evaluate the trained model, load it using `PPO.load(model_path)`
- Use TensorBoard to visualize training progress: `tensorboard --logdir ./logs/`
- Fine-tune hyperparameters based on training curves
- Consider using frame stacking or action repeat for better performance
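
As a rough illustration of the action-repeat idea in the last bullet, the sketch below repeats each action for a fixed number of steps and sums the rewards. It is duck-typed and dependency-free so it can run on its own: `ActionRepeat` and the stand-in `CountingEnv` are hypothetical names, not part of Gymnasium or Stable-Baselines3. In the notebook itself one would more likely subclass `gymnasium.Wrapper` and pass it to `make_vec_env` through its `wrapper_class` argument; frame stacking is already available through the commented-out `VecFrameStack(env, n_stack=4)` line in Section 2.

```python
class ActionRepeat:
    """Repeat each action `repeat` times and return the summed reward.

    Works with any object exposing Gymnasium-style reset() and step().
    """

    def __init__(self, env, repeat=4):
        if repeat < 1:
            raise ValueError("repeat must be >= 1")
        self.env = env
        self.repeat = repeat

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.repeat):
            obs, reward, terminated, truncated, info = self.env.step(action)
            total_reward += reward
            if terminated or truncated:
                break  # do not step past the end of an episode
        return obs, total_reward, terminated, truncated, info


class CountingEnv:
    """Hypothetical stand-in env: reward 1.0 per step, truncated after `limit` steps."""

    def __init__(self, limit=10):
        self.limit = limit
        self.t = 0

    def reset(self, **kwargs):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.t += 1
        return self.t, 1.0, False, self.t >= self.limit, {}


demo_env = ActionRepeat(CountingEnv(limit=10), repeat=4)
obs, info = demo_env.reset()
done = False
while not done:
    obs, reward, terminated, truncated, info = demo_env.step(0)
    done = terminated or truncated
    print(f"inner steps so far: {obs}, summed reward: {reward}")
```

With a repeat of 4 the policy is queried once per four simulator steps, which shortens the effective horizon at the cost of coarser control.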

In [19]:
from gymnasium.wrappers import RecordVideo

# Create a new environment with rgb_array rendering so RecordVideo can capture frames
test_env = gym.make(ENV_ID, render_mode="rgb_array")
model_path = "./base_models/best_model/best_model.zip"

test_env = RecordVideo(
    test_env,
    video_folder="./videos/base",
    episode_trigger=lambda episode_id: True  # Record all episodes
)

# Load the trained model
model = PPO.load(model_path)

NUM_EPISODES = 10
try:
    for i in range(NUM_EPISODES):
        # Reset per-episode state before running the agent
        obs, info = test_env.reset()
        done = False
        total_reward = 0
        step_count = 0

        print(f"Starting to play Episode {i+1} with the trained agent...")
        print("Close the window to stop.")

        while not done:
            # Predict action using the trained model
            action, _states = model.predict(obs, deterministic=True)

            # Take action in the environment
            obs, reward, terminated, truncated, info = test_env.step(action)
            done = terminated or truncated

            total_reward += reward
            step_count += 1

            # Optional: print progress every 100 steps
            if step_count % 100 == 0:
                print(f"Step: {step_count}, Total Reward: {total_reward:.2f}")

        print(f"\nEpisode {i+1} finished!")
        print(f"Total steps: {step_count}")
        print(f"Total reward: {total_reward:.2f}")

finally:
    # Close once, after all episodes, rather than inside the episode loop
    test_env.close()

Starting to play Episode 1 with the trained agent...
Close the window to stop.
Step: 100, Total Reward: 0.03
Step: 200, Total Reward: 0.07
Step: 300, Total Reward: -9.93
Step: 400, Total Reward: -13.24
Step: 500, Total Reward: -13.21
Step: 600, Total Reward: -23.21
Step: 700, Total Reward: 47.06
Step: 800, Total Reward: 87.22
Step: 900, Total Reward: 77.22
Step: 1000, Total Reward: 67.22

Episode 1 finished!
Total steps: 1000
Total reward: 67.22
Starting to play Episode 2 with the trained agent...
Close the window to stop.
Step: 100, Total Rewa