## 0. Notebook description
In this notebook, we make the environment more complex and train several agents with different algorithms (DQN, A2C, and PPO) to compare their performance. We modify key environment parameters, namely `vehicles_count`, `lanes_count`, `vehicles_density`, and `initial_lane_id`, to introduce greater challenges (e.g., denser traffic, more lanes).

## 1. Environment Modification
First, we register the custom-reward environment and define the configuration updates: the custom reward weights plus the modified environment parameters.

In [1]:
import gymnasium
import highway_env
from gymnasium import register

# Register the custom-reward environment so it can be created with gymnasium.make()
register(
    id='CustomRewardEnv',
    entry_point='HighwayEnvCustomReward:HighwayEnvFastCustomReward',
)

config_updates = {
    # Custom reward weights
    "safe_distance_reward": 0.1,
    "left_vehicle_overtaken_reward": -0.5,
    "collision_reward": -4,
    "smooth_driving_reward": 0.3,
    "right_lane_reward": 0.5,
    # Environment parameters that make the scenario harder
    "lanes_count": 8,  # More lanes
    "vehicles_count": 120,  # More vehicles
    "vehicles_density": 2.0,  # Increased vehicle density
    "initial_lane_id": 3,  # Start from a middle lane
}
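
Because `initial_lane_id` only makes sense relative to `lanes_count`, a quick consistency check before training can catch typos early. The sketch below is illustrative only: `check_config` is a hypothetical helper that is not part of this notebook or of highway-env, and it assumes lane ids are zero-based.

```python
# Hypothetical helper (not part of the notebook or highway-env): basic
# sanity checks on a config dict before it is passed to training.
def check_config(config: dict) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    lanes = config.get("lanes_count")
    start_lane = config.get("initial_lane_id")
    if lanes is not None and lanes < 1:
        problems.append("lanes_count must be at least 1")
    if lanes is not None and start_lane is not None:
        # Assumption: lane ids run from 0 to lanes_count - 1.
        if not 0 <= start_lane < lanes:
            problems.append("initial_lane_id is outside the available lanes")
    if config.get("vehicles_count", 0) < 0:
        problems.append("vehicles_count must be non-negative")
    if config.get("vehicles_density", 1.0) <= 0:
        problems.append("vehicles_density must be positive")
    return problems


# Copy of the environment-related keys used in this notebook.
env_params = {
    "lanes_count": 8,
    "vehicles_count": 120,
    "vehicles_density": 2.0,
    "initial_lane_id": 3,
}
print(check_config(env_params))  # prints []
```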

## 2. Train agent using DQN algorithm
In this section, we train an agent using the Deep Q-Network (DQN) algorithm in the modified, more complex environment. DQN uses a neural network to approximate the Q-value function, so the agent can choose, in each state, the action with the highest estimated return.
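
To make the Q-value idea concrete, here is a minimal sketch of the one-step Q-learning target that DQN regresses its network towards, together with greedy action selection. It uses plain Python lists in place of network outputs; the function names and the numbers are illustrative assumptions, not the code behind `train_model`.

```python
def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    if done:
        # No bootstrapping from a terminal state.
        return reward
    return reward + gamma * max(next_q_values)


def greedy_action(q_values):
    """Index of the action with the highest estimated Q-value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Illustrative Q-values for three actions in the next state.
q_next = [0.2, 0.5, 0.1]
target = td_target(reward=1.0, next_q_values=q_next, done=False, gamma=0.9)
best = greedy_action(q_next)
print(best, round(target, 4))
```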

In [5]:
from utils.training import train_model

log_filename = "5_dqn_complex.csv"
log_performance_metrics_enabled = False

env = gymnasium.make(
    'CustomRewardEnv',
    render_mode='rgb_array',
    log_performance_metrics_enabled=log_performance_metrics_enabled,
    log_filename=log_filename,
)

# train model
train_model(
    env=env,
    config_updates=config_updates,
    session_name='5_Group15_RLProject_DQN',
    algorithm='DQN'
)

{'action': {'type': 'DiscreteMetaAction'},
 'centering_position': [0.3, 0.5],
 'collision_reward': -4,
 'controlled_vehicles': 1,
 'duration': 30,
 'ego_spacing': 1.5,
 'high_speed_reward': 0.4,
 'initial_lane_id': 3,
 'lane_change_reward': 0,
 'lanes_count': 8,
 'left_vehicle_overtaken_reward': -0.5,
 'manual_control': False,
 'normalize_reward': True,
 'observation': {'type': 'Kinematics'},
 'offroad_terminal': False,
 'offscreen_rendering': False,
 'other_vehicles_type': 'highway_env.vehicle.behavior.IDMVehicle',
 'policy_frequency': 1,
 'real_time_rendering': False,
 'render_agent': True,
 'reward_speed_range': [20, 30],
 'right_lane_reward': 0.5,
 'safe_distance_reward': 0.1,
 'scaling': 5.5,
 'screen_height': 150,
 'screen_width': 600,
 'show_trajectories': False,
 'simulation_frequency': 5,
 'smooth_driving_reward': 0.3,
 'vehicles_count': 120,
 'vehicles_density': 2.0}
Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Logging to ./log

In [6]:
from utils.evaluation import evaluate_model, aggregate_and_normalize_rewards

# evaluate model
log_performance_metrics_enabled = True
log_filename = "5_dqn_complex.csv"

env = gymnasium.make(
    'CustomRewardEnv',
    render_mode='rgb_array',
    log_performance_metrics_enabled=log_performance_metrics_enabled,
    log_filename=log_filename,
)
evaluate_model(
    env=env,
    config_updates={**config_updates, "simulation_frequency": 15},
    model_path='models/5_Group15_RLProject_DQN',
    algorithm='DQN',
)

{'action': {'type': 'DiscreteMetaAction'},
 'centering_position': [0.3, 0.5],
 'collision_reward': -4,
 'controlled_vehicles': 1,
 'duration': 30,
 'ego_spacing': 1.5,
 'high_speed_reward': 0.4,
 'initial_lane_id': 3,
 'lane_change_reward': 0,
 'lanes_count': 8,
 'left_vehicle_overtaken_reward': -0.5,
 'manual_control': False,
 'normalize_reward': True,
 'observation': {'type': 'Kinematics'},
 'offroad_terminal': False,
 'offscreen_rendering': False,
 'other_vehicles_type': 'highway_env.vehicle.behavior.IDMVehicle',
 'policy_frequency': 1,
 'real_time_rendering': False,
 'render_agent': True,
 'reward_speed_range': [20, 30],
 'right_lane_reward': 0.5,
 'safe_distance_reward': 0.1,
 'scaling': 5.5,
 'screen_height': 150,
 'screen_width': 600,
 'show_trajectories': False,
 'simulation_frequency': 15,
 'smooth_driving_reward': 0.3,
 'vehicles_count': 120,
 'vehicles_density': 2.0}
Logging metrics for step 15 and seconds elapsed 1.0
Logging metrics for step 30 and seconds elapsed 2.0
[output truncated]
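
The evaluation call overrides `simulation_frequency` without modifying the shared `config_updates` dict, and the log lines above suggest that elapsed time equals the step count divided by that frequency. The sketch below illustrates both points. `elapsed_seconds` is a hypothetical helper, and the step-to-seconds relation is inferred from the logged output rather than taken from the environment's source.

```python
# Shortened copy of the shared training configuration.
config_updates = {"lanes_count": 8, "vehicles_count": 120}

# Dict unpacking builds a new dict, so the override stays local to evaluation.
eval_config = {**config_updates, "simulation_frequency": 15}


def elapsed_seconds(step, simulation_frequency):
    """Simulated time after `step` simulation steps (assumed relation)."""
    return step / simulation_frequency


print("simulation_frequency" in config_updates)  # False: original untouched
print(elapsed_seconds(15, eval_config["simulation_frequency"]))  # 1.0
print(elapsed_seconds(30, eval_config["simulation_frequency"]))  # 2.0
```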

In [7]:
metrics = aggregate_and_normalize_rewards(log_filename)

if metrics:
    print("Performance metric (as percent of all steps):")
    for metric_name, avg_metric in metrics.items():
        print(f"{metric_name}: {avg_metric*100:.4f}%")

Performance metric (as percent of all steps):
collision_count: 5.1935%
right_lane_count: 100.0000%
on_road_count: 100.0000%
safe_distance_count: 96.1982%
left_vehicle_overtaken_count: 15.7162%
abrupt_accelerations_count: 8.7916%
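
The implementation of `aggregate_and_normalize_rewards` is not shown in this notebook. As a rough illustration of what "percent of all steps" could mean, the sketch below assumes the log is a CSV with one row per step and a 0/1 flag per metric, and averages each column. The function name, the log layout, and the sample data are all assumptions made for this example.

```python
import csv
import io


def aggregate_fractions(csv_text):
    """Average each numeric column over all rows (hypothetical log layout)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    totals = {}
    n_rows = 0
    for row in reader:
        n_rows += 1
        for name, value in row.items():
            totals[name] = totals.get(name, 0.0) + float(value)
    if n_rows == 0:
        return {}
    return {name: total / n_rows for name, total in totals.items()}


# Made-up four-step log, for illustration only.
sample_log = """collision_count,safe_distance_count
0,1
0,1
1,0
0,1
"""

sample_metrics = aggregate_fractions(sample_log)
for name, fraction in sample_metrics.items():
    print(f"{name}: {fraction * 100:.4f}%")
```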


## 3. Train agent using A2C algorithm
In this section, we train an agent using the Advantage Actor-Critic (A2C) algorithm. A2C is a policy-gradient method that combines a policy function (the actor), which selects the agent's actions, with a value function (the critic), whose estimates are used to compute the advantage of each action.
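
The interplay between actor and critic can be sketched with a one-step advantage estimate. The code below is a simplified illustration with scalar inputs. The function names and numbers are assumptions, and it omits details such as entropy bonuses and multi-step returns that practical A2C implementations usually include.

```python
import math


def one_step_advantage(reward, value_s, value_next, done, gamma=0.99):
    """Advantage estimate A = r + gamma * V(s') - V(s), using the critic's values."""
    bootstrap = 0.0 if done else gamma * value_next
    return reward + bootstrap - value_s


def a2c_losses(action_prob, advantage):
    """Actor and critic losses for a single transition."""
    # Actor: raise the log-probability of actions with positive advantage.
    actor_loss = -math.log(action_prob) * advantage
    # Critic: squared error between the bootstrapped return and V(s).
    critic_loss = advantage ** 2
    return actor_loss, critic_loss


adv = one_step_advantage(reward=1.0, value_s=0.5, value_next=1.0, done=False, gamma=0.9)
actor_loss, critic_loss = a2c_losses(action_prob=0.25, advantage=adv)
print(round(adv, 4), round(actor_loss, 4), round(critic_loss, 4))
```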

In [None]:
from utils.training import train_model

log_filename = "5_a2c_complex.csv"
log_performance_metrics_enabled = False

env = gymnasium.make(
    'CustomRewardEnv',
    render_mode='rgb_array',
    log_performance_metrics_enabled=log_performance_metrics_enabled,
    log_filename=log_filename,
)

# train model
train_model(
    env=env,
    config_updates=config_updates,
    session_name='5_Group15_RLProject_A2C',
    algorithm='A2C'
)

{'action': {'type': 'DiscreteMetaAction'},
 'centering_position': [0.3, 0.5],
 'collision_reward': -4,
 'controlled_vehicles': 1,
 'duration': 30,
 'ego_spacing': 1.5,
 'high_speed_reward': 0.4,
 'initial_lane_id': 3,
 'lane_change_reward': 0,
 'lanes_count': 8,
 'left_vehicle_overtaken_reward': -0.5,
 'manual_control': False,
 'normalize_reward': True,
 'observation': {'type': 'Kinematics'},
 'offroad_terminal': False,
 'offscreen_rendering': False,
 'other_vehicles_type': 'highway_env.vehicle.behavior.IDMVehicle',
 'policy_frequency': 1,
 'real_time_rendering': False,
 'render_agent': True,
 'reward_speed_range': [20, 30],
 'right_lane_reward': 0.5,
 'safe_distance_reward': 0.1,
 'scaling': 5.5,
 'screen_height': 150,
 'screen_width': 600,
 'show_trajectories': False,
 'simulation_frequency': 5,
 'smooth_driving_reward': 0.3,
 'vehicles_count': 120,
 'vehicles_density': 2.0}
Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Logging to ./log

In [8]:
from utils.evaluation import evaluate_model, aggregate_and_normalize_rewards

# evaluate model
log_performance_metrics_enabled = True
log_filename = "5_a2c_complex.csv"

env = gymnasium.make(
    'CustomRewardEnv',
    render_mode='rgb_array',
    log_performance_metrics_enabled=log_performance_metrics_enabled,
    log_filename=log_filename,
)
evaluate_model(
    env=env,
    config_updates={**config_updates, "simulation_frequency": 15},
    model_path='models/5_Group15_RLProject_A2C',
    algorithm='A2C',
)

{'action': {'type': 'DiscreteMetaAction'},
 'centering_position': [0.3, 0.5],
 'collision_reward': -4,
 'controlled_vehicles': 1,
 'duration': 30,
 'ego_spacing': 1.5,
 'high_speed_reward': 0.4,
 'initial_lane_id': 3,
 'lane_change_reward': 0,
 'lanes_count': 8,
 'left_vehicle_overtaken_reward': -0.5,
 'manual_control': False,
 'normalize_reward': True,
 'observation': {'type': 'Kinematics'},
 'offroad_terminal': False,
 'offscreen_rendering': False,
 'other_vehicles_type': 'highway_env.vehicle.behavior.IDMVehicle',
 'policy_frequency': 1,
 'real_time_rendering': False,
 'render_agent': True,
 'reward_speed_range': [20, 30],
 'right_lane_reward': 0.5,
 'safe_distance_reward': 0.1,
 'scaling': 5.5,
 'screen_height': 150,
 'screen_width': 600,
 'show_trajectories': False,
 'simulation_frequency': 15,
 'smooth_driving_reward': 0.3,
 'vehicles_count': 120,
 'vehicles_density': 2.0}
Logging metrics for step 15 and seconds elapsed 1.0
Logging metrics for step 30 and seconds elapsed 2.0
[output truncated]

In [9]:
metrics = aggregate_and_normalize_rewards(log_filename)

if metrics:
    print("Performance metric (as percent of all steps):")
    for metric_name, avg_metric in metrics.items():
        print(f"{metric_name}: {avg_metric*100:.4f}%")

Performance metric (as percent of all steps):
collision_count: 2.9544%
right_lane_count: 100.0000%
on_road_count: 100.0000%
safe_distance_count: 97.7466%
left_vehicle_overtaken_count: 11.0666%
abrupt_accelerations_count: 7.2859%


## 4. Train agent using PPO algorithm
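In this section, we train an agent using the Proximal Policy Optimization (PPO) algorithm. PPO is a popular policy-gradient method known for stable training: it constrains each policy update, typically by clipping the probability ratio between the new and old policies, which also lets it reuse each batch of experience for several gradient steps.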
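
The constrained update can be illustrated with PPO's clipped surrogate objective for a single sample. This is a minimal sketch with scalar inputs. The function name and the clip range of 0.2 are illustrative assumptions and do not necessarily match the settings used by `train_model`.

```python
def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """PPO clipped objective: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped_ratio * advantage)


# ratio = pi_new(a|s) / pi_old(a|s)
# Positive advantage: the gain is capped once the ratio exceeds 1 + eps.
gain_capped = clipped_surrogate(ratio=1.5, advantage=2.0)
# Negative advantage: the ratio is clipped from below at 1 - eps.
loss_capped = clipped_surrogate(ratio=0.5, advantage=-1.0)
# Inside the clip range the objective is simply ratio * advantage.
unclipped = clipped_surrogate(ratio=1.1, advantage=2.0)
print(round(gain_capped, 4), round(loss_capped, 4), round(unclipped, 4))
```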

In [3]:
from utils.training import train_model

log_filename = "5_ppo_complex.csv"
log_performance_metrics_enabled = False

env = gymnasium.make(
    'CustomRewardEnv',
    render_mode='rgb_array',
    log_performance_metrics_enabled=log_performance_metrics_enabled,
    log_filename=log_filename,
)

# train model
train_model(
    env=env,
    config_updates=config_updates,
    session_name='5_Group15_RLProject_PPO',
    algorithm='PPO'
)

{'action': {'type': 'DiscreteMetaAction'},
 'centering_position': [0.3, 0.5],
 'collision_reward': -4,
 'controlled_vehicles': 1,
 'duration': 30,
 'ego_spacing': 1.5,
 'high_speed_reward': 0.4,
 'initial_lane_id': 3,
 'lane_change_reward': 0,
 'lanes_count': 8,
 'left_vehicle_overtaken_reward': -0.5,
 'manual_control': False,
 'normalize_reward': True,
 'observation': {'type': 'Kinematics'},
 'offroad_terminal': False,
 'offscreen_rendering': False,
 'other_vehicles_type': 'highway_env.vehicle.behavior.IDMVehicle',
 'policy_frequency': 1,
 'real_time_rendering': False,
 'render_agent': True,
 'reward_speed_range': [20, 30],
 'right_lane_reward': 0.5,
 'safe_distance_reward': 0.1,
 'scaling': 5.5,
 'screen_height': 150,
 'screen_width': 600,
 'show_trajectories': False,
 'simulation_frequency': 5,
 'smooth_driving_reward': 0.3,
 'vehicles_count': 120,
 'vehicles_density': 2.0}
Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Logging to ./log

In [3]:
from utils.evaluation import evaluate_model, aggregate_and_normalize_rewards

# evaluate model
log_performance_metrics_enabled = True
log_filename = "5_ppo_complex.csv"

env = gymnasium.make(
    'CustomRewardEnv',
    render_mode='rgb_array',
    log_performance_metrics_enabled=log_performance_metrics_enabled,
    log_filename=log_filename,
)
evaluate_model(
    env=env,
    config_updates={**config_updates, "simulation_frequency": 15},
    model_path='models/5_Group15_RLProject_PPO',
    algorithm='PPO',
)

{'action': {'type': 'DiscreteMetaAction'},
 'centering_position': [0.3, 0.5],
 'collision_reward': -4,
 'controlled_vehicles': 1,
 'duration': 30,
 'ego_spacing': 1.5,
 'high_speed_reward': 0.4,
 'initial_lane_id': 3,
 'lane_change_reward': 0,
 'lanes_count': 8,
 'left_vehicle_overtaken_reward': -0.5,
 'manual_control': False,
 'normalize_reward': True,
 'observation': {'type': 'Kinematics'},
 'offroad_terminal': False,
 'offscreen_rendering': False,
 'other_vehicles_type': 'highway_env.vehicle.behavior.IDMVehicle',
 'policy_frequency': 1,
 'real_time_rendering': False,
 'render_agent': True,
 'reward_speed_range': [20, 30],
 'right_lane_reward': 0.5,
 'safe_distance_reward': 0.1,
 'scaling': 5.5,
 'screen_height': 150,
 'screen_width': 600,
 'show_trajectories': False,
 'simulation_frequency': 15,
 'smooth_driving_reward': 0.3,
 'vehicles_count': 120,
 'vehicles_density': 2.0}
Logging metrics for step 15 and seconds elapsed 1.0
Logging metrics for step 30 and seconds elapsed 2.0
[output truncated]

In [4]:
metrics = aggregate_and_normalize_rewards(log_filename)

if metrics:
    print("Performance metric (as percent of all steps):")
    for metric_name, avg_metric in metrics.items():
        print(f"{metric_name}: {avg_metric*100:.4f}%")

Performance metric (as percent of all steps):
collision_count: 3.3073%
right_lane_count: 100.0000%
on_road_count: 100.0000%
safe_distance_count: 97.5260%
left_vehicle_overtaken_count: 10.9115%
abrupt_accelerations_count: 7.7344%


## 5. Comparison
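One way to line up the three evaluation runs is to collect the percentages printed in Sections 2 to 4 and pick the best algorithm per metric. The sketch below does this for the four metrics that differ between runs (`right_lane_count` and `on_road_count` are 100% for all three). Treating `left_vehicle_overtaken_count` as lower-is-better is an assumption based on its negative reward weight, and `best_algorithm` is a hypothetical helper.

```python
# Percentages copied from the evaluation outputs in Sections 2-4.
results = {
    "DQN": {
        "collision_count": 5.1935,
        "safe_distance_count": 96.1982,
        "left_vehicle_overtaken_count": 15.7162,
        "abrupt_accelerations_count": 8.7916,
    },
    "A2C": {
        "collision_count": 2.9544,
        "safe_distance_count": 97.7466,
        "left_vehicle_overtaken_count": 11.0666,
        "abrupt_accelerations_count": 7.2859,
    },
    "PPO": {
        "collision_count": 3.3073,
        "safe_distance_count": 97.5260,
        "left_vehicle_overtaken_count": 10.9115,
        "abrupt_accelerations_count": 7.7344,
    },
}

# Assumption: these metrics count undesirable events, so lower is better.
LOWER_IS_BETTER = {
    "collision_count",
    "left_vehicle_overtaken_count",
    "abrupt_accelerations_count",
}


def best_algorithm(metric):
    """Name of the algorithm with the best value for the given metric."""
    pick = min if metric in LOWER_IS_BETTER else max
    return pick(results, key=lambda algo: results[algo][metric])


for metric in results["DQN"]:
    row = ", ".join(f"{algo}={results[algo][metric]:.4f}%" for algo in results)
    print(f"{metric}: {row} -> best: {best_algorithm(metric)}")
```

These are single evaluation runs per algorithm, so small differences between A2C and PPO should be read with caution.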