In [1]:
from utils.training import train_model
from utils.evaluation import evaluate_model, aggregate_and_normalize_rewards
from gymnasium import register
import gymnasium


# 0. Notebook description
In this notebook, we will pass the image of the road and surrounding vehicles (like those displayed during the video animations of model evaluation) to the algorithm. 
Each agent that we train in this notebook uses the GrayscaleObservation type instead of the Kinematics type used in the previous notebooks. 

We first train agents with the default reward function and our custom reward function using a CnnPolicy. In addition, we train agents with both reward functions using the MlpPolicy. 
CnnPolicy is specifically designed for image-based input data, using a CNN to process the image inputs, which is ideal for this use case. 
However, we have been using MlpPolicy as our default policy in every notebook until now, which works with 1D/flat vectors. We were curious to see whether an agent would still be able to extract any useful information from the images, using this policy. 
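
To make the difference between the two policies concrete, the sketch below counts how many scalar inputs the stacked grayscale observation contains once it is flattened into a 1D vector. It assumes the policy names refer to Stable-Baselines3, whose MlpPolicy flattens the observation before the first dense layer. The helper name `flattened_size` is ours and is not part of the project code.

```python
from math import prod


def flattened_size(stack_size, observation_shape):
    """Number of scalar inputs once a stacked image observation is flattened."""
    return stack_size * prod(observation_shape)


# Values taken from the observation config used in this notebook.
stack_size = 4
observation_shape = (128, 64)

n_inputs = flattened_size(stack_size, observation_shape)
print(n_inputs)  # 32768
```

An MLP sees these values as one long vector with no notion of which pixels are neighbours, whereas a CNN slides the same filters over the image and keeps that neighbourhood structure.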

# 1. Default reward function with CNN policy

We use the observation configuration specified in the highway-env documentation: a 128 by 64 pixel image (the observation_shape parameter), with 4 images passed at a time as input (the stack_size parameter).
Below, we train the agent with a default reward function on these images. 
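
As a rough illustration of what the `weights` and `stack_size` settings mean, the sketch below converts one RGB pixel to a gray value with a weighted sum and keeps a rolling window of the most recent frames. This is a pure-Python approximation written for this notebook, not highway-env's own implementation. The names `to_gray` and `FrameStack` are hypothetical.

```python
from collections import deque

# Same RGB weights as in the observation config.
WEIGHTS = [0.2989, 0.5870, 0.1140]


def to_gray(pixel_rgb, weights=WEIGHTS):
    """Weighted sum of the R, G, B channels, clipped to the 0..255 range."""
    value = sum(w * c for w, c in zip(weights, pixel_rgb))
    return max(0, min(255, int(value)))


class FrameStack:
    """Keeps the most recent `stack_size` frames, oldest first."""

    def __init__(self, stack_size, blank_frame):
        self.frames = deque([blank_frame] * stack_size, maxlen=stack_size)

    def push(self, frame):
        # Appending to a full deque silently drops the oldest frame.
        self.frames.append(frame)
        return list(self.frames)


stack = FrameStack(stack_size=4, blank_frame="blank")
for name in ["f1", "f2", "f3", "f4", "f5"]:
    observation = stack.push(name)

print(to_gray((255, 0, 0)))
print(observation)
```

Stacking several consecutive frames gives the agent access to motion, since a single still image does not reveal how fast the surrounding vehicles are moving.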

In [2]:
config_updates = {
    "observation": {
        "type": "GrayscaleObservation",
        "observation_shape": (128, 64),
        "stack_size": 4,
        "weights": [0.2989, 0.5870, 0.1140],  # weights for RGB conversion
        "scaling": 1.75,
    },
}

register(
    id='DefaultRewardEnv',
    entry_point='HighwayEnvDefaultReward:HighwayEnvDefaultReward',
)


In [3]:
log_filename="6_dqn_image_default_reward_log_cnnpolicy.csv"
log_performance_metrics_enabled=False

In [4]:
env = gymnasium.make(
    "DefaultRewardEnv",
    config=config_updates,
    log_performance_metrics_enabled=log_performance_metrics_enabled,
    log_filename=log_filename,
)
train_model(
    env=env,
    config_updates=config_updates,
    session_name='6_Group15_RLProject_dqn_default_image_cnnpolicy',
    policy="CnnPolicy",
)

{'action': {'type': 'DiscreteMetaAction'},
 'centering_position': [0.3, 0.5],
 'collision_reward': -1,
 'controlled_vehicles': 1,
 'duration': 30,
 'ego_spacing': 1.5,
 'high_speed_reward': 0.4,
 'initial_lane_id': None,
 'lane_change_reward': 0,
 'lanes_count': 3,
 'manual_control': False,
 'normalize_reward': True,
 'observation': {'observation_shape': (128, 64),
                 'scaling': 1.75,
                 'stack_size': 4,
                 'type': 'GrayscaleObservation',
                 'weights': [0.2989, 0.587, 0.114]},
 'offroad_terminal': False,
 'offscreen_rendering': False,
 'other_vehicles_type': 'highway_env.vehicle.behavior.IDMVehicle',
 'policy_frequency': 1,
 'real_time_rendering': False,
 'render_agent': True,
 'reward_speed_range': [20, 30],
 'right_lane_reward': 0.1,
 'scaling': 5.5,
 'screen_height': 150,
 'screen_width': 600,
 'show_trajectories': False,
 'simulation_frequency': 5,
 'vehicles_count': 20,
 'vehicles_density': 1}
Training with policy CnnPolicy
U



Logging to ./logs/tensorboard/6_Group15_RLProject_dqn_default_image_cnnpolicy_DQN/DQN_1
----------------------------------
| rollout/            |          |
|    ep_len_mean      | 14.2     |
|    ep_rew_mean      | 11       |
|    exploration_rate | 0.973    |
| time/               |          |
|    episodes         | 4        |
|    fps              | 105      |
|    time_elapsed     | 0        |
|    total_timesteps  | 57       |
----------------------------------
----------------------------------
| rollout/            |          |
|    ep_len_mean      | 14.6     |
|    ep_rew_mean      | 11.4     |
|    exploration_rate | 0.944    |
| time/               |          |
|    episodes         | 8        |
|    fps              | 92       |
|    time_elapsed     | 1        |
|    total_timesteps  | 117      |
| train/              |          |
|    learning_rate    | 0.0001   |
|    loss             | 0.327    |
|    n_updates        | 4        |
----------------------------------
--

In [5]:
log_performance_metrics_enabled=True



env = gymnasium.make('DefaultRewardEnv', 
                     render_mode='rgb_array', 
                    log_performance_metrics_enabled=log_performance_metrics_enabled,
                     log_filename=log_filename
                     )

# evaluate the model with the default reward function
evaluate_model(
    env=env,
    config_updates={**config_updates, "simulation_frequency": 15},
    model_path="models/6_Group15_RLProject_dqn_default_image_cnnpolicy",
    algorithm='DQN',
)

{'action': {'type': 'DiscreteMetaAction'},
 'centering_position': [0.3, 0.5],
 'collision_reward': -1,
 'controlled_vehicles': 1,
 'duration': 30,
 'ego_spacing': 1.5,
 'high_speed_reward': 0.4,
 'initial_lane_id': None,
 'lane_change_reward': 0,
 'lanes_count': 3,
 'manual_control': False,
 'normalize_reward': True,
 'observation': {'observation_shape': (128, 64),
                 'scaling': 1.75,
                 'stack_size': 4,
                 'type': 'GrayscaleObservation',
                 'weights': [0.2989, 0.587, 0.114]},
 'offroad_terminal': False,
 'offscreen_rendering': False,
 'other_vehicles_type': 'highway_env.vehicle.behavior.IDMVehicle',
 'policy_frequency': 1,
 'real_time_rendering': False,
 'render_agent': True,
 'reward_speed_range': [20, 30],
 'right_lane_reward': 0.1,
 'scaling': 5.5,
 'screen_height': 150,
 'screen_width': 600,
 'show_trajectories': False,
 'simulation_frequency': 15,
 'vehicles_count': 20,
 'vehicles_density': 1}
Loading model with path models/



Logging metrics for step 30 and seconds elapsed 2.0
Logging metrics for step 45 and seconds elapsed 3.0
Logging metrics for step 60 and seconds elapsed 4.0
Logging metrics for step 75 and seconds elapsed 5.0
Logging metrics for step 90 and seconds elapsed 6.0
Logging metrics for step 105 and seconds elapsed 7.0
Logging metrics for step 120 and seconds elapsed 8.0
Logging metrics for step 135 and seconds elapsed 9.0
Logging metrics for step 150 and seconds elapsed 10.0
Logging metrics for step 165 and seconds elapsed 11.0
Logging metrics for step 180 and seconds elapsed 12.0
Logging metrics for step 195 and seconds elapsed 13.0
Logging metrics for step 210 and seconds elapsed 14.0
Logging metrics for step 15 and seconds elapsed 1.0
Logging metrics for step 30 and seconds elapsed 2.0
Logging metrics for step 45 and seconds elapsed 3.0
Logging metrics for step 60 and seconds elapsed 4.0
Logging metrics for step 75 and seconds elapsed 5.0
Logging metrics for step 90 and seconds elapsed 6.0

In [6]:
metrics = aggregate_and_normalize_rewards(log_filename)

if metrics:
    print("Performance metric (as percent of all steps):")
    for metric_name, avg_metric in metrics.items():
        print(f"{metric_name}: {avg_metric*100:.4f}%")

Performance metric (as percent of all steps):
collision_count: 7.0666%
right_lane_count: 70.0493%
on_road_count: 100.0000%
safe_distance_count: 94.7412%
left_vehicle_overtaken_count: 8.2169%
abrupt_accelerations_count: 30.8135%


The model trained with the default reward function on images of the highway collides in about 7% of all recorded steps, and is in the right lane for about 70% of steps. 
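
For readers unfamiliar with how a "percent of all steps" figure can be derived, here is a minimal sketch. It assumes each logged step carries a 0/1 flag per metric and averages those flags over all steps. The actual `aggregate_and_normalize_rewards` helper in `utils.evaluation` is not shown in this notebook and may work differently. The function name and the example log below are invented for illustration.

```python
def fraction_of_steps(step_flags):
    """Average each 0/1 metric flag over all logged steps."""
    if not step_flags:
        return {}
    n_steps = len(step_flags)
    totals = {}
    for row in step_flags:
        for name, flag in row.items():
            totals[name] = totals.get(name, 0) + flag
    return {name: total / n_steps for name, total in totals.items()}


# Hypothetical log: four steps, one collision, three steps in the right lane.
example_log = [
    {"collision_count": 0, "right_lane_count": 1},
    {"collision_count": 0, "right_lane_count": 1},
    {"collision_count": 0, "right_lane_count": 0},
    {"collision_count": 1, "right_lane_count": 1},
]

for metric_name, fraction in fraction_of_steps(example_log).items():
    print(f"{metric_name}: {fraction * 100:.4f}%")
```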

# 2. Custom reward function with CNN policy

Below, we train a model with our custom reward function using the DQN algorithm, and inspect its performance on our predefined metrics.


In [7]:
config_updates = {
    "safe_distance_reward": 0.1,
    "left_vehicle_overtaken_reward": -0.5,
    "collision_reward": -4,
    "smooth_driving_reward" : 0.3,
    "right_lane_reward" : 0.5, 
    "observation": {
        "type": "GrayscaleObservation",
        "observation_shape": (128, 64),
        "stack_size": 4,
        "weights": [0.2989, 0.5870, 0.1140],  # weights for RGB conversion
        "scaling": 1.75,
    },
}

# Register the custom environment
register(
    id='CustomRewardEnv',
    entry_point='HighwayEnvCustomReward:HighwayEnvFastCustomReward',
)

In [8]:
# Create the environment with the custom parameter
# Set log_performance_metrics_enabled to True or False as required
log_filename="6_dqn_image_custom_reward_log_cnnpolicy.csv"
log_performance_metrics_enabled=False

In [9]:
env = gymnasium.make(
    "CustomRewardEnv",
    config=config_updates,
    log_performance_metrics_enabled=log_performance_metrics_enabled,
    log_filename=log_filename,
)
train_model(
    env=env,
    config_updates=config_updates,
    session_name='6_Group15_RLProject_dqn_custom_image_cnnpolicy',
    policy="CnnPolicy",
)

{'action': {'type': 'DiscreteMetaAction'},
 'centering_position': [0.3, 0.5],
 'collision_reward': -4,
 'controlled_vehicles': 1,
 'duration': 30,
 'ego_spacing': 1.5,
 'high_speed_reward': 0.4,
 'initial_lane_id': None,
 'lane_change_reward': 0,
 'lanes_count': 3,
 'left_vehicle_overtaken_reward': -0.5,
 'manual_control': False,
 'normalize_reward': True,
 'observation': {'observation_shape': (128, 64),
                 'scaling': 1.75,
                 'stack_size': 4,
                 'type': 'GrayscaleObservation',
                 'weights': [0.2989, 0.587, 0.114]},
 'offroad_terminal': False,
 'offscreen_rendering': False,
 'other_vehicles_type': 'highway_env.vehicle.behavior.IDMVehicle',
 'policy_frequency': 1,
 'real_time_rendering': False,
 'render_agent': True,
 'reward_speed_range': [20, 30],
 'right_lane_reward': 0.5,
 'safe_distance_reward': 0.1,
 'scaling': 5.5,
 'screen_height': 150,
 'screen_width': 600,
 'show_trajectories': False,
 'simulation_frequency': 5,
 'smooth_



----------------------------------
| rollout/            |          |
|    ep_len_mean      | 9        |
|    ep_rew_mean      | 7.33     |
|    exploration_rate | 0.983    |
| time/               |          |
|    episodes         | 4        |
|    fps              | 84       |
|    time_elapsed     | 0        |
|    total_timesteps  | 36       |
----------------------------------
----------------------------------
| rollout/            |          |
|    ep_len_mean      | 13.4     |
|    ep_rew_mean      | 11.3     |
|    exploration_rate | 0.949    |
| time/               |          |
|    episodes         | 8        |
|    fps              | 91       |
|    time_elapsed     | 1        |
|    total_timesteps  | 107      |
| train/              |          |
|    learning_rate    | 0.0001   |
|    loss             | 0.41     |
|    n_updates        | 1        |
----------------------------------
----------------------------------
| rollout/            |          |
|    ep_len_mean    

In [10]:
log_performance_metrics_enabled=True

env = gymnasium.make('CustomRewardEnv', 
                     render_mode='rgb_array', 
                     log_performance_metrics_enabled=log_performance_metrics_enabled, 
                     log_filename=log_filename, 
                     config=config_updates
                    )
evaluate_model(
    env=env,
    config_updates={**config_updates, "simulation_frequency": 15},
    model_path='models/6_Group15_RLProject_dqn_custom_image_cnnpolicy',
    algorithm='DQN',
)

{'action': {'type': 'DiscreteMetaAction'},
 'centering_position': [0.3, 0.5],
 'collision_reward': -4,
 'controlled_vehicles': 1,
 'duration': 30,
 'ego_spacing': 1.5,
 'high_speed_reward': 0.4,
 'initial_lane_id': None,
 'lane_change_reward': 0,
 'lanes_count': 3,
 'left_vehicle_overtaken_reward': -0.5,
 'manual_control': False,
 'normalize_reward': True,
 'observation': {'observation_shape': (128, 64),
                 'scaling': 1.75,
                 'stack_size': 4,
                 'type': 'GrayscaleObservation',
                 'weights': [0.2989, 0.587, 0.114]},
 'offroad_terminal': False,
 'offscreen_rendering': False,
 'other_vehicles_type': 'highway_env.vehicle.behavior.IDMVehicle',
 'policy_frequency': 1,
 'real_time_rendering': False,
 'render_agent': True,
 'reward_speed_range': [20, 30],
 'right_lane_reward': 0.5,
 'safe_distance_reward': 0.1,
 'scaling': 5.5,
 'screen_height': 150,
 'screen_width': 600,
 'show_trajectories': False,
 'simulation_frequency': 15,
 'smooth



Logging metrics for step 45 and seconds elapsed 3.0
Logging metrics for step 60 and seconds elapsed 4.0
Logging metrics for step 75 and seconds elapsed 5.0
Logging metrics for step 90 and seconds elapsed 6.0
Logging metrics for step 105 and seconds elapsed 7.0
Logging metrics for step 120 and seconds elapsed 8.0
Logging metrics for step 135 and seconds elapsed 9.0
Logging metrics for step 150 and seconds elapsed 10.0
Logging metrics for step 165 and seconds elapsed 11.0
Logging metrics for step 180 and seconds elapsed 12.0
Logging metrics for step 195 and seconds elapsed 13.0
Logging metrics for step 210 and seconds elapsed 14.0
Logging metrics for step 225 and seconds elapsed 15.0
Logging metrics for step 240 and seconds elapsed 16.0
Logging metrics for step 255 and seconds elapsed 17.0
Logging metrics for step 270 and seconds elapsed 18.0
Logging metrics for step 285 and seconds elapsed 19.0
Logging metrics for step 300 and seconds elapsed 20.0
Logging metrics for step 315 and second

In [11]:
metrics = aggregate_and_normalize_rewards(log_filename)

if metrics:
    print("Performance metric (as percent of all steps):")
    for metric_name, avg_metric in metrics.items():
        print(f"{metric_name}: {avg_metric*100:.4f}%")

Performance metric (as percent of all steps):
collision_count: 0.8486%
right_lane_count: 100.0000%
on_road_count: 100.0000%
safe_distance_count: 99.7434%
left_vehicle_overtaken_count: 2.0920%
abrupt_accelerations_count: 4.2431%


# 3. Default reward function with MlpPolicy

In [12]:
config_updates = {
    "observation": {
        "type": "GrayscaleObservation",
        "observation_shape": (128, 64),
        "stack_size": 4,
        "weights": [0.2989, 0.5870, 0.1140],  # weights for RGB conversion
        "scaling": 1.75,
    },
}

register(
    id='DefaultRewardEnv',
    entry_point='HighwayEnvDefaultReward:HighwayEnvDefaultReward',
)


  logger.warn(f"Overriding environment {new_spec.id} already in registry.")


In [13]:
log_filename="6_dqn_image_default_reward_log_mlppolicy.csv"
log_performance_metrics_enabled=False

In [14]:
env = gymnasium.make(
    "DefaultRewardEnv",
    config=config_updates,
    log_performance_metrics_enabled=log_performance_metrics_enabled,
    log_filename=log_filename,
)
train_model(
    env=env,
    config_updates=config_updates,
    session_name='6_dqn_image_default_reward_log_mlppolicy',
    policy="MlpPolicy",
)

{'action': {'type': 'DiscreteMetaAction'},
 'centering_position': [0.3, 0.5],
 'collision_reward': -1,
 'controlled_vehicles': 1,
 'duration': 30,
 'ego_spacing': 1.5,
 'high_speed_reward': 0.4,
 'initial_lane_id': None,
 'lane_change_reward': 0,
 'lanes_count': 3,
 'manual_control': False,
 'normalize_reward': True,
 'observation': {'observation_shape': (128, 64),
                 'scaling': 1.75,
                 'stack_size': 4,
                 'type': 'GrayscaleObservation',
                 'weights': [0.2989, 0.587, 0.114]},
 'offroad_terminal': False,
 'offscreen_rendering': False,
 'other_vehicles_type': 'highway_env.vehicle.behavior.IDMVehicle',
 'policy_frequency': 1,
 'real_time_rendering': False,
 'render_agent': True,
 'reward_speed_range': [20, 30],
 'right_lane_reward': 0.1,
 'scaling': 5.5,
 'screen_height': 150,
 'screen_width': 600,
 'show_trajectories': False,
 'simulation_frequency': 5,
 'vehicles_count': 20,
 'vehicles_density': 1}
Training with policy MlpPolicy
U



----------------------------------
| rollout/            |          |
|    ep_len_mean      | 11       |
|    ep_rew_mean      | 7.73     |
|    exploration_rate | 0.979    |
| time/               |          |
|    episodes         | 4        |
|    fps              | 84       |
|    time_elapsed     | 0        |
|    total_timesteps  | 44       |
----------------------------------
----------------------------------
| rollout/            |          |
|    ep_len_mean      | 8.25     |
|    ep_rew_mean      | 5.91     |
|    exploration_rate | 0.969    |
| time/               |          |
|    episodes         | 8        |
|    fps              | 89       |
|    time_elapsed     | 0        |
|    total_timesteps  | 66       |
----------------------------------
----------------------------------
| rollout/            |          |
|    ep_len_mean      | 7.67     |
|    ep_rew_mean      | 5.66     |
|    exploration_rate | 0.956    |
| time/               |          |
|    episodes       

In [15]:
log_performance_metrics_enabled=True



env = gymnasium.make('DefaultRewardEnv', 
                     render_mode='rgb_array', 
                    log_performance_metrics_enabled=log_performance_metrics_enabled,
                     log_filename=log_filename
                     )

# evaluate the model with the default reward function
evaluate_model(
    env=env,
    config_updates={**config_updates, "simulation_frequency": 15},
    model_path="models/6_dqn_image_default_reward_log_mlppolicy",
    algorithm='DQN',
)

{'action': {'type': 'DiscreteMetaAction'},
 'centering_position': [0.3, 0.5],
 'collision_reward': -1,
 'controlled_vehicles': 1,
 'duration': 30,
 'ego_spacing': 1.5,
 'high_speed_reward': 0.4,
 'initial_lane_id': None,
 'lane_change_reward': 0,
 'lanes_count': 3,
 'manual_control': False,
 'normalize_reward': True,
 'observation': {'observation_shape': (128, 64),
                 'scaling': 1.75,
                 'stack_size': 4,
                 'type': 'GrayscaleObservation',
                 'weights': [0.2989, 0.587, 0.114]},
 'offroad_terminal': False,
 'offscreen_rendering': False,
 'other_vehicles_type': 'highway_env.vehicle.behavior.IDMVehicle',
 'policy_frequency': 1,
 'real_time_rendering': False,
 'render_agent': True,
 'reward_speed_range': [20, 30],
 'right_lane_reward': 0.1,
 'scaling': 5.5,
 'screen_height': 150,
 'screen_width': 600,
 'show_trajectories': False,
 'simulation_frequency': 15,
 'vehicles_count': 20,
 'vehicles_density': 1}
Loading model with path models/



Logging metrics for step 45 and seconds elapsed 3.0
Logging metrics for step 60 and seconds elapsed 4.0
Logging metrics for step 15 and seconds elapsed 1.0
Logging metrics for step 30 and seconds elapsed 2.0
Logging metrics for step 45 and seconds elapsed 3.0
Logging metrics for step 15 and seconds elapsed 1.0
Logging metrics for step 30 and seconds elapsed 2.0
Logging metrics for step 45 and seconds elapsed 3.0
Logging metrics for step 60 and seconds elapsed 4.0
Logging metrics for step 75 and seconds elapsed 5.0
Logging metrics for step 90 and seconds elapsed 6.0
Logging metrics for step 105 and seconds elapsed 7.0
Logging metrics for step 120 and seconds elapsed 8.0
Logging metrics for step 135 and seconds elapsed 9.0
Logging metrics for step 150 and seconds elapsed 10.0
Logging metrics for step 15 and seconds elapsed 1.0
Logging metrics for step 30 and seconds elapsed 2.0
Logging metrics for step 45 and seconds elapsed 3.0
Logging metrics for step 60 and seconds elapsed 4.0
Logging

In [16]:
metrics = aggregate_and_normalize_rewards(log_filename)

if metrics:
    print("Performance metric (as percent of all steps):")
    for metric_name, avg_metric in metrics.items():
        print(f"{metric_name}: {avg_metric*100:.4f}%")

Performance metric (as percent of all steps):
collision_count: 7.6377%
right_lane_count: 76.1989%
on_road_count: 100.0000%
safe_distance_count: 93.9165%
left_vehicle_overtaken_count: 8.0817%
abrupt_accelerations_count: 31.7496%


# 4. Custom reward function with MlpPolicy

In [24]:
config_updates = {
    "safe_distance_reward": 0.1,
    "left_vehicle_overtaken_reward": -0.5,
    "collision_reward": -4,
    "smooth_driving_reward" : 0.3,
    "right_lane_reward" : 0.5, 
    "observation": {
        "type": "GrayscaleObservation",
        "observation_shape": (128, 64),
        "stack_size": 4,
        "weights": [0.2989, 0.5870, 0.1140],  # weights for RGB conversion
        "scaling": 1.75,
    },
}

# Register the custom environment
register(
    id='CustomRewardEnv',
    entry_point='HighwayEnvCustomReward:HighwayEnvFastCustomReward',
)

In [25]:
# Create the environment with the custom parameter
# Set log_performance_metrics_enabled to True or False as required
log_filename="6_dqn_image_custom_reward_log_mlppolicy.csv"
log_performance_metrics_enabled=False

In [26]:
env = gymnasium.make(
    "CustomRewardEnv",
    config=config_updates,
    log_performance_metrics_enabled=log_performance_metrics_enabled,
    log_filename=log_filename,
)
train_model(
    env=env,
    config_updates=config_updates,
    session_name='6_Group15_RLProject_dqn_custom_image_mlppolicy',
    policy="MlpPolicy",
)

{'action': {'type': 'DiscreteMetaAction'},
 'centering_position': [0.3, 0.5],
 'collision_reward': -4,
 'controlled_vehicles': 1,
 'duration': 30,
 'ego_spacing': 1.5,
 'high_speed_reward': 0.4,
 'initial_lane_id': None,
 'lane_change_reward': 0,
 'lanes_count': 3,
 'left_vehicle_overtaken_reward': -0.5,
 'manual_control': False,
 'normalize_reward': True,
 'observation': {'observation_shape': (128, 64),
                 'scaling': 1.75,
                 'stack_size': 4,
                 'type': 'GrayscaleObservation',
                 'weights': [0.2989, 0.587, 0.114]},
 'offroad_terminal': False,
 'offscreen_rendering': False,
 'other_vehicles_type': 'highway_env.vehicle.behavior.IDMVehicle',
 'policy_frequency': 1,
 'real_time_rendering': False,
 'render_agent': True,
 'reward_speed_range': [20, 30],
 'right_lane_reward': 0.5,
 'safe_distance_reward': 0.1,
 'scaling': 5.5,
 'screen_height': 150,
 'screen_width': 600,
 'show_trajectories': False,
 'simulation_frequency': 5,
 'smooth_



----------------------------------
| rollout/            |          |
|    ep_len_mean      | 9        |
|    ep_rew_mean      | 7.45     |
|    exploration_rate | 0.983    |
| time/               |          |
|    episodes         | 4        |
|    fps              | 95       |
|    time_elapsed     | 0        |
|    total_timesteps  | 36       |
----------------------------------
----------------------------------
| rollout/            |          |
|    ep_len_mean      | 14       |
|    ep_rew_mean      | 11.7     |
|    exploration_rate | 0.947    |
| time/               |          |
|    episodes         | 8        |
|    fps              | 96       |
|    time_elapsed     | 1        |
|    total_timesteps  | 112      |
| train/              |          |
|    learning_rate    | 0.0001   |
|    loss             | 0.521    |
|    n_updates        | 2        |
----------------------------------
----------------------------------
| rollout/            |          |
|    ep_len_mean    

In [30]:
log_performance_metrics_enabled=False

env = gymnasium.make('CustomRewardEnv', 
                     render_mode='rgb_array', 
                     log_performance_metrics_enabled=log_performance_metrics_enabled, 
                     log_filename=log_filename, 
                     config=config_updates
                    )
evaluate_model(
    env=env,
    config_updates={**config_updates, "simulation_frequency": 15},
    model_path='models/6_Group15_RLProject_dqn_custom_image_mlppolicy',
    algorithm='DQN',
)

{'action': {'type': 'DiscreteMetaAction'},
 'centering_position': [0.3, 0.5],
 'collision_reward': -4,
 'controlled_vehicles': 1,
 'duration': 30,
 'ego_spacing': 1.5,
 'high_speed_reward': 0.4,
 'initial_lane_id': None,
 'lane_change_reward': 0,
 'lanes_count': 3,
 'left_vehicle_overtaken_reward': -0.5,
 'manual_control': False,
 'normalize_reward': True,
 'observation': {'observation_shape': (128, 64),
                 'scaling': 1.75,
                 'stack_size': 4,
                 'type': 'GrayscaleObservation',
                 'weights': [0.2989, 0.587, 0.114]},
 'offroad_terminal': False,
 'offscreen_rendering': False,
 'other_vehicles_type': 'highway_env.vehicle.behavior.IDMVehicle',
 'policy_frequency': 1,
 'real_time_rendering': False,
 'render_agent': True,
 'reward_speed_range': [20, 30],
 'right_lane_reward': 0.5,
 'safe_distance_reward': 0.1,
 'scaling': 5.5,
 'screen_height': 150,
 'screen_width': 600,
 'show_trajectories': False,
 'simulation_frequency': 15,
 'smooth



In [28]:
metrics = aggregate_and_normalize_rewards(log_filename)

if metrics:
    print("Performance metric (as percent of all steps):")
    for metric_name, avg_metric in metrics.items():
        print(f"{metric_name}: {avg_metric*100:.4f}%")

Performance metric (as percent of all steps):
collision_count: 0.5252%
right_lane_count: 100.0000%
on_road_count: 100.0000%
safe_distance_count: 99.7827%
left_vehicle_overtaken_count: 1.1771%
abrupt_accelerations_count: 4.0203%


# 5. Analysis
Below, we compare the performances of each agent on our performance metrics, and offer an explanation for their differences.


| Metric                          | Default Reward (CNN) | Custom Reward (CNN) | Default Reward (MLP) | Custom Reward (MLP) |
|---------------------------------|-----------------------|----------------------|-----------------------|----------------------|
| collision_count                 | 7.07%                | 0.85%               | 7.64%                | 0.53%               |
| right_lane_count                | 70.05%               | 100.00%             | 76.20%               | 100.00%             |
| on_road_count                   | 100.00%              | 100.00%             | 100.00%              | 100.00%             |
| safe_distance_count             | 94.74%               | 99.74%              | 93.92%               | 99.78%              |
| left_vehicle_overtaken_count    | 8.22%                | 2.09%               | 8.08%                | 1.18%               |
| abrupt_accelerations_count      | 30.81%               | 4.24%               | 31.75%               | 4.02%               |
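
To put the table in relative terms, the sketch below computes how much the custom reward reduces two of the "lower is better" metrics compared with the default reward. The percentages are copied from the cell outputs above. The helper `relative_reduction` is ours, for illustration only.

```python
# (default %, custom %) pairs copied from the metric outputs in sections 1-4.
results = {
    "collision_count": {"CNN": (7.0666, 0.8486), "MLP": (7.6377, 0.5252)},
    "abrupt_accelerations_count": {"CNN": (30.8135, 4.2431), "MLP": (31.7496, 4.0203)},
}


def relative_reduction(default_pct, custom_pct):
    """Fraction by which the custom reward lowers a metric relative to the default."""
    return (default_pct - custom_pct) / default_pct


for metric_name, per_policy in results.items():
    for policy, (default_pct, custom_pct) in per_policy.items():
        reduction = relative_reduction(default_pct, custom_pct)
        print(f"{metric_name} ({policy}): {reduction * 100:.1f}% lower with custom reward")
```

These figures come from a single training run per configuration, so small differences between the CNN and MLP columns should be read with caution.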


## Safety performance 

### Collisions
The custom reward function drastically reduces collision rates (0.85% with CNN and 0.53% with MLP) compared to the default reward function (7.07% with CNN and 7.64% with MLP).

### Right lane usage
The custom reward function ensures that agents stay in the right lane 100% of the time, compared to lower percentages under the default reward function (70.05% for CNN and 76.20% for MLP).

### Safe distance from other vehicles

The custom reward results in higher safe distance adherence (99.74% for CNN and 99.78% for MLP) compared to the default reward function (94.74% for CNN and 93.92% for MLP).

### Overtaking left vehicles

Agents with the custom reward function overtook fewer cars on the left, as intended by the design (2.09% for CNN and 1.18% for MLP vs. 8.22% for CNN and 8.08% for MLP under the default).
 
### Comfortable acceleration
The custom reward also reduced abrupt accelerations significantly (4.24% for CNN and 4.02% for MLP vs. 30.81% for CNN and 31.75% for MLP under the default).


Using the custom reward function with either the CNN or the MLP policy substantially improves every safety and comfort metric we track. However, the CNN and MLP policies scored nearly equally on these metrics, which was unexpected, given that MLP policies are not designed to process image data. 

Staying in the right lane or maintaining a safe distance likely corresponds to easily distinguishable grayscale patterns (e.g., lane markings, proximity to other vehicles). An MLP policy could plausibly pick up these patterns in the flattened image vector, even without modelling their spatial arrangement.
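
One way to see why a flattened vector can still carry usable information is that every pixel always lands at the same position in that vector. The sketch below maps a (frame, x, y) pixel to its flat index, assuming a (stack, width, height) layout flattened in row-major order. Both the layout and the helper `flat_index` are assumptions made for illustration.

```python
def flat_index(frame, x, y, shape=(4, 128, 64)):
    """Row-major index of pixel (x, y) of a given frame in the flattened observation."""
    n_frames, width, height = shape
    if not (0 <= frame < n_frames and 0 <= x < width and 0 <= y < height):
        raise IndexError("pixel coordinates outside the observation")
    return (frame * width + x) * height + y


base = flat_index(0, 10, 20)
print(flat_index(0, 10, 21) - base)  # neighbour along y
print(flat_index(0, 11, 20) - base)  # neighbour along x
print(flat_index(1, 10, 20) - base)  # same pixel, next frame
```

Because the camera follows the ego vehicle, a given screen region tends to map to the same input indices at every step, so a dense layer can learn weights tied to those positions. What it cannot do is reuse a pattern learned at one location for another location, which is what convolutional filters provide.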

It is possible that our simple environment allows the two policies to perform comparably, and that increasing the number of vehicles or lanes would cause agents trained with an MLP policy to perform worse than ones trained with a CNN policy. Because CNNs process spatial relationships and patterns in images, we would expect them to scale better to more complex environments, although we did not test this. 
