## 0. Notebook description
This notebook explores how increased environment complexity affects the agent's performance. We modify key environment parameters, such as `vehicles_count` and `lanes_count`, to make the task harder (e.g., denser traffic, more lanes).

The notebook is structured as follows:
1. **Environment Modification:** Adjusts environment parameters to create a more complex scenario.
2. **Agent Testing:** Evaluates a pre-trained agent's performance in the modified environment.
3. **Retraining:** Trains an agent from scratch to adapt to the new complexity.
4. **Performance Analysis:** Compares the pre-trained agent's performance with the retrained agent, discussing insights and observed trends.

## 1. Environment Modification
First, we change the environment configuration to use 6 lanes and 80 vehicles. The pre-trained agent was originally trained with 4 lanes and 50 vehicles.

In [2]:
import gymnasium
import highway_env
from gymnasium import register

# Register the custom-reward environment under a Gymnasium id.
register(
    id='CustomRewardEnv',
    entry_point='HighwayEnvCustomReward:HighwayEnvFastCustomReward',
)

# Reward weights are unchanged; only the lane and vehicle counts are increased.
config_updates = {
    "safe_distance_reward": 0.1,
    "left_vehicle_overtaken_reward": -0.5,
    "collision_reward": -4,
    "smooth_driving_reward": 0.3,
    "right_lane_reward": 0.5,
    "lanes_count": 6,  # more lanes
    "vehicles_count": 80,  # more vehicles
}

## 2. Evaluate pre-trained agent's performance
We evaluate the DQN agent that was pre-trained in our notebook `2_Group15_RLProject_AlgorithmComparison` on the modified environment.

In [4]:
from utils.evaluation import evaluate_model, aggregate_and_normalize_rewards

log_filename = "3_dqn_before_complex_training.csv"
log_performance_metrics_enabled = True

env = gymnasium.make('CustomRewardEnv', 
                     render_mode='rgb_array', 
                     log_performance_metrics_enabled=log_performance_metrics_enabled, 
                     log_filename=log_filename
                    )

evaluate_model(
    env=env,
    config_updates={**config_updates, "simulation_frequency": 15},
    model_path='models/2_Group15_RLProject_DQN',
    algorithm='DQN',
)

{'action': {'type': 'DiscreteMetaAction'},
 'centering_position': [0.3, 0.5],
 'collision_reward': -4,
 'controlled_vehicles': 1,
 'duration': 30,
 'ego_spacing': 1.5,
 'high_speed_reward': 0.4,
 'initial_lane_id': None,
 'lane_change_reward': 0,
 'lanes_count': 6,
 'left_vehicle_overtaken_reward': -0.5,
 'manual_control': False,
 'normalize_reward': True,
 'observation': {'type': 'Kinematics'},
 'offroad_terminal': False,
 'offscreen_rendering': False,
 'other_vehicles_type': 'highway_env.vehicle.behavior.IDMVehicle',
 'policy_frequency': 1,
 'real_time_rendering': False,
 'render_agent': True,
 'reward_speed_range': [20, 30],
 'right_lane_reward': 0.5,
 'safe_distance_reward': 0.1,
 'scaling': 5.5,
 'screen_height': 150,
 'screen_width': 600,
 'show_trajectories': False,
 'simulation_frequency': 15,
 'smooth_driving_reward': 0.3,
 'vehicles_count': 80,
 'vehicles_density': 1}
Logging metrics for step 15 and seconds elapsed 1.0
Logging metrics for step 30 and seconds elapsed 2.0
Loggi

In [5]:
metrics = aggregate_and_normalize_rewards(log_filename)

if metrics:
    print("Performance metric (as percent of all steps):")
    for metric_name, avg_metric in metrics.items():
        print(f"{metric_name}: {avg_metric*100:.4f}%")

Performance metric (as percent of all steps):
collision_count: 2.8200%
right_lane_count: 82.6946%
on_road_count: 100.0000%
safe_distance_count: 97.9272%
left_vehicle_overtaken_count: 7.6645%
abrupt_accelerations_count: 6.2184%
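
The percentages above are reported as a share of all logged steps. The implementation of `aggregate_and_normalize_rewards` is not shown in this notebook, so the following is only a minimal sketch of this kind of normalization. It assumes a hypothetical log with one row per step and a 0/1 flag per metric; the function name `fraction_of_steps` and the sample rows are illustrative and are not the project's actual code or data.

```python
import csv
import io


def fraction_of_steps(csv_text):
    """Return, per metric column, the share of logged steps where the flag is 1.

    Assumption (hypothetical log format): one CSV row per step, 0/1 per column.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return {}
    return {
        name: sum(int(row[name]) for row in rows) / len(rows)
        for name in rows[0]
    }


# Made-up sample log with four steps, for illustration only.
sample_log = """collision_count,right_lane_count
0,1
0,1
1,0
0,1
"""

for name, share in fraction_of_steps(sample_log).items():
    print(f"{name}: {share * 100:.4f}%")
```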


### Conclusion
The pre-trained agent, originally trained on a simpler environment (4 lanes and 50 vehicles), performed surprisingly well in the more complex environment (6 lanes and 80 vehicles): collisions were logged in only 2.82% of steps, and a safe distance was kept in 97.93% of steps. Several factors may explain this:

**1. Generalization Capability:**\
The pre-trained agent may have learned strategies that are robust and adaptable in different scenarios. For example, strategies such as maintaining safe distances, staying in the correct lane or avoiding collisions are effective regardless of the number of lanes or vehicles.

**2. Reward Structure:**\
Since the reward structure in the complex environment is the same as the one used during training in the simpler environment, the agent's learned behavior is still rewarded. For example, rewards for “staying in the right lane” or “keeping a safe distance” are likely to incentivize behaviors that are beneficial in both environments.

**3. Complexity Dilution:**\
In environments with higher vehicle density, the traffic flow may stabilize due to increased congestion, making it easier for the agent to avoid abrupt maneuvers or collisions. This phenomenon can make the environment seem less unpredictable than initially expected.

## 3. Retrain an agent on the complex environment
To compare against the pre-trained agent, we train a second DQN agent from scratch on the complex environment.

In [4]:
from utils.training import train_model

log_filename = "3_dqn_after_complex_training.csv"
log_performance_metrics_enabled = False

env = gymnasium.make('CustomRewardEnv', 
                     render_mode='rgb_array', 
                     log_performance_metrics_enabled=log_performance_metrics_enabled, 
                     log_filename=log_filename
                    )

# train model
train_model(
    env=env,
    config_updates=config_updates,
    session_name='3_Group15_RLProject',
    algorithm='DQN'
)

{'action': {'type': 'DiscreteMetaAction'},
 'centering_position': [0.3, 0.5],
 'collision_reward': -4,
 'controlled_vehicles': 1,
 'duration': 30,
 'ego_spacing': 1.5,
 'high_speed_reward': 0.4,
 'initial_lane_id': None,
 'lane_change_reward': 0,
 'lanes_count': 6,
 'left_vehicle_overtaken_reward': -0.5,
 'manual_control': False,
 'normalize_reward': True,
 'observation': {'type': 'Kinematics'},
 'offroad_terminal': False,
 'offscreen_rendering': False,
 'other_vehicles_type': 'highway_env.vehicle.behavior.IDMVehicle',
 'policy_frequency': 1,
 'real_time_rendering': False,
 'render_agent': True,
 'reward_speed_range': [20, 30],
 'right_lane_reward': 0.5,
 'safe_distance_reward': 0.1,
 'scaling': 5.5,
 'screen_height': 150,
 'screen_width': 600,
 'show_trajectories': False,
 'simulation_frequency': 5,
 'smooth_driving_reward': 0.3,
 'vehicles_count': 80,
 'vehicles_density': 1}
Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Logging to ./log

In [6]:
from utils.evaluation import evaluate_model, aggregate_and_normalize_rewards

# evaluate model
log_performance_metrics_enabled = True
log_filename = "3_dqn_after_complex_training.csv"

env = gymnasium.make('CustomRewardEnv', 
                     render_mode='rgb_array', 
                     log_performance_metrics_enabled=log_performance_metrics_enabled, 
                     log_filename=log_filename
                    )
evaluate_model(
    env=env,
    config_updates={**config_updates, "simulation_frequency": 15},
    model_path='models/3_Group15_RLProject_DQN',
    algorithm='DQN',
)

{'action': {'type': 'DiscreteMetaAction'},
 'centering_position': [0.3, 0.5],
 'collision_reward': -4,
 'controlled_vehicles': 1,
 'duration': 30,
 'ego_spacing': 1.5,
 'high_speed_reward': 0.4,
 'initial_lane_id': None,
 'lane_change_reward': 0,
 'lanes_count': 6,
 'left_vehicle_overtaken_reward': -0.5,
 'manual_control': False,
 'normalize_reward': True,
 'observation': {'type': 'Kinematics'},
 'offroad_terminal': False,
 'offscreen_rendering': False,
 'other_vehicles_type': 'highway_env.vehicle.behavior.IDMVehicle',
 'policy_frequency': 1,
 'real_time_rendering': False,
 'render_agent': True,
 'reward_speed_range': [20, 30],
 'right_lane_reward': 0.5,
 'safe_distance_reward': 0.1,
 'scaling': 5.5,
 'screen_height': 150,
 'screen_width': 600,
 'show_trajectories': False,
 'simulation_frequency': 15,
 'smooth_driving_reward': 0.3,
 'vehicles_count': 80,
 'vehicles_density': 1}
Logging metrics for step 15 and seconds elapsed 1.0
Logging metrics for step 30 and seconds elapsed 2.0
Loggi

In [7]:
metrics = aggregate_and_normalize_rewards(log_filename)

if metrics:
    print("Performance metric (as percent of all steps):")
    for metric_name, avg_metric in metrics.items():
        print(f"{metric_name}: {avg_metric*100:.4f}%")

Performance metric (as percent of all steps):
collision_count: 2.9921%
right_lane_count: 87.1669%
on_road_count: 100.0000%
safe_distance_count: 97.5222%
left_vehicle_overtaken_count: 10.0281%
abrupt_accelerations_count: 2.7115%


### Conclusion
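The agent trained on the complex environment showed only marginal improvements over the agent pre-trained on the simpler environment, and not on every metric. It stayed in the right lane more often (87.17% vs. 82.69% of steps) and accelerated abruptly less often (2.71% vs. 6.22%), but its collision rate (2.99% vs. 2.82%) and safe-distance rate (97.52% vs. 97.93%) were slightly worse. This can be explained as follows: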

**1. Complexity of the environment:**\
The complex environment introduces more variables (e.g., more lanes and vehicles), increasing the difficulty of identifying and exploiting specific patterns during training. As a result, learning can be slower and may not significantly outperform the pre-trained agent within the given training duration.

**2. Trade-off Between Generalization and Specialization:**\
The newly trained agent might adapt too much to the complexity of the new environment, e.g. higher vehicle density or number of lanes, which could lead to slightly less efficient behavior in scenarios that overlap with the simpler environment.
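
To make the comparison between the two agents concrete, the metrics printed in the two evaluation outputs can be placed side by side. The snippet below is a small sketch that copies those printed values by hand and computes the change per metric in percentage points; the helper name `metric_deltas` is introduced here for illustration only.

```python
# Metric values (percent of all steps), copied from the evaluation outputs above.
pretrained = {
    "collision_count": 2.8200,
    "right_lane_count": 82.6946,
    "on_road_count": 100.0000,
    "safe_distance_count": 97.9272,
    "left_vehicle_overtaken_count": 7.6645,
    "abrupt_accelerations_count": 6.2184,
}
retrained = {
    "collision_count": 2.9921,
    "right_lane_count": 87.1669,
    "on_road_count": 100.0000,
    "safe_distance_count": 97.5222,
    "left_vehicle_overtaken_count": 10.0281,
    "abrupt_accelerations_count": 2.7115,
}


def metric_deltas(before, after):
    """Return the change per metric (after minus before), in percentage points."""
    return {name: round(after[name] - before[name], 4) for name in before}


deltas = metric_deltas(pretrained, retrained)
for name, delta in deltas.items():
    print(f"{name}: {delta:+.4f} percentage points")
```

Read this way, the retrained agent gains about 4.5 points on right-lane driving and reduces abrupt accelerations by about 3.5 points, while its collision and safe-distance figures change by less than half a point in the unfavorable direction.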

## 4. Key Insights and Takeaways
- Agents can perform well in unseen environments if their training policies focus on generalizable principles like collision avoidance or maintaining safe distances.
- Retraining agents on more complex environments does not always guarantee significant performance gains, particularly when the reward structures and overarching strategies remain consistent.
- The small performance gap between the pre-trained and the retrained agent suggests that the reward system and environment parameters are well designed to promote transferable skills and make the agent's behavior adaptive across different levels of complexity.

## 5. Future Considerations
To further investigate these findings, experiments could be conducted by:

- Testing the pre-trained agent on environments with even more significant deviations (e.g., higher traffic density or more erratic driver behaviors).
- Analyzing the impact of training duration and hyperparameter tuning on the retrained agent’s performance.
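
The first suggestion could be approached with a small configuration sweep. The sketch below only builds the configuration dictionaries; it does not run any evaluation. The keys `vehicles_density` and `other_vehicles_type` appear in the configuration printed earlier in this notebook, while the specific density values, the use of `highway_env.vehicle.behavior.AggressiveVehicle` as a more erratic driver model, and the helper name `build_variants` are assumptions chosen for illustration.

```python
from itertools import product

# Subset of the complex configuration used in this notebook.
base_config = {"lanes_count": 6, "vehicles_count": 80}

# Hypothetical sweep values, not taken from the original experiments.
densities = [1, 1.5, 2]
behaviors = [
    "highway_env.vehicle.behavior.IDMVehicle",
    "highway_env.vehicle.behavior.AggressiveVehicle",
]


def build_variants(base, densities, behaviors):
    """Return one config dict per (density, behavior) pair without mutating base."""
    return [
        {**base, "vehicles_density": density, "other_vehicles_type": behavior}
        for density, behavior in product(densities, behaviors)
    ]


variants = build_variants(base_config, densities, behaviors)
for variant in variants:
    print(variant)
```

Each resulting dictionary could then be passed as `config_updates` to the evaluation call used earlier, with a separate `log_filename` per variant so the metrics stay comparable.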