## Notebook description

This notebook demonstrates how to customize the reward function of the `highway-fast-v0` environment. The custom reward logic is implemented in `utils/highwayEnvCustomReward.py` and replaces or extends elements of the default reward function to better reflect realistic driving scenarios.

## Summary of the default reward function in `highway-fast-v0`

The default reward function in the `highway-fast-v0` environment is designed to promote safe and efficient driving. It computes a scalar reward for each action from four factors:

1. **Collision Penalty:**

   - The `collision_reward` penalizes the agent if the vehicle crashes. This is a binary penalty, either 0 (no crash) or a negative value (collision occurred).

2. **Right Lane Reward:**

   - The `right_lane_reward` rewards the agent for staying in the rightmost lane. The reward increases as the vehicle moves closer to the rightmost lane, encouraging proper lane usage.

3. **High-Speed Reward:**

   - The `high_speed_reward` incentivizes driving at higher speeds. The reward is proportional to the vehicle's forward speed, normalized within a defined speed range.

4. **On-Road Reward:**
   - The `on_road_reward` encourages the vehicle to stay on the road by multiplying the total reward by 1 if the vehicle is on the road or 0 if it is off-road. This means that if the vehicle is off-road, the total reward is also zero.
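
As a rough illustration of how these four terms fit together, the sketch below combines them as a weighted sum that is then multiplied by the on-road indicator. The function names, the weights, the speed range, and the lane indexing (0 for the leftmost lane) are assumptions made for this example, not values taken from the notebook; consult the environment's configuration for the actual settings. Any reward normalization step is left out.

```python
def lmap(value, src, dst):
    """Linearly map `value` from the interval `src` to the interval `dst`."""
    return dst[0] + (value - src[0]) * (dst[1] - dst[0]) / (src[1] - src[0])


def combined_reward(crashed, lane_index, lane_count, forward_speed, on_road,
                    weights=None, speed_range=(20.0, 30.0)):
    """Weighted sum of the three additive terms, gated by the on-road flag.

    Hypothetical helper: weights and speed_range are illustrative assumptions.
    """
    if weights is None:
        weights = {"collision": -1.0, "right_lane": 0.1, "high_speed": 0.4}

    # Forward speed mapped to [0, 1] and clipped to that range
    scaled_speed = min(max(lmap(forward_speed, speed_range, (0.0, 1.0)), 0.0), 1.0)

    terms = {
        "collision": 1.0 if crashed else 0.0,                # binary crash flag
        "right_lane": lane_index / max(lane_count - 1, 1),   # 1.0 in the rightmost lane
        "high_speed": scaled_speed,
    }
    total = sum(weights[name] * terms[name] for name in terms)

    # Off-road zeroes the whole reward
    return total * (1.0 if on_road else 0.0)
```

With these assumed weights, a vehicle at the top of the speed range in the rightmost lane receives 0.1 + 0.4 = 0.5, and the same vehicle off-road receives 0.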

## Shortcomings of the default reward function in `highway-fast-v0`

1. **Overemphasis on the Rightmost Lane:**

   - The `right_lane_reward` encourages staying in the rightmost lane, which does not always match realistic driving goals. For example, overtaking requires moving into another lane, which this term does not encourage.

2. **Speed Reward Ignores Context:**

   - The `high_speed_reward` grows linearly with speed but does not take traffic density into account. Driving fast in dense traffic should be discouraged.

3. **Binary Collision Penalty:**

   - The `collision_reward` is binary, providing the same penalty regardless of the severity or cause of the crash. This ignores scenarios like near-misses, which could be penalized slightly to encourage caution.

4. **Lack of a Safe Distance Mechanism:**

   - The default reward function gives no incentive to maintain a safe distance from the vehicle in front, even though a safe following distance is a fundamental part of safe driving: it helps avoid collisions and creates smoother traffic flow.

## Changes made to the reward function

### Add safe distance reward

**Implementation**

1. Identify the Front Vehicle:

   - Use the existing `road.neighbour_vehicles(vehicle)` method, which returns the vehicles ahead of and behind the agent, and keep only the front vehicle.

2. Compute the Distance:

   - Calculate the longitudinal distance between the agent's vehicle and the identified front vehicle. If no front vehicle exists, no distance is computed and the reward is set to 0.

3. Define the Reward:

   - Give the full reward of 1 if the distance exceeds a safe threshold (`safe_distance`, 5 meters in the code). Below the threshold, apply a penalty that grows linearly from 0 at the threshold to -1 at zero distance.

**Advantages**

- Encourages the agent to maintain a safe distance, reducing the likelihood of rear-end collisions.
- Promotes safer and more realistic driving behaviors.

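In [None]:
# Safe distance reward (fragment of the custom reward method)
# neighbour_vehicles returns (front, rear); only the front vehicle is needed
front_vehicle, _ = self.road.neighbour_vehicles(self.vehicle)
safe_distance = 5  # threshold in meters
if front_vehicle is not None:
    # Longitudinal gap between the two vehicle positions, floored at 0
    distance = max(front_vehicle.position[0] - self.vehicle.position[0], 0)

    if distance > safe_distance:
        safe_distance_reward = 1  # Full reward if distance is safe
    else:
        # Linear penalty: 0 at the threshold, -1 at zero distance
        safe_distance_reward = -1 * (safe_distance - distance) / safe_distance
else:
    safe_distance_reward = 0  # No vehicle ahead: neutral reward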
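
To inspect the shape of this reward outside the environment, the same logic can be written as a standalone function. This is a minimal sketch: the function name `safe_distance_reward` and its arguments are hypothetical and not part of highway-env, and the gap is taken as the difference in longitudinal position, as in the cell above.

```python
from typing import Optional


def safe_distance_reward(ego_x: float, front_x: Optional[float],
                         safe_distance: float = 5.0) -> float:
    """Reward in [-1, 1] based on the longitudinal gap to the front vehicle.

    Hypothetical helper that mirrors the notebook cell.
    """
    if front_x is None:
        return 0.0  # no vehicle ahead: neutral reward

    distance = max(front_x - ego_x, 0.0)
    if distance > safe_distance:
        return 1.0  # gap is larger than the threshold

    # Linear penalty: 0 at the threshold, -1 at zero gap
    return -(safe_distance - distance) / safe_distance


print(safe_distance_reward(0.0, 2.5))   # -0.5
print(safe_distance_reward(0.0, 8.0))   # 1.0
print(safe_distance_reward(0.0, None))  # 0.0
```

The reward is 0 exactly at the threshold and jumps to 1 just above it, which is worth keeping in mind when choosing `safe_distance`.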

### Improve existing collision reward

**Implementation**

1. Collision Penalty:
    - Check whether the agent's vehicle has crashed (`self.vehicle.crashed`).
    - Apply a fixed penalty of -1 if a collision has occurred; otherwise, no penalty is applied.

2. Near-Miss Penalty:
    - Iterate through all vehicles on the road and calculate the Euclidean distance between the agent's vehicle and other vehicles.
    - Define a "near-miss" threshold (e.g., 2 meters).
    - Apply a scaled penalty when the distance falls within the near-miss range (closer near-misses incur higher penalties).

3. Combine Penalties:
    - The total collision reward is the sum of the collision penalty and the near-miss penalties.

**Advantages**
- Encourages the agent to avoid collisions entirely by imposing a strict penalty.
- Promotes safer behavior by penalizing close interactions (near-misses), even when collisions are avoided.
- Improves realism in the simulation, as near-misses are indicative of risky driving.

In [None]:
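# Improved collision reward (fragment; assumes `import numpy as np` at module level)
collision_penalty = -1 if self.vehicle.crashed else 0  # Full penalty for collision
near_miss_threshold = 2  # meters
near_miss_penalty = 0
for vehicle in self.road.vehicles:
    if vehicle is not self.vehicle:
        distance_to_vehicle = np.linalg.norm(
            np.array(self.vehicle.position) - np.array(vehicle.position)
        )
        if 0 < distance_to_vehicle <= near_miss_threshold:
            # Scaled penalty: 0 at the threshold, approaching -0.5 at zero distance
            closeness = (near_miss_threshold - distance_to_vehicle) / near_miss_threshold
            near_miss_penalty += -0.5 * closeness

# Total: collision penalty plus the sum of all near-miss penalties
collision_reward = collision_penalty + near_miss_penalty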
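
The near-miss term can also be checked in isolation. The sketch below restates it with plain coordinate tuples instead of vehicle objects; the names `near_miss_penalty` and `collision_reward`, and the `threshold` and `weight` parameters, are hypothetical and chosen to match the constants in the cell above.

```python
import math
from typing import Iterable, Tuple

Position = Tuple[float, float]


def near_miss_penalty(ego: Position, others: Iterable[Position],
                      threshold: float = 2.0, weight: float = 0.5) -> float:
    """Sum of scaled penalties for every vehicle within `threshold` of the ego."""
    penalty = 0.0
    for other in others:
        distance = math.dist(ego, other)  # Euclidean distance
        if 0 < distance <= threshold:
            # 0 at the threshold, approaching -weight as the distance goes to 0
            penalty -= weight * (threshold - distance) / threshold
    return penalty


def collision_reward(crashed: bool, ego: Position,
                     others: Iterable[Position]) -> float:
    """Fixed crash penalty plus the accumulated near-miss penalties."""
    return (-1.0 if crashed else 0.0) + near_miss_penalty(ego, others)


print(near_miss_penalty((0.0, 0.0), [(1.0, 0.0)]))  # -0.25
print(near_miss_penalty((0.0, 0.0), [(3.0, 0.0)]))  # 0.0
```

Because the near-miss penalties add up across vehicles, the combined value can fall below -1 when several vehicles are close at once. If the overall reward is expected to stay in a fixed range, clipping this term is one option.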

### Scale high speed reward by traffic density

**Implementation**

1. Define Traffic Radius:
    - Set a `traffic_radius` (e.g., 10 meters) within which the agent's vehicle assesses the surrounding traffic density.

2. Count Nearby Vehicles:
    - Iterate through all vehicles on the road and calculate the Euclidean distance between the agent's vehicle and others.
    - Increment the count of nearby vehicles if they are within the defined traffic radius.

3. Compute Traffic Density Factor:
    - Use the number of nearby vehicles to calculate a traffic density factor, where higher traffic density reduces the reward.
    - Define a maximum density (`max_density`, e.g., 10 vehicles) for scaling. The factor decreases linearly as the density approaches this maximum.

4. Scale the High-Speed Reward:
    - Adjust the agent's speed-based reward (`scaled_speed`) using the traffic density factor.
    - Clip the reward within a range of 0 to 1 for consistency.

**Advantages**
- Encourages the agent to drive at higher speeds when traffic density is low, promoting efficient driving.
- Discourages risky high-speed driving in dense traffic, reducing the likelihood of collisions or unsafe maneuvers.

In [None]:
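# High speed reward scaled by traffic density (fragment; assumes `import numpy as np`
# and that `scaled_speed`, the speed normalized to [0, 1], was computed earlier)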
traffic_radius = 10

# Count the number of vehicles within the traffic radius
nearby_vehicles = 0
for other_vehicle in self.road.vehicles:
    if other_vehicle is not self.vehicle:
        distance = np.linalg.norm(
            np.array(other_vehicle.position) - np.array(self.vehicle.position)
        )
        if distance < traffic_radius:
            nearby_vehicles += 1

# Traffic density factor: more nearby vehicles -> smaller factor -> lower reward
max_density = 10
traffic_density_factor = max(0, 1 - nearby_vehicles / max_density)

# Adjust high-speed reward based on traffic density
high_speed_reward = np.clip(scaled_speed * traffic_density_factor, 0, 1)
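
The effect of the density factor is easiest to see with a few concrete positions. The following sketch reimplements the cell with plain coordinate tuples; the function names are hypothetical, and `scaled_speed` is assumed to be a speed already normalized to the range 0 to 1.

```python
import math
from typing import Iterable, Tuple

Position = Tuple[float, float]


def traffic_density_factor(ego: Position, others: Iterable[Position],
                           traffic_radius: float = 10.0,
                           max_density: int = 10) -> float:
    """Factor in [0, 1] that shrinks linearly with the number of nearby vehicles."""
    nearby = sum(1 for other in others if math.dist(ego, other) < traffic_radius)
    return max(0.0, 1.0 - nearby / max_density)


def traffic_aware_speed_reward(scaled_speed: float, ego: Position,
                               others: Iterable[Position]) -> float:
    """Speed reward multiplied by the density factor and clipped to [0, 1]."""
    factor = traffic_density_factor(ego, others)
    return min(max(scaled_speed * factor, 0.0), 1.0)


ego = (0.0, 0.0)
others = [(5.0, 0.0), (0.0, 8.0), (20.0, 0.0)]  # two vehicles inside the radius
print(traffic_density_factor(ego, others))
print(traffic_aware_speed_reward(1.0, ego, others))
```

With two of the three vehicles inside the 10 meter radius, the factor is 1 - 2/10, so a vehicle at full scaled speed keeps about 80% of its speed reward. With `max_density` or more vehicles nearby, the reward drops to 0.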