Below is an implementation of the reward function described in [this article](https://medium.com/@anshml/craft-a-powerful-reward-function-in-student-league-81925c56a11e).

### Key Design Decisions

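1. **Distance Reward**:
   - Encourages the car to stay near the center line with a power curve, `1 - (distance_from_center / (0.5 * track_width))**0.4`, which drops steeply just off center and flattens toward the edge.
2. **Steering Penalty**:
   - Intended to reduce the reward for steering angles above 15 degrees to minimize zig-zagging. The cell below reads `steering_angle` but does not yet apply this penalty.
3. **Speed Reward**:
   - Motivates holding a speed close to 1 m/s, with a square-root penalty for deviation that reaches its 1e-3 floor once the error is `max_speed_diff` (0.2 m/s) or more.
4. **Off-Track Penalty**:
   - Sets the reward to 1e-3 as soon as any wheel leaves the track (`all_wheels_on_track` is false).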

This function balances precision against speed: the speed term is weighted 10 and the distance term 2, so holding the target speed dominates the reward.
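
The two shaping terms in the cell below can be evaluated on their own to see their shape. This is a minimal sketch, assuming a hypothetical 1 m track width; `distance_term` and `speed_term` are illustrative helper names that copy the formulas from the cell, not part of the DeepRacer API:

```python
def distance_term(distance_from_center, track_width):
    """Centerline term: 1 - (d / half-width) ** 0.4."""
    return 1 - (distance_from_center / (0.5 * track_width)) ** 0.4


def speed_term(speed, target=1.0, max_speed_diff=0.2):
    """Speed term: 1 - sqrt(|target - speed| / max_speed_diff), floored at 1e-3."""
    speed_diff = abs(target - speed)
    return max(1e-3, 1 - (speed_diff / max_speed_diff) ** 0.5)


if __name__ == "__main__":
    width = 1.0  # hypothetical track width in metres
    for d in (0.0, 0.05, 0.25, 0.5):
        print(f"d={d:.2f} m -> distance term {distance_term(d, width):.3f}")
    for v in (1.0, 1.05, 1.2, 1.5):
        print(f"v={v:.2f} m/s -> speed term {speed_term(v):.3f}")
```

With these settings, drifting 5 cm (10% of the half-width) off center already removes about 40% of the distance term, and a 0.05 m/s speed error halves the speed term; at 0.2 m/s of error or more the speed term sits at its 1e-3 floor.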

In [None]:
def reward_function(params):
    """
    Reward function for AWS DeepRacer to incentivize center line adherence,
    controlled steering, and speed maintenance.
    """
    # Extract parameters
    distance_from_center = params['distance_from_center']
    track_width = params['track_width']
    steering_angle = params['steering_angle']
    speed = params['speed']
    all_wheels_on_track = params['all_wheels_on_track']
    
    # Center alignment reward
    distance_reward = 1 - (distance_from_center / (0.5 * track_width))**0.4

    # Speed reward
    SPEED_THRESHOLD = 1.0  # m/s
    speed_diff = abs(SPEED_THRESHOLD - speed)
    max_speed_diff = 0.2  # tune within [0.01, 0.3]
    speed_reward = max(1e-3, 1 - (speed_diff / max_speed_diff)**0.5)  # keep the reward strictly positive

    # Combine rewards and handle off-track penalty
    if not all_wheels_on_track:
        reward = 1e-3  # penalize heavily if any wheel is off track
    else:
        reward = (distance_reward * 2) + (speed_reward * 10)
    
    return float(reward)
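
The steering penalty from the design notes is not applied in the cell above (`steering_angle` is read but never used). One way to add it is a multiplicative factor, as in AWS's sample "prevent zig-zag" reward function. The 0.8 factor is an assumption borrowed from that sample, not taken from the article, and `apply_steering_penalty` is a hypothetical helper:

```python
ABS_STEERING_THRESHOLD = 15.0  # degrees, from the design notes
STEERING_PENALTY = 0.8         # assumed factor (from AWS's zig-zag sample); tune as needed


def apply_steering_penalty(reward, steering_angle,
                           threshold=ABS_STEERING_THRESHOLD,
                           factor=STEERING_PENALTY):
    """Scale the reward down when the absolute steering angle exceeds the threshold."""
    if abs(steering_angle) > threshold:
        return reward * factor
    return reward


# Possible use inside the on-track branch of reward_function:
#     reward = (distance_reward * 2) + (speed_reward * 10)
#     reward = apply_steering_penalty(reward, steering_angle)
```

A multiplicative penalty keeps the reward positive and proportional to the other terms, whereas subtracting a constant could push a low reward below zero.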


Next, an implementation based on [this article](https://blog.gofynd.com/how-we-broke-into-the-top-1-of-the-aws-deepracer-virtual-circuit-573ba46c275).

### Key Design Decisions

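1. **Speed Component**:
   - `calculate_speed_reward` scores speed with a Gaussian centered on the lowest precomputed optimal speed over the next `nsteps` waypoints (capped at `MAX_SPEED`), with `sigma = (MAX_SPEED - MIN_SPEED) / 6`. The `reward_function` in the cell below does not call it; its speed term, `speed_maintain_bonus`, instead penalizes slowing down relative to the previous step.

2. **Distance Component**:
   - Rewards moving closer to the track center than on the previous step, using the ratio of the previous to the current `distance_from_center`, capped at 2.

3. **Heading Component**:
   - Rewards aligning the car's heading with the direction from the car to the next waypoint (`cos^4` of the heading error within 20 degrees, `cos^10` beyond), doubled when the error is under 10 degrees and doubled again when it shrank since the previous step.

4. **Curve Bonus**:
   - Meant to reward taking sharp turns cleanly and stably; the code leaves it as a placeholder (`curve_bonus = 0`).

5. **Progress Reward**:
   - Pays a one-time bonus the first time progress reaches each 10% milestone, equal to `progress` raised to `5 + 0.75 * milestone` (and to 14 at 100%), so later milestones are worth far more.

6. **Immediate and Long-Term Components**:
   - The reward is `IC + LC`: the immediate component combines the heading (HC), distance (DC), and speed (SC) terms as `(HC + DC + SC)**2 + HC * DC * SC`, and the long-term component adds the curve and progress bonuses.

7. **Avoid Unpardonable Actions**:
   - Heavily penalizes what the article calls unpardonable actions, such as leaving the track or making unreasonably sharp turns, by collapsing the immediate component to 1e-3.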

---

This reward function follows the principles discussed in the article.
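
The heading component in the cell below switches from `cos^4` to `cos^10` of the heading error at 20 degrees. Evaluating both sides of that threshold shows the switch is not continuous: the term drops from about 0.78 to about 0.54 as the error crosses 20 degrees. A minimal sketch; `heading_reward` here is a standalone copy of that logic for illustration:

```python
import math


def heading_reward(direction_diff):
    """Heading term: cos^4 of the error within 20 degrees, cos^10 beyond."""
    error = math.radians(abs(direction_diff))
    if abs(direction_diff) <= 20:
        return math.cos(error) ** 4
    return math.cos(error) ** 10


if __name__ == "__main__":
    for diff in (0, 10, 20, 20.01, 45, 90):
        print(f"{diff:>6} deg -> {heading_reward(diff):.3f}")
```

If the jump at 20 degrees is unwanted, a single exponent (or a smooth blend between the two) would avoid rewarding a 20-degree error much more than a 20.01-degree one.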

In [3]:
import math

# Class that keeps values from the previous step between calls
class PARAMS:
    prev_speed = None
    prev_steering_angle = None
    prev_steps = None
    prev_direction_diff = None
    prev_normalized_distance_from_route = None
    unpardonable_action = False
    waypoints = []
    optimal_speed = []  # one target speed per waypoint; must be filled before calculate_speed_reward is used
    intermediate_progress = [0] * 11  # bonus already paid for each 10% progress milestone

# Reward function
def reward_function(params):
    # Per-step parameters
    heading = params['heading']
    vehicle_x = params['x']
    vehicle_y = params['y']
    distance_from_center = params['distance_from_center']
    steps = params['steps']
    steering_angle = params['steering_angle']
    speed = params['speed']
    progress = params['progress']
    waypoints = params['waypoints']
    closest_waypoints = params['closest_waypoints']

    # Reset stored state when a new episode starts (the step counter went back)
    if PARAMS.prev_steps is None or steps < PARAMS.prev_steps:
        PARAMS.prev_speed = None
        PARAMS.prev_steering_angle = None
        PARAMS.prev_direction_diff = None
        PARAMS.prev_normalized_distance_from_route = None
        PARAMS.intermediate_progress = [0] * 11  # milestone bonuses are paid once per episode

    # Leaving the track counts as an unpardonable action
    PARAMS.unpardonable_action = not params['all_wheels_on_track']

    # Track direction: from the car to the next waypoint.
    # DeepRacer has no 'next_waypoint' key; closest_waypoints[1] is the index of the next waypoint.
    next_route_point_x, next_route_point_y = waypoints[closest_waypoints[1]]
    route_direction = math.degrees(
        math.atan2(next_route_point_y - vehicle_y, next_route_point_x - vehicle_x)
    )
    direction_diff = route_direction - heading

    # Normalize the heading error to (-180, 180]
    direction_diff = (direction_diff + 360) % 360
    if direction_diff > 180:
        direction_diff -= 360

    # Heading reward: cos^10 in general, the more lenient cos^4 within 20 degrees
    heading_reward = math.cos(math.radians(abs(direction_diff))) ** 10
    if abs(direction_diff) <= 20:
        heading_reward = math.cos(math.radians(abs(direction_diff))) ** 4

    # Check whether the speed dropped since the previous step
    has_speed_dropped = PARAMS.prev_speed is not None and PARAMS.prev_speed > speed
    speed_maintain_bonus = min(speed / PARAMS.prev_speed, 1) if has_speed_dropped else 1

    # Bonus for reducing the heading error (computed but not used in the final reward)
    heading_decrease_bonus = 0
    if PARAMS.prev_direction_diff is not None and direction_diff != 0:
        heading_ratio = abs(PARAMS.prev_direction_diff / direction_diff)
        if heading_ratio > 1:
            heading_decrease_bonus = min(10, heading_ratio)

    # Check whether the steering angle changed (also not used in the final reward)
    has_steering_angle_changed = (
        PARAMS.prev_steering_angle is not None and
        not math.isclose(PARAMS.prev_steering_angle, steering_angle)
    )
    steering_angle_maintain_bonus = 1
    if abs(direction_diff) < 10:
        steering_angle_maintain_bonus *= 2
    if PARAMS.prev_direction_diff is not None and abs(PARAMS.prev_direction_diff) > abs(direction_diff):
        steering_angle_maintain_bonus *= 2

    # Bonus for moving closer to the center line than on the previous step (capped at 2)
    distance_reduction_bonus = 1
    if PARAMS.prev_normalized_distance_from_route is not None:
        if distance_from_center > 0:
            distance_reduction_bonus = min(
                abs(PARAMS.prev_normalized_distance_from_route / distance_from_center), 2
            )
        else:
            distance_reduction_bonus = 2  # exactly on the center line: the ratio is unbounded, use the cap

    # Immediate component (IC) from the heading, distance, and speed terms
    HC = (10 * heading_reward * steering_angle_maintain_bonus)
    DC = (10 * distance_reduction_bonus)
    SC = (5 * speed_maintain_bonus)
    IC = (HC + DC + SC) ** 2 + (HC * DC * SC)

    if PARAMS.unpardonable_action:
        IC = 1e-3

    # Long-term component (LC)
    curve_bonus = 0  # placeholder: add curve logic if needed
    intermediate_progress_bonus = 0
    pi = int(progress // 10)
    if pi != 0 and PARAMS.intermediate_progress[pi] == 0:
        if pi == 10:
            intermediate_progress_bonus = progress ** 14
        else:
            intermediate_progress_bonus = progress ** (5 + 0.75 * pi)
        PARAMS.intermediate_progress[pi] = intermediate_progress_bonus

    LC = curve_bonus + intermediate_progress_bonus
    reward = max(IC + LC, 1e-3)

    # Store this step's values for the next call
    PARAMS.prev_speed = speed
    PARAMS.prev_steering_angle = steering_angle
    PARAMS.prev_direction_diff = direction_diff
    PARAMS.prev_steps = steps
    PARAMS.prev_normalized_distance_from_route = distance_from_center

    return reward

# Speed reward: Gaussian around the lowest precomputed optimal speed over the next nsteps waypoints.
# Not called by reward_function above; PARAMS.optimal_speed must hold one speed per waypoint first.
def calculate_speed_reward(speed, i, nsteps):
    MIN_SPEED = 2.0
    MAX_SPEED = 4.0
    sigma_speed = abs(MAX_SPEED - MIN_SPEED) / 6.0

    # Look ahead nsteps waypoints from index i, wrapping around the closed track
    n = len(PARAMS.optimal_speed)
    window = [PARAMS.optimal_speed[(i + k) % n] for k in range(nsteps)]
    optimal_speed = min(MAX_SPEED, min(window))
    return math.exp(-0.5 * (speed - optimal_speed) ** 2 / sigma_speed ** 2)
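
The relative size of the immediate (IC) and long-term (LC) components matters for learning. With the weights in the cell above, IC peaks at 8225 (HC = 40 with both steering bonuses, DC = 20, SC = 5), while each milestone bonus is `progress` raised to a power between 5.75 and 14, so the 10% milestone alone is worth about 5.6e5 and the 100% milestone about 1e28. A sketch that reproduces both numbers; `immediate_component` and `milestone_bonus` are illustrative helpers that copy the formulas from the cell:

```python
def immediate_component(hc, dc, sc):
    """IC as combined in the reward function: (HC + DC + SC)^2 + HC * DC * SC."""
    return (hc + dc + sc) ** 2 + hc * dc * sc


def milestone_bonus(progress):
    """One-off bonus paid when progress first enters a new 10% bucket."""
    milestone = int(progress // 10)
    if milestone == 0:
        return 0.0
    if milestone == 10:
        return progress ** 14
    return progress ** (5 + 0.75 * milestone)


if __name__ == "__main__":
    # Largest IC: heading_reward = 1 with both steering bonuses (HC = 10 * 1 * 4),
    # distance ratio at its cap of 2 (DC = 10 * 2), no speed drop (SC = 5 * 1).
    print("max IC:", immediate_component(40, 20, 5))
    for p in (10, 50, 90, 100):
        print(f"bonus at {p}%: {milestone_bonus(p):.3g}")
```

Even the first milestone bonus is about 68 times the largest possible IC, so the long-term component dominates the signal unless it is rescaled. Capping it or taking a logarithm are possible options; the article does not prescribe either.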


Finally, the implementation from [this article](https://medium.com/@marsmans/how-i-got-into-the-top-2-in-aws-deepracer-32127a364212)

### Key Design Decisions

1. **Distance from Center**:
   - Rewards staying close to the center line with a step function: 2.0 within 10% of the track width, 0.5 within 25%, 0.1 within 50%, and 1e-3 beyond that.

2. **Steering Angle**:
   - Does not penalize sharp turns directly; instead, steering beyond 15 degrees in either direction switches the speed bonus to favor lower speeds, which discourages fast, zig-zagging corners.

3. **Speed Optimization**:
   - On straights (|steering_angle| < 5 degrees) adds 2.0 above 2.5 m/s or 1.0 above 2.0 m/s; on sharp turns (|steering_angle| > 15 degrees) adds 1.0 below 1.8 m/s or 0.5 below 2.2 m/s. No speed bonus is given between 5 and 15 degrees.

4. **Progress-Based Reward**:
   - Adds `(progress / steps) * 10`, tying the reward to the percentage of the track completed per step taken, which promotes efficiency.

5. **Off-Track Handling**:
   - Beyond half the track width the distance term stays at its 1e-3 floor; the function does not read `all_wheels_on_track`, so the speed and step bonuses still apply.

6. **Custom Adjustments**:
   - The weights of the distance, speed, and progress terms are meant to be tuned to the challenges of each track.
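
The progress-based term adds `(progress / steps) * 10` on every step, so a car that covers the same distance in fewer steps collects a larger bonus per step. A worked sketch with hypothetical step counts; `step_reward` mirrors that line and, as an added assumption, returns 0 before the first step to avoid dividing by zero:

```python
def step_reward(progress, steps, weight=10.0):
    """Progress-per-step bonus: weight * progress / steps (0 before the first step)."""
    if steps <= 0:
        return 0.0
    return weight * progress / steps


if __name__ == "__main__":
    # Hypothetical laps: the same 100% progress reached in different numbers of steps
    for steps in (150, 200, 300):
        print(f"{steps} steps -> bonus at lap end {step_reward(100, steps):.2f}")
```

Because `progress` is a percentage from 0 to 100, the bonus at the end of a lap is `1000 / steps`, for example 5.0 for a 200-step lap.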

In [None]:
def reward_function(params):
    """
    Reward center-line adherence, speed on straights, slower speed on sharp turns, and progress per step
    """

    reward = 1e-3
    # Read input parameters
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']

    # Calculate 3 markers that are at varying distances away from the center line
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    # Give higher reward if the car is closer to center line and vice versa
    if distance_from_center <= marker_1:
        reward = 2.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1  # getting close to off track

    # Incentivize going fast on straight ways and slower on curves
    steering_angle = params['steering_angle']
    speed = params['speed']
    if -5 < steering_angle < 5:
        if speed > 2.5:
            reward += 2.0
        elif speed > 2.0:
            reward += 1.0
    elif -15 > steering_angle or steering_angle > 15:
        if speed < 1.8:
            reward += 1.0
        elif speed < 2.2:
            reward += 0.5

    # Incentivize finishing in fewer steps (guard against division by zero)
    steps = params['steps']
    progress = params['progress']
    step_reward = (progress / steps) * 10 if steps > 0 else 0
    reward += step_reward


    return float(reward)


## Lívia

1. Understand the functions implemented in the Medium posts.
2. Pick the ideas that make the most sense as a starting point.
3. Train the model.

- Repeat until the result looks good enough:
    4. Evaluate what can be improved.
    5. Change the function.

6. Copy the model.
7. Compare it with the group's model.

In [None]:


def reward_function(params):
    """
    Função de recompensa para direção suave, rápida e consistente
    """
    # Extrair parâmetros
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']
    steering_angle = params['steering_angle']
    speed = params['speed']
    steps = params['steps']
    progress = params['progress']
    heading = params['heading']
    all_wheels_on_track = params['all_wheels_on_track']

    # Penalização por sair da pista
    if not all_wheels_on_track:
        return 1e-3

    # Recompensa por adesão à linha central
    center_reward = max(1e-3, 1 - (distance_from_center / (0.5 * track_width))**0.4)

    # Penalização para curvas muito fechadas e alta velocidade
    if abs(steering_angle) > 15 and speed > 1.8:
        steering_penalty = 0.5  # Reduz recompensa em curvas fechadas com alta velocidade
    else:
        steering_penalty = 1.0

    # Recompensa por velocidade ideal
    if abs(steering_angle) < 10:  # Retas
        speed_reward = max(1e-3, 1 - abs(speed - 2.5) / 2.5)
    else:  # Curvas
        speed_reward = max(1e-3, 1 - abs(speed - 1.8) / 1.8)

    # Recompensa por progresso (proporcional à eficiência dos passos)
    progress_reward = (progress / steps) * 10 if steps > 0 else 0

    # Combinação ponderada das recompensas
    reward = (center_reward * 3) + (speed_reward * 2) + (steering_penalty) + (progress_reward * 0.5)

    return float(reward)
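
The speed term in the function above is a triangle peaked at a target speed chosen from the steering angle: 2.5 m/s when |steering_angle| is under 10 degrees and 1.8 m/s otherwise, reaching the 1e-3 floor at 0 m/s and at twice the target. A standalone sketch of that term; the name `target_speed_reward` is illustrative:

```python
def target_speed_reward(speed, steering_angle):
    """Triangular speed term: 1 at the target speed, floored at 1e-3."""
    target = 2.5 if abs(steering_angle) < 10 else 1.8  # straight vs curve target (m/s)
    return max(1e-3, 1 - abs(speed - target) / target)


if __name__ == "__main__":
    for speed, angle in ((2.5, 0), (1.25, 0), (1.8, 20), (2.5, 20), (5.0, 0)):
        print(f"speed={speed} m/s, steering={angle} deg -> {target_speed_reward(speed, angle):.3f}")
```

A car taking a curve at 2.5 m/s still earns about 0.61 of this term, so discouraging fast cornering also relies on the separate `steering_penalty` term.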


In [None]:
import math

# Class to store previous parameters
class PARAMS:
    prev_speed = None
    prev_steering_angle = None
    prev_steps = None
    prev_direction_diff = None
    prev_normalized_distance_from_route = None
    unpardonable_action = False

def reward_function(params):
    """
    Reward function for smoother driving that avoids waypoint-based inputs
    such as 'next_waypoint' (not a key in DeepRacer's params).
    """

    # Extract parameters
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']
    steering_angle = params['steering_angle']
    speed = params['speed']
    heading = params['heading']
    steps = params['steps']
    all_wheels_on_track = params['all_wheels_on_track']

    # Forget the previous step's values when a new episode starts (the step counter went back)
    if PARAMS.prev_steps is None or steps < PARAMS.prev_steps:
        PARAMS.prev_speed = None
        PARAMS.prev_steering_angle = None
    PARAMS.prev_steps = steps

    # Penalize if off-track
    if not all_wheels_on_track:
        return 1e-3

    # Reward for staying near the center of the track
    center_reward = max(1e-3, 1 - (distance_from_center / (0.5 * track_width))**0.4)

    # Reward for consistent speed
    if PARAMS.prev_speed is not None:
        speed_diff = abs(speed - PARAMS.prev_speed)
        speed_reward = max(1e-3, 1 - speed_diff / 2)  # Penalize large speed variations
    else:
        speed_reward = 1.0

    # Reward for smooth steering
    if PARAMS.prev_steering_angle is not None:
        steering_change = abs(steering_angle - PARAMS.prev_steering_angle)
        steering_reward = max(1e-3, 1 - steering_change / 15)  # Penalize abrupt steering changes
    else:
        steering_reward = 1.0

    # Combine rewards
    reward = (center_reward * 2) + (speed_reward * 3) + (steering_reward * 2)

    # Apply unpardonable action penalty (the flag is never set in this cell)
    if PARAMS.unpardonable_action:
        reward = 1e-3

    # Update previous state
    PARAMS.prev_speed = speed
    PARAMS.prev_steering_angle = steering_angle
    PARAMS.prev_normalized_distance_from_route = distance_from_center

    return float(reward)