<a href="https://colab.research.google.com/github/alazaradane/marl-robot-navigation/blob/main/Reward_System_Code.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Reward System Code

## Explanation of the Reward Function


Target Progress:

Distance to target: If the drone is within `threshold` (default 0.5) of the target point, the target term is a large reward of +50 and the episode is marked done. Otherwise the target term is the negative Euclidean distance to the target, so the penalty shrinks as the drone gets closer.

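
As a minimal sketch, the target term can be isolated in a standalone function. The name `target_progress_reward` is hypothetical and not part of the original class; it mirrors step 1 of `calculate_reward` below and returns the term together with a "reached" flag.

```python
import numpy as np

def target_progress_reward(position, target, threshold=0.5):
    """Target term only: +50 inside the threshold, else minus the distance to the target."""
    offset = np.asarray(position, dtype=float) - np.asarray(target, dtype=float)
    dist = float(np.linalg.norm(offset))
    if dist <= threshold:
        return 50.0, True   # target reached; the episode can end
    return -dist, False     # the farther from the target, the more negative

print(target_progress_reward([3.0, 4.0], [0.0, 0.0]))  # (-5.0, False)
print(target_progress_reward([0.2, 0.0], [0.0, 0.0]))  # (50.0, True)
```

The other terms are still added on the step the target is reached, and the time penalty is always negative, so the total reward on that step is strictly below 50.
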
Start Point Penalty:

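Distance to start: At every step, 0.1 times the distance from the start point is subtracted, with the intent of discouraging the drone from wandering. Note that this term measures straight-line distance from the start rather than path length, and it grows as the drone approaches a target that lies away from the start; the movement and time penalties are what directly favor shorter routes.
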
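
The start-point term can be checked numerically with a similar hypothetical helper, `start_point_penalty` (not in the original code), assuming a start at the origin and a target 10 units away along the x-axis.

```python
import numpy as np

def start_point_penalty(position, start, weight=0.1):
    """Start-point term only: minus `weight` times the straight-line distance from the start."""
    offset = np.asarray(position, dtype=float) - np.asarray(start, dtype=float)
    return -weight * float(np.linalg.norm(offset))

start, target = [0.0, 0.0], [10.0, 0.0]  # assumed geometry
print(start_point_penalty([5.0, 0.0], start))  # -0.5 (halfway)
print(start_point_penalty(target, start))      # -1.0 (at the target)
```

The penalty is largest at the target itself, but with the default weight it stays small next to the +50 target reward.
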
Collision Penalty:

A penalty (collision_penalty = -10.0) is applied if the drone collides with an obstacle, encouraging safe navigation.

Orange Plot Parameters:

orange_center: This is the center of the danger zone (the orange plot).

orange_radius: This defines the radius of the circular danger zone. You can adjust this value based on the size of the zone you want to create.

Penalty for Entering the Orange Plot:

Full Penalty: If the drone enters the orange plot (i.e., the distance to the center of the plot is less than or equal to orange_radius), a large penalty of -50 is applied.

Proximity Penalty: If the drone is near the orange plot but outside it (i.e., the distance to the center is greater than orange_radius but at most 1.5 * orange_radius), a smaller penalty of -20 is applied.
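
A minimal sketch of the danger-zone term, using a hypothetical helper `orange_zone_penalty`. The `margin` parameter stands in for the hard-coded 1.5 factor and does not exist in the original class; the example assumes a zone of radius 2.0 centered at the origin.

```python
import numpy as np

def orange_zone_penalty(position, center, radius, margin=1.5):
    """Danger-zone term only: -50 inside the zone, -20 out to margin * radius, else 0."""
    offset = np.asarray(position, dtype=float) - np.asarray(center, dtype=float)
    dist = float(np.linalg.norm(offset))
    if dist <= radius:
        return -50.0  # inside the orange plot
    if dist <= margin * radius:
        return -20.0  # in the warning ring around it
    return 0.0

center, radius = [0.0, 0.0], 2.0  # assumed zone
for p in ([1.0, 0.0], [2.5, 0.0], [3.5, 0.0]):
    print(p, orange_zone_penalty(p, center, radius))
# [1.0, 0.0] -50.0
# [2.5, 0.0] -20.0
# [3.5, 0.0] 0.0
```

As with the original `if`/`elif`, the two penalties do not stack: a drone inside the zone receives -50, not -70.
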

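Movement Penalty: Each step, the Euclidean distance between the current position and the previously recorded position is subtracted, discouraging large or unnecessary moves. The previous position is only recorded by `update_position`, so this term has no effect unless the caller invokes that method after every step.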
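
The movement term depends on state carried between steps. This minimal sketch, with a hypothetical `movement_penalty` helper and an assumed three-point trajectory, refreshes the previous position after every step, which is the role `update_position` plays in the class.

```python
import numpy as np

def movement_penalty(current, prev):
    """Movement term only: minus the step length, or 0.0 when no previous position exists."""
    if prev is None:
        return 0.0
    step = np.asarray(current, dtype=float) - np.asarray(prev, dtype=float)
    return -float(np.linalg.norm(step))

path = [[0.0, 0.0], [3.0, 4.0], [3.0, 0.0]]  # assumed trajectory
prev = None
penalties = []
for pos in path:
    penalties.append(movement_penalty(pos, prev))
    prev = pos  # refresh the previous position after each step
print(penalties)       # [0.0, -5.0, -4.0]
print(sum(penalties))  # -9.0, i.e. minus the total path length
```

Summed over an episode, the term equals minus the total path length. It measures distance traveled rather than how erratic the motion is: a zig-zag and a straight segment of equal length cost the same.
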

Time Efficiency:

A small penalty (time_penalty = -0.1) is applied for each step, encouraging the drone to reach the target in the least amount of time.
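
Putting the terms together, the per-step reward computed by `calculate_reward` can be written as below, where $p_t$ is the current position, $p_{t-1}$ the position last passed to `update_position`, $g$ the target point, $s$ the start point, $o$ the orange center, $\rho$ = `orange_radius`, and $\tau$ = `threshold`. These symbols are introduced here for illustration and do not appear in the code.

$$
r_t = R_{\text{target}}(d_T) - 0.1\, d_S + c_{\text{col}}\, \mathbf{1}[\text{collision}] - \lVert p_t - p_{t-1} \rVert + c_{\text{time}} + R_{\text{orange}}(d_O)
$$

$$
R_{\text{target}}(d_T) = \begin{cases} 50 & d_T \le \tau \\ -d_T & \text{otherwise} \end{cases}
\qquad
R_{\text{orange}}(d_O) = \begin{cases} -50 & d_O \le \rho \\ -20 & \rho < d_O \le 1.5\rho \\ 0 & \text{otherwise} \end{cases}
$$

Here $d_T = \lVert p_t - g \rVert$, $d_S = \lVert p_t - s \rVert$, $d_O = \lVert p_t - o \rVert$, $c_{\text{col}}$ = `collision_penalty` (default -10.0), and $c_{\text{time}}$ = `time_penalty` (default -0.1). The movement term is omitted while no previous position has been recorded.
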

import numpy as np

class CustomDroneRewardCalculator:
    def __init__(self, start_point, target_point, orange_center, orange_radius, threshold=0.5, collision_penalty=-10.0, time_penalty=-0.1):
        # Starting point and target point
        self.start_point = np.array(start_point)
        self.target_point = np.array(target_point)
        # Orange plot (danger zone)
        self.orange_center = np.array(orange_center)
        self.orange_radius = orange_radius
        # Threshold for considering the target "reached"
        self.threshold = threshold
        # Penalty for collision
        self.collision_penalty = collision_penalty
        # Penalty for inefficiency in time taken
        self.time_penalty = time_penalty

        # State tracking
        self.prev_position = None
        self.total_steps = 0
        self.done = False

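    def update_position(self, current_position):
        """
        Record the drone's current position so that the next call to
        calculate_reward() can measure how far the drone moved.
        Call this once after every calculate_reward() call.
        """
        self.prev_position = np.array(current_position, dtype=float)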

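    def calculate_reward(self, current_position, action, collision_occurred):
        """
        Calculate the reward based on the drone's current position and whether a collision occurred.

        `action` is currently unused by every reward term.
        The movement penalty only applies if update_position() was called after the previous step.
        """
        current_position = np.asarray(current_position, dtype=float)

        # 1. Progress toward the target point
        dist_to_target = np.linalg.norm(current_position - self.target_point)
        dist_to_start = np.linalg.norm(current_position - self.start_point)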

        # Target term: large bonus inside the threshold, otherwise minus the distance
        if dist_to_target <= self.threshold:
            reward = 50.0  # Large reward for reaching the target
            self.done = True
        else:
            reward = -dist_to_target  # More negative the farther the drone is from the target

        # Penalty proportional to the straight-line distance from the start point
        reward -= dist_to_start * 0.1

        # 2. Collision penalty (if any)
        if collision_occurred:
            reward += self.collision_penalty

        # 3. Movement penalty: subtract the length of the step since the last recorded position
        if self.prev_position is not None:
            movement_penalty = np.linalg.norm(current_position - self.prev_position)
            reward -= movement_penalty  # Longer steps cost more; hovering costs nothing

        # 4. Time penalty (to minimize time spent)
        self.total_steps += 1
        reward += self.time_penalty  # Penalize each step to encourage faster completion

        # 5. Avoiding the Orange Plot (Danger Zone)
        # Calculate distance from the drone's position to the center of the orange plot
        dist_to_orange_plot = np.linalg.norm(current_position - self.orange_center)

        # Apply a penalty if the drone is inside or too close to the orange plot
        if dist_to_orange_plot <= self.orange_radius:
            reward -= 50  # Large penalty for entering the danger zone (orange plot)
        elif dist_to_orange_plot <= self.orange_radius * 1.5:
            reward -= 20  # Smaller penalty if drone is near but not inside the danger zone

        return reward

    def reset(self):
        """
        Reset tracking variables at the start of an episode.
        """
        self.prev_position = None
        self.total_steps = 0
        self.done = False
