## Drone simulator - Notebook

This simulator is a drone environment simulation, "inherited" from the PettingZoo MPE environment. The parasites (the "enemies") are represented as green circles and the drones as red circles. It aims to create an interactive environment where drones and enemies can navigate, interact, and potentially engage in strategic behaviors.

The drones and enemies can move in various directions within the environment, allowing for dynamic movement patterns and interactions.
The drones have additional capabilities, such as changing colors, reaching higher ground, and turning on a lamp, which give them increased versatility and strategic options.

In addition to the drones and enemies, the environment features shadow zones, which are areas where both drones and enemies can enter. These shadow zones introduce an additional element to the simulation and can potentially affect the behavior and interactions of the entities within them.

Overall, the simulator aims to provide a realistic and immersive simulation of a drone-based scenario, allowing for the exploration and analysis of various drone and enemy interactions within the simulated environment.

In [1]:
from IPython.core.display import HTML
from IPython.display import display
gif_url = "gifs/example.gif"
html = HTML(f'<img src="{gif_url}" />')
display(html)

 There are 2 ways to initialize the environment:

 1. Empty Constructor (Parameters from Setting File)
    * Initialize the environment using an empty constructor.
    * Read parameters from the settings file with predefined values.
    * Allows easy configuration without modifying the code directly.

In [2]:
from CustomAgents import *
from Auxiliry import *
env = simple_tag_v3.env()

 2. Parameter Constructor
    * Initialize the environment by directly passing parameters to the constructor.
    * Parameters include render mode, agent and adversary counts, obstacle count, maximum cycles,
      observation and factor dictionaries, and possible agent colors.
    * Provides flexibility and customization for specific requirements or dynamic conditions.

In [3]:

env = simple_tag_v3.env(
    render_mode='human',
    num_parasites=5,
    num_drones=5,
    num_obstacles=1,
    max_cycles=1000,
    num_of_possible_colors_for_agent=4,
    lamp_flag=True,
    height_flag=True,
    landmark_colide=False,
    no_dead_flag=True,
    max_hit_drone=1,
    max_hit_parasite=2,

)


1. `render_mode` - Determines the rendering mode for the simulation ('human' for visual rendering, 'rgb_array' for non-visual processing).
                   Default value is 'human'.

2. `num_parasites` - Number of parasites (green agents) in the simulation.
                     Default value is 5.

3. `num_drones` - Number of drones (red agents) in the simulation.
                     Default value is 5.

4. `num_obstacles` - Number of shadowed fields/obstacles in the environment.
                     Default value is 2.

5. `max_cycles` - Maximum number of cycles for the simulation.
                     Default value is 1000.

6. `num_of_possible_colors_for_agent` - Number of possible colors available for the Drones.
                     Default value is 3.

7. `render_object_shrinking` - Allows rendering to shrink when agents are out of bounds.
                     Default value is 'True'.

8. `lamp_flag` - Flag indicating whether the drone can light a lamp.
                     Default value is 'True'.

9. `height_flag` - Flag indicating whether the drone can change its height.
                     Default value is 'False'.

10. `landmark_colide` - Flag indicating whether agents collide with the landmark and can't enter it.
                     Default value is 'False'.

11. `no_dead_flag` - Flag indicating if drones and parasites can get killed - Original scenario.
                     Default value is 'True'.

12. `max_hit_parasite` - Number of hits the parasite can take before being destroyed.
                     Default value is 1.

13. `max_hit_drone` - Number of hits the drone can take before being destroyed.
                     Default value is 1.
                     
14. `out_of_bounds_reward` - Flag indicating whether agents are penalized for going out of bounds (an agent that moves away from the center of the environment gets a negative reward).
                 Default value is 'True'.

15. `obs_dict` - Observation radius dictionary. Allows the definition of different observation radii for different agents.
                 Default values: For the first 10 entities, observation radius values are '10', and for the next 10 entities, the observation radius is 0.1.

16. `factor_dict` -  Dictionary used to set factors for manipulating the agents' observation radius.
                    All factor values are '1' by default.

17. `reward_dict` - Dictionary that configures the agents' rewards.
                    Default values: The initial reward is only for entity collisions: +10 for drones and -10 for parasites. All other rewards are set to zero.


# Default values:

All the default values are configured in a Python file named 'settings.py', located at the following path: "DroneProject/pettingzoo/mpe/simple_tag". Please be aware that any modification to this file permanently alters the default environment values.

## ACTIONS

All entities in this simulator can move in four directions: up, down, left, and right. An entity can also stay in place (action number zero). Drones have additional actions that can be enabled or disabled using flags in the environment settings.

These actions include:

Lamp activation/deactivation: Drones are equipped with lamps that enhance their visibility in shadowy areas and enable them to signal nearby drones. The degree to which the lamp improves a drone's observation can be adjusted using the 'factor_dict' parameter in the environment settings.

Height adjustment: Drones have a binary height attribute that enhances their ability to detect other drones while limiting their vision of parasites. The extent to which this feature changes a drone's vision can be adjusted using the 'factor_dict' parameter in the environment settings.

Color change: Drones can change their color to communicate with nearby drones. The color change action is cyclical: each use moves the drone to the next color in the cycle. The number of colors available to the drones can be specified in the environment settings, with a maximum of six colors.
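The cyclical color change can be pictured with a short sketch. This is not the simulator's code: `next_color_index` is a hypothetical helper, and it assumes that one color action advances the index by one and wraps around after the last color.

```python
MAX_COLORS = 6  # upper limit on drone colors stated in the text above


def next_color_index(current: int, num_colors: int) -> int:
    """Return the color index after one color-change action.

    Assumption: the action advances the index by one and wraps to 0.
    """
    if not 1 <= num_colors <= MAX_COLORS:
        raise ValueError(f"num_colors must be between 1 and {MAX_COLORS}")
    return (current + 1) % num_colors


# Apply the action five times with four available colors.
color = 0
history = []
for _ in range(5):
    color = next_color_index(color, 4)
    history.append(color)
print(history)  # [1, 2, 3, 0, 1]
```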

The following function prints the world's action indexes.

In [4]:
print_action_dict(env)


('0', 'stay in place')
('1', 'move left')
('2', 'move right')
('3', 'move down')
('4', 'move up')
(5, 'lamp action')
(6, 'get height action')
(7, 'change color action')


As you can see, an agent can move in any direction. If the corresponding flag is on, the agent can also perform the following actions:
* activate/deactivate a lamp.
* ascend to a higher height or return to normal height.
* change its color (if it's a drone).
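The action tables printed in this notebook suggest that optional actions receive consecutive indexes after the five movement actions, so the index of an action depends on which flags are on. The sketch below reproduces that numbering. `build_action_table` is a hypothetical helper, not part of the simulator, and it uses integer keys throughout even though the printed output shows the first five keys as strings.

```python
def build_action_table(lamp_flag: bool, height_flag: bool) -> dict:
    """Map action index -> description, numbering only the enabled actions.

    Assumption: the color action is always available and always comes last.
    """
    table = {
        0: "stay in place",
        1: "move left",
        2: "move right",
        3: "move down",
        4: "move up",
    }
    optional = [
        (lamp_flag, "lamp action"),
        (height_flag, "get height action"),
        (True, "change color action"),
    ]
    for enabled, name in optional:
        if enabled:
            table[len(table)] = name  # next free index
    return table


all_on = build_action_table(lamp_flag=True, height_flag=True)
no_height = build_action_table(lamp_flag=True, height_flag=False)
print(all_on[7], "|", no_height[6])  # change color action | change color action
```

Because the indexes shift with the flags, a policy is safer looking actions up by name than hard-coding an index such as 6.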


Up, down, right, and left movements:

In [5]:
gif_url = "gifs/UDLR.gif"
html = HTML(f'<img src="{gif_url}" />')
display(html)

Activate/Deactivate Lamp action:

In [6]:
gif_url = "gifs/lamp.gif"
html = HTML(f'<img src="{gif_url}" />')
display(html)

Ascend/Descend action (height):

In [7]:
gif_url = "gifs/height.gif"
html = HTML(f'<img src="{gif_url}" />')
display(html)

Change color action:

In [8]:
gif_url = "gifs/color.gif"
html = HTML(f'<img src="{gif_url}" />')
display(html)

## OBSERVATIONS

In this scenario, multiple agents exist in a world, each with its own observations. An observation includes the lamp status, whether the agent is a drone (with a color index if applicable), and a list of data about the visible agents: type of agent, relative distance, and color index (if they are drones). This data enables the agents to make informed decisions based on their environment and the attributes of the other agents they can see (each agent has its own observation radius).

The observation format:
(is the agent a drone?, lamp status, color index (if it is a drone), is it destroyed?)
followed by a list of all visible agents, where each entry looks like this:
(is the agent a drone?, its relative distance (a 2D offset vector), its color index).

Example of an observation of a drone with its lamp off and color index 2 that is still alive:
[True, False, 2, False, (True, array([-0.81258817, -1.22620661]), 2), (True, array([ 0.63219806, -1.53451418]), 2), (True, array([-1.91364366, -0.67692917]), 3), (True, array([-0.64812583, -0.18377225])]

After the agent takes a step in the environment, you can use env.last()[0] to get the agent's observation.  
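To make the layout concrete, here is a minimal sketch that splits an observation into its four header fields and its list of visible agents, then picks the nearest visible parasite. `parse_observation` and `nearest` are hypothetical helpers, not simulator functions. The sample uses plain tuples in place of the NumPy arrays that appear in real observations; indexing works the same way for both.

```python
import math
from typing import NamedTuple, Optional


class Seen(NamedTuple):
    is_drone: bool
    rel_pos: tuple  # (dx, dy) offset to the visible agent
    color: int


def parse_observation(obs) -> dict:
    """Split an observation into header fields and visible-agent entries."""
    is_drone, lamp_on, color, destroyed = obs[:4]
    visible = [Seen(bool(d), (float(p[0]), float(p[1])), int(c)) for d, p, c in obs[4:]]
    return {"is_drone": is_drone, "lamp_on": lamp_on, "color": color,
            "destroyed": destroyed, "visible": visible}


def nearest(visible, want_drone: bool) -> Optional[Seen]:
    """Return the closest visible agent of the requested type, or None."""
    candidates = [s for s in visible if s.is_drone == want_drone]
    return min(candidates, key=lambda s: math.hypot(*s.rel_pos), default=None)


# Sample taken from the run output in the Example section (arrays written as tuples).
obs = [True, False, 0, False,
       (True, (-0.90206714, 0.17826195), 0),
       (True, (0.00689125, -0.69777093), 0),
       (False, (0.23840674, 0.61683299), 0),
       (False, (0.09684778, -0.15981556), 0)]
parsed = parse_observation(obs)
target = nearest(parsed["visible"], want_drone=False)
print(target.rel_pos)  # (0.09684778, -0.15981556)
```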

Each agent has an observation radius that is set using 2 objects in the environment:  
* obs_dict - sets the agents initial observation radius.  
* factor_dict - used to manipulate the agent's observation radius in specific situations.  

obs_dict: A dictionary of (key=radius, value=num_of_agents_with_radius).    
This dictionary sets the observation radius for each agent.  
For instance, in an environment with 12 agents (4 drones and 8 parasites) and the observation dictionary {3:2, 0.3:6, 1:4}:  
the first two agents (drones) have a radius of 3, the next six agents (2 drones, 4 parasites) have a radius of 0.3, and the last four agents (parasites) have a radius of 1.    
**NOTE: WHEN COUNTING THE AGENTS, DRONES COME BEFORE PARASITES!**  
   
If the total number of agents in obs_dict is greater than the number of agents in the environment,  
the surplus radii at the end of the dict are ignored.  
If the total number of agents in obs_dict is smaller than the number of agents in the environment,  
the remaining agents get a default radius of 2.  
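The assignment rules above can be sketched as a small function that expands an obs_dict into one radius per agent, drones first. `expand_obs_dict` is a hypothetical helper written for illustration; it relies on Python dictionaries preserving insertion order and is not the simulator's own implementation.

```python
DEFAULT_RADIUS = 2  # fallback radius stated in the text above


def expand_obs_dict(obs_dict: dict, num_agents: int, default_radius=DEFAULT_RADIUS) -> list:
    """Return a list of radii, one per agent (drones first, then parasites)."""
    radii = []
    for radius, count in obs_dict.items():
        radii.extend([radius] * count)
    radii = radii[:num_agents]  # surplus entries are ignored
    radii.extend([default_radius] * (num_agents - len(radii)))  # fill the rest
    return radii


radii = expand_obs_dict({3: 2, 0.3: 6, 1: 4}, num_agents=12)
print(radii[:4])  # the 4 drones: [3, 3, 0.3, 0.3]
print(radii[4:])  # the 8 parasites: [0.3, 0.3, 0.3, 0.3, 1, 1, 1, 1]
```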


factor_dict: The default value of each factor is 1.
The factors are multiplied by the observation radius, so it is recommended to use values lower than 1 for interference factors
and values higher than 1 for improvement factors.

The 'factor_dict' factors:
* 'shadow_interference_factor' - Used when an agent is inside a landmark (shadow) and sees another agent.
* 'lamp_improvement_factor' - Used when an agent's lamp is on, the agent is not inside a shadow, and it sees another agent.
* 'shadow_interference_factor' - Used when an agent sees a different agent that is in the shadow.
* 'light_in_shadow_factor' - Used when an agent sees another agent that is in the shadow and has its lamp on.
* 'height_adversary_factor' - Used when an agent is at high height and sees another drone.
* 'height_non_adversary_factor' - Used when an agent is at high height and sees a parasite.
* 'height_other_factor' - Used when an agent is at low height and sees another drone at high height.
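Since the factors multiply the observation radius, their effect can be sketched in a few lines. `effective_radius` is a hypothetical helper. Which factors are active in a given situation is decided by the simulator, and combining several active factors by multiplying them together is an assumption of this sketch.

```python
def effective_radius(base_radius: float, factor_dict: dict, active_factors: list) -> float:
    """Scale a base observation radius by every factor that currently applies.

    Factors missing from factor_dict fall back to the documented default of 1.
    """
    radius = base_radius
    for name in active_factors:
        radius *= factor_dict.get(name, 1.0)
    return radius


factors = {"shadow_interference_factor": 0.5, "light_in_shadow_factor": 2.0}

# An interference factor below 1 shrinks the radius.
print(effective_radius(2.0, factors, ["shadow_interference_factor"]))  # 1.0
# An improvement factor above 1 enlarges it.
print(effective_radius(1.0, factors, ["light_in_shadow_factor"]))  # 2.0
# A factor that was never set leaves the radius unchanged.
print(effective_radius(3.0, factors, ["lamp_improvement_factor"]))  # 3.0
```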

High observation radius:

In [9]:
gif_url = "gifs/high_radius.gif"
html = HTML(f'<img src="{gif_url}" />')
display(html)

Low observation radius:

In [10]:
gif_url = "gifs/low_radius.gif"
html = HTML(f'<img src="{gif_url}" />')
display(html)

## REWARDS

Each type of agent receives different rewards based on the specific scenarios that can occur.
The reward value for each scenario is stored in a dictionary where the scenario is the key and the corresponding reward is the value. Let's break down the rewards for drones and parasites:

**Drone Rewards:**
- `drone_collision`: When a drone collides with a parasite. Default value is +10.
- `drone_in_shadow_lamp_on`: If a drone is in the shadow and its lamp is on.
- `drone_in_shadow_lamp_off`: If a drone is in the shadow and its lamp is off.
- `drone_lamp_active`: When a drone's lamp is active.
- `drone_lamp_off`: If a drone deactivates its lamp.
- `drone_turn_lamp_on`: When a drone successfully turns its lamp on.
- `drone_in_height`: If a drone is at a certain height.
- `drone_height_change`: When a drone changes its height.
- `drone_color_change`: If a drone changes its color.

**Parasite Rewards:**
- `parasite_collision`: When a parasite collides with a drone. Default value is -10.

**General Rewards:**
 * Agents are penalized for leaving the screen, which keeps them within reach of the adversaries (drones).



**All the rewards for which we didn't specify a default value have a default value of zero.**

You can modify the reward value for a specific scenario by first initializing a dictionary. Add a new entry for the scenarios whose rewards you want to change, and then pass this dictionary as a parameter during the environment creation.

reward_dict={}  
reward_dict["drone_collision"] = 10  
env = simple_tag_v3.env(
    ...  
    ...  
    reward_dict=reward_dict,  
    ...  
    )
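The override behavior described above amounts to merging a partial dictionary over the defaults. The sketch below uses the scenario names and default values listed in this section. `merge_rewards` is a hypothetical helper, and rejecting unknown scenario names is an addition of this sketch, not documented simulator behavior.

```python
# Defaults as documented above: only the collision rewards are non-zero.
DEFAULT_REWARDS = {
    "drone_collision": 10,
    "parasite_collision": -10,
    "drone_in_shadow_lamp_on": 0,
    "drone_in_shadow_lamp_off": 0,
    "drone_lamp_active": 0,
    "drone_lamp_off": 0,
    "drone_turn_lamp_on": 0,
    "drone_in_height": 0,
    "drone_height_change": 0,
    "drone_color_change": 0,
}


def merge_rewards(overrides: dict) -> dict:
    """Return the full reward table with the given scenarios overridden."""
    unknown = set(overrides) - set(DEFAULT_REWARDS)
    if unknown:
        # Catch typos in scenario names early (a choice made for this sketch).
        raise KeyError(f"unknown reward scenarios: {sorted(unknown)}")
    return {**DEFAULT_REWARDS, **overrides}


rewards = merge_rewards({"drone_collision": 20})
print(rewards["drone_collision"], rewards["parasite_collision"])  # 20 -10
```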


## Obstacles and Shadows

The 'num_obstacles' parameter in the environment settings introduces obstacles or shadows into the environment.

Obstacles are represented as impenetrable circles, serving as barriers within the environment.

Shadows, on the other hand, are depicted as penetrable circles that can disrupt the vision of entities. The degree to which shadows affect entity vision can be adjusted using the 'factor_dict' parameter in the environment settings.

You can choose to include either obstacles or shadows by toggling the 'landmark_colide' parameter. Set it to 'True' to introduce obstacles, and 'False' to introduce shadows.

## Running the environment

To run the environment, you first need a policy function for each type of agent (drone or parasite).

A policy function takes 3 parameters:
(observation, env, agent)
and returns an action index.
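As an illustration of this signature, here is a minimal sketch of a custom policy. `lamp_then_stay_policy` is a hypothetical example, not one of the provided agents. It reads the first two observation fields (is the agent a drone, lamp status) and assumes the lamp action has index 5, which holds only when `lamp_flag` is on.

```python
STAY_ACTION = 0
LAMP_ACTION = 5  # assumption: lamp_flag is on, so index 5 is the lamp action


def lamp_then_stay_policy(observation, env, agent):
    """Turn the lamp on once if the agent is a drone, then stay in place."""
    is_drone, lamp_on = observation[0], observation[1]
    if is_drone and not lamp_on:
        return LAMP_ACTION
    return STAY_ACTION


# The env and agent arguments are unused here, so placeholders are enough.
print(lamp_then_stay_policy([True, False, 0, False], None, "adversary_0"))  # 5
print(lamp_then_stay_policy([True, True, 0, False], None, "adversary_0"))   # 0
print(lamp_then_stay_policy([False, False, 0, False], None, "agent_0"))     # 0
```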

The simulator can be run in three ways:
1. Uniform policies: All drones share one policy, and all parasites share another.

To run it, use the function:
run_env(env, drone_policy=chaseParasiteAgent, parasite_policy=staticAgent)


2. Single agent: All drones follow the same policy except one drone that has its own policy. All parasites have the same policy.

To run it, use the function:
run_env_single(env, single_agent_policy, drone_policy=chaseParasiteAgent, parasite_policy=staticAgent)


3. Multiple policies: You can assign an individual policy to each agent in the environment using a dictionary; agents not specified in the dictionary adopt a default policy.
Example:
policy_dic={}
policy_dic["adversary_0"]=staticAgent
policy_dic["adversary_1"]=chaseParasiteAgent
policy_dic["agent_1"]=escapeFromDronesAgent

**NOTE: The drones' base name is adversary and the parasites' base name is agent!**

To run it, use the function:
run_env_multi_policy(env, policy_dic, drone_policy=chaseParasiteAgent, parasite_policy=staticAgent)
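The fallback rule for agents missing from the dictionary can be sketched as follows. `resolve_policy` is a hypothetical helper and the three policies are stand-ins defined only for this example; the sketch assumes drones are recognized by the "adversary" name prefix, as stated in the note above.

```python
def resolve_policy(agent_name, policy_dic, drone_policy, parasite_policy):
    """Pick the per-agent policy if one is given, else the default for its type."""
    if agent_name in policy_dic:
        return policy_dic[agent_name]
    if agent_name.startswith("adversary"):  # drones are named adversary_<n>
        return drone_policy
    return parasite_policy  # parasites are named agent_<n>


# Stand-in policies with the (observation, env, agent) signature.
def chase(observation, env, agent):
    return 1


def static(observation, env, agent):
    return 0


def escape(observation, env, agent):
    return 2


policy_dic = {"adversary_0": static, "agent_1": escape}
for name in ["adversary_0", "adversary_1", "agent_0", "agent_1"]:
    chosen = resolve_policy(name, policy_dic, drone_policy=chase, parasite_policy=static)
    print(name, "->", chosen.__name__)
```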

# Single agent envelope

Another way to utilize the simulator is through the single-agent envelope. This envelope provides support for individual agents while imposing a fixed policy on all other entities. **The drone and parasite policies must be specified during the envelope's initialization.**

The single-agent envelope encompasses all the features of the standard environment. The key distinctions in execution are as follows:

* Step Function: In the standard environment, the step function advances a single entity. In the single-agent envelope, the step function executes the desired action on the individual agent and then carries out all the fixed policies on the other entities within the environment.

* Observe Function: In the single-agent envelope, this function returns the observation of the single agent. In the standard environment, it returns the observation of a specified agent.
Note: Similarly, other functions in the single-agent environment (e.g., the action_space and observation_space functions) return results for the single agent only, rather than for a specified agent.

* In this environment, the single agent is depicted in shades of yellow, distinguishing it from the other drones, which retain their regular shades of red coloring.
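The difference in the step function can be sketched with a toy stand-in environment. Nothing here is the simulator's API: `ToyTurnEnv`, its `apply` method, and `single_agent_step` are hypothetical names, and the toy environment only records which action each entity received.

```python
class ToyTurnEnv:
    """Stand-in environment that logs the action applied to each entity."""

    def __init__(self, agents):
        self.agents = list(agents)
        self.log = []

    def observe(self, agent):
        # Header-only observation: (is drone, lamp status, color, destroyed).
        return [agent.startswith("adversary"), False, 0, False]

    def apply(self, agent, action):
        self.log.append((agent, action))


def single_agent_step(env, single_agent, action, fixed_policies):
    """Apply the chosen action to the single agent, then run all fixed policies."""
    env.apply(single_agent, action)
    for other in env.agents:
        if other == single_agent:
            continue
        policy = fixed_policies[other]
        env.apply(other, policy(env.observe(other), env, other))


def stay(observation, env, agent):
    return 0


toy = ToyTurnEnv(["adversary_0", "adversary_1", "agent_0"])
single_agent_step(toy, "adversary_0", 2, {"adversary_1": stay, "agent_0": stay})
print(toy.log)  # [('adversary_0', 2), ('adversary_1', 0), ('agent_0', 0)]
```

One call therefore advances every entity once, which is what lets the single agent be treated like the only learner in the environment.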


In the following GIF we can see a single agent environment, with static drones and parasites and a single drone agent.

In [11]:
gif_url = "gifs/single_env.gif"
html = HTML(f'<img src="{gif_url}" />')
display(html)

# Custom Agents - Simple given policies

The simulator features several basic policies for entity behavior:

'chaseParasiteAgent' - This agent actively pursues the nearest parasite in a greedy manner. If there are no parasites within the agent's observation radius, the agent will take random actions.

'escapeFromDronesAgent' - This agent attempts to move away from the nearest drone in a greedy manner. If no drones are present within the agent's observation radius, the agent will take random actions.

'escapeFromParasiteAgent' - This agent strives to distance itself from the closest parasite in a greedy fashion. If there are no parasites within the agent's observation radius, the agent will take random actions.

'randomAgent' - This agent selects a random action, including the static action, during each iteration.

'staticAgent' - This agent does not perform any actions.
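For intuition, a greedy chase in the spirit of 'chaseParasiteAgent' could look like the sketch below. It is not the provided implementation. It assumes that the relative position in an observation is the other agent's position minus the observer's, with +x to the right and +y up, and it limits the random fallback to the five movement actions.

```python
import math
import random

MOVE_ACTIONS = [0, 1, 2, 3, 4]  # stay, left, right, down, up


def greedy_chase(observation, env=None, agent=None, rng=random):
    """Move along the larger axis of the offset toward the nearest parasite."""
    parasites = [pos for is_drone, pos, _color in observation[4:] if not is_drone]
    if not parasites:
        return rng.choice(MOVE_ACTIONS)  # nothing in sight: act randomly
    dx, dy = min(parasites, key=lambda p: math.hypot(p[0], p[1]))
    if abs(dx) >= abs(dy):
        return 2 if dx > 0 else 1  # right or left
    return 4 if dy > 0 else 3  # up or down


# Observation taken from the run output in the Example section (arrays as tuples).
obs = [True, False, 0, False,
       (True, (-0.90206714, 0.17826195), 0),
       (False, (0.23840674, 0.61683299), 0),
       (False, (0.09684778, -0.15981556), 0)]
print(greedy_chase(obs))  # 3: the nearest parasite is mostly below the drone
```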

## Example

Let's create an environment with 5 drones, 3 parasites, and no obstacles.
The drones will have an observation radius of 1, and the parasites a radius of 0.00001.
A drone gets 2 times better vision if its lamp is on and it is inside a shadow zone.
The agents cannot change height.
A drone gets a reward of 20 for colliding with a parasite.
**We will turn the no dead flag on.**


In [12]:
my_obs_dict = {1: 5, 0.00001: 3}
factors={}
reward_dict={}
reward_dict["drone_collision"] = 20
factors["light_in_shadow_factor"] = 2.0


env = simple_tag_v3.env(
    render_mode='human',
    num_drones=5,
    num_parasites=3,
    num_obstacles=0,
    max_cycles=1000,

    num_of_possible_colors_for_agent=4,
    lamp_flag=True,
    height_flag=False,
    landmark_colide=False,
    no_dead_flag=True,
    factor_dict = factors,
    reward_dict = reward_dict,
    obs_dict=my_obs_dict,

)

Let's see our action index dictionary.

In [13]:
print_action_dict(env)

('0', 'stay in place')
('1', 'move left')
('2', 'move right')
('3', 'move down')
('4', 'move up')
(5, 'lamp action')
(6, 'change color action')


Let's proceed to execute the environment. We will employ a policy that directs drones to pursue parasites, while the parasites will exhibit random behavior.

In [14]:
env.reset()
run_env(env,chaseParasiteAgent,randomAgent)

[True, False, 0, False, (True, array([-0.36029543,  0.44754192]), 0), (True, array([ 0.90206714, -0.17826195]), 0), (True, array([-0.33140865, -0.15792403]), 0)]
[True, False, 0, False, (True, array([ 0.36029543, -0.44754192]), 0), (True, array([ 0.02888678, -0.60546595]), 0)]
[True, False, 0, False, (True, array([-0.90206714,  0.17826195]), 0), (True, array([ 0.00689125, -0.69777093]), 0), (False, array([0.23840674, 0.61683299]), 0), (False, array([ 0.09684778, -0.15981556]), 0)]
[True, False, 0, False, (True, array([-0.00689125,  0.69777093]), 0), (False, array([0.08995653, 0.53795537]), 0)]
[True, False, 0, False, (True, array([0.33140865, 0.15792403]), 0), (True, array([-0.02888678,  0.60546595]), 0), (False, array([-0.18505137, -0.83538408]), 0)]
[False, False, 0, False]
[False, False, 0, False]
[False, False, 0, False]
[True, False, 0, False, (True, array([-0.36029543,  0.44754192]), 1), (True, array([ 0.90206714, -0.17826195]), 0), (True, array([-0.33140865, -0.15792403]), 0)]
...

[True, False, 0, False, (True, array([-0.62867239, -0.39141125]), 3), (True, array([ 0.0211183 , -0.29331137]), 0), (False, array([-0.72885045, -0.3975399 ]), 0), (False, array([-0.11906254, -0.59769638]), 0)]
[True, False, 0, False, (True, array([-0.6497907 , -0.09809988]), 3), (True, array([-0.0211183 ,  0.29331137]), 0), (False, array([-0.74996875, -0.10422852]), 0), (False, array([-0.14018085, -0.30438501]), 0)]
[True, False, 0, False, (False, array([-0.13181088, -0.33165453]), 0)]
[False, False, 0, False]
[False, False, 0, False]
[False, False, 0, False]
[True, False, 3, False, (True, array([0.67158756, 0.32896347]), 0), (True, array([0.69890635, 0.06668519]), 0), (False, array([-0.08376587, -0.01497104]), 0), (False, array([ 0.513149  , -0.31161563]), 0)]
[True, False, 3, False]
[True, False, 0, False, (True, array([-0.67158756, -0.32896347]), 3), (True, array([ 0.02731879, -0.26227828]), 0), (False, array([-0.75535343, -0.34393451]), 0), (False, array([-0.15843856, -0.64057911])
