# Simple Spread Testing

## Standard Simple Spread (Collaborative Only)

Agent observations: `[self_vel, self_pos, landmark_rel_positions, other_agent_rel_positions, communication]`
 - `self_vel = (2, )`
 - `self_pos = (2, )`
 - `landmark_rel_positions = (2 * N, )`
 - `other_agent_rel_positions = (2 * (N - 1), )`
 - `communication = (2 * (N - 1), )`

Agent action space: `[no_action, move_left, move_right, move_down, move_up] = (0-4)` 
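The layout above implies an observation length of `2 + 2 + 2N + 2(N - 1) + 2(N - 1) = 6N` per agent. A minimal sketch of a helper that splits a flat observation into the named fields, assuming this layout holds; `split_observation` is a hypothetical name for illustration and not part of PettingZoo:

```python
import numpy as np


def split_observation(obs, n):
    """Split a flat simple_spread observation into named fields.

    Assumes the layout listed above with n agents and n landmarks.
    Illustrative helper only, not a PettingZoo API.
    """
    obs = np.asarray(obs)
    expected = 6 * n  # 2 + 2 + 2n + 2(n - 1) + 2(n - 1)
    if obs.shape != (expected,):
        raise ValueError(f"expected shape ({expected},), got {obs.shape}")
    landmark_end = 4 + 2 * n
    other_end = landmark_end + 2 * (n - 1)
    return {
        "self_vel": obs[0:2],
        "self_pos": obs[2:4],
        "landmark_rel_positions": obs[4:landmark_end].reshape(n, 2),
        "other_agent_rel_positions": obs[landmark_end:other_end].reshape(n - 1, 2),
        "communication": obs[other_end:].reshape(n - 1, 2),
    }


# Dummy observation for N = 3 (18 values), just to show the resulting shapes
fields = split_observation(np.arange(18, dtype=np.float32), n=3)
print({name: value.shape for name, value in fields.items()})
```

The shape check fails early if the environment was created with a different `N` than the one passed to the helper.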

In [None]:
from pettingzoo.mpe import simple_spread_v3

In [None]:
env = simple_spread_v3.parallel_env(N=5)
observations, infos = env.reset()
observations, infos


In [None]:
env.num_agents

In [None]:
env.state()

In [None]:
print(observations["agent_0"].shape)
observations["agent_0"]

In [None]:
env.action_space("agent_0")

In [None]:
# this is where you would insert your policy
# random policy: actions = {agent: env.action_space(agent).sample() for agent in env.agents}
# here every agent takes action 0 (no_action)
actions = {agent: 0 for agent in env.agents}
observations, rewards, terminations, truncations, infos = env.step(actions)
observations, rewards, terminations, truncations, infos

## Adversarial Variant (Custom)

Agent observations: `[self_is_adversary, self_vel, self_pos, landmark_rel_positions, other_agent_is_adversary_rel_positions]`
 - `self_is_adversary = (1, )`: 0 / 1 flag
 - `self_vel = (2, )`
 - `self_pos = (2, )`
 - `landmark_rel_positions = (2 * n_landmarks, )`
 - `other_agent_is_adversary_rel_positions = ((1 + 2) * (n_agents + n_adversaries - 1), )`: for each other agent, a 0 / 1 flag indicating whether that agent is an adversary, plus its relative position (2 values)

Agent action space: `[no_action, move_left, move_right, move_down, move_up] = (0-4)` 
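As a rough check of the sizes listed above, the sketch below computes the slice boundaries and total observation length for a given configuration. It assumes the fields appear in exactly the order listed; `adversarial_obs_slices` is a hypothetical helper, not part of `simple_spread_adversarial`, and the ordering of values inside the last block is left untouched because it is not specified here.

```python
def adversarial_obs_slices(n_agents, n_adversaries, n_landmarks):
    """Return (slices, total_length) for the observation layout described above.

    Assumption: fields are concatenated in the listed order.
    """
    n_others = n_agents + n_adversaries - 1
    sizes = [
        ("self_is_adversary", 1),
        ("self_vel", 2),
        ("self_pos", 2),
        ("landmark_rel_positions", 2 * n_landmarks),
        ("other_agent_is_adversary_rel_positions", 3 * n_others),
    ]
    slices, start = {}, 0
    for name, size in sizes:
        slices[name] = slice(start, start + size)
        start += size
    return slices, start


slices, total = adversarial_obs_slices(n_agents=2, n_adversaries=2, n_landmarks=2)
print(total)  # 1 + 2 + 2 + 4 + 9 = 18
```

Comparing `total` against `observations["agent_0"].shape` after `env.reset()` is a quick way to confirm that the custom environment matches this description.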

In [None]:
%load_ext autoreload
%autoreload 2

import simple_spread_adversarial

In [None]:
env = simple_spread_adversarial.parallel_env(n_agents=2, n_adversaries=2, n_landmarks=2)
observations, infos = env.reset()
observations, infos

In [None]:
env.num_agents, env.agents

In [None]:
env.state()

In [None]:
print(observations["agent_0"].shape)
observations["agent_0"]

In [None]:
env.action_space("agent_0")

In [None]:
# this is where you would insert your policy (random actions are sampled here)
actions = {agent: env.action_space(agent).sample() for agent in env.agents}
observations, rewards, terminations, truncations, infos = env.step(actions)
observations, rewards, terminations, truncations, infos

In [None]:
# Visualize full episode
env = simple_spread_adversarial.parallel_env(
    n_agents=2,
    n_adversaries=2,
    n_landmarks=3,
    render_mode="human"
)
observations, infos = env.reset()

while env.agents:
    # this is where you would insert your policy
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}

    observations, rewards, terminations, truncations, infos = env.step(actions)
env.close()


# Testing Communication

In [None]:
temp = 3  # N: number of agents (and landmarks)
env = simple_spread_v3.parallel_env(N=temp)
observations, infos = env.reset()
observations, infos

In [None]:
for agent in env.agents:
    observation = observations[agent]
    
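    self_vel = observation[:2]
    self_pos = observation[2:4]
    idx = 4 + temp * 2  # end of the landmark block (2 values per landmark)
    landmark = observation[4:idx]
    idx2 = idx + (temp - 1) * 2  # end of the other-agent block (2 values per other agent)
    other_pos = observation[idx:idx2]
    comms = observation[idx2:]  # remaining values: communication from the other agents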
    
    print("self vel: ", self_vel)
    print("self pos: ", self_pos)
    print("landmarks: ", landmark)
    print("other players: ", other_pos)
    print("comms: ", comms)
    print("")

In [None]:
#     def observation(self, agent, world):
#         # get positions of all entities in this agent's reference frame
#         entity_pos = []
#         for entity in world.landmarks:  # world.entities:
#             entity_pos.append(entity.state.p_pos - agent.state.p_pos)
#         # communication of all other agents
#         comm = []
#         other_pos = []
#         for other in world.agents:
#             if other is agent:
#                 continue
#             comm.append(other.state.c)
#             other_pos.append(other.state.p_pos - agent.state.p_pos)
#         return np.concatenate(
#             [agent.state.p_vel] + [agent.state.p_pos] + entity_pos + other_pos + comm
#         )

Thoughts:
 - Vary the amount of comm being transferred. By default, the other agents' positions are included outside of the comm vector.
 - Potential baselines: mask landmarks / mask other positions. Masking both doesn't make much sense, as it essentially becomes ...
 - Run for a small number of iterations to learn a policy.
 - Ideas for a custom-defined comm vector:
   - Provide the velocity of self to the other agents (2 per other agent, `2 * (N - 1)` in total, like the comm vector right now).
   - Alternatively, provide the Euclidean distance to each of the landmarks (N per agent). My thinking is that it would explicitly force the agents to learn, instead of learning implicitly via the reward function. This could either be an absolute L2 distance or some binary variable.
   - The binary variable could either be N per other agent (1 if within some parameter bound of landmark x, 0 if not) or 2 per other agent (1 if within some parameter bound of any landmark, 0 if not).
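A minimal sketch of the masking baselines and the distance-based comm features described above, written against a flat observation with the standard layout. All function names and the `radius` parameter are hypothetical; nothing here is taken from `simple_spread_comms`, and how the features would be packed into the comm vector is left open.

```python
import numpy as np


def mask_observation(obs, n, mask_landmarks=False, mask_other_pos=False):
    """Return a copy of a flat observation with selected blocks zeroed out."""
    masked = np.array(obs, dtype=np.float32, copy=True)
    landmark_end = 4 + 2 * n
    other_end = landmark_end + 2 * (n - 1)
    if mask_landmarks:
        masked[4:landmark_end] = 0.0
    if mask_other_pos:
        masked[landmark_end:other_end] = 0.0
    return masked


def landmark_distances(landmark_rel_positions):
    """L2 distance from the agent to each landmark, shape (N,)."""
    rel = np.asarray(landmark_rel_positions, dtype=np.float32).reshape(-1, 2)
    return np.linalg.norm(rel, axis=1)


def near_each_landmark(landmark_rel_positions, radius):
    """One binary flag per landmark: 1 if within `radius`, else 0."""
    distances = landmark_distances(landmark_rel_positions)
    return (distances <= radius).astype(np.float32)


def near_any_landmark(landmark_rel_positions, radius):
    """Single binary flag: 1 if within `radius` of any landmark, else 0."""
    return np.float32(near_each_landmark(landmark_rel_positions, radius).any())


# Made-up relative positions for three landmarks (not taken from a real rollout)
rel = [3.0, 4.0, 0.1, 0.0, -1.0, 0.0]
print(near_each_landmark(rel, radius=0.5))  # [0. 1. 0.]
print(near_any_landmark(rel, radius=0.5))   # 1.0
```

Because relative landmark positions are already part of each agent's observation, these features can be computed from the sender's own observation without touching the world state.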

# Communication (Custom)

In [1]:
import simple_spread_comms

IndentationError: unindent does not match any outer indentation level (simple_spread_comms.py, line 150)

In [8]:
temp = 3  # N: number of agents (and landmarks)
env = simple_spread_comms.parallel_env(N = temp)
observations, infos = env.reset()
observations, infos

({'agent_0': array([ 0.        ,  0.        , -0.22761886,  0.9517223 , -0.6603963 ,
         -1.492733  , -0.7116871 , -0.9740574 ,  1.0357677 , -0.30283958,
          0.7488561 , -0.714137  ,  1.0947871 , -0.4342686 ,  0.        ,
          0.        ,  0.        ,  0.        ], dtype=float32),
  'agent_1': array([ 0.        ,  0.        ,  0.52123725,  0.23758529, -1.4092524 ,
         -0.77859604, -1.4605433 , -0.25992033,  0.2869115 ,  0.41129747,
         -0.7488561 ,  0.714137  ,  0.345931  ,  0.27986842,  0.        ,
          0.        ,  0.        ,  0.        ], dtype=float32),
  'agent_2': array([ 0.        ,  0.        ,  0.86716825,  0.51745373, -1.7551835 ,
         -1.0584644 , -1.8064742 , -0.5397888 , -0.0590195 ,  0.13142903,
         -1.0947871 ,  0.4342686 , -0.345931  , -0.27986842,  0.        ,
          0.        ,  0.        ,  0.        ], dtype=float32)},
 {'agent_0': {}, 'agent_1': {}, 'agent_2': {}})

In [9]:
for agent in env.agents:
    observation = observations[agent]
    
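    self_vel = observation[:2]
    self_pos = observation[2:4]
    idx = 4 + temp * 2  # end of the landmark block (2 values per landmark)
    landmark = observation[4:idx]
    idx2 = idx + (temp - 1) * 2  # end of the other-agent block (2 values per other agent)
    other_pos = observation[idx:idx2]
    comms = observation[idx2:]  # remaining values: communication from the other agents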
    
    print("self vel: ", self_vel)
    print("self pos: ", self_pos)
    print("landmarks: ", landmark)
    print("other players: ", other_pos)
    print("comms: ", comms)
    print("")

self vel:  [0. 0.]
self pos:  [-0.22761886  0.9517223 ]
landmarks:  [-0.6603963  -1.492733   -0.7116871  -0.9740574   1.0357677  -0.30283958]
other players:  [ 0.7488561 -0.714137   1.0947871 -0.4342686]
comms:  [0. 0. 0. 0.]

self vel:  [0. 0.]
self pos:  [0.52123725 0.23758529]
landmarks:  [-1.4092524  -0.77859604 -1.4605433  -0.25992033  0.2869115   0.41129747]
other players:  [-0.7488561   0.714137    0.345931    0.27986842]
comms:  [0. 0. 0. 0.]

self vel:  [0. 0.]
self pos:  [0.86716825 0.51745373]
landmarks:  [-1.7551835  -1.0584644  -1.8064742  -0.5397888  -0.0590195   0.13142903]
other players:  [-1.0947871   0.4342686  -0.345931   -0.27986842]
comms:  [0. 0. 0. 0.]
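The printed values allow a quick consistency check of the slicing: the "other players" block seen by `agent_0` should equal the other agents' positions minus its own. The sketch below hard-codes numbers copied from the output above and assumes `simple_spread_comms` keeps the relative-position convention of the standard environment (`other.state.p_pos - agent.state.p_pos`).

```python
import numpy as np

# Values copied from the printed output above
self_pos = {
    "agent_0": np.array([-0.22761886, 0.9517223]),
    "agent_1": np.array([0.52123725, 0.23758529]),
    "agent_2": np.array([0.86716825, 0.51745373]),
}
others_seen_by_agent_0 = np.array([0.7488561, -0.714137, 1.0947871, -0.4342686])
agent_0_seen_by_agent_1 = np.array([-0.7488561, 0.714137])

# Assumed convention: relative position = other position - own position
expected = np.concatenate(
    [
        self_pos["agent_1"] - self_pos["agent_0"],
        self_pos["agent_2"] - self_pos["agent_0"],
    ]
)
matches = np.allclose(expected, others_seen_by_agent_0, atol=1e-5)
print("relative positions consistent:", matches)

# The same offset seen from the other side should flip sign
antisymmetric = np.allclose(agent_0_seen_by_agent_1, -others_seen_by_agent_0[:2])
print("antisymmetric:", antisymmetric)
```

The comm block is all zeros in this output, which was captured directly after `env.reset()`.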

