Description
What is your question?
My goal is to create a crowd simulation with multiple agents, where each agent is driven by the same policy (the agents are homogeneous). This can be achieved with variable sharing, which the multi-agent framework makes easy:
single_env = SingleAgentEnv(env_config)
obs_space = single_env.get_observation_space()
action_space = single_env.get_action_space()

config["multiagent"] = {
    "policies": {
        "policy_0": (None, obs_space, action_space, {"gamma": 0.95})
    },
    "policy_mapping_fn": lambda agent_id: "policy_0"
}
However, each agent needs to know the positions of the other agents. Each agent acts through a SingleAgentEnv (single-agent simulations work fine), which updates that agent's position, and the positions of the other agents are distributed to the agents through the MultiAgentEnvironment (I don't know whether this is a correct approach; see the code of the current MultiAgentEnvironment below).
import copy

class MultiAgentEnvironment(MultiAgentEnv):
    def __init__(self, env_config):
        self.sim_state: SimulationState = env_config["sim_state"]
        self.env_config = env_config
        self.load_agents()

    def load_agents(self):
        # Keep a pristine copy of the shared state so reset() can restore it.
        self.original_sim_state = copy.deepcopy(self.sim_state)
        self.agents = []
        for i, agent in enumerate(self.sim_state.agents):
            self.env_config["agent_id"] = i
            self.agents.append(SingleAgentEnv(self.env_config))

    def step(self, action_dict):
        obs, rew, done, info = {}, {}, {}, {}
        for i, action in action_dict.items():
            obs[i], rew[i], done[i], info[i] = self.agents[i].step(action)
            if done[i]:
                self.dones.add(i)
        # Note: the episode currently ends as soon as any single agent is done.
        done["__all__"] = len(self.dones) > 0
        return obs, rew, done, info

    def reset(self):
        self.resetted = True
        self.dones = set()
        self.sim_state = copy.deepcopy(self.original_sim_state)
        for agent in self.agents:
            agent.load_params(self.sim_state)
        return {i: a.reset() for i, a in enumerate(self.agents)}
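To make the "distribute other agents' positions" idea concrete, here is a minimal sketch of how each agent's observation could be assembled to include the other agents' positions. All names here (`build_observations`, plain (x, y) tuples) are assumptions for illustration, not the actual SingleAgentEnv API:

```python
import numpy as np

def build_observations(positions):
    """positions: dict agent_id -> (x, y).

    Returns dict agent_id -> flat observation consisting of the agent's own
    position followed by the positions of all other agents (in id order).
    """
    obs = {}
    for i, own in positions.items():
        others = [p for j, p in sorted(positions.items()) if j != i]
        parts = [np.asarray(own, dtype=np.float32)]
        parts += [np.asarray(p, dtype=np.float32) for p in others]
        obs[i] = np.concatenate(parts)
    return obs

positions = {0: (0.0, 0.0), 1: (1.0, 2.0), 2: (3.0, 4.0)}
obs = build_observations(positions)
```

With a fixed agent count, every agent then has an identically shaped observation, which keeps the shared policy's observation space consistent.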
The problem is that these agents need to act simultaneously, so would grouping the agents be a good idea here (similar to #7341)? Also, how would such a grouping be applied, and would it alter the observation/action space of the policy (which is currently the same as in the single-agent environment)?
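For context on what grouping would do to the spaces: RLlib's MultiAgentEnv.with_agent_groups exposes a group of agents as one "super-agent" whose observation and action are tuples of the per-agent observations and actions, so the policy's spaces do change (to Tuple spaces). The sketch below only illustrates that grouping/ungrouping behavior conceptually; it is not RLlib's actual implementation:

```python
class GroupedEnvSketch:
    """Conceptual illustration of agent grouping (hypothetical class)."""

    def __init__(self, agent_ids):
        self.agent_ids = list(agent_ids)

    def group_obs(self, per_agent_obs):
        # dict {agent_id: obs} -> one tuple observation for the whole group
        return tuple(per_agent_obs[i] for i in self.agent_ids)

    def ungroup_action(self, group_action):
        # one tuple action for the group -> dict {agent_id: action}
        return {i: a for i, a in zip(self.agent_ids, group_action)}

env = GroupedEnvSketch([0, 1, 2])
grouped_obs = env.group_obs({0: "o0", 1: "o1", 2: "o2"})
per_agent_actions = env.ungroup_action(("a0", "a1", "a2"))
```

So if the single-agent spaces are obs_space and action_space, the grouped policy would see something like Tuple([obs_space] * n) and Tuple([action_space] * n), assuming n agents per group.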
I plan to use curriculum learning to gradually increase the number of agents, and I'm not sure whether this approach will interfere with that.
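For the curriculum part, one hedged sketch of a schedule: a pure function mapping the training iteration to an agent count, which a training callback could use to rebuild the environment's agent list (which is one reason load_agents() should work for a variable agent count). The function name and its parameters are assumptions, not RLlib API:

```python
def num_agents_for_iteration(iteration, start=2, iters_per_level=50, max_agents=20):
    """Start with `start` agents and add one every `iters_per_level`
    training iterations, capped at `max_agents` (all values hypothetical)."""
    return min(max_agents, start + iteration // iters_per_level)
```

Note that if the observation includes the other agents' positions (or the agents are grouped), changing the agent count changes the observation/action space sizes, so a curriculum over agent count may require padding observations to the maximum agent count or rebuilding the policy between curriculum stages.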