
[rllib] Grouping of agents during parameter sharing (same policy) #7422

Closed
@lennardsnoeks

Description


What is your question?

My goal is to create a crowd simulation with multiple agents, where each agent is driven by the same policy (because the agents are homogeneous). This can be achieved with parameter sharing, which the multi-agent framework makes easy in the following way:

single_env = SingleAgentEnv(env_config)
obs_space = single_env.get_observation_space()
action_space = single_env.get_action_space()

config["multiagent"] = {
    "policies": {
        # A single shared policy for all (homogeneous) agents.
        "policy_0": (None, obs_space, action_space, {"gamma": 0.95})
    },
    # Map every agent id to the one shared policy.
    "policy_mapping_fn": lambda agent_id: "policy_0"
}
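For context, this is roughly how I launch training with that config (a minimal sketch; it assumes PPO and that the environment is registered under the name "multi_agent_env", both of which are just my choices here):

from ray import tune
from ray.tune.registry import register_env

# Register the multi-agent env under a name RLlib can look up.
register_env("multi_agent_env", lambda cfg: MultiAgentEnvironment(cfg))

config["env"] = "multi_agent_env"
tune.run("PPO", config=config)

Since every agent maps to "policy_0", all agents train the same set of weights.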

However, the agents need to know the positions of the other agents. Each agent acts through a SingleAgentEnv (single-agent simulations work fine), which updates that agent's own position; the positions of the other agents are then distributed to the agents through the MultiAgentEnvironment (I don't really know if this is a correct approach, see the code of the current MultiAgentEnvironment below).

import copy

from ray.rllib.env.multi_agent_env import MultiAgentEnv


class MultiAgentEnvironment(MultiAgentEnv):

    def __init__(self, env_config):
        self.shared_state: SimulationState = env_config["sim_state"]
        self.env_config = env_config

        self.load_agents()

    def load_agents(self):
        # Keep a pristine copy of the shared state so reset() can restore it.
        self.original_shared_state = copy.deepcopy(self.shared_state)

        self.agents = []
        for i, _ in enumerate(self.shared_state.agents):
            # Give each sub-environment its own config copy, so the agent ids
            # don't overwrite one another in the shared dict.
            agent_config = dict(self.env_config)
            agent_config["agent_id"] = i
            self.agents.append(SingleAgentEnv(agent_config))

    def step(self, action_dict):
        obs, rew, done, info = {}, {}, {}, {}

        for i, action in action_dict.items():
            obs[i], rew[i], done[i], info[i] = self.agents[i].step(action)
            if done[i]:
                self.dones.add(i)

        # End the episode as soon as any single agent is done.
        done["__all__"] = len(self.dones) > 0

        return obs, rew, done, info

    def reset(self):
        self.dones = set()
        # Restore the shared state and propagate it to every sub-environment.
        self.shared_state = copy.deepcopy(self.original_shared_state)

        for agent in self.agents:
            agent.load_params(self.shared_state)

        return {i: a.reset() for i, a in enumerate(self.agents)}

The problem is that these agents need to act simultaneously, so would grouping the agents be a good idea here (similar to #7341)? Also, how would such a grouping be applied, and would it alter the observation/action space of the policy (which is currently the same as that of the single-agent environment)?
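From reading the docs, my understanding is that grouping would be applied with with_agent_groups, roughly like the sketch below (assuming two agents with integer ids 0 and 1; the group name "group_1" is arbitrary). If I understand correctly, the grouped env then exposes Tuple spaces, i.e. the policy would no longer see the single-agent spaces:

from gym.spaces import Tuple

# Wrap the multi-agent env so both agents step together as one "super-agent".
# Observations arrive as a tuple of the per-agent observations, and the
# policy must emit a tuple of per-agent actions.
grouped_env = MultiAgentEnvironment(env_config).with_agent_groups(
    groups={"group_1": [0, 1]},
    obs_space=Tuple([obs_space, obs_space]),
    act_space=Tuple([action_space, action_space]))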

I plan to use curriculum learning to gradually increase the number of agents, so I'm not sure whether this approach will interfere with that.
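Concretely, I was thinking of something along these lines (a sketch only; set_num_agents would be a method I'd still have to add to MultiAgentEnvironment):

from ray.rllib.agents.ppo import PPOTrainer

trainer = PPOTrainer(config=config, env="multi_agent_env")
for _ in range(100):
    result = trainer.train()
    # Every 10 iterations, bump the agent count on all env copies.
    if result["training_iteration"] % 10 == 0:
        num_agents = 2 + result["training_iteration"] // 10
        trainer.workers.foreach_worker(
            lambda w: w.foreach_env(lambda env: env.set_num_agents(num_agents)))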
