# DRAMA at the PettingZoo: Dynamically Restricted Action Spaces for Multi-Agent Reinforcement Learning Frameworks

This notebook demonstrates the basic functionality of restrictions, restrictors, and restriction wrappers as described in _Oesterle et al. (2024): DRAMA at the PettingZoo: Dynamically Restricted Action Spaces for Multi-Agent Reinforcement Learning Frameworks_. More detailed examples can be found in the notebooks under `./examples/`, and the full documentation is available at XXX.

## Imports

In [22]:
import numpy as np

from gymnasium.spaces import Discrete, Box, Space
from pettingzoo import AECEnv
from pettingzoo.classic import rps_v2
from drama import (
    DiscreteSetActionSpace,
    DiscreteSetRestriction,
    IntervalUnionRestriction,
    RestrictionWrapper,
    Restrictor,
    RestrictorActionSpace,
)
from examples.utils import play

## Basic usage of restrictions

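Restrictions are subsets of `gym.Space`s. Each restriction is initialized with a base space and offers the same methods as a `gym.Space`, in particular `contains(x)` and `sample()`. A newly created restriction allows the entire base space; elements are then excluded with `remove` and re-allowed with `add`.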

In [17]:
restriction = DiscreteSetRestriction(base_space=Discrete(10))
print(restriction)
restriction.remove(3)
restriction.remove(5)
print(restriction)
restriction.add(2)
restriction.add(3)
print(restriction.contains(8))
print(restriction.contains(5))

DiscreteSetRestriction({0, 1, 2, 3, 4, 5, 6, 7, 8, 9})
DiscreteSetRestriction({0, 1, 2, 4, 6, 7, 8, 9})
True
False
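
To make the set semantics of the cell above explicit, here is a minimal standalone sketch that does not use `drama` at all. The class `ToyDiscreteRestriction` and its internals are hypothetical and only mirror the behaviour visible in the output above; the library's actual implementation may differ.

```python
import random


class ToyDiscreteRestriction:
    """Hypothetical stand-in: a mutable subset of {0, ..., n - 1}."""

    def __init__(self, n: int):
        self.n = n
        self.allowed = set(range(n))  # initially the whole base space

    def remove(self, x: int) -> None:
        self.allowed.discard(x)

    def add(self, x: int) -> None:
        if not 0 <= x < self.n:
            raise ValueError(f"{x} is outside the base space")
        self.allowed.add(x)

    def contains(self, x: int) -> bool:
        return x in self.allowed

    def sample(self) -> int:
        # Draw uniformly from the currently allowed actions only
        return random.choice(sorted(self.allowed))


toy = ToyDiscreteRestriction(10)
toy.remove(3)
toy.remove(5)
toy.add(2)  # already allowed, so nothing changes
toy.add(3)  # re-allows the action removed earlier
print(toy.contains(8), toy.contains(5), toy.contains(3))  # True False True
```

In this sketch, `sample()` can never return 5, because 5 is the only action that remains excluded.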


In [18]:
restriction = IntervalUnionRestriction(base_space=Box(0, 10))
print(restriction)
restriction.remove(3, 6)
print(restriction)
restriction.add(2, 4)
print(restriction.contains(3))
print(restriction.contains(5))

IntervalUnionRestriction([(0.0, 10.0)])
IntervalUnionRestriction([(0.0, 3.0), (6.0, 10.0)])
True
False
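
The interval arithmetic behind `remove(3, 6)` followed by `add(2, 4)` can be traced with a small standalone sketch. The helper functions below are hypothetical and independent of `drama`; in particular, treating all interval endpoints as closed is an assumption made here for simplicity, not a statement about the library.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def remove_interval(intervals: List[Interval], lo: float, hi: float) -> List[Interval]:
    """Cut the range between lo and hi out of every interval in the union."""
    result = []
    for a, b in intervals:
        if hi <= a or b <= lo:  # no overlap: keep the interval unchanged
            result.append((a, b))
            continue
        if a < lo:  # part left of the removed range survives
            result.append((a, lo))
        if hi < b:  # part right of the removed range survives
            result.append((hi, b))
    return result


def add_interval(intervals: List[Interval], lo: float, hi: float) -> List[Interval]:
    """Insert [lo, hi] and merge all intervals that overlap or touch."""
    merged: List[Interval] = []
    for a, b in sorted(intervals + [(lo, hi)]):
        if merged and a <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            merged.append((a, b))
    return merged


def contains(intervals: List[Interval], x: float) -> bool:
    return any(a <= x <= b for a, b in intervals)


union = [(0.0, 10.0)]
union = remove_interval(union, 3.0, 6.0)
print(union)  # [(0.0, 3.0), (6.0, 10.0)]
union = add_interval(union, 2.0, 4.0)
print(union)  # [(0.0, 4.0), (6.0, 10.0)]
print(contains(union, 3), contains(union, 5))  # True False
```

After both operations, 3 lies in the merged interval from 0 to 4, while 5 still falls into the remaining gap between 4 and 6, which matches the `True` and `False` printed by the cell above.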


## Example: Rock-Paper-Scissors

In this example, we build a restriction wrapper around the _Rock-Paper-Scissors_ environment (`rps_v2`) of `pettingzoo`. 

- The restrictor prevents each player from repeating their previous action: it observes the player's last move and excludes it from the set of allowed actions.
- The agents simply choose a random action from the allowed set.
- The `RestrictionWrapper` wraps the environment (including its agents) and one or more `Restrictor`s. It extends the agent-environment cycle (AEC) so that, before each agent acts, the responsible restrictor creates a restriction. The agent then receives both the original observation and the restriction, and can take the restriction into account when choosing its action.
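
The extended cycle described in the list above (restrictor first, then the agent) can be illustrated with a toy loop in plain Python. This is only a sketch of the idea: it uses neither `drama` nor `pettingzoo`, it does not model the internal state handling of `rps_v2`, and all names in it (`restrict`, `random_policy`, `run`, `NO_MOVE`) are hypothetical.

```python
import random

NUM_ACTIONS = 3
NO_MOVE = NUM_ACTIONS  # placeholder observation before a player has moved


def restrict(last_move: int) -> set:
    """Restrictor step: allow every action except the player's last move."""
    return set(range(NUM_ACTIONS)) - {last_move}


def random_policy(allowed: set) -> int:
    """Agent step: pick uniformly from the allowed actions."""
    return random.choice(sorted(allowed))


def run(cycles: int, seed: int = 0) -> list:
    random.seed(seed)
    last_move = {"player_0": NO_MOVE, "player_1": NO_MOVE}
    history = []
    for _ in range(cycles):
        for player in last_move:
            allowed = restrict(last_move[player])  # the restrictor acts first
            action = random_policy(allowed)  # then the agent acts
            history.append((player, sorted(allowed), action))
            last_move[player] = action
    return history


for step in run(cycles=2):
    print(step)
```

Each tuple holds the acting player, the allowed actions at that moment, and the chosen action. In this toy version, no player ever plays the same move twice in a row.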

In [58]:
class RPSRestrictor(Restrictor):
    def preprocess_observation(self, env: AECEnv):
        # The restrictor observes the last move of the player whose turn it is
        env = env.unwrapped
        return env.state[env.agent_selection]

    def act(self, observation: int) -> RestrictorActionSpace:
        # Allow every action except the observed last move
        return DiscreteSetRestriction(base_space=self.action_space.base_space, allowed_actions=set(range(3)) - {observation})

In [59]:
env = rps_v2.env(num_actions=3, max_cycles=10)
# Restrictor observation space Discrete(4): the three moves plus 3 for "no previous move"
restrictor = RPSRestrictor(Discrete(4), DiscreteSetActionSpace(base_space=Discrete(3)))
wrapper = RestrictionWrapper(env, restrictor)

def rps_policy(obs):
    # Ignore the game observation and pick uniformly among the allowed actions
    restriction = obs['restriction']
    return np.random.choice(restriction)

policies = {'player_0': rps_policy, 'player_1': rps_policy, 'restrictor_0': restrictor.act}

## Execution

In [60]:
play(wrapper, policies, record_trajectory=True)

| | agent | observation | reward | termination | truncation | info | action |
|---|---|---|---|---|---|---|---|
| 0 | restrictor_0 | 3 | 0.0 | False | False | {} | DiscreteSetRestriction({0, 1, 2}) |
| 1 | player_0 | {'observation': 3, 'restriction': [0, 1, 2]} | 0.0 | False | False | {} | 1 |
| 2 | restrictor_0 | 3 | 0.0 | False | False | {} | DiscreteSetRestriction({0, 1, 2}) |
| 3 | player_1 | {'observation': 3, 'restriction': [0, 1, 2]} | 0.0 | False | False | {} | 0 |
| 4 | restrictor_0 | 1 | 0.0 | False | False | {} | DiscreteSetRestriction({0, 2}) |
| 5 | player_0 | {'observation': 0, 'restriction': [0, 2]} | 1.0 | False | False | {} | 0 |
| 6 | restrictor_0 | 3 | -1.0 | False | False | {} | DiscreteSetRestriction({0, 1, 2}) |
| 7 | player_1 | {'observation': 1, 'restriction': [0, 1, 2]} | -1.0 | False | False | {} | 1 |
| 8 | restrictor_0 | 0 | 0.0 | False | False | {} | DiscreteSetRestriction({1, 2}) |
| 9 | player_0 | {'observation': 1, 'restriction': [1, 2]} | -1.0 | False | False | {} | 1 |
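
The `play` helper is imported from `examples.utils`, and its source is not shown in this notebook. As a rough idea of how a trajectory with these columns could be recorded, here is a hypothetical loop built on the standard PettingZoo AEC calls `reset()`, `agent_iter()`, `last()` and `step()`. The function `play_sketch` and the stand-in class `CountdownEnv` are assumptions made for illustration, not the code in `examples.utils` and not a real PettingZoo environment.

```python
def play_sketch(env, policies):
    """Hypothetical trajectory recorder for an AEC-style environment."""
    env.reset()
    trajectory = []
    for agent in env.agent_iter():
        observation, reward, termination, truncation, info = env.last()
        # PettingZoo expects None as the action of a finished agent
        done = termination or truncation
        action = None if done else policies[agent](observation)
        trajectory.append({
            "agent": agent, "observation": observation, "reward": reward,
            "termination": termination, "truncation": truncation,
            "info": info, "action": action,
        })
        env.step(action)
    return trajectory


class CountdownEnv:
    """Tiny stand-in exposing only the AEC methods used above."""

    def __init__(self, steps: int):
        self.agents = ["a", "b"]
        self.steps = steps
        self.t = 0

    def reset(self):
        self.t = 0

    def agent_iter(self):
        # Alternate between the two agents until the step budget is used up
        while self.t < self.steps:
            yield self.agents[self.t % 2]

    def last(self):
        # The observation is simply the current step counter
        return self.t, 0.0, False, False, {}

    def step(self, action):
        self.t += 1


demo_policies = {"a": lambda obs: obs * 2, "b": lambda obs: -obs}
for row in play_sketch(CountdownEnv(steps=4), demo_policies):
    print(row["agent"], row["observation"], row["action"])
```

Each recorded row contains the five values returned by `env.last()` plus the acting agent and its chosen action, which corresponds to the columns of the trajectory shown above.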
