rl_env.py
Contains the abstract Environment and Agent classes, as well as the HanabiEnv class.
HanabiEnv is the reinforcement learning interface to DeepMind's Hanabi environment.
```python
environment = rl_env.make()
config = {'players': 5}
observation = environment.reset(config)
done = False
while not done:
    # Agent takes an action
    action = ...
    # Environment takes a step
    observation, reward, done, info = environment.step(action)
```
Each player's observation is a dictionary; see rl_env.py for the specific keys and values.
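As a rough illustration of the shape of such a dictionary, here is a hand-written sketch. The key names and values below are assumptions chosen for the example; rl_env.py defines the authoritative set.

```python
# Hypothetical per-player observation dictionary (illustrative only;
# the real keys and values are defined in rl_env.py).
player_observation = {
    'current_player': 0,      # index of the player whose turn it is
    'life_tokens': 3,         # remaining lives
    'information_tokens': 8,  # remaining hint tokens
    'fireworks': {'B': 0, 'G': 1, 'R': 0, 'W': 0, 'Y': 2},  # stack heights
    'legal_moves': [          # moves this player may take right now
        {'action_type': 'PLAY', 'card_index': 0},
        {'action_type': 'DISCARD', 'card_index': 0},
    ],
}

# Values are read with ordinary dictionary lookups:
hint_budget = player_observation['information_tokens']
```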
The action history is stored in the HanabiEnv class's self.hist attribute.
For two players the format is:

[Agent0's history, Agent1's history]

where each agent's history is a chronological list of that agent's moves:

[first_move, second_move, ..., most_recent_move]
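A minimal sketch of this layout, assuming each history entry is a move dictionary of the kind described in this document (the specific moves are invented for illustration):

```python
# Hypothetical two-player history: hist[i] holds Agent i's moves,
# oldest first, most recent last.
hist = [
    [  # Agent0's history
        {'action_type': 'REVEAL_COLOR', 'color': 'R', 'target_offset': 1},
        {'action_type': 'PLAY', 'card_index': 0},
    ],
    [  # Agent1's history
        {'action_type': 'DISCARD', 'card_index': 4},
    ],
]

# Agent0's most recent move is the last entry of hist[0].
most_recent_agent0_move = hist[0][-1]
```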
Move encoding for a 2-player game:
Moves are encoded as dictionaries.
action_type = {'PLAY', 'DISCARD', 'REVEAL_COLOR', 'REVEAL_RANK'}
card_index = {0, 1, 2, 3, 4} # Index of the card that was played or discarded.
color = {'B', 'G', 'R', 'W', 'Y'} # Color of the card(s) that were hinted.
rank = {0, 1, 2, 3, 4} # Rank (number) of the card(s) that were hinted.
target_offset = {1} # Offset of the agent targeted by the hint (always 1 with two players).
indices_affected = {0, 1, 2, 3, 4} # Hand positions affected by a hint; can contain multiple indices.
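To make the encoding concrete, here is one hand-written example per action type. The specific card indices, colors, and ranks are invented for illustration, not taken from a real game:

```python
# One example move dictionary per action_type, following the encoding above.
play = {'action_type': 'PLAY', 'card_index': 2}
discard = {'action_type': 'DISCARD', 'card_index': 0}
reveal_color = {'action_type': 'REVEAL_COLOR', 'color': 'G',
                'target_offset': 1, 'indices_affected': [1, 3]}
reveal_rank = {'action_type': 'REVEAL_RANK', 'rank': 4,
               'target_offset': 1, 'indices_affected': [0]}
```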