jtwwang edited this page Jul 27, 2019 · 7 revisions

File description

Contains the abstract Environment and Agent classes, as well as the HanabiEnv class.

HanabiEnv class

This class is the reinforcement learning interface to DeepMind's Hanabi environment. A typical interaction loop looks like this:


import rl_env  # assumes rl_env.py is on the import path

environment = rl_env.make()
config = { 'players': 5 }
observation = environment.reset(config)
done = False  # must be initialized before the loop
while not done:
    # Agent takes an action
    action = ...
    # Environment takes a step
    observation, reward, done, info = environment.step(action)


Each player's observation is a dictionary. Find specific keys and values in

Action History

The action history is stored in HanabiEnv's self.hist.
For two players, it has the following format:
[Agent0's history, Agent1's history]

Agent0's history:
[first_move_done_by_agent0, second_... , ... , most_recent_move_done_by_agent0]
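
As a minimal sketch of the layout described above, the snippet below builds a stand-in for self.hist and reads the most recent move of one agent. The variable hist, the placeholder move strings, and the helper most_recent_move are hypothetical and only illustrate the nesting; they are not part of HanabiEnv.

```python
# Hypothetical stand-in for HanabiEnv.hist in a two-player game.
# Outer list: one entry per agent. Inner list: that agent's moves, oldest first.
hist = [
    ["agent0_move_1", "agent0_move_2"],  # Agent0's history
    ["agent1_move_1"],                   # Agent1's history
]


def most_recent_move(hist, agent_id):
    """Return the last move made by agent_id, or None if it has not moved yet."""
    agent_history = hist[agent_id]
    return agent_history[-1] if agent_history else None


print(most_recent_move(hist, 0))
print(most_recent_move(hist, 1))
```

Because each inner list is ordered from oldest to newest, index -1 always gives the latest move and index 0 the first one.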

Move encoding for a 2-player game:
Each move is encoded as a dictionary. The possible values for each key are:
action_type = {'PLAY', 'DISCARD', 'REVEAL_COLOR', 'REVEAL_RANK'}
card_index = {0, 1, 2, 3, 4} # Index of the card that was played or discarded.
color = {'B', 'G', 'R', 'W', 'Y'} # Color of the card(s) that were hinted.
rank = {0, 1, 2, 3, 4} # Rank (number) of the card(s) that were hinted.
target_offset = {1} # Agent targeted by the hint; with two players the only other agent is at offset 1.
indices_affected = {0, 1, 2, 3, 4} # Hand positions affected by a hint. Can be multiple.
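
The encoding above can be illustrated with a few example dictionaries and a small checker. This is a sketch under assumptions: it supposes that PLAY and DISCARD moves carry card_index, that hint moves carry target_offset, indices_affected, and either color or rank, and that indices_affected is a list. The helper is_valid_move and the constant names are hypothetical and not part of HanabiEnv.

```python
# Allowed values, copied from the encoding listed above (2-player game).
COLORS = {"B", "G", "R", "W", "Y"}
RANKS = set(range(5))
HAND_POSITIONS = set(range(5))


def is_valid_move(move):
    """Check a move dictionary against the 2-player encoding (assumed key layout)."""
    action = move.get("action_type")
    if action in ("PLAY", "DISCARD"):
        return move.get("card_index") in HAND_POSITIONS
    if action in ("REVEAL_COLOR", "REVEAL_RANK"):
        # With two players, the hinted agent is always at offset 1.
        if move.get("target_offset") != 1:
            return False
        # Every affected position must be a valid hand index.
        if not set(move.get("indices_affected", [])) <= HAND_POSITIONS:
            return False
        if action == "REVEAL_COLOR":
            return move.get("color") in COLORS
        return move.get("rank") in RANKS
    return False


# Example moves (illustrative values only).
play = {"action_type": "PLAY", "card_index": 3}
hint = {
    "action_type": "REVEAL_COLOR",
    "color": "R",
    "target_offset": 1,
    "indices_affected": [0, 2],
}
bad = {"action_type": "REVEAL_RANK", "rank": 7, "target_offset": 1}

print(is_valid_move(play), is_valid_move(hint), is_valid_move(bad))
```

A checker like this is mainly useful when writing agents that read self.hist, since it catches malformed entries before they reach the learning code.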