pymahjong (older version)

This is a reinforcement learning environment for Japanese Riichi Mahjong (https://en.wikipedia.org/wiki/Japanese_Mahjong), used in the research article "Variational oracle guiding for reinforcement learning" (https://openreview.net/forum?id=pjqqxepwoMy).

Authors: Dongqi Han, Tadashi Kozuno, Xufang Luo, Zhao-Yun Chen, Kenji Doya, Yuqing Yang, Dongsheng Li

This version is deprecated.

Please switch to the newer version of this environment (https://github.com/Agony5757/mahjong), which is about 100 times faster (the computations previously done in Python were re-implemented in C++) and fixes some known issues of this version.

Usage

The environment is wrapped like a Gym environment (https://github.com/openai/gym), but it differs from a typical single-agent Gym environment because Mahjong is a 4-player game with sophisticated state transitions. See the API section below for details.

This environment only supports Python 3.7 on Ubuntu 18.04 and Python 3.6 on Windows.

Decision flow

Decision steps are divided into two categories (drawing a tile is not considered a decision step):

step_response: when a player can respond to another player's played tile (Pong, Chi, Kan, Win (Ron), etc.).
step_play: when a player can make decisions based on its own tiles (An-Kan, Ka-Kan, Win (Tsumo), etc.).

When either kind of step is needed for any player, the corresponding player (given by "get_curr_player_id()") selects a valid action, and the game then proceeds until the next decision step. Sometimes multiple players can respond to the same tile; in that case they make their decisions one after another in counterclockwise order, and then the game proceeds.

For convenience in AI research, we unified all decision steps into a single "step" method. See API and Example for details.

Observation

The executor observation has shape 93 by 34, and the oracle observation has shape 111 by 34. The first dimension corresponds to features (channels): 93 for the executor observation and 111 for the oracle observation. The second dimension (size 34) corresponds to the 34 Mahjong tile types, in the order Character 1-9, Dot 1-9, Bamboo 1-9, East, South, West, North, White, Green, Red.
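The tile order above can be turned into a small lookup table. The following is a minimal sketch, assuming the columns are indexed from 0 in exactly the stated order; the names TILE_NAMES, tile_name and tile_column are illustrative and are not part of env_mahjong.

```python
# Tile names in the column order stated above (an illustrative naming scheme).
SUITS = ["Character", "Dot", "Bamboo"]
HONORS = ["East", "South", "West", "North", "White", "Green", "Red"]

# 3 suits x 9 ranks = 27 numbered tiles, followed by the 7 honor tiles.
TILE_NAMES = [f"{suit} {rank}" for suit in SUITS for rank in range(1, 10)] + HONORS


def tile_name(column: int) -> str:
    """Return the tile name for a column index 0..33 of an observation."""
    return TILE_NAMES[column]


def tile_column(name: str) -> int:
    """Return the column index 0..33 for a tile name such as 'Dot 5'."""
    return TILE_NAMES.index(name)
```

With such a table, a row of an observation can be read as a set of tile types by listing the names of the columns that are set to 1.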

The value of any element in an observation is 1 or 0. See observation_action_explanation.pdf for details.

Action

We use 47 discrete actions to encode all possible decisions in a game. See observation_action_explanation.pdf for details.

Dataset for offline reinforcement learning / imitation learning

Please download the dataset from: https://1drv.ms/u/s!AuxZyB8UeEtsgpNPCHlPEtWh75By7Q?e=lOkARM

After unzipping, the data are in .mat format, which can be loaded in Python using scipy:

import scipy.io
data = scipy.io.loadmat("xxx.mat")

The loaded data is a Python dictionary containing the human demonstrations:

data["X"] : executor observation
data["O"] : additional oracle observation (oracle observation is np.concatenate([data["X"], data["O"]], axis=-2))
data["A"] : Action selected
data["M"] : Valid actions (=1 means the corresponding action is valid at that step)
data["R"] : Reward
data["D"] : Done signal
data["V"] : =0 if this step is the terminal step of an episode (i.e., the state after a game has reached a result; observations at this step were still recorded); otherwise =1.

Each batch of data corresponds to around 0.6~0.7 million steps, or around 40,000 games.
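The sketch below illustrates how the fields above might be combined. It uses small synthetic arrays instead of a real .mat file, and the array layouts are assumptions: a leading step dimension, 93 executor channels, and 18 additional oracle channels (inferred from 111 - 93). The real files may be laid out differently, so check the shapes after loading.

```python
import numpy as np

n_steps = 5  # tiny synthetic stand-in for a loaded batch
rng = np.random.default_rng(0)

# Synthetic stand-in for scipy.io.loadmat("xxx.mat"); array layouts are assumed.
data = {
    "X": rng.integers(0, 2, size=(n_steps, 93, 34)),  # executor observation
    "O": rng.integers(0, 2, size=(n_steps, 18, 34)),  # additional oracle features
    "V": np.array([[1, 1, 1, 1, 0]]),                 # 0 marks the terminal step
}

# Oracle observation = executor features followed by the additional oracle features.
oracle_obs = np.concatenate([data["X"], data["O"]], axis=-2)

# loadmat returns vectors as 2-D arrays by default, so flatten V before masking.
keep = data["V"].reshape(-1) == 1

# Keep only steps that are not the terminal step of an episode.
executor_train = data["X"][keep]
oracle_train = oracle_obs[keep]
```

Filtering on "V" in this way drops the recorded post-game step, at which no action was selected.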

Example

The simplest example, in which agents make fully random decisions, can be run as follows (please make sure you are using Python 3.7 on Ubuntu 18.04 or Python 3.6 on Windows):

git clone https://github.com/pymahjong/pymahjong
cd pymahjong
pip install -r requirements.txt
python env_mahjong_example.py

Known Issues

There is a common error message:

ImportError: libpython3.7m.so.1.0: cannot open shared object file: No such file or directory

Some possible ways to fix this are

sudo apt-get install libpython3.7

or

export LD_LIBRARY_PATH=/path/to/libpython/directory
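To find a candidate value for /path/to/libpython/directory, the running interpreter can be asked where its libraries were installed. This is only a sketch using the standard sysconfig module: the reported values depend on how Python was built, they may be None on some platforms, and the directory may not actually contain the shared library if it was never installed.

```python
import sysconfig

# Directory where this interpreter's libraries were installed, if recorded.
libdir = sysconfig.get_config_var("LIBDIR")

# File name of the libpython library for this build (may be None, e.g. on Windows).
libname = sysconfig.get_config_var("INSTSONAME")

print("LIBDIR:", libdir)
print("INSTSONAME:", libname)
```

If the printed directory contains the file named in the error message, it is a reasonable value to try for LD_LIBRARY_PATH.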

API

The basic API is similar to OpenAI Gym. However, since Mahjong is a multi-player game with complicated state transitions, there are some differences.

First, create an instance of the environment:

from env_mahjong import *
env = EnvMahjong()

To initialize a game (episode), you need the ID of the parent player ("Oya", i.e., the dealer) and the game wind, e.g.,

sp = env.reset(0, 'east')  # oya ID and game wind

After an episode starts, we need to know the ID of the current decision-making player, obtained by

curr_pid = env.get_curr_player_id()

Next, we need to know which actions are valid at the current decision step. There are two ways to get them:

valid_actions = env.get_valid_actions(nhot=False)  # List[int] of valid actions
action_mask = env.get_valid_actions(nhot=True)  # n-hot vector of valid actions
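The two representations carry the same information. The helpers below are a minimal sketch of converting between them and of sampling a random valid action. They assume action IDs run from 0 to 46 and that the n-hot vector has length 47; the function names are illustrative and not part of env_mahjong.

```python
import numpy as np

NUM_ACTIONS = 47  # size of the discrete action space described in the Action section


def to_nhot(valid_actions, num_actions=NUM_ACTIONS):
    """Build an n-hot vector with 1 at every valid action index."""
    mask = np.zeros(num_actions, dtype=np.int64)
    mask[list(valid_actions)] = 1
    return mask


def from_nhot(mask):
    """Recover the sorted list of valid action indices from an n-hot vector."""
    return np.flatnonzero(np.asarray(mask)).tolist()


def sample_valid_action(mask, rng):
    """Pick one valid action uniformly at random."""
    return int(rng.choice(np.flatnonzero(np.asarray(mask))))
```

The list form is convenient for random agents, while the n-hot form is convenient for masking the output of a neural network policy.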

To get executor and oracle observations,

import numpy as np
x_executor = env.get_obs(curr_pid)
x_oracle = np.concatenate([x_executor, env.get_oracle_obs(curr_pid)], axis=-2)

The agent should select an action from "valid_actions" using the executor observation. Then, we can use the Gym-like step function to continue the game:

action = decision_model(x_executor)
sp, r, done, info = env.step(curr_pid, action)
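In the snippet above, decision_model is a placeholder. One common way to make sure a learned model only picks valid actions is to mask its per-action scores with the n-hot vector before taking the argmax. The following is a sketch of that idea only; masked_argmax is a hypothetical helper, and the scores are assumed to be a length-47 vector produced by some model of your own.

```python
import numpy as np


def masked_argmax(scores, action_mask):
    """Return the highest-scoring action among those marked valid in action_mask."""
    scores = np.asarray(scores, dtype=np.float64)
    mask = np.asarray(action_mask).astype(bool)
    if not mask.any():
        raise ValueError("no valid action at this decision step")
    # Invalid actions get -inf so they can never win the argmax.
    masked = np.where(mask, scores, -np.inf)
    return int(np.argmax(masked))
```

The same masking can be applied to logits before a softmax when sampling from a stochastic policy.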

Here sp is the raw state after this decision. We do not recommend using sp directly because the game state may change frequently during non-decision steps (e.g., drawing tiles). Instead, always call "env.get_obs(curr_pid)" and "env.get_oracle_obs(curr_pid)" right before use.

We also do not recommend using r and done during the game. Instead, we can get the payoffs once a game is done:

if env.has_done():  # Should use env.has_done() instead of done from step()!!
    print("result:", np.array(env.get_payoffs()))

where "env.get_payoffs()" returns a list of payoffs of the four players.

See env_mahjong_example.py for a runnable instance.
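Putting the calls above together, one episode with random agents could be structured roughly as follows. This is a sketch, not the content of env_mahjong_example.py. To keep it runnable without the compiled MahjongPy module, it is demonstrated on StubEnv, a hypothetical stand-in that only mimics the method names and does not implement any Mahjong rules. With the real environment, an EnvMahjong() instance would be passed instead.

```python
import numpy as np


def run_random_episode(env, oya=0, game_wind="east", seed=0):
    """Play one episode with uniformly random valid actions; return the payoffs."""
    rng = np.random.default_rng(seed)
    env.reset(oya, game_wind)
    # Use env.has_done() rather than the done flag returned by step().
    while not env.has_done():
        curr_pid = env.get_curr_player_id()
        valid_actions = env.get_valid_actions(nhot=False)
        action = int(rng.choice(valid_actions))
        env.step(curr_pid, action)
    return np.array(env.get_payoffs())


class StubEnv:
    """Hypothetical stand-in with the same method names; it is not Mahjong."""

    def __init__(self, n_steps=8):
        self.n_steps = n_steps
        self.t = 0

    def reset(self, oya, game_wind):
        self.t = 0
        return None

    def get_curr_player_id(self):
        return self.t % 4

    def get_valid_actions(self, nhot=False):
        return [0, 1, 2]

    def step(self, player_id, action):
        self.t += 1
        return None, 0.0, self.t >= self.n_steps, {}

    def has_done(self):
        return self.t >= self.n_steps

    def get_payoffs(self):
        return [0, 0, 0, 0]


payoffs = run_random_episode(StubEnv())
print("result:", payoffs)
```

The loop queries the current player and the valid actions again at every decision step, which follows the recommendation above not to rely on the values returned by step().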

Citation

@inproceedings{
han2022variational,
title={Variational oracle guiding for reinforcement learning},
author={Dongqi Han and Tadashi Kozuno and Xufang Luo and Zhao-Yun Chen and Kenji Doya and Yuqing Yang and Dongsheng Li},
booktitle={International Conference on Learning Representations},
year={2022},
url={https://openreview.net/forum?id=pjqqxepwoMy}
}
