feat: add online RL with native game logic refinements #82

smly · 2026-01-30T01:22:12Z

Close #33

This PR implements an online reinforcement learning (RL) training pipeline and includes several important fixes and improvements to the native Rust game logic.

The new online RL code consists of:

A UnifiedNetwork architecture combining policy (actor) and value (critic) networks.
A GlobalReplayBuffer for managing training data from self-play.
A MahjongLearner to perform online updates using PPO and CQL algorithms.
A Ray-based MahjongWorker for scalable, distributed episode collection.
The train_online.py script, which manages the online training loop and evaluations.
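
A replay buffer like GlobalReplayBuffer typically holds a bounded window of self-play transitions and hands random minibatches to the learner. Below is a minimal, hypothetical stand-in: the `Transition` and `ReplayBuffer` names and fields are assumptions rather than the PR's code, and the sketch ignores the storage optimizations mentioned in commit f8d9c66.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Any, Deque, List, Optional


@dataclass
class Transition:
    """One step of self-play experience (hypothetical field layout)."""
    obs: Any
    action: int
    reward: float
    done: bool


class ReplayBuffer:
    """Fixed-capacity FIFO buffer: once full, the oldest transitions are evicted."""

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._data: Deque[Transition] = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def __len__(self) -> int:
        return len(self._data)

    def extend(self, transitions: List[Transition]) -> None:
        # A deque with maxlen silently drops items from the left when it overflows.
        self._data.extend(transitions)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement; never request more than we hold.
        k = min(batch_size, len(self._data))
        return self._rng.sample(list(self._data), k)
```

`deque(maxlen=...)` gives FIFO eviction for free; a throughput-oriented buffer would more likely preallocate arrays instead of storing Python objects.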

Native Rust game logic refinements address:

More accurate handling of haitei, houtei, rinshan, and chankan yaku conditions in yaku.rs and state.rs.
Enhanced Kakan action processing to include furiten checks and correct MJAI logging in state.rs.
Updates to the Agari struct in types.rs, and to its usage in agari_calculator.rs, so that it records whether the hand has a complete agari shape.
General cleanup of debug statements and minor logical corrections in agari.rs, agari_calculator.rs, env.rs, and state.rs.
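
The yaku fixes above hinge on the win context: haitei requires a tsumo on the last live-wall tile, houtei a ron on the final discard, rinshan a tsumo on the replacement tile after a kan, and chankan a ron on a tile another player added via kakan. A minimal Python sketch of those checks follows; `WinContext` and `situational_yaku` are hypothetical names, and the real logic lives in yaku.rs.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class WinContext:
    """Hypothetical summary of how a win happened."""
    is_tsumo: bool        # True for tsumo (self-draw), False for ron
    is_last_tile: bool    # winning on the last live-wall tile or the final discard
    after_kan_draw: bool  # tsumo on the replacement (rinshan) tile after a kan
    robbing_kakan: bool   # ron on the tile another player added with kakan


def situational_yaku(ctx: WinContext) -> Set[str]:
    """Return the context-dependent yaku, gated on tsumo vs. ron."""
    yaku: Set[str] = set()
    if ctx.is_tsumo:
        if ctx.after_kan_draw:
            yaku.add("rinshan")
        elif ctx.is_last_tile:
            # This sketch treats rinshan and haitei as mutually exclusive.
            yaku.add("haitei")
    else:
        if ctx.robbing_kakan:
            yaku.add("chankan")
        elif ctx.is_last_tile:
            yaku.add("houtei")
    return yaku
```

The key point is the outer `is_tsumo` branch: a ron on the last tile must never score haitei, and a tsumo can never score chankan.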

NOTE: The current feature encoding does not adequately represent the game state, so agents trained with it are not expected to become strong.

smly · 2026-01-30T01:29:16Z

The recommended Python version for developing with RiichiEnv is 3.14, but Ray does not support 3.14. How should I handle this? Let me think about it for a moment... 🤔

smly · 2026-01-30T02:42:43Z

Ray's Python 3.14 support doesn't seem likely to be released anytime soon (see ray-project/ray#56434), so it might be better to consider an alternative approach.
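
If Ray has to become optional, one hedged option is to choose the worker backend at runtime. In this sketch `RAY_MAX_SUPPORTED` is an assumed value that should be checked against ray-project/ray#56434 and Ray's release notes, and the `"process_pool"` fallback (for example `concurrent.futures.ProcessPoolExecutor`) is an illustration, not something the PR implements.

```python
import sys
from typing import Tuple

# Assumption: newest Python (major, minor) that the installed Ray release supports.
# Verify against Ray's release notes before relying on this value.
RAY_MAX_SUPPORTED: Tuple[int, int] = (3, 13)


def pick_worker_backend(version: Tuple[int, int] = sys.version_info[:2]) -> str:
    """Return "ray" when Ray is usable, else a stdlib process-pool fallback."""
    if tuple(version) <= RAY_MAX_SUPPORTED:
        try:
            import ray  # noqa: F401  (imported only to test availability)
            return "ray"
        except ImportError:
            pass
    return "process_pool"
```

Keeping the decision in one function means the rest of the training loop only sees a backend name, so the Ray path can be dropped or restored without touching episode collection code.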

Copilot

Pull request overview

This PR introduces an online reinforcement learning training pipeline with PPO and CQL algorithms, alongside critical fixes to the native Rust mahjong game logic for more accurate yaku conditions and agari handling.
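
For reference, the standard per-sample forms of the two objectives are sketched below: PPO's clipped surrogate, and the discrete-action CQL regularizer (logsumexp of Q(s, ·) minus Q(s, a) for the dataset action). These are textbook formulations; how MahjongLearner weights or combines them is not described in this PR.

```python
import math
from typing import Sequence


def ppo_clip_loss(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Per-sample PPO clipped surrogate, negated so it can be minimized.

    ratio = pi_new(a|s) / pi_old(a|s).
    """
    clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
    return -min(ratio * advantage, clipped * advantage)


def cql_penalty(q_values: Sequence[float], data_action: int, alpha: float = 1.0) -> float:
    """Discrete-action CQL regularizer: alpha * (logsumexp_a Q(s, a) - Q(s, a_data))."""
    m = max(q_values)  # subtract the max for a numerically stable logsumexp
    lse = m + math.log(sum(math.exp(q - m) for q in q_values))
    return alpha * (lse - q_values[data_action])
```

The clip stops a single update from pushing the policy ratio far from 1, while the CQL term pushes down Q-values of actions absent from the data relative to the action actually taken.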

Changes:

Added UnifiedNetwork architecture, GlobalReplayBuffer, MahjongLearner, Ray-based MahjongWorker, and train_online.py for distributed online RL training
Fixed yaku conditions for haitei, houtei, rinshan, chankan to properly check tsumo/ron context
Enhanced Kakan action processing with furiten checks and MJAI logging
Added has_agari_shape field to Agari struct to distinguish complete hands without yaku from incomplete hands
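
The has_agari_shape field separates "the tiles form a complete hand" from "the hand can be declared", since a complete hand with no yaku cannot win. Below is a hypothetical Python sketch of that distinction on a 34-kind count vector (indices 0-8 man, 9-17 pin, 18-26 sou, 27-33 honors, an assumed layout). It checks only the standard melds-plus-pair shape and omits chiitoitsu and kokushi.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AgariResult:
    """Hypothetical mirror of the idea behind Agari.has_agari_shape."""
    agari: bool            # hand can actually be declared (shape and at least one yaku)
    has_agari_shape: bool  # tiles form a complete hand, with or without yaku


def _is_mentsu_only(counts: List[int], i: int = 0) -> bool:
    """True if the remaining tiles split exactly into triplets and runs."""
    while i < 34 and counts[i] == 0:
        i += 1
    if i == 34:
        return True
    if counts[i] >= 3:  # try a triplet (koutsu)
        counts[i] -= 3
        ok = _is_mentsu_only(counts, i)
        counts[i] += 3
        if ok:
            return True
    # Try a run (shuntsu); honors and 8/9 of a suit cannot start one.
    if i < 27 and i % 9 <= 6 and counts[i + 1] > 0 and counts[i + 2] > 0:
        for j in (i, i + 1, i + 2):
            counts[j] -= 1
        ok = _is_mentsu_only(counts, i)
        for j in (i, i + 1, i + 2):
            counts[j] += 1
        if ok:
            return True
    return False


def has_standard_agari_shape(counts: List[int]) -> bool:
    """Concealed tiles form melds plus exactly one pair."""
    if len(counts) != 34 or sum(counts) % 3 != 2:
        return False
    work = list(counts)
    for p in range(34):
        if work[p] >= 2:
            work[p] -= 2
            ok = _is_mentsu_only(work)
            work[p] += 2
            if ok:
                return True
    return False


def evaluate(counts: List[int], yaku_count: int) -> AgariResult:
    shape = has_standard_agari_shape(counts)
    return AgariResult(agari=shape and yaku_count > 0, has_agari_shape=shape)
```

With this split, a caller can tell "complete but yakuless" (`has_agari_shape=True`, `agari=False`) apart from "not complete at all" (both False).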

Reviewed changes

Copilot reviewed 19 out of 21 changed files in this pull request and generated 22 comments.


File	Description
uv.lock	CRITICAL: Removed ALL ML dependencies (torch, numpy, wandb, ray, etc.)
pyproject.toml	CRITICAL: Duplicate member entry, removed ml_baseline from workspace
demos/ml_baseline/pyproject.toml	CRITICAL: Changed to exact Python version, broke workspace dependency refs
native/src/yaku.rs	Fixed yaku conditions to properly check tsumo/ron for haitei/houtei/rinshan/chankan
native/src/types.rs	Added has_agari_shape field to Agari, removed unused shuntsu_counts from Hand
native/src/state.rs	BUG: Kakan MJAI logging before chankan check; state mutation in query method
native/src/agari_calculator.rs	Updated to return has_agari_shape
demos/ml_baseline/*.py	New online RL training code - non-functional due to missing dependencies
tests/env/agari/test_chankan.py	Weakened assertion from exact to membership check
tools/mjsoul-scoring-validation/main.py	Changed data path (breaking change)
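
On the Kakan changes in state.rs: under common riichi rules, a player who could rob the added kan tile (chankan) but passes has missed a win, so they are temporarily furiten until their own next discard, and permanently furiten if they are in riichi. A minimal Python sketch of that bookkeeping follows; `PlayerFuriten`, `apply_missed_chankan`, and `on_discard` are hypothetical names, not the Rust code in the PR.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class PlayerFuriten:
    in_riichi: bool = False
    temporary: bool = False  # cleared at this player's own next discard
    riichi: bool = False     # once set, lasts for the rest of the hand


def apply_missed_chankan(players: Dict[int, PlayerFuriten],
                         could_ron: Set[int],
                         declared_ron: Set[int]) -> None:
    """Mark furiten for every seat that could rob the kakan tile but passed."""
    for seat in could_ron - declared_ron:
        p = players[seat]
        p.temporary = True
        if p.in_riichi:
            p.riichi = True


def on_discard(p: PlayerFuriten) -> None:
    # Temporary furiten ends with the player's own discard; riichi furiten does not.
    p.temporary = False
```

Treating a passed chankan exactly like a passed ron on a discard keeps both paths sharing one furiten rule.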



smly added 4 commits January 28, 2026 17:22

chore: update python dependencies and uv lock

cf04c84

feat: enhance online learning capabilities and model components

3c23118

perf: optimize replay buffer storage and data ingestion

f8d9c66

fix: refine agari calculation conditions and kakan chankan furiten

54bb2da

smly self-assigned this Jan 30, 2026

smly added the enhancement New feature or request label Jan 30, 2026

smly added this to the v0.3.0 milestone Jan 30, 2026

chore: clean up ml baseline actor model and buffer files

7da1dce

chore: adjust workspace members in root pyproject.toml

83c94e7

smly requested a review from Copilot January 30, 2026 13:41

Copilot started reviewing on behalf of smly January 30, 2026 13:41

Copilot AI reviewed Jan 30, 2026


smly added 11 commits January 30, 2026 17:18

chore: clean up train_online.py imports and exception handling

1b93917

chore: remove unused torch import from train_online.py

7e31537

chore: clean up unused imports and variable assignments in ml baseline

41397d0

fix: missing import

f356045

fix: random agent is not used

e8609ab

chore: remove unused lines

8d309a7

fix: remove duplicated line

a4a75c4

refactor: change _get_legal_actions_internal to take immutable self

b9007c5

docs: add comment to agari calculator explaining agari shape check

87e74a1

refactor: clean up comments and debug prints in learner.py

d21961c

chore: improve debugging output for model state dict loading

876cba0
