@smly smly commented Jan 30, 2026

Close #33

This PR adds an online reinforcement learning (RL) training pipeline and includes several important fixes and improvements to the native Rust game logic.

The new online RL code consists of:

  • A UnifiedNetwork architecture combining policy (actor) and value (critic) networks.
  • A GlobalReplayBuffer for managing training data from self-play.
  • A MahjongLearner to perform online updates using PPO and CQL algorithms.
  • Ray-based MahjongWorker for scalable, distributed episode collection.
  • The train_online.py script to manage the online training loop and evaluations.
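
As a rough illustration of the replay-buffer component listed above, here is a minimal sketch of a fixed-capacity buffer with uniform sampling. It is not the GlobalReplayBuffer from this PR: the `Transition` fields, the `ReplayBuffer` name, and the eviction and sampling policy are all assumptions chosen to keep the example small.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class Transition:
    # Hypothetical fields; the real buffer may store different data.
    obs: list
    action: int
    reward: float
    done: bool


class ReplayBuffer:
    """Fixed-capacity FIFO buffer with uniform random sampling (assumed design)."""

    def __init__(self, capacity: int, seed: int = 0):
        # deque(maxlen=...) drops the oldest entries once capacity is reached.
        self._items = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add_episode(self, transitions):
        # Append every transition of one self-play episode.
        self._items.extend(transitions)

    def sample(self, batch_size: int):
        # Sample without replacement, capped at the current buffer size.
        k = min(batch_size, len(self._items))
        return self._rng.sample(list(self._items), k)

    def __len__(self):
        return len(self._items)
```

Workers would call `add_episode` after each finished game and the learner would call `sample` once per update step. A real implementation may prefer prioritized or per-episode sampling.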

Native Rust game logic refinements address:

  • More accurate handling of haitei, houtei, rinshan, and chankan yaku conditions in yaku.rs and state.rs.
  • Enhanced Kakan action processing to include furiten checks and correct MJAI logging in state.rs.
  • Updates to the Agari struct in types.rs and its usage in agari_calculator.rs to correctly reflect agari shape.
  • General cleanup of debug statements and minor logical corrections in agari.rs, agari_calculator.rs, env.rs, and state.rs.
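
To make the tsumo/ron distinction for the four situational yaku concrete, here is a small Python sketch. It does not mirror the Rust code in yaku.rs: `WinContext`, its flags, and `situational_yaku` are hypothetical names, and the mutual exclusions follow common riichi rule sets rather than anything stated in this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WinContext:
    # Hypothetical flags describing how the winning tile was obtained.
    is_tsumo: bool         # True for self-draw, False for ron
    is_last_tile: bool     # last wall draw, or the discard that follows it
    is_rinshan_draw: bool  # tile drawn as a replacement after a kan
    is_kakan_tile: bool    # tile another player just added to an open pon


def situational_yaku(ctx: WinContext) -> list[str]:
    """Return the situational yaku that apply to this win."""
    yaku = []
    if ctx.is_tsumo:
        # Self-draw wins: rinshan or haitei, never houtei or chankan.
        if ctx.is_rinshan_draw:
            yaku.append("rinshan")
        elif ctx.is_last_tile:
            yaku.append("haitei")
    else:
        # Ron wins: chankan or houtei, never haitei or rinshan.
        if ctx.is_kakan_tile:
            yaku.append("chankan")
        elif ctx.is_last_tile:
            yaku.append("houtei")
    return yaku
```

The key point is that each flag is only meaningful for one win type. A ron declared while a stale rinshan flag is still set, for example, should not score rinshan.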

NOTE: The current feature encoding does not adequately represent the game state, so an agent trained with this pipeline is not expected to become a strong AI.

@smly smly self-assigned this Jan 30, 2026
@smly smly added the enhancement New feature or request label Jan 30, 2026
@smly smly added this to the v0.3.0 milestone Jan 30, 2026

smly commented Jan 30, 2026

The recommended Python version for developing with RiichiEnv is 3.14, but Ray does not support 3.14 yet. How should I handle this? Let me think about it for a moment... 🤔

smly commented Jan 30, 2026

Ray's Python 3.14 support does not seem likely to be released anytime soon, so it may be better to consider an alternative approach. See ray-project/ray#56434.
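
One possible shape for such an alternative, sketched under assumptions: treat Ray as an optional dependency and fall back to in-process collection when it is not installed. The names `ray_available` and `collect_episodes` are hypothetical and not part of this PR. The Ray calls used are `ray.init`, `ray.remote`, and `ray.get`.

```python
import importlib.util


def ray_available() -> bool:
    # find_spec returns None when the package is not installed.
    return importlib.util.find_spec("ray") is not None


def collect_episodes(run_episode, seeds, use_ray=None):
    """Run run_episode(seed) for every seed and return results in order."""
    if use_ray is None:
        use_ray = ray_available()
    if not use_ray:
        # Sequential fallback for interpreters where Ray cannot be installed.
        return [run_episode(seed) for seed in seeds]

    import ray  # imported lazily so this module loads without Ray

    ray.init(ignore_reinit_error=True)
    remote_fn = ray.remote(run_episode)
    return ray.get([remote_fn.remote(seed) for seed in seeds])
```

The sequential path is slow, but it keeps the training script runnable on Python 3.14. Distributed collection stays available on interpreters that Ray supports.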

Copilot AI left a comment

Pull request overview

This PR introduces an online reinforcement learning training pipeline with PPO and CQL algorithms, alongside critical fixes to the native Rust mahjong game logic for more accurate yaku conditions and agari handling.
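
For readers unfamiliar with PPO, the clipped surrogate objective at its core can be sketched in plain Python as follows. This is the textbook formulation, not the MahjongLearner code. The function name and the default `clip_eps` are assumptions, and the CQL term is not shown.

```python
import math


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean negative clipped surrogate over a batch of samples."""
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the updated and the behavior policy.
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    # Negate so that minimizing the loss maximizes the surrogate.
    return -total / len(advantages)
```

Clipping removes the incentive to move the policy far from the one that generated the data. This matters when replayed self-play episodes are slightly stale.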

Changes:

  • Added UnifiedNetwork architecture, GlobalReplayBuffer, MahjongLearner, Ray-based MahjongWorker, and train_online.py for distributed online RL training
  • Fixed yaku conditions for haitei, houtei, rinshan, chankan to properly check tsumo/ron context
  • Enhanced Kakan action processing with furiten checks and MJAI logging
  • Added has_agari_shape field to Agari struct to distinguish complete hands without yaku from incomplete hands
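
The distinction drawn by has_agari_shape can be illustrated with a small Python sketch. Only the field name `has_agari_shape` comes from this PR. `AgariResult`, `yaku`, `is_win`, and `describe` are hypothetical stand-ins and do not mirror the Rust Agari struct.

```python
from dataclasses import dataclass, field


@dataclass
class AgariResult:
    # Only has_agari_shape is named in this PR; the other members are
    # hypothetical and exist for illustration.
    has_agari_shape: bool
    yaku: list = field(default_factory=list)

    @property
    def is_win(self) -> bool:
        # A win needs a complete shape and at least one yaku.
        return self.has_agari_shape and bool(self.yaku)


def describe(result: AgariResult) -> str:
    if result.is_win:
        return "win"
    if result.has_agari_shape:
        return "complete shape, no yaku"
    return "incomplete hand"
```

Without the extra flag, the last two cases both look like "not a win" and cannot be told apart by the caller.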

Reviewed changes

Copilot reviewed 19 out of 21 changed files in this pull request and generated 22 comments.

| File | Description |
| --- | --- |
| uv.lock | CRITICAL: Removed ALL ML dependencies (torch, numpy, wandb, ray, etc.) |
| pyproject.toml | CRITICAL: Duplicate member entry, removed ml_baseline from workspace |
| demos/ml_baseline/pyproject.toml | CRITICAL: Changed to exact Python version, broke workspace dependency refs |
| native/src/yaku.rs | Fixed yaku conditions to properly check tsumo/ron for haitei/houtei/rinshan/chankan |
| native/src/types.rs | Added has_agari_shape field to Agari, removed unused shuntsu_counts from Hand |
| native/src/state.rs | BUG: Kakan MJAI logging before chankan check; state mutation in query method |
| native/src/agari_calculator.rs | Updated to return has_agari_shape |
| demos/ml_baseline/*.py | New online RL training code, non-functional due to missing dependencies |
| tests/env/agari/test_chankan.py | Weakened assertion from exact to membership check |
| tools/mjsoul-scoring-validation/main.py | Changed data path (breaking change) |

Development

Successfully merging this pull request may close these issues.

Provide RL Training Script Sample
