feat: add online RL with native game logic refinements #82
Conversation
The recommended version for developing with RiichiEnv is Python 3.14, but Ray does not support 3.14. How should I handle this? Let me think about it for a moment... 🤔
Ray's Python 3.14 support doesn't seem likely to be released anytime soon, so it might be better to consider an alternative approach (see ray-project/ray#56434); one possibility is sketched below.
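A minimal sketch of such a fallback, assuming episode collection can be expressed as a pure function; `collect_episode` is a hypothetical stand-in for one self-play rollout and is not part of this PR:

```python
# Sketch of a Ray-free fallback using only the standard library.
# `collect_episode` is a hypothetical stand-in for one self-play rollout.
import multiprocessing as mp

def collect_episode(seed: int) -> list[tuple[int, float]]:
    # Placeholder: run one self-play episode and return its transitions.
    return [(seed, 0.0)]

if __name__ == "__main__":
    with mp.Pool(processes=4) as pool:
        episodes = pool.map(collect_episode, range(16))
    print(f"collected {len(episodes)} episodes")
```

This gives up Ray's actor state, object store, and cluster scheduling, but keeps the training loop runnable on Python 3.14 until ray-project/ray#56434 is resolved.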
Pull request overview
This PR introduces an online reinforcement learning training pipeline with PPO and CQL algorithms, alongside critical fixes to the native Rust mahjong game logic for more accurate yaku conditions and agari handling.
Changes:
- Added UnifiedNetwork architecture, GlobalReplayBuffer, MahjongLearner, Ray-based MahjongWorker, and train_online.py for distributed online RL training (a network sketch follows this list)
- Fixed yaku conditions for haitei, houtei, rinshan, chankan to properly check tsumo/ron context
- Enhanced Kakan action processing with furiten checks and MJAI logging
- Added `has_agari_shape` field to the `Agari` struct to distinguish complete hands without yaku from incomplete hands
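For orientation, here is a generic PyTorch sketch of a combined policy/value network of the kind described; the class name, layer sizes, and action-masking detail are assumptions, not the PR's actual UnifiedNetwork:

```python
# A generic actor-critic sketch (hypothetical sizes and names),
# not the PR's actual UnifiedNetwork.
import torch
import torch.nn as nn

class UnifiedNetworkSketch(nn.Module):
    def __init__(self, obs_dim: int = 512, n_actions: int = 46):
        super().__init__()
        # Shared trunk feeding both heads.
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
        )
        self.policy_head = nn.Linear(256, n_actions)  # actor: action logits
        self.value_head = nn.Linear(256, 1)           # critic: state value

    def forward(self, obs: torch.Tensor, legal_mask: torch.Tensor):
        h = self.trunk(obs)
        logits = self.policy_head(h)
        # Mask illegal actions (a bool tensor) so the softmax never samples them.
        logits = logits.masked_fill(~legal_mask, torch.finfo(logits.dtype).min)
        return logits, self.value_head(h).squeeze(-1)
```

Sharing one trunk between actor and critic is the usual motivation for a unified network: a single forward pass serves both heads.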
Reviewed changes
Copilot reviewed 19 out of 21 changed files in this pull request and generated 22 comments.
| File | Description |
|---|---|
| uv.lock | CRITICAL: Removed ALL ML dependencies (torch, numpy, wandb, ray, etc.) |
| pyproject.toml | CRITICAL: Duplicate member entry, removed ml_baseline from workspace |
| demos/ml_baseline/pyproject.toml | CRITICAL: Changed to exact Python version, broke workspace dependency refs |
| native/src/yaku.rs | Fixed yaku conditions to properly check tsumo/ron for haitei/houtei/rinshan/chankan |
| native/src/types.rs | Added has_agari_shape field to Agari, removed unused shuntsu_counts from Hand |
| native/src/state.rs | BUG: Kakan MJAI logging before chankan check; state mutation in query method |
| native/src/agari_calculator.rs | Updated to return has_agari_shape |
| demos/ml_baseline/*.py | New online RL training code - non-functional due to missing dependencies |
| tests/env/agari/test_chankan.py | Weakened assertion from exact to membership check (illustrated below) |
| tools/mjsoul-scoring-validation/main.py | Changed data path (breaking change) |
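To illustrate the flagged test change: a membership check is strictly weaker than an exact match, because it also passes when unexpected extra yaku are awarded. The names below are hypothetical, not the actual test code:

```python
# Hypothetical before/after of the weakened assertion (illustrative only;
# `result` stands in for whatever the real test computes).
class Result:
    yaku = ["chankan", "rinshan"]  # suppose an unexpected extra yaku appears

result = Result()
assert "chankan" in result.yaku      # after: membership check still passes
# assert result.yaku == ["chankan"]  # before: exact check would catch this
```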
Close #33
This PR introduces new code implementing an online reinforcement learning (RL) training pipeline and includes several crucial fixes and improvements to the native Rust game logic.
The new online RL code consists of:
- `UnifiedNetwork` architecture combining policy (actor) and value (critic) networks.
- `GlobalReplayBuffer` for managing training data from self-play.
- `MahjongLearner` to perform online updates using PPO and CQL algorithms (loss sketch after this list).
- `MahjongWorker` for scalable, distributed episode collection (Ray actor pattern sketched at the end).
- `train_online.py` script to manage the online training loop and evaluations.
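As a point of reference, the following sketch shows what a combined PPO + CQL loss can look like for discrete actions. It assumes, without having verified against this PR, that the learner keeps a Q-head for the CQL term; all names and coefficients are illustrative:

```python
# Sketch of a PPO + CQL update for discrete actions; hypothetical names,
# not the PR's actual MahjongLearner.
import torch
import torch.nn.functional as F

def ppo_cql_loss(logits, old_log_probs, actions, advantages,
                 values, returns, q_values,
                 clip_eps: float = 0.2, cql_alpha: float = 1.0):
    # PPO clipped surrogate objective.
    log_probs = torch.distributions.Categorical(logits=logits).log_prob(actions)
    ratio = torch.exp(log_probs - old_log_probs)
    surrogate = torch.min(
        ratio * advantages,
        torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages,
    )
    policy_loss = -surrogate.mean()

    # Critic regression toward empirical returns.
    value_loss = F.mse_loss(values, returns)

    # CQL regularizer: push down Q on all actions, up on taken actions.
    cql_loss = (torch.logsumexp(q_values, dim=-1)
                - q_values.gather(-1, actions.unsqueeze(-1)).squeeze(-1)).mean()

    return policy_loss + 0.5 * value_loss + cql_alpha * cql_loss
```

The CQL term penalizes the soft maximum of Q over all actions relative to the Q of actions actually taken, keeping value estimates conservative on actions rarely seen in early self-play data.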
Native Rust game logic refinements address:

- `haitei`, `houtei`, `rinshan`, and `chankan` yaku conditions in `yaku.rs` and `state.rs`.
- `Kakan` action processing to include furiten checks and correct MJAI logging in `state.rs`.
- The `Agari` struct in `types.rs` and its usage in `agari_calculator.rs` to correctly reflect agari shape.
- Related changes across `agari.rs`, `agari_calculator.rs`, `env.rs`, and `state.rs`.

NOTE: The current feature encoding does not adequately represent the game state, so the resulting agent cannot become a strong AI.
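For completeness, this is the generic Ray actor pattern that a worker like the `MahjongWorker` mentioned above typically follows; class and method names are illustrative, not this PR's API:

```python
# Generic Ray actor pattern for distributed episode collection
# (illustrative names, not the PR's actual MahjongWorker).
import ray

@ray.remote
class EpisodeWorker:
    def __init__(self, seed: int):
        self.seed = seed

    def rollout(self) -> list[tuple[int, float]]:
        # Placeholder: run one self-play episode and return its transitions.
        return [(self.seed, 0.0)]

if __name__ == "__main__":
    ray.init()
    workers = [EpisodeWorker.remote(seed) for seed in range(4)]
    batches = ray.get([w.rollout.remote() for w in workers])
    print(f"collected {len(batches)} episode batches")
    ray.shutdown()
```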