diff --git a/blog/2025-11-19-miles.md b/blog/2025-11-19-miles.md
index 76f264f2..bc0bb7a7 100644
--- a/blog/2025-11-19-miles.md
+++ b/blog/2025-11-19-miles.md
@@ -5,13 +5,13 @@ date: "November 19, 2025"
 previewImg: /images/blog/miles/miles.jpg
 ---
 
-> A journey of a thousand miles begins with a single step.
+> *A journey of a thousand miles begins with a single step.*
 
 We're excited to introduce Miles, an enterprise-facing reinforcement learning framework designed for large-scale MoE training and production workloads. This introductory chapter will be the beginning of a series of tech blogs.
 
 Miles is forked from slime, the lightweight RL framework that has quietly powered many of today’s post-training pipelines and large MoE training runs. Building on slime’s foundation, Miles aims to deliver a smooth and controllable RL experience for teams that need reliability and scale in real-world deployments.
 
-The GitHub link for Miles can be found here: https://github.com/radixark/miles
+The GitHub link for Miles can be found [here](https://github.com/radixark/miles).
 
 ## 🧠 Starting Point: slime - A Lightweight and Customizable RL Framework
 
@@ -64,7 +64,7 @@ In RL, freezing the draft model prevents it from following the target model poli
 
 ### Miscellaneous Updates
 
-Enhance the FSDP training backend; allow deploying the rollout subsystem independently outside the framework; debug utilities such as more metrics, post-hoc analyzers, and enhancing profilers; gradually refactor the code to further enhance it; A formal mathematics (Lean) example is provided with SFT/RL scripts.
+Enhance the FSDP training backend; allow deploying the rollout subsystem independently outside the framework; debug utilities such as more metrics, post-hoc analyzers, and enhancing profilers; gradually refactor the code to further enhance it; A formal mathematics (Lean) example is provided with [SFT/RL scripts](https://github.com/radixark/miles/tree/main/examples/formal_math/single_round).
 
 ## 🚧 Towards the Future: Our Roadmap