From 90ef20b65d5b31f39c892b203b1b398be29e34e4 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E8=B5=B5=E6=99=A8=E9=98=B3?=
Date: Tue, 18 Nov 2025 23:21:54 -0800
Subject: [PATCH] Update markdown formatting and links in blog post

---
 blog/2025-11-19-miles.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/blog/2025-11-19-miles.md b/blog/2025-11-19-miles.md
index 76f264f2..bc0bb7a7 100644
--- a/blog/2025-11-19-miles.md
+++ b/blog/2025-11-19-miles.md
@@ -5,13 +5,13 @@ date: "November 19, 2025"
 previewImg: /images/blog/miles/miles.jpg
 ---
 
-> A journey of a thousand miles begins with a single step.
+> *A journey of a thousand miles begins with a single step.*
 
 We're excited to introduce Miles, an enterprise-facing reinforcement learning framework designed for large-scale MoE training and production workloads. This introductory chapter will be the beginning of a series of tech blogs.
 
 Miles is forked from slime, the lightweight RL framework that has quietly powered many of today’s post-training pipelines and large MoE training runs. Building on slime’s foundation, Miles aims to deliver a smooth and controllable RL experience for teams that need reliability and scale in real-world deployments.
 
-The GitHub link for Miles can be found here: https://github.com/radixark/miles
+The GitHub link for Miles can be found [here](https://github.com/radixark/miles).
 
 ## 🧠 Starting Point: slime - A Lightweight and Customizable RL Framework
 
@@ -64,7 +64,7 @@ In RL, freezing the draft model prevents it from following the target model poli
 
 ### Miscellaneous Updates
 
-Enhance the FSDP training backend; allow deploying the rollout subsystem independently outside the framework; debug utilities such as more metrics, post-hoc analyzers, and enhancing profilers; gradually refactor the code to further enhance it; A formal mathematics (Lean) example is provided with SFT/RL scripts.
+Enhance the FSDP training backend; allow deploying the rollout subsystem independently outside the framework; debug utilities such as more metrics, post-hoc analyzers, and enhancing profilers; gradually refactor the code to further enhance it; A formal mathematics (Lean) example is provided with [SFT/RL scripts](https://github.com/radixark/miles/tree/main/examples/formal_math/single_round).
 
 ## 🚧 Towards the Future: Our Roadmap
 