v0.3.4
What's Changed
- Support OAI-GPT-OSS by @foreverlms in #202
- feat: support multi-turn rl and tool call by @jingxu9x in #197
- fix: prompts payload from dataset for general cases. by @lfengad in #285
- refactor: use api_client to replace request url by @jingxu9x in #265
- Add cosine similarity check into context parallel test by @foreverlms in #289
- fix: Apply all pending weight sync cmds in rollout at once. by @lfengad in #291
- fix: make more cases compatible with the RLpayload format update by @lfengad in #293
- fix: Refine command fetch filter in validation case. by @lfengad in #295
- fix: Relax packing sequence test check by @lfengad in #299
- Support cp for VLMs by @kane-vln in #296
- Activation offload in policy by @foreverlms in #294
- Support SFT of gpt-oss based internvl by @kane-vln in #290
- Set p2r nccl group size default to 1. by @foreverlms in #308
- Move reward calculation to rollout worker by @lfengad in #292
- feat: custom sampler support by @lfengad in #309
- feat: custom batch sampler support and batch data loader in RL. by @lfengad in #314
- fix: add warning log for dataloader batch setting by @lfengad in #316
- [Feat] Implement GSPO by @Bin-NV in #306
- Optimize hfmodel loading by @kane-vln in #323
- Support qwen3_vl_moe sft by @kane-vln in #315
- Support pure text prompt for qwen2.5-vl by @zekunhao1995 in #326
- Fix HF_HOME model path by @Dinghow in #328
- Adjust samples generation for on-policy. by @foreverlms in #324
- Cache model downloads in CI by @bddppq in #332
- feat: update reference and optimizers periodically by @lfengad in #327
- feat: More metrics collection during RL by @lfengad in #330
- feat: lora alpha pattern by @xlu451 in #331
- Fix: qwen3_vl_moe data packer renaming by @kane-vln in #333
- HF way support of Qwen3-VL. by @foreverlms in #329
- fix weight version calc in on-policy by @foreverlms in #339
- feat: Refine step logic with optimization batch control by @lfengad in #338
- Disable dual streams in act-offloading by @foreverlms in #341
- feat: Reduce peak memory in P2R check process by @lfengad in #340
- Fix CI failure by @foreverlms in #344
- fix: fix unmatched token numbers in calculating token-mean loss by @lfengad in #342
New Contributors
- @Bin-NV made their first contribution in #306
- @zekunhao1995 made their first contribution in #326
Full Changelog: v0.3.3...v0.3.4