v0.10.3a2
Pre-release
What's Changed
- other(test): improve worker test coverage by @rebel-jinhwan in #517
- other(tests): tests for bucketing manager by @huijjj in #521
- refactor(scheduler): delay caching instead of undoing by @rebel-jaehwang in #525
- fix(encoder): use RBLNClassifierPooler for classification models by @rebel-jonghewk in #526
- fix: remove unused util func by @rebel-kblee in #530
- fix(config): ensure max_num_batched_tokens >= max_source_positions for enc-dec by @rebel-jonghewk in #527
- fix(core): deduplicate KV cache inputs for torch.export compatibility by @rebel-chanheo in #524
- fix(core): handle meta tensors in KV cache storage key computation by @rebel-chanheo in #533
- fix: skip NUMA cpu affinity on bare metal to prevent 32x latency regression by @rebel-jonghewk in #529
- fix(gemma3): use non-negative sentinel for IMG_PAD_TOKEN_ID by @rebel-jonghewk in #528
- fix(model): fix moe model attribute by @rebel-kblee in #531
- fix: use num_tokens_no_spec in optimum model runner by @rebel-seinpark in #536
- fix(encoder): fix T5EncoderModel scoring mismatch after v0.18.1 bump by @rebel-jonghewk in #532
New Contributors
- @rebel-chanheo made their first contribution in #524
Full Changelog: v0.10.3a1...v0.10.3a2