Release v0.10.4 · RBLN-SW/vllm-rbln

What's Changed

other: sync main with dev by @rebel-eunji in #580
feature(sub-block): emit sub-block-granular KV cache events by @rebel-jaehwang in #553
fix(core): account for draft model memory in speculative decoding by @junstar92 in #546
other(ci): dispatch Merge CI on push to dev by @rebel-jinhwan in #570
fix(lora): sync with vLLM 0.18.0 and update LoRA tests by @junstar92 in #504
fix(pooling): align with vLLM 0.18 PoolingMetadata + LLM API by @rebel-jinhwan in #582
other: trigger fsw-integration nightly e2e via repository_dispatch by @rebel-jinhwan in #545
fix(test): e2e pytest CI bugfix for LoRA OOM and platform path mismatch by @rebel-jinhwan in #586
fix: handle chunked prefill with speculative decoding by @junstar92 in #554
fix(kv_connector): prevent double finalizing kv connector when using spec decoding by @rebel-jinhwan in #575
other: log KV cache layout, warm-up phases, rbln backend invocations by @rebel-jaehwang in #589
refactor: compile optimum models internally by @rebel-eunji in #538
feature(core): "rbln" device tensor by @rebel-jonghewk in #548
fix: rbln_config name(device) in from_optimum by @rebel-seinpark in #598
fix(model): add moe custom op args - scoring_func by @rebel-thkim in #590
other: auto-update optimum-rbln to 0.10.4a0 by @rebel-develop in #600
feature: add no_export_fallback mode by @rebel-daeyang in #601
fix: ci log level by @rebel-seinpark in #607
fix(disagg_encoder): handle mixed input scenario and fix potential memory leak by @rebel-yskim in #608
fix: allow the block size to be omitted for enc & enc-dec models by @rebel-eunji in #602
fix(platform): set device attrs at class definition for spawn compatibility by @rebel-jaehwang in #609
fix: sub-block cache copy compat with device tensor by @rebel-jaehwang in #611
refactor: deduplicate _without_outlier with TypeVar by @rebel-eunji in #606
fix(whisper): remove workaround INVALID_TOKEN and add missing feature by @rebel-eunji in #597
fix(sampler): default temperature to 1.0 and torch.zeros to avoid NaN logits by @rebel-eunji in #615
other: auto-update optimum-rbln to 0.10.4a1 by @rebel-develop in #617
feature: add vmemory performance metrics by @rebel-daeyang in #587
fix(sampler): decouple greedy and topk-topp sampling by @rebel-eunji in #571
other: auto-update optimum-rbln to 0.10.4rc0 by @rebel-develop in #620
fix: add warmup config and fix logits dtype casting in apply_temperature by @rebel-eunji in #621
fix(sampler): guard padded-row sampling tensors against torch.empty garbage by @rebel-eunji in #623
other(platform): default max_num_seqs to 1 by @rebel-eunji in #625
other: auto-update optimum-rbln to 0.10.4 by @rebel-develop in #633
release: v0.10.4 by @rebel-seinpark in #634

New Contributors

@rebel-daeyang made their first contribution in #601

Full Changelog: v0.10.3...v0.10.4
