Release v0.10.2 · RBLN-SW/vllm-rbln

What's Changed

other: sync dev with main by @rebel-eunji in #421
fix(kernel): added sinks as parameters for attention and causal attention by @rebel-jindol21 in #422
release: v0.10.1post1 by @rebel-eunji in #424
other: Merge pull request #424 from RBLN-SW/dev by @rebel-eunji in #425
feature(spec-dec): initial work on ngram and suffix by @huijjj in #408
fix: add sink argument for swa prefill by @rebel-jaehunryu in #427
fix: add sink argument for swa prefill by @rebel-eunji in #428
other: sync with dev by @rebel-eunji in #429
other: sync main with dev by @rebel-eunji in #432
other: sync dev with main by @rebel-eunji in #433
fix: update format of auto-created PR title by @rebel-seinpark in #426
model: support minimax by @rebel-kblee in #435
fix: guide what to do on OOM by @rebel-jaehwang in #437
fix: update batch attention logic to handle padding and size conditions by @rebel-jaehunryu in #436
fix(moe): order of token mask by @rebel-kblee in #441
fix(context): Quickfix for global context by @rebel-yskim in #444
fix: allow turning off RBLN_FORCE_CCL_ASYNC when debugging by @rebel-eunji in #446
fix: input ref for pr ci by @rebel-seinpark in #447
other: auto-update optimum-rbln to 0.10.2a0 by @rebel-develop in #452
feature(context): always use global_ctx by @rebel-yskim in #413
fix(specdec): remove block size limitation by @junstar92 in #440
other: replace vLLM GPU with CPU dependency by @rebel-eunji in #439
fix: set num_threads in TP=1 & DP=1, and change env variable by @rebel-yskim in #453
fix: hotfix for dependencies by @rebel-seinpark in #455
fix: use cpu tensor when not compiling by @rebel-jaehwang in #457
other: example script to test prefix cache by @rebel-jaehwang in #423
fix: lm_head initialization for Qwen2 dense model by @rebel-ykchoi in #415
feature: add qwen3 series to PR CI by @rebel-seinpark in #458
fix: pyproject.toml by @rebel-seinpark in #461
fix: fix RSD context issue in torch-rbln environment by @rebel-ykchoi in #463
feature(specdec): support Medusa on RBLN backend by @junstar92 in #454
fix: fp8 available memory estimation by @rebel-mhkang in #465
other: auto-update optimum-rbln to 0.10.2a2 by @rebel-develop in #464
other: auto-update optimum-rbln to 0.10.2a3 by @rebel-develop in #467
fix: free local block table id in _update_states function by @rebel-eunji in #460
fix: param of cache_config to align with vllm by @rebel-eunji in #456
feature: compile optimum model in vLLM if not present by @rebel-eunji in #384
model: support for Qwen3-VL models by @rebel-eunji in #468
feature: add E2E performance tracker by @rebel-eunji in #469
feature: set rbln_config using additional_config by @rebel-eunji in #470
other: auto-update optimum-rbln to 0.10.2a4 by @rebel-develop in #478
other: auto-update optimum-rbln to 0.10.2 by @rebel-develop in #487
release: v0.10.2 by @rebel-seinpark in #488

New Contributors

@rebel-kblee made their first contribution in #435
@rebel-mhkang made their first contribution in #465

Full Changelog: v0.10.1...v0.10.2
