v0.10.2
What's Changed
- other: sync dev with main by @rebel-eunji in #421
- fix(kernel): added sinks as parameters for attention and causal attention by @rebel-jindol21 in #422
- release: v0.10.1post1 by @rebel-eunji in #424
- other: Merge pull request #424 from RBLN-SW/dev by @rebel-eunji in #425
- feature(spec-dec): initial work on ngram and suffix by @huijjj in #408
- fix: add sink argument for swa prefill by @rebel-jaehunryu in #427
- fix: add sink argument for swa prefill by @rebel-eunji in #428
- other: sync with dev by @rebel-eunji in #429
- other: sync main with dev by @rebel-eunji in #432
- other: sync dev with main by @rebel-eunji in #433
- fix: update format of auto-created PR title by @rebel-seinpark in #426
- model: support minimax by @rebel-kblee in #435
- fix: guide what to do on oom by @rebel-jaehwang in #437
- fix: update batch attention logic to handle padding and size conditions by @rebel-jaehunryu in #436
- fix(moe): order of token mask by @rebel-kblee in #441
- fix(context): Quickfix for global context by @rebel-yskim in #444
- fix: allow turning off RBLN_FORCE_CCL_ASYNC when debugging by @rebel-eunji in #446
- fix: input ref for pr ci by @rebel-seinpark in #447
- other: auto-update optimum-rbln to 0.10.2a0 by @rebel-develop in #452
- feature(context): always use global_ctx by @rebel-yskim in #413
- fix(specdec): remove block size limitation by @junstar92 in #440
- other: replace vLLM GPU with CPU dependency by @rebel-eunji in #439
- fix: set num_threads in TP=1 & DP=1, and change env variable by @rebel-yskim in #453
- fix: hotfix for dependencies by @rebel-seinpark in #455
- fix: use cpu tensor when not compiling by @rebel-jaehwang in #457
- other: example script to test prefix cache by @rebel-jaehwang in #423
- fix: lm_head initialization for Qwen2 dense model by @rebel-ykchoi in #415
- feature: add qwen3 series to PR CI by @rebel-seinpark in #458
- fix: pyproject.toml by @rebel-seinpark in #461
- fix: fix RSD context issue in torch-rbln environment by @rebel-ykchoi in #463
- feature(specdec): support Medusa on RBLN backend by @junstar92 in #454
- fix: fp8 available memory estimation by @rebel-mhkang in #465
- other: auto-update optimum-rbln to 0.10.2a2 by @rebel-develop in #464
- other: auto-update optimum-rbln to 0.10.2a3 by @rebel-develop in #467
- fix: free local block table id in _update_states function by @rebel-eunji in #460
- fix: param of cache_config to align with vllm by @rebel-eunji in #456
- feature: compile optimum model in vLLM if not present by @rebel-eunji in #384
- model: support for Qwen3-VL models by @rebel-eunji in #468
- feature: add E2E performance tracker by @rebel-eunji in #469
- feature: set rbln_config using additional_config by @rebel-eunji in #470
- other: auto-update optimum-rbln to 0.10.2a4 by @rebel-develop in #478
- other: auto-update optimum-rbln to 0.10.2 by @rebel-develop in #487
- release: v0.10.2 by @rebel-seinpark in #488
New Contributors
- @rebel-kblee made their first contribution in #435
- @rebel-mhkang made their first contribution in #465
Full Changelog: v0.10.1...v0.10.2