v0.0.16
What's Changed
- chore: cp313 by @UranusSeven in #101
- fix: allow configurable gRPC message size by @Kevin-XiongC in #103
- refactor(sglang): remove pipeline parallelism support from radix cache by @jimmy-evo in #104
- perf(save): optimize save path with batched GPU copies and hash-first insert by @xiaguan in #106
- feat(connector): add MLA support for vLLM KV connector by @xiaguan in #107
- refactor(trace): replace cfg boilerplate with trace helper macros and add sampling by @xiaguan in #108
- Add MetaServer for multi-node block hash coordination by @jimmy-evo in #88
- fix(connector): use actual layer count for DSA models by @wz1qqx in #112
- test(core): add block_key_copy benchmark by @jimmy-evo in #114
- chore: extract python-binding and server-ops skills from CLAUDE.md by @xiaguan in #115
- fix(python): restore gRPC client message size limits by @wz1qqx in #116
- feat(core): support SSD alignment padding for unaligned block sizes by @xiaguan in #117
- fix(core): per-slot NUMA affinity for SSD prefetch allocation by @xiaguan in #119
- feat: pegaflow-transfer RDMA path with Mooncake-aligned perf by @xiaguan in #102
- fix(core): fix inflight bytes overcount and move insert worker to std::thread by @xiaguan in #122
- refactor(core): tighten visibility and remove dead code by @xiaguan in #125
- test(core): add GPU integration tests for PegaEngine by @xiaguan in #127
- feat(connector): add DCP/PCP context-parallel support for vLLM KV connector by @Kevin-XiongC in #126
- feat(connector): add cross-layer block support by @xiaguan in #128
- feat(connector): enable cross-layer blocks by default by @xiaguan in #129
- chore: bump version to 0.0.15 by @xiaguan in #133
- chore: bump version to 0.0.16 by @xiaguan in #134
New Contributors
- @UranusSeven made their first contribution in #101
- @Kevin-XiongC made their first contribution in #103
Full Changelog: v0.0.15...v0.0.16