Release Summary: Version 0.6.0

This release introduces significant capabilities for multimodal benchmarking, OTel/Weka trace & session replays, and major improvements to streaming measurement correctness and robustness.

1. Major Features

Multimodal Benchmarking & Vision Datasets

Multimodal Payloads: Added native support for generating and benchmark-testing image, video, and audio inputs (#450, #477).
VisionArena Dataset: Integrated the VisionArena dataset/generator format out of the box to validate multimodal visual reasoning workflows (#525).
Prefix Caching for Media: Fixed cache key hashing for media inputs in shared-prefix workloads (#485).

Advanced Session & Trace Replays (OTel & Weka)

Trace Replay Engines: Added support for replaying OpenTelemetry (OTel) and Weka JSON transaction traces (#468, #550).
Agentic Session Features: Added tool-call simulation, session-to-worker affinity, and session context recovery protocols to replicate real-world multi-turn conversational agents.
Filtering & Personalization: Enabled lambda-based query filters for trace records (#507) and injected custom headers for multi-tenant and session routing (#504, #523).

2. Metrics, Correctness & Streaming Enhancements

Server-Sourced Token Counts: Added server-sourced usage statistics (e.g., cached prompt tokens, completion tokens) to eliminate discrepancies from client-side re-tokenization (#473, #565).
SSE Stream Correctness: Addressed inter-token latency (ITL) deflation caused by per-chunk BOS tokens in streamed timing (#566), fixed SSE response parsing when recording OTel metrics (#495), and preserved the partial response body when a streaming request fails (#530).
CLI Report Enhancements: Added session-level stats and aggregate summaries to the standard CLI table layout (#493).
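As an illustration of server-sourced token counts, the sketch below reads the usage object from an OpenAI-compatible SSE stream instead of re-tokenizing the streamed text on the client. It assumes the server emits a chunk carrying a `usage` field with `prompt_tokens`, `completion_tokens`, and `prompt_tokens_details.cached_tokens`, as the OpenAI streaming format does when usage reporting is requested. The helper names `extract_usage` and `token_counts` are hypothetical and are not taken from inference-perf.

```python
import json


def extract_usage(sse_lines):
    """Return the last `usage` object seen in an SSE stream, or None.

    Assumes OpenAI-style framing: each event is a line starting with
    "data:" followed by a JSON chunk, and the stream ends with "[DONE]".
    """
    usage = None
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        try:
            chunk = json.loads(payload)
        except json.JSONDecodeError:
            continue  # tolerate a truncated chunk from a failed request
        if isinstance(chunk, dict) and chunk.get("usage"):
            usage = chunk["usage"]
    return usage


def token_counts(usage):
    """Flatten the usage object into the counts a report might need."""
    details = usage.get("prompt_tokens_details") or {}
    return {
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "cached_tokens": details.get("cached_tokens"),
    }
```

Because the counts come from the server, they are unaffected by special tokens that a client-side tokenizer might add when each streamed chunk is tokenized separately.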

3. Bug Fixes & Correctness

Distribution Types Fix: Fixed RandomDataGenerator and SyntheticDataGenerator to respect the configured distribution type (e.g., fixed, uniform) instead of always defaulting to a normal distribution (#572).
RNG Synchronization: Fixed RNG synchronization issues across multiple subprocess workers to guarantee workload determinism (#539).
Prefix Off-by-One: Composed shared prompt prefixes at the token level rather than by string slicing, avoiding off-by-one token-count mismatches across vocabularies (#491).
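The distribution-type and RNG fixes listed above can be pictured with a minimal sketch. It is not the project's implementation: the function names, the parameter names (`mean`, `std_dev`, `min_len`, `max_len`), and the seed-derivation scheme are all assumptions made for illustration. The point is only that the sampler branches on the configured type, and that each worker derives its random stream from a shared base seed so repeated runs produce the same workload.

```python
import random


def worker_rng(base_seed, worker_id):
    """Hypothetical helper: a reproducible RNG stream per worker.

    String seeds are hashed deterministically by random.Random, so the
    same (base_seed, worker_id) pair yields the same stream in any process.
    """
    return random.Random(f"{base_seed}:{worker_id}")


def sample_length(rng, dist_type, mean, std_dev, min_len, max_len):
    """Draw one token length according to the configured distribution type."""
    if dist_type == "fixed":
        value = mean
    elif dist_type == "uniform":
        value = rng.uniform(min_len, max_len)
    elif dist_type == "normal":
        value = rng.gauss(mean, std_dev)
    else:
        raise ValueError(f"unsupported distribution type: {dist_type}")
    # Clamp to the configured bounds and round to a whole token count.
    return max(min_len, min(max_len, round(value)))
```

For example, `sample_length(worker_rng(42, 0), "fixed", 100, 10, 50, 150)` returns the configured mean of 100, while `"uniform"` and `"normal"` draw from the worker's seeded stream and stay within the 50 to 150 bounds.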

4. Developer Experience & Cleanups

Config Restructuring: Segmented config files for cleaner imports and validation (#505).
Package Reorganizations: Relocated and consolidated logging and distribution utility components under structured namespaces (#541, #544).

Docker Image

quay.io/inference-perf/inference-perf:v0.6.0

Python Package

pip install inference-perf==v0.6.0

What's Changed

Re-prime conversation_replay sessions after clear_instances() by @kaushikmitr in #466
Add Multimodal Benchmarking by @Bslabe123 in #450
feat: tool-call replay, HuggingFace trace loading, and session replay hardening by @alonh in #468
Increase context length on code-generation use case by @achandrasekar in #470
fix: per-group cache_key for shared_prefix multimodal bytes by @Bslabe123 in #485
Make per-stage progress visible in non-TTY logs by @Bslabe123 in #487
Log synthetic datagen materialization progress by @Bslabe123 in #496
Surface prompt cache token metrics in lifecycle summary by @MikeTomlin19 in #473
Payload specs and measurement rules for each (media, provenance) pair by @Bslabe123 in #477
Add session-level metrics to CLI summary tables by @alonh in #493
fix: _record_otel_metrics() correctly parse SSE streaming responses by @alonh in #495
fix: shared_prefix off-by-one by composing at token level by @Bslabe123 in #491
Fixed failing e2e test for nix pdm sync error by @tico88612 in #521
Analyze charts font size increase and paper update by @SachinVarghese in #513
Add K8s Slack invitation link by @tico88612 in #510
Split config file by @Bslabe123 in #505
Feat: Add filtering support for OTel trace replay across all data sources by @lenadankin in #507
regenerate system prompts per stage in conversation replay by @zetxqx in #480
feat: add reasoning output support for OTEL trace replay by @oritht in #499
Copy-editing of article text by @logological in #532
Copy-editing of bibliography by @logological in #533
Inject session identity header for session replay requests by @pavanipenumalla in #504
Add Support for VisionArena Dataset by @Bslabe123 in #525
Support multi-tenant headers and OTel mapping by @LukeAVanDrie in #523
fix: RNG state synchronization across multiworkers by @changminbark in #539
cleanup: move logger.py to /observability/logging by @Bslabe123 in #541
Address JOSS paper reviewer feedback: clarify extensions sentence, co… by @jjk-g in #534
fix: preserve partial response body when a streaming request fails by @Bslabe123 in #530
docs(otel-replay): update for accuracy and feature coverage by @alonh in #536
Add weka trace replay support by @achandrasekar in #550
fix: catch TimeoutError in predecessor wait to prevent session hang by @pavanipenumalla in #555
Add SLOs to workload catalog by @namasl in #554
Test/optional live tier by @Bslabe123 in #529
cleanup: Move utils/distribution.py to utils/numeric/distribution by @Bslabe123 in #544
Skip code coverage check on markdown-only changes by @jjk-g in #557
Addressed inflated streamed output_len by adding server-sourced output_tokens metric by @Bslabe123 in #565
Address ITL deflation from per-chunk BOS in streamed timing by @Bslabe123 in #566
Fix: Randomize prompt prefixes in SyntheticDataGenerator by @jjk-g in #570
fix(datagen): respect configured distribution types in Random/Synthetic generators by @jjk-g in #572

New Contributors

@MikeTomlin19 made their first contribution in #473
@tico88612 made their first contribution in #521
@oritht made their first contribution in #499
@logological made their first contribution in #532
@pavanipenumalla made their first contribution in #504

Full Changelog: v0.5.0...v0.6.0
