Release v0.6.0

@jjk-g released this 25 Jun 02:13
e28d9a0

Release Summary: Version 0.6.0

This release adds multimodal benchmarking and OTel/Weka trace and session replay, and fixes several correctness and robustness problems in streaming measurement.


1. Major Features

Multimodal Benchmarking & Vision Datasets

  • Multimodal Payloads: Added native support for generating image, video, and audio inputs and benchmarking with them (#450, #477).
  • VisionArena Dataset: Added built-in support for the VisionArena dataset to exercise multimodal visual-reasoning workloads (#525).
  • Prefix Caching for Media: Fixed cache key hashing for media inputs in shared-prefix workloads (#485).
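As a rough illustration of why media inputs need their own cache key in shared-prefix workloads, the sketch below hashes the prefix group together with the raw media bytes. The function name `media_cache_key` and the key layout are assumptions made for this example; the actual fix in #485 may be structured differently.

```python
import hashlib


def media_cache_key(group_id: int, media_bytes: bytes) -> str:
    """Hypothetical cache key: one key per (prefix group, media payload) pair."""
    digest = hashlib.sha256()
    # Mix in the group id so identical media in different groups does not collide.
    digest.update(group_id.to_bytes(8, "big"))
    digest.update(media_bytes)
    return digest.hexdigest()


image = b"\x89PNG fake image bytes"
key_group_0 = media_cache_key(0, image)
key_group_1 = media_cache_key(1, image)
```

With a key like this, two prefix groups that happen to carry the same image still get distinct entries, while repeated requests within a group map to the same key.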

Advanced Session & Trace Replays (OTel & Weka)

  • Trace Replay Engines: Added support for replaying OpenTelemetry (OTel) and Weka JSON transaction traces (#468, #550).
  • Agentic Session Features: Added tool-call simulation, session-to-worker affinity, and session context recovery to reproduce real-world multi-turn conversational agents.
  • Filtering & Personalization: Added lambda-based query filters for trace records (#507) and custom header injection for multi-tenant and session routing (#504, #523).
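The filtering and header-injection ideas above can be sketched in a few lines. Everything here is hypothetical: the record fields, the `filter_records` and `build_headers` helpers, and the header names `x-session-id` and `x-tenant-id` are placeholders for illustration, not the project's configuration schema or API.

```python
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]
Predicate = Callable[[Record], bool]


def filter_records(records: Iterable[Record], predicates: List[Predicate]) -> List[Record]:
    """Keep only the trace records that satisfy every predicate."""
    return [r for r in records if all(p(r) for p in predicates)]


def build_headers(record: Record, tenant: str) -> Dict[str, str]:
    """Attach session and tenant identity to a replayed request (hypothetical header names)."""
    return {"x-session-id": str(record["session_id"]), "x-tenant-id": tenant}


# Made-up trace records, only to exercise the helpers.
records = [
    {"session_id": "s1", "model": "model-a", "input_tokens": 120},
    {"session_id": "s2", "model": "model-b", "input_tokens": 4000},
    {"session_id": "s3", "model": "model-a", "input_tokens": 9000},
]

selected = filter_records(
    records,
    [lambda r: r["model"] == "model-a", lambda r: r["input_tokens"] < 8000],
)
headers = build_headers(selected[0], tenant="team-blue")
```

Expressing filters as plain callables keeps them composable: each lambda checks one property, and a record is replayed only if all of them pass.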

2. Metrics, Correctness & Streaming Enhancements

  • Server-Sourced Token Counts: Added server-sourced usage statistics (e.g. cached prompt tokens, completion tokens) to eliminate discrepancies from client-side re-tokenization (#473, #565).
  • SSE Stream Correctness: Addressed inter-token latency (ITL) deflation caused by leading BOS tokens (#566), fixed SSE response parsing when recording OTel metrics (#495), and preserved the partial response body when a streaming request fails (#530).
  • CLI Report Enhancements: Added session-level stats and aggregate summaries to the standard CLI table layout (#493).
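To see how a leading BOS token can deflate ITL, consider a client that re-tokenizes every streamed chunk separately. The sketch below uses a toy whitespace tokenizer (`toy_tokenize`, invented for this example) that prepends a BOS marker the way many real tokenizers do by default; it is one plausible reading of the problem, not the project's measurement code.

```python
from typing import List

BOS = "<s>"


def toy_tokenize(text: str, add_bos: bool = True) -> List[str]:
    """Toy whitespace tokenizer that optionally prepends a BOS marker."""
    tokens = text.split()
    return [BOS] + tokens if add_bos else tokens


def mean_itl(chunks: List[str], decode_seconds: float, add_bos: bool) -> float:
    """Mean inter-token latency: decode time divided by the number of token gaps."""
    n_tokens = sum(len(toy_tokenize(c, add_bos=add_bos)) for c in chunks)
    return decode_seconds / (n_tokens - 1)


# Five streamed chunks, one real token each, spread over 4 seconds of decoding.
chunks = ["hello", "world", "again", "friend", "bye"]

deflated_itl = mean_itl(chunks, 4.0, add_bos=True)    # counts 10 tokens, so 4/9 s
corrected_itl = mean_itl(chunks, 4.0, add_bos=False)  # counts 5 tokens, so 1.0 s
```

Counting one spurious BOS per chunk doubles the apparent token count here, which is also why a server-reported completion token count is a more reliable denominator than client-side re-tokenization.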

3. Bug Fixes & Correctness

  • Distribution Types Fix: Fixed RandomDataGenerator and SyntheticDataGenerator to respect configured distribution types (e.g., fixed, uniform) instead of always defaulting to normal distribution (#572).
  • RNG Synchronization: Fixed RNG synchronization issues across multiple subprocess workers to guarantee workload determinism (#539).
  • Prefix Off-by-One: Composed shared prompt prefixes at the token level instead of by string slicing, avoiding off-by-one token-count mismatches across vocabularies (#491).
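The first two fixes in this list concern how lengths are sampled and how random state is shared across workers. A minimal sketch of both ideas follows, assuming a string-valued distribution type and a per-worker seed derived from a base seed; `worker_rng`, `sample_length`, and `sample_many` are hypothetical helpers, and the real RandomDataGenerator and SyntheticDataGenerator may work differently.

```python
import random
from typing import List


def worker_rng(base_seed: int, worker_id: int) -> random.Random:
    """Derive a reproducible, worker-specific RNG from a shared base seed."""
    return random.Random(f"{base_seed}:{worker_id}")


def sample_length(rng: random.Random, dist_type: str, mean: int,
                  std_dev: float = 0.0, low: int = 1, high: int = 1024) -> int:
    """Sample a token length, dispatching on the configured distribution type."""
    if dist_type == "fixed":
        return mean
    if dist_type == "uniform":
        return rng.randint(low, high)
    if dist_type == "normal":
        value = round(rng.gauss(mean, std_dev))
        return max(low, min(high, value))  # clamp to the allowed range
    raise ValueError(f"unsupported distribution type: {dist_type}")


def sample_many(base_seed: int, worker_id: int, dist_type: str, n: int) -> List[int]:
    """Draw n lengths from one worker's RNG."""
    rng = worker_rng(base_seed, worker_id)
    return [sample_length(rng, dist_type, mean=256, std_dev=32.0) for _ in range(n)]


fixed_lengths = sample_many(42, 0, "fixed", 5)
uniform_lengths = sample_many(42, 0, "uniform", 50)
```

Rerunning `sample_many` with the same base seed and worker id reproduces the same sequence, which is the property a deterministic workload depends on.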

4. Developer Experience & Cleanups

  • Config Restructuring: Segmented config files for cleaner imports and validation (#505).
  • Package Reorganizations: Relocated and consolidated logging and distribution utility components under structured namespaces (#541, #544).

Docker Image

quay.io/inference-perf/inference-perf:v0.6.0

Python Package

pip install inference-perf==v0.6.0

What's Changed

  • Re-prime conversation_replay sessions after clear_instances() by @kaushikmitr in #466
  • Add Multimodal Benchmarking by @Bslabe123 in #450
  • feat: tool-call replay, HuggingFace trace loading, and session replay hardening by @alonh in #468
  • Increase context length on code-generation use case by @achandrasekar in #470
  • fix: per-group cache_key for shared_prefix multimodal bytes by @Bslabe123 in #485
  • Make per-stage progress visible in non-TTY logs by @Bslabe123 in #487
  • Log synthetic datagen materialization progress by @Bslabe123 in #496
  • Surface prompt cache token metrics in lifecycle summary by @MikeTomlin19 in #473
  • Payload specs and measurement rules for each (media, provenance) pair by @Bslabe123 in #477
  • Add session-level metrics to CLI summary tables by @alonh in #493
  • fix: _record_otel_metrics() correctly parse SSE streaming responses by @alonh in #495
  • fix: shared_prefix off-by-one by composing at token level by @Bslabe123 in #491
  • Fixed failing e2e test for nix pdm sync error by @tico88612 in #521
  • Analyze charts font size increase and paper update by @SachinVarghese in #513
  • Add K8s Slack invitation link by @tico88612 in #510
  • Split config file by @Bslabe123 in #505
  • Feat: Add filtering support for OTel trace replay across all data sources by @lenadankin in #507
  • regenerate system prompts per stage in conversation replay by @zetxqx in #480
  • feat: add reasoning output support for OTEL trace replay by @oritht in #499
  • Copy-editing of article text by @logological in #532
  • Copy-editing of bibliography by @logological in #533
  • Inject session identity header for session replay requests by @pavanipenumalla in #504
  • Add Support for VisionArena Dataset by @Bslabe123 in #525
  • Support multi-tenant headers and OTel mapping by @LukeAVanDrie in #523
  • fix: RNG state synchronization across multiworkers by @changminbark in #539
  • cleanup: move logger.py to /observability/logging by @Bslabe123 in #541
  • Address JOSS paper reviewer feedback: clarify extensions sentence, co… by @jjk-g in #534
  • fix: preserve partial response body when a streaming request fails by @Bslabe123 in #530
  • docs(otel-replay): update for accuracy and feature coverage by @alonh in #536
  • Add weka trace replay support by @achandrasekar in #550
  • fix: catch TimeoutError in predecessor wait to prevent session hang by @pavanipenumalla in #555
  • Add SLOs to workload catalog by @namasl in #554
  • Test/optional live tier by @Bslabe123 in #529
  • cleanup: Move utils/distribution.py to utils/numeric/distribution by @Bslabe123 in #544
  • Skip code coverage check on markdown-only changes by @jjk-g in #557
  • Addressed inflated streamed output_len by adding server-sourced output_tokens metric by @Bslabe123 in #565
  • Address ITL deflation from per-chunk BOS in streamed timing by @Bslabe123 in #566
  • Fix: Randomize prompt prefixes in SyntheticDataGenerator by @jjk-g in #570
  • fix(datagen): respect configured distribution types in Random/Synthetic generators by @jjk-g in #572

Full Changelog: v0.5.0...v0.6.0