Skip to content

v1.3.0

Choose a tag to compare

@jztan jztan released this 19 Mar 13:31
· 296 commits to develop since this release

What's New in v1.3.0

Added

  • Smart context management: ObservationMaskingManager replaces SummarizingConversationManager as the default conversation manager. Based on Lindenbauer et al. 2025 "The Complexity Trap" — observation masking halves per-run costs with no quality loss
  • Three configurable strategies via blueclaw.yaml: mask (default, replaces old tool outputs with placeholders), summarize (legacy LLM summarization), hybrid (mask first, summarize only after N turns)
  • context section in blueclaw.yaml: strategy, mask_after (default 10), summarize_after (hybrid only, default 43)
  • Context metrics in traces: context_masked_chars and context_strategy fields on RunTrace
  • trace show displays context strategy and masked char count when present
  • trace stats shows Context Management section: runs with masking, avg/total chars masked, strategy breakdown
  • BeforeModelCallEvent hook for proactive masking within multi-tool invocations (same pattern as Strands' SlidingWindowConversationManager)
  • reduce_context fallback chain: aggressive mask (M=0) then delegate to SummarizingConversationManager
  • 35 new tests (400 total)
  • scripts/bench_context.py — multi-turn benchmark runner for comparing context strategies. Delta token tracking, isolated workspaces per strategy, error recovery, response capture. Supports --strategy, --model, --mask-after, --output flags
  • Benchmark prompt files for 3 workload categories: search-small (small outputs), retrieval-large (full page fetches), mixed-workflow (search + fetch)

Changed

  • Default conversation manager switched from SummarizingConversationManager to ObservationMaskingManager — existing behavior available via context.strategy: summarize

Installation

pip install blueclaw==1.3.0

Links