v2.7.3

@Eigenwise Eigenwise released this 17 Mar 17:05
· 80 commits to main since this release

What's Changed

Bug Fix

  • fix: use max_input_tokens for context window size (#216)
    • TokenCounter.get_max_tokens() was calling litellm.get_max_tokens(), which returns the output token limit (max_tokens), not the input context window (max_input_tokens)
    • It now calls litellm.get_model_info() and reads max_input_tokens, falling back to max_tokens if that value is unavailable
    • The fallback uses an explicit None check instead of the or operator, so falsy values like 0 are handled correctly
    • This fixes utilization being overstated by up to 8x for models whose output limit is much smaller than their context window
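
The fallback logic described above can be sketched roughly as follows. This is an illustration only, not the actual TokenCounter code: the helper names (resolve_context_window, resolve_with_or, utilization) are hypothetical, and the function takes a plain dict shaped like the model info the notes describe (max_input_tokens and max_tokens keys) rather than calling litellm directly. The token figures are made-up values chosen to show an 8x gap.

```python
from typing import Mapping, Optional


def resolve_context_window(model_info: Mapping[str, Optional[int]]) -> Optional[int]:
    """Prefer max_input_tokens; fall back to max_tokens only if it is missing or None."""
    max_input = model_info.get("max_input_tokens")
    if max_input is not None:  # explicit None check, so a value of 0 is kept
        return max_input
    return model_info.get("max_tokens")


def resolve_with_or(model_info: Mapping[str, Optional[int]]) -> Optional[int]:
    """The `or`-based variant: treats 0 as missing and silently falls back."""
    return model_info.get("max_input_tokens") or model_info.get("max_tokens")


def utilization(used_tokens: int, limit: int) -> float:
    """Fraction of the limit consumed by used_tokens."""
    return used_tokens / limit


# Hypothetical model: 128,000-token context window, 16,000-token output limit.
info = {"max_input_tokens": 128_000, "max_tokens": 16_000}
used = 64_000

correct = utilization(used, resolve_context_window(info))  # measured against the context window
overstated = utilization(used, info["max_tokens"])         # measured against the output limit
```

With these assumed numbers, measuring against the output limit reports utilization eight times higher than measuring against the context window, which is the kind of overstatement the fix addresses.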

Closes #215

Full Changelog: v2.7.1...v2.7.3