v2.7.3

Eigenwise released this 17 Mar 17:05


268bca0

What's Changed

Bug Fix

fix: use max_input_tokens for context window size (#216)
- TokenCounter.get_max_tokens() was calling litellm.get_max_tokens(), which returns the output token limit (max_tokens), not the input context window (max_input_tokens)
- It now calls litellm.get_model_info() and reads max_input_tokens, falling back to max_tokens if that field is unavailable
- The fallback uses an explicit None check instead of the or operator, so falsy values such as 0 are handled correctly
- This fixes context utilization being overstated by up to 8x for models whose output limit is much smaller than their context window
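
The fallback logic described above can be sketched roughly as follows. This is an illustration, not the project's actual code: resolve_context_window and utilization are hypothetical helper names, the dictionary stands in for the kind of mapping litellm.get_model_info() returns, and the token limits are made-up numbers chosen to show an 8x gap.

```python
from typing import Mapping, Optional


def resolve_context_window(model_info: Mapping[str, Optional[int]]) -> Optional[int]:
    """Pick the input context window from a model-info mapping.

    Hypothetical helper: prefers max_input_tokens and falls back to
    max_tokens only when max_input_tokens is missing or None.
    """
    max_input = model_info.get("max_input_tokens")
    # Explicit None check: `max_input or fallback` would also discard 0.
    if max_input is not None:
        return max_input
    return model_info.get("max_tokens")


def utilization(used_tokens: int, window: int) -> float:
    """Fraction of the window consumed by used_tokens."""
    return used_tokens / window


# Made-up limits for illustration: output limit is 8x smaller than the window.
info = {"max_tokens": 16000, "max_input_tokens": 128000}

window = resolve_context_window(info)
print(utilization(32000, window))              # 0.25 against the input window
print(utilization(32000, info["max_tokens"]))  # 2.0 against the output limit
```

With these numbers, dividing by the output limit reports 200% utilization where the real figure is 25%, which is the kind of overstatement the fix addresses.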

Closes #215

Full Changelog: v2.7.1...v2.7.3
