v2.7.3
What's Changed
Bug Fix
- fix: use `max_input_tokens` for context window size (#216)
  - `TokenCounter.get_max_tokens()` was using `litellm.get_max_tokens()`, which returns the output token limit (`max_tokens`), not the input context window (`max_input_tokens`)
  - Switched to `litellm.get_model_info()` and reading `max_input_tokens`, with fallback to `max_tokens` if unavailable
  - Uses explicit `None` check instead of `or` operator to correctly handle falsy values like `0`
  - This fixes utilization being overstated by up to 8x for models where output limits are much smaller than context windows
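
A minimal sketch of the lookup described in this fix, not the project's actual code. It assumes the model info is a mapping with `max_input_tokens` and `max_tokens` keys, as `litellm.get_model_info()` returns. The helper names `resolve_context_window` and `utilization`, and the token figures, are hypothetical and chosen only for illustration.

```python
from typing import Mapping, Optional


def resolve_context_window(info: Mapping[str, Optional[int]]) -> Optional[int]:
    """Return the input context window, falling back to max_tokens (hypothetical helper)."""
    limit = info.get("max_input_tokens")
    # Explicit None check: `limit or info.get("max_tokens")` would also
    # discard a reported value of 0, because 0 is falsy.
    if limit is None:
        limit = info.get("max_tokens")
    return limit


def utilization(used_tokens: int, limit: int) -> float:
    """Fraction of the limit consumed by used_tokens (hypothetical helper)."""
    return used_tokens / limit


# Hypothetical figures: a 128,000-token context window with a 16,384-token output cap.
info = {"max_tokens": 16_384, "max_input_tokens": 128_000}

window = resolve_context_window(info)
print(utilization(8_000, window))              # 0.0625, measured against the context window
print(utilization(8_000, info["max_tokens"]))  # 0.48828125, measured against the output limit
```

With these assumed figures, measuring against the output limit overstates utilization by a factor of about 7.8, which is in line with the "up to 8x" figure above.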
Closes #215
Full Changelog: v2.7.1...v2.7.3