fix: track token count of first document in new batch in TokenCountBatchingStrategy #5528
Merged
sobychacko merged 1 commit into spring-projects:main, Mar 6, 2026
Conversation
When a document causes currentSize to exceed maxInputTokenCount, the batch is split and currentSize is reset to 0. However, the triggering document is still added to the new batch, so its token count is silently dropped. This can cause subsequent batches to exceed the configured token limit. Reset currentSize to entry.getValue() instead of 0 so the first document in each new batch is properly accounted for.

Closes spring-projects#5525

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: liweiguang <codingpunk@gmail.com>
Force-pushed from 1d1dfc3 to 7604c9f
spring-builds pushed a commit that referenced this pull request on Mar 6, 2026:
Fixes: #5525

When a document causes currentSize to exceed maxInputTokenCount, the batch is split and currentSize is reset to 0. However, the triggering document is still added to the new batch, so its token count is silently dropped. This can cause subsequent batches to exceed the configured token limit. Reset currentSize to entry.getValue() instead of 0 so the first document in each new batch is properly accounted for.

Signed-off-by: liweiguang <codingpunk@gmail.com>
(cherry picked from commit 3007f57)
Contributor
PR merged upstream.
Closes #5525
Summary
When a document's token count causes `currentSize` to exceed `maxInputTokenCount` in `TokenCountBatchingStrategy.batch()`, the current batch is saved and a new one is created. However, `currentSize` is reset to `0` while the triggering document is still added to the new batch, so its token count is silently dropped from the running total. This means each new batch can accept one extra document's worth of tokens beyond the configured limit before the next split is triggered.
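Assuming the loop resembles the following simplified standalone sketch (hypothetical class and method names, not the actual Spring AI source), the accounting bug and its fix can be illustrated like this:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified sketch of the batching loop. Documents are represented by
// their precomputed token counts; `BatchSketch` is a hypothetical name.
public class BatchSketch {

    static List<List<Integer>> batch(Map<String, Integer> tokenCounts, int maxInputTokenCount) {
        List<List<Integer>> batches = new ArrayList<>();
        List<Integer> currentBatch = new ArrayList<>();
        int currentSize = 0;
        for (Map.Entry<String, Integer> entry : tokenCounts.entrySet()) {
            if (currentSize + entry.getValue() > maxInputTokenCount) {
                // Limit exceeded: close the current batch and start a new one.
                batches.add(currentBatch);
                currentBatch = new ArrayList<>();
                // Before the fix this line was `currentSize = 0;`, which silently
                // dropped the triggering document's tokens from the running total.
                currentSize = entry.getValue();
            } else {
                currentSize += entry.getValue();
            }
            // Either way, the triggering document joins the (new) batch.
            currentBatch.add(entry.getValue());
        }
        if (!currentBatch.isEmpty()) {
            batches.add(currentBatch);
        }
        return batches;
    }

    public static void main(String[] args) {
        Map<String, Integer> docs = new LinkedHashMap<>();
        docs.put("a", 6);
        docs.put("b", 6);
        docs.put("c", 6);
        docs.put("d", 6);
        // With a 10-token limit, every document after the first triggers a split,
        // and each batch stays within the limit because the first document of a
        // new batch is now counted.
        System.out.println(batch(docs, 10));
    }
}
```

With the pre-fix reset to `0`, the same input would yield a middle batch of two 6-token documents, i.e. 12 tokens against a 10-token limit.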
Fix
Change `currentSize = 0` to `currentSize = entry.getValue()` so the first document in each new batch is properly counted.
Test
Added a `batchShouldTrackTokenCountAcrossBatchBoundaries` test that creates multiple documents with a small token limit (10 tokens), verifying that batches are correctly split and no documents are lost.
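The property that test checks can be approximated outside the project as follows. This is a sketch under the assumption that plain token counts stand in for documents; `split` and `BatchBoundaryCheck` are hypothetical names, not Spring AI API:

```java
import java.util.ArrayList;
import java.util.List;

// Standalone approximation of the batch-boundary property: split a stream of
// token counts with a small limit, then verify that no document is lost and
// that every batch respects the limit.
public class BatchBoundaryCheck {

    static List<List<Integer>> split(List<Integer> tokenCounts, int maxTokens) {
        List<List<Integer>> batches = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        int size = 0;
        for (int tokens : tokenCounts) {
            if (size + tokens > maxTokens) {
                batches.add(current);
                current = new ArrayList<>();
                size = tokens; // the fix: count the first document of the new batch
            } else {
                size += tokens;
            }
            current.add(tokens);
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> docs = List.of(4, 4, 4, 4, 4, 4); // six documents, 10-token limit
        List<List<Integer>> batches = split(docs, 10);

        // No documents lost across batch boundaries.
        int total = batches.stream().mapToInt(List::size).sum();
        if (total != docs.size()) {
            throw new AssertionError("documents lost: " + total + " of " + docs.size());
        }
        // Every batch stays within the configured limit.
        for (List<Integer> b : batches) {
            int sum = b.stream().mapToInt(Integer::intValue).sum();
            if (sum > 10) {
                throw new AssertionError("batch exceeds limit: " + sum);
            }
        }
        System.out.println("batches=" + batches);
    }
}
```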