Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix tsdb codec when doc-values spread in two blocks #108276

Merged
merged 5 commits into from
May 4, 2024

Conversation

dnhatn
Copy link
Member

@dnhatn dnhatn commented May 4, 2024

Currently, loading ordinals multiple times (after advanceExact) for documents with values spread across multiple blocks in the TSDB codec will fail due to the absence of re-seeking for the ordinals block. Doc-values of a document can spread across multiple blocks in two cases: when it has more than 128 values or when it exceeds the remaining space in the current block.

@elasticsearchmachine
Copy link
Collaborator

Hi @dnhatn, I've created a changelog YAML for you.

@dnhatn dnhatn added the v8.13.4 label May 4, 2024
@dnhatn dnhatn marked this pull request as ready for review May 4, 2024 05:27
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@dnhatn dnhatn added the auto-backport-and-merge Automatically create backport pull requests and merge when ready label May 4, 2024
Comment on lines -1109 to -1110
assert blockIndex > currentBlockIndex;
if (blockIndex - 1 > currentBlockIndex) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this exact pattern in NumericDocValues#longValue. Do we have off-by-one errors there as well?

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Bent. These are single value paths, so they shouldn't be affected by this bug. Anyway, I've extended the tests and made the seeking patterns more consistent in this class.

@dnhatn dnhatn requested a review from benwtrent May 4, 2024 15:43
@dnhatn
Copy link
Member Author

dnhatn commented May 4, 2024

@benwtrent Thanks for review.

@dnhatn dnhatn merged commit 16884a3 into elastic:main May 4, 2024
15 checks passed
@dnhatn dnhatn deleted the tsdb-codec branch May 4, 2024 20:54
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request May 4, 2024
Currently, loading ordinals multiple times (after advanceExact) for 
documents with values spread across multiple blocks in the TSDB codec
will fail due to the absence of re-seeking for the ordinals block.

Doc-values of a document can spread across multiple blocks in two cases:
when it has more than 128 values or when it exceeds the remaining space
in the current block.
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request May 4, 2024
Currently, loading ordinals multiple times (after advanceExact) for 
documents with values spread across multiple blocks in the TSDB codec
will fail due to the absence of re-seeking for the ordinals block.

Doc-values of a document can spread across multiple blocks in two cases:
when it has more than 128 values or when it exceeds the remaining space
in the current block.
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.14
8.13

elasticsearchmachine pushed a commit that referenced this pull request May 4, 2024
Currently, loading ordinals multiple times (after advanceExact) for 
documents with values spread across multiple blocks in the TSDB codec
will fail due to the absence of re-seeking for the ordinals block.

Doc-values of a document can spread across multiple blocks in two cases:
when it has more than 128 values or when it exceeds the remaining space
in the current block.
elasticsearchmachine pushed a commit that referenced this pull request May 4, 2024
Currently, loading ordinals multiple times (after advanceExact) for 
documents with values spread across multiple blocks in the TSDB codec
will fail due to the absence of re-seeking for the ordinals block.

Doc-values of a document can spread across multiple blocks in two cases:
when it has more than 128 values or when it exceeds the remaining space
in the current block.
Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM2

@jpountz
Copy link
Contributor

jpountz commented May 6, 2024

Thanks for fixing! I tried to understand why BaseDocValuesFormatTestCase#testSortedSetVariableLengthVsStoredFields did not catch this bug, and this seems to be because it only uses nextDoc(), not advance()/advanceExact()? Could you improve BaseDocValuesFormatTestCase#doTestSortedSetVsStoredFields in Lucene to add more coverage for this?

@dnhatn
Copy link
Member Author

dnhatn commented May 6, 2024

@jpountz +1, I will open a PR in the Lucene repository.

@dnhatn
Copy link
Member Author

dnhatn commented May 6, 2024

I've opened apache/lucene#13346

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport-and-merge Automatically create backport pull requests and merge when ready >bug :StorageEngine/Metrics You know, for Metrics Team:StorageEngine v8.13.4 v8.14.1 v8.15.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants