Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug occuring when merging JSON object indexed with positions. #2253

Merged
merged 2 commits into from
Nov 14, 2023

Conversation

fulmicoton
Copy link
Collaborator

In JSON Object field the presence of term frequencies depend on the field.
Typically, a string with postiions indexed will have positions while numbers won't.

The presence or absence of term freqs for a given term is unfortunately encoded in a very passive way.

It is given by the presence of extra information in the skip info, or the lack of term freqs after decoding vint blocks.

Before, after writing a segment, we would encode the segment correctly (without any term freq for number in json object field). However during merge, we would get the default term freq=1 value. (this is default in the absence of encoded term freqs)

The merger would then proceed and attempt to decode 1 position when there are in fact none.

This PR requires to explictly tell the posting serialize whether term frequencies should be serialized for each new term.

Closes #2251

In JSON Object field the presence of term frequencies depend on the
field.
Typically, a string with postiions indexed will have positions
while numbers won't.

The presence or absence of term freqs for a given term is unfortunately
encoded in a very passive way.

It is given by the presence of extra information in the skip info, or
the lack of term freqs after decoding vint blocks.

Before, after writing a segment, we would encode the segment correctly
(without any term freq for number in json object field).
However during merge, we would get the default term freq=1 value.
(this is default in the absence of encoded term freqs)

The merger would then proceed and attempt to decode 1 position when
there are in fact none.

This PR requires to explictly tell the posting serialize whether
term frequencies should be serialized for each new term.

Closes #2251
@fulmicoton fulmicoton force-pushed the issue/2251-bug-merge-json-object-with-number branch from af8be21 to 6b59ec6 Compare November 14, 2023 13:46
Copy link
Contributor

@trinity-1686a trinity-1686a left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@fmassot fmassot merged commit 5319977 into main Nov 14, 2023
4 checks passed
@fmassot fmassot deleted the issue/2251-bug-merge-json-object-with-number branch November 14, 2023 16:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Merge bug for objects with position enabled and including at least one number
3 participants