Skip to content

Conversation

@Kubik42
Copy link
Contributor

@Kubik42 Kubik42 commented Oct 31, 2025

@Kubik42 Kubik42 force-pushed the store-keyword-fields-in-binary-doc-values-ignore-above branch 3 times, most recently from 2a64367 to 8bf42f4 Compare November 5, 2025 01:00
@Kubik42 Kubik42 force-pushed the store-keyword-fields-in-binary-doc-values-ignore-above branch 2 times, most recently from 980e6c9 to ffc2498 Compare November 5, 2025 05:01
@Kubik42 Kubik42 marked this pull request as ready for review November 5, 2025 06:40
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@Kubik42
Copy link
Contributor Author

Kubik42 commented Nov 5, 2025

The test is failing bc its expecting duplicate values, but with this change we're now deduplicating multi-valued fields in the same document. This is done for better storage efficiency.

We can either change the test or keep the duplicates.

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good left a few comments.

@Kubik42 Kubik42 requested review from a team as code owners November 10, 2025 17:00
@Kubik42 Kubik42 force-pushed the store-keyword-fields-in-binary-doc-values-ignore-above branch from b074b90 to 3d83988 Compare November 10, 2025 17:01
@Kubik42 Kubik42 removed request for a team November 10, 2025 17:01
@Kubik42 Kubik42 force-pushed the store-keyword-fields-in-binary-doc-values-ignore-above branch from 3d83988 to a1d1d2a Compare November 10, 2025 17:02
Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's wait for compression of binary doc values to have been merged and its feature flag removed, otherwise LGTM

@parkertimmins parkertimmins merged commit 6783135 into elastic:main Nov 25, 2025
34 checks passed
ncordon pushed a commit to ncordon/elasticsearch that referenced this pull request Nov 26, 2025
…stic#137483)

* Store keyword fields that trip ignore_above in binary doc values

* Addressed feedback

* Moved ignore values doc value field fetcher inside of existing fetcher function
b.utf8Value(ref.bytes, ref.offset, ref.length);
}
});
layers.add(new BinaryDocValuesSyntheticFieldLoaderLayer(fieldName));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this break BWC? Because old indices will have their ignored values stored in a stored field, and now the mapper can no longer load those values?

Comment on lines -1156 to +1166
context.doc().add(new StoredField(fieldName, bytesRef));

// store the value in a binary doc values field, create one if it doesn't exist
MultiValuedBinaryDocValuesField field = (MultiValuedBinaryDocValuesField) context.doc().getByKey(fieldName);
if (field == null) {
field = new MultiValuedBinaryDocValuesField(fieldName);
context.doc().addWithKey(fieldName, field);
}
field.add(bytesRef);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another BWC issue - old indices will have conflicting fieldinfos for <name>._original since we're switching from using a stored field to binary doc values.

martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Dec 2, 2025
martijnvg added a commit that referenced this pull request Dec 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants