Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Track source for arrays of objects #108417

Merged
merged 30 commits into from
May 17, 2024

Conversation

kkrik-es
Copy link
Contributor

@kkrik-es kkrik-es commented May 8, 2024

This PR uses option store_array_source for objects to track the source of (sub)object arrays in synthetic mode. This allows preserving the original source, as synthetic mode merges, deduplicates and sorts array entries by default. The downside is that the whole object array (including children objects and fields) gets stored twice, while there's additional overhead while synthesizing source at query time.

This functionality will be experimental initially, to get some experience before documenting and opening up its use.

Fixes #90708

@elasticsearchmachine
Copy link
Collaborator

Hi @kkrik-es, I've created a changelog YAML for you.

XContentParserConfiguration configuration,
XContentBuilder builder
) throws IOException {
DocumentParserContext subcontext = context.switchParser(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a new usage of the switchParser() method. However it is currently deprecated. I think if we like to use then we should un-deprecate this method. I think this new usage is a valid use case. cc: @javanna

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the concerns we had with switchParser, i think your assessment is right. Also, its javadocs says

we are actively deprecating and removing the ability to pass complex objects to multifields, so try and avoid using this method

Given that this was committed 4 years ago, the "actively" part no longer holds true. I reviewed the current usages of switchParser and they seem ok to me. I have a slight concern that we are now using the method in a general utility method, that may spread its usage instead of limiting it to the only places where it's strictly needed. I am not sure what to do about that though, and I am ok with the current plan. Thanks for raising this.

@kkrik-es kkrik-es marked this pull request as ready for review May 14, 2024 12:36
@kkrik-es kkrik-es requested a review from lkts May 14, 2024 13:51
Copy link
Contributor

@lkts lkts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add documentation for new parameter?

}
}
hasValue = false;
if (ignoredValues != null) {
for (IgnoredSourceFieldMapper.NameValue ignored : ignoredValues) {
b.field(ignored.getFieldName());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't think about this before - doesn't this break the alphabetic sorting of fields in the source? We say in documentation that it is sorted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point. I'll work on a follow-up PR to restore ordering.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I gave it a try here. While it's easy to order fields at the object level, we don't do so for the contents of the object array that get stored and retrieved verbatim. I added tests for this, not sure if there's an easy way to fix this..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's exactly what we need, we only specify ordering of fields.

@kkrik-es
Copy link
Contributor Author

Should we add documentation for new parameter?

I was thinking of keeping it experimental initially, until Observability provides feedback on its use. @felixbarny fyi.

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few more questions and comments.

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@kkrik-es kkrik-es merged commit 99e8c27 into elastic:main May 17, 2024
14 checks passed
@kkrik-es kkrik-es deleted the fix/synthetic-source/array branch May 17, 2024 11:55
parkertimmins pushed a commit to parkertimmins/elasticsearch that referenced this pull request May 17, 2024
* Track source for arrays of objects

* Update docs/changelog/108417.yaml

* add missing field name

* add missing field name

* refactor and add test

* merge changes from main

* revert unintended

* add tests

* small refactor

* small refactor

* check for synthetic source

* test setting sourceStored

* order fields

* small refactor

* small refactoring and more tests

* small test fix
@felixbarny
Copy link
Member

The store_array_source flag looks good to me, I've pinged the team to test it and provide feedback.

elasticsearchmachine pushed a commit that referenced this pull request May 27, 2024
Covers setting `dynamic` to `false` or `runtime`.

Related to #106825,
#108417
craigtaverner pushed a commit to craigtaverner/elasticsearch that referenced this pull request Jun 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Synthetic _source: preserve source for specific fields (object arrays)
6 participants