Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add generic fallback implementation for synthetic source #108222

Merged
merged 16 commits into from
May 21, 2024

Conversation

lkts
Copy link
Contributor

@lkts lkts commented May 2, 2024

This PR uses infrastructure from #107567 to implement a fallback implementation of synthetic source for field mappers that don't support it natively. In that case we will store source of such field as is in a separate stored field.

Pending work:

  • Tests for more fields that now support synthetic source
  • There are differences between native synthetic source that is usually generated using doc values and fallback. The fallback is "too precise", f.e. if field value is an empty array, there is no way to know that when using doc values. When source is stored as is, it will be precisely an empty array. That leads to assumptions done in synthetic source tests not matching reality. We may want to create a different set of tests or try to replicate the behavior.
  • Unification of copy_to handling. copy_to is currently not allowed on fields when synthetic source is used, should this be applied uniformly?
  • Documentation

Note that this PR does not cover nested as it is not a FieldMapper however it can be done using similar approach separately.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine
Copy link
Collaborator

Hi @lkts, I've created a changelog YAML for you.

*/
@Override
public SourceLoader.SyntheticFieldLoader syntheticFieldLoader() {
return SourceLoader.SyntheticFieldLoader.NOTHING;
Copy link
Contributor Author

@lkts lkts May 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nothing because ObjectMapper already nicely adds all ignored source fields to the source.

@@ -219,7 +229,138 @@ protected Object generateRandomInputValue(MappedFieldType ft) {

@Override
protected SyntheticSourceSupport syntheticSourceSupport(boolean ignoreMalformed) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started adding tests here because it looks like the most complex type. I am not sure how should we proceed - if this PR contains tests for all fields that now have support, it will be quite big.


@Override
protected BlockReaderSupport getSupportedReaders(MapperService mapper, String loaderFieldName) {
// TODO are we okay that we don't support synthetic source here?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding of block loader stuff is still quite weak, i am not sure what's expected here.

protected boolean supportsCopyTo() {
// TODO this is so that `testSyntheticSourceInvalid` does not complain
// in theory geo_shape could support copy_to
// should we fail to construct synthetic source in `FieldMapper` if there is copy_to ?
Copy link
Contributor Author

@lkts lkts May 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another question here.

@lkts
Copy link
Contributor Author

lkts commented May 3, 2024

Here is another interesting observation. Currently geo_shape mapper (all of them) defines a block loader that only works with _source. It does not look like that infrastructure supports synthetic source to me. So i think that means that while we can "seamlessly" support synthetic source for geo_shape we will end up breaking ESQL.

@lkts
Copy link
Contributor Author

lkts commented May 3, 2024

I am thinking we should opt-in into this one field at a time by setting supportsSyntheticSourceNatively (maybe should rename this too).

@lkts lkts requested a review from a team as a code owner May 3, 2024 22:00
@lkts lkts force-pushed the feature/synthetic_source_catch_all branch from cb1626e to d14b959 Compare May 3, 2024 22:02
@lkts lkts requested review from martijnvg and kkrik-es and removed request for a team May 3, 2024 22:02
* If a mapper opts in to use fallback synthetic source implementation.
* @return true or false
*/
protected boolean fallbackSyntheticSourceOptIn() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If implementations overwrite this method, why would these implementations implement synthetic source natively? It is the same as supportsSyntheticSourceNatively() returning false?

Maybe we can just have supportsSyntheticSourceNatively()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a temporary flag just to be able to introduce support on a per-field basis instead of one big PR. We will remove this when we enable synthetic source support for the last field.

@lkts
Copy link
Contributor Author

lkts commented May 13, 2024

@elasticmachine update branch

@lkts lkts requested review from martijnvg and kkrik-es May 17, 2024 20:17
index: test
id: "2"

- match: { _source.a_field: null }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add an array test too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch! I for some reason assumed that this field does not support arrays.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll actually do that in a small follow-up.

@@ -54,6 +54,11 @@ public class IgnoredSourceFieldMapper extends MetadataFieldMapper {
* - the value, encoded as a byte array
*/
public record NameValue(String name, int parentOffset, BytesRef value) {
public static NameValue fromContext(DocumentParserContext context, String name, BytesRef value) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only applies to fields, not objects.. Maybe rename to fromContextField or something like this, and add a short comment describing what it does.

Copy link
Contributor Author

@lkts lkts May 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure i understand why would it not work for objects. I'll rename in a follow-up.

@kkrik-es
Copy link
Contributor

Looks good. I'm not sure if array fields are covered too, maybe add a few tests for it. It's fine to deal with them in a follow-up if needed, though.

@lkts
Copy link
Contributor Author

lkts commented May 21, 2024

@elasticmachine update branch

@lkts lkts merged commit 91d502c into elastic:main May 21, 2024
15 checks passed
@lkts lkts deleted the feature/synthetic_source_catch_all branch May 21, 2024 18:30
lkts added a commit that referenced this pull request May 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants