Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create aliases for PassThroughObjectMapper subfields in FieldTypeLookup #106829

Merged
merged 23 commits into from May 17, 2024

Conversation

kkrik-es
Copy link
Contributor

@kkrik-es kkrik-es commented Mar 27, 2024

Move the logic for creating aliases from RootObjectMapper to FieldTypeLookup. We no longer create FieldAliasMapper fields but just the field names under the pass-through path to the same MappedFieldType - just as it happens for FieldAliasMapper objects. As a bonus, since aliases are created on-the-fly, it's now possible to handle conflicts gracefully using a superseded_by param in the PassThroughObjectMapper spec.

FieldTypeLookup is used in fieldcaps API so the aliases are expected to show up there, but not in the index mappings.

Related to #103648, #103567

@kkrik-es kkrik-es added >non-issue :Search/Mapping Index mappings, including merging and defining field types Team:Search Meta label for search team Team:StorageEngine :StorageEngine/Mapping The storage related side of mappings labels Mar 27, 2024
@kkrik-es kkrik-es self-assigned this Mar 27, 2024
@kkrik-es kkrik-es marked this pull request as ready for review April 2, 2024 10:55
@kkrik-es kkrik-es requested a review from martijnvg April 2, 2024 10:55
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did an initial review round. I like the fact that this avoids updating the mapping with alias field mappers as part of using passthrough field mapper..

// PassThroughObjectMapper name. This is achieved by adding a second reference to their
// MappedFieldType using the remaining suffix.
Map<String, PassThroughObjectMapper> passThroughFieldAliases = new HashMap<>();
for (FieldMapper fieldMapper : fieldMappers) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can potentially be very expensive loop (depending on the number of fields and passthrough fields). Is there a less expensive way to do this?

This is achieved by adding a second reference to their MappedFieldType using the remaining suffix

Can you share a mapping that would look like this? I'm trying to understand this better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, switched to just check the passthrough subfields.

This is achieved by adding a second reference to their MappedFieldType using the remaining suffix

This is the alias that's added to the lookup, just like through the FieldAliasMappers before.

PassThroughObjectMapper(
String name,
String fullPath,
Explicit<Boolean> enabled,
Dynamic dynamic,
Map<String, Mapper> mappers,
Explicit<Boolean> timeSeriesDimensionSubFields
Explicit<Boolean> timeSeriesDimensionSubFields,
Set<String> supersededBy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe an alternative way to deal with conflicts could be having a priority mapping attribute instead? (range from: 0 to max integer)

If there are multiple passthrough fields then a priority attribute is required on all passthrough fields. If there're passthough fields with the same priority we fail with a validation error. If multiple passthrough fields match we pick the passthough field with the highest priority.

I think this could simplify to validation logic / logic in FieldTypeLookup?
Note that index templates have a similar notion of priority.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with this is that a user may start with one passthrough field, then add another one, then notice that there are conflicts. I don't think we should require users to have a finalized list of pass-through objects from the get go, @felixbarny too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with this is that a user may start with one passthrough field, then add another one, then notice that there are conflicts.

If the second passthrough field is added, the new passthrough field will need a priority attribute and the existing passthrough field as well. I don't think that this is an issue? Passthrough fields are defined on templates, and template validation will fail when the priorities are missing or misaligned.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without having thought very deeply about this, I was also thinking of a numeric priority mapping parameter. But need to read more to get a better understanding of what supersededBy is doing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For OTel mappings, the precedence (from highest to lowest) would look something like this:

  • Top-level/non passthrough fields
  • resource.attributes.*
  • scope.attributes.*
  • attributes.*

To me, it seems that could be achieved with a simple numeric priority mapping parameter, or, at least with a single value for supersededBy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to using priorities. An issue that I see is what happens with existing pass-through objects missing priorities.. I can check for the existence of aliases and bypass the check for priorities, but that's not great either.. Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we haven't really used the passthrough field type outside of testing, so this breaking change is probably fine. Alternatively, we could not require passthrough fields to have unique priorities and fall back to lexicographical ordering if the priority is the same.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make priority required. This field type is undocumented and not ga-ed. This field type is only used for cross team testing at the moment.

if (subfield instanceof FieldMapper fieldMapper) {
String name = fieldMapper.simpleName();
// Check for conflict between PassThroughObjectMapper subfields.
PassThroughObjectMapper conflict = passThroughFieldAliases.put(name, passThroughMapper);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, you could also sort the passThroughMappers in ascending order and simply let mappers with a higher priority override i.e. put without checking for conflicts.

Your solution works just as well so feel free to ignore.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If priority is going to be required, this does feel simpler to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to check for conflicts, to avoid cases with inconsistent behavior, depending on the order they show up in the FieldTypeLookup.

@@ -172,13 +184,14 @@ private MappingLookup(
}
}

PassThroughObjectMapper.checkForInvalidPriorities(passThroughMappers);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is called when creating or updating an index template as it doesn't affect a concrete index, yet. In any case, I think it would be prudent to add a test case for an index template with two passthrough field types that don't specify a priority to see if this gets caught.

If this indeed turns out not to be the appropriate place for the validation, an alternative may be to validate this in the root object mapper. Yet another approach could be to allow duplicate priorities and then fall back to lexicographical ordering if the priority is the same.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So templates are validated in MetaIndexTemplateServervice#validateTemplate() line 1812.
The merge method at some point creates a new DocumentMapper instance and then the MappingLookup gets created.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks for double-checking!

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is getting close, let's make priority attribute required?


public static class Builder extends ObjectMapper.Builder {

// Controls whether subfields are configured as time-series dimensions.
protected Explicit<Boolean> timeSeriesDimensionSubFields = Explicit.IMPLICIT_FALSE;

// Controls which pass-through fields take precedence in case of conflicting aliases.
protected int priority = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this attribute required.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to put another idea out there. We could define that the default is 0, basically this is already what it does now. When adding another passthrough field, you don't need to update the existing one if the new field has a higher priority. Not defining the priority on the second field would fail because of the validation that the priority needs to be unique.

But I'd also be fine with making the priority required, even if there's only one passthrough field in the mappings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's cleaner to just make it required.

if (subfield instanceof FieldMapper fieldMapper) {
String name = fieldMapper.simpleName();
// Check for conflict between PassThroughObjectMapper subfields.
PassThroughObjectMapper conflict = passThroughFieldAliases.put(name, passThroughMapper);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If priority is going to be required, this does feel simpler to me.

*
* In case different pass-through objects contain subfields with the same name (excluding the pass-through prefix), their aliases conflict.
* To resolve this, the pass-through spec specifies which object takes precedence through required parameter "priority"; non-negative
* integer values are accepted, with the highest priority value winning in case of conflicting aliases.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Priority 1 is the highest priority, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No 0 is the lowest. I can change it if you feel strongly, do we have another case where we use priority in a field?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. No strong opinion here, mainly wanted to understand whether priority is ascending or descending. Maybe a slight preference for only allowing positive integers (excluding zero).

do we have another case where we use priority in a field?

I've found
https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/common/Priority.java. Interesting it does have a priority 0 (IMMEDIATE). But that seems to be a bit of a different use case.

I've also found we're using priority for index templates and the higher the number, the higher the priority, so priority 1 would be a very low priority.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I think Martijn suggested to use the same approach as the last example.

@kkrik-es kkrik-es requested a review from javanna April 8, 2024 07:49
Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left two comments. LGTM otherwise.

import java.util.Locale;
import java.util.Map;

import static org.elasticsearch.common.xcontent.support.XContentMapValues.nodeBooleanValue;
import static org.elasticsearch.common.xcontent.support.XContentMapValues.nodeIntegerValue;

/**
* Mapper for pass-through objects.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a disclaimer that this is a field type that is experimental, undocumented and shouldn't be used in production and that it exists today for prototyping purposes only.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

PassThroughObjectMapper conflict = passThroughFieldAliases.put(name, passThroughMapper);
if (conflict != null) {
// Check the priorities of the conflicting objects.
if (conflict.priority() > passThroughMapper.priority()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's reverse the priority? A field from a passthrough field with a higher priority wins.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the case? If the inserted object has higher priority we leave it there, otherwise the conflicting one is put back. In both cases, fullNameToFieldType will contain the alias for the object with the highest priority?

@kkrik-es
Copy link
Contributor Author

kkrik-es commented Apr 9, 2024

@elasticsearchmachine run elasticsearch-ci/packaging-tests-unix-sample

@kkrik-es
Copy link
Contributor Author

kkrik-es commented Apr 9, 2024

@elasticsearchmachine run elasticsearch-ci/bwc-snapshots

# Conflicts:
#	modules/data-streams/src/yamlRestTest/resources/rest-api-spec/test/data_stream/150_tsdb.yml
#	server/src/main/java/org/elasticsearch/index/mapper/MappingLookup.java
@kkrik-es
Copy link
Contributor Author

Gentle ping for @javanna, ptal.

# Conflicts:
#	server/src/main/java/org/elasticsearch/index/mapper/MapperFeatures.java
# Conflicts:
#	server/src/test/java/org/elasticsearch/index/mapper/RootObjectMapperTests.java
@kkrik-es
Copy link
Contributor Author

@javanna I'm submitting this to get some mileage with testing. Please let me know if you have comments, I'm happy to address them in a follow-up.

@kkrik-es kkrik-es merged commit 25e3551 into elastic:main May 17, 2024
15 checks passed
@kkrik-es kkrik-es deleted the fix/passthrough-no-alias branch May 17, 2024 10:23
@javanna
Copy link
Member

javanna commented May 17, 2024

I just reviewed this and I am happy with the changes made, I think it's a net improvement. Thanks for taking the time!

@kkrik-es
Copy link
Contributor Author

Great to hear Luca, thanks for the guidance here!

parkertimmins pushed a commit to parkertimmins/elasticsearch that referenced this pull request May 17, 2024
…up (elastic#106829)

* Alias PassThroughObjectMapper subfields in FieldTypeLookup

* small refactor

* conflict resolution

* add check for circular deps

* use lexicographical ordering for conflicting aliases

* use lexicographical ordering for conflicting aliases

* simplify loop producing aliases

* use priorities for passthrough deduplication

* make priority param required

* add passthrough disclaimer

* small refactor

* small refactor

* revert refactor

* node feature for pass-through priotity
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>non-issue :Search/Mapping Index mappings, including merging and defining field types :StorageEngine/Mapping The storage related side of mappings Team:Search Meta label for search team Team:StorageEngine v8.15.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants