Introduce experimental pass-through field type #103648

Merged: 30 commits merged on Feb 1, 2024

Contributor

@kkrik-es kkrik-es commented Dec 21, 2023

PassThroughObjectMapper extends ObjectMapper to create a container for fields that also need to be referenced as if they were at the root level. This is done by creating aliases for all its subfields.

It also supports an option to annotate all its subfields as dimensions. This will be leveraged in TSDB, where dimension fields can be dynamically defined as nested under a pass-through object and still be referenced directly (i.e. without prefixes) in aggregation queries. Moreover, the pass-through object is added as a path match to the routing path, so that dynamically defined subfields get included in routing and _tsid calculations.
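
To illustrate the routing-path point, here is a minimal sketch, assuming the routing path holds simple `*` wildcard patterns such as `attributes.*`. The class and method names (`RoutingPathSketch`, `matches`, `inRoutingPath`) are hypothetical and not taken from this PR; the actual matching code in Elasticsearch may differ.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch: checks whether a field path is covered by a routing path pattern. */
final class RoutingPathSketch {

    /** Matches a field path against a pattern where '*' stands for any run of characters. */
    static boolean matches(String pattern, String fieldPath) {
        StringBuilder regex = new StringBuilder();
        boolean first = true;
        // Quote every literal segment so that '.' is not treated as a regex wildcard.
        for (String literal : pattern.split("\\*", -1)) {
            if (!first) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(literal));
            first = false;
        }
        return Pattern.matches(regex.toString(), fieldPath);
    }

    /** A field takes part in routing if any routing path pattern matches it. */
    static boolean inRoutingPath(List<String> routingPath, String fieldPath) {
        return routingPath.stream().anyMatch(p -> matches(p, fieldPath));
    }

    public static void main(String[] args) {
        // Assumption: the pass-through object "attributes" was added as "attributes.*".
        List<String> routingPath = List.of("attributes.*");
        System.out.println(inRoutingPath(routingPath, "attributes.host"));
        System.out.println(inRoutingPath(routingPath, "resource.region"));
    }
}
```

With a pattern like this in place, a subfield that only appears at index time is still covered, because the match is on the prefix rather than on a fixed list of field names.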

The initial implementation is experimental, missing the following:

  • Subfield dimensions still need to be keyword fields, so as to be used in the routing path.
  • Pass-through objects can only be defined at the root level. They can't be nested in other objects.
  • Pass-through objects don't support subobjects, only subfields.
  • When there are conflicting aliases at the root level (same name, different paths), the first one wins. Subsequent conflicting fields will lack an alias, but can still be referenced using their full path.
  • Field caps include both pass-through subfields and their aliases.
  • Pass-through subfields are double-counted against the field count limit, as their aliases are also included.
  • There's no documentation for the new type. We'll defer promoting its use until we get some mileage with using it internally.
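
The aliasing and first-one-wins behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `PassThroughAliasSketch`, `AliasResult`, and `buildAliases` are hypothetical names, and the mapping is modelled as a plain map from pass-through object name to subfield names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of root-level alias generation for pass-through subfields. */
final class PassThroughAliasSketch {

    /** Root-level alias name -> full path, plus one warning per skipped alias. */
    record AliasResult(Map<String, String> aliases, List<String> warnings) {}

    /**
     * @param passThroughObjects pass-through object name -> its subfield names, in mapping order
     */
    static AliasResult buildAliases(Map<String, List<String>> passThroughObjects) {
        Map<String, String> aliases = new LinkedHashMap<>();
        List<String> warnings = new ArrayList<>();
        for (Map.Entry<String, List<String>> object : passThroughObjects.entrySet()) {
            for (String subfield : object.getValue()) {
                String fullPath = object.getKey() + "." + subfield;
                // First alias wins: a later conflicting subfield keeps only its full path.
                String existing = aliases.putIfAbsent(subfield, fullPath);
                if (existing != null) {
                    warnings.add("alias [" + subfield + "] already points to [" + existing
                        + "], skipping [" + fullPath + "]");
                }
            }
        }
        return new AliasResult(aliases, warnings);
    }

    public static void main(String[] args) {
        Map<String, List<String>> objects = new LinkedHashMap<>();
        objects.put("attributes", List.of("host", "pod"));
        objects.put("resource", List.of("host", "region"));
        AliasResult result = buildAliases(objects);
        System.out.println(result.aliases());
        System.out.println(result.warnings());
    }
}
```

In this model `host` resolves to `attributes.host`, while `resource.host` stays reachable only through its full path, matching the conflict rule in the list above.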

Related to #103567

@kkrik-es added the >enhancement, Team:Analytics, and :StorageEngine/TSDB labels on Dec 21, 2023
@kkrik-es kkrik-es self-assigned this Dec 21, 2023
@elasticsearchmachine
Collaborator

Hi @kkrik-es, I've created a changelog YAML for you.

@felixbarny
Member

felixbarny commented Jan 4, 2024

This looks great already 👏

Some ideas for potential improvements:

  • Add ability to specify precedence if there are multiple passthrough field types with the same attribute name, instead of failing because an alias has already been specified.
  • As an extension of the above, allow a field in a passthrough object to shadow a top-level field and a field from another passthrough object. Edit: I don't think this is a good idea anymore.
  • Think about whether the passthrough field type should allow nested object mappers or if it should implicitly set subobjects: false.
  • Allow non-keyword fields in a passthrough object to be a dimension.
  • In field caps, only return the top-level field names to avoid having two field suggestions for each attribute within a passthrough object.
  • Avoid attributes in passthrough field types to count as two fields towards the field limit.

# Conflicts:
#	server/src/main/java/org/elasticsearch/index/mapper/DynamicFieldsBuilder.java
@kkrik-es
Contributor Author

@elasticsearchmachine run elasticsearch-ci/docs

@kkrik-es kkrik-es changed the title Introduce passthrough field type Introduce pass-through field type Jan 11, 2024
@kkrik-es added the Team:StorageEngine label and removed the Team:Analytics label on Jan 11, 2024
@kkrik-es kkrik-es marked this pull request as ready for review January 11, 2024 14:42
@kkrik-es kkrik-es changed the title Introduce pass-through field type Introduce experimental pass-through field type Jan 17, 2024
Member

@martijnvg martijnvg left a comment

Left a few more comments.

/**
* Are these field mappings being built dimensions?
*/
public boolean containsDimensions() {
Member

Maybe rename to isParentFieldDimension()?

Contributor Author

Hm, this won't be accurate. The parent field would be a pass-through object that is not technically a dimension but can be marked to contain dimension fields.

Contributor Author

Changed to parentFieldContainsDimensions, wdyt?

if (internalMapper instanceof FieldMapper fieldMapper) {
// If there's a conflicting alias with the same name at the root level, we don't want to throw an error
// to avoid indexing disruption.
// TODO: record an error without affecting document indexing, so that it can be investigated later.
Member

Let's log a warning for now?

Contributor Author

@kkrik-es kkrik-es Jan 30, 2024

Added a HeaderWarning, is this what you had in mind?

Member

I like the warning header as well, but I was actually thinking of logging a warning, since the warning header isn't around when we debug. For example, when we need to investigate why an alias points to the wrong field, a log warning could help us. Maybe also log a warning here?

Contributor Author

Makes sense, ptal.

# Conflicts:
#	server/src/test/java/org/elasticsearch/index/mapper/RootObjectMapperTests.java
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Member

@martijnvg martijnvg left a comment

This looks good. I think we're almost there.

* Are these field mappings being built dimensions?
*/
public boolean parentFieldContainsDimensions() {
return parentFieldContainsDimensions;
Member

Sorry for another naming comment :)
What about parentFieldEnablesDimension? If time_series_dimension is configured on a passthrough object, then this automatically makes any number, keyword, or ip field a dimension too. So a parent passthrough field essentially determines whether a field is a time series dimension.

Contributor Author

Changed to parentObjectContainsDimension, matching the containsDimension param of PassThroughObject. parentFieldEnablesDimension is not accurate imho: this doesn't just enable but enforces defining subfields as dimensions.
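
The enable-versus-enforce distinction discussed here can be sketched as below. This is a simplified model, not the mapper code from this PR: `DimensionInheritanceSketch`, `BuilderContext`, and `isDimension` are hypothetical names, and the set of dimension-capable types is an illustrative subset chosen as an assumption.

```java
import java.util.Set;

/** Hypothetical sketch of how a parent pass-through object could force subfields to be dimensions. */
final class DimensionInheritanceSketch {

    // Assumption: an illustrative subset of field types that can act as dimensions.
    private static final Set<String> DIMENSION_CAPABLE_TYPES = Set.of("keyword", "ip", "long");

    /** Stand-in for the builder context flag named in this thread. */
    record BuilderContext(boolean parentObjectContainsDimensions) {}

    /**
     * A capable field is a dimension if it is marked explicitly or if its parent
     * pass-through object enforces dimensions. Other types never become dimensions
     * in this simplified model (a real mapper may reject them instead).
     */
    static boolean isDimension(String fieldType, boolean explicitlyMarked, BuilderContext context) {
        if (!DIMENSION_CAPABLE_TYPES.contains(fieldType)) {
            return false;
        }
        return explicitlyMarked || context.parentObjectContainsDimensions();
    }

    public static void main(String[] args) {
        BuilderContext underPassThrough = new BuilderContext(true);
        BuilderContext regularObject = new BuilderContext(false);
        System.out.println(isDimension("keyword", false, underPassThrough));
        System.out.println(isDimension("keyword", false, regularObject));
        System.out.println(isDimension("text", false, underPassThrough));
    }
}
```

Note that the subfield has no way to opt out once the parent flag is set, which is the sense in which the flag enforces rather than merely enables.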

Member

@martijnvg martijnvg left a comment

lgtm

@felixbarny
Member

I think the only improvement that we need before we can use this would be support at arbitrary levels, not only the root.

@kkrik-es kkrik-es merged commit 149ec37 into elastic:main Feb 1, 2024
14 checks passed
@kkrik-es kkrik-es deleted the fix/103567 branch February 1, 2024 13:33
jedrazb pushed a commit to jedrazb/elasticsearch that referenced this pull request Feb 2, 2024
* Introduce passthrough field type

* Update docs/changelog/103648.yaml

* no subobjects

* create dimensions dynamically

* remove unused method

* restore ignoreAbove incompatibility with dimension

* fix test

* refactor, skip aliases on conflict

* fix branch

* fix branch

* add tests

* update test

* remove unused variable

* add yaml test for subobject

* minor refactoring

* add unittest for PassThroughObjectMapper

* suggested fixes

* suggested fixes

* update yaml with warning for duplicate alias

* updates from review

* add withoutMappers()
Member

@javanna javanna left a comment

I reviewed this post-merge and left some comments.

new TextFieldMapper.Builder(name, context.indexAnalyzers()).addMultiField(
new KeywordFieldMapper.Builder("keyword", context.indexSettings().getIndexVersionCreated()).ignoreAbove(256)
),
context
Member

how relevant is this change to the passthrough object mapper? This affects how fields are dynamically mapped and looks to me like a separate change. Also, I am not sure what the impact of it is for existing users. Namely, should the new behaviour only be activated for indices created from the version that the change was introduced in? Otherwise fields are mapped differently for the same index before and after the upgrade, which seems unexpected.

Contributor Author

This PR indeed both introduces the passthrough mapper and includes changes to support its functionality for defining dimension fields underneath. It could definitely be split, but it did help with testing - and ensuring that the implementation addressed the requirements around dynamic dimension support.

Since parentObjectContainsDimensions() is introduced in this PR and the else branch is the old logic, I'd think this is bwc?

+ objBuilder.name()
+ "], passthrough is not supported as a subobject"
);
}
Member

is this because the passthrough type can only be used under the root object? Does it mean it should rather become part of the RootObjectMapper explicitly then? This check within ObjectMapper is not great, because the base class needs to know of its own subclass impl? I think we should find a different way to perform this check. I also read that we want to support it in an arbitrary level in the future?

Contributor Author

Indeed! I think these comments were mostly addressed in #105062. Ptal, I'm happy to iterate on feedback there too.

Mapper.TypeParser typeParser = parserContext.typeParser(type);
if (typeParser == null) {
throw new MapperParsingException("No handler for type [" + type + "] declared on field [" + fieldName + "]");
}
Mapper.Builder fieldBuilder;
- if (objBuilder.subobjects.value() == false) {
+ if (objBuilder.subobjects.value() == false || type.equals(FieldAliasMapper.CONTENT_TYPE)) {
Member

why was this change required? Is this about "ObjectMapper was modified to use the full name for FieldAliasMappers"? Is it a bug around field aliases, or an enhancement around them? How does it affect existing users? I think that it's a change that should probably have been reviewed separately.

Contributor Author

This was just my bad, I misunderstood how aliases are supposed to work with dotted names (they use subobjects). I iterated on this in #105298 and #105062, ptal.

@@ -106,19 +110,54 @@ public RootObjectMapper.Builder addRuntimeFields(Map<String, RuntimeField> runti

@Override
public RootObjectMapper build(MapperBuilderContext context) {
Map<String, Mapper> mappers = buildMappers(context);
mappers.putAll(getAliasMappers(mappers, context));
Member

this way won't the field aliases become visible in the resulting mappings? Was that the intention?

Contributor Author

I think we want the aliases to be visible, but maybe I misunderstood the question. What would be the alternative here?

Member

I thought that the intention was to have the passthrough mapper, and field aliases hidden somehow in the resulting mappings. Maybe I misunderstood.

Member

I don't believe that the alias field mappers would be visible in the mappings. That's because the get mapping request doesn't return a serialized version of the RootObjectMapper. Instead, we're returning the source as a blob, as it has been provided in the put mapping request.

}
}
}
}
Member

I think that this functionality is misplaced. It should not be RootObjectMapper becoming aware of a specific object mapper subclass and special case on it. Ideally, this could be all embedded into the custom field mapper we are adding. I can see some challenges that don't make this directly possible with the existing infra: an object mapper cannot emit mapped field types, and a field mapper can only emit a single mapped field type. We do have more flexibility in the runtime section though. Yet I don't think we have a precedent for a mapper being able to emit fields that are outside of its own path (it's something we have discussed as part of the composite runtime field impl but have decided not to implement - meaning it can only produce fields that are part of its path).

If I am not mistaken, all we'd like is to expose additional runtime behaviour. Namely, what the field alias mapper does is take the MappedFieldType registered at the target and register it also under the path of the alias. Ideally, the passthrough mapper would be able to do just that, without special casing outside of it, if it could emit mapped field types, similar to how CompositeRuntimeField does through its asMappedFieldTypes method.

This may address two additional concerns I have with the current implementation:

  • it relies on field aliases, which have some limitations, and I am thinking that if we could generate runtime fields or mapped field types instead of full-blown field alias mapper, that would give us more flexibility
  • it subclasses ObjectMapper, which is maybe still necessary because of the need to accept sub-fields, but I'd like to consider moving away from it. We have been discussing for quite a while to not have subclasses of ObjectMapper, and trying to remove special casing all over the codebase. This PR introduces more special casing and reliance on ObjectMapper which we should reconsider.

This is a bit of a braindump, and not a complete solution yet, but something that we can take as a starting point to improve the current design.

Another idea, possibly less of a redesign, would be to move the instanceof check to MappingLookup/FieldTypeLookup where we already check for aliases and we special case them. That's where we could do what field aliases do, without explicitly adding field alias mappers to mappings (which I cannot tell if it's needed or more of a side-effect of the current impl). I am mentioning this as a stop-gap measure to improve things, as the redesign I have described above requires considerable changes and additional thinking, but I still think that would be a good discussion to have.
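
A rough sketch of the stop-gap idea, resolving pass-through names at lookup time instead of registering alias mappers, might look like this. It is only an illustration: `PassThroughLookupSketch` and `resolve` are hypothetical names, field types are modelled as plain strings, and giving precedence to pass-through objects in list order is an assumption, not something decided in this thread.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical lookup that resolves unprefixed names through pass-through objects. */
final class PassThroughLookupSketch {

    private final Map<String, String> typesByFullPath; // full field path -> type name
    private final List<String> passThroughObjects;     // names of pass-through objects, in precedence order

    PassThroughLookupSketch(Map<String, String> typesByFullPath, List<String> passThroughObjects) {
        this.typesByFullPath = Map.copyOf(typesByFullPath);
        this.passThroughObjects = List.copyOf(passThroughObjects);
    }

    /** Returns the full path that a query-time field name refers to, if any. */
    Optional<String> resolve(String name) {
        // A field mapped under exactly this path always takes precedence.
        if (typesByFullPath.containsKey(name)) {
            return Optional.of(name);
        }
        // Otherwise try the name as a subfield of each pass-through object.
        for (String object : passThroughObjects) {
            String candidate = object + "." + name;
            if (typesByFullPath.containsKey(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("@timestamp", "date");
        fields.put("attributes.host", "keyword");
        fields.put("resource.host", "keyword");
        fields.put("resource.region", "keyword");
        PassThroughLookupSketch lookup = new PassThroughLookupSketch(fields, List.of("attributes", "resource"));
        System.out.println(lookup.resolve("host"));
        System.out.println(lookup.resolve("region"));
        System.out.println(lookup.resolve("missing"));
    }
}
```

Because nothing is added to the mappings in this approach, there are no extra alias mappers to count against the field limit or to surface in the mapping source.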

Contributor Author

Thanks for the long and detailed writeup, Luca.

@felixbarny @martijnvg fyi, my understanding is that it'd help users or Kibana to see both the nested objects under pass-through and the aliased fields at the root level. If this is not strictly needed, I can look into removing the aliases and update the MappingLookup/FieldTypeLookup logic instead, as suggested above.

Fwiw, I think we still need the Passthrough object as it provides a good solution to the problem of supporting dynamic dimensions in TSDB.

Member

I was also thinking of doing a special case in MappingLookup/FieldTypeLookup instead of adding individual alias field mappers to the root object mapper. IINM, field caps uses this internally, so we'd still be able to have field suggestions for this in Kibana.

Member

> Fwiw, I think we still need the Passthrough object as it provides a good solution to the problem of supporting dynamic dimensions in TSDB.

I agree and I didn't get the impression that Luca was questioning the existence of the new field type, just how it's implemented. For example, the passthrough field type doesn't necessarily need to extend ObjectMapper, especially since it doesn't support subobjects.

Contributor Author

Oh interesting, I see that field caps does indeed use FieldTypeLookup. If this suffices, it definitely looks cleaner to just update these classes and skip adding aliases at the root level (so many issues with the latter part already..).

I'll put together a PR for this first, then see if it's better to make PassThrough a standalone mapper instead of an ObjectMapper subclass.

Successfully merging this pull request may close these issues.

Allow all properties of an object to be queried and aggregated as top-level attributes
5 participants