Add dotexpander processor #20078
UPDATED |
I'm confused about the purpose of this feature. Dots in field names don't matter in master. They all get read the same way, as field name separators. So to document parsing, this:
will be no different than:
|
This is for pre-indexing. We have our own field name resolution and do not have such a differentiator without this PR. Am I missing something? |
@talevy There is no reason to structure the fields with dots in the names; they are not treated any differently by document parsing than if the structure were there, as in my example. |
@rjernst, I agree that there is no reason, but users may have some other reasons for having them be that way and it makes sense for us to have a way to support it; no? |
IMO this is untenable long term. The point of ingest is to get the data set up to be searchable in a particular way. What the source looks like should not matter, and continuing to add things which allow users to rely on the structure of _source prevents improvements like #9034. |
I kind of view it the other way. I view this as enabling users to convert untenable structures into better ones, where they can use ingest to move their legacy sources away from the dots-in-fields structure. That being said, leaving responsibility for resolving this on the client end seems acceptable to me as well. |
After speaking with @rjernst offline, I think we should pre-process the document to convert objects of this type:
into objects of this type:
Since the object mappers do this internally while indexing anyway, it makes sense for Ingest to do it up front. This will allow us to freely describe fields with dots without the need for escaping. Thanks @rjernst for chiming in; I think this is a friendlier solution, since this is the default behavior in core anyway. |
I like the idea of converting fields with dots into object fields. I just wonder what the behaviour should be if a document being processed has both object fields and field names with dots (a very rare case). Something like this:
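The conversion proposed in the comment above can be sketched roughly as follows. This is only an illustration over plain `java.util.Map` objects, not the code in this PR: the class `DotExpansionSketch`, its methods, and the field names `foo.bar` and `foo.baz.qux` are all hypothetical, and conflicts are simply rejected with an exception.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of this PR.
final class DotExpansionSketch {

    // Returns a new map in which every key containing dots is replaced by
    // nested object fields, e.g. {"foo.bar": 1} becomes {"foo": {"bar": 1}}.
    static Map<String, Object> expand(Map<String, Object> source) {
        Map<String, Object> result = new LinkedHashMap<>();
        mergeInto(result, source);
        return result;
    }

    @SuppressWarnings("unchecked")
    private static void mergeInto(Map<String, Object> target, Map<String, Object> source) {
        for (Map.Entry<String, Object> entry : source.entrySet()) {
            String key = entry.getKey();
            String[] parts = key.split("\\.", -1);
            // Walk (or create) the parent objects for all but the last path element.
            Map<String, Object> current = target;
            for (int i = 0; i < parts.length - 1; i++) {
                current = childObject(current, parts[i], key);
            }
            String leaf = parts[parts.length - 1];
            Object value = entry.getValue();
            if (value instanceof Map) {
                // Object values are merged recursively, so dotted keys inside them are expanded too.
                mergeInto(childObject(current, leaf, key), (Map<String, Object>) value);
            } else if (current.containsKey(leaf)) {
                // Assumption of this sketch: two values for the same path are an error.
                throw new IllegalArgumentException("conflicting values for [" + key + "]");
            } else {
                current.put(leaf, value);
            }
        }
    }

    @SuppressWarnings("unchecked")
    private static Map<String, Object> childObject(Map<String, Object> parent, String name, String fullKey) {
        Object child = parent.get(name);
        if (child == null) {
            Map<String, Object> created = new LinkedHashMap<>();
            parent.put(name, created);
            return created;
        }
        if (!(child instanceof Map)) {
            throw new IllegalArgumentException(
                "cannot expand [" + fullKey + "]: [" + name + "] is a value field, not an object field");
        }
        return (Map<String, Object>) child;
    }
}
```

Calling `DotExpansionSketch.expand` on a map with the keys `foo.bar` and `foo.baz.qux` yields a single `foo` object containing `bar` and a nested `baz` object. Rejecting conflicts outright is a choice made for this sketch; the comments that follow discuss what the behaviour should be.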
Should we then just turn the |
I'm not sure I agree with automatically converting dots to objects here, for exactly the reason @martijnvg has pointed out: you need to deal with conflicts. The ingest processor is the point at which you can take your raw messy documents and transform them into something more useful. In other words, you can take a doc like Instead of transforming dots to objects automatically, I would provide this function as a processor. Instead of adding support for escaping of dots, I would recommend using the |
3bbd81f to 9e84927
@talevy @clintongormley I've updated the PR to add a |
Otherwise these fields can't accessed by any processor. | ||
|
||
[[uppercase-options]] | ||
.Uppercase Options |
I think this should be Dedot Options?
happens to me all the time
my shameless copy pasting...
thanks @martijnvg - what happens when there are conflicting fields? Worth adding to the docs? |
I'm also wondering if this should be called |
I agree with @clintongormley: dedot is rather confusing, especially since it used to have a different meaning. |
It turns the field into an array, and you need to deal with that yourself. This happens even when the types conflict; if that isn't resolved, serializing to JSON or indexing into ES will fail loudly.
+1 |
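The conflict behaviour described in the reply above, where the expanded value joins an already existing value in an array, might look roughly like this. It is a minimal sketch over plain maps under stated assumptions, not the processor's actual implementation: `ConflictAppendSketch`, `expandField`, and the field names in the usage note are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not the code under review.
final class ConflictAppendSketch {

    // Expands one dotted top-level key of `doc` in place. If the target path
    // already holds a value, both values end up together in a list.
    @SuppressWarnings("unchecked")
    static void expandField(Map<String, Object> doc, String field) {
        if (!doc.containsKey(field)) {
            return; // nothing to expand
        }
        String[] parts = field.split("\\.", -1);
        Map<String, Object> current = doc;
        for (int i = 0; i < parts.length - 1; i++) {
            Object child = current.get(parts[i]);
            if (child == null) {
                child = new LinkedHashMap<String, Object>();
                current.put(parts[i], child);
            } else if (!(child instanceof Map)) {
                // A value field cannot be turned into an object field.
                throw new IllegalArgumentException(
                    "cannot expand [" + field + "]: [" + parts[i] + "] is a value field, not an object field");
            }
            current = (Map<String, Object>) child;
        }
        Object value = doc.remove(field);
        String leaf = parts[parts.length - 1];
        if (current.containsKey(leaf)) {
            // Conflict: keep the existing value(s) and append the new one.
            Object existing = current.get(leaf);
            List<Object> values = new ArrayList<>();
            if (existing instanceof List) {
                values.addAll((List<Object>) existing);
            } else {
                values.add(existing);
            }
            values.add(value);
            current.put(leaf, values);
        } else {
            current.put(leaf, value);
        }
    }
}
```

For a document that holds both an object `foo` with a `bar` field and a top-level `foo.bar` key, `expandField(doc, "foo.bar")` leaves `foo.bar` as a two-element list. The sketch does not check whether the two values have compatible types, which matches the caveat in the reply that resolving type conflicts is left to the user.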
What about this document?
|
@clintongormley So this would need to be done in two steps, first rename |
9ee5d27 to b87dbc9
@martijnvg What I mean is: what would this processor do if it encountered this document? Throw an exception? What does that exception look like? And should we add this info to the docs? |
The error that is now thrown is not clear ( |
b87dbc9 to 37fd8f9
|
||
Expands a field with dots into an object field. This processor allows fields | ||
with dots in the name to be accessible by other processors in the pipeline. | ||
Otherwise these fields can't accessed by any processor. |
missing a be
Otherwise these fields can't be accessed by any processor.
as a side note, should we link to this section here? https://www.elastic.co/guide/en/elasticsearch/reference/master/accessing-data-in-pipelines.html
37fd8f9 to ced55e9
LGTM |
import org.elasticsearch.ingest.Processor; | ||
|
||
import java.util.Map; | ||
import java.util.regex.Pattern; |
unused import
ced55e9 to ee4c485
…n the field name into object fields.
ee4c485 to 6f6d17d
Adds a processor that turns fields with dots into object fields, so that other processors can clean the data up before indexing.