Add dotexpander processor #20078

martijnvg · 2016-08-19T13:47:28Z

Adds a processor that turns fields with dots into object fields, so that other processors can clean the data up before indexing.

talevy · 2016-08-22T17:35:40Z

~~LGTM~~

UPDATED

rjernst · 2016-08-22T17:54:27Z

I'm confused about the purpose of this feature. Dots in field names don't matter in master. They all get read the same way, as field name separators. So to document parsing, this:

{
  "foo" : {
    "bar.baz" : "value"
  }
}

will be no different than:

{
  "foo" : {
    "bar" : {
      "baz" : "value"
    }
  }
}
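The equivalence claimed above can be illustrated with a small model. This is only a sketch under the assumption that a field is identified by its full dotted path; `FieldPathSketch` and `flatten` are hypothetical names, and the code is not Elasticsearch's document parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model: a field is identified by its full dotted path.
final class FieldPathSketch {

    /** Returns a map from full dotted path to leaf value. */
    static Map<String, Object> flatten(Map<String, Object> doc) {
        Map<String, Object> out = new LinkedHashMap<>();
        collect("", doc, out);
        return out;
    }

    @SuppressWarnings("unchecked")
    private static void collect(String prefix, Map<String, Object> node, Map<String, Object> out) {
        for (Map.Entry<String, Object> entry : node.entrySet()) {
            // A dot inside a key and a level of nesting both extend the path the same way.
            String path = prefix.isEmpty() ? entry.getKey() : prefix + "." + entry.getKey();
            if (entry.getValue() instanceof Map) {
                collect(path, (Map<String, Object>) entry.getValue(), out);
            } else {
                out.put(path, entry.getValue());
            }
        }
    }
}
```

Under this model, both documents in the comment above flatten to the single path `foo.bar.baz`.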

talevy · 2016-08-22T17:57:00Z

@rjernst

This is for pre-indexing. We have our own field name resolution and do not have such a differentiator without this PR. Am I missing something?

rjernst · 2016-08-22T17:58:41Z

@talevy There is no reason to structure the fields with dots in the names; they are not treated any differently by document parsing than if the structure were there, as in my example.

talevy · 2016-08-22T18:23:15Z

@rjernst, I agree that there is no reason, but users may have some other reasons for having them be that way and it makes sense for us to have a way to support it; no?

rjernst · 2016-08-22T18:25:59Z

> but users may have some other reasons for having them be that way

IMO this is untenable long term. The point of ingest is to get the data set up to be searchable in a particular way. What the source looks like should not matter, and continuing to add things which allow users to rely on the structure of _source prevents improvements like #9034.

talevy · 2016-08-22T18:28:14Z

I kind of view it the other way. I view this as enabling users to convert untenable structures into better ones, where they can use ingest to move their legacy sources away from the dots-in-fields structure.

That being said, leaving responsibility for resolving this to the client seems acceptable to me as well.

talevy · 2016-08-22T18:46:31Z

After speaking with @rjernst offline, I think we should pre-process the document beforehand to convert objects of this type:

{ "foo.bar": "baz" }

into objects of this type:

{ "foo": { "bar" : "baz" } }

Since the object mappers do this internally while indexing anyway, it makes sense for Ingest to do it up front. This will allow us to freely describe fields with dots without the need for escaping.

Thanks @rjernst for chiming in. I think this is a friendlier solution, since this is the default behavior in core anyway.
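The conversion proposed above can be sketched on plain maps. This is a minimal illustration, not the code from this PR: `DotExpandSketch` and `expand` are hypothetical names, and the sketch assumes that any conflict with an existing field is simply rejected.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: models the conversion on plain maps, not Elasticsearch's code.
final class DotExpandSketch {

    /** Expands one top-level key such as "foo.bar" into nested objects, in place. */
    @SuppressWarnings("unchecked")
    static void expand(Map<String, Object> doc, String dottedField) {
        String[] parts = dottedField.split("\\.");
        if (!doc.containsKey(dottedField) || parts.length < 2) {
            return; // field absent, or no dot to expand
        }
        Map<String, Object> current = doc;
        // Walk (or create) the intermediate objects for every segment but the last.
        for (int i = 0; i < parts.length - 1; i++) {
            Object child = current.get(parts[i]);
            if (child == null) {
                child = new LinkedHashMap<String, Object>();
                current.put(parts[i], child);
            } else if (!(child instanceof Map)) {
                // Assumption of this sketch: conflicts are rejected rather than resolved.
                throw new IllegalArgumentException("[" + parts[i] + "] is not an object");
            }
            current = (Map<String, Object>) child;
        }
        String leaf = parts[parts.length - 1];
        if (current.containsKey(leaf)) {
            throw new IllegalArgumentException("[" + dottedField + "] already exists as a path");
        }
        current.put(leaf, doc.remove(dottedField));
    }
}
```

Calling `expand(doc, "foo.bar")` on a mutable map holding `{ "foo.bar": "baz" }` leaves it as `{ "foo": { "bar": "baz" } }`.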

martijnvg · 2016-08-22T22:44:59Z

I like the idea of converting fields with dots into object fields.

I just wonder what the behaviour should be if a document being processed has both an object field and a field name with dots that refer to the same path (a very rare case). Something like this:

{
  "foo.bar" : "value1",
  "foo" : {
    "bar" : "value2"
  }
}

Should we then just turn the bar field under foo field into an array field holding both values?
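The array idea raised in this comment can be sketched as follows. The names `DotExpandMergeSketch` and `expandAndMerge` are hypothetical, and the ordering (the existing nested value first, the dotted value appended after it) is an assumption of this sketch rather than something the thread specifies.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "merge into an array on conflict"; not the PR's code.
final class DotExpandMergeSketch {

    @SuppressWarnings("unchecked")
    static void expandAndMerge(Map<String, Object> doc, String dottedField) {
        String[] parts = dottedField.split("\\.");
        if (!doc.containsKey(dottedField) || parts.length < 2) {
            return; // field absent, or no dot to expand
        }
        Map<String, Object> current = doc;
        for (int i = 0; i < parts.length - 1; i++) {
            Object child = current.get(parts[i]);
            if (child == null) {
                child = new LinkedHashMap<String, Object>();
                current.put(parts[i], child);
            } else if (!(child instanceof Map)) {
                throw new IllegalArgumentException("[" + parts[i] + "] is not an object");
            }
            current = (Map<String, Object>) child;
        }
        Object value = doc.remove(dottedField);
        String leaf = parts[parts.length - 1];
        if (!current.containsKey(leaf)) {
            current.put(leaf, value); // no conflict: plain expansion
            return;
        }
        // Conflict: keep the existing value(s) and append the dotted field's value.
        Object existing = current.get(leaf);
        List<Object> merged = new ArrayList<>();
        if (existing instanceof List) {
            merged.addAll((List<Object>) existing);
        } else {
            merged.add(existing);
        }
        merged.add(value);
        current.put(leaf, merged);
    }
}
```

For the document in the comment above, this sketch leaves `foo.bar` holding `["value2", "value1"]`.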

clintongormley · 2016-08-24T05:57:59Z

I'm not sure I agree with automatically converting dots to objects here, for exactly the reason @martijnvg has pointed out: you need to deal with conflicts. The ingest processor is the point at which you can take your raw messy documents and transform them into something more useful. In other words, you can take a doc like { "foo": 5, "foo.bar": 10 } and rewrite it into something that can be indexed into Elasticsearch.

Instead of transforming dots to objects automatically, I would provide this function as a processor. Instead of adding support for escaping of dots, I would recommend using the script processor (which already has syntax for accessing fields with dots). This would allow the user to fix their document as they see fit, after which they can apply the dots-to-objects processor.

martijnvg · 2016-08-29T15:45:45Z

@talevy @clintongormley I've updated the PR to add a dedot processor instead of adding general support for dots in field names.

clintongormley · 2016-08-29T16:04:12Z

docs/reference/ingest/ingest-node.asciidoc

+Otherwise these fields can't accessed by any processor.
+
+[[uppercase-options]]
+.Uppercase Options


I think this should be Dedot Options?

happens to me all the time

my shameless copy pasting...

clintongormley · 2016-08-29T16:04:47Z

thanks @martijnvg - what happens when there are conflicting fields? Worth adding to the docs?

clintongormley · 2016-08-29T16:05:49Z

I'm also wondering if this should be called dedot given that its behaviour is very different from the logstash version. What about dot_expander?

talevy · 2016-08-29T16:10:53Z

I agree with @clintongormley: dedot is rather confusing, especially since it used to have a different meaning.

martijnvg · 2016-08-29T16:12:10Z

> what happens when there are conflicting fields?

It turns it into an array and you need to deal with it (even when types are conflicting, and if that isn't resolved then when serializing to json or in ES it will scream).

> What about dot_expander?

+1

clintongormley · 2016-08-29T16:19:47Z

> It turns it into an array and you need to deal with it (even when types are conflicting, and if that isn't resolved then when serializing to json or in ES it will scream).

What about this document?

{
  "foo": 5,
  "foo.bar": 10
}

martijnvg · 2016-08-29T20:08:05Z

@clintongormley So this would need to be done in two steps: first rename foo to foo.bar, and then dot expand the foo.bar field (the field with the actual dot). This way the implementation of this processor remains simple.

clintongormley · 2016-08-30T14:06:51Z

@martijnvg What I mean is: what would this processor do if it encountered this document? Throw an exception? What does that exception look like? And should we add this info to the docs?

martijnvg · 2016-08-30T14:58:20Z

The error that is now thrown is not clear (java.lang.IllegalArgumentException: cannot set [foo] with parent object of type [java.lang.String] as part of path [foo.bar]), so I can try to improve that. I did update the docs, explaining to use the rename processor in this situation.
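The scalar conflict and the rename-first workaround discussed above can be sketched on plain maps. Everything here is hypothetical: `ScalarConflictSketch`, its methods, and the exception message are illustrations, not the PR's code or its actual error text, and the merge order of the resulting array is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the two-step workaround; not Elasticsearch's code.
final class ScalarConflictSketch {

    /** Moves the top-level value at `field` to the nested path `targetPath`. */
    static void rename(Map<String, Object> doc, String field, String targetPath) {
        Object value = doc.remove(field);
        String[] parts = targetPath.split("\\.");
        Map<String, Object> current = doc;
        for (int i = 0; i < parts.length - 1; i++) {
            current = childObject(current, parts[i], targetPath);
        }
        current.put(parts[parts.length - 1], value);
    }

    /** Expands the dotted key, appending to any value already at that path. */
    @SuppressWarnings("unchecked")
    static void expand(Map<String, Object> doc, String dottedField) {
        String[] parts = dottedField.split("\\.");
        if (!doc.containsKey(dottedField) || parts.length < 2) {
            return;
        }
        Map<String, Object> current = doc;
        for (int i = 0; i < parts.length - 1; i++) {
            current = childObject(current, parts[i], dottedField); // throws on a scalar parent
        }
        Object value = doc.remove(dottedField);
        String leaf = parts[parts.length - 1];
        if (!current.containsKey(leaf)) {
            current.put(leaf, value);
            return;
        }
        Object existing = current.get(leaf);
        List<Object> merged = new ArrayList<>();
        if (existing instanceof List) {
            merged.addAll((List<Object>) existing);
        } else {
            merged.add(existing);
        }
        merged.add(value);
        current.put(leaf, merged);
    }

    @SuppressWarnings("unchecked")
    private static Map<String, Object> childObject(Map<String, Object> parent, String key, String path) {
        Object child = parent.get(key);
        if (child == null) {
            Map<String, Object> created = new LinkedHashMap<>();
            parent.put(key, created);
            return created;
        }
        if (!(child instanceof Map)) {
            // Hypothetical message wording, chosen to point the user at the fix.
            throw new IllegalArgumentException("cannot expand [" + path + "]: field [" + key
                    + "] holds a non-object value; rename it first");
        }
        return (Map<String, Object>) child;
    }
}
```

With `{ "foo": 5, "foo.bar": 10 }`, calling `expand(doc, "foo.bar")` directly throws in this sketch, while `rename(doc, "foo", "foo.bar")` followed by `expand(doc, "foo.bar")` leaves `foo.bar` holding both values.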

talevy · 2016-09-01T14:22:59Z

docs/reference/ingest/ingest-node.asciidoc

+
+Expands a field with dots into an object field. This processor allows fields
+with dots in the name to be accessible by other processors in the pipeline.
+Otherwise these fields can't accessed by any processor.


missing a be

Otherwise these fields can't be accessed by any processor.

As a side note, should we link to this section here? https://www.elastic.co/guide/en/elasticsearch/reference/master/accessing-data-in-pipelines.html

talevy · 2016-09-02T21:40:14Z

LGTM

talevy · 2016-09-02T21:48:26Z

modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/DotExpanderProcessor.java

+import org.elasticsearch.ingest.Processor;
+
+import java.util.Map;
+import java.util.regex.Pattern;


unused import


martijnvg added the >bug, review, :Data Management/Ingest Node, and v5.0.0-beta1 labels Aug 19, 2016

clintongormley added the discuss label Aug 23, 2016

martijnvg force-pushed the ingest_handle_dots_in_field_names branch from 3bbd81f to 9e84927 Compare August 29, 2016 15:43

martijnvg changed the title ~~Add support for dots in fieldnames when referring to fields in processors~~ Add dedot processor Aug 29, 2016

martijnvg removed the discuss label Aug 29, 2016

clintongormley reviewed Aug 29, 2016

martijnvg force-pushed the ingest_handle_dots_in_field_names branch 2 times, most recently from 9ee5d27 to b87dbc9 Compare August 30, 2016 07:53

martijnvg changed the title ~~Add dedot processor~~ Add dotexpander processor Aug 31, 2016

martijnvg force-pushed the ingest_handle_dots_in_field_names branch from b87dbc9 to 37fd8f9 Compare August 31, 2016 08:45

talevy reviewed Sep 1, 2016

martijnvg force-pushed the ingest_handle_dots_in_field_names branch from 37fd8f9 to ced55e9 Compare September 2, 2016 21:35

talevy reviewed Sep 2, 2016

martijnvg force-pushed the ingest_handle_dots_in_field_names branch from ced55e9 to ee4c485 Compare September 5, 2016 05:27

ingest: Add dot_expander processor that can turn fields with dots i…

6f6d17d

…n the field name into object fields.

martijnvg force-pushed the ingest_handle_dots_in_field_names branch from ee4c485 to 6f6d17d Compare September 5, 2016 05:28

martijnvg merged commit 6f6d17d into elastic:master Sep 5, 2016

joegallo mentioned this pull request Mar 2, 2023

Add an example of dot_expander's path option #94291

Merged
