pipeline processors fails when field is not present #19995

Closed

djschny opened this issue Aug 15, 2016 · 15 comments
Labels: :Data Management/Ingest Node (Execution or management of Ingest Pipelines including GeoIP), >enhancement, v5.0.0-beta1

Comments

@djschny (Contributor) commented Aug 15, 2016

Elasticsearch version: v5.0.0-alpha5

Plugins installed: []

JVM version: 1.8.0_92-b14

OS version: Mac 10.11.15

Description of the problem including expected versus actual behavior:

The convert pipeline processor throws an error if the field to convert does not exist in the document. Ideally an option such as ignore_missing would exist and default to true, so that for the majority of users it just works. The ignore_failure option would not work in this scenario, because we would still want to fail when there is a parse error or some other unexpected error.

This can be illustrated with the very common, poster-child example of Apache log data, where the equivalent of the usual Logstash config as an ingest node pipeline would be:

PUT _ingest/pipeline/apachelogs
{
  "description": "Pipeline to parse Apache logs",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "%{COMBINEDAPACHELOG}"
        ]
      }
    },
    {
      "date": {
        "field": "timestamp",
        "target_field": "timestamp",
        "formats": [
          "dd/MMM/YYYY:HH:mm:ss Z"
        ]
      }
    },
    {
      "convert": {
        "field": "response",
        "type": "integer"
      }
    },
    {
      "convert": {
        "field": "bytes",
        "type": "integer"
      }
    }
  ]
}

In this situation bytes is optional and is not present for HEAD/OPTIONS/DELETE requests. If the bytes field is missing, the pipeline should ideally still work for the end user. See the related elastic/beats#2229 for the original discovery and background information.
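
For illustration, a minimal sketch of how the final convert processor above could look if an option along the lines of the proposed ignore_missing existed (it is not available in alpha5; this is only the shape suggested in this issue):

    {
      "convert": {
        "field": "bytes",
        "type": "integer",
        "ignore_missing": true
      }
    }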

CC @talevy, as we talked about this briefly earlier this morning.

@talevy added the :Data Management/Ingest Node, >enhancement, and v5.0.0-beta1 labels Aug 15, 2016
@talevy self-assigned this Aug 15, 2016
@clintongormley

I can imagine this situation is pretty common. Wondering if convert should know to ignore missing fields and null values out of the box? Not sure how consistent this is with other processors?

@talevy (Contributor) commented Aug 23, 2016

Will open a PR for this shortly. A new ignore_missing boolean option will be added to processors so that we can handle such field-missing exceptions separately from the broader ignore_failure.

@djschny (Contributor, Author) commented Aug 26, 2016

Hitting this a lot across many processors. For example, I'm running into the issue with the convert processor as well. Looking forward to the addition of ignore_missing.

@talevy (Contributor) commented Aug 27, 2016

@djschny I am scared of there being too many ways to control pipeline flow using exception handling. Mind sharing how you are hitting this with other processors and why the ignore_failure option is not sufficient?

@djschny (Contributor, Author) commented Aug 28, 2016

Mind sharing how you are hitting this with other processors and why the ignore_failure option is not sufficient?

Sure, consider the following simple example and processor:

      "convert": {
        "field": "test_score",
        "type": "float"
      }

In the above, the field test_score is a string and I want to convert it to a float. Some of the documents do not have this field, and for those I want the processor to not throw an error. The issue with ignore_failure is that it would also ignore problems where a document does have the field but the value does not parse because it is not a float (for example "none").

For many of these processors, the absence of a field, a field with a null value, and potentially even an empty string should all be ignored rather than throwing an error, IMO.
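
To make the distinction concrete, here is a sketch using the simulate API (reusing the field from the example above): with ignore_failure, both documents below pass through silently, even though the first contains a genuine parse error ("none") that should surface, whereas an ignore_missing option would only skip the second, field-less document.

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "convert": {
          "field": "test_score",
          "type": "float",
          "ignore_failure": true
        }
      }
    ]
  },
  "docs": [
    { "_source": { "test_score": "none" } },
    { "_source": {} }
  ]
}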

@djschny (Contributor, Author) commented Aug 29, 2016

I'm making the title more generic, as I have so far run into this issue when implementing pipelines with the following processors (a minimal sketch for one of them follows the list):

  • convert
  • trim
  • grok
  • rename

@djschny changed the title from "convert pipeline processor fails when field is not present" to "pipeline processors fails when field is not present" Aug 29, 2016
@talevy (Contributor) commented Sep 7, 2016

@djschny I've updated the PR to include these, except for rename. I feel that ignore_failure captures the escape behavior well enough for that case (the only other way it would fail is if you supply an invalid path as the destination).

@djschny (Contributor, Author) commented Sep 7, 2016

Cool thanks @talevy

However, from an external user's perspective, consistency would be better. Externally, ignore_missing feels like a global setting that would apply to all processors (regardless of how it is implemented internally).

To have to set it one way on some processors and another way on others just seems odd IMO.

@talevy (Contributor) commented Sep 7, 2016

However from an external user perspective, consistency would be better

I can't disagree there; I am all for consistency. It is just not the case that all processors operate by extracting information from one field.

feels like a global setting that would apply to all processors

Although this may be a common theme (extracting data from one field) for many, if not all, of the existing processors, it is more of a coincidence than a rule.

Since processors can be as flexible as "fetch data from somewhere else and inject it into the document" (#20340) or "translate all field values to Spanish", ignore_missing would not make sense in all those cases, while something like ignore_failure does.

As you can see from the search-processor discussion, the scope of what processors can/will/should do is still contested. I would like to avoid introducing further generalizations that may become difficult to move away from in the future as things stabilize.

TL;DR I think ignore_missing should be applied on a case-by-case basis, and the documentation should help correct any assumption that such an option exists on all processors.

What do you think?

@djschny (Contributor, Author) commented Sep 7, 2016

I'm fine with not making it a global/common property. No problem there, and I understand the concern.

However, I believe it should still be applied to the rename processor. In addition to the consistency point above, it would help distinguish between a missing field and an actual error. As a user, I would want a document that does not have the field to simply be a no-op. But if the rename itself causes an error (which is possible if the target name is a field or ingest metadata reference that is not a valid JSON field name), then I would want that to actually fail; with ignore_failure set to true, that failure would be masked.
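
A sketch of the desired behavior (the field names here are only illustrative): with ignore_missing, the processor below would be a no-op for documents lacking hostname, yet a rename that itself goes wrong, for example because the destination is invalid or already occupied, would still fail as it should.

    {
      "rename": {
        "field": "hostname",
        "target_field": "host.name",
        "ignore_missing": true
      }
    }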

@talevy (Contributor) commented Sep 7, 2016

++ I'll add it to the rename

@djschny (Contributor, Author) commented Sep 10, 2016

Saw that the PR merged, thanks! However, I noticed ignore_missing defaults to false. From my experience so far working with pipelines, the majority of the time I need ignore_missing to be true.

Were there any particular considerations into why defaulting to false was chosen?

@clintongormley

Were there any particular considerations into why defaulting to false was chosen?

Because it is explicit. We tell you by default if there is a problem, but we give you the tools to deal with it as you need.

@talevy (Contributor) commented Sep 13, 2016

Closing since this has been added to the discussed processors here: #20194

@djschny (Contributor, Author) commented Oct 10, 2016

Looks like this is a problem with the split processor as well. I have opened a new issue to address it.
