Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rename processor fails when trying to rename to nested field with same parent name #19892

Closed
djschny opened this issue Aug 9, 2016 · 8 comments
Assignees
Labels
:Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP discuss v5.0.0-beta1

Comments

@djschny
Copy link
Contributor

djschny commented Aug 9, 2016

Elasticsearch version: v5.0.0-alpha5

Plugins installed: x-pack

JVM version: jdk1.8.0_92

OS version: Mac 10.11.5

Description of the problem including expected versus actual behavior:
When using an ingest pipeline with a rename processor that tries to rename the element to a nested one where the parent object name is the same as the current object, it fails.

Steps to reproduce:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "rename": {
          "field": "version",
          "target_field": "version.displayName"
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "myindex",
      "_type": "doc",
      "_id": "1",
      "_source": {
        "version": "1.1.0"
      }
    }
  ]
}

Results in the following error:

{
  "docs": [
    {
      "error": {
        "root_cause": [
          {
            "type": "exception",
            "reason": "java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: cannot set [displayName] with parent object of type [java.lang.String] as part of path [version.displayName]",
            "header": {
              "processor_type": "rename"
            }
          }
        ],
        "type": "exception",
        "reason": "java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: cannot set [displayName] with parent object of type [java.lang.String] as part of path [version.displayName]",
        "caused_by": {
          "type": "illegal_argument_exception",
          "reason": "java.lang.IllegalArgumentException: cannot set [displayName] with parent object of type [java.lang.String] as part of path [version.displayName]",
          "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "cannot set [displayName] with parent object of type [java.lang.String] as part of path [version.displayName]"
          }
        },
        "header": {
          "processor_type": "rename"
        }
      }
    }
  ]
}

Desired expectation is that this would work. The workaround right now is to rename to a temporary name. So for example:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "rename": {
          "field": "version",
          "target_field": "versionTemp"
        }
      },
      {
        "rename": {
          "field": "versionTemp",
          "target_field": "version.displayName"
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "myindex",
      "_type": "doc",
      "_id": "1",
      "_source": {
        "version": "1.1.0"
      }
    }
  ]
}
@talevy talevy added :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP v5.0.0-beta1 discuss labels Aug 9, 2016
@talevy
Copy link
Contributor

talevy commented Aug 9, 2016

@djschny The reason this happens is that the rename processor only does set and remove operations on the document. It only removes the original field after it successfully sets the target_field. I believe this was done this way since we do not have a notion of a transaction within a processor's execution. The rational was that in this way, the worst case is that you have duplicate values and keep the old data instead of fail and you have just lost the field down the pipeline.

I am open to loosening this up so that you can do as you suggest above. This does seem to be a super super conservative way of operating on the document that may not be necessary, but I'd like to hear your opinion around that.

@djschny
Copy link
Contributor Author

djschny commented Aug 9, 2016

Thanks for the explanation behind the scenes @talevy. The rename processor essentially is just a convenience once then for folks?

The rational was that in this way, the worst case is that you have duplicate values and keep the old data instead of fail and you have just lost the field down the pipeline.

I'm trying to think of a situation where a user would want the processor to continue instead of fail, but am having trouble coming up with a situation where that would be useful.

IMO we are safer going out with the failure behavior always and then can loosen back to the way it is implemented now. However going the opposite way in the future (add failure) would be a much harder change for the end user I'm thinking.

Thoughts?

@talevy
Copy link
Contributor

talevy commented Aug 9, 2016

I'm trying to think of a situation where a user would want the processor to continue instead of fail, but am having trouble coming up with a situation where that would be useful.

the pipeline may continue if the rename processor or the pipeline declare on_failure processors. If, for whatever reason, the field is removed and not successfully added into the new field... then they would lose the data.

that being said, this exception can be caught, and the processor can re-insert the field back into the original field.

@martijnvg
Copy link
Member

The cause of this exception is because ingest expects version to be a json object when setting the value for version.displayName and that isn't the case (it is a plain value).

The same exception bubbles up when the set processor is used like this:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "set": {
          "field": "version.displayName",
          "value": "{{version}}"
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "myindex",
      "_type": "doc",
      "_id": "1",
      "_source": {
        "version": "1.1.0"
      }
    }
  ]
}

I think we should fix, both examples should just work. Ingest should just plainly overwrite the version field and not care about the previous value. The version field should become a json object that has a displayName string field.

In this case only if the version field is already a json object it should not be overwritten. For example in this case (which works as expected):

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "rename": {
          "field": "version.name",
          "target_field": "version.displayName"
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "myindex",
      "_type": "doc",
      "_id": "1",
      "_source": {
        "version": {
          "name" : "1.1.0"
        }
      }
    }
  ]
}

@djschny
Copy link
Contributor Author

djschny commented Oct 10, 2016

I see this changed was merged and is labeled for v5, but with my testing on v5.0.0-rc1 the error is still thrown.

Is it intentional that the merged change was not included in v5.0.0-rc1?

@martijnvg
Copy link
Member

@djschny This change is part of of rc1 (and beta1), so that shouldn't happen any more. I just checked the request snippet you shared initially in this issue and that doesn't fail. Or can you make the rename processor fail with a different request?

@djschny
Copy link
Contributor Author

djschny commented Oct 12, 2016

@martijnvg that's great that it is included! Didn't think it was fixed since the issue was not closed.

I'm going back through my Console/Sense histories and trying to find the situation where I saw the error. Sorry I should have posted the snippet on my previous comment. You are correct everything is working great with the snippet in my original post.

If you don't mind give me another day to review with the work I was doing before and will report back with whether I made a mistake or found a different situation.

Regardless thank you for the work, it really helps make the pipelines simpler!

@djschny
Copy link
Contributor Author

djschny commented Oct 24, 2016

Sorry for the delay and was unable to reproduce, sorry for the noise, I either was using a different version or misread a different error for this one. Thanks again for getting this into v5.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP discuss v5.0.0-beta1
Projects
None yet
Development

No branches or pull requests

4 participants