Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate that fields are defined only once. #15243

Merged
merged 1 commit into from Dec 15, 2015

Conversation

jpountz
Copy link
Contributor

@jpountz jpountz commented Dec 4, 2015

There are two ways that a field can be defined twice:

  • by reusing the name of a meta mapper in the root object (_id, _routing,
    etc.)
  • by defining a sub-field both explicitly in the mapping and through the code
    in a field mapper (like ExternalMapper does)

This commit adds new checks in order to make sure this never happens.

Close #15057

@jpountz jpountz added >bug :Search/Mapping Index mappings, including merging and defining field types labels Dec 4, 2015
@jpountz
Copy link
Contributor Author

jpountz commented Dec 4, 2015

Still to be defined: how to deal with pre-existing mappings that have the problem? Should we ignore the problem for old versions or should we try to "fix" the mapping?

@rjernst
Copy link
Member

rjernst commented Dec 4, 2015

The change LGTM. For how to deal with existing broken mappings, I think we must fail? eg if a user has an object and field mapper with the same path, what would we do? "choose" one, effectively removing the other? The configuration is broken, the user should be alerted so they can fix their mappings (which will require a reindex).

@clintongormley
Copy link

I don't know how to replicate the ExternalMapper case, but for meta-fields that are also defined as properties, I think it would be safe to drop the version inside properties when upgrading. In 1.7:

PUT t 
{
  "mappings": {
    "t": {
      "properties": {
        "_id": {
          "type": "string",
          "index": "analyzed"
        }
      }
    }
  }
}

GET t/_mapping/t/field/*?include_defaults

The _id meta field is defined correctly (with index: no) and the _id body field doesn't exist (even though the field exists in the JSON mapping).

Taking it one step further, if i define the _id body field to be an object:

DELETE t

PUT t
{
  "mappings": {
    "t": {
      "properties": {
        "_id": {
          "type": "object",
          "properties": {
            "foo": {
              "type": "string"
            }
          }
        }
      }
    }
  }
}

GET t/_mapping/t/field/*?include_defaults

... the _id meta field is still mapped correctly and the foo sub-field doesn't exist. In fact, if you try to index _id as an object:

PUT t/t/1
{
  "_id": {"foo": 5}, "bar": 5
}

GET t/_mapping

... it screws up the parsing: foo is treated as though it were top-level, not as part of the _id object, then it reaches the } before bar and thinks it has finished parsing the document, so bar doesn't get parsed at all.

@jpountz
Copy link
Contributor Author

jpountz commented Dec 8, 2015

Thanks @clintongormley. So from what I can tell, this won't be an issue when upgrading directly from 1.x as mapping updates that try to add metadata fields under the root are ignored. On the other hand 2.x allows to put such meta fields under the root object, and the upgrade would fail if we start rejecting it. However, DocumentParser rejects documents that have a metadata field at the root, so this can only happen with an explicit mapping update. So in the end I suspect that very few users would be impacted, so Ryan's proposal to fail at parsing time sounds more appealing to me than attempting to fix broken mappings (which is dangerous)?

@jpountz
Copy link
Contributor Author

jpountz commented Dec 9, 2015

I also pushed a new commit that tests that this validation also works for indices created on 2.x, which have their metadata mappers added to the root object.

@clintongormley
Copy link

@jpountz sounds good to me. Rather make this solid than introduce complications to handle edge cases.

There are two ways that a field can be defined twice:
 - by reusing the name of a meta mapper in the root object (`_id`, `_routing`,
   etc.)
 - by defining a sub-field both explicitly in the mapping and through the code
   in a field mapper (like ExternalMapper does)

This commit adds new checks in order to make sure this never happens.

Close elastic#15057
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Search/Mapping Index mappings, including merging and defining field types v2.2.0 v5.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants