Reindex conflicts: "proceed" throws error when casting malformed data to `geo_point` #17617

Closed
g00fy- opened this issue Apr 8, 2016 · 4 comments

@g00fy-
commented Apr 8, 2016

When reindexing a malformed object to geo_point, the conflicts parameter is ignored and the task stops

example doc

{"geo":{"lat":null,"lon":null}}

with mapping:

{"properties":{"geo":{"properties":{"lat":{"type":"long"}, "lon":{"type":"long"}}}}}

reindexing to

{"properties":{"geo":{"type":"geo_point", "ignore_malformed": true}}}

exception:

{
   "took": 24483,
   "timed_out": false,
   "total": 5211,
   "updated": 699,
   "created": 2100,
   "batches": 28,
   "version_conflicts": 0,
   "noops": 0,
   "retries": 0,
   "failures": [
      {
         "index": "search.locations",
         "type": "location",
         "id": "86d20ff9-ea24-448c-95a4-f060a30a80ad",
         "cause": {
            "type": "mapper_parsing_exception",
            "reason": "failed to parse",
            "caused_by": {
               "type": "parse_exception",
               "reason": "latitude must be a number"
            }
         },
         "status": 400
      }
   ]
}

Elasticsearch version:
docker:latest (2.3)

example query

POST /_reindex
{
   "conflicts": "proceed",
   "source": {
      "index": "raw.locations"
   },
   "dest": {
      "index": "search.locations"
   }
}
@nik9000

Contributor

commented Apr 8, 2016

That doesn't look like a bug to me. Maybe a documentation bug.
conflicts=proceed only applies to version conflicts.

If you want to skip the documents with invalid geo points, maybe you can craft
the query to do so? You could probably also use a script to try to make
them valid. But reindex doesn't have support for skipping arbitrary errors.
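
One way to "craft the query" as suggested above might be to restrict the reindex source to documents where both coordinates have a value. The following is a minimal sketch in Python that only builds the request body. The index names and the `geo.lat` / `geo.lon` field paths come from the report; the helper name `build_reindex_body` is made up for illustration. It assumes null values are not indexed for these fields (no `null_value` in the source mapping), so an `exists` query will not match them.

```python
import json


def build_reindex_body(source_index, dest_index, required_fields):
    """Build a _reindex body that only copies docs where every required field has a value."""
    return {
        "source": {
            "index": source_index,
            "query": {
                "bool": {
                    # One exists clause per field; all of them must match.
                    "filter": [{"exists": {"field": f}} for f in required_fields]
                }
            },
        },
        "dest": {"index": dest_index},
    }


body = build_reindex_body("raw.locations", "search.locations", ["geo.lat", "geo.lon"])
print(json.dumps(body, indent=2))  # use this as the body of POST /_reindex
```

Documents with a null lat or lon are then never sent to the destination index, so they cannot fail parsing there. They are also not copied at all, which may or may not be what you want.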

eskibars added the >docs label Apr 8, 2016

@eskibars

Contributor

commented Apr 8, 2016

We should probably clarify a bit further in the documentation what "conflicts" means. It's a bit implicit from the fact that the response reports "version_conflicts", but I could see how somebody could miss that and assume this setting will skip more kinds of problems.
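
To make the distinction concrete, here is a small Python sketch that reads a reindex response and reports version conflicts separately from the entries under "failures". It uses only the response fields shown in the report above; the function name `summarize_reindex` and the keys of its return value are made up for illustration.

```python
def summarize_reindex(response):
    """Report version conflicts separately from the failures listed in the response."""
    failures = response.get("failures", [])
    return {
        # Counted by reindex; this is what "conflicts": "proceed" is about.
        "version_conflicts": response.get("version_conflicts", 0),
        # Everything listed here is a different kind of error, e.g. a mapping failure.
        "failures": [
            (f["id"], f["cause"]["type"], f["cause"].get("caused_by", {}).get("reason"))
            for f in failures
        ],
        "processed": response.get("created", 0) + response.get("updated", 0),
        "total": response.get("total", 0),
    }


# Trimmed copy of the response from the report.
response = {
    "total": 5211,
    "updated": 699,
    "created": 2100,
    "version_conflicts": 0,
    "failures": [
        {
            "index": "search.locations",
            "type": "location",
            "id": "86d20ff9-ea24-448c-95a4-f060a30a80ad",
            "cause": {
                "type": "mapper_parsing_exception",
                "reason": "failed to parse",
                "caused_by": {
                    "type": "parse_exception",
                    "reason": "latitude must be a number",
                },
            },
            "status": 400,
        }
    ],
}

summary = summarize_reindex(response)
print(summary)
```

For the response in the report, this shows zero version conflicts and one mapper_parsing_exception, with fewer documents processed than the total. That matches the observation that the task stopped on an error that "conflicts": "proceed" does not cover.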

@henningandersen

Contributor

commented Mar 26, 2019

I tend to think there is a bug with geo_point here. Since the mapping was marked ignore_malformed : true, the index request should have succeeded. Nothing to do with the "conflicts": "proceed" _reindex setting though.

I tried reproducing this on master by indexing a doc with null lat/lon. It gave me a different error from above:

put localhost:9200/x/_doc/3?pretty
{"geo":{"lat":null,"lon":null}}
{
  "error" : {
    "root_cause" : [
      {
        "type" : "mapper_parsing_exception",
        "reason" : "failed to parse"
      }
    ],
    "type" : "mapper_parsing_exception",
    "reason" : "failed to parse",
    "caused_by" : {
      "type" : "illegal_argument_exception",
      "reason" : "Malformed content, found extra data after parsing: END_OBJECT"
    }
  },
  "status" : 400
}

It seems the ignore_malformed handling has been put in place, but the parsing then mismatches the curly braces. This PR: #16833 seems to have added the ignore_malformed handling.

henningandersen added a commit to henningandersen/elasticsearch that referenced this issue Mar 26, 2019

Geo Point parse error fix
When geo point parsing threw a parse exception, it did not consume
remaining tokens from the parser. This in turn meant that
indexing documents with malformed geo points into mappings with
ignore_malformed=true would fail in some cases, since DocumentParser
expects geo_point parsing to end on the END_OBJECT token.

Related to elastic#17617
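
The failure mode described in this commit message can be illustrated with a toy token stream in Python. This is a simplified model under stated assumptions, not Elasticsearch's actual parser code: `Tokens`, `parse_point`, `skip_to_end`, and `parse_point_lenient` are all hypothetical names, and the point object is assumed to be flat. The idea is that a lenient wrapper must not only swallow the parse error but also consume the rest of the object, so the caller finds the cursor just past END_OBJECT.

```python
START, END = "START_OBJECT", "END_OBJECT"


class Tokens:
    """Tiny stand-in for a streaming parser: a list of tokens plus a cursor."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def next(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token


def parse_point(parser):
    """Parse START_OBJECT, (field, value) pairs, END_OBJECT; fail on a non-numeric value."""
    if parser.next() != START:
        raise TypeError("expected START_OBJECT")
    point = {}
    while True:
        token = parser.next()
        if token == END:
            return point
        name, value = token
        if not isinstance(value, (int, float)):
            # The cursor is now somewhere in the middle of the object.
            raise ValueError(name + " must be a number")
        point[name] = value


def skip_to_end(parser, depth=1):
    """Consume tokens until the object we were inside has been closed."""
    while depth > 0:
        token = parser.next()
        if token == START:
            depth += 1
        elif token == END:
            depth -= 1


def parse_point_lenient(parser):
    """Ignore a malformed point, but leave the cursor just after its END_OBJECT."""
    try:
        return parse_point(parser)
    except ValueError:
        skip_to_end(parser)
        return None


# {"lat": null, "lon": null} followed by another field of the enclosing document.
parser = Tokens([START, ("lat", None), ("lon", None), END, ("name", "x")])
result = parse_point_lenient(parser)
following = parser.next()
print(result, following)
```

Without the `skip_to_end` call, the cursor would still sit before `("lon", None)` after the error, and the caller would read leftover tokens of the point instead of the next field of the document.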

henningandersen added a commit to henningandersen/elasticsearch that referenced this issue Mar 27, 2019

Geo Point parse error fix
Improved XContentSubParser to allow any token, which is useful for
wrapping in cases where both object and values are allowed.

Related to elastic#17617

henningandersen added a commit to henningandersen/elasticsearch that referenced this issue Mar 28, 2019

Geo Point parse error fix
Reverted to a minimalistic change.

Related to elastic#17617

henningandersen added a commit that referenced this issue Mar 28, 2019

Geo Point parse error fix (#40447)
When geo point parsing threw a parse exception, it did not consume
remaining tokens from the parser. This in turn meant that
indexing documents with malformed geo points into mappings with
ignore_malformed=true would fail in some cases, since DocumentParser
expects geo_point parsing to end on the END_OBJECT token.

Related to #17617

henningandersen added three more commits that referenced this issue (two on Mar 29, 2019 and one on Apr 8, 2019), each with the same message:

Geo Point parse error fix (#40447)

@henningandersen

Contributor

commented Apr 8, 2019

Fixed the above-mentioned bug in geo_point parsing and clarified the meaning of conflicts: proceed in reindex. Closing this issue.
