Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Further improve robustness of geo shape parser for malformed shapes #34498

Conversation

imotov
Copy link
Contributor

@imotov imotov commented Oct 15, 2018

Continuation of the work in #31449. Ensures that malformed geoshapes are
reliably ignored if "ignore_malformed" is set to true instead of failing
the entire document by making sure that xcontent parse is left in a
coherent state even if a data format parsing error occurred.

Fixes #34047

Continuation of the work in elastic#31449. Ensures that malformed geoshapes are
reliably ignored if "ignore_malformed" is set to true instead of failing
the entire document by making sure that xcontent parse is left in a
coherent state even if a data format parsing error occurred.

Fixes elastic#34047
@imotov imotov added >bug :Analytics/Geo Indexing, search aggregations of geo points and shapes v7.0.0 v6.5.0 labels Oct 15, 2018
@imotov imotov requested a review from nknize October 15, 2018 21:41
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

@nknize
Copy link
Contributor

nknize commented Oct 17, 2018

Pinging @jtibshirani to see if she has interest (and time) to review.

@imotov
Copy link
Contributor Author

imotov commented Oct 17, 2018

retest this please

@jtibshirani
Copy link
Contributor

Happy to take a look! I had one high-level question before I jumped in. It's a little tricky to make sure we cover all the places where an exception can occur, and recover appropriately (especially as the code evolves). One thought that occurred to me is whether we could copy the GeoJSON structure into a separate, temporary parser, and if parsing failed, we could just skip the whole object on the main parser. There are performance downsides to this approach, but wanted to put it out there in case it could be interesting.

@imotov
Copy link
Contributor Author

imotov commented Oct 18, 2018

One thought that occurred to me is whether we could copy the GeoJSON structure into a separate, temporary parser, and if parsing failed, we could just skip the whole object on the main parser.

I like the idea, let me see what I can do there.

@colings86 colings86 added v6.6.0 and removed v6.5.0 labels Oct 25, 2018
@imotov
Copy link
Contributor Author

imotov commented Nov 15, 2018

After thinking about it, I don't think we can afford coping the GeoJSON structure into a separate parser because these shapes can be quite huge. However, I think I found a more robust way of doing this that will provide similar functionality without the copying overhead. It is a dramatically different approach so I am going to open a different PR for it in a bit.

@imotov imotov closed this Nov 15, 2018
imotov added a commit to imotov/elasticsearch that referenced this pull request Nov 15, 2018
Adds a method to XContent parser to skip all children of a current
element in case of the parsing failure and applies this method to be
able to ignore the rest of the GeoJson shape if the parsing fails and
we need to ignore the geoshape due to the ignore malformed flag.

Supersedes elastic#34498

Closes elastic#34047
imotov added a commit to imotov/elasticsearch that referenced this pull request Nov 15, 2018
Adds a method to XContent parser to skip all children of a current
element in case of the parsing failure and applies this method to be
able to ignore the rest of the GeoJson shape if the parsing fails and
we need to ignore the geoshape due to the ignore malformed flag.

Supersedes elastic#34498

Closes elastic#34047
imotov added a commit that referenced this pull request Nov 21, 2018
…5603)

Adds an XContent sub parser class that can to wrap another
XContent parser at the beginning of an object and allow skiping
all children in case of the parsing failure. It also uses this
subparser to ignore the rest of the GeoJson shape if the 
parsing fails and we need to ignore the geoshape due to the 
ignore_malformed flag.

Supersedes #34498

Closes #34047
imotov added a commit that referenced this pull request Nov 22, 2018
…5603)

Adds an XContent sub parser class that can to wrap another
XContent parser at the beginning of an object and allow skiping
all children in case of the parsing failure. It also uses this
subparser to ignore the rest of the GeoJson shape if the
parsing fails and we need to ignore the geoshape due to the
ignore_malformed flag.

Supersedes #34498

Closes #34047
@imotov imotov deleted the issue-34047-more-robust-error-handling-in-geojson-parser branch May 1, 2020 22:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Geo Indexing, search aggregations of geo points and shapes >bug v6.6.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants