Further improve robustness of geo shape parser for malformed shapes #34498
Continuation of the work in elastic#31449. Ensures that malformed geoshapes are reliably ignored if "ignore_malformed" is set to true, instead of failing the entire document, by making sure that the XContent parser is left in a coherent state even if a data format parsing error occurs. Fixes elastic#34047
Pinging @elastic/es-search-aggs
Pinging @jtibshirani to see if she has interest (and time) to review.
retest this please
Happy to take a look! I had one high-level question before I jumped in. It's a little tricky to make sure we cover all the places where an exception can occur and recover appropriately (especially as the code evolves). One thought that occurred to me is whether we could copy the GeoJSON structure into a separate, temporary parser, and if parsing failed, just skip the whole object on the main parser. There are performance downsides to this approach, but I wanted to put it out there in case it could be interesting.
I like the idea, let me see what I can do there.
After thinking about it, I don't think we can afford copying the GeoJSON structure into a separate parser, because these shapes can be quite huge. However, I think I found a more robust way of doing this that will provide similar functionality without the copying overhead. It is a dramatically different approach, so I am going to open a different PR for it in a bit.
Adds a method to the XContent parser that skips all children of the current element when parsing fails, and uses it to skip the rest of a GeoJSON shape when parsing fails and the geoshape has to be ignored because of the ignore_malformed flag. Supersedes elastic#34498 Closes elastic#34047
…5603) Adds an XContent sub parser class that can wrap another XContent parser at the beginning of an object and allows skipping all remaining children if parsing fails. It also uses this subparser to ignore the rest of the GeoJSON shape when parsing fails and the geoshape has to be ignored because of the ignore_malformed flag. Supersedes #34498 Closes #34047
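The sub parser idea described in that commit message can be illustrated with a small, self-contained sketch. This is not the Elasticsearch implementation: `TokenStream`, `SubParser`, and the token list are hypothetical stand-ins for a streaming parser, used only to show how tracking nesting depth lets the wrapper consume the rest of an object after a failure, so that the outer parser resumes at a well-defined position without copying the object.

```java
import java.util.Arrays;
import java.util.List;

final class SubParserSketch {
    enum Token { START_OBJECT, END_OBJECT, START_ARRAY, END_ARRAY, FIELD_NAME, VALUE }

    /** Hypothetical stand-in for a streaming parser: a cursor over a token list. */
    static final class TokenStream {
        private final List<Token> tokens;
        private int pos = -1;
        TokenStream(List<Token> tokens) { this.tokens = tokens; }
        Token next() { pos++; return pos < tokens.size() ? tokens.get(pos) : null; }
        Token current() { return pos >= 0 && pos < tokens.size() ? tokens.get(pos) : null; }
        int position() { return pos; }
    }

    /** Wraps a stream positioned on START_OBJECT and never reads past the matching END_OBJECT. */
    static final class SubParser implements AutoCloseable {
        private final TokenStream parent;
        private int level = 1; // the object we were opened on counts as depth 1

        SubParser(TokenStream parent) {
            if (parent.current() != Token.START_OBJECT) {
                throw new IllegalStateException("sub parser must start on START_OBJECT");
            }
            this.parent = parent;
        }

        Token next() {
            if (level == 0) return null; // the wrapped object is already fully consumed
            Token t = parent.next();
            if (t == null) { level = 0; return null; } // truncated input: stop here
            if (t == Token.START_OBJECT || t == Token.START_ARRAY) level++;
            else if (t == Token.END_OBJECT || t == Token.END_ARRAY) level--;
            return t;
        }

        /** Skips whatever is left of the wrapped object, however deeply nested. */
        @Override public void close() {
            while (level > 0 && next() != null) { /* discard */ }
        }
    }

    // Token view of: {"shape": {"type": ..., "coordinates": [[x, y]]}, "name": ...}
    static List<Token> sampleDocument() {
        return Arrays.asList(
            Token.START_OBJECT, Token.FIELD_NAME,            // {"shape":
            Token.START_OBJECT, Token.FIELD_NAME, Token.VALUE, //   {"type": ...
            Token.FIELD_NAME, Token.START_ARRAY, Token.START_ARRAY,
            Token.VALUE, Token.VALUE, Token.END_ARRAY, Token.END_ARRAY,
            Token.END_OBJECT,                                 //   }
            Token.FIELD_NAME, Token.VALUE, Token.END_OBJECT); // "name": ... }
    }

    /** Simulates a shape parser that fails after reading some tokens; returns where the outer stream ends up. */
    static int positionAfterFailure(int tokensBeforeFailure) {
        TokenStream main = new TokenStream(sampleDocument());
        main.next(); main.next(); main.next(); // now on the shape's START_OBJECT
        try (SubParser sub = new SubParser(main)) {
            for (int i = 0; i < tokensBeforeFailure; i++) sub.next();
            throw new IllegalArgumentException("simulated malformed shape");
        } catch (IllegalArgumentException e) {
            // swallowed, as an ignore_malformed-style setting would do
        }
        return main.position();
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, 3, 5, 10}) {
            System.out.println("failed after " + n + " tokens, outer position = " + positionAfterFailure(n));
        }
    }
}
```

Wherever the simulated failure happens, `close()` leaves the outer stream on the shape's closing `END_OBJECT`, so the next token the outer parser sees is the following field name. That is the "coherent state" the description asks for, achieved by counting depth rather than by buffering the shape.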