Cannot index some geo_shape geometries (before and after Geo Refactoring changes) #3909
Some chat from IRC with opinions about whether this belongs in ES or not... :)
Hi @mvjars, I think the GeoJson is technically valid. The problem comes from the underlying libraries we use to handle geo data inside ES. These libraries make some assumptions about the data, and those assumptions are checked before the data gets indexed. In my opinion these assumptions need not be checked when the GeoJson is indexed, but currently we use these libraries for indexing geo data as well. I'm working on separating the GeoJson format specification from the logic layer and moving this kind of exception into the logic layer.
Is there any update on this issue?
@nknize any ideas here?
@lababidi Out of curiosity, why make this a polygon type? Why not change it to a multipoint?
@nknize Thanks for checking in. I'm not sure I understand your question, so let me try to restate the issue succinctly. Twitter provides the following object. Elasticsearch rejects it because its coordinate array has only 4 positions rather than 5. This strict requirement is not consistent with GeoJSON. Please allow Elasticsearch to accept this object out of the box, because it is technically legal GeoJSON. Currently I must write a middle-man to add a 5th element (cue Bruce Willis jokes here) to this array of coordinates. This is a bit kludgy in terms of workflow:
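The middle-man described above amounts to closing the ring: appending a copy of the first position when the last position differs from it. The following is a minimal sketch under stated assumptions: positions are held as `[lon, lat]` `double[]` pairs, and the class `RingCloser` and method `closeRing` are hypothetical illustrations, not part of Elasticsearch or Spatial4J.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical pre-processing helper (not an Elasticsearch API): returns a
 * closed copy of a polygon ring, appending the first position if the ring
 * does not already end where it starts.
 */
final class RingCloser {

    private RingCloser() {}

    /** Positions are assumed to be [lon, lat] pairs; the input list is not modified. */
    static List<double[]> closeRing(List<double[]> ring) {
        if (ring.isEmpty()) {
            throw new IllegalArgumentException("ring must contain at least one position");
        }
        List<double[]> closed = new ArrayList<>(ring);
        double[] first = ring.get(0);
        double[] last = ring.get(ring.size() - 1);
        if (!Arrays.equals(first, last)) {
            // Append a copy so the closing position does not alias the first array.
            closed.add(first.clone());
        }
        return closed;
    }
}
```

For example, applying `closeRing` to a four-corner box with no closing position returns five positions whose last entry equals the first; an already-closed ring comes back with the same number of positions.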
@lababidi Thanks for the update. I mistook the problem you're describing for the original issue (repeated points in a polygon). I agree we should accept Polygons like this and close them for you.
This is excellent news. I'd be happy to contribute code if your plate is full. Just point me to the preferred location for this logic and I will open a pull request.
Probably for the attention of @chilling.
I have a mapping that includes some geo_shape fields. My test data contains GeoJSON fields that specify points, but do so in the form of polygons with the same lon/lat repeated 4 (or sometimes 3) times. This causes a validation exception to be thrown up from within Spatial4J when it generates the polygon.
I am not sure whether this is technically invalid GeoJSON; however, it is a form that the Twitter API generates frequently (around 1 in 1500 tweets in my dataset), and other libraries I have used can parse it. It would be good if ES attempted to do what it can with data like this rather than failing (e.g. treat it as another appropriate type, such as a point, or relax the validation).
This is tested against the branch that includes the Geo-Refactoring improvements at chilling/elasticsearch@0369983 (it does also happen on master, if you can get that far!).
The stack trace:
To reproduce:
I wrote a nasty, hacky workaround to fix this particular case - see below. This is obviously not an acceptable general purpose solution, as it doesn't address the issue for any other shapes, or other cases like only having 2 or 3 distinct points. I am not really familiar enough with the libraries, but I guess a real fix might involve simplifying or normalizing each geometry somehow before it gets passed to Spatial4J?
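One way to "normalize each geometry" before it reaches Spatial4J is to count the distinct vertices in a ring and pick a geometry type to match: one distinct vertex becomes a point, two become a line string, and three or more stay a polygon. The sketch below is a hypothetical illustration of that idea, not the workaround from this issue and not an Elasticsearch or Spatial4J API; `RingNormalizer`, `distinctVertices`, and `classify` are invented names, and positions are assumed to be `[lon, lat]` `double[]` pairs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that classifies a possibly degenerate polygon ring. */
final class RingNormalizer {

    enum Kind { POINT, LINESTRING, POLYGON }

    private RingNormalizer() {}

    /**
     * Collapses consecutive duplicate positions and drops a closing position
     * that repeats the first one. Non-consecutive repeats are kept.
     */
    static List<double[]> distinctVertices(List<double[]> ring) {
        List<double[]> out = new ArrayList<>();
        for (double[] p : ring) {
            if (out.isEmpty() || !Arrays.equals(out.get(out.size() - 1), p)) {
                out.add(p);
            }
        }
        if (out.size() > 1 && Arrays.equals(out.get(0), out.get(out.size() - 1))) {
            out.remove(out.size() - 1);
        }
        return out;
    }

    /**
     * Chooses a geometry type from the number of distinct vertices.
     * This sketch does not detect collinear (zero-area) rings.
     */
    static Kind classify(List<double[]> ring) {
        int n = distinctVertices(ring).size();
        if (n == 0) {
            throw new IllegalArgumentException("ring must contain at least one position");
        }
        if (n == 1) {
            return Kind.POINT;
        }
        if (n == 2) {
            return Kind.LINESTRING;
        }
        return Kind.POLYGON;
    }
}
```

With this approach, a "polygon" consisting of the same lon/lat repeated four times would be indexed as a point instead of being rejected; a fuller version would also need to handle collinear rings and non-consecutive repeats.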
Alternatively, it could be decided that this is not ES's job and that the GeoJSON needs to be better formed, but I think this is likely to be a common problem because of where this data comes from and because different GeoJSON libraries have different expectations.
ShapeBuilder.java: