WKT and GeoJSON's shape types need proper capitalization #49568

Closed
talevy opened this issue Nov 25, 2019 · 6 comments · Fixed by #50400
Labels
:Analytics/Geo Indexing, search aggregations of geo points and shapes >bug

@talevy
Contributor

talevy commented Nov 25, 2019

Situation

Elasticsearch's WKT and GeoJSON writers lowercase a shape's type.
So WKT's POINT is written as point, and GeoJSON's Point is written
as point. The specs for these formats explicitly define proper capitalization
rules.

Another point: Elasticsearch's parsers are case-insensitive, so we need to determine
whether that is OK to keep.
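To make the parsing side concrete, case-insensitive parsing amounts to a lookup keyed on the lower-cased name. The Python sketch below is only an illustration under that assumption. It is not Elasticsearch's Java code, and `parse_geojson_type_lenient` is a hypothetical name. It covers only the seven RFC 7946 types, whereas Elasticsearch also accepts non-standard types such as `circle`.

```python
# Lower-cased name -> canonical RFC 7946 spelling.
CANONICAL_GEOJSON_TYPES = {
    name.lower(): name
    for name in ("Point", "MultiPoint", "LineString", "MultiLineString",
                 "Polygon", "MultiPolygon", "GeometryCollection")
}


def parse_geojson_type_lenient(raw: str) -> str:
    """Accept a geometry type in any case and return its canonical spelling.

    Raises ValueError for names outside the seven RFC 7946 geometry types.
    """
    canonical = CANONICAL_GEOJSON_TYPES.get(raw.lower())
    if canonical is None:
        raise ValueError(f"unknown GeoJSON geometry type: {raw!r}")
    return canonical


print(parse_geojson_type_lenient("PoInT"))         # Point
print(parse_geojson_type_lenient("multipolygon"))  # MultiPolygon
```

Keeping the parser lenient while making the writers canonical is one possible outcome of the question above. Input stays forgiving, and output follows the specs.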

The only place where this seems to be a user-facing problem is the
circle processor:

PUT circles/_doc/1?pipeline=polygonize_circles
{ "circle": "CIRCLE (30 10 40)" }

PUT circles/_doc/2?pipeline=polygonize_circles
{ "circle": { "type": "circle", "coordinates": [101, 1], "radius": "100m" } }

GET circles/_search?filter_path=hits.hits._source.circle
...
        "_source" : {
          "circle" : "polygon ((30.000365257263184 10.0, 30.000111397193788 10.00034284530941, 29.999706043744222 10.000213571721195, 29.999706043744222 9.999786428278805, 30.000111397193788 9.99965715469059, 30.000365257263184 10.0))"
        }
      },
      {
        "_source" : {
          "circle" : {
            "type" : "polygon",
            "coordinates" : [
              [
                [
                  101.00090026855469,
                  1.0
                ],
                ...

GeoJSON Spec

link: https://tools.ietf.org/html/rfc7946

Inside this document, the term "geometry type" refers to seven
case-sensitive strings: "Point", "MultiPoint", "LineString",
"MultiLineString", "Polygon", "MultiPolygon", and
"GeometryCollection".
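Because these names are case-sensitive, a strict RFC 7946 consumer may reject `point` or `multipolygon` outright. The following Python sketch shows such a check. It assumes the geometry has already been parsed into a dict, and `has_rfc7946_type` is a hypothetical name. Only the `type` members are validated here, not the coordinates.

```python
GEOJSON_GEOMETRY_TYPES = frozenset({
    "Point", "MultiPoint", "LineString", "MultiLineString",
    "Polygon", "MultiPolygon", "GeometryCollection",
})


def has_rfc7946_type(geometry: dict) -> bool:
    """True if the "type" member (and those of any GeometryCollection
    members) is one of the seven case-sensitive RFC 7946 names."""
    kind = geometry.get("type")
    if not isinstance(kind, str) or kind not in GEOJSON_GEOMETRY_TYPES:
        return False
    if kind == "GeometryCollection":
        members = geometry.get("geometries")
        if not isinstance(members, list):
            return False
        return all(isinstance(g, dict) and has_rfc7946_type(g) for g in members)
    return True
```

With this check, the lower-cased `"type" : "polygon"` output shown above would fail, while `"Polygon"` would pass.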

WKT Spec

link: http://docs.opengeospatial.org/is/18-010r7/18-010r7.html

Keywords are case-insensitive. Where human readability of the string is important, as in this document, keywords are normally in upper case.
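Since WKT keywords are case-insensitive, output such as `polygon ((...))` is still valid WKT. It can also be brought to the conventional upper case without parsing the geometry. The rough Python sketch below assumes well-formed WKT in which every alphabetic token that does not follow a digit or decimal point is a keyword. `uppercase_wkt_keywords` is a hypothetical helper, and the approach is a heuristic rather than a full WKT parser.

```python
import re

# Alphabetic runs that do not follow a digit or a decimal point are WKT
# keywords (geometry names, EMPTY, Z, M, ...). The lookbehind keeps the
# exponent marker in numbers such as 1.5e-3 untouched.
_WKT_KEYWORD = re.compile(r"(?<![\d.])[A-Za-z]+")


def uppercase_wkt_keywords(wkt: str) -> str:
    """Rewrite the keywords of a well-formed WKT string in upper case."""
    return _WKT_KEYWORD.sub(lambda m: m.group(0).upper(), wkt)


print(uppercase_wkt_keywords("polygon ((30 10, 31 10, 30 11, 30 10))"))
# POLYGON ((30 10, 31 10, 30 11, 30 10))
```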

Proposed Fix

To adhere to the GeoJSON spec, the geometry types must use proper
capitalization. The WKT spec is flexible, but since it is common practice
to capitalize the types for human readability, it would be nice to
capitalize them in WKT as well.
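In practice, the proposed fix means keeping one spelling per output format instead of reusing a lower-case internal name. The Python sketch below illustrates that idea. It assumes a hypothetical internal shape of `(kind, coordinates)` with a lower-case `kind`, and it covers only points, line strings and polygons. It is not Elasticsearch's actual writer code.

```python
import json

# kind -> (GeoJSON name, WKT keyword); the kinds are a hypothetical
# lower-case internal representation, as the writers used to emit.
TYPE_NAMES = {
    "point": ("Point", "POINT"),
    "linestring": ("LineString", "LINESTRING"),
    "polygon": ("Polygon", "POLYGON"),
}


def _wkt_coords(kind, coords):
    """Render coordinates in WKT syntax for the given kind."""
    if kind == "point":
        return f"{coords[0]} {coords[1]}"
    if kind == "linestring":
        return ", ".join(f"{x} {y}" for x, y in coords)
    # polygon: a list of linear rings, each wrapped in parentheses
    return ", ".join("(" + _wkt_coords("linestring", ring) + ")" for ring in coords)


def to_geojson(kind, coords):
    """GeoJSON: use the exact case-sensitive RFC 7946 name."""
    return json.dumps({"type": TYPE_NAMES[kind][0], "coordinates": coords})


def to_wkt(kind, coords):
    """WKT: keywords are case-insensitive, so use the conventional upper case."""
    return f"{TYPE_NAMES[kind][1]} ({_wkt_coords(kind, coords)})"


print(to_wkt("polygon", [[[30, 10], [31, 10], [30, 11], [30, 10]]]))
# POLYGON ((30 10, 31 10, 30 11, 30 10))
print(to_geojson("point", [101.0, 1.0]))
# {"type": "Point", "coordinates": [101.0, 1.0]}
```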

@talevy talevy added :Analytics/Geo Indexing, search aggregations of geo points and shapes >bug labels Nov 25, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Geo)

imotov added a commit to imotov/elasticsearch that referenced this issue Dec 17, 2019
Switches generated WKT to upper case to
conform to the standard recommendation.

Relates elastic#49568
imotov added a commit that referenced this issue Dec 18, 2019
Switches generated WKT to upper case to
conform to the standard recommendation.

Relates #49568
imotov added a commit to imotov/elasticsearch that referenced this issue Dec 19, 2019
Switches generated WKT to upper case to conform to
the standard recommendation.

Closes elastic#49568
imotov added a commit that referenced this issue Dec 20, 2019
Switches generated GeoJson type names to camel case
to conform to the standard.

Closes #49568
@MaxHammermann

MaxHammermann commented Jan 16, 2022

Is this really fixed?
On ES 7.14 (Docker) I see this:

    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 8.791588,
        "hits": [{
            "_index": "spatial_simplified_0.001_ogrimport_precision_ms",
            "_type": "_doc",
            "_id": "jE3kVH4BJtEXbncSuIz8",
            "_score": 8.791588,
            "_source": {
                "ogc_fid": 205,
                "geometry": {
                    "type": "multipolygon",
                    "coordinates": [
                        [
                            [
                                [-7.431,
                                    4.351
                                ],
                                [-7.442,
                                    4.348
                                ],
                                [-7.466,
                                    4.345
                                ],
                                [-7.498,
                                    4.347
                                ],
                                ...
                                [
                                ...
                                ]
                            ]
                        ]
                    ]
                }
            }
        }]
    }

@iverase
Contributor

iverase commented Jan 17, 2022

Yes, this is fixed, but note that the fix only applies to GeoJSON generated by Elasticsearch, not to the GeoJSON you send to Elasticsearch. To explain, let's look at the following example. First we create an index:

PUT /example
{
  "mappings": {
    "properties": {
      "location": {
        "type": "geo_shape"
      }
    }
  }
}

When ingesting GeoJSON, for example, Elasticsearch is relaxed about the capitalization of the geometry type and accepts any. Here we ingest the following four points, each spelling Point with a different mix of upper and lower case:

POST /example/_doc
{
  "location" : {
    "type" : "point",
    "coordinates" : [-77.03653, 38.897676]
  }
}

POST /example/_doc
{
  "location" : {
    "type" : "POINT",
    "coordinates" : [-77.03653, 38.897676]
  }
}

POST /example/_doc
{
  "location" : {
    "type" : "Point",
    "coordinates" : [-77.03653, 38.897676]
  }
}

POST /example/_doc
{
  "location" : {
    "type" : "PoInT",
    "coordinates" : [-77.03653, 38.897676]
  }
}

All four points are ingested successfully. If we retrieve the documents using the default fetch strategy, Elasticsearch returns the original documents (the _source) and therefore the original capitalisation:

GET example/_search

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "example",
        "_type" : "_doc",
        "_id" : "xcnpZn4BEFagotpY5560",
        "_score" : 1.0,
        "_source" : {
          "location" : {
            "type" : "point",
            "coordinates" : [
              -77.03653,
              38.897676
            ]
          }
        }
      },
      {
        "_index" : "example",
        "_type" : "_doc",
        "_id" : "xsnpZn4BEFagotpY8J4J",
        "_score" : 1.0,
        "_source" : {
          "location" : {
            "type" : "POINT",
            "coordinates" : [
              -77.03653,
              38.897676
            ]
          }
        }
      },
      {
        "_index" : "example",
        "_type" : "_doc",
        "_id" : "x8npZn4BEFagotpY-55A",
        "_score" : 1.0,
        "_source" : {
          "location" : {
            "type" : "Point",
            "coordinates" : [
              -77.03653,
              38.897676
            ]
          }
        }
      },
      {
        "_index" : "example",
        "_type" : "_doc",
        "_id" : "yMnqZn4BEFagotpYA57A",
        "_score" : 1.0,
        "_source" : {
          "location" : {
            "type" : "PoInT",
            "coordinates" : [
              -77.03653,
              38.897676
            ]
          }
        }
      }
    ]
  }
}

This is what you are seeing. If we change the strategy and use the recently added fields API, we ask Elasticsearch to generate the GeoJSON, so it comes back with standard capitalisation:

GET example/_search
{
  "_source": false,
  "fields": ["location"]
}


{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "example",
        "_type" : "_doc",
        "_id" : "xcnpZn4BEFagotpY5560",
        "_score" : 1.0,
        "fields" : {
          "location" : [
            {
              "coordinates" : [
                -77.03653,
                38.897676
              ],
              "type" : "Point"
            }
          ]
        }
      },
      {
        "_index" : "example",
        "_type" : "_doc",
        "_id" : "xsnpZn4BEFagotpY8J4J",
        "_score" : 1.0,
        "fields" : {
          "location" : [
            {
              "coordinates" : [
                -77.03653,
                38.897676
              ],
              "type" : "Point"
            }
          ]
        }
      },
      {
        "_index" : "example",
        "_type" : "_doc",
        "_id" : "x8npZn4BEFagotpY-55A",
        "_score" : 1.0,
        "fields" : {
          "location" : [
            {
              "coordinates" : [
                -77.03653,
                38.897676
              ],
              "type" : "Point"
            }
          ]
        }
      },
      {
        "_index" : "example",
        "_type" : "_doc",
        "_id" : "yMnqZn4BEFagotpYA57A",
        "_score" : 1.0,
        "fields" : {
          "location" : [
            {
              "coordinates" : [
                -77.03653,
                38.897676
              ],
              "type" : "Point"
            }
          ]
        }
      }
    ]
  }
}
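A client that needs spec-compliant type names can combine the two strategies above. It can prefer the generated GeoJSON under `fields` and otherwise normalize the original `_source`. The Python sketch below assumes that search hits have already been decoded into dicts with the shapes shown in these responses, and that the geometry lives in a top-level field. `geometries_from_hit` and `_normalize` are hypothetical helpers, not part of any Elasticsearch client.

```python
# Canonical RFC 7946 spellings, keyed by their lower-case form.
_CANONICAL = {
    name.lower(): name
    for name in ("Point", "MultiPoint", "LineString", "MultiLineString",
                 "Polygon", "MultiPolygon", "GeometryCollection")
}


def _normalize(geometry: dict) -> dict:
    """Return a copy of a GeoJSON geometry with canonical type names.

    Unknown types (e.g. Elasticsearch's non-standard "circle") are left as-is.
    """
    fixed = dict(geometry)
    kind = fixed.get("type")
    if isinstance(kind, str):
        fixed["type"] = _CANONICAL.get(kind.lower(), kind)
    if isinstance(fixed.get("geometries"), list):
        fixed["geometries"] = [
            _normalize(g) if isinstance(g, dict) else g for g in fixed["geometries"]
        ]
    return fixed


def geometries_from_hit(hit: dict, field: str) -> list:
    """Return the geometries of a top-level field from one search hit.

    Prefers the GeoJSON generated under "fields" (already canonical) and
    falls back to normalizing the original "_source" value.
    """
    fields = hit.get("fields") or {}
    if field in fields:
        return fields[field]
    value = (hit.get("_source") or {}).get(field)
    if value is None:
        return []
    values = value if isinstance(value, list) else [value]
    # _source may also hold WKT strings; only GeoJSON objects are touched here.
    return [_normalize(v) if isinstance(v, dict) else v for v in values]


hit = {"_source": {"location": {"type": "PoInT", "coordinates": [-77.03653, 38.897676]}}}
print(geometries_from_hit(hit, "location"))
# [{'type': 'Point', 'coordinates': [-77.03653, 38.897676]}]
```

Normalizing on the client only changes what the application sees. The stored _source keeps whatever capitalisation was ingested.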

Hope this helps.

@MaxHammermann

Thank you very much for clearing this up, this really helps!

Do you know if the default Kibana strategy uses the fields API?
If not, I have to check my ingested values again, as I thought they were definitely indexed following the GeoJSON standard (by GDAL ogr2ogr). Therefore I expected to get the original values, i.e. the capitalized type fields.

Thanks again! :)

@iverase
Contributor

iverase commented Jan 18, 2022

Do you know if the default Kibana strategy uses the fields API?
Not sure what you mean here; are you referring to the Maps application in Kibana? I think it currently doesn't use it, but I would expect Maps to be able to handle GeoJSON whose capitalisation differs from the standard.

@MaxHammermann

Oh sorry, I was very unspecific there.

I didn't mean the Maps application but rather the standard search run from the search bar in Discover, because the results there seem to be correct (capitalized).
Since I currently only get correct results with your fields API approach, this could indicate that Kibana uses the same strategy in Discover.

Nonetheless, I will check whether my Spring Boot backend and the ElasticsearchRepository can be adapted to use the fields API for retrieving GeoJSON.

Thank you Ignacio!
