Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cut over geo_point field and queries to new LatLonPoint type #20315

Merged
merged 2 commits into from
Sep 13, 2016

Conversation

nknize
Copy link
Contributor

@nknize nknize commented Sep 2, 2016

This PR cuts over geo_point fields to use Lucene's new point-based LatLonPoint type for indexes created in 5.0. Indexes created prior to 5.0 continue to use their respective encoding type. Below is a description of the changes made to support the new encoding type:

  • New indexes use a new LatLonPointFieldMapper which provides a parse method for the new type
  • The new LatLonPoint parse method removes support for deprecated lat_lon and geohash parameters
  • Backcompat testing for deprecated lat_lon and geohash parameters is added to all unit and integration tests
  • LatLonPointFieldMapper provides DocValues support (enabled by default) which uses Lucene's new LatLonDocValuesField type
  • New LatLonPoint field data classes are added for aggregation support (wraps LatLonPoint's Numeric Doc Values)
  • MultiFields use the geohash as the string value instead of the lat,lon string. This makes it easier to perform geo queries on the geohash instead of a useless lat,lon comma delimited string.

Removed Features:

  • With the removal of geohash indexing, GeoHashCellQuery support is removed for all new indexes (still supported on existing indexes)
  • LatLonPoint does not support a Distance Range query because it is super inefficient. Instead, geo_distance_range queries should be accomplished using either the geo_distance aggregation, sorting by descending distance on a geo_distance query, or a boolean must not using the excluded distance (distance_range query was syntactic sugar for this).

closes #20314

@nknize nknize added review WIP :Analytics/Geo Indexing, search aggregations of geo points and shapes :Search Foundations/Mapping Index mappings, including merging and defining field types v5.0.0-beta1 labels Sep 2, 2016
@clintongormley
Copy link

@colings86 @rjernst @mikemccand please could you review this - @jpountz is away

import java.io.IOException;

/**
* Created by nknize on 8/23/16.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe change these boilerplates to a real javadoc?

@mikemccand
Copy link
Contributor

Thanks @nknize, I looked at the Lucene usage here and left a few small comments.

Are you planning on exposing LatLonPoint.newPolygonQuery and .newDistanceSort too?

LatLonPoint also offers very fast nearest neighbor search, but we should probably do that separately.

@nknize
Copy link
Contributor Author

nknize commented Sep 8, 2016

Thanks for the review @mikemccand! I'll address the feedback and update the PR.

Are you planning on exposing LatLonPoint.newPolygonQuery and .newDistanceSort too?

newPolygonQuery does need to be added to this PR, and I'll look at adding newDistanceSort but I don't think that one's a blocker?

I'm also planning to add nearest neighbor in another PR,.

@mikemccand
Copy link
Contributor

newPolygonQuery does need to be added to this PR, and I'll look at adding newDistanceSort but I don't think that one's a blocker?

++, these both can be done separately.

@nknize nknize force-pushed the latlonpoint_mapper branch 6 times, most recently from ab47328 to ba0b77d Compare September 9, 2016 20:54
@nknize nknize removed the WIP label Sep 9, 2016
@mikemccand
Copy link
Contributor

The Lucene parts LGTM, thanks @nknize!

I do think .newDistanceSort is important since the optimized implementation in Lucene is extremely fast, but this can be done separately.

@@ -143,7 +143,16 @@ public abstract Y build(BuilderContext context, String simpleName, MappedFieldTy
FieldMapper geoHashMapper, MultiFields multiFields, Explicit<Boolean> ignoreMalformed, CopyTo copyTo);

public Y build(Mapper.BuilderContext context) {
GeoPointFieldType geoPointFieldType = (GeoPointFieldType)fieldType;
// version 5.0 cuts over to LatLonPoint and no longer indexes geohash, or lat/lon separately
if (context.indexCreatedVersion().before(Version.V_5_0_0_alpha6)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this must be updated to onOrBefore

@s1monw
Copy link
Contributor

s1monw commented Sep 13, 2016

this look great, @nknize I notice that this removes tons of docs but doesn't add new docs how is this going to work? do we have to add docs later?

@s1monw
Copy link
Contributor

s1monw commented Sep 13, 2016

test this please

@nknize
Copy link
Contributor Author

nknize commented Sep 13, 2016

Thanks @s1monw! I talked to @clintongormley about updating docs in a separate PR to make it easier to review.

This commit cuts over geo_point fields to use Lucene's new point-based LatLonPoint type for indexes created in 5.0. Indexes created prior to 5.0 continue to use their respective encoding type. Below is a description of the changes made to support the new encoding type:

* New indexes use a new LatLonPointFieldMapper which provides a parse method for the new type
* The new LatLonPoint parse method removes support for lat_lon and geohash parameters
* Backcompat testing for deprecated lat_lon and geohash parameters is added to all unit and integration tests
* LatLonPointFieldMapper provides DocValues support (enabled by default) which uses Lucene's new LatLonDocValuesField type
* New LatLonPoint field data classes are added for aggregation support (wraps LatLonPoint's Numeric Doc Values)
* MultiFields use the geohash as the string value instead of the lat,lon string making it easier to perform geo string queries on the geohash instead of a lat,lon comma delimited string.

Removed Features:

* With the removal of geohash indexing, GeoHashCellQuery support is removed for all new indexes (still supported on existing indexes)
* LatLonPoint does not support a Distance Range query because it is super inefficient. Instead, the geo_distance_range query should be accomplished using either the geo_distance aggregation, sorting by descending distance on a geo_distance query, or a boolean must not of the excluded distance (which is what the distance_range query did anyway).

TODO:

* fix/finish yaml changes for plugin and rest integration tests
* update documentation
This commit removes documentation for:

* geohash cell query
* lat_lon parameter
* geohash parameter
* geohash_precision parameter
* geohash_prefix parameter

It also updates failing tests that reference these parameters for backcompat.
@mikemccand
Copy link
Contributor

@nknize I think the newPolygonQueryhere is inefficient in the multi-polygon case? It looks like (correct me if this is wrong!!) if e.g. a user queries on the Russia polygon (many top level polygons, each with many holes) that ES will make a very large BooleanQuery as a result, instead of passing that multi-polygon to LatLonPoint (which is quite a bit more efficient)?

Should I open a new issue to fix this?

@nknize
Copy link
Contributor Author

nknize commented Oct 6, 2016

Nice catch @mikemccand I agree the multi-poly should be passed to LatLonPoint. Go ahead and open the issue and I'll make the change. Thx!

@mikemccand
Copy link
Contributor

Thanks @nknize ... I opened #20789.

@clintongormley clintongormley mentioned this pull request Oct 8, 2016
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this pull request Oct 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Geo Indexing, search aggregations of geo points and shapes >feature :Search Foundations/Mapping Index mappings, including merging and defining field types v5.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cut over geo_point fields to use Lucene's new LatLonPoint on new indexes
5 participants