Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Geo-search #44

Open
fulmicoton opened this issue Oct 17, 2016 · 16 comments
Open

Geo-search #44

fulmicoton opened this issue Oct 17, 2016 · 16 comments

Comments

@fulmicoton
Copy link
Collaborator

This would require a longlat field type... (and coords if we want simple 2D as well

Then the user would want to perform queries such as

  • return document at a distance of less than X km
  • return document within this polygon
@Fiedzia
Copy link
Contributor

Fiedzia commented Feb 22, 2017

Two more practical requirements that I would love to have:

  1. Store multiple values
  2. Return which value was found

those two things have a range of usecases that are very practical for me

@hwchen
Copy link

hwchen commented Oct 25, 2018

Curious about the status of this issue. Is this still a desired feature?

I have a need for geo queries in a future project, and would be willing to help implement, although I might need some guidance.

@fulmicoton
Copy link
Collaborator Author

@hwchen Sorry about the late answer

This is a desired feature but this is also a lot of work.
I'm thinking of following Lucene idea of using Bkd Tree as it fits well the search indexing scheme.

You can work on it if you are confident, but be careful... This is one of the hardest ticket around and
I won't be able to give much guidance.

@hwchen
Copy link

hwchen commented Nov 7, 2018

Thanks, I'll keep this in mind. Maybe in a year or so I'll pick this up if nobody else has :)

@natew
Copy link

natew commented Dec 3, 2019

Plus one on this feature, very relevant to any local app - Uber Eats, Yelp, Maps, Delivery.

@shuoli84
Copy link

FYI: did a quick search, found one nice bkd tree impl https://github.com/peermaps/eyros

@PSeitz
Copy link
Contributor

PSeitz commented Aug 12, 2021

Nice, rucene also has a bkd implementation https://github.com/zhihu/rucene/blob/master/src/core/util/bkd/bkd_writer.rs

@shuoli84
Copy link

FYI: did a quick search, found one nice bkd tree impl https://github.com/peermaps/eyros

Hmm, seems the license of eyros is not compatible with MIT.

@shuoli84
Copy link

Another FYI.

From https://www.elastic.co/blog/lucene-points-6.0, the blog which announced lucene spatial support, there is a quote on future work:

Currently, dimensional values must be single points, but some geo-spatial use cases require indexing shapes as well, which can be done with a generalization of the k-d tree called an R-Tree, for example, and I expect that will be a future improvement to Lucene (patches welcome again!).

And there is a rtree crate called rstar, which provides a good in mem rtree. https://crates.io/crates/rstar

@Jure-BB
Copy link

Jure-BB commented Jun 5, 2022

Point search specific methods are easier to implement since they don't require specialized trees and they also perform better. It would make sense to implement point search first as it is probably most commonly used, and implement indexing of shapes at a latter time.

Steps to index geo coordinates (points):

  1. Convert coordinates form double floating points to integers. If they are converted to 32bit (by multiplying with 10,000,000) we get resolution of about 1cm. OSM stores coordinates as 32 bit integers and Solr does too.
  2. Bit-interleave latitude and longitude into a single 64 bit integer. This value is the Morton code of a Z-order curve.
    (There are other space filling curves, like Hilbert curve, that give better spatial locality, but due to additional computational complexity it is questionable, if they would provide better performance overall. Morton code calculations are simpler and can also be accelerated with SIMD.)
  3. Index that 64-bit integer to any tree that can perform 1D range search (eg Btree)

Steps to query are a bit more complex:

  1. Get AABB (or any other shape) of the area we want to query
  2. Calculate where the Z-curve intersects with the AABB edges to get ranges we need to query. This is the complex part. Some names of these functions are BigMin/LitMax, NextJumpIn, GetNextZ-Address. In a more recent paper[1] I saw a method of using quadtrees to calculate these regions, and if I remember correctly, it also performed a bit better.
  3. Merge ranges into longer continuous ranges where possible.
  4. Query ranges to get the Morton codes of the points inside that area.
  5. Since, ranges intersecting with the query-shape can cover a bit larger area that we need (in order too keep the number of ranges to a sensible amount), we need to filter resulting points, to exclude points that falls outside of exact query-area. Morton codes can be converted back to latitude and longitude coordinates in order to perform that filtering.

[1] I can look-up the name of paper using quadtrees to calculate ranges, if anyone is interested.

@umutbasal
Copy link

@adamreichold
Copy link
Collaborator

See #2130 for a version using in-memory rstar indexes and warmers spelled out. If there is sufficient interested, I would be willing to start an add-on crate using that approach.

(I am also working on flat K-D trees and R trees which could be memory-mapped directly from their on-disk representation avoiding the in-memory requirement. They do come from a simulation background though, i.e. generic N dimensional geometry. I guess one would want to specialize on Hilbert-packed two-dimensional trees for geographic search.)

@nolotz
Copy link

nolotz commented Jul 21, 2024

Hi everyone,

@adamreichold , have you or anyone else made any progress on this topic?

@Swoorup
Copy link

Swoorup commented Aug 15, 2024

Point search specific methods are easier to implement since they don't require specialized trees and they also perform better. It would make sense to implement point search first as it is probably most commonly used, and implement indexing of shapes at a latter time.

Steps to index geo coordinates (points):

  1. Convert coordinates form double floating points to integers. If they are converted to 32bit (by multiplying with 10,000,000) we get resolution of about 1cm. OSM stores coordinates as 32 bit integers and Solr does too.
  2. Bit-interleave latitude and longitude into a single 64 bit integer. This value is the Morton code of a Z-order curve.
    (There are other space filling curves, like Hilbert curve, that give better spatial locality, but due to additional computational complexity it is questionable, if they would provide better performance overall. Morton code calculations are simpler and can also be accelerated with SIMD.)
  3. Index that 64-bit integer to any tree that can perform 1D range search (eg Btree)

Steps to query are a bit more complex:

  1. Get AABB (or any other shape) of the area we want to query
  2. Calculate where the Z-curve intersects with the AABB edges to get ranges we need to query. This is the complex part. Some names of these functions are BigMin/LitMax, NextJumpIn, GetNextZ-Address. In a more recent paper[1] I saw a method of using quadtrees to calculate these regions, and if I remember correctly, it also performed a bit better.
  3. Merge ranges into longer continuous ranges where possible.
  4. Query ranges to get the Morton codes of the points inside that area.
  5. Since, ranges intersecting with the query-shape can cover a bit larger area that we need (in order too keep the number of ranges to a sensible amount), we need to filter resulting points, to exclude points that falls outside of exact query-area. Morton codes can be converted back to latitude and longitude coordinates in order to perform that filtering.

[1] I can look-up the name of paper using quadtrees to calculate ranges, if anyone is interested.

what is the name of the paper, I am interested.

@Swoorup
Copy link

Swoorup commented Aug 16, 2024

@Swoorup

It's Using a Space Filling Curve for the Management of Dynamic Point Cloud Data in a Relational DBMS (2016)

There's also a source code: https://github.com/stpsomad/DynamicPCDMS

https://isprs-annals.copernicus.org/articles/IV-2-W1/107/2016/isprs-annals-IV-2-W1-107-2016.pdf

Slides: https://pdfs.semanticscholar.org/287b/2fdad23e6110d4a524efcd87c70f8696a15a.pdf

Another paper I also found interesting/helpful is: Towards a Painless Index for Spatial Objects

Thanks. One thing that is not immediately obvious to me, is storing not just points but objects with AABB. The literature around appears to be either storing all the morton points that intersects against the AABB perhaps directly as points or morton ranges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests