Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

_geoPoint sorting not working #1872

Closed
1 of 3 tasks
otuerk opened this issue Nov 3, 2021 · 12 comments
Closed
1 of 3 tasks

_geoPoint sorting not working #1872

otuerk opened this issue Nov 3, 2021 · 12 comments
Assignees
Labels
bug Something isn't working as expected milli Related to the milli workspace v0.26.0 PRs/issues solved in v0.26.0
Milestone

Comments

@otuerk
Copy link

otuerk commented Nov 3, 2021

Describe the bug
The result is not sorted by distance to the given lat/lng when using _geoPoint sorting function

To Reproduce
Use the following data

[
   {
      "id":"1",
      "_geo":{
         "lat":48.2187834,
         "lng":16.392127
      }
   },
   {
      "id":"2",
      "_geo":{
         "lat":48.26579,
         "lng":16.4281601
      }
   },
   {
      "id":"3",
      "_geo":{
         "lat":48.19431,
         "lng":16.4071101
      }
   },
   {
      "id":"4",
      "_geo":{
          "lat":48.22919,
          "lng":16.3679401
      }
   }
]

When using the _geoPoint sort function

curl -X POST 'http://localhost:7700/indexes/test/search' -H 'Content-type:application/json' --data-binary '{ "sort": ["_geoPoint(48.2342154, 16.4205687):asc"] }'

You get the following result:

{
   "hits":[
      {
         "id":"1",
         "_geo":{
            "lat":48.2187834,
            "lng":16.392127
         },
         "_geoDistance":2717
      },
      {
         "id":"2",
         "_geo":{
            "lat":48.26579,
            "lng":16.4281601
         },
         "_geoDistance":3556
      },
      {
         "id":"3",
         "_geo":{
            "lat":48.19431,
            "lng":16.4071101
         },
         "_geoDistance":4548
      },
      {
         "id":"4",
         "_geo":{
            "lat":48.22919,
            "lng":16.3679401
         },
         "_geoDistance":3938
      }
   ],
   "nbHits":4,
   "exhaustiveNbHits":false,
   "query":"",
   "limit":20,
   "offset":0,
   "processingTimeMs":0
}

Which is wrong since the object with ID 3 is returned before the one with ID 4, although it's further away.

Expected behavior
Expected output would be:

  • ID 1, Distance: 2717
  • ID 2, Distance: 3556
  • ID 4, Distance: 3938
  • ID 3, Distance: 4548

MeiliSearch version: 0.23.1


EDIT from @irevoire

Steps to REALLY sort the documents by geo distances.

@curquiza
Copy link
Member

curquiza commented Nov 3, 2021

Hello @otuerk

Thanks for your report!
This is a known issue, sorry to not have reported it publically. This problem is due to the lack of accuracy of the RTree, the structure we use to implement the geo-search. RTrees work in a 2D map, we use them to apply the sort, and then we re-calculate the real distance to make them work on a spherical map (the Earth). These lead to imprecisions, sorry for this.

Implementing this feature request should solve part of the problem you reported.

Thanks again for this feedback!

@curquiza curquiza added the bug Something isn't working as expected label Nov 3, 2021
@otuerk
Copy link
Author

otuerk commented Nov 3, 2021

Thanks for the explanation @curquiza!

Is it planned to make the sort more accurate in a future release?

@curquiza
Copy link
Member

curquiza commented Nov 3, 2021

@meilisearch/product-team can tell you more about what we plan in the future :)

@gmourier
Copy link
Member

gmourier commented Nov 16, 2021

Hello @otuerk 👋! Thanks for the feedback and sorry for the late reply!

Yes, we are planning to fix this problem. We're sorry to see this happen.

We have been thinking about several solutions like the one presented by @curquiza. We will decide in the next few days which one to implement and how far we will go if we add this type of behavior (I'm thinking of exposing an additional parameter that allows you to set the rate difference if we take the previous example).

However, the presented solution might not be suitable for you if you need to have accurate sorting on very close distances.

For example, if 2 documents are 200m apart and are placed in the same bucket because of this rate difference, they will have to be separated by other criteria. Since the search request presented here does not contain a q parameter and has no other sort rule parameter, the engine will not be able to separate the two documents. What do you think @ManyTheFish?

@irevoire
Copy link
Member

I would prefer if meilisearch could provide a good default for the rate difference.
But if we provide a setting to let the user choose its rate difference we would have overall a better experience for everyone.
And for the few people who want really accurate distance, they would be able to set the rate to 0% 🤔

@otuerk
Copy link
Author

otuerk commented Nov 16, 2021

Hi @gmourier and thank you for your explanation again.

Just as a feedback from my side: For my use-case it's necessary to sort documents (each document represents a store in my case) pretty accurately since almost all data are located in the same city and sometimes very close by. I understand that the proposed solution might be a workaround, but I think that when you look at this feature in the docs you would expect it to sort accuartely and not just approximately.

Right now the only way to get accurate data in my case is by doing the distance calculation directly via a DB query. But obviously this is much slower than what Meili returns.

So I would really love to see an accurate way of sorting documents by distance - even if they are close by.

@gmourier
Copy link
Member

Thank you! Can your end-users (if there are any) interact with an interface to search for stores by name, brand, or filter by other criteria?

@irevoire
Copy link
Member

irevoire commented Nov 16, 2021

So yeah, in reality, there are two problems.

The first one (which I think can be fixed quite easily) is an error of approximation with the data structure we use to store the geopoint.
The problem you are seeing is actually entirely related to this problem. Our data structure returns points in the wrong order and with bad distances, and then I fix the distance for you, but they are not sorted currently.

If this was implemented, your issue would disappear.

The second problem we have, and that doesn’t seem to be an issue for you, is that currently, when you sort with a _geoPoint, you are only sorting your documents by geo distances and nothing more.
So if you have three documents looking like that:

[
  {
    "id": 0,
    "_geo": { "lat": 1, "lng": 0 },
    "food": "kebab",
  },
  {
    "id": 1,
    "_geo": { "lat": 1.5, "lng": 0 },
    "food": "pizza",
  },
  {
    "id": 2,
    "_geo": { "lat": 2, "lng": 0 },
    "food": "kebab",
  },
  {
    "id": 3,
    "_geo": { "lat": 5, "lng": 0 },
    "food": "kebab",
  }
]

And your user sends this request:

{
  "q": "kebab",
  "sort": "_geoPoint(0, 0):asc"
}

He would get the three documents in the following order: [0, 1, 2, 3].
And we (= the meilisearch team) think you should get your documents in this order: [0, 2, 1, 3].
Since the 0 and 2 are quite close and contains the word kebab, we think most users would prefer to get the two kebab first and then the pizza and finally the last kebab since it’s quite far away.

If this doesn’t work for you, I would love to hear what is your use case! 😁

@otuerk
Copy link
Author

otuerk commented Nov 16, 2021

He would get the three documents in the following order: [0, 1, 2, 3]. And we (= the meilisearch team) think you should get your documents in this order: [0, 2, 1, 3]. Since the 0 and 2 are quite close and contains the word kebab, we think most users would prefer to get the two kebab first and then the pizza and finally the last kebab since it’s quite far away.

I get your point that prefering the documents containing kebab totally makes sense in this case since you provided a query that narrows down the search even more and pizza results might not be important in this case.

But the case where I would like to use this feature is that on our landing page there's a map that shows stores close to your current location. Underneath the map there's a short list of 5 results closest to you and a button to paginate more results. There's no additional query here. The problem is that the result from Meili is not ordered properly (it doesn't make sense that the result with a distance of 4548 is sorted before the one with 3938).

Of course I could order the results on the client side but when you use pagination that does not work anymore since you might get results that should actually have been sorted before the results you already have. E.g:

Initial load returns results with distance (let's assume the distance is correct)

100
200
300
400
500

But the next pagination request to the server returns:

350
600
700
800
900

Now 350 should have been in the initial result, and re-sorting the whole result on the client side now is completely confusing for the user. And this is what actually happens when we use Meili for that kind of distance sorting.

And now back to the example you provided, I would assume that even if I provide an additional query - if there are enough results that e.g. pagination is a thing you'd have the same problem again.

@irevoire
Copy link
Member

Ok thank you, I guess we would need another query setting as @gmourier was saying to let you say you want to order documents with 0% of imprecision.

it doesn't make sense that the result with a distance of 4548 is sorted before the one with 3938

And this is our fault but I think it could be easily fixed when I have the time 😁

@curquiza curquiza added this to the v0.26.0 milestone Dec 7, 2021
@irevoire
Copy link
Member

Hello @otuerk, I've been working on your bug.
The feature we were chatting just above was not implemented currently but your bug should be fixed now 🎉

If you have the time I would love it if you could run your test again and tell me if it works with this branch: https://github.com/meilisearch/MeiliSearch/tree/fix_geo_search

You'll need to re-index all your documents from a new data.ms or use a dump to use this branch since it's breaking the way we store the geoPoints in the database.

bors bot added a commit to meilisearch/milli that referenced this issue Jan 10, 2022
424: Store the geopoint in three dimensions r=Kerollmops a=irevoire

Related to this issue: meilisearch/meilisearch#1872

Fix the whole computation of distance for any “geo” operations (sort or filter). Now when you sort points they are returned to you in the right order.
And when you filter on a specific radius you only get points included in the radius.

This PR changes the way we store the geo points in the RTree.
Instead of considering the latitude and longitude as orthogonal coordinates, we convert them to real orthogonal coordinates projected on a sphere with a radius of 1.
This is the conversion formulae.
![image](https://user-images.githubusercontent.com/7032172/145990456-eefe840a-384f-4486-848b-81d0036814ec.png)
Which, in rust, translate to this function:
```rust
pub fn lat_lng_to_xyz(coord: &[f64; 2]) -> [f64; 3] {
    let [lat, lng] = coord.map(|f| f.to_radians());
    let x = lat.cos() * lng.cos();
    let y = lat.cos() * lng.sin();
    let z = lat.sin();

    [x, y, z]
}
```

Storing the points on a sphere is easier / faster to compute than storing the point on an approximation of the real earth shape.
But when we need to compute the distance between two points we still need to use the haversine distance which works with latitude and longitude.
So, to do the fewest search-time computation possible I'm now associating every point with its `DocId` and its lat/lng.

Co-authored-by: Tamo <tamo@meilisearch.com>
@curquiza curquiza added the milli Related to the milli workspace label Jan 24, 2022
@curquiza
Copy link
Member

curquiza commented Feb 2, 2022

Fixed by #2005 that bump the milli dependency to milli v0.22.0 containing the fix of this issue 🙂

The bug fix will be released in Meilisearch v0.26.0.

@curquiza curquiza closed this as completed Feb 2, 2022
@curquiza curquiza added the v0.26.0 PRs/issues solved in v0.26.0 label Aug 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working as expected milli Related to the milli workspace v0.26.0 PRs/issues solved in v0.26.0
Projects
None yet
Development

No branches or pull requests

4 participants