How to improve focus point queries #1567

Open
Joxit opened this issue Oct 15, 2021 · 1 comment
Comments

@Joxit
Member

Joxit commented Oct 15, 2021

Use-cases

I tried searching for the small town of Vars (Hautes-Alpes, France) with a focus point located in France. The request is /v1/autocomplete?lang=fr&focus.point.lat=48.03661925338169&focus.point.lon=6.580299229512&text=vars

The current results are:

0) Varsovie, MZ, Pologne
1) Varsinais-Suomi, Finlande
2) Varsovie, Johns Creek, GA, USA
3) Ham-sous-Varsberg, France
4) Varsity Center, Lehigh Acres, FL, USA
5) Varsity Lakes, QLD, Australie
6) Varsovie, Throop, PA, USA
7) Varshets, MT, Bulgarie
8) Obshtina Varshets, MT, Bulgarie
9) Varsberg, France

This specific use case may be solved by #1202, but I think we can improve focus point scoring too.

Attempted Solutions

Here is the current boosting system:
[
  {
    "function_score": {
      "query": { "match_all": {} },
      "functions": [
        {
          "weight": 15,
          "exp": {
            "center_point": {
              "origin": { "lat": 48.03661925338169, "lon": 6.580299229512 },
              "offset": "0km",
              "scale": "50km",
              "decay": 0.5
            }
          }
        }
      ],
      "score_mode": "avg",
      "boost_mode": "replace"
    }
  },
  {
    "function_score": {
      "query": { "match_all": {} },
      "max_boost": 20,
      "functions": [
        {
          "field_value_factor": {
            "modifier": "log1p",
            "field": "popularity",
            "missing": 1
          },
          "weight": 1
        }
      ],
      "score_mode": "first",
      "boost_mode": "replace"
    }
  },
  {
    "function_score": {
      "query": { "match_all": {} },
      "max_boost": 20,
      "functions": [
        {
          "field_value_factor": {
            "modifier": "log1p",
            "field": "population",
            "missing": 1
          },
          "weight": 3
        }
      ],
      "score_mode": "first",
      "boost_mode": "replace"
    }
  }
]

To summarize, there are three boosts: focus, popularity, and population.
The max value for focus is 16 (when the wanted point is the same as the focus point).
For popularity and population, the max value is 20.

This means that even when the wanted point has its max focus score, it cannot exceed the maximum boost from popularity/population.
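
To make the shape of these functions more concrete, here is a rough Python sketch of the two formulas as I understand them from the Elasticsearch function_score documentation: the exp decay factor and the log1p field_value_factor. The function names are made up for illustration, and the sketch only computes the per-function values; how weight, score_mode and boost_mode combine the focus factor into the final score is deliberately not modeled here.

```python
import math


def exp_decay_factor(distance_km, scale_km=50.0, offset_km=0.0, decay=0.5):
    """Exponential decay factor in [0, 1] for a given distance from the origin.

    Follows the documented exp decay: exp(lambda * max(0, distance - offset))
    with lambda = ln(decay) / scale. Defaults mirror the focus clause above.
    """
    lam = math.log(decay) / scale_km
    return math.exp(lam * max(0.0, distance_km - offset_km))


def log1p_boost(value, weight, max_boost=20):
    """Weighted log1p field_value_factor, capped at max_boost.

    Assumes log1p means log10(1 + value) and that the weight multiplies
    the function value before the cap is applied.
    """
    return min(max_boost, weight * math.log10(1 + value))


# The focus factor halves every 50 km with the settings above.
for km in (0, 50, 100, 200):
    print(km, "km ->", round(exp_decay_factor(km), 4))

# Population boost (weight 3) for a few hypothetical population values.
for population in (1_000, 100_000, 2_000_000):
    print(population, "->", round(log1p_boost(population, weight=3), 2))
```

Under these assumptions, a place a few hundred kilometres from the focus point gets almost no focus contribution, while a city with a population in the millions sits close to the population cap.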

I suggest that, when focus.point is present, we reduce the max_boost of popularity and population. The new max_boost could be somewhere between 8 and 12, for example.
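
As a sketch of what this could look like, the hypothetical helper below takes a list of function_score clauses shaped like the ones above and lowers max_boost only when a focus point is present. The name cap_boosts_for_focus and the default value of 10 are assumptions picked for illustration (10 is just a value inside the 8 to 12 range), and this is not the existing Pelias query-building code.

```python
import copy


def cap_boosts_for_focus(clauses, has_focus, reduced_max_boost=10):
    """Return a copy of the clauses with max_boost lowered when a focus is set.

    Clauses without a max_boost (such as the focus clause itself) are left
    untouched, and the input list is never modified.
    """
    adjusted = copy.deepcopy(clauses)
    if not has_focus:
        return adjusted
    for clause in adjusted:
        function_score = clause.get("function_score", {})
        if "max_boost" in function_score:
            function_score["max_boost"] = min(
                function_score["max_boost"], reduced_max_boost
            )
    return adjusted


# Trimmed-down versions of the three clauses from the query above.
example_clauses = [
    {"function_score": {"functions": [{"weight": 15}], "score_mode": "avg"}},
    {"function_score": {"max_boost": 20, "functions": [{"weight": 1}]}},
    {"function_score": {"max_boost": 20, "functions": [{"weight": 3}]}},
]

with_focus = cap_boosts_for_focus(example_clauses, has_focus=True)
print([c["function_score"].get("max_boost") for c in with_focus])
```

Queries without focus.point would keep the current max_boost of 20, so only focused searches change behaviour.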

I will try to draft a PR next week.

Do you have any examples of a query using a focus where the popular city should still be displayed in first position?

@orangejulius
Member

Yeah, this is a good analysis. The other component involved in the scoring is, of course, the text match score. This is pretty complicated and depends on the number of terms in the input query, how the parser parsed things, the number of alt-names a record has, etc.

It might be interesting to try to estimate a "maximum" score for various text lengths and, if needed, different parsing scenarios. Then we would know all the components going into the total score and could attempt to balance them more appropriately.

Funnily enough, right now we can have two problems that are almost opposites, at the same time:

  • nearby, exact text matches can score below a far away populated/popular place that is a poor text match
  • distant, exact text matches for popular places can score below a nearby place with a good text match
