Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use cross_fields type for multi_match #116

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

missinglink
Copy link
Member

I noticed a bug today where which causes admin_multi_match queries to score results incorrectly for multi-term admin queries.

I'll attach a simple test-case below, prior to this PR we were using the default best_fields type which would essentially ignore the "Eureka" part of the query, meaning that the results were returned in the wrong order:

1) Target, California, MD, USA
2) Target, Eureka, CA, USA

By changing this to cross_fields we are able to use the combined score across all fields, which in-turn scores the query correctly, returning the "Eureka" result first.

{
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "name.default": {
                            "analyzer": "peliasQuery",
                            "query": "Target"
                        }
                    }
                },
                {
                    "multi_match": {
                        "fields": [
                            "parent.country^1",
                            "parent.region^1",
                            "parent.county^1",
                            "parent.localadmin^1",
                            "parent.locality^1",
                            "parent.borough^1",
                            "parent.neighbourhood^1",
                            "parent.region_a^1"
                        ],
                        "type": "best_fields",
                        "query": "Eureka California",
                        "analyzer": "peliasAdmin",
                        "cutoff_frequency": 0.01
                    }
                }
            ]
        }
    },
    "size": 20
}

@missinglink
Copy link
Member Author

This issue is covered by a failing test case which indicates that the text length possibly has something to do with how the best_fields calculation was being made:

search_poi
  ✔ [searchpoi-1] "/v1/search?text=Target Eureka CA"
  ✘ regression [searchpoi-2] "/v1/search?text=Target Eureka California": score 4 out of 5
  diff:
    priorityThresh is 1 but found at position 2

missinglink added a commit to pelias/api that referenced this pull request Nov 14, 2019
@missinglink
Copy link
Member Author

missinglink commented Nov 14, 2019

I added a second commit to disable cutoff_frequency, I would expect this setting to have no effect on the sorting of results, but in fact it does!

Below is a minimal test-case which shows that removing cutoff_frequency changes the order of results.

@orangejulius any idea what's going on here? I read the docs for multi_match cross_fields and cutoff_frequency several times and still can't figure out what's going on..

I'm assuming it's something to do with mixing multi_match and cutoff_frequency.
If we decide that it's not safe to use the two together then we might have to put a check in pelias/query to prevent them from being used together? or maybe it's only triggered (or noticeable) when cross_fields is used?

{
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "name.default": {
                            "analyzer": "peliasQuery",
                            "query": "Target",
                            "cutoff_frequency": 0.01
                        }
                    }
                },
                {
                    "multi_match": {
                        "fields": [
                            "parent.country^1",
                            "parent.region^1",
                            "parent.county^1",
                            "parent.localadmin^1",
                            "parent.locality^1",
                            "parent.borough^1",
                            "parent.neighbourhood^1",
                            "parent.region_a^1"
                        ],
                        "type": "cross_fields",
                        "query": "Eureka California",
                        "analyzer": "peliasAdmin",
                        "cutoff_frequency": 0.01
                    }
                }
            ]
        }
    },
    "size": 20
}

@missinglink
Copy link
Member Author

I tested this PR on our staging environment and it caused 0 regressions and 1 improvement:

✔ improvement [searchpoi-2] "/v1/search?text=Target Eureka California"

missinglink added a commit to pelias/api that referenced this pull request Nov 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant