Feature suggestion: sort tied matches by common letter count, largest to smallest #280

Open
robsomething opened this issue Sep 25, 2020 · 0 comments


robsomething commented Sep 25, 2020

I am noticing that when one term is a subset of another, some of my matches with partial_token_set_ratio come back with the non-optimal choice. Ties need to be broken in a way that is independent of the order of the data, perhaps by total common tokens (or letters).

"Company" and "Company 1" have a score of 100
"Company 1" and "Company 1" have a score of 100
The second pairing seems to be the better match.
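To make the "common letters" idea concrete, here is a minimal sketch using only the standard library. The helper name `common_letter_count` is hypothetical and not part of any fuzzy-matching library. It counts shared characters with multiplicity, ignoring case and spaces, and treats digits as letters.

```python
from collections import Counter


def common_letter_count(a: str, b: str) -> int:
    """Count characters shared by both strings (with multiplicity).

    Hypothetical helper for illustration only: case-insensitive,
    spaces are ignored, digits count as characters.
    """
    letters_a = Counter(a.lower().replace(" ", ""))
    letters_b = Counter(b.lower().replace(" ", ""))
    # Counter & Counter keeps the minimum count of each shared character.
    return sum((letters_a & letters_b).values())


print(common_letter_count("Company", "Company 1"))    # 7
print(common_letter_count("Company 1", "Company 1"))  # 8
```

Under this measure the two pairings no longer tie: the exact pairing shares one more character than the subset pairing.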

from fuzzywuzzy import fuzz, process
query = 'Company 2'
choices = ['Company', 'Company 1', 'Company 2', 'Awesome Company']
process.extractOne(query, choices, scorer=fuzz.partial_token_set_ratio)

Out[72]: ('Company', 100)
The winner always seems to be the first entry in the list of choices. While one could sort both lists before calling the functions, that could introduce a different bias: the appropriate choice would never be matched when the shared tokens sit in the middle of the choice string.

The partial_token_sort_ratio scorer shows similar behavior.
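One possible workaround, sketched below, is a small wrapper that breaks score ties by the number of shared tokens and then by the smallest length difference. This is an assumption about how such a tie-break could look, not an existing library feature. `extract_one_with_tiebreak` and `shared_token_scorer` are hypothetical names. The toy scorer only stands in for a scorer that ties on every choice, so the example runs without third-party packages. In practice any callable taking two strings and returning a number, such as fuzz.partial_token_set_ratio, could be passed instead.

```python
from typing import Callable, Optional, Sequence, Tuple


def common_token_count(a: str, b: str) -> int:
    """Number of distinct whitespace-separated tokens shared by a and b."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def extract_one_with_tiebreak(
    query: str,
    choices: Sequence[str],
    scorer: Callable[[str, str], float],
) -> Optional[Tuple[str, float]]:
    """Return the best (choice, score) pair, or None if choices is empty.

    Ties on score are broken by shared token count, then by the
    smallest difference in length from the query. Any ties that
    remain after that still go to the earlier choice.
    """
    best = None
    best_key = None
    for choice in choices:
        score = scorer(query, choice)
        key = (
            score,
            common_token_count(query, choice),
            -abs(len(choice) - len(query)),
        )
        if best_key is None or key > best_key:
            best, best_key = (choice, score), key
    return best


def shared_token_scorer(a: str, b: str) -> int:
    """Toy stand-in scorer (assumption): 100 if any token is shared, else 0."""
    return 100 if common_token_count(a, b) > 0 else 0


query = "Company 2"
choices = ["Company", "Company 1", "Company 2", "Awesome Company"]
print(extract_one_with_tiebreak(query, choices, shared_token_scorer))
# ('Company 2', 100)
```

Because the secondary keys depend only on the query and the choice, reordering the list changes the result only when two choices are tied on all three keys.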
