Feature Suggestion sort order matches by common letter count largest to smallest #280

robsomething · 2020-09-25T22:14:50Z

I am noticing that some of my matches, where one term is a subset of another, come back with a non-optimal choice when using partial_token_set_ratio. When scores tie, the sort order should be decided in a way that is independent of the order of the data, perhaps by total common tokens (or letters).

"Company" and "Company 1" have a score of 100.
"Company 1" and "Company 1" also have a score of 100.
The second pairing would seem to be the better match.

query = 'Company 2'
choices = ['Company', 'Company 1', 'Company 2', 'Awesome Company']
process.extractOne(query, choices, scorer=fuzz.partial_token_set_ratio)

Out[72]: ('Company', 100)
The winner always seems to be the first of the tied entries in the list of choices. One could sort both lists before calling the functions, but that could introduce a different kind of bias: we would never match the appropriate choice when the shared tokens are in the middle of the choice string.

The partial_token_sort_ratio scorer shows similar behavior.
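
One way to make the result independent of the order of the choices is to rank tied scores by a secondary key. The following is only a minimal sketch of that idea, not the library's behavior: the helper names (`common_token_count`, `common_letter_count`, `extract_one_with_tiebreak`) are hypothetical, and `any_shared_token_scorer` is a stand-in that merely reproduces the tie described above so the example runs without any third-party package.

```python
from collections import Counter


def common_token_count(a, b):
    """Count whitespace-separated tokens shared by both strings (multiset overlap)."""
    return sum((Counter(a.lower().split()) & Counter(b.lower().split())).values())


def common_letter_count(a, b):
    """Count non-whitespace characters shared by both strings (multiset overlap)."""
    ca = Counter("".join(a.lower().split()))
    cb = Counter("".join(b.lower().split()))
    return sum((ca & cb).values())


def extract_one_with_tiebreak(query, choices, scorer):
    """Return (best_choice, score); ties on score are broken by shared tokens,
    then shared letters, then the smallest length difference from the query."""
    def key(choice):
        return (
            scorer(query, choice),
            common_token_count(query, choice),
            common_letter_count(query, choice),
            -abs(len(choice) - len(query)),
        )

    best = max(choices, key=key)
    return best, scorer(query, best)


def any_shared_token_scorer(a, b):
    """Hypothetical stand-in scorer: 100 if the strings share any token, else 0."""
    return 100 if set(a.lower().split()) & set(b.lower().split()) else 0


query = 'Company 2'
choices = ['Company', 'Company 1', 'Company 2', 'Awesome Company']
print(extract_one_with_tiebreak(query, choices, any_shared_token_scorer))
# ('Company 2', 100)
```

Any callable that takes two strings and returns a number could presumably be passed as `scorer`, including the scorer used in the report above. If two choices still tie on every key, `max` returns whichever comes first, so some order dependence remains in that corner case.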

ZihangH mentioned this issue Dec 17, 2020

Implemented sort order matches by common letter count largest to smallest #295

Open
