Weird behavior of partial_ratio #313

sillybun · 2021-05-02T11:07:41Z

In [47]: fuzzywuzzy.fuzz.partial_ratio("red", "random")                        
Out[47]: 33

In [48]: fuzzywuzzy.fuzz.partial_ratio("rod", "random")                        
Out[48]: 33

In [49]: fuzzywuzzy.fuzz.partial_ratio("prod", "random")                       
Out[49]: 25

In [50]: fuzzywuzzy.fuzz.partial_ratio("pred", "random")                       
Out[50]: 50

Why is "pred" more similar to "random" than "prod"?


maxbachmann · 2021-05-02T11:39:42Z

This is a known issue in python-Levenshtein: #79
In your case, for the comparison of

"prod" <-> "random"

the following alignment is used:

"prod" <-> "ndom"

which has a similarity of 25.
However, the optimal alignment would be:

"prod" <-> "rand"

which has a similarity of 50.
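To make the 25 vs. 50 concrete: both numbers are the plain ratio of "prod" against a 4-character window of "random", i.e. 2*M/T with M matched characters and T = 4 + 4 = 8. A minimal sketch using the standard library's `difflib.SequenceMatcher` (the module the difflib fallback relies on); the `ratio` helper name is hypothetical and simply scales the result to 0-100.

```python
from difflib import SequenceMatcher


def ratio(a: str, b: str) -> int:
    # Hypothetical helper: difflib's ratio() is 2*M/T, where M is the number
    # of matched characters and T = len(a) + len(b); scaled to 0-100.
    return round(100 * SequenceMatcher(None, a, b).ratio())


# "prod" vs "ndom": only one character lines up ("o"; the "d" comes before
# the "o" in "ndom", so both cannot match in order) -> 2*1/8 = 0.25
print(ratio("prod", "ndom"))  # 25
# "prod" vs "rand": "r" and "d" both line up -> 2*2/8 = 0.5
print(ratio("prod", "rand"))  # 50
```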
In FuzzyWuzzy, you get the correct result when the slower difflib-based implementation is used:

>>> from fuzzywuzzy import fuzz
>>> from difflib import SequenceMatcher
>>> # make fuzz use difflib instead of python-Levenshtein
>>> fuzz.SequenceMatcher = SequenceMatcher
>>> fuzz.partial_ratio("prod", "random")
50
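Why can a suboptimal alignment stick? Roughly, `partial_ratio` only scores windows of the longer string that are anchored at the matching blocks the matcher reports, so if the blocks point at "ndom", the "rand" window is never tried. Below is a stdlib-only sketch that roughly follows FuzzyWuzzy's difflib-based path (it omits details such as the early return on a near-perfect match), next to a brute-force variant that scores every window. Both function names are hypothetical, and the brute force is only an illustration of the "optimal alignment", not how any library computes it.

```python
from difflib import SequenceMatcher


def block_partial_ratio(s1: str, s2: str) -> int:
    """Sketch of the matching-block heuristic (hypothetical name).

    Only windows that start where a matching block lines up are scored,
    so the result depends on which blocks the matcher reports.
    """
    shorter, longer = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    blocks = SequenceMatcher(None, shorter, longer).get_matching_blocks()
    best = 0.0
    for i, j, _size in blocks:
        # Shift the window so that shorter[i] sits over longer[j].
        start = max(j - i, 0)
        window = longer[start:start + len(shorter)]
        best = max(best, SequenceMatcher(None, shorter, window).ratio())
    return round(100 * best)


def exhaustive_partial_ratio(s1: str, s2: str) -> int:
    """Brute force (hypothetical name): score every len(shorter) window."""
    shorter, longer = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    n = len(shorter)
    best = max(
        SequenceMatcher(None, shorter, longer[k:k + n]).ratio()
        for k in range(len(longer) - n + 1)
    )
    return round(100 * best)


print(block_partial_ratio("prod", "random"))       # 50 with difflib's blocks
print(exhaustive_partial_ratio("prod", "random"))  # 50
```

With difflib's blocks, one anchor already lands on "rand", so both functions agree; the heuristic can only miss the optimum when the matcher's blocks (as with python-Levenshtein here) never anchor a window there.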

sillybun mentioned this issue May 2, 2021: Different behavior with fuzzywuzzy rapidfuzz/RapidFuzz#102 (closed)

maxbachmann mentioned this issue Apr 2, 2023: replace python-Levenshtein with rapidfuzz seatgeek/thefuzz#10 (merged)
