process.extract() with scorer partial_ratio returns wrong results #216

SujaySKumar · 2018-09-11T14:05:18Z

The following call should return 100, but it returns 40:

>>> fuzz.partial_ratio("thane", "nation hospitality honda water thane thane west")
40

Removing any word from the string "nation hospitality honda water thane thane west" gives the correct answer of 100.

This issue is reproducible in all installations, whether or not python-levenshtein is installed.
Versions:

fuzzywuzzy         0.16.0
python-levenshtein 0.12.0
Python 3.6


josegonzalez · 2018-09-12T03:32:01Z

Is there a reason the partial ratio result should be 100? And can you add a failing test case to our test suite to prove this?

SujaySKumar · 2018-09-12T06:21:03Z

Yes. Since the shorter string is a substring of the longer string, partial_ratio should be 100. This is described in detail in the GitHub documentation as well as in the blog:

fuzz.ratio("YANKEES", "NEW YOR") ⇒ 14
fuzz.ratio("YANKEES", "EW YORK") ⇒ 28
fuzz.ratio("YANKEES", "W YORK ") ⇒ 28
fuzz.ratio("YANKEES", " YORK Y") ⇒ 28
...
fuzz.ratio("YANKEES", "YANKEES") ⇒ 100
and conclude that the last one is clearly the best. It turns out that “Yankees” and “New York Yankees” are a perfect partial match…the shorter string is a substring of the longer. We have a helper function for this too (and it’s far more efficient than the simplified algorithm I just laid out)
fuzz.partial_ratio("YANKEES", "NEW YORK YANKEES") ⇒ 100
fuzz.partial_ratio("NEW YORK METS", "NEW YORK YANKEES") ⇒ 69
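
For reference, the simplified algorithm the blog describes can be sketched with only the standard library. This is a brute-force illustration, not fuzzywuzzy's actual implementation: `naive_partial_ratio` is a hypothetical helper name, and using `difflib.SequenceMatcher` as the per-window scorer is an assumption, so scores for non-exact matches may differ from what fuzzywuzzy reports.

```python
from difflib import SequenceMatcher


def naive_partial_ratio(s1: str, s2: str) -> int:
    """Brute-force partial ratio (hypothetical helper, not part of fuzzywuzzy).

    Scores the shorter string against every equally long window of the
    longer string and returns the best score on a 0-100 scale.
    """
    shorter, longer = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    if not shorter:
        return 0
    best = 0.0
    # Slide a window of len(shorter) across the longer string.
    for start in range(len(longer) - len(shorter) + 1):
        window = longer[start:start + len(shorter)]
        best = max(best, SequenceMatcher(None, shorter, window).ratio())
    return int(round(100 * best))


# One window is exactly the shorter string, so the best score is 100.
print(naive_partial_ratio("YANKEES", "NEW YORK YANKEES"))  # 100
print(naive_partial_ratio(
    "thane", "nation hospitality honda water thane thane west"))  # 100
```

Because every window is checked, an exact substring always scores 100 here, which is the behaviour I expect from fuzz.partial_ratio for the input in this issue.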

josegonzalez · 2018-09-12T06:27:49Z

Do you mind adding the appropriate tests to test_fuzzywuzzy.py so that CI hits it and we can see the test fails?

SujaySKumar · 2018-09-14T06:49:35Z

CI passes since it uses Python 3.5.3. This issue seems to happen in Python 3.6 or even 3.5.6.

josegonzalez · 2018-09-14T15:25:34Z

Mind filing a PR to use 3.5.6?

lisabutti · 2019-08-22T08:00:20Z

I can confirm that this also happens in Python 3.7

gw00207 · 2019-09-05T14:19:42Z

In Python 3.7, a shorter example that gives the same result:

>>> fuzz.partial_ratio("thane", "t hosa na e thane ws")
40

Lychfindel · 2020-06-16T23:48:27Z

Is there any solution for this issue?

Lychfindel mentioned this issue Jun 17, 2020

Bug in fuzzywuzzy De-Qua/v4w_website#41

Closed

maxbachmann mentioned this issue Apr 2, 2023

replace python-Levenshtein with rapidfuzz seatgeek/thefuzz#10

Merged
