process.extract() with scorer partial_ratio returns wrong results #216

Open
SujaySKumar opened this issue Sep 11, 2018 · 8 comments

Comments

@SujaySKumar

The correct result of the following call should be 100, but it returns 40:

>>> fuzz.partial_ratio("thane", "nation hospitality honda water thane thane west")
40

Removing any single word from the string "nation hospitality honda water thane thane west" gives the correct answer of 100.
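
The word-removal observation can be checked systematically. The following is a minimal sketch, not part of fuzzywuzzy: `leave_one_word_out` is a hypothetical helper name, and the scorer is passed in as a parameter so that `fuzz.partial_ratio` (or any other two-string scorer) can be plugged in.

```python
def leave_one_word_out(query, text, scorer):
    """Score `query` against `text` with each word of `text` removed in turn.

    `scorer` is any callable taking two strings and returning a score,
    for example fuzz.partial_ratio (assumed to be installed separately).
    Returns a list of (removed_word, reduced_text, score) tuples.
    """
    words = text.split()
    results = []
    for i, removed in enumerate(words):
        # Rebuild the text without the i-th word.
        reduced = " ".join(words[:i] + words[i + 1:])
        results.append((removed, reduced, scorer(query, reduced)))
    return results


# Hypothetical usage, assuming fuzzywuzzy is importable:
#   from fuzzywuzzy import fuzz
#   text = "nation hospitality honda water thane thane west"
#   for removed, reduced, score in leave_one_word_out("thane", text, fuzz.partial_ratio):
#       print(removed, score)
```

Printing the score per removed word makes it easy to paste the full set of results into a report for each Python version.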

This issue is reproducible in all installations, irrespective of whether python-levenshtein is installed.
Versions:

fuzzywuzzy         0.16.0
python-levenshtein 0.12.0
Python 3.6
@josegonzalez
Contributor

Is there a reason the partial ratio result should be 100? And can you add a failing test case to our test suite to prove this?

@SujaySKumar
Author

Yes. Since the shorter string is a substring of the longer string, partial_ratio should be 100. This is described in detail in the GitHub documentation as well as in the blog post:

fuzz.ratio("YANKEES", "NEW YOR") ⇒ 14
fuzz.ratio("YANKEES", "EW YORK") ⇒ 28
fuzz.ratio("YANKEES", "W YORK ") ⇒ 28
fuzz.ratio("YANKEES", " YORK Y") ⇒ 28
...
fuzz.ratio("YANKEES", "YANKEES") ⇒ 100
and conclude that the last one is clearly the best. It turns out that “Yankees” and “New York Yankees” are a perfect partial match…the shorter string is a substring of the longer. We have a helper function for this too (and it’s far more efficient than the simplified algorithm I just laid out)
fuzz.partial_ratio("YANKEES", "NEW YORK YANKEES") ⇒ 100
fuzz.partial_ratio("NEW YORK METS", "NEW YORK YANKEES") ⇒ 69
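
The "simplified algorithm" the blog describes (compare the shorter string against every same-length window of the longer one and keep the best score) can be sketched with the standard library alone. This is a reference for what the expected answer is, not fuzzywuzzy's implementation: `naive_partial_ratio` is a hypothetical name, returning 0 for an empty input is my own choice, and scores for non-exact matches may differ from the library's.

```python
from difflib import SequenceMatcher


def naive_partial_ratio(s1, s2):
    """Best similarity of the shorter string against every same-length
    window of the longer string, scaled to an integer in 0..100."""
    shorter, longer = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    if not shorter:
        return 0  # assumption: treat an empty input as no match
    best = 0.0
    for start in range(len(longer) - len(shorter) + 1):
        window = longer[start:start + len(shorter)]
        # autojunk=False keeps difflib from discarding frequent characters.
        score = SequenceMatcher(None, shorter, window, autojunk=False).ratio()
        best = max(best, score)
        if best == 1.0:
            break  # an exact window match cannot be beaten
    return int(round(100 * best))


print(naive_partial_ratio("thane", "nation hospitality honda water thane thane west"))
print(naive_partial_ratio("YANKEES", "NEW YORK YANKEES"))
```

Both calls print 100, because in each case one window is exactly equal to the shorter string. Any partial_ratio implementation that is meant to match this definition should agree on exact-substring inputs.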

@josegonzalez
Contributor

Do you mind adding the appropriate tests to test_fuzzywuzzy.py so that CI hits it and we can see the test fails?
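
A regression test along these lines could serve as a starting point. It is only a sketch: the class and method names are hypothetical, the inputs are the ones reported in this thread, and the import is guarded so the module still loads where fuzzywuzzy is not installed.

```python
import unittest

try:
    from fuzzywuzzy import fuzz
except ImportError:  # keep the module importable without the library
    fuzz = None


@unittest.skipIf(fuzz is None, "fuzzywuzzy is not installed")
class PartialRatioSubstringTest(unittest.TestCase):
    # (shorter, longer) pairs where shorter is a literal substring of longer.
    CASES = [
        ("thane", "nation hospitality honda water thane thane west"),
        ("thane", "t hosa na e thane ws"),
    ]

    def test_substring_scores_100(self):
        for shorter, longer in self.CASES:
            with self.subTest(longer=longer):
                # Guard the premise first, then check the reported behaviour.
                self.assertIn(shorter, longer)
                self.assertEqual(fuzz.partial_ratio(shorter, longer), 100)
```

Running it with `python -m unittest` under each interpreter version in the CI matrix would show which environments reproduce the reported score of 40.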

@SujaySKumar
Author

CI passes because it uses Python 3.5.3. The issue seems to occur on Python 3.6, and even on 3.5.6.

@josegonzalez
Contributor

Mind filing a PR to use 3.5.6?

@lisabutti

I can confirm that this also happens in Python 3.7.

@gw00207

gw00207 commented Sep 5, 2019

In Python 3.7, a shorter example gives the same result:

>>> fuzz.partial_ratio("thane", "t hosa na e thane ws")
40

@Lychfindel

Is there any solution for this issue?
