Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modify JaroWinkler boosting to match behaviour of jellyfish algorithm #76

Merged
merged 2 commits into from Oct 27, 2021

Conversation

juliangilbey
Copy link
Contributor

Jellyfish has recently modified its JaroWinkler algorithm to allow for boosting even when one of the strings is shorter than 4 characters: jamesturk/jellyfish@87f9679. It is very unclear whether this is a good idea or not. But as it is, the tests now fail, as the internal and external algorithms give different results on a pair of strings such as ":" and ":0".

This patch replicates the change that jellyfish has made, which will then allow the external tests to pass once again. It also modifies the expected value of the comparison "fog" and "frog" to match this new algorithm behaviour.

If you do not wish to apply this patch, then the external tests will need modifying to exclude the case where either of the strings has length < 4.

@juliangilbey
Copy link
Contributor Author

Argh, no, please don't apply this yet. While the idea is right, there are some other tests that currently fail (namely, left="0", right="00"). I don't have time to fix it today... will get back to you soon.

@orsinium
Copy link
Member

Deal. Thank you for working on it :)

@juliangilbey
Copy link
Contributor Author

Ah, it turns out to be an issue with jellyfish; I've just reported it there: jamesturk/jellyfish#147 I'll keep this PR updated when things get resolved there.

@juliangilbey
Copy link
Contributor Author

OK, jellyfish has now been updated to version 0.8.9 fixing this issue. This PR (against textdistance) now works correctly. I don't know whether there's an easy way to say that with this patch, version 0.8.9 of jellyfish is required for the JaroWinkler tests to pass.

@orsinium
Copy link
Member

Great! Thank you.

@orsinium orsinium merged commit 5785d46 into life4:master Oct 27, 2021
@juliangilbey juliangilbey deleted the jarowinkler-jellyfish branch September 4, 2022 11:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
2 participants