Levenstein distance for better matching with regexp #21

Closed

@radarek radarek commented Feb 4, 2011

I added a Levenshtein distance calculation to the regexp matching, because it returned the first matching country and sometimes there was a better choice. For example:

old behaviour:

Carmen.country_code("Pol")
=> "PF" # French Polynesia

new behaviour:

Carmen.country_code("Pol")
=> "PL" # Poland

Please consider merging it into master.
Thank you.
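
To illustrate the difference, here is a minimal sketch, not the actual patch. `COUNTRIES`, `first_match_code`, and `closest_match_code` are hypothetical names, and the data is a tiny hand-written subset rather than Carmen's real data structure. It assumes the query is matched as a case-insensitive regexp against country names, and that the candidate with the smallest Levenshtein distance to the query wins.

```ruby
# Hypothetical [name, code] pairs; NOT Carmen's real data structure.
COUNTRIES = [
  ["French Polynesia", "PF"],
  ["Poland", "PL"],
  ["Portugal", "PT"],
  ["Canada", "CA"]
].freeze

# Classic dynamic-programming Levenshtein distance (two-row variant),
# compared case-insensitively.
def levenshtein(a, b)
  a = a.downcase
  b = b.downcase
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      # deletion, insertion, substitution
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Old behaviour (as described above): return the first regexp match.
def first_match_code(query)
  pattern = /#{Regexp.escape(query)}/i
  match = COUNTRIES.find { |name, _| name =~ pattern }
  match && match[1]
end

# New behaviour (sketch): among all regexp matches, pick the name
# with the smallest edit distance to the query.
def closest_match_code(query)
  pattern = /#{Regexp.escape(query)}/i
  candidates = COUNTRIES.select { |name, _| name =~ pattern }
  best = candidates.min_by { |name, _| levenshtein(query, name) }
  best && best[1]
end

first_match_code("Pol")   # => "PF"
closest_match_code("Pol") # => "PL"
```

With this data, "Pol" is 3 edits away from "Poland" but 13 edits away from "French Polynesia", so the ranked version prefers Poland.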

@jim
Collaborator

jim commented Feb 8, 2011

This is very cool.

I'm trying to think about when this sort of matching would be the most useful. Can you tell me how you're using it in your app? I'm thinking it might be better as another method or as an optional flag to this method.

I know the regex matching is already in the library, but I'm not totally sure it should be either.

@radarek
Author

radarek commented Feb 8, 2011

The point of the change is that the current regexp matching mechanism returns the first match, which is not always appropriate. Using the Levenshtein algorithm, we can increase the probability that the returned country is a better fit for the given value.

I think it's useful when input values come from a 3rd party database/datastore/website etc. and we can't predict all the values, especially abbreviations and typos. For example, someone can write "Cannada" or "Canda" and we can guess that it's probably "Canada" (Levenshtein is good for that case). I also gave the example with "Pol", where I expected (maybe because I'm from Poland :D) that "Poland" is a better match than "French Polynesia".
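
The typo case can be sketched as follows. This is only an illustration under stated assumptions: `NAMES`, `nearest_name`, and the `max_distance` cutoff of 2 are hypothetical and are not part of Carmen or of this patch. Note that "Cannada" would not match "Canada" as a regexp at all, so this sketch compares the input against every name directly.

```ruby
# Hypothetical name list; NOT Carmen's real data.
NAMES = ["Canada", "Poland", "French Polynesia", "Chad"].freeze

# Case-insensitive Levenshtein distance (two-row dynamic programming).
def levenshtein(a, b)
  a = a.downcase
  b = b.downcase
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Returns the name closest to the input, or nil when even the closest
# name is more than max_distance edits away (assumed cutoff).
def nearest_name(input, names = NAMES, max_distance = 2)
  best = names.min_by { |name| levenshtein(input, name) }
  best if best && levenshtein(input, best) <= max_distance
end

nearest_name("Cannada") # => "Canada"
nearest_name("Canda")   # => "Canada"
nearest_name("Xyzzy")   # => nil
```

The cutoff keeps unrelated input from being forced onto some country; the right value would depend on the data.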

If you want to add it as a separate method, then I encourage you to move the whole regexp matching part along with it, because IMHO matching with a regexp and choosing the first match isn't good.

Hope it helped :).

@jim
Collaborator

jim commented Feb 20, 2011

Makes sense. I'm going to move the regex stuff into an optional module, and then I'll incorporate this into it as well.
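
One possible shape for such an opt-in module, as a rough sketch only: `CountryLookup` and `FuzzyMatching` are hypothetical names and do not reflect Carmen's actual classes. The assumption is that the default lookup stays strict and the regexp fallback is only active when the module is mixed in.

```ruby
# Hypothetical lookup class: exact, case-insensitive name matching only.
class CountryLookup
  def initialize(countries)
    @countries = countries # array of [name, code] pairs
  end

  def country_code(name)
    pair = @countries.find { |n, _| n.casecmp(name).zero? }
    pair && pair[1]
  end
end

# Hypothetical opt-in module: falls back to regexp matching
# when the strict lookup finds nothing.
module FuzzyMatching
  def country_code(name)
    super || begin
      pattern = /#{Regexp.escape(name)}/i
      pair = @countries.find { |n, _| n =~ pattern }
      pair && pair[1]
    end
  end
end

data = [["French Polynesia", "PF"], ["Poland", "PL"]]

strict = CountryLookup.new(data)
strict.country_code("Pol")    # => nil

fuzzy = CountryLookup.new(data).extend(FuzzyMatching)
fuzzy.country_code("Pol")     # => "PF" (first regexp match)
fuzzy.country_code("poland")  # => "PL" (exact match wins)
```

A distance-based ranking of the regexp candidates could then live inside the module's fallback branch without affecting callers that never opt in.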

@nengxu
Contributor

nengxu commented Jun 24, 2011

+1

@jim
Collaborator

jim commented Mar 6, 2014

Closing this as it's been open for 3 years. Smarter fuzzy matching may come at some point in the future if there is demand for it.

@jim jim closed this Mar 6, 2014