Match threshold #4

Closed
danyal opened this issue Jan 8, 2013 · 8 comments

Comments

@danyal

danyal commented Jan 8, 2013

Could you add an optional parameter to find that allows you to set a match threshold? I am looking for matches that are only really close vs. getting matches that are .2 or .3 in similarity. By setting the threshold I could eliminate anything that isn't almost exactly a match.

@seamusabshere
Owner

hey @danyal i'll look into this ASAP

@seamusabshere
Owner

hey @danyal we're trying out a similar feature in #3

gem 'fuzzy_match', github: 'seamusabshere/fuzzy_match', branch: 'find_all_with_score'

you get 2 scores for every record (because sometimes pair distance aka dice's coefficient can't tell things apart)

fz = FuzzyMatch.new [...]
fz.find_all_with_score('foobar').each do |record, dice_similar, leven_similar|
  [...]
end

it returns all scores, so you can do the threshold yourself:

fz.find_all_with_score('foobar').select do |record, dice_similar, leven_similar|
  dice_similar > 0.3
end

is this sufficient for your needs? if so, i'll put it in a new gem release
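
To make the two-score idea concrete without the gem installed, here is a minimal, self-contained sketch. Everything in it (`bigrams`, `dice_similar`, `leven_similar`, `all_with_score`, the sample records) is a hypothetical stand-in written for illustration, not fuzzy_match's own code, and the gem may compute its scores differently. It assumes Dice's coefficient over character bigrams and a Levenshtein similarity normalized by the longer string's length.

```ruby
# Hypothetical stand-ins for the two scores; NOT the gem's implementation.

# Character bigrams of a string, e.g. "abc" -> ["ab", "bc"].
def bigrams(str)
  str.downcase.each_char.each_cons(2).map(&:join)
end

# Dice's coefficient over bigrams: 2 * shared pairs / total pairs.
def dice_similar(a, b)
  x = bigrams(a)
  y = bigrams(b)
  return 0.0 if x.empty? || y.empty?

  remaining = y.dup
  shared = 0
  x.each do |pair|
    idx = remaining.index(pair)
    next unless idx

    remaining.delete_at(idx) # each pair in y can be matched only once
    shared += 1
  end
  2.0 * shared / (x.size + y.size)
end

# Classic dynamic-programming Levenshtein edit distance, one row at a time.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Edit distance turned into a 0..1 similarity (assumed normalization).
def leven_similar(a, b)
  longest = [a.length, b.length].max
  return 1.0 if longest.zero?

  1.0 - levenshtein(a, b).to_f / longest
end

# Returns [record, dice, leven] triples, the shape described above.
def all_with_score(records, needle)
  records.map { |r| [r, dice_similar(r, needle), leven_similar(r, needle)] }
end

RECORDS = %w[foobar foobaz barfoo quux].freeze

# Do the threshold yourself: keep only records above 0.3 on the Dice score.
close = all_with_score(RECORDS, 'foobar').select do |_record, dice, _leven|
  dice > 0.3
end
```

In this sketch `foobaz` and `barfoo` share the same four bigrams with `foobar`, so their Dice scores tie; the Levenshtein similarity is what separates them, which is the reason for carrying a second score.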

@danyal
Author

danyal commented Jan 9, 2013

Hi Seamus,

Yup, that would serve my purpose, though I think it would be cleaner if find took an optional threshold param; then on line 264 in fuzzy_match.rb you could change the "> 0" to "> threshold". Either way works though.

Thanks,
Danyal
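
A rough sketch of the optional-threshold idea, for illustration only. `best_above` is a hypothetical helper, not a fuzzy_match method, and the `[record, score]` pairs are made-up sample input. It assumes the "> 0" check on the line mentioned above filters candidates by score, which the thread does not confirm.

```ruby
# Hypothetical helper, not part of fuzzy_match: returns the best-scoring
# record whose score is strictly above `threshold`. The default of 0 mirrors
# the existing "> 0" comparison, so omitting the argument keeps old behavior.
def best_above(scored, threshold = 0)
  candidates = scored.select { |_record, score| score > threshold }
  best = candidates.max_by { |_record, score| score }
  best && best.first
end

# Made-up [record, score] pairs, purely for illustration.
scored = [['foobar inc', 0.9], ['foo', 0.25], ['quux', 0.0]]

loose  = best_above(scored)        # default threshold of 0
strict = best_above(scored, 0.95)  # nothing is close enough, so nil
```

Making the threshold a defaulted parameter means existing callers are unaffected, while callers who only want near-exact matches can pass something stricter.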


@seamusabshere
Owner

hey @danyal how's this lookin'? 8e11cfe (https://github.com/seamusabshere/fuzzy_match/commit/8e11cfe0628c15b309a1f8a3137f5ba8544ed51d)

@danyal
Author

danyal commented Jan 10, 2013

Looks great! Thanks so much.


@seamusabshere
Owner

@danyal can you use this as a github branch? if not, i can rush a gem release.

@danyal
Author

danyal commented Jan 10, 2013

I can use the branch, no need to rush.


@seamusabshere
Owner

fixed since version 1.4.1 i believe (c2e6f3e)
