Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sift3 string distance in IE7 is invalid. #14

Closed
DimitarChristoff opened this issue Mar 23, 2012 · 9 comments
Closed

sift3 string distance in IE7 is invalid. #14

DimitarChristoff opened this issue Mar 23, 2012 · 9 comments

Comments

@DimitarChristoff
Copy link

This got caught by our QA team:

DimitarChristoff#1

compare:
String.distance('gmails.com', 'aol.co.uk')

Expected / seen: 8.5 in major browsers, actual: 0.5 in IE7
As a result, in IE7:

Expect user@gmails.com' to suggestuser@gmail.com, actual:user@aol.co.ukin our DB Expectuser@gmails.com' to suggest user@gmail.com, actual: user@google.com in your kicksend DB

As a workaround, I have now added levenstein to calculate the distance instead, seems far more reliable. https://github.com/DimitarChristoff/mailcheck/blob/master/src/String.distance.js#L71-93 or google for other implementations.

This is now covered by my test coverage and buster.js reports the following tests in Ie7 (they pass in all other browsers):

Internet Explorer 7.0 Windows: .FFF..........FFFFFF............
Failure: Internet Explorer 7.0 Windows String.distance tests Expect distance between two strings with 1 typo to be 1
    [assert.equals] 0 expected to be equal to 1

Failure: Internet Explorer 7.0 Windows String.distance tests Expect distance between two strings with 1 typo and 1 char difference to be 1.5
    [assert.equals] 0.5 expected to be equal to 1.5

Failure: Internet Explorer 7.0 Windows String.distance tests Expect distance between two unrelated strings to be length of base string
    [assert.equals] 0 expected to be equal to 5

Failure: Internet Explorer 7.0 Windows mailcheck.mootools distance tests Working with emails > Expect a typo in domain to produce a suggestion (gnail.com -> gmail.com)
    [assert.equals] undefined expected to be equal to gmail.com

Failure: Internet Explorer 7.0 Windows mailcheck.mootools distance tests Working with emails > Expect a typo in domain to produce a suggestion with custom threshold of 3 (gmail.org -> gmail.com)
    [assert.equals] undefined expected to be equal to gmail.com

Failure: Internet Explorer 7.0 Windows mailcheck.mootools distance tests Working with emails > Expect uppercase user input not to matter to suggestions
    [assert.equals] fsmail.net expected to be equal to hotmail.com

Failure: Internet Explorer 7.0 Windows mailcheck.mootools distance tests Working with emails > Expect obscure RFC compatible emails like "foo@bar"@gnail.com to produce a valid suggestion
    [assert.equals] undefined expected to be equal to gmail.com

Failure: Internet Explorer 7.0 Windows mailcheck.mootools distance tests Working with emails > Expect cache to store look-up for faster future reference
    [assert.equals] false expected to be equal to gmail.com

Failure: Internet Explorer 7.0 Windows mailcheck.mootools distance tests Working with emails > Expect cache to store look-up failures for faster future reference
    [assert.isFalse] Expected aol.co.uk to be false

4 test cases, 32 tests, 32 assertions, 9 failures, 0 errors, 0 timeouts
@samoht625
Copy link

Changing the array access to use charAt instead of [] seems to fix this bug too. Here is a rough patch:

while ((c + offset1 < s1.length) && (c + offset2 < s2.length)) { - if (s1[c + offset1] == s2[c + offset2]) { - if (s1.charAt(c + offset1) == s2.charAt(c + offset2)) { lcs++; } else { offset1 = 0; offset2 = 0; for (var i = 0; i < maxOffset; i++) { - if ((c + i < s1.length) && (s1[c + i] == s2[c])) { - if ((c + i < s1.length) && (s1.charAt(c + i) == s2.charAt(c))) { offset1 = i; break; } - if ((c + i < s2.length) && (s1[c] == s2[c + i])) { - if ((c + i < s2.length) && (s1.charAt(c) == s2.charAt(c + i))) { offset2 = i; break; }

@DimitarChristoff
Copy link
Author

good catch, i just made levenstein my default - but kept sift3 as an option. all my tests now pass in IE7 as well: http://fragged.org/dev/mailcheck/test/

if you actually read my test file (its buster not jasmine), some of the expectations are a bit off with sift3, like 1.5 when I'd have expected 2 when 1 typo + 1 new char, eg:

https://github.com/DimitarChristoff/mailcheck/blob/master/test/tests/mootools.mailcheck-test.js#L22-24

vs levenstein for same case:

https://github.com/DimitarChristoff/mailcheck/blob/master/test/tests/mootools.mailcheck-test.js#L47-49

cheers anyway

@khrizt
Copy link
Contributor

khrizt commented Apr 11, 2012

Hi,

I've implemented mailcheck on my site and also have problems with IE6 and 7, is there any possibility to fix this in the master branch? I always can download the non-minified version and apply the samoht625 patch but would be great to have it in the official version.

Thanks

@hpshelton
Copy link
Contributor

My fork has an implementation of the Levenstein algorithm in the full and minimized versions, but it still defaults to the sift3 implementation at the moment and has not yet been submitted as a pull request to the master; use at your own risk.

@hpshelton
Copy link
Contributor

@samoht625 Good catch. From stackoverflow:

In JavaScript <= 5.7 (IE 7) you cannot access a string like an array. You have to use String.charAt() instead.

@derrickko
Copy link
Member

I'll merge the charAt change for sure.

What are your views on sift3 vs Levenshtein? I'm not keen on offering two algorithms by default, and I'm open to changing the default as well.

@samoht625
Copy link

I ran some offline analysis of the email addresses for dropbox (we are
using this system on dropbox.com/register btw - thanks!), and sift3 was
certainly faster than levenshtein. Not sure if it actually would ever
matter, but just a data point. I'm cool with either. I think as long as
you get the charAt patch, you're good... it's a pretty crummy experience in
IE7 without that patch.

@derrickko
Copy link
Member

Thanks @samoht625! Glad to see it on Dropbox. I've merged the IE fix into master.

Closing this issue for now. We may offer alternate string similarity algorithms in the future, in way that doesn't introduce unnecessary bloat. (See #18).

I'll be merging in a couple of pull requests (including @hpshelton's awesome one) for a 1.2 release shortly.

@DimitarChristoff
Copy link
Author

like i said before:

some of the expectations are a bit off with sift3, like 1.5 when I'd have expected 2 when 1 typo + 1 new char, eg:

so between 'gmail and 'gnails' it returns 1.5 when there is a 2 char 'distance'. levenstein returns 2 so makes more sense.

amir-s pushed a commit to amir-s/mailcheck that referenced this issue Jan 23, 2015
jgarcia0092 added a commit to jgarcia0092/mailcheck that referenced this issue Jun 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants