New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
QSSense fixup #2342
QSSense fixup #2342
Conversation
Cool! Strange that Git doesn’t show |
I think it's just the way GitHub show (or more precisely, doesn't) renames though. Also, you might want to take a look at the test case. It has some "interesting" characteristics I didn't expect. |
|
||
strRange.length += STEP; | ||
score = QSScoreForAbbreviationWithRanges(str, CFSTR("test"), indexes, strRange, abbrRange); | ||
XCTAssertEqualWithAccuracy(score, 0.76129, ACC); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like this. It actually matches higher just because I added one word (which I don't think should matter) anyway.
Just noticing my test is not what I expected too, I forgot there's a 't' 2nd place in "string"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add some comments to this section, I got my head round it, but it took me a while (e.g. line 89 you could say what each index in the NSIndexSet
represents)
So here you're saying matching test
with This excellent string tells us␣
gives a higher score than matching without the space at the end (This excellent string tells
)? Seems strange to me!
Also can you explain what ACC
is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do. I was actually surprised that the index set is for each subsequent matching character of the abbreviation in the search string, not just "the best match", as I expected (ie, it doesn't correspond to what is shown underlined in the UI).
Yes, that's the weird thing, scoring increases proportionally to the string's length. I think it's that way because of the change you made (which is why I built the test case), because the other test case doesn't behave like that.
ACC
is just a constant I made up because non-determinism related to floating-point calculation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test case here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tiennou Perhaps it's bad form to comment on a year-old PR thread (and perhaps this has already been figured out), but I also ran into the index set issue in porting the scoring algorithm to JavaScript (and I've been poking into old PRs here to try to understand why the scores in TestQSSense.m
don't quite line up with the results of my code).
Consider matching "test"
against the string "t e s t tess!"
. The full abbreviation doesn't appear as a substring in the first loop, so it's reduced to "tes"
and the function recurses. That substring is found near the end of the string, and those indices ([8, 9, 10]
, I think) get set in the mask. Then it looks for the remaining "t"
in the rest of the string, but only finds "s!"
, so it has to start over and shrink the abbreviation again. Eventually, it finds the individual letters separated by spaces and returns a score. But the mask still contains those first hits that it gave up on.
My solution was to clear the mask just before the return 0
at the end of the function. I believe that's correct, as the 0 score means the algorithm will start over with a shorter abbreviation, and that shorter bit might hit in a different place, rendering the previous hits obsolete. If the UI isn't underlining all those letters, then something else must be cleaning up the mask. (I tried clamping it to the length of the abbreviation, but I don't think that gives the correct results in all cases.)
XCTAssertEqualObjects(indexes, results); | ||
|
||
strRange.length += STEP; | ||
score = QSScoreForAbbreviationWithRanges(str, CFSTR("test"), indexes, strRange, abbrRange); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also shouldn't you use abbr
here instead of re-defining CFSTR("test")
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Will fix.
Grrrr. For some reason, my scheme is missing everything but “Quicksilver Tests”. I guess I need to manually put the rest back. Should we also add some tests for strings that don’t match? |
Another thought on the tests… Do we really need to check for a specific score? A small change to the algorithm down the road doesn’t mean the score is “wrong”. Should we adjust the tests to just check that the scores for certain strings are higher or lower than others? |
This is also a really old branch (2012 was a good year !).
So, we have 2 different versions of the ranking algorithm in-tree, both dating back from the Dawn of Time. The main ranking loop would use QSSense (through
QSStringRanker
). I haven't seen any use of theNSString
category, but since it's hidden behind forwarding, it might be used sometimes.They had a differing behavior (but I never found the time to thoroughly check).The differing behavior was introduced in 87d328a actually. The comparison options are also a give-away (QSSense is case and diacritic insensitive and respect localization, the other is just case-insensitive.
So THE DIFFERENCE is what it is.
I've added a test case for it, because I wanted to 1) see it work 2) see what was different.