Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

small changes to the scoring algorithm #2452

Merged
merged 8 commits into from Sep 20, 2018
Merged

small changes to the scoring algorithm #2452

merged 8 commits into from Sep 20, 2018

Conversation

skurfer
Copy link
Member

@skurfer skurfer commented Aug 1, 2018

I addressed the first two points here @fwextensions. On the third, I don’t doubt the bug, but the fix doesn’t seem right. At that point in the code, mask is going to be thrown away forever anyway. Clearing it wouldn’t make a difference. (I tried and the scores didn’t change.) I haven’t had a chance to dig in and figure out the actual fix.

closes #2450

@fwextensions
Copy link

@fwextensions fwextensions commented Aug 1, 2018

At that point in the code, mask is going to be thrown away forever anyway.

Is it? I thought the caller passed it in by reference. I don't know what the id type is in the signature, but in the test file the mask param is defined as NSMutableIndexSet *indexes, which I assume is pointer to the set, so the end of the function wouldn't have any impact on the caller's value.

Clearing it wouldn’t make a difference. (I tried and the scores didn’t change.)

Yes, clearing it doesn't make a difference in the scores, just in the set of indices of the matched characters. I'm guessing the UI just uses the first abbr.length indices and ignores the rest, but those won't necessarily correspond to the matches that were used for the score.

I'd also recommend resetting the index set after each strRange.length += STEP;. Otherwise, it's just accumulating indices from all the previous calls, which isn't how the scorer would normally be called.

Otherwise, 👍

@skurfer
Copy link
Member Author

@skurfer skurfer commented Sep 19, 2018

Yes, clearing it doesn't make a difference in the scores, just in the set of indices of the matched characters.

I see what you mean now. What’s odd is that @tiennou specifically wrote the tests as if he was expecting the abbreviation to be repeated within the same string. I’ve asked for clarification on that.

In the meantime, I’ve made the change to clear the mask and am testing it locally. Hoping to get a release out before Mojave.

skurfer added 2 commits Sep 19, 2018
* don’t expect repeats if the abbreviation occurs more than once
* reset indexes before each check, since it should be empty in real-world use
* test a variety of abbreviations that hit more of the string
@skurfer skurfer merged commit 4b924df into master Sep 20, 2018
2 checks passed
skurfer added a commit that referenced this issue Sep 20, 2018
@skurfer skurfer deleted the scoring branch Sep 20, 2018
@skurfer
Copy link
Member Author

@skurfer skurfer commented Sep 20, 2018

In the absence of feedback, I’m going for it. I think this makes more sense. 1.6.0 will be a preview release at first if you want to try it out.

@@ -124,5 +124,6 @@ CGFloat QSScoreForAbbreviationWithRanges(CFStringRef str, CFStringRef abbr, id m
}
}
CFRelease(userLoc);
[mask removeAllIndexes];
Copy link
Member

@tiennou tiennou Sep 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIR, the mask-managing thing is supposed to be the responsibility of higher-level code (ie. it's not supposed to pass an already-populated/non-empty index set in here), so I'm somewhat suspicious that this could cause a spurious reset of a correctly matching mask in a (previous) recursive step in case a subsequent one doesn't. I'm failing to imagine an example of it though, so maybe the algorithm already caters to that ?

Copy link
Member Author

@skurfer skurfer Sep 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIR, the mask-managing thing is supposed to be the responsibility of higher-level code (ie. it's not supposed to pass an already-populated/non-empty index set in here)

But it was passing a non-empty set. If you look at testLongString (before the changes here), the index set would have up to 10 matches for an abbreviation 4 characters long. Though now I’m wondering if that’s because the test was reusing the index without clearing it each time. You might be right about the behavior of the actual application never doing that.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two index set issues. One is that the test code wasn't passing in a fresh set to each call, like you normally would, so it was caching hits from previous calls.

The other is that the algorithm will leave behind hits from abandoned iterations if the set isn't cleared before returning 0, which is what this line changes. I mentioned the example of matching "test" against "t e s t tess!" in this comment on the PR that originally added this test (a year later, so I'm assuming no one saw it).

Other than this one file, I have zero familiarity with the Quicksilver code base, but I'm betting there's something that is picking just the first abbreviation.length chars as matches to render, so that the "extra" hits left in the index set aren't shown. I think that works in most cases, but I believe there are edge cases where it would show hits that don't actually correspond to the match that was used to calculate the score.

@tiennou
Copy link
Member

@tiennou tiennou commented Sep 23, 2018

There are two index set issues. One is that the test code wasn't passing in a fresh set to each call, like you normally would, so it was caching hits from previous calls.

This one's my fault — I wasn't exactly testing the masks, only the scoring system, hence blindly passing the same indexset over and over is incorrect. Arguably, what gets "masked" should be in its own test case.

The other is that the algorithm will leave behind hits from abandoned iterations if the set isn't cleared before returning 0, which is what this line changes.

This one is what I'm worried about, ie. the spurious reset I alluded to above. I would only expect indexes from a "valid match path" to be included, since we only add found things (at line 85). Does only moving the indexset addition just before the return score at line 123 make sense, so we'd only add things that recursive calls would have considered valid ?

All this said, I suspect something is wrong, because I've noticed this, using the default ranker.
capture d ecran 2018-09-23 22 16 56

I'm not quite sure if that's correct or not… Shouldn't a 3-letter match from the start of a word be higher than a 2l + l ? Or is that the fabled TextStart ranker 😉?

@fwextensions
Copy link

@fwextensions fwextensions commented Sep 24, 2018

I would only expect indexes from a "valid match path" to be included, since we only add found things (at line 85).

If you match "test" against "t e s t tess!", you'll see that the hit indexes are [ 0, 2, 4, 6, 8, 9, 10 ]. The [8, 9, 10] are left over from the first iteration.

Does only moving the indexset addition just before the return score at line 123 make sense, so we'd only add things that recursive calls would have considered valid ?

Good idea. I tried doing that in my JS version of the algorithm and all the test suites passed. The hits do get set in reverse order, though I assume that doesn't matter for a mask. (It did for me, since I was pushing them onto an array, but using unshift() to push them onto the beginning of the array instead left them in the correct order at the end.) So maybe that's the cleaner fix.

Shouldn't a 3-letter match from the start of a word be higher than a 2l + l ?

It's due to the lengths of the strings. If you make them the same length, like "Divinity - Or", that string scores slightly higher.

@skurfer
Copy link
Member Author

@skurfer skurfer commented Sep 27, 2018

I haven’t been following this too closely, but if someone wants to add the proposed fix, I’ll look it over and try to get it into 1.6.1. I’d like to release it soon since we currently have a bug that prevents QS from launching under 10.10.

@skurfer
Copy link
Member Author

@skurfer skurfer commented Oct 11, 2018

I would only expect indexes from a "valid match path" to be included, since we only add found things (at line 85). Does only moving the indexset addition just before the return score at line 123 make sense, so we'd only add things that recursive calls would have considered valid ?

So are you suggesting moving that if block down before the return, and getting rid of the new call to removeAllIndexes? I tried that. I don’t see much difference in actual use, but the revised tests still pass.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants