-
Notifications
You must be signed in to change notification settings - Fork 194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new fuzzy quote matching implementation #2814
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2814 +/- ##
==========================================
+ Coverage 97.75% 97.77% +0.01%
==========================================
Files 203 204 +1
Lines 7720 7764 +44
Branches 1708 1718 +10
==========================================
+ Hits 7547 7591 +44
Misses 173 173
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I gave this a look over last week and tested it out a bit. Obviously we get much better orphan matching out of the gate and I don't see any red flags. Code seems straightforward (ignoring approx-string-match), can't give any reasons not to merge this now!
return 0.0; | ||
} | ||
const matches = search(text, str, str.length); | ||
return 1 - matches[0].errors / str.length; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
readability for me :)
return 1 - matches[0].errors / str.length; | |
1 - (matches[0].errors / str.length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, although I did have to suppress Prettier's formatting.
const matches = search(text, quote, maxErrors); | ||
|
||
if (matches.length === 0) { | ||
// All matches had more than `maxErrors` errors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could perhaps call this a "candidate" or "candidate match"
e.g.
// All candidates had more than maxErrors
errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this again the comment is pretty superfluous, so I just removed it.
Implement a `matchQuote` function which will be used to replace `dom-anchor-text-quote` for finding the best match for annotation quotes in the document text. The new implementation is based on the `approx-string-match` library and provides several improvements over the existing one: - Better performance when there are many differences between the quote and closest document text - It will be easier for us to tune the degree of mismatch allowed between the quote and document text and how candidate matches are ranked
0920964
to
500e9bc
Compare
This PR is the first part of #2779. It adds the new fuzzy quote matching algorithm and tests. The next PR will then switch quote matching in the client to use the new implementation.
Implement a
matchQuote
function which will be used to replacedom-anchor-text-quote
for finding the best match for annotation quotesin the document text.
The new implementation is based on the
approx-string-match
library andprovides several improvements over the existing one:
Better performance when there are many differences between the quote
and closest document text
It will be easier for us to tune the degree of mismatch allowed
between the quote and document text and how candidate matches are
ranked