-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(search): Handle word searches with char codes below 0x3000 #1607
fix(search): Handle word searches with char codes below 0x3000 #1607
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1607 +/- ##
==========================================
+ Coverage 79.66% 79.84% +0.17%
==========================================
Files 7 7
Lines 3010 2992 -18
Branches 189 185 -4
==========================================
- Hits 2398 2389 -9
+ Misses 607 599 -8
+ Partials 5 4 -1
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a quick review before work; I'll try to look again afterwards! (It looks reasonable though!)
#190 is the issue for handling ~ As far other tests, I think we don't need to test every single word; it should be enough to test a representative word and be confident that similar words behave the same. I'd be interested more in the doubts you had while developing since as you say the goal of tests is to allow making changes without worrying! |
- This if-check was there for performance, but there's no difference in performance so it can be removed.
…s://github.com/bryanjenningz/rikaikun into fix-word-searches-with-char-codes-below-0x3000
I didn't have any specific doubts while developing. I'm just not familiar with the code in wordSearch so I just wrote some tests to see the diffs before and after my change to be 100% sure the wordSearch output is changed how I expected it to change. |
- This range check was preventing words like "+α" from being searched in the dictionary.
I changed the PR body a bit to make it read better (imo) once it gets merged and becomes the commit message body. Just for your information, I had two goals:
An astute reader would be able to figure that stuff out but since code sleuthing months/years later is hard, I tend to try to make it as easy as possible for my future self. :) |
- Before, non-Japanese text would not trigger xsearch for performance reasons, which caused words starting with non-Japanese characters to not trigger xsearch. - These performance reasons don't appear to be relevant anymore, but if there are any performance problems in the future, we can filter in a more granular way in the future to fix any performance problems.
Thanks so much! I appreciate that. The PR has a nice initial paragraph now. I'll be sure to add more context like this in future PRs so that it's easier to understand for my future self too. |
@all-contributors Add @bryanjenningz for code. |
I've put up a pull request to add @bryanjenningz! 🎉 |
Adds @bryanjenningz as a contributor for code. This was requested by melink14 [in this comment](#1607 (comment))
🎉 This PR is included in version 2.5.13 🎉 The release is available on: Your semantic-release bot 📦🚀 |
Words like DVD-ROM were not matching due to the hyphen (represented by a char code below 0x3000) in the middle. In order to include, entries like +α, logic was removed from rikaicontent.ts that skipped lookups when the letter the mouse hovered over was not a normal Japanese char code. (This included other various ranges which didn't show additional impacts as of this PR.)
These changes are thought to have been added for performance reasons which are no longer relevant but if some problem does crop up we can solve it more narrowly and with better documentation at that time.
See word entries affected by this change (122 unique words)
This list of words was collected by running this code:
I wrote snapshot tests to make sure none of the words without character codes below 0x3000 were affected by this change. I didn't include those tests in this pull request because they take a while to run (since there are over 200,000 entries in the dictionary) and they make this pull request bigger. I can add those tests in a separate pull request if we think that's a good idea so we can verify diffs of word searches for all words in the dictionary. This would give us more confidence whenever we change the wordSearch function because it shows all the diffs for each search result in the dictionary and it would allow us to drastically simplify the wordSearch function in the future with confidence that we aren't breaking of the previous behavior.
I also wrote code to do a check to make sure all words in the dictionary are searchable and found there is only a 1 unsearchable word in the dictionary now:
So this means after this pull request gets merged, the only unsearchable word entry in the dictionary is "~ [にょろ] /(n) tilde/wave dash/".
This word is a single character with a character code of 65374, which is 0xff5e in hexadecimal (which is above 0x3000, which explains why this pull request doesn't handle this word). After searching through the code for this number 0xff5e, it looks like there's some code that handles this character code that needs to be modified slightly to account for this word in the dictionary. That can be handled in a subsequent pull request to keep this pull request short and simple.
Fixes: #1453