Fix Unicode #135
Oops, accidentally included old commits.
Co-authored-by: Benoît Cortier <benoit.cortier@fried-world.eu>
Updated. Is the order of the
Also, I just found out that it doesn't exactly fix the Unicode regex problem. I'll look into it tomorrow.
I think it looks good, but what is '_' categorized as now?
Added a test, and it's still recognized as a word. The categories are listed at https://www.compart.com/en/unicode/category. I'll look into the aforementioned regression tomorrow, or possibly later depending on my schedule.
helix-core/src/movement.rs
    | GeneralCategory::OpenPunctuation
    | GeneralCategory::InitialPunctuation
    | GeneralCategory::FinalPunctuation => Category::Punctuation,
    _ => unreachable!("unknown '{}' character category", ch),
This will fail for other cases, like `+`: it will panic. We have to cover all the cases here, or just put everything else under an unknown category.
Might be worth having a separate test containing the equivalent of "the fox jumped over the...", etc., though I have no idea what that would be for Unicode. Maybe foreign equivalents? The goal would be to check for any panics. I guess we should just take one character from each category.
If you want that, fuzzing will be easier since it spams random characters; we then run our code through the fuzzer.
Property testing: https://github.com/BurntSushi/quickcheck
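As a cheaper alternative to fuzzing or quickcheck (a different technique from the ones suggested above), every Unicode scalar value can simply be fed through the function under test, since there are only about 1.1 million of them. The sketch below uses only the standard library; `is_word` and `sweep_all_chars` are hypothetical names for illustration, not functions from this PR.

```rust
/// Hypothetical stand-in for the function under test; swap in the real
/// categorization function when using this in the repository.
fn is_word(ch: char) -> bool {
    ch.is_alphanumeric() || ch == '_'
}

/// Calls `f` on every Unicode scalar value and returns how many were visited.
/// If `f` panics for any character, the sweep (and the test using it) fails.
fn sweep_all_chars<T>(f: impl Fn(char) -> T) -> usize {
    let mut visited = 0;
    // A `char` range skips the surrogate gap, so only valid scalar values are produced.
    for ch in '\0'..=char::MAX {
        let _ = f(ch);
        visited += 1;
    }
    visited
}
```

Calling `sweep_all_chars(is_word)` from a unit test would catch an `unreachable!` on any input, at the cost of not exercising multi-character sequences the way a fuzzer would.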
`is_ascii_punctuation` only works for ASCII punctuation, so when we encounter Unicode punctuation (or anything else) we fall into the `unreachable`. This patch falls back to categorizing everything in this branch as `Unknown`. Fixes helix-editor#123 helix-editor#135: add better support for unicode categories.
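A minimal sketch of that fallback, using only the standard library. `Category::Punctuation` and `Unknown` are named in this thread; the other variants, the `categorize` name, and the exact order of checks are assumptions for illustration and may differ from the real code.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Category {
    Whitespace,
    Eol,
    Word,
    Punctuation,
    Unknown,
}

/// Total over all `char`s: anything not recognized lands in `Unknown`
/// instead of hitting `unreachable!`.
fn categorize(ch: char) -> Category {
    if ch == '\n' {
        Category::Eol
    } else if ch.is_whitespace() {
        Category::Whitespace
    } else if ch.is_alphanumeric() || ch == '_' {
        // Checked before punctuation so that '_' stays part of a word.
        Category::Word
    } else if ch.is_ascii_punctuation() {
        Category::Punctuation
    } else {
        // Non-ASCII punctuation, symbols, control characters, and so on.
        Category::Unknown
    }
}
```

With this ordering `+` is punctuation, `_` is still a word character, and a non-ASCII quotation mark such as `«` becomes `Unknown` rather than causing a panic.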
I changed a bit #129 to return a
That's perfect, I was a little worried about how I could create tests for this PR.
@kirawi Can you please rebase and add it for ctrl-w in words.rs as well? And the tests should not have weird behavior like not deleting the unicode punctuations.
#88 was not fixed, that issue should be reopened. I still have no idea what is causing the bug.
Ah, interesting. Never tried that. Perhaps that's why it chokes on the new input. Well, I'll add a comment to the original issue tomorrow. Getting late here.
#[inline]
pub(crate) fn is_punctuation(ch: char) -> bool {
    use unicode_general_category::{get_general_category, GeneralCategory};

    matches!(
        get_general_category(ch),
        GeneralCategory::OtherPunctuation
            | GeneralCategory::OpenPunctuation
            | GeneralCategory::ClosePunctuation
            | GeneralCategory::InitialPunctuation
            | GeneralCategory::FinalPunctuation
            | GeneralCategory::ConnectorPunctuation
            | GeneralCategory::DashPunctuation
            | GeneralCategory::MathSymbol
            | GeneralCategory::CurrencySymbol
            | GeneralCategory::ModifierSymbol
    )
}
I think we should move these `pub(crate)` and categories stuff to `category.rs` or something. Maybe in another PR.
@@ -231,6 +249,7 @@ where
    // need to +1 so that prev() includes current char
    let mut chars = slice.chars_at(*pos + 1);

    #[allow(clippy::while_let_on_iterator)]
Is this a false positive? Maybe report it to clippy.
Not exactly, but the next line operates on `next()`, and I didn't know what to do with that, so I decided to suppress the warning for now.
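For context, here is a small sketch of the kind of loop being described, where the iterator is advanced again inside the body. In that situation a plain `for` loop is not a drop-in replacement, because `for` holds the iterator for the duration of the loop. `count_pairs` is a hypothetical function for illustration, not code from this PR.

```rust
/// Counts how many complete two-character pairs `s` contains.
fn count_pairs(s: &str) -> usize {
    let mut chars = s.chars();
    let mut pairs = 0;

    #[allow(clippy::while_let_on_iterator)]
    while let Some(_first) = chars.next() {
        // The body advances the same iterator a second time. Writing
        // `for _first in &mut chars` would keep `chars` mutably borrowed
        // here, so this inner `next()` call would not compile.
        if chars.next().is_some() {
            pairs += 1;
        }
    }
    pairs
}
```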
Looks good to me. Merging now as it has gotten quite long, but we need some follow-up on the category thing. Will test it tonight.
Oh wait, can't merge due to conflicts. Can you please fix the conflicts and address the issues?
Squash and merge works
Fix #123, should probably still include #129 as a fallback. Remember how I said there shouldn't be any regressions in #121? Well, I figured out my "fix" for #88 broke `b` on unicode especially, but now I guess that's fixed here since it properly detects unicode?