Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: handle character boundaries for wide chars in extend_selection #17426

Merged
merged 1 commit into from
Jun 19, 2024

Conversation

roife
Copy link
Member

@roife roife commented Jun 15, 2024

fix #17420.

When calling 'extend_selection' within a string, r-a attempts to locate the current word at the cursor. This is done by finding the first char before the cursor which is not a letter, digit, or underscore.

The position of this character is referred to as start_idx, and the word is considered to start from start_idx + 1. However, for wide characters, start_idx + 1 is not character boundaries, which leading to panic. We should use ceil_char_boundary to ensure that the idx is always on character boundaries.

@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 15, 2024
@Veykril
Copy link
Member

Veykril commented Jun 19, 2024

Thanks!
@bors r+

@bors
Copy link
Collaborator

bors commented Jun 19, 2024

📌 Commit bf356e0 has been approved by Veykril

It is now in the queue for this repository.

@bors
Copy link
Collaborator

bors commented Jun 19, 2024

⌛ Testing commit bf356e0 with merge a150d42...

@bors
Copy link
Collaborator

bors commented Jun 19, 2024

☀️ Test successful - checks-actions
Approved by: Veykril
Pushing a150d42 to master...

@bors bors merged commit a150d42 into rust-lang:master Jun 19, 2024
11 checks passed
@duncanawoods
Copy link

Just passing by after seeing this in the RA release notes. I lack all context so please forgive me if this is irrelevant!

is_char_boundary will still split grapheme clusters which won't panic but is typically not wanted in an IDE. Code increasingly has emojis in strings, comments etc.

e.g. this has 5 utf-8 character boundaries but for a text editor, it should be treated as just one visual character:

let s = "👨‍👨‍👦";
for i in 0..s.len() {
    println!("{i}: {}", s.is_char_boundary(i));
}

When I needed to segment visual characters/words for an editor, I used:

https://docs.rs/unicode-segmentation/latest/unicode_segmentation/

@roife
Copy link
Member Author

roife commented Jun 25, 2024

Just passing by after seeing this in the RA release notes. I lack all context so please forgive me if this is irrelevant!

is_char_boundary will still split grapheme clusters which won't panic but is typically not wanted in an IDE. Code increasingly has emojis in strings, comments etc.

e.g. this has 5 utf-8 character boundaries but for a text editor, it should be treated as just one visual character:

let s = "👨‍👨‍👦";
for i in 0..s.len() {
    println!("{i}: {}", s.is_char_boundary(i));
}

When I needed to segment visual characters/words for an editor, I used:

https://docs.rs/unicode-segmentation/latest/unicode_segmentation/

Thank you for your suggestion! I indeed missed it. But after thinking and testing, I believe there is no need to consider this aspect in this request. Here is an informal proof:

Assuming the user is working with the string aauuaa, where a represents [A-Za-z0-9_] and u stands for a wide character (potentially forming an emoji), with | indicating the cursor position and {} denoting the selection:

  • When in a|auuaa or aa|uuaa, we will directly select {aa} without affecting the emoji.
  • In the case of aau|uaa, since neither side meets the expansion condition, the selection is empty and we will choose the entire string directly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
5 participants