Incorrect handling of unicode combining markers - Backspace splits Ideographic Variation Sequence #15622

Closed
JoshVarty opened this issue Nov 17, 2016 · 5 comments
Labels
editor-commands Editor text manipulation commands feature-request Request for new features or functionality verification-needed Verification of issue is requested verified Verification succeeded
@JoshVarty

JoshVarty commented Nov 17, 2016

  • VSCode Version: 1.6.1
  • OS Version: Windows 10

We've got this bug in Visual Studio for Windows and I noticed that VS Code has it as well.

Steps to Reproduce:

  1. Paste the following code into VS Code. It's C# code, but the issue reproduces in any file.
List<string> test = new List<string>();
test.Add("辻󠄁"); // 辻󠄁 8FBB DB40 DD01
test.Add("辻"); // 辻 8FBB
test.Add("辻󠄀"); // 辻󠄀 8FBB DB40 DD00
  2. Place the cursor to the right of either the first or third 辻󠄀

  3. Press Backspace

Expected behavior: The entire character should be removed, as it is in Notepad and Microsoft Word

Actual behavior: The trailing two UTF-16 code units (the variation selector, DB40 DD01 or DB40 DD00) are removed from the buffer, but 8FBB remains.

These characters are not valid within the context of most C# code (variable/method names) but are legal within comments and strings where they may be more likely to occur in the future.

It's also worth noting that these characters cause issues with left and right arrow movements. I'm not sure if it's worth opening a separate issue for this, though.
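
For illustration, here is a minimal sketch of the expected Backspace behavior. It is not VS Code's implementation: `backspace` is a hypothetical helper, and it only approximates a grapheme cluster as "one base code point plus any trailing combining marks or variation selectors" (Unicode general category M), which is enough for the sequences in this report.

```javascript
// Hypothetical helper, not taken from VS Code.
// Deletes the last base code point together with every mark that follows it.
function backspace(text) {
  // \P{M}    : one code point that is not a mark (the base character)
  // \p{M}*   : its trailing marks, which include variation selectors
  // \p{M}+   : fallback when the text consists only of marks
  // The `u` flag makes the regex work on code points, not UTF-16 code units.
  return text.replace(/(?:\P{M}\p{M}*|\p{M}+)$/u, "");
}

const ivs = "\u8FBB\u{E0101}"; // 8FBB DB40 DD01 from the repro above

console.log(ivs.length);            // 3 (UTF-16 code units)
console.log(backspace(ivs).length); // 0 (the whole sequence is gone)
```

A naive Backspace that deletes a single code point would instead leave `"\u8FBB"` behind, which matches the actual behavior described above.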

@alexdima
Member

Variation Selectors are not treated specially in any way

// 辻󠄁 8FBB DB40 DD01
"辻󠄁".codePointAt(0).toString(16) // 8fbb
"辻󠄁".codePointAt(1).toString(16) // e0101

http://www.unicode.org/charts/PDF/UE0100.pdf
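
As a sketch of what special treatment would have to detect, the snippet below lists the code points of a string and flags variation selectors. The names `isVariationSelector` and `describe` are hypothetical, and only two blocks are checked: Variation Selectors (U+FE00..U+FE0F) and Variation Selectors Supplement (U+E0100..U+E01EF, the chart linked above).

```javascript
// Hypothetical helpers for illustration only.
function isVariationSelector(codePoint) {
  return (codePoint >= 0xFE00 && codePoint <= 0xFE0F) ||   // Variation Selectors
         (codePoint >= 0xE0100 && codePoint <= 0xE01EF);   // Variation Selectors Supplement
}

// Spreading a string iterates by code point, so a surrogate pair stays together.
function describe(text) {
  return [...text].map((ch) => {
    const codePoint = ch.codePointAt(0);
    return {
      hex: codePoint.toString(16),
      utf16Units: ch.length,
      variationSelector: isVariationSelector(codePoint),
    };
  });
}

console.log(describe("\u8FBB\u{E0101}"));
```

For the first string in the repro this reports `8fbb` as one UTF-16 code unit and `e0101` as two code units flagged as a variation selector.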


@alexdima alexdima added bug Issue identified by VS Code Team member as probable bug editor-core Editor basic functionality labels Nov 30, 2016
@alexdima alexdima added this to the Backlog milestone Nov 30, 2016
@alexdima alexdima added feature-request Request for new features or functionality and removed bug Issue identified by VS Code Team member as probable bug editor labels Apr 17, 2018
@alexdima alexdima removed their assignment Apr 27, 2018
@alexdima alexdima added editor-commands Editor text manipulation commands and removed editor-core Editor basic functionality labels Oct 21, 2019
@alexdima
Member

alexdima commented Oct 21, 2019

The same issue exists with combining diacritics, e.g. ã or பு

@alexdima alexdima self-assigned this Oct 21, 2019
@alexdima alexdima modified the milestones: Backlog, October 2019 Oct 21, 2019
@alexdima alexdima changed the title Backspace splits Ideographic Variation Sequence Incorrect handling of unicode markers - Backspace splits Ideographic Variation Sequence Oct 22, 2019
@alexdima alexdima changed the title Incorrect handling of unicode markers - Backspace splits Ideographic Variation Sequence Incorrect handling of unicode combining markers - Backspace splits Ideographic Variation Sequence Oct 22, 2019
@alexdima alexdima added the verification-needed Verification of issue is requested label Oct 28, 2019
@alexdima
Member

To verify, paste the following snippet:

List<string> test = new List<string>();
test.Add("辻󠄁"); // 辻󠄁 8FBB DB40 DD01
test.Add("辻"); // 辻 8FBB
test.Add("辻󠄀"); // 辻󠄀 8FBB DB40 DD00
// ããã
// புன்சிரிப்போடு

Try moving the cursor between characters (left/up/right/down arrows), and enable the block cursor. Comparing with the stable version shows the multitude of issues that were addressed...
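
When verifying cursor movement, it can help to know where the cursor is allowed to rest. The sketch below computes those offsets with the standard `Intl.Segmenter` API, assuming a runtime that provides it (for example a recent Node.js or Chromium). `cursorStops` is a hypothetical helper and says nothing about how the fix itself is implemented.

```javascript
// Hypothetical helper: returns the UTF-16 offsets at which a cursor may rest,
// i.e. the boundaries between grapheme clusters.
function cursorStops(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  const stops = [0];
  for (const { index, segment } of segmenter.segment(text)) {
    stops.push(index + segment.length); // offset just past this cluster
  }
  return stops;
}

// 辻 + VARIATION SELECTOR-18, followed by a + COMBINING TILDE
console.log(cursorStops("\u8FBB\u{E0101}a\u0303")); // [ 0, 3, 5 ]
```

Offsets 1, 2 and 4 are missing from the result: they fall inside a cluster, so an arrow key should never land there.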

@ghost

ghost commented Nov 20, 2019

This, or at least the incorrect rendering of combining characters in #79067, seems to be related to the font. With the default (not particularly pretty) Menlo, Monaco, 'Courier New', monospace Font Family setting, σ̃ (σ + combining tilde) renders correctly; with Source Code Pro, however, everything is messed up.

@alexdima
Member

@vomout Yes, in this case it appears that the font Source Code Pro does not handle combining marks correctly.

@vscodebot vscodebot bot locked and limited conversation to collaborators Dec 6, 2019