Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

win,tty: win10 supports utf-16 surrogate pairs #2910

Closed
wants to merge 2 commits into from

Conversation

bnoordhuis
Copy link
Member

memset(&info, 0, sizeof(info));
info.dwOSVersionInfoSize = sizeof(info);
pRtlGetVersion(&info);
is_windows_10_or_greater = (info.dwMajorVersion >= 10);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a fallback setting for is_windows_10_or_greater when pRtlGetVersion == NULL?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because in that case is_windows_10_or_greater == 0 because it's in static storage (variables in bss are zero-initialized.)

Copy link
Member

@vtjnash vtjnash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code seems arranged strangely to me, if I'm reading correctly, this would simply drop any character entirely on Windows 10 if the utf8_codepoint > 0xffff. Have you tested this visually to confirm it doesn't actually do that?

Did Microsoft give a statement or documentation update that shows when this changed? It's been a couple years since I tested to show it was still MBS WCS-2, so happy to hear that's been addressed now! Are these always interpreted as UTF16 now, or only in certain terminal modes?

It was perhaps a bit odd to have here special handling for conhost.exe, when that's not the only terminal emulator that could be reading this output (could also be ConEmu, Console2, etc.). Though the old behavior was attempting to play to the lowest common denominator, even then the wcswidth is sometimes wrong (true across all platforms and terminal emulators), so I'd possibly propose we eliminate this conditional entirely.

@erw7
Copy link
Contributor

erw7 commented Jul 10, 2020

Windows 10 version 1909 (OS build 18363.900) conhost.exe does not support the display of surrogate pairs. But Windows Terminal supports it (I don't know which version it is supported from). I believe that conhost.exe only supported it from Windows10 version 2004. It may be better for the program to guess the terminal and switch it as in uv_tty_set_vterm_state.

The following changes should also be required to display characters above U+FFFF.

--- a/src/win/tty.c
+++ b/src/win/tty.c
@@ -2857,6 +2854,12 @@ static int uv_tty_write_bufs(uv_tty_t* handle,
         ENSURE_BUFFER_SPACE(1);
         utf16_buf[utf16_buf_used++] = (WCHAR) utf8_codepoint;
         previous_eol = 0;
+      } else {
+        ENSURE_BUFFER_SPACE(2);
+        utf8_codepoint -= 0x10000;
+        utf16_buf[utf16_buf_used++] = (WCHAR) (utf8_codepoint / 0x400 + 0xD800);
+        utf16_buf[utf16_buf_used++] = (WCHAR) (utf8_codepoint % 0x400 + 0xDC00);
+        previous_eol = 0;
       }
     }
   }

@DHowett
Copy link

DHowett commented Jul 10, 2020

So - hi, conhost's owner here! - I definitely misread our history when I reported #2909. I'm really sorry about that!

I do, however, think the best course of action is to simply let surrogate pairs through. The console supports storing them all the way back to 10.0.14393 (the earliest version I have in a VM) -- it just doesn't support rendering them. I have the feeling that it might support storing them down to 10.0.10240, and I can verify that with an investment of about 30 minutes, which I'm happy to do on Monday.

Now, why does it matter if we store them if we can't render them? Because, as @vtjnash brings up, there are other terminal emulators on Windows. ConEmu/Cmder/ConsoleZ and WinPTY are wrappers around conhost. They're implemented as screen-scraping clients that hook in as console applications and read the buffer back. Because of that, they would still be able to benefit from having surrogate pair halves in the buffer... and they could render them "correctly". 😄

@bnoordhuis
Copy link
Member Author

@libuv/windows Dustin's rationale sounds reasonable to me, WDYT? I'll include @erw7's patch if there's consensus.

@bnoordhuis
Copy link
Member Author

Patch included + rebased. New CI: https://ci.nodejs.org/job/libuv-test-commit/1950/

Comment on lines +2126 to 2130
* console before Windows 10 doesn't really support UTF-16, so just emit
* the replacement character. */
if (!is_windows_10_or_greater && utf8_codepoint > 0xffff) {
utf8_codepoint = UNICODE_REPLACEMENT_CHARACTER;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* console before Windows 10 doesn't really support UTF-16, so just emit
* the replacement character. */
if (!is_windows_10_or_greater && utf8_codepoint > 0xffff) {
utf8_codepoint = UNICODE_REPLACEMENT_CHARACTER;
}

The proposal by Dustin and myself is to also eliminate this restriction entirely.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question is, does it make sense to pass them through below windows 10? Dustin suggested that it works all the way back to

10.0.14393 (the earliest version I have in a VM)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's just the version number for the OS he was running, but it's always been possible to run an arbitrary application on the OS which support this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bump?

@musm musm mentioned this pull request Aug 6, 2020
@musm
Copy link
Contributor

musm commented Aug 24, 2020

bump

@bnoordhuis
Copy link
Member Author

#2910 (comment) is still an unresolved question 🤷

@musm
Copy link
Contributor

musm commented Aug 30, 2020

@bnoordhuis We have an answer #2971 (comment) , which includes the fix as suggested by conhost's owner.

@bnoordhuis
Copy link
Member Author

bnoordhuis commented Sep 1, 2020

Closing in favor of #2971.

@bnoordhuis bnoordhuis closed this Sep 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The windows console does support UTF-16 surrogate pairs now
8 participants