New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
win,tty: win10 supports utf-16 surrogate pairs #2910
Conversation
memset(&info, 0, sizeof(info)); | ||
info.dwOSVersionInfoSize = sizeof(info); | ||
pRtlGetVersion(&info); | ||
is_windows_10_or_greater = (info.dwMajorVersion >= 10); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need a fallback setting for is_windows_10_or_greater
when pRtlGetVersion == NULL
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, because in that case is_windows_10_or_greater == 0
because it's in static storage (variables in bss are zero-initialized.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code seems arranged strangely to me, if I'm reading correctly, this would simply drop any character entirely on Windows 10 if the utf8_codepoint > 0xffff. Have you tested this visually to confirm it doesn't actually do that?
Did Microsoft give a statement or documentation update that shows when this changed? It's been a couple years since I tested to show it was still MBS WCS-2, so happy to hear that's been addressed now! Are these always interpreted as UTF16 now, or only in certain terminal modes?
It was perhaps a bit odd to have here special handling for conhost.exe
, when that's not the only terminal emulator that could be reading this output (could also be ConEmu, Console2, etc.). Though the old behavior was attempting to play to the lowest common denominator, even then the wcswidth
is sometimes wrong (true across all platforms and terminal emulators), so I'd possibly propose we eliminate this conditional entirely.
Windows 10 version 1909 (OS build 18363.900) The following changes should also be required to display characters above --- a/src/win/tty.c
+++ b/src/win/tty.c
@@ -2857,6 +2854,12 @@ static int uv_tty_write_bufs(uv_tty_t* handle,
ENSURE_BUFFER_SPACE(1);
utf16_buf[utf16_buf_used++] = (WCHAR) utf8_codepoint;
previous_eol = 0;
+ } else {
+ ENSURE_BUFFER_SPACE(2);
+ utf8_codepoint -= 0x10000;
+ utf16_buf[utf16_buf_used++] = (WCHAR) (utf8_codepoint / 0x400 + 0xD800);
+ utf16_buf[utf16_buf_used++] = (WCHAR) (utf8_codepoint % 0x400 + 0xDC00);
+ previous_eol = 0;
}
}
} |
So - hi, conhost's owner here! - I definitely misread our history when I reported #2909. I'm really sorry about that! I do, however, think the best course of action is to simply let surrogate pairs through. The console supports storing them all the way back to 10.0.14393 (the earliest version I have in a VM) -- it just doesn't support rendering them. I have the feeling that it might support storing them down to 10.0.10240, and I can verify that with an investment of about 30 minutes, which I'm happy to do on Monday. Now, why does it matter if we store them if we can't render them? Because, as @vtjnash brings up, there are other terminal emulators on Windows. ConEmu/Cmder/ConsoleZ and WinPTY are wrappers around conhost. They're implemented as screen-scraping clients that hook in as console applications and read the buffer back. Because of that, they would still be able to benefit from having surrogate pair halves in the buffer... and they could render them "correctly". 😄 |
@libuv/windows Dustin's rationale sounds reasonable to me, WDYT? I'll include @erw7's patch if there's consensus. |
Patch included + rebased. New CI: https://ci.nodejs.org/job/libuv-test-commit/1950/ |
* console before Windows 10 doesn't really support UTF-16, so just emit | ||
* the replacement character. */ | ||
if (!is_windows_10_or_greater && utf8_codepoint > 0xffff) { | ||
utf8_codepoint = UNICODE_REPLACEMENT_CHARACTER; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* console before Windows 10 doesn't really support UTF-16, so just emit | |
* the replacement character. */ | |
if (!is_windows_10_or_greater && utf8_codepoint > 0xffff) { | |
utf8_codepoint = UNICODE_REPLACEMENT_CHARACTER; | |
} |
The proposal by Dustin and myself is to also eliminate this restriction entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question is, does it make sense to pass them through below windows 10? Dustin suggested that it works all the way back to
10.0.14393 (the earliest version I have in a VM)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's just the version number for the OS he was running, but it's always been possible to run an arbitrary application on the OS which support this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bump?
bump |
#2910 (comment) is still an unresolved question 🤷 |
@bnoordhuis We have an answer #2971 (comment) , which includes the fix as suggested by conhost's owner. |
Closing in favor of #2971. |
Fixes: #2909
CI: https://ci.nodejs.org/job/libuv-test-commit/1944/