C1 control characters detection breaks output (regression) #10310

Closed
alabuzhev opened this issue Jun 2, 2021 · 25 comments · Fixed by #11690
Labels
Needs-Tag-Fix, Needs-Triage, Product-Conhost, Resolution-Fix-Committed

Comments

@alabuzhev
Contributor

alabuzhev commented Jun 2, 2021

Windows Terminal version (or Windows build number)

1.9.1445.0

Other Software

No response

Steps to reproduce

Compile and run the following code:

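#include <windows.h>
#include <stdio.h>

int main()
{
	// The C1 range 0x80 - 0x9F as single bytes, to be decoded as codepage 1252.
	const char data[] = "\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8A\x8B\x8C\x8D\x8E\x8F\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9A\x9B\x9C\x9D\x9E\x9F";
	wchar_t buffer[sizeof(data)];
	// The last argument is the buffer size in characters, not in bytes.
	if (!MultiByteToWideChar(1252, MB_USEGLYPHCHARS, data, -1, buffer, sizeof(buffer) / sizeof(buffer[0])))
	{
		printf("%lu\n", GetLastError());
		return 1;
	}

	DWORD n;
	// Write the converted characters, excluding the terminating NUL.
	WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), buffer, sizeof(buffer) / sizeof(wchar_t) - 1, &n, 0);

	printf("\n\n");

	// Report every byte that came through the conversion with its value unchanged.
	for (size_t i = 0; i != sizeof(data) - 1; ++i)
	{
		if (buffer[i] == (unsigned char)data[i])
		{
			printf("%04X not converted\n", (unsigned char)data[i]);
		}
	}
}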

Expected Behavior

€�‚ƒ„…†‡ˆ‰Š‹Œ�Ž��‘’“”•–—˜™š›œ�žŸ or something similar, depending on your output codepage

Actual Behavior

€‚ƒ„…†‡ˆ‰Š‹Œ - most of the characters are missing.

After f91b53d the whole range 0x80 - 0x9F is considered control characters.

The comment above boldly claims that

"we do not need to worry about confusion whether a single byte, for example, \x9b in a single-byte stream represents a C1 CSI or some other glyph, because by the time we get here, everything is Unicode. Knowing whether a single-byte \x9b represents a single-character C1 CSI or some other glyph is handled by MultiByteToWideChar before we get here (if the stream was not already UTF-16). For instance, in CP_ACP, if a \x9b shows up, it will get converted to \x203a. So, if we get here, and have a \x009b, we know that it unambiguously represents a C1 CSI"

, but that is simply not true: as the example code above demonstrates, \x81, \x8D, \x8F, \x90, and \x9D are not handled by MultiByteToWideChar (at least in codepage 1252 ANSI - Latin I, hopefully popular enough), so no, not everything is Unicode by the time we get here, and no, we do need to worry about such confusion and implement a proper check to avoid breaking existing applications.
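
The point above can be captured in a few lines. This is only a minimal sketch, not code from the console: the helper names are hypothetical, the list of bytes is the one reported above, and the pass-through behaviour is assumed to be exactly what the test program prints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bytes that codepage 1252 leaves unassigned (list taken from the report above). */
static const unsigned char kCp1252Unmapped[] = { 0x81, 0x8D, 0x8F, 0x90, 0x9D };

/* Hypothetical helper: true if `byte` has no glyph assigned in codepage 1252. */
bool cp1252_is_unmapped(unsigned char byte)
{
    for (size_t i = 0; i < sizeof(kCp1252Unmapped); ++i)
    {
        if (kCp1252Unmapped[i] == byte)
        {
            return true;
        }
    }
    return false;
}

/* Hypothetical helper: true if `cp` lies in the C1 control range U+0080..U+009F. */
bool is_c1_code_point(unsigned int cp)
{
    return cp >= 0x80 && cp <= 0x9F;
}

/* Assumed behaviour, as observed by the test program: an unmapped byte
   comes out of the conversion with its numeric value unchanged. */
unsigned int cp1252_unmapped_to_code_point(unsigned char byte)
{
    assert(cp1252_is_unmapped(byte));
    return byte;
}
```

Under these assumptions every unmapped byte ends up inside the C1 range, which is why a check on the UTF-16 value alone cannot tell "unassigned in the source codepage" from "intended as a control".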

@ghost added the Needs-Triage and Needs-Tag-Fix labels Jun 2, 2021
@skyline75489
Collaborator

I think #7854 (comment) explains this.

@DHowett
Member

DHowett commented Jun 2, 2021

@skyline75489 has the right of it. Those characters are unspecified in 1252. They are control characters after f91b53d, but they were control characters before that, too.

What is a well-meaning application doing printing things outside of its codepage’s codepoint coverage?

@alabuzhev
Contributor Author

The application basically outputs a file picked by the user using a codepage picked by the user.

If that behaviour is now by design - ok.
Although it would be nice to either mention that in the comment or remove that MultiByteToWideChar-inspired motivation altogether in favour of something like "0x80 - 0x9F are control codes now. Deal with it." to avoid further confusion.

@alabuzhev
Contributor Author

A few more thoughts:

Except for SS2 and SS3 in EUC-JP text, and NEL in text transcoded from EBCDIC, the 8-bit forms of these codes are almost never used. CSI, DCS and OSC are used to control text terminals and terminal emulators, but almost always by using their 7-bit escape code representations. Their ISO/IEC 2022 compliant single-byte representations are invalid in UTF-8, and the UTF-8 encodings of their corresponding codepoints are two bytes long like their escape code forms (for instance, CSI at U+009B is encoded as the bytes 0xC2, 0x9B in UTF-8), so there is no advantage to using them rather than the equivalent two-byte escape sequence. When these codes appear in modern documents, web pages, e-mail messages, etc., they are usually intended to be printing characters at that position in a proprietary encoding such as Windows-1252 or Mac OS Roman that use the C1 codes to provide additional graphic characters.
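
The "0xC2, 0x9B" figure in the quote is easy to verify by hand. The sketch below is a hypothetical helper (not from any library) that encodes a code point below U+0800 as UTF-8, which is enough to cover the whole C1 range.

```c
#include <assert.h>
#include <stddef.h>

/* Encode a code point below U+0800 as UTF-8.
   Returns the number of bytes written to `out` (1 or 2). */
size_t utf8_encode_small(unsigned int cp, unsigned char out[2])
{
    assert(out != NULL);
    assert(cp < 0x800);
    if (cp < 0x80)
    {
        out[0] = (unsigned char)cp; /* ASCII is encoded as itself */
        return 1;
    }
    out[0] = (unsigned char)(0xC0 | (cp >> 6));   /* lead byte: 110xxxxx */
    out[1] = (unsigned char)(0x80 | (cp & 0x3F)); /* continuation: 10xxxxxx */
    return 2;
}
```

For U+009B this yields the two bytes 0xC2 0x9B, the same length as the 7-bit form ESC [.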

  • Windows does allow 0x80 - 0x9F in filenames. You can literally create a file named "��������������������������������", type dir, sit back and watch the world burn:

image
image

Don't ask "but why?" - users can, so they will.
And incorrect codepage conversions are still a thing, especially during processing of various metadata. There are lots and lots of weird file names in the wild.

Sanitising file names, everywhere, even in scenarios not related to outputting anything for the sake of the feature that is "almost never used"... 🤔
Supposedly sooner or later C1 will make it into the conhost and this is where the fun begins.

@j4james
Collaborator

j4james commented Jun 2, 2021

Note that your test case won't be interpreted as control characters in conhost (even the very latest build), because you have to have the ENABLE_VIRTUAL_TERMINAL_PROCESSING mode set for this functionality to apply. In Windows Terminal that's enabled by default, so if you don't want VT controls processed in WT I think you have to explicitly disable that mode.
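
For reference, explicitly disabling that mode could look roughly like the sketch below. It uses the real GetConsoleMode and SetConsoleMode calls, but the wrapper names are made up for illustration and the fallback definition of the flag value (0x0004, as in the Windows SDK headers) is only there so the bit arithmetic also compiles elsewhere.

```c
#include <assert.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Value of the console mode flag as defined in the Windows SDK headers. */
#ifndef ENABLE_VIRTUAL_TERMINAL_PROCESSING
#define ENABLE_VIRTUAL_TERMINAL_PROCESSING 0x0004
#endif

/* Return `mode` with VT processing switched off; all other flags are kept. */
unsigned long without_vt_processing(unsigned long mode)
{
    unsigned long result = mode & ~(unsigned long)ENABLE_VIRTUAL_TERMINAL_PROCESSING;
    assert((result & ENABLE_VIRTUAL_TERMINAL_PROCESSING) == 0);
    return result;
}

#ifdef _WIN32
/* Hypothetical wrapper: clear the flag on the standard output handle.
   Returns nonzero on success. */
int disable_vt_processing(void)
{
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD mode = 0;
    if (out == INVALID_HANDLE_VALUE || !GetConsoleMode(out, &mode))
    {
        return 0;
    }
    return SetConsoleMode(out, (DWORD)without_vt_processing(mode)) != 0;
}
#endif
```

An application that only wants to dump arbitrary text could call something like `disable_vt_processing()` before writing and restore the previous mode afterwards.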

That said, the cmd shell does enable VT mode, so control characters in a filename could be an issue there. Somebody already raised that in issue #10069, which I misdiagnosed as a conpty problem, but I've just checked with a recent OpenConsole build and can reproduce the issue there too. So that issue should probably be reopened - it's not a dup of #4363.

However, note that this has been an issue long before PR #7340, because we already supported the 8-bit CSI control before then. PR #7340 just added more controls.

@PennRobotics

PennRobotics commented Oct 22, 2021

BurntSushi/ripgrep#1992

At least two more users are affected by this behavior.

@DHowett
Member

DHowett commented Oct 22, 2021

It seems like BurntSushi/ripgrep#1992 is another instance of "an application is printing UTF-8 to the screen without setting the console codepage to UTF-8, or converting internally and printing it as UTF-16."

@PennRobotics

PennRobotics commented Oct 22, 2021

I'm aware of chcp.com, but is there a way to change the code page in WSL2? I've tried (semi-successfully) to pipe output through iconv, but I don't know which flags give the same output as conhost. For instance, when I use -f UTF-7 and omit invalid characters with -c, the output (below) ends at the letter z.

In any case, I want to run commands without escape codes causing cut off lines and random escape characters appearing on the next zsh prompt.

This escape code behavior is not occurring in the default WSL terminal (via conhost, I believe), where I see instead
let identchars_ok = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz������������������������������ ¡¢£¤¥¦§µÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ' (Interesting enough, the Github edit window for this post shows a different Unicode replacement character than this post.)


Update: piping iconv -f UTF-8 -t UNICODE -c makes all the extended characters show up as replacement symbols, which is nice. Not using the -c flag causes an error at line 396, so I opened up vim to this line, and it's a total, utter disaster. Lines 391, 392, 396, 397, and 401 will change persistently just by moving the cursor around in this area. This is a function called Run_regexp_multibyte_magic, and the hexdump of the offending characters from these lines follows:

00000000  e0 b8 ab e0 b8 a1 e0 b9  88 78 20 e0 b8 ad e0 b8  |.........x .....|
00000010  a1 78 ** e0 b8 ad e0 b8  a1 78 20 e0 b8 ab e0 b8  |.x.......x .....|
00000020  a1 e0 b9 88 78 ** fc 92  8d 85 99 b8 79 ** fc 92  |....x.......y...|
00000030  8d 8a af 8d 7a ** c3 a4  c3 b6 20 c3 bc ce b1 cc  |....z..... .....|
00000040  84 cc 86 cc 81 ** **                              |.......|

I replaced 0a with ** so the linebreaks are easier to spot.

This snippet is displayed in vim:

  1 หม่x อมx
  2 อมx หม่x
  3 ������y
  4 ������z
  5 äö üᾱ̆
  6            

Moving the cursor around causes all sorts of trouble. Moving the cursor down, line-by-line, from the top line to the bottom without entering insert mode:

  1 หม่x อม
  2 อมx หม่
  3 ��
  4 ��
  5 äö üα
  6         

On further inspection, this same file shows up screwy in conhost, too, so vim's handling of UTF-8 could be imperfect. Also, using iconv -f UTF-8 -t ASCII -c will strip away information that might be needed, so I don't believe this is a good solution. The string
AÀÁÂÃÄÅĀĂĄǍǞǠǺȂȦȺḀẠẢẤẦẨẪẬẮẰẲẴẶ BƁɃḂḄḆ CÇĆĈĊČƇȻḈꞒ DĎĐƊḊḌḎḐḒ EÈÉÊËĒĔĖĘĚȄȆȨɆḔḖḘḚḜẸẺẼẾỀỂỄỆ FƑḞꞘ GĜĞĠĢƓǤǦǴḠꞠ HĤĦȞḢḤḦḨḪⱧ IÌÍÎÏĨĪĬĮİƗǏȈȊḬḮỈỊ JĴɈ KĶƘǨḰḲḴⱩꝀ LĹĻĽĿŁȽḶḸḺḼⱠ MḾṀṂ NÑŃŅŇǸṄṆṈṊꞤ OÒÓÔÕÖØŌŎŐƟƠǑǪǬǾȌȎȪȬȮȰṌṎṐṒỌỎỐỒỔỖỘỚỜỞỠỢ PƤṔṖⱣ QɊ RŔŖŘȐȒɌṘṚṜṞⱤꞦ SŚŜŞŠȘṠṢṤṦṨⱾꞨ TŢŤŦƬƮȚȾṪṬṮṰ UÙÚÛÜŨŪŬŮŰƯǕǙǛǓǗȔȖɄṲṴṶṸṺỤỦỨỪỬỮỰ VƲṼṾ WŴẀẂẄẆẈ XẊẌ YÝŶŸƳȲɎẎỲỴỶỸ ZŹŻŽƵẐẒẔⱫ aàáâãäåāăąǎǟǡǻȃȧᶏḁẚạảấầẩẫậắằẳẵặⱥ bƀɓᵬᶀḃḅḇ cçćĉċčƈȼḉꞓꞔ dďđɗᵭᶁᶑḋḍḏḑḓ eèéêëēĕėęěȅȇȩɇᶒḕḗḙḛḝẹẻẽếềểễệ fƒᵮᶂḟꞙ gĝğġģǥǧǵɠᶃḡꞡ hĥħȟḣḥḧḩḫẖⱨꞕ iìíîïĩīĭįǐȉȋɨᶖḭḯỉị jĵǰɉ kķƙǩᶄḱḳḵⱪꝁ lĺļľŀłƚḷḹḻḽⱡ mᵯḿṁṃ nñńņňʼnǹᵰᶇṅṇṉṋꞥ oòóôõöøōŏőơǒǫǭǿȍȏȫȭȯȱɵṍṏṑṓọỏốồổỗộớờởỡợ pƥᵱᵽᶈṕṗ qɋʠ rŕŗřȑȓɍɽᵲᵳᶉṛṝṟꞧ sśŝşšșȿᵴᶊṡṣṥṧṩꞩ tţťŧƫƭțʈᵵṫṭṯṱẗⱦ uùúûüũūŭůűųǚǖưǔǘǜȕȗʉᵾᶙṳṵṷṹṻụủứừửữự vʋᶌṽṿ wŵẁẃẅẇẉẘ xẋẍ yýÿŷƴȳɏẏẙỳỵỷỹ zźżžƶᵶᶎẑẓẕⱬ
becomes
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z

@DHowett
Member

DHowett commented Oct 22, 2021

change the code page in WSL2

Hmm. I let my assumptions cloud my interpretation of the linked issue and missed that this was under WSL. I'm sorry.

conhost will unfortunately exhibit this behavior on Windows 11 (in before I've accidentally missed that detail too) or a later update, since the code in this repo is the code for conhost and huge swaths of the terminal emulation code are straight-up shared.

I'll have to come back to the vim bits after the weekend for a deeper investigation.

@PennRobotics

PennRobotics commented Oct 22, 2021

I've gotta link this file I'm using as a test for reference: https://github.com/vim/vim/blob/master/src/testdir/test_regexp_utf8.vim
This might be the boss level of encoding tests. On my system, the characters of many of the strings are different when shown in Github and when viewed raw.

I'm not even as much worried about the proper character display as much as the 1;0c that pops out at the next prompt whenever I search certain repositories with the right terms. I'm not sure how that's getting from stdout to the input buffer.

Also, sorry for hijacking @alabuzhev 's thread

@j4james
Collaborator

j4james commented Oct 22, 2021

I'm not even as much worried about the proper character display as much as the 1;0c that pops out at the next prompt whenever I search certain repositories with the right terms. I'm not sure how that's getting from stdout to the input buffer.

That's a response to the DECID query which is triggered by the C1 control character U+009A (see https://invisible-island.net/xterm/ctlseqs/ctlseqs.html#h3-C1-_8-Bit_-Control-Characters).
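
As background, each C1 control has a 7-bit spelling: ESC followed by the byte 0x40 lower. The sketch below shows that arithmetic; the function name is hypothetical, and the mapping itself comes from the standard C1 definition rather than from anything specific to this terminal.

```c
#include <assert.h>

/* A C1 control in 0x80..0x9F is equivalent to ESC followed by (c1 - 0x40).
   Returns that second byte of the 7-bit form. */
char c1_to_escape_final(unsigned int c1)
{
    assert(c1 >= 0x80 && c1 <= 0x9F);
    return (char)(c1 - 0x40);
}
```

So U+009A corresponds to ESC Z, the identification query, and U+009B corresponds to ESC [, the familiar CSI introducer.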

@j4james
Collaborator

j4james commented Oct 23, 2021

  1 หม่x อมx
  2 อมx หม่x
  3 ������y
  4 ������z
  5 äö üᾱ̆
  6            

@PennRobotics FYI, this particular case has got nothing to do with C1 controls. Lines 1, 2 and 5 have got combining characters or non-spacing marks, that we are probably not handling correctly, or at least not in the same way as vim. Lines 3 and 4 are all just invalid code points in UTF-8, which I guess could also be an issue if there is a discrepancy in the way vim and WT handle the erroneous values.

I'm not seeing the content changing when moving the cursor up and down in vim, but maybe that depends on the version or other configuration differences. It certainly wouldn't surprise me if it did something weird with those characters though. In any event, I think these problems are possibly more on topic in issue #8000, but @DHowett can correct me on that.

@PennRobotics

PennRobotics commented Oct 25, 2021

@j4james set cursorline (and potentially also set cursorcolumn) is the offending .vimrc option

Sure enough, echo -ne '\u009a' is enough to add characters to the beginning of the next prompt, so piping to sed "s/$(echo -ne '\u009a')//g" will suppress this behavior.
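
The same idea as the sed filter, generalised from U+009A to the whole C1 range, can be sketched in C. This is an illustrative helper with a made-up name, and it assumes the input is well-formed UTF-8, where a 0xC2 lead byte followed by 0x80..0x9F is exactly an encoded C1 control.

```c
#include <assert.h>
#include <stddef.h>

/* Remove UTF-8 encoded C1 controls (0xC2 0x80 .. 0xC2 0x9F) in place.
   Returns the new length. Assumes well-formed UTF-8 input. */
size_t strip_utf8_c1(unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    size_t w = 0;
    for (size_t r = 0; r < len; ++r)
    {
        if (buf[r] == 0xC2 && r + 1 < len && buf[r + 1] >= 0x80 && buf[r + 1] <= 0x9F)
        {
            ++r; /* skip the continuation byte as well */
            continue;
        }
        buf[w++] = buf[r];
    }
    return w;
}
```

Other two-byte characters such as ä (0xC3 0xA4) pass through untouched, because their lead byte is not 0xC2.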

Unrelated/unimportant: There's still another stray character sequence that---when running ripgrep ok on the file I linked above---causes two instances of install-from-source (the beginning of the outputted file path; on the next line) to appear as énstall-from-source, and a line break is also stripped away. In general, now I know what to look for when I start to get extra characters at the prompt. Thanks!

@PennRobotics

image

The top is Ubuntu via Windows Terminal and the bottom is Ubuntu started as an app.

@lhecker added the Needs-Discussion label Oct 25, 2021
@j4james
Collaborator

j4james commented Oct 25, 2021

The top is Ubuntu via Windows Terminal and the bottom is Ubuntu started as an app.

@PennRobotics That's because full C1 support was only added in PR #7340 and presumably hasn't made its way into the inbox conhost yet, or at least not the version you're using. But U+009B should at least work. Try echo -ne '\u009Bc'.

@PennRobotics

Is there an easy way to universally disable the aspect of C1 support that is putting characters into the input buffer? An environmental var? I understand this could be the side effects of an exciting new/upcoming feature, but I can't say I would want the old terminal to start doing what this newer Terminal does---prefixing my next command with some gibberish.

I do have the use case where I need to grep a mixed-ASCII-and-binary file (.elf w/ symbols and strings, .pdf, etc.) as part of a larger codebase search, so this has become a rare but occasional annoyance. (I also understand that I'm in a minority of users, and the workaround of pressing Ctrl+U to clear the input line is much, much faster than any alternatives that I can conceive.)

@j4james
Collaborator

j4james commented Oct 26, 2021

If you're outputting raw control characters to the screen, there is always the possibility you're going to trigger a query response, even if you could disable C1 support. You'd really need to disable all VT processing entirely.

So if you're using grep on binary files, I'd suggest piping the output through less, which should handle the control character filtering for you. Instead of the controls being interpreted, you'll just see something like <U+009A> in the output.

@PennRobotics

Eww. less on its own will erase its buffer after exiting. less -X will leave the output in stdout but forces the user to scroll to the bottom to leave the full (up to the argumented buffer size, at least) results in the active buffer. less -X | grep $ eliminates the scrolling and brings back the unwanted control characters!

In time, I'll find a good alternative to sed and less.

It doesn't make sense why a query response shows up at the prompt as the next command. In contrast, something like ((sleep 1 && echo -n a) &) will echo "a" at the prompt before the cursor (or mid-command if you type immediately after), but this echo'd character is never part of the next command and is overwritten by the correct, typed character if you move the cursor back.

If you don't delete the query response's 1; from the next prompt, you'll jump to the last pushed directory and run a (likely) invalid command. It's not a major bug, but it is an annoyance.

I fail to understand why the control character must become part of stdin instead of stdout, but even wikipedia indicates a terminal can generate sequences that seem to come from the user.

However, for the sake of your time and my time, I have no more interest in this subject and reserve hope that stray characters never land in the input buffer at some point in the future.

Thanks for sharing all this reasonably obscure info on terminal nuances.

@PennRobotics

Piping to preconv -r works without removing color codes, although umlauts and other non-English letters also get converted. This is close enough for me.

@j4james
Collaborator

j4james commented Oct 31, 2021

@DHowett If we want to try and do something about this, I have a proposal that I think might make most people happy.

  1. We start with C1 controls disabled by default. They're not particularly useful in UTF-8, and it's unlikely anyone is expecting a random subset of them to work in the unmapped portions of the DOS/Windows code pages.
  2. If ISO-2022 mode is requested, that's when we enable the C1 support in the parser, since that's the one time they're actually likely to be needed.
  3. Optionally add support for the DECAC1 (Accept C1 Controls) escape sequence, so they can also be manually enabled in the UTF-8 codepage, in case anyone actually does need that.

Hopefully this will cut down on the bug reports, without actually losing any significant functionality.

It's also worth mentioning that XTerm doesn't support C1 controls in UTF-8 either, so it's unlikely to cause compatibility issues with Linux apps. While there are a few Linux terminals that do support UTF-8 C1 (VTE being the most well known), I think they're probably in the minority.

Anyway, I don't feel that strongly about this either way, but I'd be happy to put together a PR if you like the idea.

@zadjii-msft removed the Needs-Discussion label Nov 1, 2021
@DHowett
Member

DHowett commented Nov 1, 2021

@j4james I'm totally on board with this proposal. I love it.

The only thing that gives me pause (and it is not enough pause for me to care) is that I think we supported one C1 control when we initially went open-source, and I somewhat wondered if there was a reason we chose to support that one. It's likely a "this one seems common, maybe we should support it we guess?" situation.

@DHowett removed their assignment Nov 1, 2021
@DHowett
Member

DHowett commented Nov 1, 2021

I'll probably prepare this one for isolated ingestion into Windows so it can be released in a servicing update. 😄 Folks may eventually be more broadly upset that conhost is "acting weird" and "displaying corrupt text" and "stole my lunch."

@j4james
Collaborator

j4james commented Nov 2, 2021

The only thing that gives me pause (and it is not enough pause for me to care) is that I think we supported one C1 control when we initially went open-source, and I somewhat wondered if there was a reason we chose to support that one. It's likely a "this one seems common, maybe we should support it we guess?" situation.

At the time you went open-source, there were only 5 supported sequences that could potentially have been implemented as C1 as far as I could see: HTS, RI, CSI, OSC, and ST. Of those five, HTS and RI aren't handled at the StateMachine level, so I can understand why they might not even have been considered.

That leaves CSI, ST, and OSC, the first two of which were actually supported as C1 controls. So the odd one out is OSC, which is kind of weird, because the only time you would use ST is when coupled with OSC. But the bottom line is that you supported almost all of the controls at the state machine level that were implemented at that time.

What's giving me pause now, though, is I just went to dig up an issue I remembered from the VTE tracker, where they were considering removing their C1 handling, thinking it would support this decision, but they ultimately chose not to (this was issue 209). Ironically they cited the Windows support of C1 as one of the reasons for keeping it.

It wasn't just the fact that they chose to keep it, though, but they linked to a bug report in Alacritty where Fedora was using an OSC sequence with a C1 control, and since Alacritty didn't support C1, it got a bunch of garbage output on the screen. So that is a situation we could end up facing as well.

That said, I think this particular case only arose because Alacritty was started from a VTE shell, and was misrecognized as VTE because the VTE_VERSION environment variable was set. So I still think we are more likely to get bug reports from having C1 enabled by default, than we will if we remove it, but I don't want to pretend there are no downsides.

@ghost added the In-PR label Nov 5, 2021
@alabuzhev
Contributor Author

Guys, any recommendations which characters should be used for C1 replacement?
For C0 it's what MB_USEGLYPHCHARS does, but, as far as I see, there's no established equivalent for C1.
I've tried to just remap them to a private range (E080 - E09F), but it looks like the host has performance issues with some of those, at least with E098.
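
For clarity, the remapping described above amounts to something like the following sketch (the function name is hypothetical; only the E080 - E09F target range comes from the description):

```c
#include <assert.h>

/* Move C1 code points U+0080..U+009F into the private use range
   U+E080..U+E09F; every other code point is returned unchanged. */
unsigned int remap_c1_to_private_use(unsigned int cp)
{
    if (cp >= 0x80 && cp <= 0x9F)
    {
        unsigned int remapped = 0xE000 + cp;
        assert(remapped >= 0xE080 && remapped <= 0xE09F);
        return remapped;
    }
    return cp;
}
```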

@ghost closed this as completed in #11690 Nov 17, 2021
@ghost added the Resolution-Fix-Committed label and removed the In-PR label Nov 17, 2021
ghost pushed a commit that referenced this issue Nov 17, 2021
There are some code pages with "unmapped" code points in the C1 range,
which results in them being translated into Unicode C1 control codes,
even though that is not their intended use. To avoid having these
characters triggering unintentional escape sequences, this PR now
disables C1 controls by default.

Switching to ISO-2022 encoding will re-enable them, though, since that
is the most likely scenario in which they would be required. They can
also be explicitly enabled, even in UTF-8 mode, with the `DECAC1` escape
sequence.

What I've done is add a new mode to the `StateMachine` class that
controls whether C1 code points are interpreted as control characters or
not. When disabled, these code points are simply dropped from the
output, similar to the way a `NUL` is interpreted.

This isn't exactly the way they were handled in the v1 console (which I
think replaces them with the font _notdef_ glyph), but it matches the
XTerm behavior, which seems more appropriate considering this is in VT
mode. And it's worth noting that Windows Explorer seems to work the same
way.

As mentioned above, the mode can be enabled by designating the ISO-2022
coding system with a `DOCS` sequence, and it will be disabled again when
UTF-8 is designated. You can also enable it explicitly with a `DECAC1`
sequence (originally this was actually a DEC printer sequence, but it
doesn't seem unreasonable to use it in a terminal).

I've also extended the operations that save and restore "cursor state"
(e.g. `DECSC` and `DECRC`) to include the state of the C1 parser mode,
since it's closely tied to the code page and character sets which are
also saved there. Similarly, when a `DECSTR` sequence resets the code
page and character sets, I've now made it reset the C1 mode as well.
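
The mode handling described so far can be modelled in a few lines. This is a simplified sketch and not the actual `StateMachine` code: all type and function names are hypothetical, "reset" is assumed to mean "back to disabled", and the byte sequences in the comments come from the usual DEC/xterm documentation rather than from this description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Events that affect the C1 parser mode. */
typedef enum
{
    EV_DESIGNATE_ISO2022, /* DOCS, ESC % @ */
    EV_DESIGNATE_UTF8,    /* DOCS, ESC % G */
    EV_ACCEPT_C1,         /* DECAC1, ESC SP 7 */
    EV_SOFT_RESET,        /* DECSTR, CSI ! p */
    EV_SAVE_CURSOR,       /* DECSC, ESC 7 */
    EV_RESTORE_CURSOR     /* DECRC, ESC 8 */
} C1Event;

typedef struct
{
    bool accept_c1; /* current mode; false by default */
    bool saved_c1;  /* copy stored by the last DECSC */
} C1State;

typedef enum
{
    ACT_EXECUTE_C1, /* interpret as a control character */
    ACT_DROP,       /* discard silently, like a NUL */
    ACT_OTHER       /* not a C1 code point; handled elsewhere */
} C1Action;

/* Update the mode in response to one of the events above. */
void c1_apply(C1State *state, C1Event ev)
{
    assert(state != NULL);
    switch (ev)
    {
    case EV_DESIGNATE_ISO2022:
    case EV_ACCEPT_C1:
        state->accept_c1 = true;
        break;
    case EV_DESIGNATE_UTF8:
    case EV_SOFT_RESET:
        state->accept_c1 = false;
        break;
    case EV_SAVE_CURSOR:
        state->saved_c1 = state->accept_c1;
        break;
    case EV_RESTORE_CURSOR:
        state->accept_c1 = state->saved_c1;
        break;
    }
}

/* Decide what happens to a code point under the current mode. */
C1Action c1_classify(const C1State *state, unsigned int cp)
{
    assert(state != NULL);
    if (cp < 0x80 || cp > 0x9F)
    {
        return ACT_OTHER;
    }
    return state->accept_c1 ? ACT_EXECUTE_C1 : ACT_DROP;
}
```

With a zero-initialised `C1State`, `c1_classify` reports `ACT_DROP` for U+009B until an `EV_ACCEPT_C1` or `EV_DESIGNATE_ISO2022` event has been applied.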

I should note that the new `StateMachine` mode is controlled via a
generic `SetParserMode` method (with a matching API in the `ConGetSet`
interface) to allow for easier addition of other modes in the future.
And I've reimplemented the existing ANSI/VT52 mode in terms of these
generic methods instead of it having to have its own separate APIs.

## Validation Steps Performed

Some of the unit tests for OSC sequences were using a C1 `0x9C` for the
string terminator, which doesn't work by default anymore. Since that's
not a good practice anyway, I thought it best to change those to a
standard 7-bit terminator. However, in tests that were explicitly
validating the C1 controls, I've just enabled the C1 parser mode at the
start of the tests in order to get them working again.

There were also some ANSI mode adapter tests that had to be updated to
account for the fact that it has now been reimplemented in terms of the
`SetParserMode` API.

I've added a new state machine test to validate the changes in behavior
when the C1 parser mode is enabled or disabled. And I've added an
adapter test to verify that the `DesignateCodingSystems` and
`AcceptC1Controls` methods toggle the C1 parser mode as expected.

I've manually verified the test cases in #10069 and #10310 to confirm
that they're no longer triggering control sequences by default.
Although, as I explained above, the C1 code points are completely
dropped from the output rather than displayed as _notdef_ glyphs. I
think this is a reasonable compromise though.

Closes #10069
Closes #10310
@ghost

ghost commented Feb 3, 2022

🎉 This issue was addressed in #11690, which has now been successfully released as Windows Terminal Preview v1.13.10336.0. 🎉

jazzdelightsme added a commit to microsoft/DbgShell that referenced this issue May 17, 2022
Conhost/Terminal supported C1 control sequences for a while... but then
that apparently caused some problems. So they disabled them by default,
which causes all our VT SGR sequences to be broken (so instead of pretty
color output, you just see gray output, with strange numbers sprinkled
all over). Fortunately they provided a way to turn them back on.

Related: microsoft/terminal#11690
Related: microsoft/terminal#10310

Note that I believe you need a relatively recent build for this change
to have effect.
This issue was closed.