Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable the acceptance of C1 control codes by default #11690

Merged
11 commits merged into from Nov 17, 2021

Conversation

j4james
Copy link
Collaborator

@j4james j4james commented Nov 5, 2021

There are some code pages with "unmapped" code points in the C1 range,
which results in them being translated into Unicode C1 control codes,
even though that is not their intended use. To avoid having these
characters triggering unintentional escape sequences, this PR now
disables C1 controls by default.

Switching to ISO-2022 encoding will re-enable them, though, since that
is the most likely scenario in which they would be required. They can
also be explicitly enabled, even in UTF-8 mode, with the DECAC1 escape
sequence.

What I've done is add a new mode to the StateMachine class that
controls whether C1 code points are interpreted as control characters or
not. When disabled, these code points are simply dropped from the
output, similar to the way a NUL is interpreted.

This isn't exactly the way they were handled in the v1 console (which I
think replaces them with the font notdef glyph), but it matches the
XTerm behavior, which seems more appropriate considering this is in VT
mode. And it's worth noting that Windows Explorer seems to work the same
way.

As mentioned above, the mode can be enabled by designating the ISO-2022
coding system with a DOCS sequence, and it will be disabled again when
UTF-8 is designated. You can also enable it explicitly with a DECAC1
sequence (originally this was actually a DEC printer sequence, but it
doesn't seem unreasonable to use it in a terminal).

I've also extended the operations that save and restore "cursor state"
(e.g. DECSC and DECRC) to include the state of the C1 parser mode,
since it's closely tied to the code page and character sets which are
also saved there. Similarly, when a DECSTR sequence resets the code
page and character sets, I've now made it reset the C1 mode as well.

I should note that the new StateMachine mode is controlled via a
generic SetParserMode method (with a matching API in the ConGetSet
interface) to allow for easier addition of other modes in the future.
And I've reimplemented the existing ANSI/VT52 mode in terms of these
generic methods instead of it having to have its own separate APIs.

Validation Steps Performed

Some of the unit tests for OSC sequences were using a C1 0x9C for the
string terminator, which doesn't work by default anymore. Since that's
not a good practice anyway, I thought it best to change those to a
standard 7-bit terminator. However, in tests that were explicitly
validating the C1 controls, I've just enabled the C1 parser mode at the
start of the tests in order to get them working again.

There were also some ANSI mode adapter tests that had to be updated to
account for the fact that it has now been reimplemented in terms of the
SetParserMode API.

I've added a new state machine test to validate the changes in behavior
when the C1 parser mode is enabled or disabled. And I've added an
adapter test to verify that the DesignateCodingSystems and
AcceptC1Controls methods toggle the C1 parser mode as expected.

I've manually verified the test cases in #10069 and #10310 to confirm
that they're no longer triggering control sequences by default.
Although, as I explained above, the C1 code points are completely
dropped from the output rather than displayed as notdef glyphs. I
think this is a reasonable compromise though.

Closes #10069
Closes #10310

@ghost ghost added Area-Output Related to output processing (inserting text into buffer, retrieving buffer text, etc.) Issue-Bug It either shouldn't be doing this or needs an investigation. Product-Conhost For issues in the Console codebase labels Nov 5, 2021
@github-actions
Copy link

github-actions bot commented Nov 5, 2021

@check-spelling-bot Report

Unrecognized words, please review:

  • DECAC
Previously acknowledged words that are now absent carlos dpg sid SPACEBAR Unregister Urxvt vcvarsall xIcon yIcon zamora
To accept these unrecognized words as correct (and remove the previously acknowledged and now absent words), run the following commands

... in a clone of the git@github.com:j4james/terminal.git repository
on the disable-c1-controls branch:

update_files() {
perl -e '
my @expect_files=qw('".github/actions/spelling/expect/alphabet.txt
.github/actions/spelling/expect/expect.txt
.github/actions/spelling/expect/web.txt"');
@ARGV=@expect_files;
my @stale=qw('"$patch_remove"');
my $re=join "|", @stale;
my $suffix=".".time();
my $previous="";
sub maybe_unlink { unlink($_[0]) if $_[0]; }
while (<>) {
if ($ARGV ne $old_argv) { maybe_unlink($previous); $previous="$ARGV$suffix"; rename($ARGV, $previous); open(ARGV_OUT, ">$ARGV"); select(ARGV_OUT); $old_argv = $ARGV; }
next if /^(?:$re)(?:(?:\r|\n)*$| .*)/; print;
}; maybe_unlink($previous);'
perl -e '
my $new_expect_file=".github/actions/spelling/expect/7aae2e9100c392b0c98b1120152bcf9709fcf9f6.txt";
use File::Path qw(make_path);
use File::Basename qw(dirname);
make_path (dirname($new_expect_file));
open FILE, q{<}, $new_expect_file; chomp(my @words = <FILE>); close FILE;
my @add=qw('"$patch_add"');
my %items; @items{@words} = @words x (1); @items{@add} = @add x (1);
@words = sort {lc($a)."-".$a cmp lc($b)."-".$b} keys %items;
open FILE, q{>}, $new_expect_file; for my $word (@words) { print FILE "$word\n" if $word =~ /\w/; };
close FILE;
system("git", "add", $new_expect_file);
'
}

comment_json=$(mktemp)
curl -L -s -S \
  --header "Content-Type: application/json" \
  "https://api.github.com/repos/microsoft/terminal/issues/comments/961542742" > "$comment_json"
comment_body=$(mktemp)
jq -r .body < "$comment_json" > $comment_body
rm $comment_json

patch_remove=$(perl -ne 'next unless s{^</summary>(.*)</details>$}{$1}; print' < "$comment_body")
  

patch_add=$(perl -e '$/=undef;
$_=<>;
s{<details>.*}{}s;
s{^#.*}{};
s{\n##.*}{};
s{(?:^|\n)\s*\*}{}g;
s{\s+}{ }g;
print' < "$comment_body")
  
update_files
rm $comment_body
git add -u
✏️ Contributor please read this

By default the command suggestion will generate a file named based on your commit. That's generally ok as long as you add the file to your commit. Someone can reorganize it later.

⚠️ The command is written for posix shells. You can copy the contents of each perl command excluding the outer ' marks and dropping any '"/"' quotation mark pairs into a file and then run perl file.pl from the root of the repository to run the code. Alternatively, you can manually insert the items...

If the listed items are:

  • ... misspelled, then please correct them instead of using the command.
  • ... names, please add them to .github/actions/spelling/allow/names.txt.
  • ... APIs, you can add them to a file in .github/actions/spelling/allow/.
  • ... just things you're using, please add them to an appropriate file in .github/actions/spelling/expect/.
  • ... tokens you only need in one place and shouldn't generally be used, you can add an item in an appropriate file in .github/actions/spelling/patterns/.

See the README.md in each directory for more information.

🔬 You can test your commits without appending to a PR by creating a new branch with that extra change and pushing it to your fork. The check-spelling action will run in response to your push -- it doesn't require an open pull request. By using such a branch, you can limit the number of typos your peers see you make. 😉

🗜️ If you see a bunch of garbage

If it relates to a ...

well-formed pattern

See if there's a pattern that would match it.

If not, try writing one and adding it to a patterns/{file}.txt.

Patterns are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your lines.

Note that patterns can't match multiline strings.

binary-ish string

Please add a file path to the excludes.txt file instead of just accepting the garbage.

File paths are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your files.

^ refers to the file's path from the root of the repository, so ^README\.md$ would exclude README.md (on whichever branch you're using).

@j4james j4james marked this pull request as ready for review November 6, 2021 18:16
zadjii-msft
zadjii-msft previously approved these changes Nov 8, 2021
@zadjii-msft zadjii-msft added the Needs-Second It's a PR that needs another sign-off label Nov 8, 2021
@DHowett
Copy link
Member

DHowett commented Nov 9, 2021

Thanks for doing this! So far, I've found one app in the wild that uses C1 controls on Windows -- @jdebp's terminal-tests. I've realized two things:

  1. @jdebp probably has an opinion about this that we haven't heard! Jonathan: if you're interested, there's discussion beginning at C1 control characters detection breaks output (regression) #10310 (comment) about solving the issue of C1 control characters working their way into "normal" output and what we can do about that.
  2. Curiously, when I enable DECAC1 and then run Attributes.ps1, it can no longer ED 2 or DECSCNM.

Other things seem to be working:

Without DECAC1, I get... (expectedly!)

image

but with it, I get:

image

Compared to 7-bit mode:

image

@j4james
Copy link
Collaborator Author

j4james commented Nov 9, 2021

Curiously, when I enable DECAC1 and then run Attributes.ps1, it can no longer ED 2 or DECSCNM.

That's weird. I can't reproduce that. Do those sequences also fail if you test them individually from the command line with echo, or however it is you do these things in powershell?

I thought it might be a timing issue related to the change in code page, like maybe the first part of the output is processed before the code page changes to utf-8. That doesn't seem likely, but if you insert some kind of delay in the script after the code page change that might tell us something.

@j4james
Copy link
Collaborator Author

j4james commented Nov 9, 2021

If nothing else, I just realised while testing this that I got the DECAC1 escape sequence wrong. It should be ESC SP 7, not ESC SP 6.

@DHowett
Copy link
Member

DHowett commented Nov 9, 2021

Huh! So, adding a delay didn't work.... but it definitely looks like DECSET/DECRST don't work with C1 CSI:

image

image

Before this change, it does seem to work (\x9B?5h, that is)

\x9B2J works fine from PowerShell as well. Huh.

@j4james
Copy link
Collaborator Author

j4james commented Nov 9, 2021

OK I can reproduce this now. I was testing in conhost, but this only happens in Windows Terminal. I'm assuming it's got something to do with conpty passing through the sequences with the C1 controls, but the WT parser on the other end is no longer capable of handling that. I wonder if we could fix this just by enabling C1 support in WT.

@DHowett
Copy link
Member

DHowett commented Nov 9, 2021

Huh. I wouldn't've expected that, for sure. Good catch!

Cheap fix: If we pass through DECAC1, it may "just work"?

EDIT: Ah, that's probably what you were suggesting. I got it in my head that we'd only either [always enable it] or [request it be enabled via ConPTY on startup.]

@j4james
Copy link
Collaborator Author

j4james commented Nov 9, 2021

I'm not really sure what the right approach is. I'm going to have to some more experimenting. I'm a little worried now that this is going to break something. It's more complicated than I first thought.

@j4james
Copy link
Collaborator Author

j4james commented Nov 9, 2021

OK, so I think this is the right approach. If C1 controls are being filtered out, that's going to be handled by the conhost parser and shouldn't ever be passed through to conpty. But if C1 controls are being accepted, then there's the potential for some sequences to be passed through exactly as they were received (DECSCNM being one example), and the WT parser needs to be prepared to accept them. But as long as it's always configured to accept C1, that should be fine.

Also note that I've fixed the DECAC1 sequence, so you'll need to use ESC SP 7 from now on.

@DHowett
Copy link
Member

DHowett commented Nov 9, 2021

I'm totally comfortable with that outcome, honestly. The biggest issue with C1 controls originates from Windows' unique disposition with codepages, and so it feels right for the default in Windows' console host to be "ignore". Terminal wanted to be slightly less tied to Windows' console's needs in a lot of regards, and... accepting C1 by default surely doesn't seem like the worst of those regards.

miniksa
miniksa previously approved these changes Nov 10, 2021
Copy link
Member

@miniksa miniksa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me. Sorry that this ended up being a kerfluffle.

src/cascadia/TerminalCore/Terminal.cpp Outdated Show resolved Hide resolved
@DHowett DHowett added the AutoMerge Marked for automatic merge by the bot when requirements are met label Nov 17, 2021
@ghost
Copy link

ghost commented Nov 17, 2021

Hello @DHowett!

Because this pull request has the AutoMerge label, I will be glad to assist with helping to merge this pull request once all check-in policies pass.

p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (@msftbot) and give me an instruction to get started! Learn more here.

@DHowett
Copy link
Member

DHowett commented Nov 17, 2021

Thanks again, James! Well-thought-out and well-implemented as always.

@ghost ghost merged commit 6742965 into microsoft:main Nov 17, 2021
@j4james j4james deleted the disable-c1-controls branch December 28, 2021 20:59
@ghost
Copy link

ghost commented Feb 3, 2022

🎉Windows Terminal Preview v1.13.10336.0 has been released which incorporates this pull request.:tada:

Handy links:

jazzdelightsme added a commit to microsoft/DbgShell that referenced this pull request May 17, 2022
Conhost/Terminal supported C1 control sequences for a while... but then
that apparently caused some problems. So they disabled them by default,
which causes all our VT SGR sequences to be broken (so instead of pretty
color output, you just see gray output, with strange numbers sprinkled
all over). Fortunately they provided a way to turn them back on.

Related: microsoft/terminal#11690
Related: microsoft/terminal#10310

Note that I believe you need a relatively recent build for this change
to have effect.
ghost pushed a commit that referenced this pull request Sep 12, 2022
When we added support for the `DECAC1` control sequence, which
determines whether `C1` controls are accepted or not, the intention was
that conhost would be making that determination, and Windows Terminal
would always be expected to accept any passed-through `C1` controls.

However, this didn't take into account that a passed-through `RIS`
sequence could end up disabling `DECAC1`, and that would leave Windows
Terminal incapable of processing any `C1` controls. This PR attempts to
fix that oversight.

The `DECAC1` sequence was added in PR #11690, when we disabled `C1`
acceptance by default.

This is a bit of a hack, but I've added a new `AlwaysAcceptC1` mode to
the state machine, which is enabled at startup in the Terminal, and is
never disabled. The parser then just needs to check whether either
`AcceptC1` or `AlwaysAcceptC1` are set.

## Validation Steps Performed

I've manually confirmed the test case in #13968 now works as expected.

Closes #13968
DHowett pushed a commit that referenced this pull request Dec 12, 2022
When we added support for the `DECAC1` control sequence, which
determines whether `C1` controls are accepted or not, the intention was
that conhost would be making that determination, and Windows Terminal
would always be expected to accept any passed-through `C1` controls.

However, this didn't take into account that a passed-through `RIS`
sequence could end up disabling `DECAC1`, and that would leave Windows
Terminal incapable of processing any `C1` controls. This PR attempts to
fix that oversight.

The `DECAC1` sequence was added in PR #11690, when we disabled `C1`
acceptance by default.

This is a bit of a hack, but I've added a new `AlwaysAcceptC1` mode to
the state machine, which is enabled at startup in the Terminal, and is
never disabled. The parser then just needs to check whether either
`AcceptC1` or `AlwaysAcceptC1` are set.

## Validation Steps Performed

I've manually confirmed the test case in #13968 now works as expected.

Closes #13968

(cherry picked from commit f2b361c)
Service-Card-Id: 87207769
Service-Version: 1.15
DHowett pushed a commit that referenced this pull request Dec 12, 2022
When we added support for the `DECAC1` control sequence, which
determines whether `C1` controls are accepted or not, the intention was
that conhost would be making that determination, and Windows Terminal
would always be expected to accept any passed-through `C1` controls.

However, this didn't take into account that a passed-through `RIS`
sequence could end up disabling `DECAC1`, and that would leave Windows
Terminal incapable of processing any `C1` controls. This PR attempts to
fix that oversight.

The `DECAC1` sequence was added in PR #11690, when we disabled `C1`
acceptance by default.

This is a bit of a hack, but I've added a new `AlwaysAcceptC1` mode to
the state machine, which is enabled at startup in the Terminal, and is
never disabled. The parser then just needs to check whether either
`AcceptC1` or `AlwaysAcceptC1` are set.

## Validation Steps Performed

I've manually confirmed the test case in #13968 now works as expected.

Closes #13968

(cherry picked from commit f2b361c)
Service-Card-Id: 87207767
Service-Version: 1.16
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Output Related to output processing (inserting text into buffer, retrieving buffer text, etc.) AutoMerge Marked for automatic merge by the bot when requirements are met Issue-Bug It either shouldn't be doing this or needs an investigation. Needs-Second It's a PR that needs another sign-off Product-Conhost For issues in the Console codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

C1 control characters detection breaks output (regression) Issue with Kanji
4 participants