Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenConsole: Duplicate output lines when connecting to Ubuntu via SSH #15769

Closed
lhecker opened this issue Jul 26, 2023 · 13 comments · Fixed by #15935
Closed

OpenConsole: Duplicate output lines when connecting to Ubuntu via SSH #15769

lhecker opened this issue Jul 26, 2023 · 13 comments · Fixed by #15935
Labels
Area-VT Virtual Terminal sequence support In-PR This issue has a related PR Issue-Bug It either shouldn't be doing this or needs an investigation. Product-Conhost For issues in the Console codebase

Comments

@lhecker
Copy link
Member

lhecker commented Jul 26, 2023

OpenConsole version

fc95802

Steps to reproduce

The exact steps to reproduce the issue are somewhat unclear to me, but it happens readily on a Ubuntu 22.04.2 LTS server I own. Access can be provided on a case-by-case basis. The issue occurs with both bash and fish-shell with the standard configuration provided by Ubuntu. It does not reproduce with Windows Terminal.

Expected Behavior

image

Actual Behavior

Like this:

image

Or any other wild intermix, including entirely missing or individually duplicated characters. Sometimes I receive up to 5000 lines of output this way.

Investigation

We do actually receive those duplicates as WriteConsole calls:
image

git bisect leads me to believe that this issue was introduced in fc95802. Unfortunately, the effect is entirely random and sometimes seems to not occur at all, so I can't be 100% certain. But I've tried to reproduce the issue with da3a33f ~100 times now and couldn't, whereas it happens at least half the time with fc95802.

@lhecker lhecker added Issue-Bug It either shouldn't be doing this or needs an investigation. Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting labels Jul 26, 2023
@lhecker
Copy link
Member Author

lhecker commented Jul 26, 2023

@j4james FYI I believe I found a bug in the LineFeed refactor. It's a tricky one and I'm not entirely sure yet how to debug this. I suspect I'll just have to do ye old "read the diff until you found the bug"-trick again. 😄

@lhecker lhecker added the Product-Conhost For issues in the Console codebase label Jul 27, 2023
@lhecker
Copy link
Member Author

lhecker commented Jul 27, 2023

I found the exact line I need to flip to make it work again: fc95802#diff-21383c37a75bafb4ee95b4559a8f5b3d421640e19d3aef91cce9a419b28707e9L2103-R2178

In other words, if I revert the AdaptDispatch::LineFeed handling from _DoLineFeed to _api.LineFeed it works again. It's unclear to me however why that would be...

@j4james
Copy link
Collaborator

j4james commented Jul 29, 2023

I don't get what's happening here. Who is making all those WriteConsole calls with identical content? Is that the SSH client? If that's the case, why wouldn't you expect to see duplicate output? Are they resetting the cursor position between the WriteConsole calls? Are their cursor positioning escape sequences in the output somewhere?

And this is assumedly not what the server is sending is it? Like can you do a script recording on the server side to see what is being produced at that end? Because if the server isn't sending duplicate content, then that suggests the SSH client is doing its own VT parsing and rewriting the buffer strangely.

My suggestion would be to add some logging to conhost that writes out everything that is received in the WriteConsole call so we can see exactly what is being sent, and then see if you can reproduce the issue just by TYPEing that log. If not, you may need to check for other relevant API calls, like mode changes, cursor movement, and maybe buffer resizing.

@lhecker
Copy link
Member Author

lhecker commented Jul 29, 2023

I spent quite some time trying to figure out what's going on and I'm not entirely sure which part is broken. FWIW I don't think it's the new _DoLineFeed function itself, but rather a bad interaction with some other part of OpenConsole or Windows' SSH that is simply revealed by its introduction. For instance, I noticed that the issue is most easily reproducible if I connect via SSH the second pwsh is starting up. If I start pwsh and wait a few seconds before connecting, it happens way less often. I mean, for instance, what if we have functions that read the cursor position without acquiring the console lock first? Things like that basically... I'll try to continue looking into it. Thanks for the tip for script. Dustin also suggested using is experimental record & replay feature for the condrv interface.

@lhecker
Copy link
Member Author

lhecker commented Aug 1, 2023

What's also interesting is that I sometimes see a ?2004l being printed. In some case it quickly disappears because it's (partially or entirely) overwritten. (It's like the cursor is being reset to the start of the line.)

@DHowett
Copy link
Member

DHowett commented Aug 1, 2023

Hey, what version of openssh is that? They had an issue around the use of the alternate screen buffer where they would re-send an entire console write when the window size changed.

@j4james
Copy link
Collaborator

j4james commented Aug 1, 2023

Yeah, I was wondering that too. There are quite a few reports of duplicated content related to Win32-OpenSSH, but they all seem to point to a PR at PowerShell/openssh-portable#426 which was fixed years ago.

@lhecker
Copy link
Member Author

lhecker commented Aug 1, 2023

Hey, what version of openssh is that?

I'm using OpenSSH_for_Windows_8.6p1, LibreSSL 3.4.3 (Windows 11 Beta branch, build 22631).

The issue doesn't occur with WSL btw. As I've said before, I don't really think we're doing anything wrong per se, but it is weird that I cannot reproduce it before fc95802 and always reproduce it after fc95802 with Windows' SSH client. Something's whacky, I can feel it. 😄

@j4james
Copy link
Collaborator

j4james commented Aug 1, 2023

Note that PR #14874 is definitely expected to change the behavior of the linefeed operation under certain conditions, but that would typically have involved margins in some way. My concern is that this particular change may not be expected, though, and there's no way of knowing that without seeing the API calls and/or escape sequences the ssh client is using.

@lhecker
Copy link
Member Author

lhecker commented Aug 2, 2023

At this point I genuinely think it's a bug in Windows' SSH implementation. Here's a text file that 9 out of 10 times reproduces the issue in the latest OpenConsole version (1.18.1462.0) if I cat it remotely over SSH in a standard 120x30 viewport: 15769_repro.txt
The start sequence is the output of tput reset. The rest is plain text. It should show the same two lines 11x repeated, but it usually shows them more often than that, similar to my screenshots above.

The shell doesn't play any role at all - it happens even if I use dash. (I've confirmed that the output of script is identical, apart from the additional raw $ input line. I can provide the typescript file if anyone's interested in it.)
I'm still confused why the LineFeed refactor would excerbate this issue, but... Maybe that's simply not important to figure out? It might be more worthwhile to contact the https://github.com/PowerShell/openssh-portable maintainers.

@lhecker
Copy link
Member Author

lhecker commented Aug 2, 2023

Y'ALL I FIGURED IT OUT!!!!!!!!!!!!!!!!!!!!!
You will not believe this. 🤣

This works:

auto hr = screenInfo.SetViewportOrigin(true, newViewOrigin, true);

This does not:

THROW_IF_FAILED(ServiceLocator::LocateGlobals().api->SetConsoleWindowInfoImpl(info, true, windowRect));

Replacing it with this fixes the issue:

void ConhostInternalGetSet::SetViewportPosition(const til::point position)
{
    auto& info = _io.GetActiveOutputBuffer();
    const auto dimensions = info.GetViewport().Dimensions();
    const auto windowRect = til::rect{ position, dimensions }.to_inclusive_rect();
    info.SetViewport(Viewport::FromInclusive(windowRect), true);
}

This happens because SetConsoleWindowInfoImpl ends up calling Window::PostUpdateWindowSize() which sets CONSOLE_SETTING_WINDOW_SIZE and posts a CM_SET_WINDOW_SIZE. Removing the CM_SET_WINDOW_SIZE message fixes the issue, so I'm sure it's because of that. Reading the SSH source code (and via tracing) I believe this triggers SSH to send a viewport size change event (TIOCGWINSZ) and I think this corrupts the output, because of an ensuing race condition between server and client. (This update happens on every single viewport position change after all.)

I would prefer writing

    THROW_IF_FAILED(info.SetViewportOrigin(true, position, true));

but it seems like SetViewport and SetViewportOrigin aren't equivalent? Is that a bug? It would also need to post a window update to update the scrollbar.

Any thoughts about my findings? 😊

@carlos-zamora carlos-zamora added Area-VT Virtual Terminal sequence support and removed Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting labels Aug 2, 2023
@carlos-zamora carlos-zamora added this to the Terminal v1.19 milestone Aug 2, 2023
@j4james
Copy link
Collaborator

j4james commented Aug 3, 2023

Reading the SSH source code (and via tracing) I believe this triggers SSH to send a viewport size change event (TIOCGWINSZ) and I think this corrupts the output,

OK, that makes some sense. That sounds similar to the issue mentioned in PowerShell/openssh-portable#426.

But note that SetConsoleWindowInfo was the recommended method for console apps to scroll the screen buffer, so regardless of what changes we make to our line feed implementation, this would assumedly still be an issue for SSH wouldn't it?

I would prefer writing

    THROW_IF_FAILED(info.SetViewportOrigin(true, position, true));

I'm inclined to agree with this, because aside from SetConsoleWindowInfo, that's the approach most other parts of the API use for scrolling the viewport (typically as an action to keep the cursor visible, but it comes to the same thing).

But note that SetViewportOrigin just ends up calls SetViewport anyway, just with a lot more steps along the way. How much of that additional code is actually needed, I don't know, but unless you're desperate to optimise things, it seems safest to stick with what's there if it's working.

Some things I'd want to look out for:

  • Do the VT erase display operations still work as expected? _EraseAll and _EraseScrollback also follow this code path to update the viewport position.
  • Does everything still work reasonably if the user is scrolled back (or scrolled forward for that matter) when output is received that updates the viewport?
  • Do these operations still update the buffer bottom correctly? Check the "Disable Scroll-Forward option" in the conhost settings.
  • Do things still work correctly when working with multiple buffers, and you're writing to one of the background buffers rather than the visible one?

I think it should be fine, but I also thought SetConsoleWindowInfo would be fine, so what do I know?

@lhecker
Copy link
Member Author

lhecker commented Aug 3, 2023

(TIL: When you press Ctrl+Enter on GitHub it submits your comment. Whoops.)

But note that SetConsoleWindowInfo was the recommended method for console apps to scroll the screen buffer, so regardless of what changes we make to our line feed implementation, this would assumedly still be an issue for SSH wouldn't it?

This issue is basically another victim of #281. When we call SetConsoleWindowInfo we intentionally post a WINDOW_BUFFER_SIZE_EVENT and that breaks SSH. It used to work because (ironically) we didn't post a WINDOW_BUFFER_SIZE_EVENT when scrolling in VT mode despite the change in viewport position, which is... weird? But as noted in #281 the "quick hack" was a fairly bad, inconsistent idea, so I think the inconsistent behavior is to be expected unfortunately. I'm genuinely not confident to fix this bug by myself, because I'm unsure what to look out for when testing the behavior, or even just how to use these parts of the internal conhost APIs. (But your list helps a lot with testing!)

This issue makes using SSH pretty much impossible to use with conhost and we intend to about +/- ship it next year, so we're on the hook to fix it within the next +/- couple months. 😟 I wonder if we should address #281 and remove the qirky behavior as well? It would break some new apps, but it would fix just as many old apps, including SSH.

@microsoft-github-policy-service microsoft-github-policy-service bot added the In-PR This issue has a related PR label Sep 6, 2023
microsoft-github-policy-service bot pushed a commit that referenced this issue Sep 18, 2023
`SetConsoleWindowInfoImpl` calls `PostUpdateWindowSize`, which emits a
`CM_SET_WINDOW_SIZE` event, which causes `_InternalSetWindowSize` to be
called, which calls `ScreenBufferSizeChange` which then finally emits a
`WINDOW_BUFFER_SIZE_EVENT` event into the client input buffer.

This messes up applications like which make use of
`WINDOW_BUFFER_SIZE_EVENT` to perform potentially lossy operations.
In case of SSH this results in a resize (SIGWINCH) of the server-side
screen which similarly may result in a response by the shell, etc.
Since that happens over networks and is async, and because our conhost
VT viewport implementation appears to have a number of subtle bugs,
this results in duplicate output lines (sometimes hundreds).

Under Windows Terminal this issue is not as apparent, since ConPTY has
no viewport that can be moved and no scrollback. It only appears as an
issue if a terminal application reacts poorly to the SIGWINCH event.

Closes #15769

## Validation Steps Performed
* Set a breakpoint in `SynthesizeWindowBufferSizeEvent`
* Launch WSL and cause the viewport to move down
  No calls to `SynthesizeWindowBufferSizeEvent` ✅
* Execute `tput reset`
  Input line moves to row 0 ✅
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-VT Virtual Terminal sequence support In-PR This issue has a related PR Issue-Bug It either shouldn't be doing this or needs an investigation. Product-Conhost For issues in the Console codebase
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants