Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash when notebook scrolling is on and a lot of output is streamed in #182636

Closed
r3m0t opened this issue May 16, 2023 · 9 comments
Closed

Crash when notebook scrolling is on and a lot of output is streamed in #182636

r3m0t opened this issue May 16, 2023 · 9 comments

Comments

@r3m0t
Copy link
Contributor

r3m0t commented May 16, 2023

Does this issue occur when all extensions are disabled?: Yes/No

  • VS Code Version:
  • OS Version:
    Version: 1.79.0-insider (user setup)
    Commit: 2575777
    Date: 2023-05-12T05:21:06.392Z
    Electron: 22.5.2
    Chromium: 108.0.5359.215
    Node.js: 16.17.1
    V8: 10.8.168.25-electron.0
    OS: Windows_NT x64 10.0.19045
    Sandboxed: Yes

Steps to Reproduce:

  1. Install Jupyter and Python
  2. Ensure user setting "notebook.output.scrolling": true, "notebook.output.textLineLimit": 30 (this is the default)
  3. Create interactive window
  4. Run cell containing
import sys
for i in range(31500):
    print('{!r} {!r}'.format(list(range(10)), i), file=sys.stderr)

This works, it's only abou 1 MB of output.

  1. Run cell containing
import sys
for i in range(31500):
    print('{!r} {!r}'.format(list(range(10)), i), file=sys.stderr)
    sys.stderr.flush()

The renderer crashes!

@r3m0t
Copy link
Contributor Author

r3m0t commented May 16, 2023

@rebornix @DonJayamanne are you accepting PRs for this?

It looks like, there is no batching of events from the notebook, although you made processing each event more efficient last year.

The renderer is calculating the scroll height each time stderr is flushed

image

@r3m0t
Copy link
Contributor Author

r3m0t commented May 16, 2023

I guess OutputCell.updateContentAndRender needs to be debounced with requestAnimationFrame, but, when I tried it the resize observer started intermittently sending height of 0 back to the extension.

@rebornix
Copy link
Member

@r3m0t thanks for offering help, it would be very great. We might need to carefully design the debouncing to ensure a smooth incremental update experience though.

@r3m0t
Copy link
Contributor Author

r3m0t commented May 25, 2023

I had a look last week, but I found many concerns. It's not just the renderer. Basically, updateContentAndRender is an idempotent operation (an upsert) of the output into the rendered DOM. The approach of sending the full output to the renderer each time is quadratic-

import sys, json
res = []
total_message_length = 0

for i in range(31500):
    res.append('{!r} {!r}\n'.format(list(range(10)), i))
    message = json.dumps({'event': 'update_output', 'output_id': '01010239032-3210-321-312321321', 'contents': ''.join(res)})
    total_message_length += len(message)
print(f'Sending {total_message_length:,.0f} bytes with a final output of {len(message)}')
# Sending 18,556,686,495 bytes with a final output of 1185979

That's really excessive, and there's a lot of work needed in the output renderer as well. Each time it processes all the last 5000 lines for ANSI colors and links, instead of just the new part, and creates new div elements -

element.appendChild(handleANSIOutput(buffer.slice(-5000).join('\n'), trustHtml));
return element;
}
export function createOutputContent(id: string, outputs: string[], linesLimit: number, scrollable: boolean, trustHtml: boolean): HTMLElement {
const buffer = outputs.join('\n').split(/\r\n|\r|\n/g);
if (scrollable) {
return scrollableArrayOfString(id, buffer, trustHtml);

it would make more sense to send incremental content -

Separate issue as it's on the main host side- there is also quadratic behaviour in formatStreamText - this is called 31,500 times and searches the full string each time with these two includes calls -

if (!buffer.buffer.includes(BACKSPACE_CHARACTER) && !buffer.buffer.includes(CARRIAGE_RETURN_CHARACTER)) {

Basically, debouncing would require updateContentAndRender to stay idempotent which I don't agree with, and it would make the logic more complicated if you want to introduce a mix of idempotent and incremental messages later.

I might be able to send you some small changes to optimize https://github.com/microsoft/vscode/blob/143d9674dd470c8f3d3a5e4b930f5eb1fe9a91d3/extensions/notebook-renderers/src/index.ts , but I don't think performance will be good until you can tackle the above.

Or should I just tell my users to go to Terminal 🤣

@r3m0t
Copy link
Contributor Author

r3m0t commented May 25, 2023

Oh, and I think I'm basically suggesting unpicking #161023 - or optimizing for both cases (append usually but replace when \r or \b used as stream characters come into play)

@amunger
Copy link
Contributor

amunger commented May 25, 2023

We were considering incremental updates to the output here, but decided for a simpler change to fix that issue. As you noted, we look for characters that edit previous outputs and compress the buffer into a single output, which would be very difficult to support if we only sent text to append. We also need to consider 3rd party renderers that would break if we changed the message in such a way.

We have a hard limit of 5000 lines for text output in the renderer, so it might make sense to also limit the message sent to help mitigate this issue.

@r3m0t
Copy link
Contributor Author

r3m0t commented May 26, 2023

I don't think third party renderers need to be affected as the escape sequences are already only handled for a few hard coded MIME types. Are there other renderers for these types? Are there other MIME types that need to be handled incrementally? In the Jupyter protocol, you can't incrementally update an output of a MIME type. This can be private API to the built in stream renderers.

I do agree OutputItem[] is the wrong type when the escape sequence can "affect data behind it". I was thinking of an OutputItem.replaceVersionId which increments exactly when the new contents is different from the old contents in the range [0, oldContents.length). If it's unchanged then linkify, ANSI color processing and <span> creation can be skipped for the existing output.

@r3m0t
Copy link
Contributor Author

r3m0t commented Jun 12, 2023

Thanks for putting this on the iteration plan, can you tag it freeze-crash?

I'll leave you with my thoughts which may be useful in implementation, hopefully this doesn't send you down the wrong way!

  • I'm not sure my 18,556,686,495 bytes was accurate as I didn't check the behaviour of copyBufferIfNeeded in this case.
  • The first question if making big changes to this area is whether to keep stdout and stderr separate or mix them- more like a terminal or Visual Studio 2022 Python support. In VS the stdout and stderr are shown in the same place with different colours. It makes more sense to me and users at my employer than what Jupyter does. Jupyter wire protocol doesn't say how stdout/stderr should be displayed
  • As I mentioned there's inefficiencies at almost every layer of the stack wrt output.
    • The efficient one is the ext host and messages to the main process due to the inclusion of ICellExecuteOutputEditDto.append.
    • But when we reach ExtHostCell.setOutputItems it gets quadratic behaviour and we forget it's an append event.
    • Then backLayerWebview.ts and a couple of stack frames in the webview side are paying the quadratic price.
    • As well as the application/vnd.code.notebook.stdout renderer.
  • I would split CellExecutionUpdateType.OutputItems into two update types - one is always append and only contains application/vnd.code.notebook.stdout and/or stderr and the other is the normal one (which no longer needs an append case).
  • This follows all down the stack, if \r or \b is encountered then the event gets converted to the normal type of event before it gets to backLayerWebview & etc.
  • The debouncing is probably worthwhile even with these optimisations, as I couldn't find a generic way to avoid that Layout 10ms cost shown in the profiler, even if it was fully incremental all the way down. Once webviewPreloads.ts knows which showOutput messages are appendy, it'll be easy to implement the debouncing there.
  • A non-generic way I just thought of is to not measure the offsetHeight of the output element after rendering (
    const offsetHeight = this.element.offsetHeight;
    ) . After all, if there are enough lines for the scrollbar to appear, you know the output height will stay constant as line count increases. But, IMO you will hit inefficiencies elsewhere.

PS While writing I noticed MirrorNotebookDocument ignores NotebookOutputItemChangedEvent.append / NotebookOutputChangedEvent.append, I don't know the impact and there is a comment that might mean you already know this but I didn't understand it so I thought it was worth mentioning.

@rebornix
Copy link
Member

@amunger and I discussed how we can introduce incremental rendering to the stream output renderer and his experiments are looking promising. Here are some perf logs we captured

With incremental rendering

image

Every output append takes around 2ms and it's very consistent. Our hypothesis is once we reach the max height of the scrollable element, the height will not change anymore and it won't trigger any resize to its ancestor elements. Every append operation is only changing its internal content height, that's why the layout is very minimal and almost identical each time. With https://developer.mozilla.org/en-US/docs/Web/CSS/contain, it can probably be reduced further.

Without incremental rendering

image

Without incremental update, every update is a "remove" and then "insert" and the "insert" operation is getting bigger, thus the rendering time for each operation is getting longer. Starting from around 2ms, all the way to tens or hundreds of ms

image image

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants