
Convert coalesce_streams function to CoalesceStreamsPreprocessor #2089

Merged: blink1073 merged 1 commit into jupyter:main from ryan-williams:coalesce-streams on Dec 26, 2023


ryan-williams (Contributor)

`coalesce_streams` was the last remaining "decorated function Preprocessor", and I couldn't find an example of how to use it.

This PR converts it to a `Preprocessor` subclass, `CoalesceStreamsPreprocessor`, like the others. It also adds a top-level `--coalesce-streams` flag, so this now works:

```bash
jupyter nbconvert --coalesce-streams --inplace notebook.ipynb
```

to normalize a notebook's otherwise-nondeterministic stdout/stderr chunking.

For example, a cell might end up with two stderr lines split across two outputs:

```json
{
  "cells": [
    {
      "cell_type": "code",
      "outputs": [
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "before sleep\n"
          ]
        },
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "after sleep\n"
          ]
        }
      ],
      "source": [
        "import sys\n",
        "import time\n",
        "sys.stderr.write(\"before sleep\\n\")\n",
        "time.sleep(1)\n",
        "sys.stderr.write(\"after sleep\\n\");"
      ]
    }
  ]
}
```

(I usually see this when running notebooks via Papermill.)
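To find which cells in a notebook are affected before rewriting anything, a small stdlib-only sketch can scan the raw notebook JSON for consecutive stream outputs with the same `name`. The helper `split_stream_cells` is hypothetical (not part of nbconvert), and it reads the `.ipynb` dict directly rather than going through nbformat.

```python
from typing import Any


def split_stream_cells(nb: dict[str, Any]) -> list[int]:
    """Return indices of code cells that contain two consecutive stream
    outputs with the same name (e.g. stderr followed by stderr).

    Hypothetical helper for illustration; not part of nbconvert.
    """
    flagged: list[int] = []
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        outputs = cell.get("outputs", [])
        # Compare each output with the one that follows it.
        for prev, cur in zip(outputs, outputs[1:]):
            if (
                prev.get("output_type") == cur.get("output_type") == "stream"
                and prev.get("name") == cur.get("name")
            ):
                flagged.append(index)
                break  # one hit is enough to flag this cell
    return flagged


# The outputs from the example notebook above (source left empty).
example = {
    "cells": [
        {
            "cell_type": "code",
            "outputs": [
                {"name": "stderr", "output_type": "stream", "text": ["before sleep\n"]},
                {"name": "stderr", "output_type": "stream", "text": ["after sleep\n"]},
            ],
            "source": [],
        }
    ]
}

print(split_stream_cells(example))  # → [0]
```

For a real file, the dict can come from `json.load` on the `.ipynb` file.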

Running `jupyter nbconvert --coalesce-streams --inplace` combines consecutive outputs from the same stream within each cell, producing this diff:

```diff
## inserted before /cells/0/outputs/0:
+  output:
+    output_type: stream
+    name: stderr
+    text:
+      before sleep
+      after sleep

## deleted /cells/0/outputs/0-1:
-  output:
-    output_type: stream
-    name: stderr
-    text:
-      before sleep
-  output:
-    output_type: stream
-    name: stderr
-    text:
-      after sleep
```
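The merge shown in that diff can be approximated in plain Python. This is a simplified sketch, not nbconvert's implementation: it works on raw notebook-JSON dicts, and the helper names `coalesce_stream_outputs` and `coalesce_notebook_file` are hypothetical. The second function is only a rough stand-in for `--coalesce-streams --inplace`; the JSON formatting it writes may differ from what nbconvert writes.

```python
import json
from typing import Any


def coalesce_stream_outputs(outputs: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Merge runs of consecutive stream outputs that share the same name.

    Hypothetical helper: works on plain notebook-JSON dicts and returns a
    new list; the caller's output dicts are not modified.
    """
    merged: list[dict[str, Any]] = []
    for output in outputs:
        if output.get("output_type") != "stream":
            # Non-stream outputs (display_data, error, ...) end a run.
            merged.append(output)
            continue
        text = output.get("text", "")
        # On disk, multi-line text is often stored as a list of lines.
        if isinstance(text, list):
            text = "".join(text)
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and prev.get("output_type") == "stream"
            and prev.get("name") == output.get("name")
        ):
            # Same stream as the previous output: append to its text.
            prev["text"] += text
        else:
            # Start a new run with a copy, so the caller's dict is untouched.
            merged.append({**output, "text": text})
    return merged


def coalesce_notebook_file(path: str) -> None:
    """Rough stand-in for `jupyter nbconvert --coalesce-streams --inplace`."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        merged = coalesce_stream_outputs(cell.get("outputs", []))
        for output in merged:
            if output.get("output_type") == "stream":
                # Split back into a list of lines, as in the JSON example above.
                output["text"] = output["text"].splitlines(keepends=True)
        cell["outputs"] = merged
    with open(path, "w", encoding="utf-8") as f:
        json.dump(nb, f, indent=1, ensure_ascii=False)
        f.write("\n")


# The two stderr chunks from the example cell.
outputs = [
    {"name": "stderr", "output_type": "stream", "text": ["before sleep\n"]},
    {"name": "stderr", "output_type": "stream", "text": ["after sleep\n"]},
]
print(coalesce_stream_outputs(outputs))
# → [{'name': 'stderr', 'output_type': 'stream', 'text': 'before sleep\nafter sleep\n'}]
```

Because only *consecutive* same-name stream outputs are merged, a stdout chunk followed by an image and then another stdout chunk stays as three separate outputs, and stdout and stderr are never mixed.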

I also implemented this functionality in juq merge-outputs, before noticing this preprocessor!

ryan-williams force-pushed the coalesce-streams branch 2 times, most recently from ba8cefa to 26a82bd (December 26, 2023 16:12)
blink1073 (Member) left a comment:

LGTM, thank you!

blink1073 merged commit ece13fd into jupyter:main on Dec 26, 2023
21 of 24 checks passed
ryan-williams deleted the coalesce-streams branch on December 26, 2023 16:51