Fix #26 - Rework stdout/stderr handling #38
Conversation
Force pyspark.java_gateway.launch_gateway to pipe stdout/stderr to the kernel process so that they can be read in a thread, instead of mucking with Scala Console streams. Opened https://issues.apache.org/jira/browse/SPARK-21094 about making the piping capability a part of the API.
The stdout/stderr tests were passing by timing luck before. Need to resolve adtech-labs#21 to re-enable them. (Surprisingly, output in the notebook now appears in the correct cell even though the tests indicate it arrives after the idle message instead of before.)
Codecov Report

```diff
@@            Coverage Diff             @@
##           master      #38      +/-   ##
==========================================
- Coverage   85.21%   85.18%   -0.03%
==========================================
  Files           6        6
  Lines         399      378      -21
==========================================
- Hits          340      322      -18
+ Misses         59       56       -3
```
Codecov Report

```diff
@@            Coverage Diff             @@
##           master      #38      +/-   ##
==========================================
- Coverage   85.21%   82.68%   -2.53%
==========================================
  Files           6        6
  Lines         399      387      -12
==========================================
- Hits          340      320      -20
- Misses         59       67       +8
```
```python
kwargs['stderr'] = subprocess.PIPE
spark_jvm_proc = subprocess.Popen(*args, **kwargs)
return spark_jvm_proc
pyspark.java_gateway.Popen = Popen
```
```python
    await asyncio.sleep(0, loop=self.loop)
else:
    await asyncio.sleep(0.01, loop=self.loop)
buff = fd.read(8192)
```
what's the rationale for the chunk size here?
Greater-than-zero to specify that we don't want to wait until the pipe closes. Less than 65k which is (probably) the maximum pipe size these days (https://unix.stackexchange.com/questions/11946/how-big-is-the-pipe-buffer).
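The short-read behavior described above can be seen in a standalone toy demonstration (this is not the kernel's code, just an illustration of `os.read` semantics on a pipe):

```python
import os

# os.read() with a positive chunk size returns whatever bytes are
# already in the pipe, up to the limit, without waiting for the
# writer to close its end.
r, w = os.pipe()
os.write(w, b'x' * 100)   # writer sends only 100 bytes and keeps the pipe open
chunk = os.read(r, 8192)  # returns immediately with the 100 available bytes
print(len(chunk))
os.close(r)
os.close(w)
```

The 8192 value just needs to be positive (to allow short reads) and comfortably below the OS pipe buffer capacity.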
spylon_kernel/scala_interpreter.py
Outdated
```diff
- If you want to get the result as a Python object, follow this with a
- call to `last_result`.
+ Follow this with a call to `last_result` to retrieve the result as a
  ScalaException
      When there is a problem interpreting the code
  """
  # Ensure the cell is not incomplete. Same approach taken by Apache Zeppelin.
```
| # Ensure the cell is not incomplete. Same approach taken by Apache Zeppelin. |
is there an obvious place in the zeppelin code where you could link this comment?
```diff
  try:
-     res = self.jimain.interpret(code, synthetic)
+     res = self.jimain.interpret(code, False)
```
Can you include the rationale for hard coding the synthetic class option to False?
Nothing in the codebase here ever set it to True that I could find, so I preferred to remove it as YAGNI. As for what the parameter means, I haven't found anything in the API doc (http://www.scala-lang.org/api/2.12.1/scala-compiler/scala/tools/nsc/interpreter/IMain.html) but http://docs.scala-lang.org/glossary/#synthetic-class might apply and explain why False is the correct value. That's a guess, at best.
test_spylon_kernel_jkt.py
Outdated
```python
code_generate_error = "4 / 0"

def test_execute_stderr(self):
    raise SkipTest("needs execute result, stream output synchronization")
```
does pytest interpret unittest.SkipTest correctly?
Yes.
```
test_spylon_kernel_jkt.py::SpylonKernelTests::test_execute_stderr SKIPPED
test_spylon_kernel_jkt.py::SpylonKernelTests::test_execute_stdout SKIPPED
```
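A minimal self-contained check of that behavior (illustrative only, not the PR's test code): raising `unittest.SkipTest` inside a test body records a skip rather than a failure, and pytest honors the same exception when running unittest-style cases.

```python
import unittest

class Demo(unittest.TestCase):
    def test_execute_stderr(self):
        # Raising SkipTest marks the test as skipped under both runners.
        raise unittest.SkipTest('needs execute result, stream output synchronization')

result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(Demo).run(result)
print(len(result.skipped), len(result.failures), len(result.errors))
```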
test/test_scala_kernel.py
Outdated
```diff
  code = dedent("""\
      %%init_spark
-     application_name = 'Dave'
+     launcher.conf.spark.app.name = 'Dave'
```
nit: use something ungendered. strawman (yes I realize the irony of me using that term): launcher.conf.spark.app.name = whatzit
I've been seeking an ungendered version of strawman. Draft? Suggestion? Starting point? None are as good.
Spylon-kernel-task
I mean ungendered version of "strawman". I'll certainly replace "Dave".
```python
"""
nonlocal spark_jvm_proc
# Override these in kwargs to avoid duplicate value errors
kwargs['bufsize'] = 0
```
should this match the size you're reading below (8k)?
No. Setting it to zero makes the streams unbuffered so reads don't block and short reads are allowed. Adding a comment with a link to the Python doc about it.
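The whole approach can be sketched in a standalone way (assumed names like `drain` are illustrative, not the kernel's actual helpers): spawn a child process with unbuffered pipes and drain each stream on a background thread so the main loop never blocks on child output.

```python
import subprocess
import sys
import threading

def drain(stream, sink):
    # read(8192) allows short reads; b'' signals EOF when the pipe closes.
    for chunk in iter(lambda: stream.read(8192), b''):
        sink.append(chunk)

# bufsize=0 makes the pipe streams unbuffered so bytes are available
# as soon as the child writes them.
proc = subprocess.Popen(
    [sys.executable, '-c', 'print("hello from the child")'],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE, bufsize=0)

out, err = [], []
t_out = threading.Thread(target=drain, args=(proc.stdout, out))
t_err = threading.Thread(target=drain, args=(proc.stderr, err))
t_out.start()
t_err.start()
proc.wait()
t_out.join()
t_err.join()
print(b''.join(out).decode().strip())
```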
Ensures immediate output of JVM stdout/stderr stream bytes instead of buffering until cell execution completes because the main tornado ioloop is blocked. Fixes most disabled tests.
last three commits LGTM

Yeah looks good. Pretty scary but good :)
* stdout and stderr tests in test_spylon_kernel_jkt.py are duplicates of those in the base class
* Various references to application_name remain
* Send stderr to the default JVM log4j location by default
* Allow user to %%init_spark --stderr to capture it in the notebook
* Keep working tests in order
run_tests.py handles it
```python
using Python code.
# Use argparse to parse the whitespace delimited cell magic options
# just as we would parse a command line.
@option(
```
optparse apparently. Will adjust the comment.
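For reference, the option parsing the corrected comment refers to can be sketched with the standard library's optparse directly (a standalone illustration, not metakernel's actual `@option` machinery):

```python
from optparse import OptionParser

# Parse whitespace-delimited cell magic options just as we would
# parse a command line.
parser = OptionParser()
parser.add_option('--stderr', action='store_true', default=False,
                  help='Capture stderr in the notebook instead of in the kernel log')
opts, args = parser.parse_args('--stderr'.split())
print(opts.stderr)
```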
spylon_kernel/init_spark_magic.py
Outdated
```python
help="Capture stderr in the notebook instead of in the kernel log"
)
def cell_init_spark(self, stderr=False):
    """%%init_spark --stderr CODE - starts a SparkContext with a custom
```
I don't understand what CODE is supposed to be here. It seems like it is just supposed to be a boolean?
CODE is the body of the cell containing the Python code that initializes the Spark context. I'll try to make that clearer in the description.
```python
try:
    handler(chunk)
except Exception as ex:
    self.log.exception('Exception handling stdout')
```
log.exception() automagically bundles the traceback right?
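Yes, it does. A quick answer-by-example (standalone sketch, not the kernel's logging setup): `Logger.exception()` logs at ERROR level and automatically appends the active exception's traceback to the record.

```python
import io
import logging

# Capture log output in a string buffer so we can inspect it.
stream = io.StringIO()
log = logging.getLogger('demo')
log.addHandler(logging.StreamHandler(stream))
try:
    1 / 0
except Exception:
    # Must be called from an exception handler; the traceback is
    # bundled into the log record automatically.
    log.exception('Exception handling stdout')
print('Traceback' in stream.getvalue(), 'ZeroDivisionError' in stream.getvalue())
```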
```diff
- @pytest.mark.skip("fails randomly, maybe because interpreter is reused")
+ @pytest.mark.skip('fails randomly, possibly because of mock reuse across tests')
  def test_stderr(spylon_kernel):
```
can you walk me through your thoughts on mock reuse in person today?
[edit] not sure why I specified "in person" 🤷♂️
In-person is a good idea. Let's do that tomorrow.
Summary: I spent a good couple of hours trying to track down why this test is flaky and looking for a better way to write it. I wound up back where I started, with this test disabled. Which isn't to say that stdout/stderr are untested: they are covered by the jupyter_kernel_test suite, which spawns a kernel and puts it through its paces before any of the spylon-specific tests run.
All of this LGTM. I'm 👍 on getting this in and continuing to iterate in subsequent PRs. I'm starting to just go commit-by-commit which is going to get tricky to keep track of

Thanks for all the feedback @ericdill . I agree with merging real-soon-now and continuing in other PRs.
Totally new tack: force pyspark to pipe py4j JVM output back to the parent process and read the streams with a small monkey patch. Opened https://issues.apache.org/jira/browse/SPARK-21094 about making this part of pyspark proper.
This change should completely fix all lost output from Scala and Spark by piping the entire JVM process to the kernel process. There's nowhere else for it to go now.
Surprisingly, output in the notebook now shows up under the proper cell, even though the kernel unit tests for stdout/stderr now fail. The issue with the tests is that the main ioloop thread needs to yield so the child-process stream consumer threads can make all waiting data available before the main thread returns the kernel execution result. Issue #21 remains open to address the problem. As it stands, the user experience is better at the expense of some protocol tests that previously passed without reflecting the broken output behavior.
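The yielding problem can be reduced to a toy asyncio example (illustrative only; names like `consumer` are assumptions, not the kernel's code): the main coroutine must yield control once so a pending consumer task can flush its output before the result is returned.

```python
import asyncio

order = []

async def consumer():
    # Stands in for a stream-consumer flushing buffered child output.
    order.append('consumer flushed output')

async def main():
    asyncio.ensure_future(consumer())
    # Without this yield, main would append its result first and the
    # consumer's output would land after the "execution result".
    await asyncio.sleep(0)
    order.append('main returned result')

asyncio.run(main())
print(order)
```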
I'm opening this PR to make the changes visible for discussion and QA.