Fix #26 - Rework stdout/stderr handling#38

Merged
parente merged 11 commits into adtech-labs:master from parente:fix-26-rework-stdout-stderr
Jun 19, 2017

Conversation

@parente
Contributor

@parente parente commented Jun 14, 2017

Totally new tack: force pyspark to pipe py4j JVM output back to the parent process and read the streams with a small monkey patch. Opened https://issues.apache.org/jira/browse/SPARK-21094 about making this part of pyspark proper.

This change should completely fix all lost output from Scala and Spark by piping the entire JVM process to the kernel process. There's nowhere else for it to go now.

Surprisingly, output in the notebook now shows up under the proper cell too, even though the kernel unit tests for stdout/stderr now fail. The trouble with the tests is that the main ioloop thread needs to yield so the child-process stream consumer threads can make all waiting data available before the main thread returns the kernel execution result. Issue #21 remains open to address the problem. As it stands, the user experience is better, at the expense of some protocol tests that previously passed without reflecting how broken the output actually was.

I'm opening this PR to make the changes visible for discussion and QA.

parente added 3 commits June 14, 2017 13:52
Force the pyspark.java_gateway.launch_gateway to
pipe stdout/stderr to the kernel process so that
it can be read in a thread instead of mucking 
with Scala Console streams.

Opened https://issues.apache.org/jira/browse/SPARK-21094
about making the piping capability a part of
the API.
They were working by timing luck before. Need
to resolve adtech-labs#21 to re-enable. (Surprisingly, 
output in the notebook appears in the correct
cell now even though the tests indicate it’s 
appearing after idle instead of before.)
@codecov

codecov bot commented Jun 14, 2017

Codecov Report

Merging #38 into master will decrease coverage by 0.02%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master      #38      +/-   ##
==========================================
- Coverage   85.21%   85.18%   -0.03%     
==========================================
  Files           6        6              
  Lines         399      378      -21     
==========================================
- Hits          340      322      -18     
+ Misses         59       56       -3

@codecov

codecov bot commented Jun 14, 2017

Codecov Report

Merging #38 into master will decrease coverage by 2.52%.
The diff coverage is 72.34%.

@@            Coverage Diff             @@
##           master      #38      +/-   ##
==========================================
- Coverage   85.21%   82.68%   -2.53%     
==========================================
  Files           6        6              
  Lines         399      387      -12     
==========================================
- Hits          340      320      -20     
- Misses         59       67       +8

@parente parente changed the title [WIP} Fix #26 - Rework stdout/stderr handling [WIP] Fix #26 - Rework stdout/stderr handling Jun 14, 2017

@ericdill ericdill left a comment


Just some commentary. This is terrifying code that I am not going to pretend that I fully understand.

[edit] I mean it's intrinsically terrifying. Not that your implementation is terrifying 😀

kwargs['stderr'] = subprocess.PIPE
spark_jvm_proc = subprocess.Popen(*args, **kwargs)
return spark_jvm_proc
pyspark.java_gateway.Popen = Popen


😱

await asyncio.sleep(0, loop=self.loop)
else:
await asyncio.sleep(0.01, loop=self.loop)
buff = fd.read(8192)


what's the rationale for the chunk size here?

Contributor Author


Greater than zero so we don't wait until the pipe closes; less than 64k, which is (probably) the maximum pipe size these days (https://unix.stackexchange.com/questions/11946/how-big-is-the-pipe-buffer).


If you want to get the result as a Python object, follow this with a
call to `last_result`.
Follow this with a call `last_result` to retrieve the result as a


s/with a call/with a call to/

ScalaException
When there is a problem interpreting the code
"""
# Ensure the cell is not incomplete. Same approach taken by Apache Zeppelin.


is there an obvious place in the zeppelin code where you could link this comment?

Contributor Author



try:
res = self.jimain.interpret(code, synthetic)
res = self.jimain.interpret(code, False)


Can you include the rationale for hard coding the synthetic class option to False?

Contributor Author


Nothing in the codebase here ever set it to True that I could find, so I preferred to remove it as YAGNI. As for what the parameter means, I haven't found anything in the API doc (http://www.scala-lang.org/api/2.12.1/scala-compiler/scala/tools/nsc/interpreter/IMain.html) but http://docs.scala-lang.org/glossary/#synthetic-class might apply and explain why False is the correct value. That's a guess, at best.

code_generate_error = "4 / 0"

def test_execute_stderr(self):
raise SkipTest("needs execute result, stream output synchronization")


does pytest interpret unittest.SkipTest correctly?

Contributor Author


Yes.

test_spylon_kernel_jkt.py::SpylonKernelTests::test_execute_stderr SKIPPED
test_spylon_kernel_jkt.py::SpylonKernelTests::test_execute_stdout SKIPPED

code = dedent("""\
%%init_spark
application_name = 'Dave'
launcher.conf.spark.app.name = 'Dave'


nit: use something ungendered. strawman (yes I realize the irony of me using that term): launcher.conf.spark.app.name = whatzit

Contributor Author


I've been seeking an ungendered version of strawman. Draft? Suggestion? Starting point? None are as good.

Collaborator


Spylon-kernel-task

Contributor Author


I mean ungendered version of "strawman". I'll certainly replace "Dave".

"""
nonlocal spark_jvm_proc
# Override these in kwargs to avoid duplicate value errors
kwargs['bufsize'] = 0


should this match the size you're reading below (8k)?

Contributor Author


No. Setting it to zero makes the streams unbuffered so they don't block and allow short reads. I'm adding a comment with a link to the Python docs about it.

parente added 3 commits June 15, 2017 16:18
Ensures immediate output of JVM stdout/stderr
stream bytes instead of buffering until cell
execution completed because the main tornado
ioloop is blocked.

Fixes most disabled tests.
@ericdill

last three commits LGTM

@mariusvniekerk
Collaborator

Yeah looks good. Pretty scary but good :)

parente added 4 commits June 18, 2017 22:22
* stdout and stderr tests in test_spylon_kernel_jkt.py
  are duplicates of those in the base class
* Various references to application_name remain
* Send stderr to the default JVM log4j location
  by default
* Allow user to %%init_spark --stderr to capture
  it in the notebook
* Keep working tests in order
using Python code.
# Use argparse to parse the whitespace delimited cell magic options
# just as we would parse a command line.
@option(


does this use click under the hood?

Contributor Author


@parente parente changed the title [WIP] Fix #26 - Rework stdout/stderr handling Fix #26 - Rework stdout/stderr handling Jun 19, 2017
help="Capture stderr in the notebook instead of in the kernel log"
)
def cell_init_spark(self, stderr=False):
"""%%init_spark --stderr CODE - starts a SparkContext with a custom


I don't understand what CODE is supposed to be here. It seems like it is just supposed to be a boolean?

Contributor Author


CODE is the body of the cell containing the Python code to initialize the Spark context. I'll try to make that clearer in the description.

try:
handler(chunk)
except Exception as ex:
self.log.exception('Exception handling stdout')


log.exception() automagically bundles the traceback right?

Contributor Author


Yes.


@pytest.mark.skip("fails randomly, maybe because interpreter is reused")
def test_stderr(spylon_kernel):
@pytest.mark.skip('fails randomly, possibly because of mock reuse across tests')

@ericdill ericdill Jun 19, 2017


can you walk me through your thoughts on mock reuse in person today?

[edit] not sure why I specified "in person" 🤷‍♂️

Contributor Author


In-person is a good idea. Let's do that tomorrow.

Summary: I spent a good couple of hours trying to track down why this test is flaky and to find a better way to write it. I wound up back where I started, with this test disabled. Which isn't to say that stdout/stderr are untested: they're covered by the jupyter_kernel_test suite, which spawns a kernel and puts it through its paces before any of the spylon-specific tests.

@ericdill

All of this LGTM. I'm 👍 on getting this in and continuing to iterate in subsequent PRs. I'm starting to just go commit-by-commit, which is going to get tricky to keep track of.

@parente
Contributor Author

parente commented Jun 19, 2017

Thanks for all the feedback @ericdill . I agree with merging real-soon-now and continuing in other PRs.

@parente parente merged commit a1ef663 into adtech-labs:master Jun 19, 2017
