Fix lost stdout/stderr in secondary threads #31

Merged
parente merged 2 commits into adtech-labs:master from parente:fix-console-out
May 11, 2017
@parente parente commented May 11, 2017

NOTE: Builds on #30 which should be merged first.

Inject code to set the Scala Console stdout/stderr before executing user code. The original code set them once, in the main thread only. Spark creates additional threads, and output from those threads was still being directed to the terminal.
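
A minimal sketch of the injection idea, written in Python since that is the side that submits code to the Scala interpreter. The function name `with_console_redirect` and the Scala identifiers `kernelOut` and `kernelErr` are hypothetical placeholders, not names taken from this PR; only `Console.setOut` and `Console.setErr` come from the commit message.

```python
# Hypothetical template: the stream names are filled in by the caller.
REDIRECT_PREAMBLE = (
    "Console.setOut({out})\n"
    "Console.setErr({err})\n"
)


def with_console_redirect(user_code, out_stream="kernelOut", err_stream="kernelErr"):
    """Return Scala source that re-applies the Console redirect, then runs user_code.

    out_stream and err_stream are assumed to name PrintStream values that
    already exist in the interpreter session (placeholder names).
    """
    preamble = REDIRECT_PREAMBLE.format(out=out_stream, err=err_stream)
    return preamble + user_code


wrapped = with_console_redirect('println("hello")')
```

Because the preamble is prepended on every call, the redirect is re-applied each time user code is submitted instead of only once at startup.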

It'd be nice if we could wrap the creation of the SparkContext in a block that does the redirect so that Spark's threads inherit our settings. However, because we're using pyspark to do that initialization, we don't have good control over that step.

This PR also increases the size of the file reads that carry output from Scala to Python, from one line per poll to 4096 bytes per poll. This reduces a visible lag in results appearing in a Jupyter notebook, and it stabilizes the read size instead of mixing very short and very long reads.
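
To illustrate the difference in read size, here is a rough sketch that counts how many polls it takes to drain the same data line by line versus in 4096-byte blocks. It uses an in-memory `io.BytesIO` as a stand-in for the real output file, and the helper names are hypothetical, not the kernel's own.

```python
import io

CHUNK_SIZE = 4096  # block size mentioned in the description above


def drain_by_line(stream):
    """Count polls needed when each poll reads a single line."""
    polls = 0
    while stream.readline():
        polls += 1
    return polls


def drain_by_block(stream, size=CHUNK_SIZE):
    """Count polls needed when each poll reads up to `size` bytes."""
    polls = 0
    while stream.read(size):
        polls += 1
    return polls


# 2048 short lines of 4 bytes each, 8192 bytes in total.
data = b"row\n" * 2048
line_polls = drain_by_line(io.BytesIO(data))
block_polls = drain_by_block(io.BytesIO(data))
```

One trade-off to keep in mind: a fixed-size block can end in the middle of a line or a multi-byte character, so a consumer that decodes the bytes as text would need to handle partial sequences at block boundaries.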

Fixes #26 and #22.

codecov bot commented May 11, 2017

Codecov Report

Merging #31 into master will increase coverage by 0.99%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master      #31      +/-   ##
==========================================
+ Coverage   84.36%   85.35%   +0.99%     
==========================================
  Files           6        6              
  Lines         403      403              
==========================================
+ Hits          340      344       +4     
+ Misses         63       59       -4

@marshall245 marshall245 left a comment

reviewed in person with @parente

@parente parente force-pushed the fix-console-out branch from 589160b to 32ef343 Compare May 11, 2017 14:21
parente commented May 11, 2017

Rebased on latest #30.

parente added 2 commits May 11, 2017 11:27
Fixes lost Spark output when Spark has multiple threads and Console.setOut/setErr only affects the current thread.
Read stdout/stderr in larger blocks of bytes, not line by line.
@parente parente force-pushed the fix-console-out branch from 32ef343 to 7ee3e08 Compare May 11, 2017 15:27
@parente parente merged commit 1a6d127 into adtech-labs:master May 11, 2017
parente added a commit that referenced this pull request May 23, 2017