Rebase SparkSubmitTask onto ExternalProgramTask #1525
Conversation
```diff
@@ -189,12 +146,18 @@ def hadoop_conf_dir(self):
         return configuration.get_config().get("spark", "hadoop-conf-dir", None)

     def get_environment(self):
-        env = os.environ.copy()
+        env = super(SparkSubmitTask, self).program_environment()
```
not a big fan of calling methods on the parent class. should only be doable for constructors
Sure! It's just doing `os.environ.copy()` anyway. I'll change it.
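The refactor under discussion can be sketched as follows. This is a minimal stand-in, not luigi's actual code: the `hadoop_conf_dir` value is hard-coded here for illustration, and only the `program_environment` method name comes from the diff above.

```python
import os


class ExternalProgramTask:
    """Minimal stand-in for luigi's ExternalProgramTask (details assumed)."""

    def program_environment(self):
        # Base behavior: start from a copy of the current process environment.
        return os.environ.copy()


class SparkSubmitTask(ExternalProgramTask):
    # Illustrative value; the real task reads this from luigi's configuration.
    hadoop_conf_dir = "/etc/hadoop/conf"

    def program_environment(self):
        # Delegate to the base class instead of repeating os.environ.copy(),
        # then layer the Spark-specific variables on top.
        env = super(SparkSubmitTask, self).program_environment()
        if self.hadoop_conf_dir:
            env["HADOOP_CONF_DIR"] = self.hadoop_conf_dir
        return env


env = SparkSubmitTask().program_environment()
```

Calling the parent's `program_environment` instead of duplicating `os.environ.copy()` keeps the base class as the single place that decides how the initial environment is built.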
@deuxpi Good idea. But it could be a nice new feature, though, to add an option to turn it off?
@ehdr Ack! Sorry, I meant
Yes, I think this is acceptable. :)
I like this patch; I'm +1 on the idea. @ehdr, when you think you've addressed all the comments from this PR, let us know and we'll merge.
Force-pushed from 002d95b to db48a78
Add a property to ExternalProgramTask, which can be overridden to only log `stderr` when the program execution fails. The default remains to always log `stderr`, even if execution succeeds.
Seems to be an obsolete left-over from something.
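The property described in the commit message above could look like the sketch below. The property name `always_log_stderr` and the surrounding run logic are assumptions for illustration, not necessarily luigi's actual API.

```python
import logging
import subprocess
import sys

logger = logging.getLogger("external_program")


class ExternalProgramTask:
    """Sketch of an overridable stderr-logging policy (names assumed)."""

    @property
    def always_log_stderr(self):
        # Default keeps the old behavior: stderr is logged even on success.
        return True

    def run(self, args):
        proc = subprocess.Popen(
            args, stdout=subprocess.PIPE, stderr=subprocess.PIPE
        )
        stdout, stderr = proc.communicate()
        # Log stderr when the task asks for it, or when the run failed.
        if stderr and (self.always_log_stderr or proc.returncode != 0):
            logger.info("Program stderr:\n%s", stderr.decode())
        return proc.returncode


class QuietSparkTask(ExternalProgramTask):
    @property
    def always_log_stderr(self):
        # Opt out: stderr is logged only when the program fails.
        return False


rc = QuietSparkTask().run(
    [sys.executable, "-c", "import sys; sys.stderr.write('noise')"]
)
```

With the default left at `True`, existing subclasses keep today's behavior unless they explicitly override the property.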
Force-pushed from db48a78 to 8d7876c
The public interface of `SparkSubmitTask` remains as-is. However, there will be subtle changes to the output to `stdout` and to the logs (e.g. 'Program failed[...]' with this patch vs. 'Spark job failed[...]' before). It will also raise an `ExternalProgramRunError` on execution errors instead of a `SparkJobError` as before.
Force-pushed from 8d7876c to a0417c6
Great! All improvements are welcome, and most code in this code base is not really elegant ^^
…-task Rebase SparkSubmitTask onto ExternalProgramTask
As a continuation of #1520, this is an example of how `SparkSubmitTask` could be "rebased" onto `ExternalProgramTask`, since most of the code is shared between them. Not sure if this is a good way to go?

The public interface of `SparkSubmitTask` remains as-is.

However, there will be subtle changes to the output to `stdout` and to the logs (e.g. 'Program failed[...]' with this patch vs. 'Spark job failed[...]' before). It will also raise an `ExternalProgramRunError` on execution errors instead of a `SparkJobError` as before. Are these considered acceptable deviations from backwards compatibility?

Finally, it will now log `stdout` and `stderr` whenever (and only when) they are non-empty, even if the process exits successfully (return code 0). Previously, `stderr` was only logged if the return code was non-zero. [Changed back to preserve the old behavior in a later commit, thanks to a comment from @deuxpi.]
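The exception-type change is the main compatibility point for callers. The sketch below illustrates it with stand-in exception classes; the attribute names (`err`, `out`) and the `submit` helper are hypothetical, chosen only to show the catch-site migration.

```python
class SparkJobError(RuntimeError):
    """Stand-in for the exception type raised before this patch."""


class ExternalProgramRunError(RuntimeError):
    """Stand-in for the new exception type (attribute names assumed)."""

    def __init__(self, message, program_args, out=None, err=None):
        super().__init__(message)
        self.program_args = program_args
        self.out = out
        self.err = err


def submit():
    # Hypothetical helper standing in for a failing spark-submit run.
    raise ExternalProgramRunError(
        "Program failed with return code 1",
        ["spark-submit", "job.py"],
        err="Traceback ...",
    )


# Callers that previously caught SparkJobError must now catch the new type:
try:
    submit()
except ExternalProgramRunError as e:
    captured = e.err
```

Any downstream code with `except SparkJobError:` handlers would silently stop catching failures after this change, which is why the deviation is called out in the description above.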