New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Let user choose how to deal with execution result. #107
Conversation
Codecov Report
@@ Coverage Diff @@
## master #107 +/- ##
==========================================
+ Coverage 63.23% 63.27% +0.03%
==========================================
Files 8 8
Lines 922 923 +1
==========================================
+ Hits 583 584 +1
Misses 339 339 |
papermill/execute.py
Outdated
print("Input Notebook: %s" % get_pretty_path(notebook)) | ||
print("Output Notebook: %s" % get_pretty_path(output)) | ||
# print("Input Notebook: %s" % get_pretty_path(notebook)) | ||
# print("Output Notebook: %s" % get_pretty_path(output)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is useful to keep printing the input and output paths. (If we decide it isn't useful we should delete the code instead of commenting it out)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Get it, it doesn't matter much :)
papermill/execute.py
Outdated
# Write final Notebook to disk. | ||
write_ipynb(nb, output) | ||
raise_for_execution_errors(nb, output) | ||
if output: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-> if output is not None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Get it 👍
papermill/execute.py
Outdated
if output: | ||
# Write final Notebook to disk. | ||
write_ipynb(nb, output) | ||
raise_for_execution_errors(nb, output) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with this only executed when we write the notebook, what happens if there is an exception and the notebook isn't written to a file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@betatim Thanks, perhaps I have a wrong indent with raise_for_execution_errors(nb, output)
Hey! The idea is to allow people to use This change is principle fine with me. Needs some tests :) |
@betatim The main purpose of this PR is to reduce the time of I will test this at local and perhaps also add some unit tests. Thanks a ton! |
papermill/execute.py
Outdated
if output is not None: | ||
# Write final Notebook to disk. | ||
write_ipynb(nb, output) | ||
return nbformat.writes(nb) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will write the entire notebook to stdout, correct? We definitely don't want that to always happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rgbkrk Thanks, but what if I want to deal with output again? The only way is to read output file for now. Is there a way to drop file I/O step?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rgbkrk This use case is in Python API, not in the command line. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh silly me, I see!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, it would be much better to return back the notebook object instead of the raw string? That allows you to strip out exactly what you need without having to deserialize the whole thing again. It should make testing the return much easier as well.
if output is not None:
# Write final Notebook to disk.
write_ipynb(nb, output)
return nb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rgbkrk Yeah, dealing with notebook object is much easier, I made the change. Please review, thanks!
Two comments: |
papermill/execute.py
Outdated
@@ -136,7 +136,7 @@ def log_outputs(cell): | |||
|
|||
|
|||
def execute_notebook(notebook, | |||
output, | |||
output=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need a spec addition for this behavior :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jarrekk, I'm a bit confused why we would have output as None for the default? In general, output will need to go somewhere, and it's likely simpler to be explicit about where the output goes. Would you be able to give some additional context to specifically how you see this working? Thanks!
papermill/execute.py
Outdated
@@ -69,7 +67,7 @@ def preprocess(self, nb, resources): | |||
cell.outputs = [] | |||
|
|||
# Execute each cell and update the output in real time. | |||
with futures.ThreadPoolExecutor(max_workers=1) as executor: | |||
with futures.ThreadPoolExecutor(max_workers=4) as executor: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why increase this? I thought the reason for using a thread pool here is to move the execution of the notebook to somewhere else, not to run things in parallel?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@betatim Yeah, here I just want to try, I will change it back. :)
papermill/execute.py
Outdated
@@ -79,7 +77,6 @@ def preprocess(self, nb, resources): | |||
|
|||
for index, cell in execution_iterator: | |||
cell.metadata["papermill"]["status"] = RUNNING | |||
future = executor.submit(write_ipynb, nb, output_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this change looks wrong; you sure you meant to mess with the future pool execution?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MSeal Thanks for your comment, here if I don't use output option in https://github.com/nteract/papermill/blob/master/papermill/execute.py#L138, I will get output file error, if I remove these code, it would be OK, perhaps I need a better way to deal with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to check if output_path is None
and not always write.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@betatim Thanks, I added this now. :)
papermill/execute.py
Outdated
@@ -151,8 +147,6 @@ def execute_notebook(notebook, | |||
progress_bar (bool): Flag for whether or not to show the progress bar. | |||
log_output (bool): Flag for whether or not to write notebook output to stderr. | |||
""" | |||
print("Input Notebook: %s" % get_pretty_path(notebook)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please leave these in, you can change get_pretty_path to handle None instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same about dealing with the output file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From your comment I don't think you need to remove this. This generates some output on stdout but that doesn't get in the way of what you are trying to do.
I'm still confused about what the end goal is. I originally thought you want to call |
@betatim Thanks, I want to use it in papermill API: #!/usr/bin/env python
nb_object = papermill.execute_notebook(input_file, parameters=parameters) # without output parameter
metadata = nb_object.metadata
...
# continue deal with nb_object with python code But papermill also supports command line, so it would be better if this feature support on both Python API and command line. Perhaps an example: #!/usr/bin/env bash
OUTPUT=$(papermill local/input.ipynb -p alpha 0.6 -p l1_ratio 0.1) For me, I care about the first one(papermill API). :) |
Hi @jarrekk,
Wouldn't you see the same behavior with:
or possibly:
I'm a bit concerned that giving a default of None to output would give users who are new to using this unexpected results (i.e. they would expect the output notebook to be saved somewhere and would not have it saved if using the defaults). |
@willingc |
@willingc I updated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pending completion of @willingc's request, I'm in favor of the changes as is for returning the notebook from execute_notebook
@@ -185,6 +185,8 @@ def execute_notebook(notebook, | |||
# Write final Notebook to disk. | |||
write_ipynb(nb, output) | |||
raise_for_execution_errors(nb, output) | |||
# always return notebook object | |||
return nb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with this approach.
return notebook object instead of raw string let user choose how to deal with execution result. let user choose how to deal with execution result. let user choose how to deal with execution result. update max_worker back add output_path update execute.py update execute.py
With the rebase I'm +1 |
@MSeal Thanks! |
@jarrekk We're in the home stretch now :-) Please update the docstring for the function since we use the docstring to generate some documentation. Here's a suggestion of two lines to add to the existing docstring. Thanks!
|
@willingc Great, many thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Thanks @jarrekk for being flexible and collaborative. I'm happy that you contributed. Nicely done. 🍰 |
👍 ! |
Hi, here I add this to let the user choose how to deal with execution result if I use it in an application, I can use variable to deal with the result, no need to read again from the output file. Thanks.