Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Let user choose how to deal with execution result. #107

Merged
merged 3 commits into from Feb 2, 2018

Conversation

jarrekk
Copy link
Contributor

@jarrekk jarrekk commented Jan 23, 2018

Hi, here I add this to let the user choose how to deal with execution result if I use it in an application, I can use variable to deal with the result, no need to read again from the output file. Thanks.

@codecov
Copy link

codecov bot commented Jan 23, 2018

Codecov Report

Merging #107 into master will increase coverage by 0.03%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #107      +/-   ##
==========================================
+ Coverage   63.23%   63.27%   +0.03%     
==========================================
  Files           8        8              
  Lines         922      923       +1     
==========================================
+ Hits          583      584       +1     
  Misses        339      339

print("Input Notebook: %s" % get_pretty_path(notebook))
print("Output Notebook: %s" % get_pretty_path(output))
# print("Input Notebook: %s" % get_pretty_path(notebook))
# print("Output Notebook: %s" % get_pretty_path(output))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is useful to keep printing the input and output paths. (If we decide it isn't useful we should delete the code instead of commenting it out)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get it, it doesn't matter much :)

# Write final Notebook to disk.
write_ipynb(nb, output)
raise_for_execution_errors(nb, output)
if output:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> if output is not None

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get it 👍

if output:
# Write final Notebook to disk.
write_ipynb(nb, output)
raise_for_execution_errors(nb, output)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with this only executed when we write the notebook, what happens if there is an exception and the notebook isn't written to a file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@betatim Thanks, perhaps I have a wrong indent with raise_for_execution_errors(nb, output)

@betatim
Copy link
Member

betatim commented Jan 23, 2018

Hey! The idea is to allow people to use execute_notebook() from python to run notebooks? Just wanted to double check because I was unsure after reading the title.

This change is principle fine with me.

Needs some tests :)

@jarrekk
Copy link
Contributor Author

jarrekk commented Jan 23, 2018

@betatim The main purpose of this PR is to reduce the time of papermill.execute_notebook function, in my case, I need to execute a notebook with parameters, then deal output. But default output is in the output file, I have to read this file again, so here I want to get output directly instead of reading the file.

I will test this at local and perhaps also add some unit tests.

Thanks a ton!

if output is not None:
# Write final Notebook to disk.
write_ipynb(nb, output)
return nbformat.writes(nb)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will write the entire notebook to stdout, correct? We definitely don't want that to always happen.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rgbkrk Thanks, but what if I want to deal with output again? The only way is to read output file for now. Is there a way to drop file I/O step?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rgbkrk This use case is in Python API, not in the command line. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh silly me, I see!

Copy link
Member

@rgbkrk rgbkrk Jan 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, it would be much better to return back the notebook object instead of the raw string? That allows you to strip out exactly what you need without having to deserialize the whole thing again. It should make testing the return much easier as well.

if output is not None:
  # Write final Notebook to disk.
  write_ipynb(nb, output)

return nb

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rgbkrk Yeah, dealing with notebook object is much easier, I made the change. Please review, thanks!

@MSeal
Copy link
Member

MSeal commented Jan 23, 2018

Two comments:
We'd likely want this argument passed through from the cli as well.
If you rerun a notebook in place you will slowly produce additional default_parameter cells over time. You may want to change the logic on injecting parameter cells to replace rather than inject if the prior cell has default_parameter tag defined.

@@ -136,7 +136,7 @@ def log_outputs(cell):


def execute_notebook(notebook,
output,
output=None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need a spec addition for this behavior :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jarrekk, I'm a bit confused why we would have output as None for the default? In general, output will need to go somewhere, and it's likely simpler to be explicit about where the output goes. Would you be able to give some additional context to specifically how you see this working? Thanks!

@@ -69,7 +67,7 @@ def preprocess(self, nb, resources):
cell.outputs = []

# Execute each cell and update the output in real time.
with futures.ThreadPoolExecutor(max_workers=1) as executor:
with futures.ThreadPoolExecutor(max_workers=4) as executor:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why increase this? I thought the reason for using a thread pool here is to move the execution of the notebook to somewhere else, not to run things in parallel?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@betatim Yeah, here I just want to try, I will change it back. :)

@@ -79,7 +77,6 @@ def preprocess(self, nb, resources):

for index, cell in execution_iterator:
cell.metadata["papermill"]["status"] = RUNNING
future = executor.submit(write_ipynb, nb, output_path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change looks wrong; you sure you meant to mess with the future pool execution?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MSeal Thanks for your comment, here if I don't use output option in https://github.com/nteract/papermill/blob/master/papermill/execute.py#L138, I will get output file error, if I remove these code, it would be OK, perhaps I need a better way to deal with this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to check if output_path is None and not always write.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@betatim Thanks, I added this now. :)

@@ -151,8 +147,6 @@ def execute_notebook(notebook,
progress_bar (bool): Flag for whether or not to show the progress bar.
log_output (bool): Flag for whether or not to write notebook output to stderr.
"""
print("Input Notebook: %s" % get_pretty_path(notebook))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please leave these in, you can change get_pretty_path to handle None instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same about dealing with the output file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From your comment I don't think you need to remove this. This generates some output on stdout but that doesn't get in the way of what you are trying to do.

@betatim
Copy link
Member

betatim commented Jan 25, 2018

I'm still confused about what the end goal is. I originally thought you want to call execute_notebook from a different bit of python code, and then do something with the executed notebook. From your comments it sounds like you want to call the papermill executable from the command-line and then capture stdout though. Maybe you could create a short example (in code) of how you want to use it.

@jarrekk
Copy link
Contributor Author

jarrekk commented Jan 25, 2018

@betatim Thanks, I want to use it in papermill API:

#!/usr/bin/env python
nb_object = papermill.execute_notebook(input_file, parameters=parameters) # without output parameter
metadata  = nb_object.metadata
...
# continue deal with nb_object with python code

But papermill also supports command line, so it would be better if this feature support on both Python API and command line. Perhaps an example:

#!/usr/bin/env bash
OUTPUT=$(papermill local/input.ipynb -p alpha 0.6 -p l1_ratio 0.1)

For me, I care about the first one(papermill API). :)

@willingc
Copy link
Member

willingc commented Jan 31, 2018

Hi @jarrekk,

#!/usr/bin/env python
nb_object = papermill.execute_notebook(input_file, parameters=parameters) # without output parameter
metadata = nb_object.metadata
...

Wouldn't you see the same behavior with:

nb_object = papermill.execute_notebook(input_file, None, parameters=parameters)

or possibly:

nb_object = papermill.execute_notebook(input_file, '/dev/null', parameters=parameters)

I'm a bit concerned that giving a default of None to output would give users who are new to using this unexpected results (i.e. they would expect the output notebook to be saved somewhere and would not have it saved if using the defaults).

@jarrekk
Copy link
Contributor Author

jarrekk commented Feb 1, 2018

@willingc
I get your point, perhaps it should use a way to get the nb_object without effect on output_file. But function execute_notebook has no response, so in nb_object = papermill.execute_notebook(input_file, '/dev/null', parameters=parameters), nb_object is empty.

@jarrekk
Copy link
Contributor Author

jarrekk commented Feb 1, 2018

@willingc I updated execute.py make it have a response in function execute_notebook. :)

@willingc
Copy link
Member

willingc commented Feb 1, 2018

@jarrekk Thanks for the updates. I would go ahead and add a line in the docstring indicating what is returned.

@MSeal @betatim Are you cool with this approach?

Copy link
Member

@rgbkrk rgbkrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pending completion of @willingc's request, I'm in favor of the changes as is for returning the notebook from execute_notebook

@@ -185,6 +185,8 @@ def execute_notebook(notebook,
# Write final Notebook to disk.
write_ipynb(nb, output)
raise_for_execution_errors(nb, output)
# always return notebook object
return nb
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with this approach.

return notebook object instead of raw string

let user choose how to deal with execution result.

let user choose how to deal with execution result.

let user choose how to deal with execution result.

update max_worker back

add output_path

update execute.py

update execute.py
@MSeal
Copy link
Member

MSeal commented Feb 2, 2018

With the rebase I'm +1

@jarrekk
Copy link
Contributor Author

jarrekk commented Feb 2, 2018

@MSeal Thanks!

@willingc
Copy link
Member

willingc commented Feb 2, 2018

@jarrekk We're in the home stretch now :-)

Please update the docstring for the function since we use the docstring to generate some documentation. Here's a suggestion of two lines to add to the existing docstring. Thanks!

"""Executes a single notebook locally.
 
     Args:
         notebook (str): Path to input notebook.
         output (str): Path to save executed notebook.
         parameters (dict): Arbitrary keyword arguments to pass to the notebook parameters.
         kernel_name (str): Name of kernel to execute the notebook against.
         progress_bar (bool): Flag for whether or not to show the progress bar.
         log_output (bool): Flag for whether or not to write notebook output to stderr.

     Returns:
         nb (NotebookNode): executed notebook object
"""

@jarrekk
Copy link
Contributor Author

jarrekk commented Feb 2, 2018

@willingc Great, many thanks!

Copy link
Member

@willingc willingc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@willingc willingc merged commit 5917482 into nteract:master Feb 2, 2018
@willingc
Copy link
Member

willingc commented Feb 2, 2018

Thanks @jarrekk for being flexible and collaborative. I'm happy that you contributed. Nicely done. 🍰

@betatim
Copy link
Member

betatim commented Feb 4, 2018

👍 !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants