Let user choose how to deal with execution result. #107

jarrekk · 2018-01-23T08:59:24Z

Hi, I added this to let the user choose how to deal with the execution result. If I use papermill in an application, I can work with the result in a variable, with no need to read it again from the output file. Thanks.

codecov · 2018-01-23T09:02:29Z

Codecov Report

Merging #107 into master will increase coverage by 0.03%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #107      +/-   ##
==========================================
+ Coverage   63.23%   63.27%   +0.03%     
==========================================
  Files           8        8              
  Lines         922      923       +1     
==========================================
+ Hits          583      584       +1     
  Misses        339      339

betatim · 2018-01-23T14:19:39Z

papermill/execute.py

-    print("Input Notebook:  %s" % get_pretty_path(notebook))
-    print("Output Notebook: %s" % get_pretty_path(output))
+    # print("Input Notebook:  %s" % get_pretty_path(notebook))
+    # print("Output Notebook: %s" % get_pretty_path(output))


I think it is useful to keep printing the input and output paths. (If we decide it isn't useful we should delete the code instead of commenting it out)

Got it, it doesn't matter much :)

betatim · 2018-01-23T14:20:30Z

papermill/execute.py

-    # Write final Notebook to disk.
-    write_ipynb(nb, output)
-    raise_for_execution_errors(nb, output)
+    if output:


-> if output is not None

Got it 👍

betatim · 2018-01-23T14:24:54Z

papermill/execute.py

+    if output:
+        # Write final Notebook to disk.
+        write_ipynb(nb, output)
+        raise_for_execution_errors(nb, output)


with this only executed when we write the notebook, what happens if there is an exception and the notebook isn't written to a file?

@betatim Thanks, perhaps I indented raise_for_execution_errors(nb, output) incorrectly.
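
One way to read the concern in this thread: the error check should run whether or not a file is written. The following is only a minimal sketch under that assumption. `write_ipynb` and `raise_for_execution_errors` here are hypothetical stand-ins operating on plain dicts (the real papermill helpers are not reproduced), and `finish_execution` is an illustrative name.

```python
def write_ipynb(nb, path):
    """Hypothetical stand-in: record that the notebook was 'written'."""
    nb.setdefault("written_to", []).append(path)


def raise_for_execution_errors(nb, path):
    """Hypothetical stand-in: raise if any cell recorded an error."""
    for index, cell in enumerate(nb["cells"]):
        if cell.get("error"):
            raise RuntimeError("cell %d failed: %s" % (index, cell["error"]))


def finish_execution(nb, output=None):
    # Only the write depends on having an output path.
    if output is not None:
        write_ipynb(nb, output)
    # The error check sits outside the `if`, so failures surface either way.
    raise_for_execution_errors(nb, output)
```

With the check dedented like this, a failing cell still raises when `output` is `None`.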

betatim · 2018-01-23T14:26:02Z

Hey! The idea is to allow people to use execute_notebook() from python to run notebooks? Just wanted to double check because I was unsure after reading the title.

This change is in principle fine with me.

Needs some tests :)

jarrekk · 2018-01-23T14:36:22Z

@betatim The main purpose of this PR is to reduce the time spent in the papermill.execute_notebook function. In my case, I need to execute a notebook with parameters and then process the output. By default the output goes to the output file, so I have to read that file again; I want to get the output directly instead of reading the file.

I will test this locally and perhaps also add some unit tests.

Thanks a ton!

rgbkrk · 2018-01-23T15:13:49Z

papermill/execute.py

+    if output is not None:
+        # Write final Notebook to disk.
+        write_ipynb(nb, output)
+    return nbformat.writes(nb)


This will write the entire notebook to stdout, correct? We definitely don't want that to always happen.

@rgbkrk Thanks, but what if I want to work with the output afterwards? For now the only way is to read the output file. Is there a way to drop the file I/O step?

@rgbkrk This use case is for the Python API, not the command line. :)

Oh silly me, I see!

In that case, wouldn't it be much better to return the notebook object instead of the raw string? That allows you to pull out exactly what you need without having to deserialize the whole thing again. It should make testing the return value much easier as well.

    if output is not None:
        # Write final Notebook to disk.
        write_ipynb(nb, output)
    return nb

@rgbkrk Yeah, dealing with a notebook object is much easier. I made the change. Please review, thanks!
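
To illustrate the difference between the two return types discussed here, the sketch below uses a plain dict and the standard `json` module as stand-ins for a notebook object and its serialized form. That substitution is an assumption made so the example is self-contained (papermill itself works with nbformat objects), and the `run_returning_*` names are hypothetical.

```python
import json

# Hypothetical stand-in for an executed notebook.
NOTEBOOK = {
    "metadata": {"kernelspec": {"name": "python3"}},
    "cells": [{"cell_type": "code", "source": "alpha = 0.6", "outputs": []}],
}


def run_returning_string():
    """Return the serialized notebook; callers must parse it again."""
    return json.dumps(NOTEBOOK)


def run_returning_object():
    """Return the in-memory notebook; callers can use it directly."""
    return NOTEBOOK


# With a raw string, the caller pays for a second deserialization.
parsed = json.loads(run_returning_string())
kernel_from_string = parsed["metadata"]["kernelspec"]["name"]

# With the object, the same lookup needs no extra parsing step.
kernel_from_object = run_returning_object()["metadata"]["kernelspec"]["name"]
```

Both lookups yield the same value; the object form simply skips the round trip through a string.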

MSeal · 2018-01-23T19:24:38Z

Two comments:
We'd likely want this argument passed through from the cli as well.
If you rerun a notebook in place you will slowly produce additional default_parameter cells over time. You may want to change the logic on injecting parameter cells to replace rather than inject if the prior cell has default_parameter tag defined.
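
The replace-rather-than-inject idea in the second comment could look roughly like the sketch below. It is not papermill's implementation: cells are plain dicts, the tag name `injected-parameters` is a hypothetical choice for this example, and placing the cell at the top of the notebook is a simplification.

```python
INJECTED_TAG = "injected-parameters"  # hypothetical tag name for this sketch


def make_parameter_cell(parameters):
    """Build a code cell (as a plain dict) that assigns each parameter."""
    source = "\n".join(
        "%s = %r" % (name, value) for name, value in sorted(parameters.items())
    )
    return {
        "cell_type": "code",
        "source": source,
        "metadata": {"tags": [INJECTED_TAG]},
    }


def inject_or_replace(cells, parameters):
    """Return a new cell list with exactly one injected parameter cell."""
    new_cell = make_parameter_cell(parameters)
    for index, cell in enumerate(cells):
        if INJECTED_TAG in cell.get("metadata", {}).get("tags", []):
            # A previous run already injected a cell here: overwrite it.
            return cells[:index] + [new_cell] + cells[index + 1:]
    # First run: insert the parameter cell at the top.
    return [new_cell] + cells
```

Running `inject_or_replace` repeatedly on its own output keeps the cell count stable, which is the property the comment asks for when a notebook is rerun in place.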

MSeal · 2018-01-23T19:25:14Z

papermill/execute.py

@@ -136,7 +136,7 @@ def log_outputs(cell):


 def execute_notebook(notebook,
-                     output,
+                     output=None,


Need a spec addition for this behavior :)

Hi @jarrekk, I'm a bit confused why we would have output as None for the default? In general, output will need to go somewhere, and it's likely simpler to be explicit about where the output goes. Would you be able to give some additional context to specifically how you see this working? Thanks!

betatim · 2018-01-24T18:38:57Z

papermill/execute.py

@@ -69,7 +67,7 @@ def preprocess(self, nb, resources):
            cell.outputs = []

    # Execute each cell and update the output in real time.
-    with futures.ThreadPoolExecutor(max_workers=1) as executor:
+    with futures.ThreadPoolExecutor(max_workers=4) as executor:


Why increase this? I thought the reason for using a thread pool here is to move the execution of the notebook to somewhere else, not to run things in parallel?

@betatim Yeah, I just wanted to try it here. I will change it back. :)

MSeal · 2018-01-24T18:53:23Z

papermill/execute.py

@@ -79,7 +77,6 @@ def preprocess(self, nb, resources):

        for index, cell in execution_iterator:
            cell.metadata["papermill"]["status"] = RUNNING
-            future = executor.submit(write_ipynb, nb, output_path)


this change looks wrong; you sure you meant to mess with the future pool execution?

@MSeal Thanks for your comment. If I don't pass the output option in https://github.com/nteract/papermill/blob/master/papermill/execute.py#L138, I get an output file error; if I remove this code, it works. Perhaps I need a better way to deal with this.

You need to check if output_path is None and not always write.

@betatim Thanks, I added this now. :)
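
A rough sketch of the two points from this thread and the `max_workers` thread above: a single-worker pool moves writes off the main loop without running them in parallel, and the write is submitted only when an output path was given. `write_snapshot` and `run_cells` are hypothetical names, and the "write" just records what it would have saved.

```python
from concurrent import futures


def write_snapshot(output_path, cell_count, writes):
    """Hypothetical stand-in for write_ipynb: record the write, no real I/O."""
    writes.append((output_path, cell_count))


def run_cells(cells, output_path=None):
    """Process cells in order; save progress only when a path was given."""
    executed = []
    writes = []
    # One worker: writes run off the main thread, in order, never overlapping.
    with futures.ThreadPoolExecutor(max_workers=1) as executor:
        for cell in cells:
            executed.append(cell)  # pretend the cell ran
            if output_path is not None:
                executor.submit(write_snapshot, output_path, len(executed), writes)
    # Leaving the `with` block waits for all queued writes to finish.
    return executed, writes
```

Calling `run_cells(cells)` without a path performs no writes at all, while passing a path queues one snapshot per cell.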

MSeal · 2018-01-24T18:53:43Z

papermill/execute.py

@@ -151,8 +147,6 @@ def execute_notebook(notebook,
        progress_bar (bool): Flag for whether or not to show the progress bar.
        log_output (bool): Flag for whether or not to write notebook output to stderr.
    """
-    print("Input Notebook:  %s" % get_pretty_path(notebook))


Please leave these in, you can change get_pretty_path to handle None instead

This is the same issue of dealing with the output file.

From your comment I don't think you need to remove this. This generates some output on stdout but that doesn't get in the way of what you are trying to do.
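
The suggestion to make the path helper tolerate `None` might be sketched as follows. The real `get_pretty_path` is not shown in this thread, so `pretty_path` below is a hypothetical helper with assumed behaviour (relative paths pass through, absolute paths are shown relative to the working directory), and the placeholder text is an arbitrary choice.

```python
import os


def pretty_path(path):
    """Return a short display form of `path`, tolerating a missing path."""
    if path is None:
        # No output file requested: print a placeholder instead of failing.
        return "<no output file>"
    if not os.path.isabs(path):
        return path
    # Show absolute paths relative to the current working directory.
    return os.path.relpath(path)


print("Input Notebook:  %s" % pretty_path("local/input.ipynb"))
print("Output Notebook: %s" % pretty_path(None))
```

This keeps both print statements in place while allowing the output path to be absent.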

betatim · 2018-01-25T06:39:56Z

I'm still confused about what the end goal is. I originally thought you want to call execute_notebook from a different bit of python code, and then do something with the executed notebook. From your comments it sounds like you want to call the papermill executable from the command-line and then capture stdout though. Maybe you could create a short example (in code) of how you want to use it.

jarrekk · 2018-01-25T06:47:14Z

@betatim Thanks, I want to use it in papermill API:

#!/usr/bin/env python
nb_object = papermill.execute_notebook(input_file, parameters=parameters) # without output parameter
metadata  = nb_object.metadata
...
# continue working with nb_object in Python code

But papermill also supports the command line, so it would be better if this feature were supported in both the Python API and the command line. Perhaps an example:

#!/usr/bin/env bash
OUTPUT=$(papermill local/input.ipynb -p alpha 0.6 -p l1_ratio 0.1)

For me, I care about the first one (the papermill API). :)

willingc · 2018-01-31T19:51:17Z

Hi @jarrekk,

#!/usr/bin/env python
nb_object = papermill.execute_notebook(input_file, parameters=parameters) # without output parameter
metadata = nb_object.metadata
...

Wouldn't you see the same behavior with:

nb_object = papermill.execute_notebook(input_file, None, parameters=parameters)

or possibly:

nb_object = papermill.execute_notebook(input_file, '/dev/null', parameters=parameters)

I'm a bit concerned that giving a default of None to output would give users who are new to using this unexpected results (i.e. they would expect the output notebook to be saved somewhere and would not have it saved if using the defaults).

jarrekk · 2018-02-01T02:17:45Z

@willingc
I get your point; perhaps there should be a way to get the nb_object without affecting the output file. But execute_notebook currently returns nothing, so in nb_object = papermill.execute_notebook(input_file, '/dev/null', parameters=parameters), nb_object is empty.

jarrekk · 2018-02-01T02:22:13Z

@willingc I updated execute.py so that execute_notebook now returns a value. :)

willingc · 2018-02-01T23:41:18Z

@jarrekk Thanks for the updates. I would go ahead and add a line in the docstring indicating what is returned.

@MSeal @betatim Are you cool with this approach?

rgbkrk

Pending completion of @willingc's request, I'm in favor of the changes as is for returning the notebook from execute_notebook

rgbkrk · 2018-02-02T00:28:47Z

papermill/execute.py

@@ -185,6 +185,8 @@ def execute_notebook(notebook,
    # Write final Notebook to disk.
    write_ipynb(nb, output)
    raise_for_execution_errors(nb, output)
+    # always return notebook object
+    return nb


I'm happy with this approach.

return notebook object instead of raw string
let user choose how to deal with execution result.
let user choose how to deal with execution result.
let user choose how to deal with execution result.
update max_worker back
add output_path
update execute.py
update execute.py

MSeal · 2018-02-02T02:17:49Z

With the rebase I'm +1

jarrekk · 2018-02-02T02:18:44Z

@MSeal Thanks!

willingc · 2018-02-02T02:49:17Z

@jarrekk We're in the home stretch now :-)

Please update the docstring for the function since we use the docstring to generate some documentation. Here's a suggestion of two lines to add to the existing docstring. Thanks!

"""Executes a single notebook locally.
 
     Args:
         notebook (str): Path to input notebook.
         output (str): Path to save executed notebook.
         parameters (dict): Arbitrary keyword arguments to pass to the notebook parameters.
         kernel_name (str): Name of kernel to execute the notebook against.
         progress_bar (bool): Flag for whether or not to show the progress bar.
         log_output (bool): Flag for whether or not to write notebook output to stderr.

     Returns:
         nb (NotebookNode): executed notebook object
"""

jarrekk · 2018-02-02T02:53:29Z

@willingc Great, many thanks!

willingc

Thanks!

willingc · 2018-02-02T03:20:49Z

Thanks @jarrekk for being flexible and collaborative. I'm happy that you contributed. Nicely done. 🍰

betatim · 2018-02-04T05:39:54Z

👍 !

let user choose how to deal with execution result.

c267a64

betatim reviewed Jan 23, 2018

rgbkrk reviewed Jan 23, 2018

MSeal reviewed Jan 23, 2018

betatim reviewed Jan 24, 2018

MSeal requested changes Jan 24, 2018

rgbkrk approved these changes Feb 2, 2018

jarrekk force-pushed the master branch from a554d9e to 8cb9bf1 Compare February 2, 2018 02:16

MSeal approved these changes Feb 2, 2018

update docstring for execute_notebook

c2ea660

willingc approved these changes Feb 2, 2018

willingc merged commit 5917482 into nteract:master Feb 2, 2018

zachwill mentioned this pull request Mar 25, 2019

Writing to /dev/null from CLI results in warnings #335

Closed

dcnadler mentioned this pull request Jun 27, 2022

Option for not writing an output ipynb file #669

Merged
