Skip notebook generation #614

fferegrino · 2022-02-23T21:43:16Z

When working on a pipeline that contains Python scripts, like this:

```yaml
tasks:
  - source: 1-get.py
    product:
      nb: output/1-get.ipynb
      data: output/data.csv
```

It would be nice to have a way to skip notebook generation for some tasks, for example by omitting the `nb` entry or via an argument to `ploomber build`.


edublancas · 2022-02-24T13:57:04Z

Thanks for the feedback; this feature makes sense. The current solution is to use functions as tasks, but I think we should also support scripts.

I did some quick research; here are my notes for future reference:

1. Currently, scripts/notebooks executed this way support IPython magics (e.g., `%time`). I checked whether it's possible to run a script that contains magics: line magics kind of work (`ipython script.ipy`; note the `.ipy` extension), but cell magics don't (Request: Enable cell magics inside of .ipy scripts ipython/ipython#2539). Even the IPython authors do not encourage their use. So that would be a limitation: if users want to skip notebook generation, the script must not contain magics.
2. I'm still on the fence about how users should decide whether to generate a notebook or not. Currently thinking of two options.

Option A:

```yaml
tasks:
  - source: 1-get.py
    product:
      # missing nb: no notebook is generated, but a warning is displayed suggesting the user pass null
      data: output/data.csv
```

```yaml
tasks:
  - source: 1-get.py
    product:
      # no notebook generated and no warning; this is explicit, so that's good
      nb: null
      data: output/data.csv
```
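A minimal sketch of how Option A's rule could be expressed, assuming the task's `product` mapping has already been parsed from YAML into a dict (so `nb: null` becomes `None`). `should_generate_notebook` is a hypothetical name, not part of ploomber.

```python
import warnings
from typing import Any, Dict


def should_generate_notebook(product: Dict[str, Any]) -> bool:
    """Option A (hypothetical helper): decide whether to render an output notebook.

    `product` is the task's product mapping as parsed from pipeline.yaml.
    """
    if "nb" not in product:
        # implicit: the key is missing, so skip the notebook but nudge the user
        warnings.warn(
            "No 'nb' key in product: no notebook will be generated. "
            "Set 'nb: null' to make this explicit and silence this warning."
        )
        return False
    if product["nb"] is None:
        # explicit `nb: null`: skip the notebook silently
        return False
    # any other value is the path where the output notebook should go
    return True
```

The distinction between a missing key and an explicit `None` is what lets the explicit form skip the warning.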

Option B:

```yaml
tasks:
  - source: 1-get.py
    product:
      # missing nb: no notebook is generated, but a warning is displayed suggesting the user change the runner
      data: output/data.csv
```

```yaml
tasks:
  - source: 1-get.py
    product:
    # no notebook generated and no warning
      data: output/data.csv
    # tell ploomber to run this as a script; no notebook generated
    runner: script
```
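Option B's rule can be sketched as a small function, assuming the task entry is a dict parsed from YAML. `pick_runner` and the set of runner names it accepts are hypothetical; the sketch also rejects `runner: notebook` without an `nb` product, an extra check the options above don't spell out.

```python
import warnings
from typing import Any, Dict

RUNNERS = {"notebook", "script"}  # hypothetical set of accepted runner names


def pick_runner(task: Dict[str, Any]) -> str:
    """Option B (hypothetical helper): decide how to execute a script task."""
    product = task.get("product", {})
    runner = task.get("runner")

    if runner is None:
        if "nb" in product:
            return "notebook"
        # no runner and no nb: run as a script, but suggest being explicit
        warnings.warn(
            f"{task['source']}: no 'nb' in product, running as a script. "
            "Add 'runner: script' to silence this warning."
        )
        return "script"

    if runner not in RUNNERS:
        raise ValueError(f"unknown runner {runner!r}, expected one of {sorted(RUNNERS)}")
    if runner == "notebook" and "nb" not in product:
        raise ValueError(f"{task['source']}: runner 'notebook' requires an 'nb' product")
    return runner
```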

Thoughts?

fferegrino · 2022-03-08T20:23:07Z

Regarding the first point: Is there a way to safely detect and ignore cell magics?

As for the second point: the first option looks more natural, but I'd argue that the second one is more flexible (it could also be used to specify the runner for other kinds of tasks), since one could set `runner: ipython` to run scripts with magics.

edublancas · 2022-03-09T02:45:11Z

Thanks for the feedback!

Yes, we could detect cell magics, but it might be confusing to users, since the code they write would differ from the code we execute. There is also a small chance that we break users' code, since some cell magics modify the interpreter's state, for example `%%capture`:

```python
# output variable does not exist

# cell magic creates output variable
%%capture output
print('stuff')

# this works now
print(output)
```
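For a script that must run without magics, the standard library offers a rough plain-Python counterpart to the `%%capture` example. This sketch uses `contextlib.redirect_stdout`; it is not a drop-in replacement: IPython's `%%capture` also captures stderr and rich display output by default, and its `output` is a `CapturedIO` object rather than a string.

```python
import contextlib
import io

# Rough plain-Python counterpart of `%%capture output` + `print('stuff')`.
# Only stdout is captured, and the result is a plain string.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    print("stuff")  # goes into buffer instead of the terminal

output = buffer.getvalue()
print(output)  # the captured text is now an ordinary variable
```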

So I think it's best to tell the user to use the "notebook" runner if we detect the code contains cell magics.
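A minimal sketch of that check, assuming a simple heuristic: if the script parses as plain Python, it contains no magics; otherwise, lines starting with `%%` are treated as cell magics and lines starting with a single `%` as line magics. `find_magics` and `suggest_runner` are hypothetical names, not ploomber functions. Since a plain script can't execute any magics, the sketch recommends the notebook runner for both kinds.

```python
import ast
from typing import Dict, List


def find_magics(source: str) -> Dict[str, List[int]]:
    """Return 1-based line numbers of lines that look like IPython magics."""
    found: Dict[str, List[int]] = {"cell": [], "line": []}
    for lineno, text in enumerate(source.splitlines(), start=1):
        stripped = text.lstrip()
        if stripped.startswith("%%"):
            found["cell"].append(lineno)
        elif stripped.startswith("%"):
            found["line"].append(lineno)
    return found


def suggest_runner(source: str) -> str:
    """Return 'script' if the code runs as plain Python, else 'notebook'."""
    try:
        # Valid Python cannot contain magics; parsing first also avoids
        # flagging legitimate uses of '%' (e.g., modulo on a continuation line).
        ast.parse(source)
        return "script"
    except SyntaxError:
        magics = find_magics(source)
        if magics["cell"] or magics["line"]:
            return "notebook"
        raise  # a genuine syntax error, unrelated to magics
```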

edublancas · 2022-03-31T01:46:06Z

This is fixed; it will be part of the next release! 🎉

edublancas · 2022-03-31T04:52:28Z

This will be in 0.17.2, but the user guide isn't ready yet, so it won't be in the docs. FYI, this is how it works:

```yaml
tasks:
  - source: 1-get.py
    class: ScriptRunner
    product:
      data: output/data.csv
      # no notebook needed!
```
edublancas added the enhancement label Mar 19, 2022

edublancas added a commit that referenced this issue Mar 31, 2022

finishes ScriptRunner implementation - closes #614

8e8125b

edublancas closed this as completed in 57952c9 Mar 31, 2022

neelasha23 pushed a commit to neelasha23/ploomber that referenced this issue May 22, 2022

add ScriptRunner - closes ploomber#614

ba53a11
