Skip notebook generation #614

Closed
fferegrino opened this issue Feb 23, 2022 · 5 comments

Comments

@fferegrino
Contributor

When working on a pipeline containing Python scripts à la:

tasks:
  - source: 1-get.py
    product:
      nb: output/1-get.ipynb
      data: output/data.csv

It would be nice to have a way to skip notebook generation for some tasks, either by omitting the nb entry or via an argument to ploomber build.

@edublancas
Contributor

Thanks for the feedback, this feature makes sense. The current solution is to use functions as tasks but I think we should also support scripts.

I did some quick research, here are my notes for future reference:

  • Currently, scripts/notebooks executed this way support IPython magics (e.g., %time). I checked whether it's possible to run a script with magics: line magics kind of work (ipython script.ipy; note the .ipy extension), but cell magics don't (Request: Enable cell magics inside of .ipy scripts ipython/ipython#2539), and even the IPython authors do not encourage their use. So that would be a limitation: if users want to skip notebook generation, their scripts must not contain magics.
  • I'm still on the fence about how users should decide whether to generate a notebook or not. Currently I'm thinking of two options.

Option A:

tasks:
  - source: 1-get.py
    product:
      # missing nb: no notebook is generated, but a warning suggests that the user pass null
      data: output/data.csv

tasks:
  - source: 1-get.py
    product:
      # no notebook generated and no warning - this is explicit so that's good
      nb: null
      data: output/data.csv
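A minimal sketch of how Option A could be resolved, assuming the YAML product section has already been parsed into a dict (YAML null becomes Python None). The function notebook_path_option_a is hypothetical and not part of Ploomber; the sentinel object is what lets it tell a missing nb apart from an explicit nb: null:

```python
import warnings

# Sentinel that distinguishes "nb key absent" from "nb: null" (parsed as None).
_MISSING = object()


def notebook_path_option_a(product: dict):
    """Hypothetical Option A logic: return the notebook path, or None to skip it."""
    nb = product.get("nb", _MISSING)
    if nb is _MISSING:
        # Implicit opt-out: skip the notebook but nudge the user to be explicit.
        warnings.warn(
            "No 'nb' product declared, skipping notebook generation; "
            "set 'nb: null' to silence this warning"
        )
        return None
    # Either None (explicit 'nb: null', no warning) or a path to generate.
    return nb


print(notebook_path_option_a({"nb": None, "data": "output/data.csv"}))  # None
```

The appeal of this design is that the explicit form (nb: null) is silent, while the implicit form still works but points users toward being explicit.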

Option B:

tasks:
  - source: 1-get.py
    product:
      # missing nb: no notebook is generated, but a warning suggests that the user change the runner
      data: output/data.csv

tasks:
  - source: 1-get.py
    product:
      # no notebook generated and no warning
      data: output/data.csv
    # tell Ploomber to run this as a script, so no notebook is generated
    runner: script

Thoughts?
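Option B could be sketched as a small resolver over an already-parsed task dict. The function name and the "notebook" / "script" runner values are assumptions for illustration only, not Ploomber's actual API:

```python
import warnings


def resolve_runner_option_b(task: dict) -> str:
    """Hypothetical Option B logic: decide how to execute a script task."""
    runner = task.get("runner")
    if runner is not None:
        # Explicit choice, e.g. "script" (run without generating a notebook).
        return runner
    if "nb" in task.get("product", {}):
        # Current behavior: a notebook product implies the notebook runner.
        return "notebook"
    # No notebook product and no runner: run as a plain script, but warn.
    warnings.warn(
        "No 'nb' product declared, running as a plain script; "
        "set 'runner: script' to silence this warning"
    )
    return "script"


task = {"source": "1-get.py", "runner": "script", "product": {"data": "output/data.csv"}}
print(resolve_runner_option_b(task))  # script
```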

@fferegrino
Contributor Author

Regarding the first point: Is there a way to safely detect and ignore cell magics?

As for the second point: the first option looks more natural, but I'd argue that the second one is more flexible (it could also be used to specify the runner for other kinds of tasks), since one could set runner: ipython to run scripts with magics.
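The reason scripts with magics need a different runner is that IPython magics are not valid Python syntax, so the plain interpreter cannot even compile such a script. A minimal sketch using only the built-in compile(); the helper name runs_as_plain_python is hypothetical:

```python
def runs_as_plain_python(source: str) -> bool:
    """Return True if `source` compiles as ordinary Python code."""
    try:
        compile(source, "<script>", "exec")
    except SyntaxError:
        # Magics such as %time or %%capture are rejected by the Python parser.
        return False
    return True


plain = "x = sum(range(10))\nprint(x)\n"
with_line_magic = "%time x = sum(range(10))\nprint(x)\n"
with_cell_magic = "%%capture output\nprint('stuff')\n"

print(runs_as_plain_python(plain))            # True
print(runs_as_plain_python(with_line_magic))  # False
print(runs_as_plain_python(with_cell_magic))  # False
```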

@edublancas
Copy link
Contributor

Thanks for the feedback!

Yes, we could detect cell magics, but ignoring them might confuse users, since the code they write would differ from the code we execute. There is also a small chance that we'd break users' code, since some cell magics modify the interpreter's state, for example %%capture:

# first cell: the output variable does not exist yet

%%capture output
# second cell: the cell magic (which must be the cell's first line) creates the output variable
print('stuff')

# third cell: this works now
print(output)
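To make the risk concrete, here is a minimal plain-Python sketch (no IPython involved) of a hypothetical "strip the magics" preprocessing step applied to the %%capture example: once the magic line is dropped, output is never defined and the following cell fails with NameError.

```python
# The two cells from the example, as plain source strings.
cell = "%%capture output\nprint('stuff')\n"
later_cell = "print(output)\n"

# Hypothetical preprocessing: drop every line that starts with a magic prefix.
stripped = "\n".join(
    line for line in cell.splitlines() if not line.lstrip().startswith("%")
)

namespace = {}
exec(stripped, namespace)  # prints "stuff" instead of capturing it
try:
    exec(later_cell, namespace)
    failed = False
except NameError:
    # `output` was only ever created by %%capture, so the next cell breaks.
    failed = True

print(failed)  # True
```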

So I think it's best to tell the user to use the "notebook" runner if we detect the code contains cell magics.
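Detection itself can be approximated with a line-based check. The sketch below is a heuristic assumption, not Ploomber's implementation: it flags lines starting with %% as cell magics and lines starting with a single % as line magics, and it would misfire on such lines inside multi-line strings. The helper find_magics is hypothetical; a task could raise an error suggesting the notebook runner whenever its "cell" list is non-empty.

```python
import re

# "%%name" at the start of a line: cell magic.
_CELL_MAGIC = re.compile(r"^\s*%%\w+")
# A single "%" followed by a name (not "%%"): line magic.
_LINE_MAGIC = re.compile(r"^\s*%(?!%)\w+")


def find_magics(source: str) -> dict:
    """Hypothetical helper: return 1-based line numbers of magics in `source`."""
    found = {"cell": [], "line": []}
    for lineno, line in enumerate(source.splitlines(), start=1):
        if _CELL_MAGIC.match(line):
            found["cell"].append(lineno)
        elif _LINE_MAGIC.match(line):
            found["line"].append(lineno)
    return found


script = "x = 1\n%%capture output\nprint('stuff')\n%time y = 2\n"
print(find_magics(script))  # {'cell': [2], 'line': [4]}
```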

@edublancas
Contributor

This is fixed, and it will be part of the next release! 🎉

@edublancas
Contributor

This will ship in 0.17.2, but the user guide isn't ready, so it won't be in the docs yet. FYI, this is how it works:

tasks:
  - source: 1-get.py
    class: ScriptRunner
    product:
      data: output/data.csv
      # no notebook needed!
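For intuition only: the sketch below is not how ScriptRunner is implemented (the thread doesn't describe its internals). It uses the standard library's runpy.run_path to execute a stand-in for 1-get.py top to bottom as plain Python, so the only output is the file the script writes and no .ipynb is produced. The run_script helper and the stand-in script are hypothetical.

```python
import runpy
import tempfile
from pathlib import Path


def run_script(path: str) -> dict:
    """Hypothetical helper: run a .py file as __main__ and return its globals."""
    return runpy.run_path(path, run_name="__main__")


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp, "data.csv")
    script = Path(tmp, "1-get.py")
    # Stand-in for 1-get.py: it only writes the task's "data" product.
    script.write_text(
        "from pathlib import Path\n"
        f"Path({str(data)!r}).write_text('a,b\\n1,2\\n')\n"
    )
    run_script(str(script))
    content = data.read_text()
    notebooks = list(Path(tmp).glob("*.ipynb"))  # no notebook was generated

print(content)
print(notebooks)  # []
```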

neelasha23 pushed a commit to neelasha23/ploomber that referenced this issue May 22, 2022