Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate multiple nb products from a single task #708

Merged
merged 198 commits into from
Jun 1, 2022

Conversation

neelasha23
Copy link
Contributor

@neelasha23 neelasha23 commented Apr 14, 2022

For changes related to : #673

@edublancas Need to understand how the field nbconvert_exporter_name works in case where user specifies multiple nb products. Which nb product will it be mapped to? Or should the nbconvert_exporter_name also be a dict where each item corresponds to the respective nb product?

@neelasha23 neelasha23 changed the title Generate multiple nb products from a single task [Work in progress] Generate multiple nb products from a single task Apr 17, 2022
@edublancas
Copy link
Contributor

Thanks for working on this! I think you're on the right track. I'd suggest collapsing the nb_product_doc_key, nb_product_report_key into the original nb_product_key, because in the future we may support more formats than HTML and PDF, this is what I'm thinking:

This is how things work right now:

# simplest case, product is a path
tasks:
  - source: notebook.ipynb
    product: report.html

A bit more elaborated, multiple products

# multiple products
tasks:
  - source: notebook.ipynb
    product:
      nb: report.html
    # no need to specify nb_product_key since it's "nb" by default

Now, if the user wants to output multiple reports:

tasks:
  - source: notebook.ipynb
    nb_product_key: [nb_ipynb, nb_html, nb_pdf]
    product:
      nb_ipynb: report.ipynb
      nb_html: report.html
      nb_pdf: report.pdf

If the user wants to customize nbconvert_exporter_name:

tasks:
  - source: notebook.ipynb
    nb_product_key: [nb_ipynb, nb_html, nb_pdf]
    nbconvert_exporter_name:
      # user webpdf for nb_pdf, the rest should use the default one based on the extension
      nb_pdf: webpdf
    product:
      nb_ipynb: report.ipynb
      nb_html: report.html
      nb_pdf: report.pdf

Edge cases:

  1. If nb_product_key is a list and any of the elements don't appear in product, we should throw an error
  2. if nbconvert_exporter_name is a dict and contains unexpected keys (they don't appear in nb_product_key), we should throw an error

This is the cleanest API I can think of. thoughts?

@neelasha23
Copy link
Contributor Author

neelasha23 commented Apr 18, 2022

Yes this makes sense.
Just to clarify, we are now passing an explicit nb_product_key: [nb_ipynb, nb_html, nb_pdf] entry so that in

nb_product_key='nb',
there would be just one nb_product_key variable instead of multiple variables.
So default would be nb_product_key='nb', however in the case where user specifies multiple nb products this value will be a list instead of nb like nb_product_key = ['nb_ipynb', 'nb_html', 'nb_pdf']

The only issue seems to be that user needs to specify the keys multiple times : once in the list and again as the dict keys

@neelasha23
Copy link
Contributor Author

Also, the other edge case can be nbconvert_exporter_name is not a dict but user has specified nb_product_key list.

@edublancas
Copy link
Contributor

So default would be nb_product_key='nb', however in the case where user specifies multiple nb products this value will be a list instead of nb like nb_product_key = ['nb_ipynb', 'nb_html', 'nb_pdf']

correct

The only issue seems to be that user needs to specify the keys multiple times : once in the list and again as the dict keys

I agree. it's kind of redundant, but since users will only create lists of 2-3 elements, I think it's better to be explicit.

Also, the other edge case can be nbconvert_exporter_name is not a dict but user has specified nb_product_key list.

good catch. there has to be some validation here. if nb_product_key is a list, then nbconvert_exporter_name must be a dictionary

another validation is that all values in nb_product_key must appear in product

@neelasha23 neelasha23 changed the title Generate multiple nb products from a single task Generate multiple nb products from a single task [Work in progress] Apr 21, 2022
@neelasha23 neelasha23 changed the title Generate multiple nb products from a single task [Work in progress] Generate multiple nb products from a single task Apr 26, 2022
@neelasha23
Copy link
Contributor Author

@edublancas I have added the functionalities discussed above , and the corresponding test cases. Do let me know if changes are required. Thanks!

@edublancas
Copy link
Contributor

thanks a lot! just a heads up that I'm traveling to PyCon, so it may take me a few days to review this

@neelasha23
Copy link
Contributor Author

Sure! No issues :)

Copy link
Contributor

@edublancas edublancas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is great! thanks so much for your work! just minor comments

self.multiple_nb_params = {
'valid_keys': ['nb_ipynb', 'nb_html', 'nb_pdf'],
'ipynb_key': 'nb_ipynb'
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be defined by the user, for example, a user might call this: [notebook, report_html, report_pdf], then the validation should be done on those user-submitted values. Looks like this code is looking for a nb_ipynb key.

For example, I may wanna do this:

  - source: fit.py
    nb_product_key: [notebook, nb_html, nb_pdf]
    product:
      notebook: output/report.ipynb
      nb_html: output/report.html
      nb_pdf: output/report.pdf
      model: output/model.pickle

Or may not even want the ipynb file:

  - source: fit.py
    nb_product_key: [report_web, report_document]
    product:
      report_web: output/report.html
      report_document: output/report.pdf
      model: output/model.pickle

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@edublancas
Made the requested changes. Do let me know if anything looks incorrect

@edublancas
Copy link
Contributor

great work @neelasha23! one thing before we merge. can you add an example here?

please add a sample pipeline.yaml with a few comments showing how to create multiple products from the same script. something like:

- source: script.py
  nb_product_key: [nb_ipynb, nb_pdf, nb_html]
  nbconvert_exporter_name:
    pdf: webpdf
  product:
    # paths to products

@neelasha23
Copy link
Contributor Author

Yes I will add it

@edublancas edublancas changed the base branch from master to multiple-nb June 1, 2022 12:37
@edublancas edublancas merged commit e4acf33 into ploomber:multiple-nb Jun 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet