Tag regex #403

Closed
wants to merge 6 commits into from


casperdcl (Member) commented Jul 16, 2019

Examples:

# Only include code cells with a tag exactly equal to "whitelisted" or "testing";
# cells tagged "parameters" or "injected-parameters" are also kept.
# Untagged code cells are removed.
papermill ... --tag-include-regex '^(whitelisted|testing)$'

# Exclude if "debug" (case insensitive) is a substring in any of a code cell's tags.
# Untagged code cells are included.
papermill ... --tag-exclude-regex '[dD][eE][bB][uU][gG]'

# Combine both of the above.
# Untagged code cells are removed.
papermill ... -t '^(whitelisted|testing)$' -T '[dD][eE][bB][uU][gG]'

# Only include code cells that have at least one tag.
# Untagged code cells are removed.
papermill ... -t '.*'
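The filtering rules implied by the examples above can be sketched in Python. This is a hypothetical re-implementation for illustration only, not the code in this PR: `keep_cell`, `filter_cells`, `ALWAYS_KEEP` and the demo helper `code` are invented names. It assumes each regex is applied with `re.search` to every tag of a code cell, that non-code cells are never filtered, and that cells tagged "parameters" or "injected-parameters" are always kept. The `-t`/`-T` flags themselves come from this unmerged PR and are not part of released papermill.

```python
import re
from typing import Iterable, List, Optional

# Assumption: parameter cells always survive, per the first example's comment.
ALWAYS_KEEP = {"parameters", "injected-parameters"}


def keep_cell(cell: dict, include: Optional[str] = None,
              exclude: Optional[str] = None) -> bool:
    """Return True if `cell` survives the include (-t) / exclude (-T) regexes."""
    if cell.get("cell_type") != "code":
        return True  # assumption: only code cells are filtered
    tags = cell.get("metadata", {}).get("tags", [])
    if ALWAYS_KEEP.intersection(tags):
        return True
    # re.search matches anywhere in a tag, so '[dD][eE][bB][uU][gG]' is a
    # substring test; anchor with ^...$ to require an exact tag.
    if exclude is not None and any(re.search(exclude, t) for t in tags):
        return False
    if include is not None:
        # An untagged cell has no tag that could match, so it is dropped.
        return any(re.search(include, t) for t in tags)
    return True


def filter_cells(cells: Iterable[dict], include: Optional[str] = None,
                 exclude: Optional[str] = None) -> List[dict]:
    """Keep only the cells for which keep_cell() returns True."""
    return [c for c in cells if keep_cell(c, include, exclude)]


def code(name: str, *tags: str) -> dict:
    """Build a minimal nbformat-style code cell; untagged cells get no 'tags' key."""
    metadata = {"tags": list(tags)} if tags else {}
    return {"cell_type": "code", "metadata": metadata, "source": name}


cells = [
    code("params", "parameters"),
    code("test", "testing"),
    code("dbg", "testing", "Debug-plots"),
    code("plain"),
    {"cell_type": "markdown", "metadata": {}, "source": "notes"},
]

# The four flag combinations from the examples above.
for inc, exc in [(r"^(whitelisted|testing)$", None),
                 (None, "[dD][eE][bB][uU][gG]"),
                 (r"^(whitelisted|testing)$", "[dD][eE][bB][uU][gG]"),
                 (".*", None)]:
    print(inc, exc, [c["source"] for c in filter_cells(cells, inc, exc)])
```

Note how the untagged `plain` cell is dropped whenever an include regex is given, but kept when only an exclude regex is used, matching the "Untagged code cells are ..." comments.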

casperdcl self-assigned this Jul 16, 2019
casperdcl (Member, Author)

I've tested this on my machine and find it very useful. I'd prefer it if someone else fixed the unit tests.

casperdcl (Member, Author)

Fixed the unit tests and added basic CLI parsing tests. It would be nice if someone added an actual test to verify that cell tags are parsed correctly.

casperdcl requested a review from MSeal July 16, 2019 23:05
aayush-jain18

I tried out this PR locally, and it worked great. However, when using the CLI, note that the quoting and escaping rules differ between the Windows command prompt (batch/cmd.exe) and a Linux shell.

Windows (cmd.exe):

papermill notebooks\input_notebook.ipynb notebooks\output_notebook.ipynb -t ^^(import^|data)$

Linux shell (e.g. bash):

papermill notebooks/input_notebook.ipynb notebooks/output_notebook.ipynb -t '^(import|data)$'
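One way to sidestep the platform-specific escaping is to build the arguments as a list in Python. This is a minimal sketch, not part of papermill: `shlex.join` (Python 3.8+) reproduces the quoted shell line above, and passing the same list to `subprocess.run` without `shell=True` means neither cmd.exe nor bash re-parses the pattern. Because `-t` is the flag proposed in this unmerged PR, the script only executes the command when given a hypothetical `--run` argument.

```python
import shlex
import subprocess
import sys

# Pattern and paths from the example above; "-t" is this PR's proposed flag.
pattern = "^(import|data)$"
argv = [
    "papermill",
    "notebooks/input_notebook.ipynb",
    "notebooks/output_notebook.ipynb",
    "-t",
    pattern,
]

# shlex.join quotes each argument for a POSIX shell (bash, zsh, ...).
posix_line = shlex.join(argv)
print(posix_line)

if __name__ == "__main__" and "--run" in sys.argv[1:]:
    # With a list argv and shell=False (the default), no shell sees the
    # arguments, so "^", "|", "(" and ")" need no shell escaping.
    subprocess.run(argv, check=True)
```

The printed line is exactly the shell command shown above, which is a quick way to check the quoting before pasting it into a terminal.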

MSeal (Member) commented Jul 22, 2019

So I don't like pushing back on contributions, but I might pause this PR here and suggest this shouldn't be in papermill core. It adds a lot of complexity to the AST execution pattern and makes it more difficult to reason about how a notebook will be executed. Papermill is meant to be a very simple tool with few, opinionated options around templated execution of notebooks. This would bend that simplicity.

Furthermore, I think there's already a decent execution pattern for removing cells with particular tags via nbconvert. Specifically, the TagRemovePreprocessor lets you create a new notebook file with the tagged cells pruned out. This preserves a functional execution pattern: the intermediate, pruned notebook is passed along to the simpler execution tool in papermill.

jupyter nbconvert --TagRemovePreprocessor.enabled=True --TagRemovePreprocessor.remove_cell_tags="first_cell" --to=notebook --output=RemovedTag Tagged.ipynb
papermill RemovedTag.ipynb FinalOutput.ipynb -p foo bar ...
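For readers who prefer not to depend on nbconvert for the pruning step, here is a stdlib-only sketch of the same two-step idea. `remove_tagged_cells` is a hypothetical helper that approximates only the cell-removal part of `TagRemovePreprocessor.remove_cell_tags` (a cell is dropped if any of its tags is in the set); it ignores the preprocessor's other options, such as removing only inputs or outputs, and does not validate the result against the nbformat schema.

```python
import json
import tempfile
from pathlib import Path
from typing import Set


def remove_tagged_cells(src: Path, dst: Path, remove_tags: Set[str]) -> int:
    """Copy notebook `src` to `dst` without cells carrying any tag in
    `remove_tags`; return the number of cells removed."""
    nb = json.loads(src.read_text(encoding="utf-8"))
    kept = [c for c in nb["cells"]
            if not remove_tags.intersection(c.get("metadata", {}).get("tags", []))]
    removed = len(nb["cells"]) - len(kept)
    nb["cells"] = kept
    dst.write_text(json.dumps(nb, indent=1), encoding="utf-8")
    return removed


# Demo on a tiny hand-written notebook in the nbformat 4 layout.
with tempfile.TemporaryDirectory() as tmp:
    tagged, pruned = Path(tmp, "Tagged.ipynb"), Path(tmp, "RemovedTag.ipynb")
    tagged.write_text(json.dumps({
        "nbformat": 4, "nbformat_minor": 4, "metadata": {},
        "cells": [
            {"cell_type": "code", "metadata": {"tags": ["first_cell"]},
             "source": "print('setup')", "outputs": [], "execution_count": None},
            {"cell_type": "code", "metadata": {},
             "source": "print('kept')", "outputs": [], "execution_count": None},
        ],
    }), encoding="utf-8")
    n = remove_tagged_cells(tagged, pruned, {"first_cell"})
    remaining = [c["source"]
                 for c in json.loads(pruned.read_text(encoding="utf-8"))["cells"]]
    print(n, remaining)
```

The pruned file would then be executed with papermill exactly as in the second command above.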

Does this sound reasonable and understandable?

willingc (Member)

Hi @casperdcl.

Thanks for using papermill as well as taking the time to explore new uses and to contribute back updates for tqdm. As @MSeal commented, this PR's use case today falls outside of the scope of papermill's lightweight design.

Papermill's current design goal is to be a simple tool that adds parameters to a notebook and automates the notebook's execution with those parameters. Your proposed PR goes beyond that goal: today, tags are used only to mark the code cell that holds the parameter variables, and this PR extends them to other purposes. It introduces additional tags (such as debug) that decorate a notebook cell and control whether that cell is executed. Right now, this is beyond the scope of the core papermill project.

Using tags as decorators to determine which cells are executed is an interesting use case. I'm going to close this PR as beyond the scope of the current project core. I'm also going to label this PR reference:tags in case you or others wish to build this as an optional extension to papermill instead of as part of the core.

willingc added the reference:extensions and reference:tags labels and removed the enhancement and help wanted labels Aug 15, 2019
willingc closed this Aug 15, 2019
casperdcl linked an issue May 15, 2020 that may be closed by this pull request
MSeal deleted the tag_regex branch August 17, 2020 19:49
Labels: reference:extensions (information and ideas about extensions), reference:tags (information and ideas)
4 participants