Add preprocessor to remove empty cells. #575
Seems reasonable. I'll let @mpacer take a look. |
Just to explain the motivation behind this PR and #569: adding Jupyter notebooks to git is a bit painful because of the generated output. There is a range of scripts that strip the output and other content that may not be relevant for versioning (such as empty cells). But as the notebook format evolves, these scripts become outdated. My hope was to use nbconvert to run git filters, so that the filter script always stays in sync with the notebook format. |
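To make the use case concrete, here is a minimal, stdlib-only sketch of the kind of cleaning step such a git filter performs. It is not how nbconvert implements it: the names `clean_notebook` and `run_filter` are hypothetical, and the code assumes the nbformat 4 layout (a top-level `cells` list whose `source` is a string or a list of strings).

```python
import json


def clean_notebook(nb):
    """Return a copy of a notebook dict without outputs or whitespace-only cells.

    Assumption: nbformat 4 layout, i.e. a top-level "cells" list whose items
    carry "cell_type" and "source" (a string or a list of strings).
    """
    cleaned = dict(nb)
    cells = []
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        if not source.strip():
            continue  # drop cells that contain only whitespace
        cell = dict(cell)  # shallow copy so the input notebook is not modified
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
        cells.append(cell)
    cleaned["cells"] = cells
    return cleaned


def run_filter(infile, outfile):
    """Read notebook JSON from infile and write the cleaned JSON to outfile."""
    json.dump(clean_notebook(json.load(infile)), outfile, indent=1)
    outfile.write("\n")
```

Calling `run_filter(sys.stdin, sys.stdout)` from a small script would let it act as a stdin-to-stdout filter. The drawback raised above still applies: a hand-written script like this has to be updated whenever the notebook format changes.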
As you can guess from my participation in jupyter/notebook#1928, that sounds great! I'd even be open to having this included in most exporters by default. @takluyver @Carreau, is that a reasonable inclusion? That can be a second PR though. My only issue is that it might be better to have it assert the emptiness of the notebook as one branch (of an |
One of the questions I have is: what if metadata or other things are set? Should the cell still be considered empty? (IIRC GenePattern may use only metadata to store cell information.) Nitpicking, but what if the notebook is using [this programming language](https://en.wikipedia.org/wiki/Whitespace_(programming_language))? Regardless, I think it is a useful addition. It's indeed common to have trailing empty cells. |
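One possible way to frame the metadata question is to make it an explicit switch. The sketch below is only an illustration: `is_empty_cell` and its `ignore_metadata` flag are hypothetical names, not part of this PR, and cells are modelled as plain dicts.

```python
def is_empty_cell(cell, ignore_metadata=True):
    """Decide whether a cell (a plain dict here) counts as empty.

    With ignore_metadata=True, only the source text matters.
    With ignore_metadata=False, a cell that carries any metadata is kept
    even if its source is blank.
    """
    source = cell.get("source", "")
    if isinstance(source, list):
        source = "".join(source)
    if source.strip():
        return False  # the cell has visible content
    if ignore_metadata:
        return True
    return not cell.get("metadata")
```

Note that any check based on `strip()` treats a program written entirely in whitespace characters as empty, which is exactly the nitpick above.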
# Copyright (c) IPython Development Team.
# Distributed under the terms of the Modified BSD License.

from traitlets import Set
`Set` is unnecessary, but it can be removed in a subsequent PR.
nb, res = preprocessor(nb, res)

for cell in nb.cells:
    assert cell.source.strip(), "found unexpected empty cell"
Agreed with @mpacer: using something like `nose_tools.assert_not_equal` will also give a better error message, and it won't be compiled out if Python is mistakenly configured with `-O` or something.
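To illustrate the point about `-O`: bare `assert` statements are stripped when Python runs with optimizations enabled, while assertion methods are ordinary calls and always execute. The sketch below uses the standard library's `unittest` rather than nose, and the cell list is a stand-in for the preprocessor output, not the PR's actual test.

```python
import unittest


class TestNoEmptyCells(unittest.TestCase):
    def test_no_empty_cells(self):
        # Stand-in for the cells returned by the preprocessor under test.
        cells = [{"source": "x = 1"}, {"source": "print(x)"}]
        for cell in cells:
            # An ordinary method call: it still runs under `python -O`
            # and reports both compared values on failure.
            self.assertNotEqual(cell["source"].strip(), "")
```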
Ok, will have a look at making the tests a bit easier to read. We could define a string traitlet that defines a regular expression such that a cell is considered empty if it matches the regex. Thoughts? |
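A rough sketch of that idea, without traitlets so it stays self-contained: a cell is dropped when its source matches a configurable regular expression. The function name `remove_matching_cells` and the default pattern `\s*\Z` are assumptions made for illustration. In the preprocessor the pattern would presumably be a configurable trait.

```python
import re


def remove_matching_cells(cells, pattern=r"\s*\Z"):
    """Return the cells whose source does NOT match `pattern`.

    re.match anchors at the start of the source, so the default pattern
    only matches sources that consist entirely of whitespace.
    """
    regex = re.compile(pattern)
    return [cell for cell in cells if not regex.match(cell["source"])]


cells = [{"source": "x = 1"}, {"source": ""}, {"source": " \n\t"}]
kept = remove_matching_cells(cells)  # only the "x = 1" cell remains
```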
Hm, this is now just a regex-based filter for cells. The only reason it should still be called |
What you could do is change the current one to Yes it's a straightforward use case of the But even if we don't do that, this should be added as an example to the docs of how to use traitlets to create a subclass of a particular kind of Also, please add the |
@Carreau & @takluyver this is looking good to me. Docs, default value, default inclusion in
|
Possible resources for docs • http://www.rexegg.com/regex-python.html |
Yep, this should work: `patterns = List(Unicode, default_value=[r'\s*$'])` |
Thanks for all the comments and references. The www.pyregex.com tool is great! 👍 I like the idea of using a list of patterns such that users can extend the patterns without having to manipulate regex patterns. Unfortunately, I will only have sporadic internet access for the next two weeks but will try to implement the following before:
|
This could also be useful for stripping out sensitive info from a notebook, such as AWS keys. I think using a List is fine for the config, but under the hood you should probably convert everything to a single regex using |
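One common way to turn a list of patterns into a single regex is alternation. The sketch below assumes that approach, which may or may not be what was meant here. `combine_patterns` is a hypothetical helper, and the `API_KEY` pattern is a made-up example of sensitive content, not a real key format.

```python
import re


def combine_patterns(patterns):
    """Compile a list of regex strings into one alternation.

    Each pattern is wrapped in a non-capturing group so that joining
    with "|" does not change what the individual patterns match.
    """
    if not patterns:
        # An empty alternation would match everything; use a pattern
        # that can never match instead.
        return re.compile(r"(?!)")
    return re.compile("|".join("(?:%s)" % p for p in patterns))


# Hypothetical configuration: blank cells plus cells that start with a secret.
regex = combine_patterns([r"\s*\Z", r"API_KEY\s*="])
```

The user-facing configuration stays a simple list, while each cell is tested against a single compiled expression.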
Improved regex testing suggestion: https://regex101.com/ ↑ it actually goes through and explains the logic of the matches. |
This looks great! Thank you for all the hard work. I had actually meant docs in the sense of the narrative docs (rather than in the docstring) but that works too! Merging! |