New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add gammapy jupyter CLI for developers #1766

Merged
merged 11 commits into from Sep 9, 2018

Conversation

2 participants
@Bultako
Member

Bultako commented Sep 6, 2018

This PR provides a new CLI gammapy jupyter to perform global actions on a list of notebooks present in the current working dir, in a specific folder or for a specific file. The list of actions that provides are: strip output cells, execute, unittest to find broken notebooks and black-format.

$ gammapy jupyter 
Usage: gammapy jupyter [OPTIONS] COMMAND [ARGS]...

  Perform a series of actions on Jupyter notebooks.

  The chosen action is applied for every Jupyter notebook present in the
  current working directory. Option --file allows to chose a single file,
  while option --fold allows to choose a different folder to scan. These
  options are mutually exclusive, only one is allowed.

  Examples
  --------
  
  $ gammapy jupyter stripout
  $ gammapy jupyter --src=mynotebooks.ipynb execute
  $ gammapy jupyter --src=myfolder/tutorials test
  $ gammapy jupyter black

Options:
  --src TEXT  Local folder or Jupyter notebook filename.
  -h, --help  Show this message and exit.

Commands:
  black     Format code cells with black.
  execute   Execute Jupyter notebooks.
  stripout  Strip output cells.
  test      Check if Jupyter notebooks are broken.

@Bultako Bultako force-pushed the Bultako:gp-jupyter branch 2 times, most recently from 4d70d2d to 28ee292 Sep 7, 2018

return output
def build_nblist(ctx):

This comment has been minimized.

@cdeil

cdeil Sep 7, 2018

Member

Suggest to make this a generator function that yields the paths. That means you can remove the lines that make the empty list and return it at the end, and the nblist.append calls become yield statements.
You can then use it as an iterable in the for loop directly, or call list() on the generator if you need a list, but i nthis case you don't need a list.

This comment has been minimized.

@Bultako

Bultako Sep 7, 2018

Member

Done.

"""Comment magic commands when formatting cells."""
lines = input.splitlines(True)
output = ""

This comment has been minimized.

@cdeil

cdeil Sep 7, 2018

Member

Again, this would be little nicer as a generator function that yields the lines.

This comment has been minimized.

@Bultako

Bultako Sep 7, 2018

Member

Well, we need the whole code in cell to send it to black, not only the lines.
I do not see how to a generator may help..

This comment has been minimized.

@cdeil

cdeil Sep 7, 2018

Member

The generator helps a little bit, it reduces the number of lines you have here, and it's also a bit more efficient. Note that Python strings are immutable, so doing "+" repeatedly to build larger and larger strings isn't efficient.
If you yield strings for each line here, then in the caller you can do this:

txt = '\n'.join(comment_magics(input))

I think str.join will accept a generator that yields strings.

It's not a big improvement, just a very minor one.

This comment has been minimized.

@Bultako

Bultako Sep 7, 2018

Member

Ok, I got it 👍
Thanks for the review

ignorelist.append(nbname)
folder = notebook.parent
print('oups')

This comment has been minimized.

@cdeil

cdeil Sep 7, 2018

Member

What's up with this print? Remove or make more clear?

This comment has been minimized.

@Bultako

Bultako Sep 7, 2018

Member

Oups ! removed.

nbformat.write(nb, str(notebook))
# inform
print('Jupyter notebook {} painted in black.'.format(str(notebook)))

This comment has been minimized.

@cdeil

cdeil Sep 7, 2018

Member

Does the mix of print statements and log messages have a logic to it?
Maybe change to log messages everywhere?

This comment has been minimized.

@Bultako

Bultako Sep 7, 2018

Member

done.

@@ -37,10 +37,10 @@ dependencies:
- pandoc==2.1.3
- pylint
- flake8
- nbstripout

This comment has been minimized.

@cdeil

cdeil Sep 7, 2018

Member

I'm surprised that it's useful to have a dependency to strip output cells.
Shouldn't it be < 5 lines to do this via the nbconvert or nbformat API to loop over output cells and make them empty?
Of course, if it's nontrivial / useful to use nbstripout, OK to add the dependency.

This comment has been minimized.

@Bultako

Bultako Sep 7, 2018

Member

ok, done.
anyways, I have foreseen for developers to use nbstripout to configure a git filter before committing notebooks, so thought would be installed in the devel environment. But I agree, better to reduce dependencies.

@cdeil cdeil added the docs label Sep 7, 2018

@cdeil cdeil added this to To Do in Documentation via automation Sep 7, 2018

@cdeil cdeil added this to the 0.9 milestone Sep 7, 2018

@Bultako Bultako force-pushed the Bultako:gp-jupyter branch 2 times, most recently from cd611d7 to 94954c3 Sep 7, 2018

# strip output cells
for cellnumber, cell in enumerate(nb.cells):
if nb.cells[cellnumber]['cell_type'] == 'code':
nb.cells[cellnumber]['outputs'] = []

This comment has been minimized.

@cdeil

cdeil Sep 7, 2018

Member

Is this equivalent?

for cell in nb.cells:
    if cell['cell_type'] == 'code':
        cell['outputs'] = []
for notebook in nblist:
for notebook in build_nblist(ctx):
try:
t = time.time()
subprocess.call(
"jupyter nbconvert --allow-errors --ExecutePreprocessor.timeout=None --ExecutePreprocessor.kernel_name=python3 --to notebook --execute '{}'.format(notebook) --inplace",

This comment has been minimized.

@cdeil

cdeil Sep 7, 2018

Member

Can you please break this long string?
Suggest something like this:

subprocess.call(
    "jupyter nbconvert "
    "--allow-errors "
    "--ExecutePreprocessor.timeout=None "
...

Python will just concatenate the strings, and this makes it easy to read and to comment some option out or add code comments in between if something needs an explanation.

semicolon = 0
fmt = '\n'.join(comment_magics(fmt))
#fmt = comment_magics(fmt)
if fmt.endswith(';'):

This comment has been minimized.

@cdeil

cdeil Sep 7, 2018

Member

I think you can just put:

has_semicolon = fmt.endswith(';')

and remove the if here and line semicolon = 0 above.

# write formatted notebook
nbformat.write(nb, str(notebook))
# inform

This comment has been minimized.

@cdeil

cdeil Sep 7, 2018

Member

remove comment

fmt = fmt.replace("###-MAGIC COMMAND-", "")
nb.cells[cellnumber]['source'] = fmt
# write formatted notebook

This comment has been minimized.

@cdeil

cdeil Sep 7, 2018

Member

remove comment, just restates the next code line.

try:
semicolon = 0
fmt = '\n'.join(comment_magics(fmt))
#fmt = comment_magics(fmt)

This comment has been minimized.

@cdeil

cdeil Sep 7, 2018

Member

remove comment.

def cli_jupyter_black(ctx):
"""Format code cells with black."""
for notebook in build_nblist(ctx):

This comment has been minimized.

@cdeil

cdeil Sep 7, 2018

Member

What you call notebook here are Path objects, not Notebook objects?
Maybe rename the variable to "path"?

@Bultako Bultako force-pushed the Bultako:gp-jupyter branch 5 times, most recently from fccfab5 to 59d1ad3 Sep 7, 2018

@Bultako

This comment has been minimized.

Member

Bultako commented Sep 8, 2018

@cdeil

I have refactored most of the code and removed the testipynb dependency. The functionality for testing notebooks is coded in the CLI script jupyter.py, so no need to call functions in gammapy.utils. We could discuss later if we move all these functionalities to gammapy.utils, but I think that for the moment they could live in gammapy jupyter CLI, remove scripts black_notebooks.py and test_notebooks.py, and close PR #1765

Surely there are still fixes to address in code review, and I thank you for your patience.

@Bultako Bultako force-pushed the Bultako:gp-jupyter branch 4 times, most recently from 81d6802 to 14269b4 Sep 8, 2018

@Bultako Bultako force-pushed the Bultako:gp-jupyter branch 5 times, most recently from 45b6f6c to 6ed5b35 Sep 9, 2018

log = logging.getLogger(__name__)
try:
import nbformat

This comment has been minimized.

@cdeil

cdeil Sep 9, 2018

Member

You can't have this at the top of the module, it needs to be a delayed import in the functions that use this.
See https://ci.appveyor.com/project/cdeil/gammapy/build/1.0.4455/job/qfhona74oc6yy5sw#L1569

I would just let the ImportError occur. We do this even for users e.g. in functions that use scipy or matplotlib in Gammapy, so even more so for dev tools I think it's OK to see an ImportError occur.

This comment has been minimized.

@Bultako

Bultako Sep 9, 2018

Member

What do you mean by a 'delayed import'? Am I supposed to leave the imports even if CI fails? I was struggling to fix the CI by any means.

This comment has been minimized.

@cdeil

cdeil Sep 9, 2018

Member

With delayed import I mean to put the import into the function where you use it on the first line. I don't know if there's a proper term for that.

Then CI and tests will not fail if they just import the module.
And when writing tests for code that uses an optional dependency, we have the requires_dependency decorator that we use to mark each such test function.

@click.pass_context
def cli_jupyter_black(ctx):
"""Format code cells with black."""
try:

This comment has been minimized.

@cdeil

cdeil Sep 9, 2018

Member

Same here, I'd suggest to remove the try / except and just import black.

fmt = fmt.replace(self.MAGIC_TAG, "")
cell['source'] = fmt
def tag_magics(self, cellcode):

This comment has been minimized.

@cdeil

cdeil Sep 9, 2018

Member

Make this a @staticmethod since self isn't accessed?

This comment has been minimized.

@Bultako

Bultako Sep 9, 2018

Member

There is self.MAGIC_TAG
How about moving self.MAGIC_TAG out of the function and have it s a global var in jupyter.py? At this moment we could have @staticmethod

This comment has been minimized.

@cdeil

cdeil Sep 9, 2018

Member

No, then just leave as-is. It's perfectly fine, I just didn't see the access to self in this method.

This comment has been minimized.

@cdeil

cdeil Sep 9, 2018

Member

Better to have everything on this one class. Then it's easier to maintain and others can copy & paste it.

Actually, I think a separate blacknb or black-notebook Python package that does this would find a userbase; or we could file an issue in one of the Jupyter repos asking if they think it's appropriate to be included there directly.

sys.exit()
if path.is_dir():
for f in path.iterdir():

This comment has been minimized.

@cdeil

cdeil Sep 9, 2018

Member

I think this would be equivalent and a tiny bit shorter:

if path.is_dir():
    path = path.glob('*.ipynb')

This comment has been minimized.

@cdeil

cdeil Sep 9, 2018

Member

Probably making a list instead of a generator here is less error-prone (e.g. if it's looped over twice a generator will be empty on the second go):

if path.is_dir():
    paths = list(path.glob('*.ipynb'))

@Bultako Bultako force-pushed the Bultako:gp-jupyter branch from 6ed5b35 to 09ef949 Sep 9, 2018

@Bultako Bultako force-pushed the Bultako:gp-jupyter branch from 09ef949 to da39581 Sep 9, 2018

@Bultako Bultako merged commit 8708094 into gammapy:master Sep 9, 2018

1 of 4 checks passed

Scrutinizer Analysis: Errored – Tests: errored
Details
continuous-integration/travis-ci/pr The Travis CI build could not complete due to an error
Details
Codacy/PR Quality Review Not up to standards. This pull request quality could be better.
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details

Documentation automation moved this from To Do to Done Sep 9, 2018

@Bultako Bultako deleted the Bultako:gp-jupyter branch Sep 9, 2018

@cdeil cdeil changed the title from gammapy jupyter CLI to perform global actions on notebooks to Add gammapy jupyter CLI for developers Nov 23, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment