Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add notebook formatting and CI check #124

Merged
merged 11 commits into from Jan 25, 2021
Merged

Conversation

rmlarose
Copy link
Contributor

Fixes #56.

Copy link
Collaborator

@mpharrigan mpharrigan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oof that diff.

I'll need to download this and look at it in pycharm's differ (which handles the change in indent much more elegantly). Is it just changing the indent or are there additional changes?

How hard would it be to maintain a list of grandfathered-in 2-indent notebooks so we can kick this can down the road some more?

#!/usr/bin/env bash

###############################################################################
# Formats ipython notebooks with tensorflow-docs nbformat tool.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you write this or take it from somewhere?

If the former: ick bash. I guess it's ok here since you aren't doing looping (which is my rule of thumb for when a script definitely needs to be in a real programming langauge)

If the latter: note where you got it from so we can keep them in sync

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is taken directly from Cirq: https://github.com/quantumlib/Cirq/blob/master/check/nbformat

In addition to indentation, metadata is also removed and probably a few other things - @lamberta may know more here.

Not sure why the diff is so large - may be because GH has trouble analyzing notebooks but I'll check. At least by quick glance the notebooks are very similar before/after formatting.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try passing --indent=2 to nbfmt. The default is 4 but the quantum folks prefer 2

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lamberta I think the collab default is 2 but the jupyter default is 1 and we prefer 1

@rmlarose the diff is messed up because of the indentation of the json in the ipynb files. Currently recirq has some with 2 and some with 1

cd "$(git rev-parse --show-toplevel)"

# Install a pinned version of tf-docs
TF_DOCS_COMMIT=b3dc8a922d8bdc6e998a8ad4f4953359dc6e576a
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pinning a commit makes sense, but we don't really do releases in tensorflow/docs so this commit is arbitrary.

It might be better to download the cirq dev_tools/nbformat script into your GitHub Actions VM so you're at least using the same thing and don't get too out of sync.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 downloading cirq's script

@rmlarose
Copy link
Contributor Author

Diff is about 1/2 as large with indent=2. (But note Cirq sets indent=1.)

How hard would it be to maintain a list of grandfathered-in 2-indent notebooks so we can kick this can down the road some more?

I think it's fine to punt on this. This PR is the simplest fix for #56 using the tool already in Cirq, but it sounds like there's some things to decide on.

It might be better to download the cirq dev_tools/nbformat script into your GitHub Actions VM so you're at least using the same thing and don't get too out of sync.

Great point. So would we want to just grab this file from Cirq in CI to make sure they're the same? I guess this works as long as the file location doesn't change.

@lamberta
Copy link
Collaborator

Ah, you're right (and I misspoke): nbfmt indent defaults to 2 and the quantum folks mentioned they prefer 1.

So would we want to just grab this file from Cirq in CI to make sure they're the same?

Curl'ing the file is probably easiest for a one-off. Though you can checkout the entire Cirq repo (shallow clone) next to this one using the GitHub Actions checkout module—if there's additional tooling in that repo you needed to access.

@mpharrigan
Copy link
Collaborator

How hard would it be to maintain a list of grandfathered-in 2-indent notebooks

If we use Cirq's script (which I think we should) then we can't do this (without a lot of headache) so I'm fine "biting the bullet" now.

Like I said, pycharm seems to handle the change-in-indent diffs much better so I'll review the PR that way once it's finalized with indent=1 and using cirq's script

@mpharrigan
Copy link
Collaborator

Another fun trick is that you can append ?w=1 to a github diff url to omit whitespace changes:

https://github.com/quantumlib/ReCirq/pull/124/files?w=1

@mpharrigan
Copy link
Collaborator

@rmlarose what do you think about wgetting the script from Cirq?

@rmlarose
Copy link
Contributor Author

@rmlarose what do you think about wgetting the script from Cirq?

I like it! I should be able to get to this by the end of the week, just haven't had time yet.

Only potential downside I see is a new contributor to ReCirq wondering how to format a notebook to pass CI. It can be figured out though, and I agree its a priority to be in sync with Cirq.

@mpharrigan
Copy link
Collaborator

Only potential downside I see is a new contributor to ReCirq wondering how to format a notebook to pass CI. It can be figured out though, and I agree its a priority to be in sync with Cirq.

Good point. Maybe we can have a simple shell script in this repo to optionally download and then execute the script

if [[ check/nbfmt doesn't exist ]]
  wget it from cirq
fi

check/nbfmt

which the CI can just call (as well as contributors). [and add the downloaded file to .gitignore so people don't commit it]

@lamberta
Copy link
Collaborator

Only potential downside I see is a new contributor to ReCirq wondering how to format a notebook to pass CI.

True. Depending how fancy you want to get, you can add a GitHub Action to automatically post a comment with instructions to the pull request, for example: tensorflow/docs-l10n#474 (comment)
But the simplest thing is to add a saved reply to your GitHub settings and use that if you're monitoring PRs. And if you have a open source contributing guide you can add instructions there (at least you'll have a consistent place to point).

We actually have a bot set up to automatically format new pull requests. It's nice but requires auth setup for GitHub with a Cloud Function. Probably not worth it for now

@rmlarose
Copy link
Contributor Author

Good point. Maybe we can have a simple shell script in this repo to optionally download and then execute the script

if [[ check/nbfmt doesn't exist ]]
  wget it from cirq
fi

check/nbfmt

which the CI can just call (as well as contributors). [and add the downloaded file to .gitignore so people don't commit it]

Went with this method.

  • One can do dev_tools/nbfmt [--apply] locally which will grab the Cirq nbformat script (if it doesn't already exist) then run it.
  • nbformat added to gitignore.
  • Back to large --indent=1 diff, but on par with Cirq.

Works well I think, but I guess we're kind of crossing our fingers that no one moves or renames nbformat in Cirq.

@mpharrigan
Copy link
Collaborator

we're kind of crossing our fingers that no one moves or renames nbformat in Cirq.

This would break the CI but not any user-facing functionality, i.e. the least-bad breakage. I'll keep an eye on Cirq PRs with this in mind anyways though :)

@@ -39,3 +39,6 @@ docs/generated

# Default pycharm virtual env
.venv/

# Notebook formatting script.
nbformat
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but it downloads into dev_tools/, right? I guess this still works but may be overzealous. Probably doesn't matter too much

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed downloads into dev_tools

}
},
"source": [
"king_moves = cirq.Circuit(\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this deleting code?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it's moving it down lower

Copy link
Collaborator

@mpharrigan mpharrigan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked the diff offline. Looks good; quantum chess notebook is a little wonky. Why are we saving outputs there? And if we are, we should run it through "fresh" so the indices are contiguous from zero

@mpharrigan mpharrigan merged commit 048b099 into quantumlib:master Jan 25, 2021
@rmlarose
Copy link
Contributor Author

quantum chess notebook is a little wonky. Why are we saving outputs there? And if we are, we should run it through "fresh" so the indices are contiguous from zero

Not sure. Maybe there was a good reason for saving output? The first cell starting at 97 is certainly not necessary - at least this doesn't show on the website or on colab though.

See #135 which follows up on this. I can address any remaining comments from this PR there if desired.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add nbfmt check to CI
3 participants