Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Merged by Bors] - ci(scripts/*): linting for copyright, imports, module docstrings, line length #4486

Closed
wants to merge 28 commits into from

Conversation

bryangingechen
Copy link
Collaborator

@bryangingechen bryangingechen commented Oct 6, 2020

This PR adds some scripts to check the .lean files in src/ and archive/ for the following issues (which are out of scope for the current linter):

  • Malformed or missing copyright header
  • More than one file imported per line
  • Module docstring missing, or too late
  • Lines of length > 100 (unless they contain http)

The scripts are run at the end of our "tests" job since the "lint" job usually takes longer to run. (This isn't a big deal though, since they're quick.)

Current problems are saved in the file scripts/copy-mod-doc-exceptions.txt and ignored so that we don't have to fix everything up front. Over time, this should get shorter as we fix things!

Separately, this also fixes some warnings in our GitHub actions workflow (see this blog post).

Co-authored-by: Johan Commelin johan@commelin.net


We could potentially add a line-length linter to this. Any other ideas?

Reminder for myself; after this is merged, add a step that calls scripts/update-copy-mod-doc-exceptions.sh to the nolints workflow in azure-scripts.

@bryangingechen bryangingechen added awaiting-review The author would like community review of the PR CI This issue or PR is about continuous integration labels Oct 6, 2020
@fpvandoorn
Copy link
Member

Any other ideas

It could test a lot of the style guide automatically. However, at that point, we might want to integrate it into a syntactic linter that runs continuously when editing a file.
Some easy ideas: Spaces (or ` in comments) after (a fixed list of) binders, spaces around (a fixed list of) infix operators.

@bryangingechen
Copy link
Collaborator Author

bryangingechen commented Oct 7, 2020

I played around with adding "ERR_LLN" to catch lines > 100 characters. The script counted >3000 long lines in src/ and archive/. Thinking about it more though, there are a number of cases where long lines are OK (e.g. long URLs in comments, various cases where reflowing code would make it more difficult to read) and making contributors edit the exceptions file by hand could be a pain, so I'm not sure we want to strictly enforce this now.

If we want to warn about new long lines but still allow merging, we could do something complicated like only check for long lines when the branch isn't staging or master, but I'm not sure that would be worth it.

@fpvandoorn
Copy link
Member

Makes sense, there are indeed allowed exceptions for long lines.

Here is another interesting Copyright header failure: https://github.com/leanprover-community/mathlib/blob/master/archive/sensitivity.lean
Most other failures seem to be the old style to have a 1-line description of the file in the same comment as the copyright header (example). Those failures are all probably caught by the missing module docstring test, so you could consider removing the test for copy_start_line_nr + 4.

@bryangingechen
Copy link
Collaborator Author

Oh good catch. That actually reminds me, I committed a file today that fails this too: 7948b5a#diff-e6caf1da183d89ccee7cd7361943f0d7.

I think I'll just try to collect all lines between copy_started and copy_done into one string and then search that for the magic keywords "Copyright (c)", "Apache" and "Author". (Hoping that nobody tries to do anything really strange just to get around this...)

@bryangingechen
Copy link
Collaborator Author

Here's another case of a valid copyright header that fails because the opening /- is not on its own line:

/- Copyright (c) 2018 Johannes Hölzl. All rights reserved.
I guess there's an argument for making these more uniform though.

@jcommelin
Copy link
Member

My idea was indeed to have a non-urgent effort to make the copyright headers more uniform.
Arguably we could even ask that there are no empty lines above the copyright header.

@bryangingechen
Copy link
Collaborator Author

Arguably we could even ask that there are no empty lines above the copyright header.

I'd be fine with that. (Feel free to push more commits to this branch if you have more tweaks you'd like to make, I'm done until tomorrow.)

Copy link
Member

@jcommelin jcommelin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with this

bryangingechen and others added 9 commits October 7, 2020 06:27
A few files with missing copyright headers in #4477 came from splitting up older files, so the attribution was incorrect:
- `data/setoid/partition.lean` was split from `data/setoid.lean` in #2853.
- `data/finset/order.lean` was split from `algebra/big_operators.lean` in #3495.
- `group_theory/submonoid/operations.lean` was split from `group_theory/submonoid.lean` in  #3058.
…t{_prop} (#4464)

This is the first PR in a series on a fundamental theorem for Witt polynomials.


Co-Authored-By: Rob Y. Lewis <rob.y.lewis@gmail.com>
…(support f) > 0 (#4410)

Also add a few supporting lemmas
Add the lemma that, for natural numbers, if `a < b` and `c < d` then
`a + c + 1 < b + d` (i.e. a stronger version of `add_lt_add` for the
natural number case).  `library_search` did not find this in mathlib.
Docstrings in `ring_theory/ideal/operations`, `ring_theory/multiplicity`, and `ring_theory/ring_invo`.
…ations (#4487)

Added composition law for applicative transformations, added rest of interface for coercion of applicative transformations to functions (lifted from `monoid_hom`), and proved composition was associative and has an identity.  Also corrected some documentation.
Adds docstrings to `rel_embedding.nat_lt` and `rel_embedding.nat_gt`
@@ -30,8 +30,8 @@ jobs:
set -o pipefail
curl https://raw.githubusercontent.com/Kha/elan/master/elan-init.sh -sSf | sh -s -- --default-toolchain none -y
~/.elan/bin/lean --version
echo "::add-path::$HOME/.elan/bin"
echo "::set-env name=short_lean_version::$(~/.elan/bin/lean --run scripts/lean_version.lean)"
echo "$HOME/.elan/bin" >> $GITHUB_PATH
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not unhappy with the change, but it's completely unrelated, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, Bryan mentioned a link to a GH blog post about some vulnerability that is fixed by this change.

scripts/lint-copy-mod-doc.py Outdated Show resolved Hide resolved
scripts/nolints.txt Outdated Show resolved Hide resolved
src/control/traversable/basic.lean Outdated Show resolved Hide resolved
awainverse and others added 3 commits October 7, 2020 07:59
Adds missing docstrings in `group_theory/perm/cycles` and `group_theory/perm/sign`.
An inhabited instance for `presented_group`
Docstrings in `group_theory/abelianization` and `group_theory/congruence`.



Co-authored-by: Aaron Anderson <65780815+awainverse@users.noreply.github.com>
Co-authored-by: Bryan Gin-ge Chen <bryangingechen@gmail.com>
@@ -30,8 +30,8 @@ jobs:
set -o pipefail
curl https://raw.githubusercontent.com/Kha/elan/master/elan-init.sh -sSf | sh -s -- --default-toolchain none -y
~/.elan/bin/lean --version
echo "::add-path::$HOME/.elan/bin"
echo "::set-env name=short_lean_version::$(~/.elan/bin/lean --run scripts/lean_version.lean)"
echo "$HOME/.elan/bin" >> $GITHUB_PATH
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, Bryan mentioned a link to a GH blog post about some vulnerability that is fixed by this change.

scripts/update-copy-mod-doc-exceptions.sh Outdated Show resolved Hide resolved
jcommelin and others added 3 commits October 7, 2020 10:03
# This is a little wrapper around the python script
# lint-copy-mod-doc.py

find src archive -name '*.lean' | xargs ./scripts/lint-copy-mod-doc.py | sort > scripts/copy-mod-doc-exceptions.txt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate the creative use of the > operator that makes lint-copy-mod-doc.py read its own output, which happens to be empty at first.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it too evil?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be be fine. I don't know if the shell guarantees to open the file before launching the subprocesses, but even if it doesn't find takes long enough to prevent the race condition in practice.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, it looks like the > line got removed. I'd prefer to have that done in this script. I'll open a new PR.

@jcommelin jcommelin changed the title ci(scripts/*): linting for copyright, imports, module docstrings ci(scripts/*): linting for copyright, imports, module docstrings, line length Oct 7, 2020
@gebner
Copy link
Member

gebner commented Oct 7, 2020

bors r+

@github-actions github-actions bot added ready-to-merge All that is left is for bors to build and merge this PR. (Remember you need to say `bors r+`.) and removed awaiting-review The author would like community review of the PR labels Oct 7, 2020
bors bot pushed a commit that referenced this pull request Oct 7, 2020
…e length (#4486)

This PR adds some scripts to check the `.lean` files in `src/` and `archive/` for the following issues (which are out of scope for the current linter):
- Malformed or missing copyright header
- More than one file imported per line
- Module docstring missing, or too late
- Lines of length > 100 (unless they contain `http`)

The scripts are run at the end of our "tests" job since the "lint" job usually takes longer to run. (This isn't a big deal though, since they're quick.)

Current problems are saved in the file `scripts/copy-mod-doc-exceptions.txt` and ignored so that we don't have to fix everything up front. Over time, this should get shorter as we fix things!

Separately, this also fixes some warnings in our GitHub actions workflow (see [this blog post](https://github.blog/changelog/2020-10-01-github-actions-deprecating-set-env-and-add-path-commands/)).

Co-authored-by: Johan Commelin <johan@commelin.net>



Co-authored-by: Bryan Gin-ge Chen <bryangingechen@gmail.com>
Co-authored-by: damiano <adomani@gmail.com>
Co-authored-by: Yury G. Kudryashov <urkud@urkud.name>
Co-authored-by: Joseph Myers <jsm@polyomino.org.uk>
Co-authored-by: hrmacbeth <25316162+hrmacbeth@users.noreply.github.com>
Co-authored-by: Aaron Anderson <awainverse@gmail.com>
Co-authored-by: Kyle Miller <kmill31415@gmail.com>
Co-authored-by: leanprover-community-bot <leanprover.community@gmail.com>
@bors
Copy link

bors bot commented Oct 7, 2020

Build failed (retrying...):

@gebner
Copy link
Member

gebner commented Oct 7, 2020

This PR shouldn't cause any recompilation, so we're probably better off merging it by itself before it is batched together with yet another long-lined PR.

bors p=10

bors bot pushed a commit that referenced this pull request Oct 7, 2020
…e length (#4486)

This PR adds some scripts to check the `.lean` files in `src/` and `archive/` for the following issues (which are out of scope for the current linter):
- Malformed or missing copyright header
- More than one file imported per line
- Module docstring missing, or too late
- Lines of length > 100 (unless they contain `http`)

The scripts are run at the end of our "tests" job since the "lint" job usually takes longer to run. (This isn't a big deal though, since they're quick.)

Current problems are saved in the file `scripts/copy-mod-doc-exceptions.txt` and ignored so that we don't have to fix everything up front. Over time, this should get shorter as we fix things!

Separately, this also fixes some warnings in our GitHub actions workflow (see [this blog post](https://github.blog/changelog/2020-10-01-github-actions-deprecating-set-env-and-add-path-commands/)).

Co-authored-by: Johan Commelin <johan@commelin.net>



Co-authored-by: Bryan Gin-ge Chen <bryangingechen@gmail.com>
Co-authored-by: damiano <adomani@gmail.com>
Co-authored-by: Yury G. Kudryashov <urkud@urkud.name>
Co-authored-by: Joseph Myers <jsm@polyomino.org.uk>
Co-authored-by: hrmacbeth <25316162+hrmacbeth@users.noreply.github.com>
Co-authored-by: Aaron Anderson <awainverse@gmail.com>
Co-authored-by: Kyle Miller <kmill31415@gmail.com>
Co-authored-by: leanprover-community-bot <leanprover.community@gmail.com>
Comment on lines +147 to +149

- name: check for copyright headers and module docstrings
run: |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be run before any of the really slow build steps? I don't want to wait hours for a build, only for it to then say my line is too long and have to start all over again because the file I wrapped the line in needs rebuilding.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My reasoning was the opposite; since fixing the issues caught by this script shouldn't typically create new build / linter issues, it's better to report those "more serious" problems first so that those can be addressed / discussed. For example, fixing a build / linter issue (e.g. from a github suggestion) might temporarily make a line too long, and I think it'd be better to first check that the suggestion "works" rather than force people to reflow everything up front.

Feel free to bring this up on Zulip, I'd be curious to hear others' thoughts on this as well.

Note that you should be able to run this script manually by executing ./scripts/lint-copy-mod-doc.sh.

@eric-wieser
Copy link
Member

Current problems are saved in the file scripts/copy-mod-doc-exceptions.txt and ignored so that we don't have to fix everything up front. Over time, this should get shorter as we fix things!

Does this handle the case where new lines are inserted and the line numbers change?

@bryangingechen
Copy link
Collaborator Author

Does this handle the case where new lines are inserted and the line numbers change?

Line numbers are ignored when reading exceptions back, so yes.

@gebner
Copy link
Member

gebner commented Oct 7, 2020

Can this be run before any of the really slow build steps? I don't want to wait hours for a build, only for it to then say my line is too long and have to start all over again because the file I wrapped the line in needs rebuilding.

It might actually be best to run it in a separate actions workflow/job altogether since it doesn't even depend on the build step. The build step should still be run even if the linter fails.

But please do that in a new PR. This one is almost merged.

@bors
Copy link

bors bot commented Oct 7, 2020

Pull request successfully merged into master.

Build succeeded:

@bors bors bot changed the title ci(scripts/*): linting for copyright, imports, module docstrings, line length [Merged by Bors] - ci(scripts/*): linting for copyright, imports, module docstrings, line length Oct 7, 2020
@bors bors bot closed this Oct 7, 2020
@bors bors bot deleted the mod-doc-lint branch October 7, 2020 16:07
bors bot pushed a commit that referenced this pull request Oct 8, 2020
…ions.sh (#4513)

Followup to #4486:
- run the linter in a separate parallel job, per request
- the update-*.sh script now correctly generates a full exceptions file
- add some more comments to the shell scripts
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI This issue or PR is about continuous integration ready-to-merge All that is left is for bors to build and merge this PR. (Remember you need to say `bors r+`.)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet