Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add missingDocs linter #1390

Merged
merged 1 commit into from
Aug 1, 2022
Merged

Conversation

digama0
Copy link
Collaborator

@digama0 digama0 commented Jul 30, 2022

The main addition is a linter for missing doc strings. It's approximately similar to the mathlib doc string linter, except that it works directly on syntax instead of looking at the environment. (I think new API is required to be able to quickly find out what declarations were added by a given command, if we want the environment-sensitive version like the mathlib4 linter. Otherwise it would be too slow to run after each command.)

The linter does not yet support any options and is not yet extensible to new user-defined commands. The extensibility should be fairly straightforward, and linter options require feedback from users to actually turn the linter on and get too many or too few linter warnings.

This linter is disabled by default, and of course it is not enabled in the lean repo itself. (Fun fact, if you do you get 10391 missing doc string warnings.) It hasn't yet been tested on mathlib, but hopefully it should be possible to enable there.

Declarations which will be linted if they don't have a docstring (unless the declaration is local / private):

  • abbrev, axiom, declare_config_elab, declare_simp_like_tactic, declare_syntax_cat, def, elab, (class) inductive, (builtin_)initialize, macro, opaque, register_(builtin_)option, register_simp_attr, structure / class, syntax

Declarations that take a docstring but are not linted:

  • elab_rules, example, inductive, instance, macro_rules, theorem, unif_hint

Other modifications:

  • The default setting of linter.all has been changed to false, since we don't want all linters on by default (this will surely not be the last overly-pedantic lint).
  • The getLinterAll function has been largely replaced by getLinterValue, which does the correct defaulting for a linter setting: if the specific linter option is set, use that, else use linter.all if set, else use the specific linter default value.
  • declare_simp_like_tactic takes a docstring now (and it is linted). It probably won't get external use but those tactics need docstrings anyway.
  • Some commands have been given explicit names so I could refer to them. (I'm not sure whether it would be better to use syntax match since I'm worried about the performance impact, linters are called everywhere and we can't exit early because the user can turn on the linter in the middle of the command.)

@leodemoura leodemoura merged commit c952c69 into leanprover:master Aug 1, 2022
@Kha
Copy link
Member

Kha commented Aug 1, 2022

image
Hmm, this is a bit puzzling.

@digama0
Copy link
Collaborator Author

digama0 commented Aug 1, 2022

There is some performance cost here to running the linter, but I don't see any reason for the huge memory increase. The linter doesn't even have any state?

I don't know much about how profiling works around here. Where did you get that report, and how can I test it locally?

@digama0 digama0 deleted the missing_docs branch August 1, 2022 09:37
@Kha
Copy link
Member

Kha commented Aug 1, 2022

The report is at http://speedcenter.informatik.kit.edu/velcom/run-detail/1f93e04e-143b-4d33-b060-15d9f036265a, the benchmark spec is at

- attributes:
description: stdlib
tags: [slow]
time: &time
#runner: time
# alternative config: use `perf stat` for extended properties
runner: perf_stat
perf_stat:
properties: ['wall-clock', 'task-clock', 'instructions', 'branches', 'branch-misses']
rusage_properties: ['maxrss']
run_config:
<<: *time
cmd: |
bash -c 'set -eo pipefail; make LEAN_OPTS="-Dprofiler=true -Dprofiler.threshold=9999" -C ${BUILD:-../../build/release}/stage2 --output-sync --always-make -j5 make_stdlib 2>&1 > log | ./accumulate_profile.py'
max_runs: 2
parse_output: true
# initialize stage2 cmake + warmup
build_config:
cmd: |
bash -c 'make -C ${BUILD:-../../build/release} stage2 -j8'
. In short, it's a parallel build of stage 2. Measuring maxrss for a parallel build is a bit questionable now that I think about it, but it does look quite stable: http://speedcenter.informatik.kit.edu/velcom/repo-detail/bab50d33-13a1-40c4-85a0-fda5f93f8b22?zoomXStart=1646866800000&zoomXEnd=1659391200000&dimensions=stdlib%3Amaxrss&dayEquidistant=true
I don't think I've ever debugged a maxrss regression, but my hope would be that you can reproduce it locally using /usr/bin/time -v (on e.g. compilation of a random file, assuming that the regression is not specific to one or a few specific files).

@digama0
Copy link
Collaborator Author

digama0 commented Aug 1, 2022

@Kha
Copy link
Member

Kha commented Aug 1, 2022

Could be! But that in itself would be worth investigating.

@Kha
Copy link
Member

Kha commented Aug 2, 2022

Yes, it's MissingDocs.lean itself. Perhaps the code generator elaborator is struggling with the enormous match.
image

@Kha
Copy link
Member

Kha commented Aug 4, 2022

I did look into this a bit more, partially to test a new profiler, which apparently does a much better job at capturing deep stack traces (up to a limit of 64KB hard-coded in the kernel). Deep enough apparently that both Firefox and Chromium struggle to display even a small slice of the generated flamegraph, so I found a website dedicated to fast rendering of flamegraphs: https://www.speedscope.app/. You can load https://pp.ipd.kit.edu/~ullrich/tmp/coll (9MB) into it to get a pretty complete callgraph of lean src/Lean/Linter/MissingDocs.lean, only the last ~30% seem to be garbage because of the kernel stack snapshot limit.
@leodemoura What I found interesting is that most of the time is spent inside elimMVarDeps. I don't know if that is to be expected, and whether further investigations make sense at this point. It does also suggest that it might be good if we could inline small functions like MkBinding.visit in mutual recursions, both to save half of the stack frames, and to make it easier for analysis tools to collapse the recursion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants