feat: add missingDocs linter #1390

digama0 · 2022-07-30T11:33:24Z

The main addition is a linter for missing doc strings. It's approximately similar to the mathlib doc string linter, except that it works directly on syntax instead of looking at the environment. (I think new API is required to be able to quickly find out what declarations were added by a given command, if we want the environment-sensitive version like the mathlib4 linter. Otherwise it would be too slow to run after each command.)

The linter does not yet support any options and is not yet extensible to new user-defined commands. The extensibility should be fairly straightforward, and linter options require feedback from users to actually turn the linter on and get too many or too few linter warnings.

This linter is disabled by default, and of course it is not enabled in the lean repo itself. (Fun fact, if you do you get 10391 missing doc string warnings.) It hasn't yet been tested on mathlib, but hopefully it should be possible to enable there.

Declarations which will be linted if they don't have a docstring (unless the declaration is local / private):

abbrev, axiom, declare_config_elab, declare_simp_like_tactic, declare_syntax_cat, def, elab, (class) inductive, (builtin_)initialize, macro, opaque, register_(builtin_)option, register_simp_attr, structure / class, syntax

Declarations that take a docstring but are not linted:

elab_rules, example, inductive, instance, macro_rules, theorem, unif_hint

Other modifications:

The default setting of linter.all has been changed to false, since we don't want all linters on by default (this will surely not be the last overly-pedantic lint).
The getLinterAll function has been largely replaced by getLinterValue, which does the correct defaulting for a linter setting: if the specific linter option is set, use that, else use linter.all if set, else use the specific linter default value.
declare_simp_like_tactic takes a docstring now (and it is linted). It probably won't get external use but those tactics need docstrings anyway.
Some commands have been given explicit names so I could refer to them. (I'm not sure whether it would be better to use syntax match since I'm worried about the performance impact, linters are called everywhere and we can't exit early because the user can turn on the linter in the middle of the command.)

Kha · 2022-08-01T09:31:41Z

Hmm, this is a bit puzzling.

digama0 · 2022-08-01T09:36:21Z

There is some performance cost here to running the linter, but I don't see any reason for the huge memory increase. The linter doesn't even have any state?

I don't know much about how profiling works around here. Where did you get that report, and how can I test it locally?

Kha · 2022-08-01T10:22:01Z

The report is at http://speedcenter.informatik.kit.edu/velcom/run-detail/1f93e04e-143b-4d33-b060-15d9f036265a, the benchmark spec is at

lean4/tests/bench/speedcenter.exec.velcom.yaml

Lines 1 to 20 in c76fa06

    
           - attributes: 
        
               description: stdlib 
        
               tags: [slow] 
        
               time: &time 
        
                 #runner: time 
        
                 # alternative config: use `perf stat` for extended properties 
        
                 runner: perf_stat 
        
                 perf_stat: 
        
                   properties: ['wall-clock', 'task-clock', 'instructions', 'branches', 'branch-misses'] 
        
                 rusage_properties: ['maxrss'] 
        
             run_config: 
        
               <<: *time 
        
               cmd: | 
        
                 bash -c 'set -eo pipefail; make LEAN_OPTS="-Dprofiler=true -Dprofiler.threshold=9999" -C ${BUILD:-../../build/release}/stage2 --output-sync --always-make -j5 make_stdlib 2>&1 > log | ./accumulate_profile.py' 
        
               max_runs: 2 
        
               parse_output: true 
        
             # initialize stage2 cmake + warmup 
        
             build_config: 
        
               cmd: | 
        
                 bash -c 'make -C ${BUILD:-../../build/release} stage2 -j8'

. In short, it's a parallel build of stage 2. Measuring maxrss for a parallel build is a bit questionable now that I think about it, but it does look quite stable: http://speedcenter.informatik.kit.edu/velcom/repo-detail/bab50d33-13a1-40c4-85a0-fda5f93f8b22?zoomXStart=1646866800000&zoomXEnd=1659391200000&dimensions=stdlib%3Amaxrss&dayEquidistant=true
I don't think I've ever debugged a maxrss regression, but my hope would be that you can reproduce it locally using /usr/bin/time -v (on e.g. compilation of a random file, assuming that the regression is not specific to one or a few specific files).

digama0 · 2022-08-01T10:47:23Z

The maxrss measurement does seem to be pretty noise-free, and something weird definitely happened on this commit, but notice that it's also only showing up in the stdlib-maxrss measurement, the other maxrss benchmarks are stable: http://speedcenter.informatik.kit.edu/velcom/repo-detail/bab50d33-13a1-40c4-85a0-fda5f93f8b22?zoomXStart=1659178852314.3838&zoomXEnd=1659344250354.632&dimensions=deriv%3Amaxrss%3A%3Aunionfind%3Amaxrss%3A%3Abinarytrees%3Amaxrss%3A%3Arbmap%3Amaxrss%3A%3Anew%20parser%20Core.lean%3Amaxrss%3A%3Arbmap_10%3Amaxrss%3A%3Astdlib%3Amaxrss%3A%3Aqsort%3Amaxrss%3A%3Arbmap_1%3Amaxrss%3A%3Aliasolver%3Amaxrss%3A%3Atests%2Fbench%2F%20interpreted%3Amaxrss%3A%3Aconst_fold%3Amaxrss&dayEquidistant=true

So it might be that compiling the new file itself takes a lot of memory?

Kha · 2022-08-01T10:55:32Z

Could be! But that in itself would be worth investigating.

Kha · 2022-08-02T11:07:43Z

Yes, it's MissingDocs.lean itself. Perhaps the ~~code generator~~ elaborator is struggling with the enormous match.

Kha · 2022-08-04T11:06:03Z

I did look into this a bit more, partially to test a new profiler, which apparently does a much better job at capturing deep stack traces (up to a limit of 64KB hard-coded in the kernel). Deep enough apparently that both Firefox and Chromium struggle to display even a small slice of the generated flamegraph, so I found a website dedicated to fast rendering of flamegraphs: https://www.speedscope.app/. You can load https://pp.ipd.kit.edu/~ullrich/tmp/coll (9MB) into it to get a pretty complete callgraph of lean src/Lean/Linter/MissingDocs.lean, only the last ~30% seem to be garbage because of the kernel stack snapshot limit.
@leodemoura What I found interesting is that most of the time is spent inside elimMVarDeps. I don't know if that is to be expected, and whether further investigations make sense at this point. It does also suggest that it might be good if we could inline small functions like MkBinding.visit in mutual recursions, both to save half of the stack frames, and to make it easier for analysis tools to collapse the recursion.

digama0 force-pushed the missing_docs branch from 8f68921 to 868fbf8 Compare July 30, 2022 11:48

feat: add missingDocs linter

8e16d20

digama0 force-pushed the missing_docs branch from 868fbf8 to 8e16d20 Compare July 30, 2022 11:53

leodemoura merged commit c952c69 into leanprover:master Aug 1, 2022

digama0 deleted the missing_docs branch August 1, 2022 09:37

digama0 mentioned this pull request Aug 3, 2022

feat: make MissingDocs extensible #1414

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add missingDocs linter #1390

feat: add missingDocs linter #1390

digama0 commented Jul 30, 2022 •

edited

Loading

Kha commented Aug 1, 2022

digama0 commented Aug 1, 2022 •

edited

Loading

Kha commented Aug 1, 2022

digama0 commented Aug 1, 2022

Kha commented Aug 1, 2022

Kha commented Aug 2, 2022 •

edited

Loading

Kha commented Aug 4, 2022 •

edited

Loading

feat: add missingDocs linter #1390

feat: add missingDocs linter #1390

Conversation

digama0 commented Jul 30, 2022 • edited Loading

Kha commented Aug 1, 2022

digama0 commented Aug 1, 2022 • edited Loading

Kha commented Aug 1, 2022

digama0 commented Aug 1, 2022

Kha commented Aug 1, 2022

Kha commented Aug 2, 2022 • edited Loading

Kha commented Aug 4, 2022 • edited Loading

digama0 commented Jul 30, 2022 •

edited

Loading

digama0 commented Aug 1, 2022 •

edited

Loading

Kha commented Aug 2, 2022 •

edited

Loading

Kha commented Aug 4, 2022 •

edited

Loading