Add odoc lint command for mli and mld validation #226

rizo · 2018-10-23T23:25:39Z

(...) is it feasible to have a odoc-lint binary that that runs over an ml/mli source file and displays any parsing inconsistencies?

This PR implements an odoc lint command that does exactly that. It parser mli and mld files and uses an AST mapper to go through all [@@ocaml.doc ...]/[@@ocaml.text ...] attributes running odoc's parser on them and producing errors (see the test expect files in this PR for examples).

I got excited about this and started implementing a ppx (#147) to validate docstings and report errors with merlin. Unfortunately I was surprised to discover that merlin doesn't convert docstrings into documentation attributes (see ocaml/merlin#876). After that is fixed implementing the ppx should be simple.

Changes

Implement the odoc lint FILE.{ml,mld} command.
Remove an unused id argument from Loader.Attrs.read_attributes.
Expose Attrs module in Loader interface.
Add Compat.Filename.remove_extension.
Add tests for the new lint command.

Notes

The current parser does not recover from errors. This means that for a documentation attributes or mld files that have multiple errors only the first one will be reported (related to New docstrings parser too strict #108).
Some old versions of the compiler don't seem to produce the [@@ocaml.doc] attribute on
polymorphic variants (the CI should report all the culprits). I don't have a workaround for this.
The column number reported in errors seems to be slightly shifted (I'll investigate this soon).
- Update: seems like it is already present in master. The location of the attributes is off by -3, I assume it's because it's not counting (**.
Support for more file formats (like ml) should be easy to implement and can be done separately.
The current implementation could be improved but would require significant changes to the error/warning management in the whole project. @aantron has already plans for that AFAIK.
I think the command could be extended to check for unresolved cross-references in the future.

Thoughts and suggestions are welcome!

- Implement the `odoc lint FILE.{ml,mld}` command. - Remove unused `id` argument from `Loader.Attrs.read_attributes`. - Expose `Attrs` module in `Loader` interface. - Add `Compat.Filename.remove_extension`.

rizo · 2018-10-23T23:42:30Z

As expected there are some failures with the 4.02 compiler in the CI. Specifically:

Done: 1168/1183 (jobs: 1)File "test/parser/lint/expect/ast.expected", line 1, characters 0-0:
Done: 1169/1183 (jobs: 1)--- test/parser/lint/expect/ast.expected	2018-10-23 23:27:26.397396290 +0000
+++ test/parser/lint/ast.output	2018-10-23 23:27:26.473397012 +0000
@@ -14,10 +14,6 @@
 unknown tag '@ConstructorDeclaration'
 File "cases/ast.mli", line 14, characters 1-17:
 unknown tag '@TypeDeclaration'
-File "cases/ast.mli", line 19, characters 6-10:
-unknown tag '@Tag'
-File "cases/ast.mli", line 20, characters 6-10:
-unknown tag '@Tag'
 File "cases/ast.mli", line 25, characters 12-29:
 unknown tag '@LabelDeclaration'
 File "cases/ast.mli", line 26, characters 13-30:
        diff (internal) (exit 1)
(cd _build/default && /usr/bin/diff -u test/parser/lint/expect/ast.expected test/parser/lint/ast.output)

What is not expected is the fact that only the 327.8 build job (the esy configuration) is reported as failed while 327.1 and 327.2 actually failed too (both use the 4.02 compiler). Other jobs don't seem to have any issues.

aantron

The PR looks good technically.

It seems that the main differences between this and odoc compile is that odoc lint reads .mli files, while odoc compile reads .cmti (etc.) files (both read .mld files, but that is not an issue in this point).

This is simpler for the user, but will it be problematic in case of cppo being used, for example?

Sticking with odoc compile and Dune (and bsb?), don't we get, for free, a nice way to lint whole projects?

Note that
odoc lint foo.mli
and
ocamlfind opt -bin-annot -c foo.mli
odoc compile --package foo foo.cmti
produce identical output in terms of errors displayed. The difference, of course, is that the second set of commands produce intermediate files.

Given that, my suggestion would be to use existing commands instead of adding odoc lint, and hide it all behind Dune and/or bsb.

odoc compile's error handling needs improvement. I wrote some text about that but decided to leave it out of this comment, perhaps that is a separate discussion. Even with the current error handling, though, the output is the same.

Some other notes:

Some old versions of the compiler don't seem to produce the [@@ocaml.doc] attribute on polymorphic variants (the CI should report all the culprits). I don't have a workaround for this.

IIRC 4.02 is very broken. At least ocamldoc would lose free text in the middle of a .mli file. I don't know if doc attributes were affected. I suggest to add an unassociated doc comment to the test cases. The one that is there now is at the top of the module, but I suggest to add one in the middle.
The column number reported in errors seems to be slightly shifted (I'll investigate this soon).

Update: seems like it is already present in master. The location of the attributes is off by -3, I assume it's because it's not counting (**.

Indeed. IIRC it's the responsibility of the calling code to pass an adjusted "base" location into the parser (~location argument). Looking at the code in attrs.ml right now, we might not be adjusting it correctly even during odoc compile.

As a little note, we can't unconditionally add 3 because .mld files don't have (**.

aantron · 2018-10-24T02:57:14Z

test/parser/lint/dune

+  (targets parser_errors.output)
+  (action (with-stderr-to parser_errors.output
+		        (run %{workspace_root}/src/odoc/bin/main.exe lint %{input}))))
+(alias


This alias should be called from the Makefile. We are triggering the @runtest aliases manually there, IIRC to control the execution order.

Actually, it occurs to me that we might be able to express the order using dependencies in the dune files.

aantron · 2018-10-24T02:57:35Z

test/parser/lint/dune

@@ -0,0 +1,19 @@
+


My only real nit at this point is that I think this should be in test/lint, not test/parser/lint.

dbuenzli · 2018-10-24T09:21:05Z

I'm not exactly sure what this is trying to achieve.

One of the improvement with cmti files is that it allowed to process documentation after any pre-processor has done its job (which was a problem with ocamldoc's workflow for documenting software installs since even if you had installed the .mli you had lost the information about how you needed to pre-process it).

Working again directly on .mli files you might report an inaccurate state of what will actually be in it's corresponding cmti file after it has been built.

rizo · 2018-10-24T09:59:46Z

It seems that the main differences between this and odoc compile is that odoc lint reads .mli files, while odoc compile reads .cmti (etc.) files

The lint subcommand is for convenience only, the same effect can be achieved with compile and ignoring the result. Maybe we could add a --dry-run option to compile, but I see odoc lint as essentially an alias for that.

@aantron:

This is simpler for the user, but will it be problematic in case of cppo being used, for example?

@dbuenzli:

One of the improvement with cmti files is that it allowed to process documentation after any pre-processor has done its job

These are both good points, but mli is just one of the options. I mentioned that odoc lint can be extended to support other formats like cmti (or even odoc files). odoc lint on mli files is a convenient way to "run the odoc over the entire opam codebase" (see #217 (comment)). The intent is to follow @avsm's suggestion and run this on all opam source files.

rizo · 2018-10-24T10:07:55Z

For reference I assume this is an example of what you were talking about, @dbuenzli: b0-system/odig#12.

rizo · 2018-10-24T10:13:44Z

IIRC it's the responsibility of the calling code to pass an adjusted "base" location into the parser (~location argument). Looking at the code in attrs.ml right now, we might not be adjusting it correctly even during odoc compile.

I verified that, yes.

As a little note, we can't unconditionally add 3 because .mld files don't have (**.

The mld files are parsed by a different function (Loader.Attrs.read_string), so it could be done specifically for [@@ocaml.doc ...] attributes in Loader.Attrs.read_attributes after getting the location of the payload.

Reported in #227.

dbuenzli · 2018-10-24T10:33:10Z

odoc lint on mli files is a convenient way to "run the odoc over the entire opam codebase" (see #217 (comment)). The intent is to follow @avsm's suggestion and run this on all opam source files.

I don't know what the "entire opam codebase" is. But you really rather want to check odoc compile on entire "consistant opam install bases" and thus rather on cmti files (and yes what you pointed to is an example of the problem I'm mentioning).

I can certainly see a use for this, but again I don't think it is the right answer to the problem that was highlighted there.

rizo · 2018-10-24T11:03:46Z

you really rather want to check odoc compile on entire "consistant opam install bases"

Ok. Are you also opposed to the idea of supporting odoc lint FILE.cmti? Or adding --dry-run to odoc compile?

I can close this PR if this doesn't seem useful.

dbuenzli · 2018-10-24T11:43:07Z

Note that I'm not opposed, I'm just asking how this fits in the picture to understand where this should go -- if it should -- and what problem it solves. It's always easy to pile up the features and produce code, but it has costs.

Personally I still don't understand the problem it solves.

I think odoc compile --dry-run can make sense if it buys you enough w.r.t. odoc compile -o /dev/null but that I don't know enough about the actual work odoc compile does to know for sure.

aantron · 2018-10-24T15:37:37Z

To further agree with the above, running odoc lint on all source files in opam is probably not feasible because some of them require preprocessing. We almost certainly want to run odoc lint on all .cmti files instead, and odoc compile, driven by Dune, already does that.

It's possible that in some projects conditional compilation, unmet optional dependencies, OS constraints, etc., will cause .cmti files not to be generated for part of the codebase, or the projects are not compiled with Dune. Then, odoc lint could extend coverage to the files of those projects, as long as they don't also require preprocessing.

However, are we actually having a problem with these uncovered .mli files at this point? We could stick with covering the duniverse, and see if there is actually a problem to be solved beyond that first.

We likely don't need --dry-run yet. A build process that runs odoc compile already produces intermediate files, .cmtis. It's probably not a big deal to produce some .odoc files, too. Once we know that we don't want .odoc files for a specific reason, perhaps organization or performance, we can add --dry-run.

So, agreeing with the above, I don't see a need for odoc lint, so, in the absence of further arguments, I'd suggest to close this PR.

For comparison, corpus.sh is a really bad way to run odoc compile on all .cmti files, but it already lints potentially all of opam, to the same degree that opam lint would (if it downloaded sources instead of trying to install a consistent switch). All we need to achieve this, at a minimum, is to write some separate script that will run find to discover all .mli and .mld files, compile each .mli to a single .cmti, and run odoc compile on it.

aantron · 2018-10-24T15:56:13Z

A new test in this PR, parser_errors.mli, would be quite valuable if ported to odoc compile and committed to master. It would catch the 3-column offset bug, as well as any differences in how many errors odoc compile reports, their formatting, etc. We don't have an end-to-end test for parser errors today, only end-to-end tests for overall odoc success (in test/html), and unit tests for parser errors that are largely insensitive to context (in test/parser).

rizo · 2018-10-24T16:20:19Z

That makes sense, thanks for putting things into perspective. I'll submit a separate PR with parser_errors tests later.

The new tests are "end-to-end" tests, meaning rather than examining the error ASTs emitted by the parser, they check what is printed by odoc compile to the user. See #226 (comment).

@rizo

The new tests are "end-to-end" tests, meaning rather than examining the error ASTs emitted by the parser, they check what is printed by odoc compile to the user. See #226 (comment). Code is by @rizo, porting and commit text by @aantron.

@rizo

The new tests are "end-to-end" tests, meaning rather than examining the error ASTs emitted by the parser, they check what is printed by odoc compile to the user. See #226 (comment). Code is by @rizo, porting and commit text by @aantron.

@rizo

The new tests are "end-to-end" tests, meaning rather than examining the error ASTs emitted by the parser, they check what is printed by odoc compile to the user. See ocaml/odoc#226 (comment). Code is by @rizo, porting and commit text by @aantron.

@rizo

The new tests are "end-to-end" tests, meaning rather than examining the error ASTs emitted by the parser, they check what is printed by odoc compile to the user. See ocaml/odoc#226 (comment). Code is by @rizo, porting and commit text by @aantron.

Add odoc lint command for mli and mld validation

9709da5

- Implement the `odoc lint FILE.{ml,mld}` command. - Remove unused `id` argument from `Loader.Attrs.read_attributes`. - Expose `Attrs` module in `Loader` interface. - Add `Compat.Filename.remove_extension`.

rizo added parser command line labels Oct 23, 2018

rizo requested a review from aantron October 23, 2018 23:32

aantron reviewed Oct 24, 2018

View reviewed changes

This was referenced Oct 24, 2018

Make a ppx_odoc that does the comment checking that ppx_js_style does #147

Open

Incorrect location reported for errors in doc attributes #227

Closed

rizo closed this Oct 24, 2018

rizo deleted the add-lint branch October 24, 2018 16:20

aantron mentioned this pull request Oct 24, 2018

How to style-check comments in all packages in opam (a.k.a. "odoc lint") #228

Closed

aantron mentioned this pull request Oct 31, 2018

Port new error message tests from #226 to master #231

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add odoc lint command for mli and mld validation #226

Add odoc lint command for mli and mld validation #226

rizo commented Oct 23, 2018 •

edited

Loading

rizo commented Oct 23, 2018

aantron left a comment

aantron Oct 24, 2018

aantron Oct 24, 2018

dbuenzli commented Oct 24, 2018 •

edited

Loading

rizo commented Oct 24, 2018 •

edited

Loading

rizo commented Oct 24, 2018

rizo commented Oct 24, 2018 •

edited

Loading

dbuenzli commented Oct 24, 2018 •

edited

Loading

rizo commented Oct 24, 2018

dbuenzli commented Oct 24, 2018

aantron commented Oct 24, 2018 •

edited

Loading

aantron commented Oct 24, 2018

rizo commented Oct 24, 2018

Add odoc lint command for mli and mld validation #226

Add odoc lint command for mli and mld validation #226

Conversation

rizo commented Oct 23, 2018 • edited Loading

Changes

Notes

rizo commented Oct 23, 2018

aantron left a comment

Choose a reason for hiding this comment

aantron Oct 24, 2018

Choose a reason for hiding this comment

aantron Oct 24, 2018

Choose a reason for hiding this comment

dbuenzli commented Oct 24, 2018 • edited Loading

rizo commented Oct 24, 2018 • edited Loading

rizo commented Oct 24, 2018

rizo commented Oct 24, 2018 • edited Loading

dbuenzli commented Oct 24, 2018 • edited Loading

rizo commented Oct 24, 2018

dbuenzli commented Oct 24, 2018

aantron commented Oct 24, 2018 • edited Loading

aantron commented Oct 24, 2018

rizo commented Oct 24, 2018

rizo commented Oct 23, 2018 •

edited

Loading

dbuenzli commented Oct 24, 2018 •

edited

Loading

rizo commented Oct 24, 2018 •

edited

Loading

rizo commented Oct 24, 2018 •

edited

Loading

dbuenzli commented Oct 24, 2018 •

edited

Loading

aantron commented Oct 24, 2018 •

edited

Loading