Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add odoc lint command for mli and mld validation #226

Closed
wants to merge 1 commit into from

Conversation

rizo
Copy link
Collaborator

@rizo rizo commented Oct 23, 2018

From #217 (comment) by @avsm:

(...) is it feasible to have a odoc-lint binary that that runs over an ml/mli source file and displays any parsing inconsistencies?

This PR implements an odoc lint command that does exactly that. It parser mli and mld files and uses an AST mapper to go through all [@@ocaml.doc ...]/[@@ocaml.text ...] attributes running odoc's parser on them and producing errors (see the test expect files in this PR for examples).

I got excited about this and started implementing a ppx (#147) to validate docstings and report errors with merlin. Unfortunately I was surprised to discover that merlin doesn't convert docstrings into documentation attributes (see ocaml/merlin#876). After that is fixed implementing the ppx should be simple.

Changes

  • Implement the odoc lint FILE.{ml,mld} command.
  • Remove an unused id argument from Loader.Attrs.read_attributes.
  • Expose Attrs module in Loader interface.
  • Add Compat.Filename.remove_extension.
  • Add tests for the new lint command.

Notes

  • The current parser does not recover from errors. This means that for a documentation attributes or mld files that have multiple errors only the first one will be reported (related to New docstrings parser too strict #108).
  • Some old versions of the compiler don't seem to produce the [@@ocaml.doc] attribute on
    polymorphic variants (the CI should report all the culprits). I don't have a workaround for this.
  • The column number reported in errors seems to be slightly shifted (I'll investigate this soon).
    • Update: seems like it is already present in master. The location of the attributes is off by -3, I assume it's because it's not counting (**.
  • Support for more file formats (like ml) should be easy to implement and can be done separately.
  • The current implementation could be improved but would require significant changes to the error/warning management in the whole project. @aantron has already plans for that AFAIK.
  • I think the command could be extended to check for unresolved cross-references in the future.

Thoughts and suggestions are welcome!

- Implement the `odoc lint FILE.{ml,mld}` command.
- Remove unused `id` argument from `Loader.Attrs.read_attributes`.
- Expose `Attrs` module in `Loader` interface.
- Add `Compat.Filename.remove_extension`.
@rizo
Copy link
Collaborator Author

rizo commented Oct 23, 2018

As expected there are some failures with the 4.02 compiler in the CI. Specifically:

Done: 1168/1183 (jobs: 1)File "test/parser/lint/expect/ast.expected", line 1, characters 0-0:
Done: 1169/1183 (jobs: 1)--- test/parser/lint/expect/ast.expected	2018-10-23 23:27:26.397396290 +0000
+++ test/parser/lint/ast.output	2018-10-23 23:27:26.473397012 +0000
@@ -14,10 +14,6 @@
 unknown tag '@ConstructorDeclaration'
 File "cases/ast.mli", line 14, characters 1-17:
 unknown tag '@TypeDeclaration'
-File "cases/ast.mli", line 19, characters 6-10:
-unknown tag '@Tag'
-File "cases/ast.mli", line 20, characters 6-10:
-unknown tag '@Tag'
 File "cases/ast.mli", line 25, characters 12-29:
 unknown tag '@LabelDeclaration'
 File "cases/ast.mli", line 26, characters 13-30:
        diff (internal) (exit 1)
(cd _build/default && /usr/bin/diff -u test/parser/lint/expect/ast.expected test/parser/lint/ast.output)

What is not expected is the fact that only the 327.8 build job (the esy configuration) is reported as failed while 327.1 and 327.2 actually failed too (both use the 4.02 compiler). Other jobs don't seem to have any issues.

Copy link
Contributor

@aantron aantron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR looks good technically.

It seems that the main differences between this and odoc compile is that odoc lint reads .mli files, while odoc compile reads .cmti (etc.) files (both read .mld files, but that is not an issue in this point).

This is simpler for the user, but will it be problematic in case of cppo being used, for example?

Sticking with odoc compile and Dune (and bsb?), don't we get, for free, a nice way to lint whole projects?

Note that

odoc lint foo.mli

and

ocamlfind opt -bin-annot -c foo.mli
odoc compile --package foo foo.cmti

produce identical output in terms of errors displayed. The difference, of course, is that the second set of commands produce intermediate files.

Given that, my suggestion would be to use existing commands instead of adding odoc lint, and hide it all behind Dune and/or bsb.

odoc compile's error handling needs improvement. I wrote some text about that but decided to leave it out of this comment, perhaps that is a separate discussion. Even with the current error handling, though, the output is the same.


Some other notes:

  1. Some old versions of the compiler don't seem to produce the [@@ocaml.doc] attribute on polymorphic variants (the CI should report all the culprits). I don't have a workaround for this.

    IIRC 4.02 is very broken. At least ocamldoc would lose free text in the middle of a .mli file. I don't know if doc attributes were affected. I suggest to add an unassociated doc comment to the test cases. The one that is there now is at the top of the module, but I suggest to add one in the middle.

  2. The column number reported in errors seems to be slightly shifted (I'll investigate this soon).

    Update: seems like it is already present in master. The location of the attributes is off by -3, I assume it's because it's not counting (**.

    Indeed. IIRC it's the responsibility of the calling code to pass an adjusted "base" location into the parser (~location argument). Looking at the code in attrs.ml right now, we might not be adjusting it correctly even during odoc compile.

    As a little note, we can't unconditionally add 3 because .mld files don't have (**.

(targets parser_errors.output)
(action (with-stderr-to parser_errors.output
(run %{workspace_root}/src/odoc/bin/main.exe lint %{input}))))
(alias
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This alias should be called from the Makefile. We are triggering the @runtest aliases manually there, IIRC to control the execution order.

Actually, it occurs to me that we might be able to express the order using dependencies in the dune files.

@@ -0,0 +1,19 @@

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only real nit at this point is that I think this should be in test/lint, not test/parser/lint.

@dbuenzli
Copy link
Contributor

dbuenzli commented Oct 24, 2018

I'm not exactly sure what this is trying to achieve.

One of the improvement with cmti files is that it allowed to process documentation after any pre-processor has done its job (which was a problem with ocamldoc's workflow for documenting software installs since even if you had installed the .mli you had lost the information about how you needed to pre-process it).

Working again directly on .mli files you might report an inaccurate state of what will actually be in it's corresponding cmti file after it has been built.

@rizo
Copy link
Collaborator Author

rizo commented Oct 24, 2018

It seems that the main differences between this and odoc compile is that odoc lint reads .mli files, while odoc compile reads .cmti (etc.) files

The lint subcommand is for convenience only, the same effect can be achieved with compile and ignoring the result. Maybe we could add a --dry-run option to compile, but I see odoc lint as essentially an alias for that.

@aantron:

This is simpler for the user, but will it be problematic in case of cppo being used, for example?

@dbuenzli:

One of the improvement with cmti files is that it allowed to process documentation after any pre-processor has done its job

These are both good points, but mli is just one of the options. I mentioned that odoc lint can be extended to support other formats like cmti (or even odoc files). odoc lint on mli files is a convenient way to "run the odoc over the entire opam codebase" (see #217 (comment)). The intent is to follow @avsm's suggestion and run this on all opam source files.

@rizo
Copy link
Collaborator Author

rizo commented Oct 24, 2018

For reference I assume this is an example of what you were talking about, @dbuenzli: b0-system/odig#12.

@rizo
Copy link
Collaborator Author

rizo commented Oct 24, 2018

IIRC it's the responsibility of the calling code to pass an adjusted "base" location into the parser (~location argument). Looking at the code in attrs.ml right now, we might not be adjusting it correctly even during odoc compile.

I verified that, yes.

As a little note, we can't unconditionally add 3 because .mld files don't have (**.

The mld files are parsed by a different function (Loader.Attrs.read_string), so it could be done specifically for [@@ocaml.doc ...] attributes in Loader.Attrs.read_attributes after getting the location of the payload.

Reported in #227.

@dbuenzli
Copy link
Contributor

dbuenzli commented Oct 24, 2018

odoc lint on mli files is a convenient way to "run the odoc over the entire opam codebase" (see #217 (comment)). The intent is to follow @avsm's suggestion and run this on all opam source files.

I don't know what the "entire opam codebase" is. But you really rather want to check odoc compile on entire "consistant opam install bases" and thus rather on cmti files (and yes what you pointed to is an example of the problem I'm mentioning).

I can certainly see a use for this, but again I don't think it is the right answer to the problem that was highlighted there.

@rizo
Copy link
Collaborator Author

rizo commented Oct 24, 2018

you really rather want to check odoc compile on entire "consistant opam install bases"

Ok. Are you also opposed to the idea of supporting odoc lint FILE.cmti? Or adding --dry-run to odoc compile?

I can close this PR if this doesn't seem useful.

@dbuenzli
Copy link
Contributor

Note that I'm not opposed, I'm just asking how this fits in the picture to understand where this should go -- if it should -- and what problem it solves. It's always easy to pile up the features and produce code, but it has costs.

Personally I still don't understand the problem it solves.

I think odoc compile --dry-run can make sense if it buys you enough w.r.t. odoc compile -o /dev/null but that I don't know enough about the actual work odoc compile does to know for sure.

@aantron
Copy link
Contributor

aantron commented Oct 24, 2018

To further agree with the above, running odoc lint on all source files in opam is probably not feasible because some of them require preprocessing. We almost certainly want to run odoc lint on all .cmti files instead, and odoc compile, driven by Dune, already does that.

It's possible that in some projects conditional compilation, unmet optional dependencies, OS constraints, etc., will cause .cmti files not to be generated for part of the codebase, or the projects are not compiled with Dune. Then, odoc lint could extend coverage to the files of those projects, as long as they don't also require preprocessing.

However, are we actually having a problem with these uncovered .mli files at this point? We could stick with covering the duniverse, and see if there is actually a problem to be solved beyond that first.

We likely don't need --dry-run yet. A build process that runs odoc compile already produces intermediate files, .cmtis. It's probably not a big deal to produce some .odoc files, too. Once we know that we don't want .odoc files for a specific reason, perhaps organization or performance, we can add --dry-run.

So, agreeing with the above, I don't see a need for odoc lint, so, in the absence of further arguments, I'd suggest to close this PR.

For comparison, corpus.sh is a really bad way to run odoc compile on all .cmti files, but it already lints potentially all of opam, to the same degree that opam lint would (if it downloaded sources instead of trying to install a consistent switch). All we need to achieve this, at a minimum, is to write some separate script that will run find to discover all .mli and .mld files, compile each .mli to a single .cmti, and run odoc compile on it.

@aantron
Copy link
Contributor

aantron commented Oct 24, 2018

A new test in this PR, parser_errors.mli, would be quite valuable if ported to odoc compile and committed to master. It would catch the 3-column offset bug, as well as any differences in how many errors odoc compile reports, their formatting, etc. We don't have an end-to-end test for parser errors today, only end-to-end tests for overall odoc success (in test/html), and unit tests for parser errors that are largely insensitive to context (in test/parser).

@rizo
Copy link
Collaborator Author

rizo commented Oct 24, 2018

That makes sense, thanks for putting things into perspective. I'll submit a separate PR with parser_errors tests later.

@rizo rizo closed this Oct 24, 2018
@rizo rizo deleted the add-lint branch October 24, 2018 16:20
aantron added a commit that referenced this pull request Oct 31, 2018
The new tests are "end-to-end" tests, meaning rather than examining the
error ASTs emitted by the parser, they check what is printed by odoc
compile to the user.

See #226 (comment).
aantron pushed a commit that referenced this pull request Oct 31, 2018
The new tests are "end-to-end" tests, meaning rather than examining the
error ASTs emitted by the parser, they check what is printed by odoc
compile to the user.

See #226 (comment).

Code is by @rizo, porting and commit text by @aantron.
aantron added a commit that referenced this pull request Oct 31, 2018
The new tests are "end-to-end" tests, meaning rather than examining the
error ASTs emitted by the parser, they check what is printed by odoc
compile to the user.

See #226 (comment).

Code is by @rizo, porting and commit text by @aantron.
jonludlam pushed a commit to jonludlam/odoc-parser-cleaned that referenced this pull request Jul 1, 2021
The new tests are "end-to-end" tests, meaning rather than examining the
error ASTs emitted by the parser, they check what is printed by odoc
compile to the user.

See ocaml/odoc#226 (comment).

Code is by @rizo, porting and commit text by @aantron.
jonludlam pushed a commit to ocaml-doc/odoc-parser that referenced this pull request Jul 2, 2021
The new tests are "end-to-end" tests, meaning rather than examining the
error ASTs emitted by the parser, they check what is printed by odoc
compile to the user.

See ocaml/odoc#226 (comment).

Code is by @rizo, porting and commit text by @aantron.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants