Octavius 2 #678

jonludlam · 2021-05-19T22:31:57Z

Rebrand the odoc parser as 'octavius'.

Also tidy the interface quite a lot.

It's tricky to get the cram tests to play nicely with the two packages with dune 2.7, but upgrading to dune 2.8 will currently break esy. As a compromise the octavius opam file doesn't run the tests, they're all run in the odoc opam file.

While the opam package has been split into octavius and odoc, the esy build remains one. Package build and install steps have been amended for this.

Julow

If the parser stays in this repository, I think it should be called odoc-parser so it's easier to relate packages in Opam and to guess the library name.

Julow · 2021-05-21T13:46:01Z

src/parser/test/test.ml

+        {|
+        ((output
+          (((f.ml (1 0) (1 46))
+            (code_block "ocaml env=f1 version>=4.06 " "code goes here"))))


Should we remove the trailing whitespaces ?

Julow · 2021-05-21T13:54:12Z

src/parser/lexer.mll

+    { emit_code_block input None c }
+
+  | "{@" (code_block_meta as m) "[" (code_block_text as c) "]}"
+    { emit_code_block input (Some m) c }


Will Odoc use this information in the future ? For example, the first word for syntax highlighting. I think we should parse the "language" field separately then.

Julow · 2021-05-21T13:58:24Z

src/parser/lexer.mll

 let raw_markup_target =
  ([^ ':' '%'] | '%'+ [^ ':' '%' '}'])* '%'*
+let code_block_meta =
+  ([^ '['])*


Should we allow quotes in case someone wants brackets here ? I propose

{{@ocaml foo=["bar"]}[ ... ]}, the meta part is ocaml foo=["bar"] and it should allow balanced curly braces.

Julow · 2021-05-21T14:31:20Z

src/parser/test/test.ml

+
+    let unterminated_code_block_with_meta =
+      test "{@met";
+      [%expect]


The test are not running (disabled by a (enabled_if false)). I've been able to fix the tests: Julow@061b091
Is here a reason to keep them disabled ?

Julow · 2021-05-21T14:43:04Z

src/parser/ast.ml

 type nestable_block_element =
  [ `Paragraph of inline_element with_location list
-  | `Code_block of string
+  | `Code_block of string option * string


Now that there is two fields, I think both of them should have a with_location.

Julow · 2021-05-21T15:06:24Z

src/parser/ast.ml

+    are never rendered in the output *)

-type external_tag =
+type ocamldoc_tag =


I think this reference to ocamldoc isn't necessary, we can call this tag instead.

aantron · 2021-05-25T16:16:57Z

src/parser/warning.mli

@@ -0,0 +1,8 @@
+type t = { location : Location.span; message : string }


The way this file has been moved with changes loses its git history and easy access to its authorship information.

aantron · 2021-05-25T16:45:59Z

octavius.opam

+  "Anton Bachin <antonbachin@yahoo.com>"
+  "Leo White <leo@lpw25.net>"
+  "Jules Aguillon <juloo.dsi@gmail.com>"
+  "Jon Ludlam <jon@recoil.org>"]


@jonludlam Per our Discord discussion, can you justify this authorship metadata? I could be wrong about this, but I will detail what I see:

While Leo has given important feedback that affected and advanced the development of the parser, looking at the commit history of the entire parser directory in the branch this PR is from, I don't see any commits from Leo, nor is the parser in any other way based on any code authored by Leo. Both the code and design are my original work (I don't claim that the overall approach is in any way original with respect to the "state of the art" of parsers in general, just this particular parser is not originally based on the work of anyone else listed here except me).

The next part is more debatable. Looking at the same commit history, I see a large number of important contributions from you and Jules. However, though I did not check absolutely every commit, and skipped many commits that, from their message, seemed to touch the parser only incidentally, I doubt that these contributions amount to authorship. They appear to be maintenance, which is what the maintainer field is for. Of course, this is a judgment call, but I saw many commits that added features locally, or were refactorings. While contributing important pieces of original code, these don't amount to authorship at large, in my opinion. For comparison, to illustrate (still just) my opinion in operation, even though I maintained Lwt for four years, and added many features, did a lot of clean-up, rewrote lwt.ml, lwt.mli, and several other files completely, and wrote almost all the current tests, I am still not an author of Lwt:

authors: [ "Jérôme Vouillon" "Jérémie Dimino" ] maintainer: [ "Raphaël Proust <code@bnwr.net>" "Anton Bachin <antonbachin@yahoo.com>" ]

I am a contributor and a former maintainer. I did not set the original structure of Lwt, nor write any major new modules (though I assigned authorship credit for sublibraries to some people that did). I didn't perform any fundamental rework that could have consituted authorship on the scale of the library (as you have done with odoc's new model, in my understanding).

I don't see that any such thing has happened (yet?) with odoc's parser.

I'm less concerned about the second point than the first, however, which I truly don't understand. My first worry is that it is simply a holdover field from Octavius v1. If that's so, I think it's clear that we can't assign authorship on the basis of renaming a library so that it shares a name with another one :)

From an opam-repository perspective, we don't take a strong position on how projects ascribe metadata such as authorship vs maintainers. Thinking about it more now, we could take advantage of the multi-versioning packaging to achieve two goals:

precise author information for each published version of the package (as suggested by @aantron above), so that the version-specific page on the docs site reflects who the primary author(s) of the package in question are.

aggregated author information from all versions of the opam package on the main package homepage, which shows the full historical set of authors associated with that package name. This would be attribute those who were associated with the project in the past, but do not have current code.

This would be attribute those who were associated with the project in the past, but do not have current code.

In this particular case, however, the previous authors are not associated with v2 at all (as authors). It's just a completely separate project. It's not even a question of current code — it's not that their code was deleted inside the project, and then replaced by new code. The current parser, as a parser, is a completely separate, fresh project, that is now being renamed to match the name of another project. How are the previous authors associated with the present project in the past? They are associated with a separate, past project, which, confusingly, has the same name.

The current parser, compared to the original Octavius, is a fresh design, fresh code, fresh metadata, fresh tests, etc., strarted completely from scratch, with no inspiration from Octavius, and, indeed, in a sort of "opposition" (in design terms) to it. The only relation it has to the original Octavius is that it was meant to fill the same role within the larger odoc project, but as a parser, it is completely new and separate. There is no way in which the authors of the past Octavius can automatically be considered as past authors of the "current" "Octavius," except, of course, unless they actually did contribute as authors to the new one (which they did not).

aggregated author information from all versions of the opam package on the main package homepage, which shows the full historical set of authors associated with that package name.

This might make sense for almost all packages, but, for the reasons above, it does not make sense if we will practice dropping in one project into the name slot of another, as is happening in this PR. The package page, especially if it tends to display the latest version's other metadata, will be misleading if it shows authorship metadata of earlier, completely unrelated versions of the supposedly "same" package, mixed in with current authorship metadata.

The per-version metadata should still display the authors of that version, however, I do agree with that.

we have no way of distinguishing the subtlety of a rewrite in opam

Sorry to spam, but re-reading this... Octavius v2 is not a "rewrite" of Octavius v1. It's just a different project. It's not API-compatible, similar in design, etc. There is no subtlety about it. I understand that opam can't express that it's a new project, but, as already mentioned by me and several people, that seems like a good reason not to shadow Octavius v1 if you're not willing to also be accurate in the project's metadata about the actual origin of the so-called "Octavius v2" (a name I never had in mind).

To give some detail on the API incompatibility, it's not just some minor breaking change, but a completely new API with new data structures (though there is inevitably some similarity due to them representing the same thing — on the other hand, there are fundamental differences like the representation of spacing, EDIT: and the clear separation of flow and phrasing content). When I integrated the new parser into odoc, I replaced the comment markup generator, and made invasive changes throughout odoc. So it's not a matter of some "unfortunate minor breakage" and maybe a few internal cleanups causing v1 to become v2. It's literally a different unrelated project with different concepts deliberately chosen even at the type level.

EDIT: so how can this be called a "rewrite?" I wasn't "rewriting" Octavius v1.

Your objection is noted, and your work won't be renamed to octavius v2. We'll have to figure out then how to ensure that octavius v1 ceases to be used by anyone in a future odoc developer meeting (which you are, of course, most welcome to join if you want).

Thank you. I don't have a specific goal of this not being called octavius, by the way. Just if it's in combination with the above.

As another thought, it just occurred to me that perhaps nobody working here on this issue has looked in detail at the code of Octavius v1 (could be wrong), and so is immediately aware of how substantially different the two parsers are.

I'm entirely aware of the history of both octavius v1 and this parser, having been around since day 0 of this project :-) My example of dns was deliberate: it is a total rewrite in exactly the same style as the odoc parsers being discussed here; see mirage/ocaml-dns#159 for more (no common provenances between the codebases).

The point of keeping the same name from an opam perspective is that both the old and new libraries are fulfilling the same purpose. A user who comes along and was using octavius v1 would note there is a new version with a major bump, and follow a migration guide, and eventually be on v2 (with its new interface). The goal is to hide the internal details of how it came to be from users and make a statement that "the successor to octavius v1 is v2". opam has upper bounds available for older consumers of the library.

However, this only really works if there's consensus among the developers of the old and the new. In this case, there isn't (which is just fine), and the fastest way to move on is just to provide clarity to our users about which ocamldoc parser they are supposed to be using as part of the release of the new one. For instance, a quick search reveals Imandra uses Octavius currently.

Create a new octavius package

9fc50d1

It's tricky to get the cram tests to play nicely with the two packages with dune 2.7, but upgrading to dune 2.8 will currently break esy. As a compromise the octavius opam file doesn't run the tests, they're all run in the odoc opam file.

jonludlam force-pushed the odoc-parser-octavius branch from bef11e7 to e0d7094 Compare May 19, 2021 23:55

jonludlam added 8 commits May 20, 2021 01:09

Remove inline tests

1cd818a

Remove esy.lock

2ad228a

Fix esy build and install

adb594b

While the opam package has been split into octavius and odoc, the esy build remains one. Package build and install steps have been amended for this.

Add support for code blocks with metadata

27ce60a

Add Octavius.Warnings.to_loc

7d45698

Sanitise Octavius interface some more

ad042db

More tidying of the Octavius interface

1e26116

Rename Location to Loc

0312a90

jonludlam force-pushed the odoc-parser-octavius branch from e0d7094 to 0312a90 Compare May 20, 2021 00:09

Julow reviewed May 21, 2021

View reviewed changes

aantron reviewed May 25, 2021

View reviewed changes

avsm closed this May 26, 2021

Julow mentioned this pull request Jun 10, 2021

Odoc parser opam package #685

Closed

		@@ -0,0 +1,8 @@
		type t = { location : Location.span; message : string }

Octavius 2 #678

Octavius 2 #678

Uh oh!

Conversation

jonludlam commented May 19, 2021

Uh oh!

Julow left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aantron May 26, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aantron May 26, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aantron May 26, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

aantron May 26, 2021 •

edited

Loading

aantron May 26, 2021 •

edited

Loading

aantron May 26, 2021 •

edited

Loading