Skip to content

Conversation

jonludlam
Copy link
Member

Rebrand the odoc parser as 'octavius'.

Also tidy the interface quite a lot.

It's tricky to get the cram tests to play nicely with the two packages
with dune 2.7, but upgrading to dune 2.8 will currently break esy.
As a compromise the octavius opam file doesn't run the tests, they're
all run in the odoc opam file.
@jonludlam jonludlam force-pushed the odoc-parser-octavius branch from bef11e7 to e0d7094 Compare May 19, 2021 23:55
@jonludlam jonludlam force-pushed the odoc-parser-octavius branch from e0d7094 to 0312a90 Compare May 20, 2021 00:09
Copy link
Collaborator

@Julow Julow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the parser stays in this repository, I think it should be called odoc-parser so it's easier to relate packages in Opam and to guess the library name.

{|
((output
(((f.ml (1 0) (1 46))
(code_block "ocaml env=f1 version>=4.06 " "code goes here"))))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we remove the trailing whitespaces ?

{ emit_code_block input None c }

| "{@" (code_block_meta as m) "[" (code_block_text as c) "]}"
{ emit_code_block input (Some m) c }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will Odoc use this information in the future ? For example, the first word for syntax highlighting. I think we should parse the "language" field separately then.

let raw_markup_target =
([^ ':' '%'] | '%'+ [^ ':' '%' '}'])* '%'*
let code_block_meta =
([^ '['])*
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we allow quotes in case someone wants brackets here ? I propose

  • {{@ocaml foo=["bar"]}[ ... ]}, the meta part is ocaml foo=["bar"] and it should allow balanced curly braces.


let unterminated_code_block_with_meta =
test "{@met";
[%expect]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test are not running (disabled by a (enabled_if false)). I've been able to fix the tests: Julow@061b091
Is here a reason to keep them disabled ?

type nestable_block_element =
[ `Paragraph of inline_element with_location list
| `Code_block of string
| `Code_block of string option * string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that there is two fields, I think both of them should have a with_location.

are never rendered in the output *)

type external_tag =
type ocamldoc_tag =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this reference to ocamldoc isn't necessary, we can call this tag instead.

@@ -0,0 +1,8 @@
type t = { location : Location.span; message : string }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way this file has been moved with changes loses its git history and easy access to its authorship information.

"Anton Bachin <antonbachin@yahoo.com>"
"Leo White <leo@lpw25.net>"
"Jules Aguillon <juloo.dsi@gmail.com>"
"Jon Ludlam <jon@recoil.org>"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jonludlam Per our Discord discussion, can you justify this authorship metadata? I could be wrong about this, but I will detail what I see:

  • While Leo has given important feedback that affected and advanced the development of the parser, looking at the commit history of the entire parser directory in the branch this PR is from, I don't see any commits from Leo, nor is the parser in any other way based on any code authored by Leo. Both the code and design are my original work (I don't claim that the overall approach is in any way original with respect to the "state of the art" of parsers in general, just this particular parser is not originally based on the work of anyone else listed here except me).

  • The next part is more debatable. Looking at the same commit history, I see a large number of important contributions from you and Jules. However, though I did not check absolutely every commit, and skipped many commits that, from their message, seemed to touch the parser only incidentally, I doubt that these contributions amount to authorship. They appear to be maintenance, which is what the maintainer field is for. Of course, this is a judgment call, but I saw many commits that added features locally, or were refactorings. While contributing important pieces of original code, these don't amount to authorship at large, in my opinion. For comparison, to illustrate (still just) my opinion in operation, even though I maintained Lwt for four years, and added many features, did a lot of clean-up, rewrote lwt.ml, lwt.mli, and several other files completely, and wrote almost all the current tests, I am still not an author of Lwt:

    authors: [
      "Jérôme Vouillon"
      "Jérémie Dimino"
    ]
    maintainer: [
      "Raphaël Proust <code@bnwr.net>"
      "Anton Bachin <antonbachin@yahoo.com>"
    ]
    

    I am a contributor and a former maintainer. I did not set the original structure of Lwt, nor write any major new modules (though I assigned authorship credit for sublibraries to some people that did). I didn't perform any fundamental rework that could have consituted authorship on the scale of the library (as you have done with odoc's new model, in my understanding).

    I don't see that any such thing has happened (yet?) with odoc's parser.

I'm less concerned about the second point than the first, however, which I truly don't understand. My first worry is that it is simply a holdover field from Octavius v1. If that's so, I think it's clear that we can't assign authorship on the basis of renaming a library so that it shares a name with another one :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From an opam-repository perspective, we don't take a strong position on how projects ascribe metadata such as authorship vs maintainers. Thinking about it more now, we could take advantage of the multi-versioning packaging to achieve two goals:

  • precise author information for each published version of the package (as suggested by @aantron above), so that the version-specific page on the docs site reflects who the primary author(s) of the package in question are.
  • aggregated author information from all versions of the opam package on the main package homepage, which shows the full historical set of authors associated with that package name. This would be attribute those who were associated with the project in the past, but do not have current code.

Copy link
Contributor

@aantron aantron May 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be attribute those who were associated with the project in the past, but do not have current code.

In this particular case, however, the previous authors are not associated with v2 at all (as authors). It's just a completely separate project. It's not even a question of current code — it's not that their code was deleted inside the project, and then replaced by new code. The current parser, as a parser, is a completely separate, fresh project, that is now being renamed to match the name of another project. How are the previous authors associated with the present project in the past? They are associated with a separate, past project, which, confusingly, has the same name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current parser, compared to the original Octavius, is a fresh design, fresh code, fresh metadata, fresh tests, etc., strarted completely from scratch, with no inspiration from Octavius, and, indeed, in a sort of "opposition" (in design terms) to it. The only relation it has to the original Octavius is that it was meant to fill the same role within the larger odoc project, but as a parser, it is completely new and separate. There is no way in which the authors of the past Octavius can automatically be considered as past authors of the "current" "Octavius," except, of course, unless they actually did contribute as authors to the new one (which they did not).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • aggregated author information from all versions of the opam package on the main package homepage, which shows the full historical set of authors associated with that package name.

This might make sense for almost all packages, but, for the reasons above, it does not make sense if we will practice dropping in one project into the name slot of another, as is happening in this PR. The package page, especially if it tends to display the latest version's other metadata, will be misleading if it shows authorship metadata of earlier, completely unrelated versions of the supposedly "same" package, mixed in with current authorship metadata.

The per-version metadata should still display the authors of that version, however, I do agree with that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have no way of distinguishing the subtlety of a rewrite in opam

Sorry to spam, but re-reading this... Octavius v2 is not a "rewrite" of Octavius v1. It's just a different project. It's not API-compatible, similar in design, etc. There is no subtlety about it. I understand that opam can't express that it's a new project, but, as already mentioned by me and several people, that seems like a good reason not to shadow Octavius v1 if you're not willing to also be accurate in the project's metadata about the actual origin of the so-called "Octavius v2" (a name I never had in mind).

Copy link
Contributor

@aantron aantron May 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To give some detail on the API incompatibility, it's not just some minor breaking change, but a completely new API with new data structures (though there is inevitably some similarity due to them representing the same thing — on the other hand, there are fundamental differences like the representation of spacing, EDIT: and the clear separation of flow and phrasing content). When I integrated the new parser into odoc, I replaced the comment markup generator, and made invasive changes throughout odoc. So it's not a matter of some "unfortunate minor breakage" and maybe a few internal cleanups causing v1 to become v2. It's literally a different unrelated project with different concepts deliberately chosen even at the type level.

EDIT: so how can this be called a "rewrite?" I wasn't "rewriting" Octavius v1.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your objection is noted, and your work won't be renamed to octavius v2. We'll have to figure out then how to ensure that octavius v1 ceases to be used by anyone in a future odoc developer meeting (which you are, of course, most welcome to join if you want).

Copy link
Contributor

@aantron aantron May 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. I don't have a specific goal of this not being called octavius, by the way. Just if it's in combination with the above.

As another thought, it just occurred to me that perhaps nobody working here on this issue has looked in detail at the code of Octavius v1 (could be wrong), and so is immediately aware of how substantially different the two parsers are.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm entirely aware of the history of both octavius v1 and this parser, having been around since day 0 of this project :-) My example of dns was deliberate: it is a total rewrite in exactly the same style as the odoc parsers being discussed here; see mirage/ocaml-dns#159 for more (no common provenances between the codebases).

The point of keeping the same name from an opam perspective is that both the old and new libraries are fulfilling the same purpose. A user who comes along and was using octavius v1 would note there is a new version with a major bump, and follow a migration guide, and eventually be on v2 (with its new interface). The goal is to hide the internal details of how it came to be from users and make a statement that "the successor to octavius v1 is v2". opam has upper bounds available for older consumers of the library.

However, this only really works if there's consensus among the developers of the old and the new. In this case, there isn't (which is just fine), and the fastest way to move on is just to provide clarity to our users about which ocamldoc parser they are supposed to be using as part of the release of the new one. For instance, a quick search reveals Imandra uses Octavius currently.

@avsm avsm closed this May 26, 2021
@Julow Julow mentioned this pull request Jun 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants