Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fewer opam packages #1004

Closed
hannesm opened this issue Oct 15, 2019 · 19 comments
Closed

fewer opam packages #1004

hannesm opened this issue Oct 15, 2019 · 19 comments

Comments

@hannesm
Copy link
Member

@hannesm hannesm commented Oct 15, 2019

I intended to work on cstruct with capabilities integration into MirageOS interfaces. A first attempt was to extend various interfaces with instead of a type buffer having a type read_buffer and type write_buffer. This does not scale. I thought a bit more about how the current ecosystem is hackable (i.e. easy to understand and find definitions). Now, while documentation is great, this landing is a bit rough when I intend to understand "how is IP configured"?

So, going five steps backwards, maybe "abstract all the things" is not too useful in the end. As facts, I actually do not intend to execute a MirageOS unikernel with multiple concrete IO abstractions. I do not even intend to have a mirage release that is able to deal with multiple IO abstractions (lwt, core, effect system). The very nice work by @dinosaure on Cstruct_cap (where type t is abstract now) also led me to think that I do not really intend to have various type buffer which get disjoint concrete types. And to be honest, I find a single Ipaddr.V4.t (and Macaddr.t) sufficient -- and dozens of type aliases and equalities that need to be written down rather boilerplate than "neat system design". Furthermore, various signatures were completely abstract, but the -lwt package made not only the type 'a io = 'a Lwt.t concrete, but as well type value = string or type ipaddr = Ipaddr.V4.t.

This is the main motivation why I sat down for some time, folding the -lwt packages into the base ones, and making the interfaces more concrete (by no means I want to stick with lwt forever, I'm eager to switch to the upcoming effect system rather sooner than later -- but I'd suggest to switch over with a major release of mirage, and remove lwt in the same release). This patchset is not complete (if you want to try it out, add this opam remote https://github.com/mirage/mirage-dev/tree/easy and enjoy some sample unikernels at https://github.com/hannesm/mirage-skeleton/tree/easy) -- its sufficiently mature to successfully run mirage-tcpip tests, or compile a device-usage/network and tutorial/hello unikernel on hvt (the resulting binaries are only marginally smaller (maybe 1%) -- I did not measure the reduction in opam solver time due to the reduced package set). You can take a look (and compare with existing) at the odig odoc generated documentation.

You can as well look at the diffs of e.g. mirage-protocols or e.g. mirage-tcpip -- which imho show pretty nice cleanups (removal of type aliases).

This proposal removes quite a bit of flexibility from MirageOS, so I'm interested how to move forward? (While implementing, I also discovered that some opam packages that I thought were interface-only actually contain some code (e.g. mirage-block-lwt, mirage-flow-lwt, mirage-channel-lwt) -- imho we should ship this code in a distinct opam package, and have the interfaces used in mirage-types be mostly interface (obviously some pretty printers are ok). My point of view is that if we push this change through, both (a) moving to target-specific variants and (b) revising the signatures will be much simpler to do. Thanks for reading so far! I'm really interested in further suggestions and feedback on that matter.

EDIT another motivation was a discussion with @cfcs about all the opam packages (and other discussions with various people new to MirageOS who were confused by the multitude of interfaces and opam packages). now that I think more about it, I'd actually prefer to have these core interfaces in a flat directories -- i.e. no subdirectories in the repositories -- making the content immediate.

@avsm

This comment has been minimized.

Copy link
Member

@avsm avsm commented Oct 16, 2019

I'm generally in favour of this change, particularly these days since we have well-established libraries for all those types. It does make me yearn for having all the type definitions in a single repository again, since it was so easy to have all the signatures in one place. That's a separate PR though -- I think this is a good change in the right direction, and particularly so as we're going to be focussing on effects+multicore in 2020 as that implementation lands upstream.

@hannesm

This comment has been minimized.

Copy link
Member Author

@hannesm hannesm commented Oct 16, 2019

It does make me yearn for having all the type definitions in a single repository again

I actually prefer the split into separate repositories of the signatures (for reduction of dependency cones, esp. in implementations -- no need to recompile ethernet just because something in mirage-kv changed)

@hannesm

This comment has been minimized.

Copy link
Member Author

@hannesm hannesm commented Oct 16, 2019

while working on this changeset, I discovered that I lack knowledge what mirage-device should be -- I understand in pre-functoria / more static mirage it was sensible to have "sth that has a type t, a connect and a disconnect function (t -> unit Lwt.t))" -- nowadays connect is not part of the interface (since its arguments vary depending on implementation), and disconnect is never called from MirageOS code. The error defined by mirage-device (`Unimplemented | `Disconnected) is a bit strange to handle programmatically (and AFAICT they're barely used/constructed). Also, keep in mind that solo5 has a rather static device model, i.e. no hotplug as we've on Xen.

Some current examples of devices are: mirage-net (ok, this makes sense to have the functions above -- and start / stop listening from the device), mclock + pclock (here connect and disconnect return unit-- and are the only reason why mirage-clock requires lwt -- i.e. I'm in favour of dropping it there), mirage-stack (but both implementations (unix & direct) "ignore" disconnect (i.e. print a log message and return)).

TL;DR that's a different changeset, but I think we can improve by removing even more code (one of my favourite tasks) ;)

NB: I just don't know whether anyone is using the code/abstractions beneficially, and don't want to step on anyones toe by proposing removal of their code or code they use. I usually try GitHub search, a local grep -r of repositories I have (by cloning opam reverse dependencies) to figure out what is being used and what not (but that of course unlikely covers your private git repositories / branches or software not released to opam).

@cfcs

This comment has been minimized.

Copy link

@cfcs cfcs commented Oct 16, 2019

  1. I think this is a good change, especially for Cstruct.t.

  2. Re: abstracting I/O, I don't think there's a good reason to not expect Lwt.t either.
    On an intellectual, idealized, plane I would be in favor of having abstracted I/O to permit unit testing of the whole stack, like we now do in dns, but since no one is actually doing that right now, I think it seems like a decent trade-off to lose that ability and instead gain readability and beginner-friendliness.

  3. Re: devices coming and going, and their interfaces: In reality we have two current types of usage: static initialization (solo5, unix, etc - where the device signatures are largely overkill) and dynamic initialization (qubes-mirage-firewall, which I have an impression required quite a lot of hacking from @talex5 to get to a working state).
    It would be interesting to get a clear idea of what the pain points were, and how a different Mirage API could have benefited that project. I doubt that's an immediately actionable point for now here, so feel free to ignore this paragraph. :)

  4. I actually prefer the split into separate repositories of the signatures (for reduction of dependency cones, esp. in implementations -- no need to recompile ethernet just because something in mirage-kv changed)

    1. +1. This is already insanely painful when some of the more popular dependencies get updated. Hopefully getting rid of the -lwt packages will indeed bring down the solver time some; at the moment it seems like most of the time spent during these upgrades are spent in opam-land, and that the actual OCaml/dune compilation stuff is pretty fast.
@dinosaure

This comment has been minimized.

Copy link
Member

@dinosaure dinosaure commented Oct 16, 2019

So from what I know about MirageOS and abstractions and what I did over theses abstractions, in many places, we did the choice to use directly LWT instead to have an async-compatible (or agnostic) implementation for some libraries (like ocaml-git).

So, for me, the choice of LWT in Mirage (and specially in unikernel.ml) is deep and should not be updated in a near future.

But, as I said, this choice should only affect Mirage stuff (or direct dependencies with unikernel.ml) and we should still continue to be agnostic to the IO for libraries - eg. tls, decompress, etc. But it's clear that it's not a part of this plan 👍.

So I'm in favor to specialize the io type only for signatures and to let the responsability to the developper of libraries to take care about this abstraction.

@hannesm

This comment has been minimized.

Copy link
Member Author

@hannesm hannesm commented Oct 17, 2019

Thanks for all your comments. Here is a plan on how to smoothly (for users of MirageOS) integrate such a breaking change (for review/feedback and to look back whether this plan worked whenever a similar breaking change is necessary).

Signature revision, bounds, conflicts and releases

Let's take mirage-time as a sample packet, its current release is 1.3.0. mirage 3.6.0 is the current release, 3.7.0 is the imaginary release with the described API adjustments (and retired opam packages):

  • add upper bounds "mirage-time" {< "2.0.0") to all released mirage-types (in opam-repository)
  • add the same upper bounds to all other users of mirage-time (in opam-repository)
  • conflict in mirage-time 2.0.0 opam with mirage-time-lwt (since it'll just not work anymore)
  • mirage-time 2.0.0 is to be adapted with the breaking changes (i.e. io specialized to Lwt.t directly, see this commit)
  • adapt all implementations and users of mirage-time (no need to define type 'a io = 'a Lwt.t), drop mirage-time-lwt usage
  • release mirage-time 2.0.0 to opam
  • add lower bound mirage-time {>= "2.0.0"} to mirage-types 3.7.0
  • release mirage-types 3.7.0, also mirage 3.7.0, which depends on >=3.7.0 & < 3.8.0 (fix code generated by mirage if needed (i.e. connect / ...))

this needs to be rinsed and repeated for all mirage-* signatures, and all reverse dependencies -- finally the mirage-types.3.7.0 can be released and conflict with mirage-types-lwt

NB: existing mirage releases (since 3.3.0) generate lower and upper bounds (>= 3.6.0 & < 3.7.0) to mirage-types during configure into the unikernel opam file, i.e. users of mirage (3.3.0 .. 3.6.0) will not get a mirage-types 3.7.0 and thus no mirage-time 2.0.0 (if they updated opam-repo after "add upper bounds to existing mirage-types releases" was completed -- which needs to happen before the mirage-time 2.0.0 release).

The signature updates can be done and released individually (i.e. mirage-time, mirage-clock, ...) to opam-repository (since upper bounds are put in place, and existing mirage won't use them) - of course it would be good to keep an opam overlay in place that ensures mirage-skeleton can be build.

Unikernels

Any existing unikernel works with either <3.7.0 or >= 3.7.0 (we unfortunately do not encode in config.ml which are the supported versions) -- we can add a dependency on mirage-runtime {< "3.7.0"} (then a 3.7.0 mirage configure will error with: Fatal error: exception Invalid_argument("version constraint for mirage-runtime must be that min is smaller than max")). we have to adapt these unikernels one by one, and while at it I'd suggest to add a mirage-runtime ~min:"3.7.0" ~max:"4.0.0" dependency. this will make subsequent signature revisions easier (at the cost that if we release a 4.0.0 that is compatible (signature-wise), we need to adjust all the unikernels again (with ~max:"5.0.0")). The adjustments to unikernels is pretty straightforward: s/_lwt//g (e.g. hello)

@cfcs

This comment has been minimized.

Copy link

@cfcs cfcs commented Oct 17, 2019

A tad off-topic, but since you brought up version numbers:
Would it make sense to adopt a versioning scheme for mirage- packets that reflect the version of mirage for which the last breaking change was made to that library?
So for example, if we release mirage-types.3.7.0, and this breaks mirage-time, would fix and bump to mirage-time.37.x.y (leaving y for minor patches and x for patches that break mirage-time api)? Is there a reason that 1.x.y, 2.x.y etc is preferable to such a scheme?

@hannesm

This comment has been minimized.

Copy link
Member Author

@hannesm hannesm commented Oct 17, 2019

Would it make sense to adopt a versioning scheme for mirage- packets that reflect the version of mirage for which the last breaking change was made to that library?

That's what we usually do -- see above mirage-time 1.3.0 -> 2.0.0 Or do you suggest to have all mirage- packages prefixed by the mirage major version? I don't think this is feasible: individual signatures and packages may have various API breakages within a mirage major release cycle (e.g. mirage-kv, mirage-protocols, dns is now at 4.x) -- while others may not change at all (e.g. mirage-device). From the mirage-types opam package you can infer the bounds of mirage- signatures that are part of this major + minor version.

So for example, if we release mirage-types.3.7.0, and this breaks mirage-time

This sentence does not make much sense, since the dependencies are the other way around: mirage-types depends on mirage-time, so a mirage-types release cannot break mirage-time.

would fix and bump to mirage-time.37.x.y (leaving y for minor patches and x for patches that break mirage-time api)?

I don't quite understand what you mean here. Yes, we stick mostly to semantic versioning. Yes, we could instead of 3.7.0 use 4.0.0 (for mirage / mirage-types) -- but that wouldn't make any difference. What I (attempted to) elaborate on are the current interdependencies and lower and upper bounds (which are needed, what is generated by mirage).

@cfcs

This comment has been minimized.

Copy link

@cfcs cfcs commented Oct 17, 2019

OK, I think I will just stay confused then. :)

@hannesm

This comment has been minimized.

Copy link
Member Author

@hannesm hannesm commented Oct 17, 2019

@cfcs I'm indifferent about version numbers, we can as well bump signatures to 3.7.0, but as I said I think it will be hard to keep the mirage and signatures version scheme synchronized (we can try, though)

@cfcs

This comment has been minimized.

Copy link

@cfcs cfcs commented Oct 17, 2019

I suggested that instead of arbitrarily bumping from 1.x.y to 2.x.y we could bump to 37.x.y, and have 37 be derived from mirage-types.3.7.0.

This would not be semantic versioning, but we don't use that anyway (or inconsistently so anyway), at least my impression is that we do something like: Instead of using major version number for big api breakage, and minor version for small api breakage. Patch version (y above) is usually used to patch bugs without changing the API, including the backwards-compatible stuff that semantic versioning would have you bump the minor version number for. Point-in-case: Mirage 3.7.0 being non-backwards compatible, but only incrementing minor version number. (EDIT: I don't think that's necessarily a bad way to do things)

Ah, I had the dependency backwards with mirage-types as you say. The main point was that it would be nice if there was some kind of correspondence, so instead I guess my suggestion should be reformulated as: It would be nice if there was some kind of correspondence between these version numbers. When mirage-time is updated in a breaking way, and we're incrementing the major version number, we could update it with something derived from anticipated version number of the first mirage-types release that would incorporate those changes.
If new versions of mirage-types were issued in between releases of mirage-time, you would just skip a few major versions.

That means that instead of having nondescriptive 1.x.y, 2.x.y, (or even worse, confusing 3.x.y and 4.x.y which kind of carries annotations that they would fit with Mirage v3 or v4) you would be able to see immediately the first release of mirage-types (not the latest) that incorporated those changes. This hopefully would make it a tiny bit easier to get constraints right in various places?

This reasoning for this proposal is based on the assumption that mirage-types will always be updated following a new release of e.g. mirage-time.

@hannesm

This comment has been minimized.

Copy link
Member Author

@hannesm hannesm commented Oct 17, 2019

Point-in-case: Mirage 3.7.0 being non-backwards compatible, but only incrementing minor version number. (EDIT: I don't think that's necessarily a bad way to do things)

As mentioned, above is mainly an outline how the workflow of such a change could be. Naming it 4.0.0 is fine with me as well.

This reasoning for this proposal is based on the assumption that mirage-types will always be updated following a new release of e.g. mirage-time.

In my experience, this is not the case. See mirage-protocols 3.0.0 which lead to a tcpip 3.7.6 release, but no mirage nor mirage-types release.

A release of mirage/mirage-types is needed if modules are added/removed/renamed (see the mirage 3.5.0 release to cope with mirage-protocols 1.4 -> 2.0 ETHIF -> ETHERNET) or the signature of connect (which is only part of the concrete implementations) changes (mirage generates code in main.ml which calls connect).

For library developers: pick your mirage- signature version you program against (and add lower/upper bounds in opam). For unikernel programmers: you can also specify lower and upper bounds of the signatures used (e.g. mirage-time 1.x), you as well get upper and lower bounds to mirage-types (which depends on the signature packages) during mirage configure.

Versioning (when to bump which numbers) is out of scope for this proposal, if you're keen please open another issue with a concrete versioning proposal (e.g. which version bounds should be where and how the release package workflow should look like). If you think that synchronizing version numbers at the cost of doing more releases (potentially without any code change) results in something "less confusing", it is worth pursuing. We could then also add upper bounds to the mirage-types opam file (< next-major).

hannesm added a commit to hannesm/opam-repository that referenced this issue Oct 19, 2019
…rage#1004):

opam admin add-constraint "mirage-device<2.0.0"
opam admin add-constraint "mirage-random<2.0.0"
opam admin add-constraint "mirage-time<2.0.0"
opam admin add-constraint "mirage-clock<3.0.0"
opam admin add-constraint "mirage-flow<2.0.0"
opam admin add-constraint "mirage-fs<3.0.0"
opam admin add-constraint "mirage-console<3.0.0"
opam admin add-constraint "mirage-protocols<4.0.0"
opam admin add-constraint "mirage-stack<2.0.0"
opam admin add-constraint "mirage-block<2.0.0"
opam admin add-constraint "mirage-net<3.0.0"
opam admin add-constraint "mirage-kv<3.0.0"
opam admin add-constraint "mirage-channel<4.0.0"
opam admin add-constraint "mirage-types<3.7.0"
opam admin add-constraint "mirage-types-lwt<3.7.0"
@hannesm

This comment has been minimized.

Copy link
Member Author

@hannesm hannesm commented Oct 20, 2019

since I received only positive feedback, and was on a long train journey, I just PRed (a) the opam-repository upper bounds ocaml/opam-repository#15082 and the inidividual core interface packages:

  • for each foo in mirage-foo, I added a conflict with mirage-foo-lwt to avoid bad installations (no plan to support them)
  • I bumped the lower OCaml bound to 4.06.0 in these opam packages -- which I think is sensible (and allows removal of safe-string flags from build systems (including adjusting travis to test 4.06 .. 4.09)
  • the lower cstruct bound is on 4.0.0, lwt 4.4.0
  • I moved flow/block combinators into a separate mirage-flow/block-combinators package
  • mirage-fs still contains some code (the To_kv functor), mirage-channel the of_flow functor (I'm fine with both!)
  • mirage-fs I marked as deprecated for MirageOS 4.0, since I remember that we worked hard on the mirage-kv interface to retire the former
  • mirage-clock.PCLOCK/MCLOCK are no longer devices (and thus no Lwt dependency)

I have local patches for mirage-skeleton and the libraries implementing the interfaces. to reduce CI struggle, I'd appreciate to first get the new interfaces through and released before PRing these changes.

While discussing a bit further about versioning of OCaml interfaces with @cfcs we concluded that any mli change (if we strictly follow semantic versioning) should bump the major version, and since mirage-types is a collection of interfaces, it'd need to be bumped at every release of a dependency thereof. This is a bit unbearable, so we suggest to deprecate mirage-types for MirageOS 4.0. Since mirage-skeleton (and other unikernels) still use mirage-types(-lwt), I've added deprecations (see #1006 as draft for the changes to the mirage repository) instead of removing it entirely right now.

@hannesm

This comment has been minimized.

Copy link
Member Author

@hannesm hannesm commented Oct 20, 2019

travis is fine with the interface adjustments -- I plan to cut releases tomorrow evening (see above the last comment the references to this issue for a complete list), then open PRs with implementation changes to the various repositories, and hopefully some day be able to cut a mirage 3.7.0 release (where I plan to incorporate @dinosaure patch(es) for mirage-os-shim/mirage-main dependency reversal).

@samoht

This comment has been minimized.

Copy link
Member

@samoht samoht commented Oct 20, 2019

I’m not sure about adding conflicts — I prefer adding version constraints instead as it puts less pressure on the solver. Let me know if I can be of any help to add these.

@hannesm

This comment has been minimized.

Copy link
Member Author

@hannesm hannesm commented Oct 21, 2019

@samoht ok, the reason for conflicts is because it feels safer to me. but instead of new mirage-foo conflict with mirage-foo-lwt, there could all old mirage-foo-lwt have an upper bound on mirage-foo <next_major (these are the planned new versions), feel free to PR that to opam-repository. I noticed (see ocaml/opam-repository#15085) that opam admin add-constraint does not deal with depopts.

@hannesm

This comment has been minimized.

Copy link
Member Author

@hannesm hannesm commented Oct 21, 2019

and @samoht wish is already granted: there's actually no need for conflicts since opam/opam-repository#15082 already added the necessary upper bounds

@hannesm

This comment has been minimized.

Copy link
Member Author

@hannesm hannesm commented Oct 25, 2019

ok, the module types major version releases are now in opam repository (mirage-stack is the last one, and PRed there) - all without conflict. the next step is to port and release the implementations of the interfaces (I have various branches that I'll PR later today), and then finally release mirage and adapt mirage-skeleton to the new release (unikernels that use Mirage_types_lwt will continue to work but emit deprecation warnings until we retire mirage-types{-lwt}; unikernels using Mirage_yy_lwt.S will fail (fix is to use Mirage_yy.S instead).

hannesm added a commit to hannesm/mirage-tcpip that referenced this issue Oct 25, 2019
hannesm added a commit to hannesm/charrua that referenced this issue Oct 25, 2019
hannesm added a commit to hannesm/charrua that referenced this issue Oct 25, 2019
hannesm added a commit to mirage/mirage-tcpip that referenced this issue Nov 1, 2019
adapt to mirage/mirage#1004 API changes
hannesm added a commit to mirage/mirage-xen that referenced this issue Nov 1, 2019
@hannesm

This comment has been minimized.

Copy link
Member Author

@hannesm hannesm commented Nov 2, 2019

closing, this is slowly getting into opam-repository, missing is still the mirage 3.7.0 release (working on that)

@hannesm hannesm closed this Nov 2, 2019
hannesm added a commit to hannesm/mirage that referenced this issue Nov 2, 2019
bump versions of packages according to release schedule mirage#1012
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.