Use Dune instead of OCamlbuild to build unikernels #969

Closed
TheLortex opened this issue Feb 12, 2019 · 48 comments

@TheLortex TheLortex commented Feb 12, 2019

I used several unikernel samples from mirage-skeleton to explore which problems block the Dune transition. After identifying problems and workarounds, I managed to build the following unikernels entirely with Dune + ld/pkg-config for the unix, xen and hvt targets: DNS, http_fetch, echo_server. I'm aware this does not cover everything, but it's a good start.

Used features of OCamlbuild that aren't implemented in Dune:

  1. -dontlink <pkg> used for unix, threads, str and num. This removes these packages from the final linking step, thus allowing to link on non-Unix targets.
  2. Predicates are sometimes used for library selection.
  3. Use of -output-obj in conjunction with ..._linkopts in META files to link C stubs and custom OCaml runtime.

Workarounds for each problem

  1. Explore the dependency graph to figure out which packages have unneeded dependencies and update them accordingly. Use OCaml 4.07 and Stdlib.Bigarray in order not to depend on the legacy bigarray which includes undefined stubs on xen/freestanding.
  2. Use virtual libraries and implementations. To not break everything, it's possible to keep the library name package as the default implementation and make it depend on the virtual library package-virtual. This way the update is transparent for users of the package and the ones that want to use the virtual library feature just have to change their package dependency into a package-virtual dependency.
  3. Dune uses -output-complete-obj to build object files. This also links C stubs and the runtime. Therefore we don't have to care about C stubs anymore, as they are automatically linked along with the libraries in use. No need for ..._linkopts flags in META. For the custom runtime, ocamlopt has a -runtime-variant option to choose a custom runtime library to link.
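
The second workaround can be sketched with two dune stanzas. This is a minimal illustration, not the layout of any actual MirageOS package: the library names package and package-virtual follow the placeholders used above, while the directory names and the module name backend are hypothetical.

```dune
; virtual/dune: the virtual library. backend.mli lives here, backend.ml does not.
(library
 (name package_virtual)
 (public_name package-virtual)
 (virtual_modules backend))

; unix/dune: the default implementation, still installed as `package`.
; It provides backend.ml for the Unix target.
(library
 (name package)
 (public_name package)
 (implements package-virtual))
```

Existing users keep depending on package, while target-agnostic code can list package-virtual in its libraries field and leave the choice of implementation to the final executable.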

Packages to update

  1. ocplib-endian, mirage-fs-lwt, mirage-console-lwt, xen-evtchn, sexplib, tcpip, asn1-combinators, mirage-entropy
  2. mirage-entropy, zarith + dependencies (asn1-combinators, nocrypto)
  3. mirage-xen-ocaml, ocaml-freestanding: just need to add a symlink from lib...asmrun.a to ocaml runtime directory so that ocaml finds the runtime variants. I've already modified these packages locally to make my experiments so it's not that hard to make these changes.

Issues and comments

  1. Duniverse forks may have already fixed some of these packages.
  2. The use of virtual libraries disables cross-module optimisations such as inlining. We have to be careful with that.

How do you feel about this? How should we proceed to update packages? Am I missing something? Feel free to comment!

@avsm avsm commented Feb 12, 2019

This plan sounds good to me!

About specific libraries:

@avsm avsm commented Feb 12, 2019

pushed xen-evtchn pr to move to dune

@TheLortex TheLortex commented Feb 12, 2019

  • ocplib-endian: in opam file, cppo is not written as a build dependency
  • mirage-*-lwt: in opam file, unnecessary requirement of cstruct-lwt
  • xen-evtchn: in lib/dune:4, dependency on unix and bigarray
  • mirage-entropy: will push that to dune-universe!
avsm added a commit to dune-universe/ocplib-endian that referenced this issue Feb 12, 2019
TheLortex added a commit to dune-universe/mirage-entropy that referenced this issue Feb 12, 2019
TheLortex added a commit to dune-universe/mirage-entropy that referenced this issue Feb 12, 2019

@avsm avsm commented Feb 12, 2019

I pushed a rebase of mirage/mirage-tcpip#384 now to use virtual_modules -- feel free to push to that branch as well with any fixes

avsm added a commit to avsm/ocaml-evtchn that referenced this issue Feb 12, 2019
avsm added a commit to avsm/mirage-console that referenced this issue Feb 13, 2019
TheLortex added a commit to TheLortex/mirage that referenced this issue Feb 13, 2019
TheLortex added a commit to dune-universe/ocaml-asn1-combinators that referenced this issue Feb 13, 2019
TheLortex added a commit to dune-universe/mirage-entropy that referenced this issue Feb 13, 2019
TheLortex added a commit to dune-universe/ocaml-nocrypto that referenced this issue Feb 13, 2019
For mirage/mirage#969
Add lwt/mirage-entropy dep
TheLortex added a commit to dune-universe/mirage-tcpip that referenced this issue Feb 13, 2019
For mirage/mirage#969
`implements` requires public library name.
Dummy tcpip lib so that `tcpip` is resolved.
Fix dependencies in the opam definition.

@TheLortex TheLortex commented Feb 13, 2019

After struggling with bigarray today I finally opted for a hack: even if ocamlopt links caml_ba_map_file and adds some unresolved symbols, we can choose to ignore it.
My Mirage CLI fork is available on TheLortex/mirage. I needed some local opam pins to get http_fetch compiling on every target (except genode because of dynlink issues).
It's not that much! Most problems are actually solved by taking the duniverse version of projects.

  • mirage cli
  • mirage-entropy + variants
  • nocrypto
  • tcpip
  • tcpip-checksum + variants
  • TLS: tls, tls-lwt, tls-mirage
  • Zarith: zarith, zarith-freestanding, zarith-xen, zarith-virtual

@avsm avsm commented Feb 13, 2019

We might as well start releasing some of these... I think tcpip is a good one, and then a zarith-mirage fork.

TheLortex added a commit to dune-universe/mirage-tcpip that referenced this issue Feb 14, 2019
For mirage/mirage#969
`implements` requires public library name.
Dummy tcpip lib so that `tcpip` is resolved.
Fix dependencies in the opam definition.
TheLortex added a commit to TheLortex/mirage-platform that referenced this issue Feb 14, 2019
For mirage/mirage#969
Instead of pkg-config, one can use the following files to get the
compilation flags:

ocaml-freestanding/libs
ocaml-freestanding/cflags
TheLortex added a commit to TheLortex/mirage-platform that referenced this issue Feb 14, 2019
For mirage/mirage#969
Instead of PKG-CONFIG, one can use the following files to get
compilation flags:

mirage-xen-ocaml/libs
mirage-xen-ocaml/cflags
mirage-xen-posix/minios-cflags
mirage-xen-posix/minios-libs
mirage-xen-posix/posix-cflags
mirage-xen-posix/posix-libs
TheLortex added a commit to dune-universe/mirage-entropy that referenced this issue Feb 14, 2019
For mirage/mirage#969
Along with:
mirage-platform: 0612bdbfe1eeb004ad923991178256feb780d14c
ocaml-freestanding: c5bf86ce29872b65a12ac42b9104f15f063d644e

Instead of using pkg-config, cflags and libs are fetched from files in
the library folder.

@TheLortex TheLortex commented Feb 14, 2019

New update for mirage-entropy:
Instead of using pkg-config to fetch parameters, we can take advantage of dune's %{lib:..} variable expansion and :include to gather C flags. This technique can also be used for every package that builds xen/freestanding stubs.
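
A rough sketch of that technique, written against dune 1.x syntax. It assumes that ocaml-freestanding installs a plain-text cflags file containing space-separated flags (as in the commits referenced above). The library name mylib_freestanding, the stub file name and the generated file name are hypothetical.

```dune
; Wrap the installed flags file in parentheses so that it becomes a valid
; S-expression list, which is what (:include ...) expects.
(rule
 (targets cflags-freestanding.sexp)
 (action
  (with-stdout-to
   %{targets}
   (progn
    (echo "(")
    (cat %{lib:ocaml-freestanding:cflags})
    (echo ")")))))

; Hypothetical stub library compiled with the freestanding flags.
(library
 (name mylib_freestanding)
 (c_names stubs)
 (c_flags
  (:include cflags-freestanding.sexp)))
```

Because %{lib:ocaml-freestanding:cflags} expands to the path of the installed file, dune tracks it as a dependency and no call to pkg-config is needed at build time.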

@hannesm hannesm commented Feb 14, 2019

@TheLortex thanks for your work on this. Yes, a good outcome would be to also remove pkg-config from our toolchain. :)

@avsm avsm commented Feb 14, 2019

Looks good -- could you do a PR of mirage-platform that adds the files, @TheLortex? We can release that independently since it can remain backwards compatible.

@rgrinberg rgrinberg commented Feb 14, 2019

  • Use virtual libraries and implementations. To not break everything, it's possible to keep the library name package as the default implementation and make it depend on the virtual library package-virtual. This way the update is transparent for users of the package and the ones that want to use the virtual library feature just have to change their package dependency into a package-virtual dependency.

As discussed before, I feel like this should be a first class feature in dune. Making existing libraries virtual is something that is always going to come up if this feature gains any traction. Dune should properly support such transitions instead of having the users pollute the package namespace.

@ehirdoy ehirdoy commented Feb 15, 2019

Any plan to support Bytecode in MirageOS?

@avsm avsm commented Feb 15, 2019

@ehirdoy that question appears irrelevant to this issue. It would be helpful to create a new issue with your query. When doing so, please also specify what you want to accomplish with bytecode to provide more context.

@pqwy pqwy commented Apr 22, 2019

I'm trying to argue that this is making a bad situation worse.

Because even right now, Mirage is a monorepo project.

In fact, here is something missing from https://mirage.io:

MirageOS is not, in general, compatible with third-party OCaml code. Users cannot expect separately developed OCaml libraries to work in the unikernel environment, unless and until they are accepted into the MirageOS project and ported accordingly.

And while this could be phrased a little less directly, it cannot be made more true. It's not even by design; this is just a consequence of long-standing technical debt.

Just to put everyone on the same page, please bear with me for a moment, while I try to give some context:

Up until 2015 this was quite obvious. For example, there was a copy of cstruct stubs in the mirage runtime repo. Two .git directories in two Github repos, one project still. You could not change the implementation of cstruct without changing the Mirage core.

As more C came into play, it progressed to the next stage -- in order to run in a unikernel, every library had to build its stubs for all mirage targets and broadcast its intentions to the mirage tool using extra fields in the META files.

This gave us several .git directories, and several Github repos. Still, any otherwise OS-agnostic code had to take certain rather fragile steps to cater for the possibility of being included in a unikernel. So if anyone wanted to significantly change the very last step of assembling the unikernel, they had to make a pass over Github. Clearly, all these bodies of code are only maintainable in lock-step, so regardless of the number of nominal repositories, we still had a single effective project -- a monorepo.

From there, we've made the first steps towards making this all-in-one approach a formal reality.

Which brings us to the present. The current proposal is a perfectly logical consequence of bringing everything that could possibly be compiled into a unikernel under one roof (directly, by forking, or through an overlay). With everything building from a single project, there is nothing odd about a compilation scheme which requires

  • very specific library layout, for no library-internal purpose whatsoever; and
  • completely exporting the implementation details, down to constituent files, to separate libraries;

because what is being proposed here is understood as a Mirage-internal refactoring. As far as I can tell, the current state of Mirage is to completely embrace this idea that separate development is off the table.

If everything is not understood as simply being part of Mirage, then this scheme is so prohibitively invasive to the point of being anti-modular.

Now, I really am glad that we can agree on the ultimate vision, but we can probably all agree on what we want to have at the point at infinity. Namely, everything.

What is more interesting is whether the next step will

  • increase the interdependency, and fully enshrine the single-project system; or
  • decrease the interdependency, and try to pay off some of the immanent technical debt.

If it increases, it builds a larger wall around Mirage.

And if history is any guideline, people stop working on a problem the very moment any solution is available. So forgive me for being utterly sceptical that anyone will later disentangle all the code that has been comfortably internalized into the project.

The punchline?

With what is in Mirage right now -- but before creating a Mirage-specific shadow of most of opam -- disentangling the system seems about feasible.

Two major directions that couple everything together are:

  • the fact the mirage tool has specific code paths for any significant library that has backend-specific implementations; and
  • the general inability to build C stubs without making concessions to Mirage.

The first point is really mirage doing implementation switching. Variants alone are sufficient for this task, and disentangling in this direction is maybe a week of work.

The second point stands because we still don't know how to ship off the relevant backend-specific flags to the C compiler and linker. It's a case of cross-compilation, and this is routinely solved by setting up separate build environments, as opposed to packaging the cartesian product of build artefacts and targets.

If only we had a build tool that could pull this off. 👀

This price is a >100 packages fetch with no download cache and no build cache. It's not possible to only download packages with C stubs because dune also needs all the reverse dependencies between the package and the executable.

Here's a data point: sequentially running opam source on the transitive dependency cone of tcpip takes < 10s, on a hot download cache. Running dune over that initially takes 60s. IMHO this is not a catastrophic starting point.

Especially compared to embracing the compatibility barrier.

@rgrinberg rgrinberg commented May 5, 2019

Regarding this error:

Path outside the workspace: ../pkgconfig/mirage-xen.pc from .

Pointed out by Hannes in #969 (comment)

Could anyone clarify where this path is read from? @avsm, @yomimono or @TheLortex perhaps?

EDIT: I have a fix for the above: ocaml/dune#2124

@TheLortex TheLortex commented May 22, 2019

I've opened a first set of PRs to switch mirage base packages to dune and use virtual libraries to have a nice mirage-os-shim common interface. This is important for mirage-entropy.
mirage/mirage-os-shim#7
mirage/mirage-unix#9
mirage/mirage-xen#15
mirage/mirage-solo5#44
mirage/mirage-entropy#41
What do you think about that?

@TheLortex TheLortex commented May 22, 2019

| Package | Change needed | PR/upstream |
|---|---|---|
| sexplib | need a PR to use bigarray-compat | https://github.com/TheLortex/sexplib.git#master |
| ocplib-endian | need a PR to use bigarray-compat | https://github.com/TheLortex/ocplib-endian.git#duniverse-1.0 |
| ocaml-freestanding | new files (ld, libdir, ldflags), need a PR (testing a bit more beforehand) | https://github.com/TheLortex/ocaml-freestanding.git#lortex-master |
| mirage-os-shim | set as virtual library | mirage/mirage-os-shim#7 |
| mirage-xen | implement mirage-os-shim | mirage/mirage-xen#15 |
| mirage-unix | implement mirage-os-shim | mirage/mirage-unix#9 |
| mirage-solo5 | implement mirage-os-shim + dune port | mirage/mirage-solo5#44 |
| mirage-entropy | dune port + use mirage-os-shim as vlib | mirage/mirage-entropy#41 |
| asn1-combinators | use bigarray-compat | mirleft/ocaml-asn1-combinators#27 |
| functoria | 2-stage build | mirage/functoria#171 |
| mirage-bootvar-solo5 | ported to dune | need a release |
| mirage-console-solo5 | ported to dune | need a release |
| mirage-net-solo5 | ported to dune | need a release |
| mirage-clock-unix | added unix dependency | need a release |
| mirage-platform | add flags files - need a PR | https://github.com/TheLortex/mirage-platform.git#with-flags-files |
| duniverse | need a release | https://github.com/avsm/duniverse/ |
| mirage-conduit | ported to dune + tls.mirage -> tls-mirage - need a PR | https://github.com/TheLortex/ocaml-conduit.git#v1.4.0+fix |
| nocrypto | ported to dune, needs bigarray-compat | mirleft/ocaml-nocrypto#158 |
| tls | ported to dune, needs a PR | https://github.com/dune-universe/ocaml-tls.git#duniverse-0.10.2 |
| zarith | use variants (if we want to take advantage of dune workspaces instead of variants we probably need to change how gmp-freestanding/xen is built and installed) | https://github.com/dune-universe/zarith.git#variants |

@xekoukou xekoukou commented May 28, 2019

I haven't used Dune before. Currently, I am using the tag system of ocamlbuild to mark specific files as "precious", so that certain *.cmi and *.cmx files generated by https://github.com/stedolan/malfunction can be used by the unikernel.

I hope that Dune has a similar feature.

@dinosaure dinosaure commented May 28, 2019

@xekoukou really interesting, do you have an example of malfunction + mirage?

@xekoukou xekoukou commented May 28, 2019

@hannesm hannesm commented Jun 12, 2019

I still slightly fail to understand the high-level view, and connection between tools. afaiu duniverse is something to-be-integrated with mirage? I'd appreciate some remarks about the proposed workflows, both as a developer (where monorepo and no opam is fine) and as a CI system (where installing dependencies via opam is likely beneficial) -- also in respect to david's comments (will the pkg-config dependency go away?). anyone eager to explain them in a bit more detail here, so we can understand and discuss? (i agree that @xekoukou use-case (agda on mirageos) should be supported as well, see https://github.com/xekoukou/mirage-agda-examples). thanks!

@TheLortex TheLortex commented Jun 12, 2019

I've written a recap here https://gist.github.com/TheLortex/f3d92db831b553f6e4eaf2982e3e6427
In short (steps can be merged, or renamed):

  • mirage config -t hvt: configure the unikernel and create an opam file with dependencies.
  • duniverse init: read the opam file and compute the list of packages to download in order to build the unikernel.
  • duniverse opam-install: install dune-incompatible packages in the global opam switch
  • duniverse pull: fetch the whole dependency tree into a duniverse/ directory so that dune can use it to build the unikernel. The first pull is slow, but a cache mechanism has been developed so that subsequent pulls are faster.
  • mirage build: build the unikernel for the given target.

What's interesting about having a duniverse/ directory with most of the sources is that you can go and modify dependencies, and dune will grab these changes to build the unikernel. This could lead to a faster workflow than opam pinning and updating.

pkg-config is not required anymore, as long as flags are correctly computed while installing mirage-platform and ocaml-freestanding (for now the pkg-config is just moved to these packages..)

About the agda on mirageos use-case: there is Coq support in dune, so hopefully this shouldn't go wrong, but I'll keep that in mind. Thanks for the examples link.

@hannesm I really hope that it's now clearer for you, but tell me if something is still missing.

@hannesm hannesm commented Jun 16, 2019

@TheLortex thanks for writing this up, this makes the development workflow clearer. now, I'm a bit confused how a (non-interactive) CI flow should work -- is duniverse always required in the new model, or is it possible to only use opam for installing dependencies, and dune for building the unikernel?
also, which duniverse version is required for that flow?

@TheLortex TheLortex commented Jun 18, 2019

The CI flow works the same way: duniverse is needed because packages with C stubs have to be locally compiled with the correct flags. duniverse is still under development so there's no release yet. However the master branch already has the core features so you can test the workflow.

@dinosaure dinosaure commented Dec 10, 2019

Discussion continues in #1020, where the concerns about C stubs raised in this issue are highlighted as well. Please read it, since a consensus must be found between all parties.

@samoht samoht commented Jun 22, 2020

Status update: the implementation of this feature is now in #1153, which supersedes #1020 and #1024. Here is a brief description of the design updates and of the remaining steps.

New Workflow

As in MirageOS 3, a repository can contain multiple config.ml files (e.g. mirage-skeleton). The main difference is how these are built. As for MirageOS 3, it is still possible to configure every unikernel separately:

$ for c in `list all the config.ml`; do 
    mirage configure -f $c <params_c>;
  done
[..] # generates files, including opam and dune-workspace

Note: this doesn’t work fully yet, all the unikernels have to be configured to use the same target, but that will be fixed before the release.

As for MirageOS 3, make depends will install the dependencies. But in MirageOS 4, this will be using duniverse instead of opam. make depends is then equivalent to writing:

$ duniverse init 
[..] # read all the opam files, resolve dependencies and create dune-get

$ duniverse pull
[..] # read dune-get and download all the dependencies in duniverse/

Question: do we want a mirage depends option?

The main new feature of MirageOS 4 is that it is now possible to build all the unikernels in one go:

$ dune build @install
[..] # generate all the unikernels in one go

Or dune can be used to build only what is necessary to execute a given unikernel:

$ dune exec -- solo5-hvt <path/to/unikernel.hvt>
[..] # run the unikernel in `path/to`

Question: do we want to re-introduce a mirage exec command to wrap these?

mirage build

In MirageOS 3, config.ml files define both configure and build actions. This is not the case in MirageOS 4 anymore.

config.ml files now only allow users to define a configure step; they can also generate dune fragments, which will be included in the auto-generated dune file and picked up by dune when running dune build.

mirage build is now simply an alias for dune build @install.

Question: do we want to deprecate mirage build?

make depends

In MirageOS 3, make depends calls opam to install the necessary packages. This is not the case in MirageOS 4 anymore.

make depends is now calling duniverse init && duniverse pull to download all the sources locally, for all the configured unikernels, at once. This means that all the configured MirageOS unikernels have to be co-installable. Note that duniverse also installs the necessary depexts by default.

duniverse uses a normal opam remote as source of metadata: the default one is an overlay on top of the main opam repository. There are a few additional unreleased packages in my fork that I'm preparing for MirageOS 4 (more on this below, but the goal is to upstream them before the release). The location of the opam remote can be controlled with mirage configure --opam-repo=<repo>.

pkg-config and ocamlfind predicates

In MirageOS 3, the compilation of C bindings was using pkg-config and an extra predicate in ocamlfind META files to replace, at link time, the default C archives with the ones needed by the target. This is not the case in MirageOS 4 anymore.

mirage configure will create a dune-workspace file instead, which will define a CFLAGS variable holding the C flags for the specified target. This works great for simple C bindings already using dune and supporting cross-compilation, as they will work without any change with MirageOS 4. Bigger C libraries, usually available as system packages, will have to be wrapped in dune and recompiled from source using that new scheme (see for instance my dune rules to build gmp).
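
For illustration only, a workspace file in this spirit might look like the following. This is not the file that mirage configure actually generates: the context name and the flag values are placeholders, and the sketch assumes dune 2.x, where an env field inside a context can set c_flags.

```dune
(lang dune 2.0)

; One build context for the chosen target. C stubs built in this context
; take their default C flags from the workspace instead of from pkg-config.
(context
 (default
  (name mirage-hvt)
  (env
   (_
    (c_flags
     (-ffreestanding -O2))))))
```

A library whose C stubs keep the default flags (or extend :standard) would then be compiled with these flags when built in that context.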

Porting the runtime is a bit more tricky, as it shouldn’t be re-using the workspace CFLAGS that it is defining. To simplify things a bit, samoht/ocaml-solo5 is wrapping the build of solo5 and freestanding into one package per target. This allows all these packages to be co-installable.

Current Status

#1153 now seems to work for mirage-skeleton (using the mirage-dev-dune branch). The patch queue is in samoht/opam-repository#duniverse. The patches are usually trivial (removal of artificial constraints, metadata fixes).

  • mirage-bootvar-solo5: removal of artificial constraints
  • solo5, solo5-{hvt,spt,virtio,muen,genode}: new solo5 packages including freestanding archives
  • mirage-solo5: remove ocamlfind hacks, use global CFLAGS, remove artificial constraints
  • mirage-clock-freestanding: fix metadata typo
  • ocplib-endian: support for bigarray without unix
  • mirage-net-solo5: remove artificial constraints
  • gmp: new package
  • zarith: use the previous gmp instead of system libs, and use dune
  • mirage-crypto-pk: remove artificial constraints
  • mirage-console-solo5: remove artificial constraints
  • mirage-block-solo5: remove artificial constraints

@TheLortex TheLortex commented Oct 28, 2020

Discussion continues in #1195
