A Roadmap for Porting lwt.unix to libuv #813

Open
aantron opened this issue Oct 27, 2020 · 13 comments

@aantron
Collaborator

aantron commented Oct 27, 2020

Now that @ulrikstrid has begun the libuv conversion (#328) in #811, I'd like to outline an overall plan for how to finish the whole process sanely :)

The technical steps are:


  1. ✔️ Create a new Lwt_engine based on libuv (Add a initial engine based on luv #811).

    This allows quickly replacing the lwt.unix main loop by libuv's main loop, by telling libuv to poll lwt.unix's fds.

    This is not how libuv's API is intended to be used (and it does not use the vast majority of the libuv API). However, it allows us to switch to libuv without touching the corresponding vast majority of the code in Lwt_unix. We essentially connect two plugin APIs that were meant to be used together: Lwt already allows replacing its polling engine, and libuv offers a polling engine.

    This polling of lwt.unix's fds by libuv becomes a fallback implementation of everything in lwt.unix, allowing us to do further work piecemeal, yet still have a working library throughout.

    The libuv-based Lwt_engine initially won't support Windows.

  2. Reimplement system calls by forwarding directly to libuv.

    Like lwt.unix, libuv offers an asynchronous version of a large part of the Unix system call API. For example, see the filesystem operations available.

    So, we will continue by changing e.g. Lwt_unix.openfile to call uv_fs_open.

    This will bypass the Lwt_engine and Lwt's thread pool, and directly use libuv's uv_loop_t (its "engine") and libuv's thread pool. It will also allow us to delete large amounts of C code from Lwt.

    We will have to do this work one system call category at a time. libuv exposes different fd-like types in each category, so Lwt_unix.file_descr will have to internally become a sum type of these libuv types.

    To get a smooth transition, we will have to write many tests to discover API quirks. There is already a lwt.unix testing issue open for this purpose (Test the Unix binding #539), and it offers one categorization of the lwt.unix API. libuv's API is categorized in its documentation table of contents.

  3. Replace the Lwt thread pool by the libuv thread pool.

    After (2), there should be few system calls left implemented in Lwt that use Lwt's own thread pool. We can then reimplement it over libuv's thread pool with little stress.

    We may opt to use libuv's thread pool only internally, and leave the current Lwt thread pool implementation around to satisfy existing users.

  4. Port to Windows.

    The reasons for doing this last are:

    • The highest-quality Windows code in libuv is in its specific APIs (point 2) rather than in the polling engine (1). Since direct calls to libuv (2) will gradually replace explicit polling (1), it seems wasteful to adapt existing Lwt code to libuv polling quirks on Windows after (1), only to then delete and replace the code in (2). It's likely that after (2), many categories of system calls will work on Windows immediately, due to libuv's own Windows support.

      In other words, we would like to avoid relying on libuv's Windows polling because we are not sure about the quality of the implementation (both due to libuv and due to Windows; I think Windows is focused on other styles of API than polling).

    • Likewise, the libuv thread pool in (3) is portable to both Unix and Windows. It seems best not to first adapt the lwt.unix thread pool to quirks of polling on Windows, if we only intend to replace it by libuv's own portable thread pool in (3) anyway.

    • libuv has projects to improve categories of system calls on Windows (at least pipes). If we port to Windows later, these may already have matured in libuv.
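
The file_descr change described in step (2) could look roughly like the sketch below. It rests on stated assumptions: `uv_file`, `uv_tcp`, and `uv_pipe` are hypothetical stand-ins rather than Luv's actual types, and the `Polled` case models fds that have not been ported yet and still go through the step (1) polling fallback.

```ocaml
(* Hypothetical stand-ins for libuv's fd-like types. In a real port these
   would be the binding's own types; the names here are illustrative only. *)
type uv_file = Uv_file of int
type uv_tcp = Uv_tcp of int
type uv_pipe = Uv_pipe of int

(* A descriptor is either backed by a libuv type from an already-ported
   category, or is still a raw fd polled through the step (1) fallback. *)
type file_descr =
  | File of uv_file
  | Tcp of uv_tcp
  | Pipe of uv_pipe
  | Polled of int

(* Report which code path an operation on the descriptor would take. *)
let backend = function
  | File _ | Tcp _ | Pipe _ -> `Direct_libuv_call
  | Polled _ -> `Engine_polling_fallback

let describe fd =
  match backend fd with
  | `Direct_libuv_call -> "direct libuv call"
  | `Engine_polling_fallback -> "polling fallback"

let () =
  List.iter
    (fun fd -> print_endline (describe fd))
    [File (Uv_file 3); Tcp (Uv_tcp 4); Polled 5]
```

Keeping a fallback case in the sum type is one way to convert a single system call category at a time while the rest of the library keeps working; whether the real type would keep such a case is an open design choice.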


Organizationally, I propose to do the work in a branch starting from (2). Once we begin converting to direct calls in (2), lwt.unix will have a rigid dependency on libuv. It will no longer be possible to swap the Lwt_engine (in a reasonable way). lwt.unix also won't support Windows for a while. So this will be a breaking change, and it will also take some time to finish and stabilize.

On the other hand, working in a branch allows for easy pinning and cherry-picking.

We can prefix issues or PRs having to do with the branch with [libuv] or [luv] (for the name of the binding).


And last, to summarize some of the benefits:

  • We will switch to a more portable and much more heavily tested implementation.
  • We can delete a large amount of C code.
  • libuv already can run multiple loops in multiple system threads, and provides a cross-platform threading API — useful for multicore.
  • The binding (Luv) vendors libuv, so there will no longer be a need to remember conf-libev, or install libev system-wide.
  • We should gain a large number of tests as a side effect.
  • It is likely that we will end up contributing to libuv, as well.

Edited 28 October 2020.

@Lupus

Lupus commented Oct 28, 2020

First of all, many thanks for putting this roadmap together, @aantron!

You do not mention the thread-safety of libuv as a benefit, when multicore OCaml is just around the corner. Won't libuv largely help in preparing Lwt to work in a multicore paradigm?

@aantron
Collaborator Author

aantron commented Oct 28, 2020

@Lupus that's right. I couldn't remember all the usual benefits :) The libuv API makes it easy to run multiple event loops, one per system thread. It's not quite thread-safety, though: that's still the responsibility of the libuv API user (whether a person or a higher-level library). You still have to create the multiple event loops and submit work from each thread to the right loop. libuv offers a cross-platform thread-local storage API that should ease various kinds of integration arrangements. The binding Luv has an issue about registering GC roots in libuv's TLS if the binding's API is changed to accept OCaml values.

@Lupus

Lupus commented Oct 28, 2020

One Lwt "instance" per real thread sounds ideal to me 😊 I bet libuv offers a rich API for communicating across threads, so one can easily arrange for individual Lwt loops to communicate and do the work on multiple cores, while within each core it would have the familiar look and feel of non-preemptive concurrency. Interesting edge cases include non-blocking wait of a real mutex or condition. Probably some real thread will have to be sacrificed to block on it and then notify some Lwt loop(s) about completion.

@aantron
Collaborator Author

aantron commented Oct 28, 2020

> I bet libuv offers a rich API for communicating across threads

There is uv_async_t for delivering a message to another loop (potentially in another thread). The OCaml version is Luv.Async. It may be time to start fleshing out the OCaml docs a bit more :)

> non-blocking wait of a real mutex or condition

AFAIK, yes. libuv provides cross-platform wrappers of real mutexes and conditions (libuv, Luv), but they are sort of separated from the whole "loop"-based API, and are just standalone wrappers. So there is no direct way to do a non-blocking wait on them using libuv.

At least Windows has WaitForMultipleObjects that AFAIK allows this :)

@code-ghalib
Contributor

code-ghalib commented Oct 31, 2020

Hi @aantron, this may be a non-issue on account of a misunderstanding on my part, but how does the vendoring in of libuv play out in the following scenario:

  • I use Lwt in my program, which uses libuv (vendored version 1) via Luv
  • I also use an external library, via C-bindings, which requires libuv (non-vendored version 2) under its hood

At link time, I have multiple definitions of libuv symbols. I imagine this might require some careful crafting of the link-line to ensure that the correct symbols are used in the correct place(s)? In a largish executable, sometimes you do not know what dependencies your dependency manager has brought in - so the careful crafting may be fragile?

Also, similar to this question in the Python world, would it be a) possible, and b) beneficial, in any case, to be able to share the libuv loop between libraries? For example, consider the potential for a libuv-based adapter for hiredis: does the decision to vendor libuv impact that? How would you share event loops from two different source versions of libuv?

@aantron
Collaborator Author

aantron commented Oct 31, 2020

As I understand it, code shouldn't be linked against two versions of libuv or two instances of libuv at all.

Ideally, the whole project could be configured to use the vendored libuv. The archive and headers are installed in the opam switch/esy sandbox.

Alternatively, we could provide a way for Luv to be built against an external libuv.

The current integration of Lwt with libuv uses libuv's default loop, so it is already shared and visible to anything else that wants to use it (and everything does, by default). Eventually, we will create additional loops for additional threads.

> How would you share event loops from two different source versions of libuv?

This seems highly questionable at first. Can you link to or describe an instance where this is done? The issue is that libuv isn't a "leaf" library, but more like a "framework," since it takes over driving the application and its I/O. Unless the loops are running in different threads, I don't see how (practically) one would be able to use two different loops at all, or would want to, etc.

@aantron
Collaborator Author

aantron commented Oct 31, 2020

For ocaml-hiredis, you would simply have the adapter depend on luv (or just lwt like it already does, if/when lwt itself depends on luv, though you may still want to make the dependency explicit). Then, you could write any C code against the vendored libuv, use the vendored headers, and gain access to the same loops used anywhere else in the final linked program (as long as you could get a reference to them). You can trivially get access to the default loop through libuv's APIs, whether in C, or in OCaml through the bindings exposed by Luv. In this case, the decision to vendor libuv should simplify using it.

The other discussion you linked to doesn't seem to be about having multiple libuvs linked in, but about sharing one libuv between multiple libraries. That's trivially possible with the vendored libuv as described above.

You may be interested in Luv's depending on headers test, which is a trivial OCaml program meant to be installed in opam (or esy) in the usual way, which has some C stubs that access the vendored libuv.

If binding to a bigger C library, you would ideally configure it to find the vendored headers and archive in the opam switch, and build it (essentially the same as Luv does with libuv).

@code-ghalib
Contributor

> As I understand it, code shouldn't be linked against two versions of libuv or two instances of libuv at all.
>
> Ideally, the whole project could be configured to use the vendored libuv. The archive and headers are installed in the opam switch/esy sandbox.
> Alternatively, we could provide a way for Luv to be built against an external libuv.

Thanks @aantron. I think we might need a way for Luv to be built against an external libuv at my organization (we don't use opam). Our package management is based on dpkg, and there will be only one version of libuv used by all packages (OCaml or otherwise) in a distribution, and we can't dictate which one. Before discussing potential solutions, though, let me try packaging Luv for our package manager in its current state, and get back to you on what issues, if any, I face.

> If binding to a bigger C library, you would ideally configure it to find the vendored headers and archive in the opam switch, and build it (essentially the same as Luv does with libuv).

I don't think I can get the C library build configuration to care about the existence of opam, or even OCaml: it's distributed as a pre-built artifact (.deb).

> This seems highly questionable at first. Can you link to or describe an instance where this is done? The issue is that libuv isn't a "leaf" library, but more like a "framework," since it takes over driving the application and its I/O. Unless the loops are running in different threads, I don't see how (practically) one would be able to use two different loops at all, or would want to, etc.

I cannot link to an instance where this is done; I was speculating about the potential for it. But thank you, the "leaf" vs "framework" distinction makes sense to me now. It also makes sense that ocaml-hiredis can take a Luv adapter.

@rgrinberg
Contributor

This is not entirely related, but would it be possible to take this opportunity to introduce a lwt_unix package? Since you're going to introduce a dependency on luv, it would be nice if the jsoo users of lwt that don't need an event loop will be able to use it.

@rgrinberg
Contributor

> It will no longer be possible to swap the Lwt_engine (in a reasonable way).

If Lwt_engine is removed, what's going to happen to libraries like lwt_glib? Not that I use gtk, but it seems like we'd lose the ability to make Lwt work with custom event loops.

@Lupus

Lupus commented Nov 11, 2020

> This is not entirely related, but would it be possible to take this opportunity to introduce a lwt_unix package? Since you're going to introduce a dependency on luv, it would be nice if the jsoo users of lwt that don't need an event loop will be able to use it.

I'd like to second that. We're using lwt with jsoo right now to maintain a codebase that targets both native and JS.

@aantron
Collaborator Author

aantron commented Nov 13, 2020

> This is not entirely related, but would it be possible to take this opportunity to introduce a lwt_unix package?

Yes, this seems like a good opportunity to do so.

> If Lwt_engine is removed, what's going to happen to libraries like lwt_glib? Not that I use gtk, but it seems like we'd lose the ability to make Lwt work with custom event loops.

Indeed. There may be other ways to integrate custom event loops with libuv, but lwt_glib in its current form probably won't work. I understood from @avsm that 0install, the main known user of lwt_glib, will adapt to the change somehow (cc @talex5).

@talex5
Contributor

talex5 commented Nov 13, 2020

> I understood from @avsm that 0install, the main known user of lwt_glib, will adapt to the change somehow (cc @talex5).

First I've heard of it. Though when lwt_glib got split off from lwt, Debian took the opportunity to drop the package completely, so 0install is now having to vendor it there, which is a bit painful. Maybe we should just stop using Lwt in the GUI and go back to callbacks?
