
Research on IO-bound tasks #22

Open
mratsim opened this issue Nov 29, 2019 · 4 comments

mratsim commented Nov 29, 2019

Weave / Project Picasso focuses on CPU-bound tasks, i.e. tasks that are non-blocking and where throwing more CPU at the problem gets you the result faster.

For IO-bound tasks, the idea was to defer to specialized libraries like asyncdispatch and Chronos, which use OS primitives (epoll/IOCP/kqueue) to handle IO efficiently.

However, even for compute-bound tasks we will have to deal with IO latencies, for example in a distributed system or cluster. So we need a way to do useful work during the downtime without blocking a whole thread.

That means:

  • either playing well with asyncdispatch/Chronos (do we run an event loop per thread or a single event loop ...)
  • or having a deeper integration, which would probably be better for people who need both IO and compute, but would make people who only need one or the other pay an extra tax. The library would also become significantly more complex and harder to maintain, with many more platform-specific codepaths, or even CPU-specific ones in the case of coroutines with stack and register manipulation.

Research

  • Reduced I/O latencies with Futures

    Kyle Singer, Kunal Agrawal, I-Ting Angelina Lee

    https://arxiv.org/abs/1906.08239

    The paper explores coupling a Cilk-like work-stealing runtime with an IO runtime based on Linux epoll and eventfd.

  • A practical solution to the Cactus Stack Problem

    Chaoran Yang, John Mellor-Crummey

    http://chaoran.me/assets/pdf/ws-spaa16.pdf

    Fibril: https://github.com/chaoran/fibril

    While not explicitly mentioning async IO, the paper and the corresponding Fibril library use
    coroutine/fiber-like tasks to achieve fast, extremely low-overhead context switching.
    Coroutines are very efficient building blocks for async IO.

    For reference, the overhead is measured with fibonacci(40), which spawns hundreds of millions of tiny tasks (a minimal sketch with Weave's API follows this list). Fibril achieves 130 ms, Staccato 180 ms and Weave 165-200 ms depending on memory-management tradeoffs, while more established runtimes have much more overhead: TBB 600 ms to 1 s, Clang OpenMP ~2 s, Julia Partr ~8 s, and HPX and GCC OpenMP cannot handle fib(40).
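
As a reference point, here is a minimal sketch of such a fib benchmark using Weave's spawn/sync API, in the style of the Weave README (fib(40) matches the timings quoted above):

```nim
import weave

proc fib(n: int): int =
  # Every recursive call is spawned as a task, so the benchmark is
  # dominated by per-task scheduling overhead.
  if n < 2:
    return n
  let x = spawn fib(n-1)
  let y = fib(n-2)
  result = sync(x) + y

proc main() =
  init(Weave)
  let f = fib(40)
  exit(Weave)
  echo f

main()
```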

Implementations

mratsim commented Nov 30, 2019

Also, Rust coupled IO and task parallelism in a single runtime in the past and decided to move away from that:

https://github.com/aturon/rfcs/blob/remove-runtime/active/0000-remove-runtime.md

https://github.com/rust-lang/rfcs/blob/master/text/0230-remove-runtime.md

mratsim commented May 4, 2020

An idea on how to play well with asyncdispatch, Chronos, or any future async/await library.

They all offer a poll() function that runs their event loop.

We can add a field pollHook*: proc() {.nimcall, gcsafe.} on each worker.
It would be set up by setPollingFunction(_: typedesc[Weave], poll: proc() {.nimcall, gcsafe.}) before Weave initialization (at first; this constraint can be relaxed later).

Then we modify loadBalance(), sync(), syncScope() to interleave pollHook calls before and after executing a task (sketched below).
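
A minimal sketch of the proposed plumbing. pollHook and setPollingFunction are the additions proposed here, not existing Weave APIs, and the worker internals are heavily simplified (the hook is a global rather than a per-worker field):

```nim
type
  Weave = object                        # stand-in for Weave's API typedesc
  PollingFn = proc() {.nimcall, gcsafe.}

var pollHook: PollingFn                 # proposed per-worker field, a global here for brevity

proc setPollingFunction(_: typedesc[Weave], poll: PollingFn) =
  ## Must be called before Weave initialization (at first; relaxable later).
  pollHook = poll

proc loadBalance(_: typedesc[Weave]) =
  # ... existing steal-request / load-balancing work would happen here ...
  if not pollHook.isNil:                # highly predictable branch when no hook is installed
    pollHook()                          # give the IO event loop a chance to process ready events

proc pollIO() {.nimcall, gcsafe.} =
  echo "polling IO"                     # a real hook would call asyncdispatch/Chronos poll()

when isMainModule:
  setPollingFunction(Weave, pollIO)
  loadBalance(Weave)                    # sync()/syncScope() would interleave the same call
```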

Note that loadBalance() is called in between parallelFor iterations:

  • If the loop is fine-grained, even if it's executed on the main thread, there are plenty of opportunities to handle IO events.
  • If there is no hook, the if not pollHook.isNil: branch is very predictable and should be practically free.
  • If there is a hook, the syscalls to handle the IO event will probably slow down fine-grained parallelism a lot and also completely flush the CPU caches with data loaded by the kernel. This is bad; there are some ways to "mitigate" it:
    • Document the tradeoff
    • Only install the hook on the main thread, so that worker threads are not polluted
    • Give the option to install the hook on either the main thread or all threads (see the usage sketch after this list).
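
As a usage sketch for the main-thread-only option: hasPendingOperations and poll are real std/asyncdispatch procs, but setPollingFunction is still the hypothetical hook setter proposed above, assumed here to be exported by weave.

```nim
import std/asyncdispatch
import weave   # assumed to also export the proposed setPollingFunction

proc pollIOEvents() {.nimcall.} =
  # Drain ready IO events without blocking a worker. Depending on the Nim
  # version, a {.gcsafe.} annotation or cast may be needed to match the
  # hook's signature.
  if hasPendingOperations():
    poll(0)   # 0 ms timeout so fine-grained parallel loops are not stalled

# Install only on the main thread so worker caches stay free of kernel IO data.
setPollingFunction(Weave, pollIOEvents)
init(Weave)
```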

Note that worker threads sleep when they have no tasks, and it does not make sense for them to try to handle IO events while they have no task to run.

A potential issue is that a task can be migrated, or, for a parallel loop, even split and executed on 2 different threads. In other words: do the async libraries use {.threadvar.} to manage some global state? Because that will not work (see the illustration below).
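
To illustrate the concern (a generic sketch, not Weave or asyncdispatch code, with hypothetical names): any per-thread state silently diverges once a task resumes on a different thread after a steal or a loop split.

```nim
type EventLoopState = object
  pending: int

# Hypothetical per-thread event-loop state, as an async library might keep it.
var threadLoop {.threadvar.}: EventLoopState

proc startRequest() =
  inc threadLoop.pending   # bumps the counter of the thread that starts the task

proc finishRequest() =
  # If the task was stolen or the loop was split, this may run on another thread
  # and decrement a *different* EventLoopState: the two threads now disagree.
  dec threadLoop.pending
```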

mratsim added a commit that referenced this issue May 6, 2020
* Expose isReady to check if sync will block (#123 #22)

* update README

* Add test

* The CI is very bad at precise sleep

mratsim commented May 16, 2020

RFC #132 and its implementation of Weave as an independent background service (#136) are probably a better path forward.

mratsim commented Jan 10, 2023
