Alan wants structured concurrency and parallel data processing #107
Comments
Two cents from the computational astrophysicist -- our work is indeed largely CPU-bound, not IO-bound. We have many compute tasks that run in parallel, but which must then block waiting for the completion of "neighbor" tasks. Coroutines satisfy this, and the future / executor strategy I'm currently using "works", but it's sub-optimal.
Here is an example of how we are currently using async blocks and the Tokio executor (a sketch of the pattern follows this comment). I should also mention that we're only CPU-bound for single-node / multi-core calculations. On distributed-memory systems the "neighbor" tasks must be communicated across nodes, and it would be nice to treat upstream async tasks uniformly, whether they are compute tasks or network requests. With regard to Niko's effort to collect priorities from various user bases, here are ours:
|
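For concreteness, here is a minimal sketch of the kind of per-step compute pattern described in the comment above, assuming Tokio; the `Patch` type, the `advance` function, and the periodic neighbor layout are hypothetical stand-ins, not the commenter's actual code. CPU-bound work is routed through `spawn_blocking` so it does not stall the async worker threads.

```rust
// Hypothetical zone of the simulation grid.
#[derive(Clone)]
struct Patch {
    data: Vec<f64>,
}

// CPU-bound update of one patch from its previous state and its neighbors
// (placeholder arithmetic standing in for a real stencil).
fn advance(me: Patch, left: Patch, right: Patch) -> Patch {
    let mut next = me;
    for (i, x) in next.data.iter_mut().enumerate() {
        *x += 0.5 * (left.data[i] + right.data[i]);
    }
    next
}

// One time step: each patch's update runs as its own task on the executor,
// and the step completes only once every neighbor-dependent task has finished.
async fn step(patches: Vec<Patch>) -> Vec<Patch> {
    let n = patches.len();
    let handles: Vec<_> = (0..n)
        .map(|i| {
            let me = patches[i].clone();
            let left = patches[(i + n - 1) % n].clone();
            let right = patches[(i + 1) % n].clone();
            // spawn_blocking keeps the CPU-bound work off the async threads.
            tokio::task::spawn_blocking(move || advance(me, left, right))
        })
        .collect();

    let mut next = Vec::with_capacity(n);
    for h in handles {
        next.push(h.await.expect("compute task panicked"));
    }
    next
}
```

A real code would express the neighbor dependencies per task (for example via shared futures) rather than a hard step barrier; this sketch only shows the executor usage.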
A quick question about your specific case @jzrake -- what motivated you to use async for your code in the first place? If it was just a single-node computation, I'm guessing that a non-async threadpool (or rayon) would have worked well. Did async stem from the need to support multi-node computations (where using async to talk over the network is useful)? Or was it something else? |
@eminence -- I've had the impression that an async executor would minimize core idling from blocking tasks. Since … |
As a data engineer who came to Rust from Scala and has worked extensively with Spark and Scala's fabulous async ecosystem, I'd like to offer some thoughts (and if I have time, could try writing up this story as well). Points:
I just want to join a list of multiple futures… this would be a super easy pattern in Scala. In Rust the above, if serialize_to_parquet is async and takes … (a sketch of the join pattern follows below).
As an aside, concurrent processing is very different from the "data processing" that most people do. Most data scientists and engineers aren't really distributed-systems people, and use tools without understanding async. They mostly come from Python (is there a character for the Python data scientist? If there isn't, there should be; this is a very important category! I guess Alan is that person, but the JVM is very different from Python/Ruby). They might use tools like Spark or Ray, which enable parallelism and distribution without the user really having to understand it. They write code that is almost always single-threaded. So keep that in mind: the key to making async work for these folks is likely to effectively make async "disappear" behind other APIs which "look blocking". |
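For illustration, here is a minimal sketch of the "join a list of futures" pattern, assuming the `futures` crate and a hypothetical async `serialize_to_parquet`; it is a sketch of the pattern, not the commenter's code.

```rust
use futures::future::try_join_all;

// Hypothetical partition of a dataset.
struct Partition {
    rows: Vec<String>,
}

// Hypothetical async writer; a real one would do file or object-store I/O.
async fn serialize_to_parquet(part: Partition) -> std::io::Result<()> {
    let _ = part.rows; // ... encode and write ...
    Ok(())
}

// Kick off one future per partition and join them all; the whole batch
// fails fast with the first error, propagated through `?`.
async fn write_all(parts: Vec<Partition>) -> std::io::Result<()> {
    let writes = parts.into_iter().map(serialize_to_parquet);
    try_join_all(writes).await?;
    Ok(())
}
```

(`futures::future::join_all` is the variant for futures that don't return `Result`.)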
Up till now, parallel processing is definitely very easy. Just change … (a sketch of one such drop-in pattern follows below).
I think only the first and second parts are more troublesome, in that a lot of stuff needs to be checked one by one. |
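A minimal sketch of one such drop-in pattern, assuming Rayon (an assumption on my part; the comment does not name the library): the parallelism comes from changing `iter()` to `par_iter()`.

```rust
use rayon::prelude::*;

fn sum_of_squares(input: &[i64]) -> i64 {
    // Sequential version: input.iter().map(|x| x * x).sum()
    // The only change needed for data parallelism is `iter()` -> `par_iter()`.
    input.par_iter().map(|x| x * x).sum()
}
```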
Some links WRT Structured Concurrency to consider:
https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/
https://en.wikipedia.org/wiki/Structured_concurrency
Frankly, I consider SC to be essential. |
What are the gaps in Rust today wrt structured concurrency? One could argue that waiting for async work to finish and being able to return errors in a natural way that fits the language paradigm is what current async-await is designed to do already.
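As a concrete reading of that argument, here is a minimal sketch, assuming Tokio and hypothetical `fetch_config` / `fetch_data` functions, of waiting for async work to finish and returning errors through the ordinary `Result` / `?` machinery:

```rust
use std::io;

async fn fetch_config() -> io::Result<String> {
    Ok("config".to_string())
}

async fn fetch_data() -> io::Result<Vec<u8>> {
    Ok(vec![1, 2, 3])
}

async fn load() -> io::Result<(String, Vec<u8>)> {
    // Both futures are driven to completion before `load` returns,
    // and an error from either one propagates with `?` as usual.
    let (config, data) = tokio::try_join!(fetch_config(), fetch_data())?;
    Ok((config, data))
}
```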
|
There's also the ability to cancel other tasks seamlessly. You need this for a heap of reasons I could expound upon for the next ten pages; suffice it to say that SC with cancellation is the difference that makes any nontrivial code using Python+asyncio (unstructured; abstraction: tasks+futures) a tedious and barely-debuggable mess, while Python+trio (structured, with scoped cancellation, and no user-visible Task or Future classes) is the polar opposite IME. I'd like async Rust to be the latter, and I'd go as far as to say that if Niklaus and Barbara want to keep their sanity, they need it to be the latter. https://rust-lang.github.io/async-book/01_getting_started/04_async_await_primer.html has these nonexistent chapters 6.3 through 6.5 … |
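To make the scoped-cancellation point concrete in Rust terms, here is a minimal sketch assuming Tokio plus the `tokio_util` crate's `CancellationToken` (both assumptions; this only approximates cancellation flowing from a parent scope to its children and is not trio's model):

```rust
use std::time::Duration;
use tokio_util::sync::CancellationToken;

async fn worker(id: u32, token: CancellationToken) {
    tokio::select! {
        // Stop promptly when the parent scope cancels us.
        _ = token.cancelled() => {
            println!("worker {id} cancelled");
        }
        // Stand-in for real work.
        _ = tokio::time::sleep(Duration::from_secs(60)) => {
            println!("worker {id} finished");
        }
    }
}

#[tokio::main]
async fn main() {
    let parent = CancellationToken::new();

    let handles: Vec<_> = (0..4)
        .map(|id| tokio::spawn(worker(id, parent.child_token())))
        .collect();

    // Something went wrong, or the results are no longer needed:
    // cancelling the parent token reaches every child.
    parent.cancel();

    for h in handles {
        let _ = h.await;
    }
}
```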
NB, it's possible to code up an SC-style library using Future/Task as low-level primitives. The Python … Still needs decent cancellation support, though. |
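For reference, a minimal sketch of that idea in today's Rust, assuming Tokio's `JoinSet` (an assumption, not the library the comment refers to): every child task spawned inside the function is awaited before it returns, and dropping the set early (for example when an error propagates) aborts the children that are still running.

```rust
use tokio::task::JoinSet;

// A scope-like helper: every task spawned here is awaited before we return,
// and dropping the JoinSet early (e.g. via `?`) aborts the tasks that are
// still running.
async fn run_scope() -> Result<Vec<u64>, tokio::task::JoinError> {
    let mut set = JoinSet::new();

    for i in 0..8u64 {
        set.spawn(async move {
            // Stand-in for real work.
            i * i
        });
    }

    let mut results = Vec::new();
    while let Some(res) = set.join_next().await {
        results.push(res?); // panics and aborts surface as JoinError
    }
    Ok(results)
}
```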
The current cancellation system makes this impossible, as pointed out here: https://www.reddit.com/r/rust/comments/wltcf8/announcing_rust_1630/ijwx3mz/ |
Can we leverage this work from the C++ community? https://github.com/kirkshoop/async_scope/blob/main/asyncscope.md |
Brief summary
Alan wants to intermix data processing with async code, and he finds it difficult. He misses Kotlin's support for coroutines. Barbara laments the lack of structured concurrency or Rayon-like APIs.