From 7d2a06282315f50bbe7064972287a3dc23b20810 Mon Sep 17 00:00:00 2001 From: Erich Ess Date: Wed, 7 Apr 2021 19:39:24 -0400 Subject: [PATCH 01/12] First draft of Barbaras experience solving computationally bound problems with async --- .../barbara_simulates_hydrodynamics.md | 81 +++++++++++++++++++ 1 file changed, 81 insertions(+) create mode 100644 src/vision/status_quo/barbara_simulates_hydrodynamics.md diff --git a/src/vision/status_quo/barbara_simulates_hydrodynamics.md b/src/vision/status_quo/barbara_simulates_hydrodynamics.md new file mode 100644 index 00000000..27f38b60 --- /dev/null +++ b/src/vision/status_quo/barbara_simulates_hydrodynamics.md @@ -0,0 +1,81 @@ +# 😱 Status quo stories: Barbara Builds a Hydrodynamics Simulator + +[How To Vision: Status Quo]: ../how_to_vision/status_quo.md +[the raw source from this template]: https://raw.githubusercontent.com/rust-lang/wg-async-foundations/master/src/vision/status_quo/template.md +[`status_quo`]: https://github.com/rust-lang/wg-async-foundations/tree/master/src/vision/status_quo +[`SUMMARY.md`]: https://github.com/rust-lang/wg-async-foundations/blob/master/src/SUMMARY.md +[open issues]: https://github.com/rust-lang/wg-async-foundations/issues?q=is%3Aopen+is%3Aissue+label%3Astatus-quo-story-ideas +[open an issue of your own]: https://github.com/rust-lang/wg-async-foundations/issues/new?assignees=&labels=good+first+issue%2C+help+wanted%2C+status-quo-story-ideas&template=-status-quo--story-issue.md&title= + + +## 🚧 Warning: Draft status 🚧 + +This is a draft "status quo" story submitted as part of the brainstorming period. It is derived from real-life experiences of actual Rust users and is meant to reflect some of the challenges that Async Rust programmers face today. + +If you would like to expand on this story, or adjust the answers to the FAQ, feel free to open a PR making edits (but keep in mind that, as they reflect peoples' experiences, status quo stories [cannot be wrong], only inaccurate). 
Alternatively, you may wish to [add your own status quo story][htvsq]!
+
+## The story
+### Problem
+Barbara needed to build a tool to solve hydrodynamics simulations; there is a common method for this that subdivides a region into a grid and computes the solution for each grid patch. All the patches in a grid for a point in time are independent and can be computed in parallel, but they are dependent on neighboring patches in the previously computed frame in time. This is a well-known computational model and the patterns for basic parallelization are well established.
+
+Barbara wanted to write a performant tool to compute the solutions to the simulations of her research. She chose Rust because she was already familiar with it and it had good qualities for writing performant code. After implementing the core mathematical formulas, Barbara began implementing the parallelization architecture.
+
+Her first attempt was to emulate a common CFD design pattern: using message passing to communicate between processes that are each assigned a specific patch in the grid. So she assigned one thread to each patch and used messages to communicate solution state to dependent patches. With one thread per patch this usually meant that there were 5-10x more threads than CPU cores.
+
+This solution was fine, but Barbara was not satisfied. She had two problems with it: first, she didn't like that the design would greedily use all the resources on the machine and, second, when her team added a new feature (tracer particles) that increased the complexity of how patches interact with each other, the number of messages passed between threads increased to the point that it became a performance bottleneck and parallel processing became no faster than serial processing. So, Barbara decided to find a better solution.
+ +### Solution Path +What Barbara wanted to do was find a way to more efficiently use threads: have a fixed number of threads that each mapped to a core on the CPU and assign patches to those threads as patches became ready to compute. The design of the `async` framework seemed to provide exactly that behavior. And to move away from the message passing design, because the number of messages being passed was proportional to the number of trace particles being traced. + +As Barbara began working on her new design with `tokio`, her use of `async` went from a general (from the textbook) use of basic `async` features to a more specific implementation leveraging exactly the features that were most suited for her needs. + +At first, Barbara was under a false impression about what async executors do. She had assumed that a multi-threaded executor could automatically move the execution of an async block to a worker thread. Then she discovered that async tasks must be explicitly spawned into into a thread pool if they are to be executed on a worker thread. This meant that the algorithm to be parallelized became strongly coupled to both the spawner and the executor. Code that used to cleanly express a physics algorithm now had interspersed references to the task spawner, not only making it harder to understand, but also making it impossible to try different execution strategies, since with Tokio the spawner and executor are the same object (the Tokio runtime). Barbara feels that a better design for data parallelism would enable better separation of concerns: a group of interdependent compute tasks, and a strategy to execute them in parallel. + +In order to remove the need for message passing, Barbara moved to a shared state design: she would keep a table tracking the solution state for every grid patch and a specific patch would only start its computation task when solutions were written for all the patches it was dependent on. 
So, each task needs to access the table with the solution results of all the other tasks. Learning how to properly use shared data with `async` was a new challenge. The initial design:
+
+```rust
+    let mut stage_primitive_and_scalar = |index: BlockIndex, state: BlockState, hydro: H, geometry: GridGeometry| {
+        let stage = async move {
+            let p = state.try_to_primitive(&hydro, &geometry)?;
+            let s = state.scalar_mass / &geometry.cell_volumes / p.map(P::lorentz_factor);
+            Ok::<_, HydroError>( ( p.to_shared(), s.to_shared() ) )
+        };
+        stage_map.insert(index, runtime.spawn(stage).map(|f| f.unwrap()).shared());
+    };
+```
+lacked performance because she needed to clone the value for every task. So, Barbara switched over to using `Arc` to keep a thread-safe reference-counted pointer to the shared data. But this change introduced a lot of `.map` and `.unwrap` function calls, making the code much harder to read. She realized that managing the dependency graph was not intuitive when using `async` for concurrency.
+
+During the move to `async`, Barbara ran into a steep learning curve with error handling. The initial version of her design just used `panic!`s to fail the program if an error was encountered, but the stack traces were almost unreadable. She asked her teammate Grace to migrate over to using the `Result` idiom for error handling, and Grace found a major inconvenience. The Rust type inference inconsistently breaks when propagating `Result` in `async` blocks. Grace frequently found that she had to specify the type of the error when creating a result value:
+```rust
+Ok::<_, HydroError>( ( p.to_shared(), s.to_shared() ) )
+```
+And she could not figure out why she had to add the `::<_, HydroError>` to some of the `Result` values.
+
+Once her team began using the new `async` design for their simulations, they noticed an important issue that impacted productivity: compilation time had now increased to between 30 and 60 seconds.
The nature of their work requires frequent changes to code and recompilation, and 30-60 seconds is long enough to have a noticeable impact on their quality of life. What she and her team want is for compilation to be 2 to 3 seconds. Barbara believes that the use of `async` is a major contributor to the long compilation times.
+
+This new solution works, but Barbara is not satisfied with how complex her code wound up becoming with the move to `async` and the fact that compilation time is now 30-60 seconds. The state sharing added a large amount of cruft with `Arc`, and `async` is not well suited for using a dependency graph to schedule tasks, so implementing this solution created a key component of her program that was difficult to understand and pervasive. Ultimately, her conclusion was that `async` is not appropriate for parallelizing computational tasks. She will be trying a new design based upon Rayon in the next version of her application.
+
+## 🤔 Frequently Asked Questions
+
+### **What are the morals of the story?**
+- `async` looks to be the wrong choice for parallelizing compute-bound/computational work
+- There is a lack of guidance to help people solving such problems get started on the right foot
+- Quality of Life issues (compilation time, type inference on `Result`) can create a drag on users' ability to focus on their domain problem
+
+### **What are the sources for this story?**
+This story is based on the experience of building the [kilonova](https://github.com/clemson-cal/app-kilonova) hydrodynamics simulation solver.
+
+### **Why did you choose *NAME* to tell this story?**
+I chose Barbara as the primary character in this story because this work was driven by someone with experience in Rust specifically but not much systems-level experience. Grace was chosen as a supporting character because of that person's experience with C/C++ programming and to avoid repeating characters.
+ +### **How would this story have played out differently for the other characters?** +For Alan, there's a good chance he would have already had experience working with either async workflows in another language or doing parallelization of compute bound tasks; and so would already know from experience that `async` was not the right place to start. Grace, likewise, might already have experience with problems like this and would know what to look for when searching for tools. For Niklaus, the experience would probably be the same, as it's very easy to assume that `tokio` is the starting place for concurrency in Rust. + +[character]: ../characters.md +[status quo stories]: ./status_quo.md +[Alan]: ../characters/alan.md +[Grace]: ../characters/grace.md +[Niklaus]: ../characters/niklaus.md +[Barbara]: ../characters/barbara.md +[htvsq]: ../how_to_vision/status_quo.md +[cannot be wrong]: ../how_to_vision/comment.md#comment-to-understand-or-improve-not-to-negate-or-dissuade From ecb858d52cb78d3a3438f36c8fd24e654acd9cca Mon Sep 17 00:00:00 2001 From: Erich Ess Date: Wed, 7 Apr 2021 19:44:36 -0400 Subject: [PATCH 02/12] Removed extraneous text --- src/vision/status_quo/barbara_simulates_hydrodynamics.md | 8 -------- 1 file changed, 8 deletions(-) diff --git a/src/vision/status_quo/barbara_simulates_hydrodynamics.md b/src/vision/status_quo/barbara_simulates_hydrodynamics.md index 27f38b60..cb11dade 100644 --- a/src/vision/status_quo/barbara_simulates_hydrodynamics.md +++ b/src/vision/status_quo/barbara_simulates_hydrodynamics.md @@ -1,13 +1,5 @@ # 😱 Status quo stories: Barbara Builds a Hydrodynamics Simulator -[How To Vision: Status Quo]: ../how_to_vision/status_quo.md -[the raw source from this template]: https://raw.githubusercontent.com/rust-lang/wg-async-foundations/master/src/vision/status_quo/template.md -[`status_quo`]: https://github.com/rust-lang/wg-async-foundations/tree/master/src/vision/status_quo -[`SUMMARY.md`]: 
https://github.com/rust-lang/wg-async-foundations/blob/master/src/SUMMARY.md -[open issues]: https://github.com/rust-lang/wg-async-foundations/issues?q=is%3Aopen+is%3Aissue+label%3Astatus-quo-story-ideas -[open an issue of your own]: https://github.com/rust-lang/wg-async-foundations/issues/new?assignees=&labels=good+first+issue%2C+help+wanted%2C+status-quo-story-ideas&template=-status-quo--story-issue.md&title= - - ## 🚧 Warning: Draft status 🚧 This is a draft "status quo" story submitted as part of the brainstorming period. It is derived from real-life experiences of actual Rust users and is meant to reflect some of the challenges that Async Rust programmers face today. From 5813338cad8942926d085c385d10e51b4889d296 Mon Sep 17 00:00:00 2001 From: Erich Ess Date: Thu, 8 Apr 2021 08:52:12 -0400 Subject: [PATCH 03/12] Fixing typos --- src/vision/status_quo/barbara_simulates_hydrodynamics.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/vision/status_quo/barbara_simulates_hydrodynamics.md b/src/vision/status_quo/barbara_simulates_hydrodynamics.md index cb11dade..8151bee3 100644 --- a/src/vision/status_quo/barbara_simulates_hydrodynamics.md +++ b/src/vision/status_quo/barbara_simulates_hydrodynamics.md @@ -21,7 +21,7 @@ What Barbara wanted to do was find a way to more efficiently use threads: have a As Barbara began working on her new design with `tokio`, her use of `async` went from a general (from the textbook) use of basic `async` features to a more specific implementation leveraging exactly the features that were most suited for her needs. -At first, Barbara was under a false impression about what async executors do. She had assumed that a multi-threaded executor could automatically move the execution of an async block to a worker thread. Then she discovered that async tasks must be explicitly spawned into into a thread pool if they are to be executed on a worker thread. 
This meant that the algorithm to be parallelized became strongly coupled to both the spawner and the executor. Code that used to cleanly express a physics algorithm now had interspersed references to the task spawner, not only making it harder to understand, but also making it impossible to try different execution strategies, since with Tokio the spawner and executor are the same object (the Tokio runtime). Barbara feels that a better design for data parallelism would enable better separation of concerns: a group of interdependent compute tasks, and a strategy to execute them in parallel. +At first, Barbara was under a false impression about what async executors do. She had assumed that a multi-threaded executor could automatically move the execution of an async block to a worker thread. Then she discovered that async tasks must be explicitly spawned into a thread pool if they are to be executed on a worker thread. This meant that the algorithm to be parallelized became strongly coupled to both the spawner and the executor. Code that used to cleanly express a physics algorithm now had interspersed references to the task spawner, not only making it harder to understand, but also making it impossible to try different execution strategies, since with Tokio the spawner and executor are the same object (the Tokio runtime). Barbara feels that a better design for data parallelism would enable better separation of concerns: a group of interdependent compute tasks, and a strategy to execute them in parallel. In order to remove the need for message passing, Barbara moved to a shared state design: she would keep a table tracking the solution state for every grid patch and a specific patch would only start its computation task when solutions were written for all the patches it was dependent on. So, each task needs to access the table with the solution results of all the other tasks. Learning how to properly use shared data with `async` was a new challenge. 
The initial design: @@ -57,7 +57,7 @@ This new solution works, but Barbara is not satisfied with how complex her code
 ### **What are the sources for this story?**
 This story is based on the experience of building the [kilonova](https://github.com/clemson-cal/app-kilonova) hydrodynamics simulation solver.
-### **Why did you choose *NAME* to tell this story?**
+### **Why did you choose Barbara and Grace to tell this story?**
 I chose Barbara as the primary character in this story because this work was driven by someone with experience in Rust specifically but not much systems-level experience. Grace was chosen as a supporting character because of that person's experience with C/C++ programming and to avoid repeating characters.
 ### **How would this story have played out differently for the other characters?**

From f6ae9bdb42c532e4b992f9254cf80175652193fa Mon Sep 17 00:00:00 2001
From: Erich Ess
Date: Thu, 8 Apr 2021 08:53:09 -0400
Subject: [PATCH 04/12] Formatting

---
 src/vision/status_quo/barbara_simulates_hydrodynamics.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/vision/status_quo/barbara_simulates_hydrodynamics.md b/src/vision/status_quo/barbara_simulates_hydrodynamics.md
index 8151bee3..566cfe62 100644
--- a/src/vision/status_quo/barbara_simulates_hydrodynamics.md
+++ b/src/vision/status_quo/barbara_simulates_hydrodynamics.md
@@ -61,7 +61,9 @@ This story is based on the experience of building the [kilonova](https://github.
 I chose Barbara as the primary character in this story because this work was driven by someone with experience in Rust specifically but not much systems-level experience. Grace was chosen as a supporting character because of that person's experience with C/C++ programming and to avoid repeating characters.
### **How would this story have played out differently for the other characters?**
-For Alan, there's a good chance he would have already had experience working with either async workflows in another language or doing parallelization of compute bound tasks; and so would already know from experience that `async` was not the right place to start. Grace, likewise, might already have experience with problems like this and would know what to look for when searching for tools. For Niklaus, the experience would probably be the same, as it's very easy to assume that `tokio` is the starting place for concurrency in Rust.
+- Alan: there's a good chance he would have already had experience working with either async workflows in another language or doing parallelization of compute bound tasks; and so would already know from experience that `async` was not the right place to start.
+- Grace: likewise, might already have experience with problems like this and would know what to look for when searching for tools.
+- Niklaus: the experience would probably be the same, as it's very easy to assume that `tokio` is the starting place for concurrency in Rust.

 [character]: ../characters.md
 [status quo stories]: ./status_quo.md
 [Alan]: ../characters/alan.md
 [Grace]: ../characters/grace.md
 [Niklaus]: ../characters/niklaus.md
 [Barbara]: ../characters/barbara.md
 [htvsq]: ../how_to_vision/status_quo.md
 [cannot be wrong]: ../how_to_vision/comment.md#comment-to-understand-or-improve-not-to-negate-or-dissuade

From 6f47a88f00e71fcfcaf77bdad5f47b31f86cc709 Mon Sep 17 00:00:00 2001
From: Erich Ess
Date: Thu, 8 Apr 2021 10:30:56 -0400
Subject: [PATCH 05/12] Improved clarity

---
 src/vision/status_quo/barbara_simulates_hydrodynamics.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/vision/status_quo/barbara_simulates_hydrodynamics.md b/src/vision/status_quo/barbara_simulates_hydrodynamics.md
index 566cfe62..b81105f6 100644
--- a/src/vision/status_quo/barbara_simulates_hydrodynamics.md
+++ b/src/vision/status_quo/barbara_simulates_hydrodynamics.md
@@ -17,7 +17,7 @@ Her first attempt was to emulate a common CFD design pattern: using message p
 This solution was fine, but Barbara was not satisfied.
She had two problems with it: first, she didn't like that the design would greedily use all the resources on the machine and, second, when her team added a new feature (tracer particles) that increased the complexity of how patches interact with each other, the number of messages passed between threads increased to the point that it became a performance bottleneck and parallel processing became no faster than serial processing. So, Barbara decided to find a better solution.

 ### Solution Path
-What Barbara wanted to do was find a way to more efficiently use threads: have a fixed number of threads that each mapped to a core on the CPU and assign patches to those threads as patches became ready to compute. The design of the `async` framework seemed to provide exactly that behavior. And to move away from the message passing design, because the number of messages being passed was proportional to the number of trace particles being traced.
+What Barbara wanted to do was find a way to more efficiently use threads: have a fixed number of threads that each mapped to a core on the CPU and assign patches to those threads as patches became ready to compute. The `async` feature seemed to provide exactly that behavior. And to move away from the message passing design, because the number of messages being passed was proportional to the number of trace particles being traced.
From b11494ff628ccb85cb798d0ba0147fde8cc5af8b Mon Sep 17 00:00:00 2001
From: Erich Ess
Date: Mon, 12 Apr 2021 10:04:51 -0400
Subject: [PATCH 06/12] Rewrote several sections to improve the clarity of the story

---
 .../status_quo/barbara_simulates_hydrodynamics.md | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/src/vision/status_quo/barbara_simulates_hydrodynamics.md b/src/vision/status_quo/barbara_simulates_hydrodynamics.md
index b81105f6..c116b185 100644
--- a/src/vision/status_quo/barbara_simulates_hydrodynamics.md
+++ b/src/vision/status_quo/barbara_simulates_hydrodynamics.md
@@ -17,14 +17,11 @@ Her first attempt was to emulate a common CFD design pattern: using message p
 This solution was fine, but Barbara was not satisfied. She had two problems with it: first, she didn't like that the design would greedily use all the resources on the machine and, second, when her team added a new feature (tracer particles) that increased the complexity of how patches interact with each other, the number of messages passed between threads increased to the point that it became a performance bottleneck and parallel processing became no faster than serial processing. So, Barbara decided to find a better solution.

 ### Solution Path
-What Barbara wanted to do was find a way to more efficiently use threads: have a fixed number of threads that each mapped to a core on the CPU and assign patches to those threads as patches became ready to compute. The `async` feature seemed to provide exactly that behavior. And to move away from the message passing design, because the number of messages being passed was proportional to the number of trace particles being traced.
+What Barbara wanted was to use the CPU more efficiently: she would decouple the work that needed to be done (the patches) from the workers (threads); this would allow her to more finely control how many resources were used.
So, she began looking for a tool in Rust that would meet this design pattern. When she read about `async` and how it allowed the user to define units of work, called tasks, and send those to an executor which would manage the execution of those tasks across a set of workers, she thought she'd found exactly what she needed. Further reading indicated that `tokio` was the runtime of choice for `async` in the community, and so she began building a new CFD tool with `async` and `tokio`. And to move away from the message passing design, because the number of messages being passed was proportional to the number of trace particles being traced.
-As Barbara began working on her new design with `tokio`, her use of `async` went from a general (from the textbook) use of basic `async` features to a more specific implementation leveraging exactly the features that were most suited for her needs.
-
-At first, Barbara was under a false impression about what async executors do. She had assumed that a multi-threaded executor could automatically move the execution of an async block to a worker thread. Then she discovered that async tasks must be explicitly spawned into a thread pool if they are to be executed on a worker thread. This meant that the algorithm to be parallelized became strongly coupled to both the spawner and the executor. Code that used to cleanly express a physics algorithm now had interspersed references to the task spawner, not only making it harder to understand, but also making it impossible to try different execution strategies, since with Tokio the spawner and executor are the same object (the Tokio runtime). Barbara feels that a better design for data parallelism would enable better separation of concerns: a group of interdependent compute tasks, and a strategy to execute them in parallel.
-
-In order to remove the need for message passing, Barbara moved to a shared state design: she would keep a table tracking the solution state for every grid patch and a specific patch would only start its computation task when solutions were written for all the patches it was dependent on. So, each task needs to access the table with the solution results of all the other tasks. Learning how to properly use shared data with `async` was a new challenge. The initial design:
+As Barbara began working on her new design with `tokio`, her use of `async` went from a general (from the textbook) use of basic `async` features to a more specific implementation leveraging exactly the features that were most suited for her needs. At first, Barbara was under a false impression about what async executors do. She had assumed that a multi-threaded executor could automatically move the execution of an async block to a worker thread. When this turned out to be wrong, she went to Stack Overflow and learned that async tasks must be explicitly spawned into a thread pool if they are to be executed on a worker thread. This meant that the algorithm to be parallelized became strongly coupled to both the spawner and the executor. Code that used to cleanly express a physics algorithm now had interspersed references to the task spawner, not only making it harder to understand, but also making it impossible to try different execution strategies, since with Tokio the spawner and executor are the same object (the Tokio runtime). Barbara felt that a better design for data parallelism would enable better separation of concerns: a group of interdependent compute tasks, and a strategy to execute them in parallel.
+Along with moving the execution of the computational tasks to `async`, Barbara also used this as an opportunity to remove the message passing that was used to coordinate the computation of each patch.
She used the `async` API to define dependencies between patches so that a patch would only begin computing its solution when its neighboring patches had completed. This also required setting up shared state that would store the solutions for all the patches as they were computed, so that dependents could access them. Learning how to properly use shared data with `async` was a new challenge. The initial design:
```rust
    let mut stage_primitive_and_scalar = |index: BlockIndex, state: BlockState, hydro: H, geometry: GridGeometry| {
        let stage = async move {
            let p = state.try_to_primitive(&hydro, &geometry)?;
            let s = state.scalar_mass / &geometry.cell_volumes / p.map(P::lorentz_factor);
            Ok::<_, HydroError>( ( p.to_shared(), s.to_shared() ) )
        };
        stage_map.insert(index, runtime.spawn(stage).map(|f| f.unwrap()).shared());
    };
```
lacked performance because she needed to clone the value for every task. So, Barbara switched over to using `Arc` to keep a thread-safe reference-counted pointer to the shared data. But this change introduced a lot of `.map` and `.unwrap` function calls, making the code much harder to read. She realized that managing the dependency graph was not intuitive when using `async` for concurrency. During the move to `async`, Barbara ran into a steep learning curve with error handling. The initial version of her design just used `panic!`s to fail the program if an error was encountered, but the stack traces were almost unreadable. She asked her teammate Grace to migrate over to using the `Result` idiom for error handling, and Grace found a major inconvenience. The Rust type inference inconsistently breaks when propagating `Result` in `async` blocks. Grace frequently found that she had to specify the type of the error when creating a result value:
```rust
Ok::<_, HydroError>( ( p.to_shared(), s.to_shared() ) )
```
And she could not figure out why she had to add the `::<_, HydroError>` to some of the `Result` values.
-Once her team began using the new `async` design for their simulations, they noticed an important issue that impacted productivity: compilation time had now increased to between 30 and 60 seconds. The nature of their work requires frequent changes to code and recompilation, and 30-60 seconds is long enough to have a noticeable impact on their quality of life. What she and her team want is for compilation to be 2 to 3 seconds. Barbara believes that the use of `async` is a major contributor to the long compilation times.
+Finally, once her team began using the new `async` design for their simulations, they noticed an important issue that impacted productivity: compilation time had now increased to between 30 and 60 seconds. The nature of their work requires frequent changes to code and recompilation, and 30-60 seconds is long enough to have a noticeable impact on their quality of life. What she and her team want is for compilation to be 2 to 3 seconds. Barbara believes that the use of `async` is a major contributor to the long compilation times.
-This new solution works, but Barbara is not satisfied with how complex her code wound up becoming with the move to `async` and the fact that compilation time is now 30-60 seconds.
The state sharing added a large amount of cruft with `Arc`, and `async` is not well suited for using a dependency graph to schedule tasks, so implementing this solution created a key component of her program that was difficult to understand and pervasive. Ultimately, her conclusion was that `async` is not appropriate for parallelizing computational tasks. She will be trying a new design based upon Rayon in the next version of her application.
+This new solution works, but Barbara is not satisfied with how complex her code became after the move to `async` and that compilation time is now 30-60 seconds. The state sharing added a large amount of cruft with `Arc`, and `async` is not well suited for using a dependency graph to schedule tasks, so implementing this solution created a key component of her program that was difficult to understand and pervasive. Ultimately, her conclusion was that `async` is not appropriate for parallelizing computational tasks. She will be trying a new design based upon Rayon in the next version of her application.

 ## 🤔 Frequently Asked Questions

From 144075e6527ebccc6fa409da8bee4edec7d9c5bf Mon Sep 17 00:00:00 2001
From: Erich Ess
Date: Mon, 12 Apr 2021 10:36:41 -0400
Subject: [PATCH 07/12] Improving clarity of the story

---
 src/vision/status_quo/barbara_simulates_hydrodynamics.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/vision/status_quo/barbara_simulates_hydrodynamics.md b/src/vision/status_quo/barbara_simulates_hydrodynamics.md
index c116b185..11000ec0 100644
--- a/src/vision/status_quo/barbara_simulates_hydrodynamics.md
+++ b/src/vision/status_quo/barbara_simulates_hydrodynamics.md
@@ -8,13 +8,13 @@ If you would like to expand on this story, or adjust the answers to the FAQ, fee
 ## The story
 ### Problem
-Barbara needed to build a tool to solve hydrodynamics simulations; there is a common method for this that subdivides a region into a grid and computes the solution for each grid patch.
All the patches in a grid for a point in time are independent and can be computed in parallel, but they are dependent on neighboring patches in the previously computed frame in time. This is a well-known computational model and the patterns for basic parallelization are well established.
+Barbara is a professor of physics at the University of Rustville. She needed to build a tool to solve hydrodynamics simulations; there is a common method for this that subdivides a region into a grid and computes the solution for each grid patch. All the patches in a grid for a point in time are independent and can be computed in parallel, but they are dependent on neighboring patches in the previously computed frame in time. This is a well-known computational model and the patterns for basic parallelization are well established.
-Barbara wanted to write a performant tool to compute the solutions to the simulations of her research. She chose Rust because she was already familiar with it and it had good qualities for writing performant code. After implementing the core mathematical formulas, Barbara began implementing the parallelization architecture.
+Barbara wanted to write a performant tool to compute the solutions to the simulations of her research. She chose Rust because she needed high performance but she also wanted something that could be maintained by her students, who are not professional programmers. Rust's safety guarantees give her confidence that her results are not going to be corrupted by data races or other programming errors. After implementing the core mathematical formulas, Barbara began implementing the parallelization architecture.
 Her first attempt was to emulate a common CFD design pattern: using message passing to communicate between processes that are each assigned a specific patch in the grid. So she assigned one thread to each patch and used messages to communicate solution state to dependent patches.
With one thread per patch this usually meant that there were 5-10x more threads than CPU cores.

-This solution was fine, but Barbara was not satisified. She had two problems with it: first, she didn't like that the design would greedily use all the resources on the machine and, second, when her team added a new feature (tracer particles) that increased the complexity of how patches interact with each other and the number of messages that were passsed between threads increased to the point that it became a performance bottleneck and parallel processing became no faster than serial processing. So, Barbara decided to find a better solution.
+This solution worked, but Barbara had two problems with it. First, it gave her no control over CPU usage so the solution would greedily use all available CPU resources. Second, using messages to communicate solution values between patches did not scale when her team added a new feature (tracer particles) that added additional solution data; the additional messages caused by this change created so much overhead that parallel processing was no faster than serial. So, Barbara decided to find a better solution.

### Solution Path
What Barbara wanted use the CPU more efficiently: she would decouple the work that needed to be done (the patches) from the workers (threads) this would allow her to more finely control how many resources were used. So, she began looking for a tool in Rust that would meet this design pattern. When she read about `async` and how it allowed the user to define units of work, called tasks, and send those to an executor which would manage the execution of those tasks across a set of workers, she thought she'd found exactly what she needed. Further reading indicate that `tokio` was the runtime of choice for `async` in the community and so she began building a new CFD tool with `async` and `tokio`.
And to move away from the message passing design, because the number of messages being passed was proportional to the number of trace particles being traced.

From b1cf2ee3895ce088abe8efd459582037f5506bc2 Mon Sep 17 00:00:00 2001
From: Erich Ess
Date: Mon, 12 Apr 2021 10:43:12 -0400
Subject: [PATCH 08/12] Improving clarity of the story

---
 src/vision/status_quo/barbara_simulates_hydrodynamics.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/vision/status_quo/barbara_simulates_hydrodynamics.md b/src/vision/status_quo/barbara_simulates_hydrodynamics.md
index 11000ec0..61ef0dea 100644
--- a/src/vision/status_quo/barbara_simulates_hydrodynamics.md
+++ b/src/vision/status_quo/barbara_simulates_hydrodynamics.md
@@ -17,11 +17,11 @@ Her first attempt to was to emulate a common CFD design p
 This solution worked, but Barbara had two problems with it. First, it gave her no control over CPU usage so the solution would greedily use all available CPU resources. Second, using messages to communicate solution values between patches did not scale when her team added a new feature (tracer particles) that added additional solution data; the additional messages caused by this change created so much overhead that parallel processing was no faster than serial. So, Barbara decided to find a better solution.
 
 ### Solution Path
-What Barbara wanted use the CPU more efficiently: she would decouple the work that needed to be done (the patches) from the workers (threads) this would allow her to more finely control how many resources were used. So, she began looking for a tool in Rust that would meet this design pattern. When she read about `async` and how it allowed the user to define units of work, called tasks, and send those to an executor which would manage the execution of those tasks across a set of workers, she thought she'd found exactly what she needed.
Further reading indicate that `tokio` was the runtime of choice for `async` in the community and so she began building a new CFD tool with `async` and `tokio`. And to move away from the message passing design, because the number of messages being passed was proportional to the number of trace particles being traced.
+To address the first problem: Barbara would decouple the work that needed to be done (solving each patch) from the workers (threads); this would allow her to more finely control how many resources were used. So, she began looking for a tool in Rust that would meet this design pattern. When she read about `async` and how it allowed the user to define units of work, called tasks, and send those to an executor which would manage the execution of those tasks across a set of workers, she thought she'd found exactly what she needed. Further reading indicated that `tokio` was the runtime of choice for `async` in the community and so she began building a new CFD tool with `async` and `tokio`. She also wanted to move away from the message passing design, because the number of messages being passed was proportional to the number of tracer particles.

As Barbara began working on her new design with `tokio`, her use of `async` went from a general (from the textbook) use of basic `async` features to a more specific implementation leveraging exactly the features that were most suited for her needs. At first, Barbara was under a false impression about what async executors do. She had assumed that a multi-threaded executor could automatically move the execution of an async block to a worker thread. When this turned out to be wrong, she went to Stackoverflow and learned that async tasks must be explicitly spawned into a thread pool if they are to be executed on a worker thread. This meant that the algorithm to be parallelized became strongly coupled to both the spawner and the executor.
Code that used to cleanly express a physics algorithm now had interspersed references to the task spawner, not only making it harder to understand, but also making it impossible to try different execution strategies, since with Tokio the spawner and executor are the same object (the Tokio runtime). Barbara felt that a better design for data parallelism would enable better separation of concerns: a group of interdependent compute tasks, and a strategy to execute them in parallel.

-Along with moving the execution of the computational tasks to `async`, Barbara also used this as an opportunity to remove the message passing that was used to coordinate the computation of each patch. She used the `async` API to define dependencies between patches so that a patch would only begin computing its solution when its neighboring patches had completed. This also required setting up shared state that would store the solutions for all the patches as they were computed, so that dependents could access them. Learning how to properly use shared data with `async` was a new challenge. The initial design:
+With the move to `async`, Barbara saw an opportunity to solve her second problem. Rather than using message passing to coordinate patch computation, she used the `async` API to define dependencies between patches so that a patch would only begin computing its solution when its neighboring patches had completed. She set up a shared data structure to track the solutions for each patch now that messages would not be passing that data. Learning how to properly use shared data with `async` was a new challenge. The initial design:
 ```rust
 let mut stage_primitive_and_scalar = |index: BlockIndex, state: BlockState, hydro: H, geometry: GridGeometry| {
     let stage = async move {
@@ -34,7 +34,7 @@ Along with moving the execution of the computational tasks to `async`, Barbara a
 ```
 lacked performance because she needed to clone the value for every task.
So, Barbara switched over to using `Arc` to keep a thread-safe, reference-counted pointer to the shared data. But this change introduced a lot of `.map` and `.unwrap` function calls, making the code much harder to read. She realized that managing the dependency graph was not intuitive when using `async` for concurrency.

-During the move to `async` Barbara ran into a steep learning curve with error handling. The initial version of her design just used `panic!`s to fail the program if an error was encountered, but the stack traces were almost unreadable. She asked her teammate Grace to migrate over to using the `Result` idiom for error handling and Grace found a major inconvenience. The Rust type inference inconsistently breaks when propagating `Result` in `async` blocks. Grace frequently found that she had to specify the type of the error when creating a result value:
+A new problem arose during the move to `async`: a steep learning curve with error handling. The initial version of her design used `panic!`s to fail the program if an error was encountered, but the stack traces were almost unreadable. She asked her teammate Grace to migrate over to using the `Result` idiom for error handling and Grace found a major inconvenience. Rust's type inference inconsistently breaks when propagating `Result` in `async` blocks.
Grace frequently found that she had to specify the type of the error when creating a result value: ```rust Ok::<_, HydroError>( ( p.to_shared(), s.to_shared() ) ) ``` From 28e0259c6b518d9b2c8ae837a6a8dede847a6f69 Mon Sep 17 00:00:00 2001 From: Erich Ess Date: Mon, 12 Apr 2021 10:53:59 -0400 Subject: [PATCH 09/12] Switch main character to Niklaus from Barbara --- .../barbara_simulates_hydrodynamics.md | 24 +++++++++---------- 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/src/vision/status_quo/barbara_simulates_hydrodynamics.md b/src/vision/status_quo/barbara_simulates_hydrodynamics.md index 61ef0dea..8778815e 100644 --- a/src/vision/status_quo/barbara_simulates_hydrodynamics.md +++ b/src/vision/status_quo/barbara_simulates_hydrodynamics.md @@ -8,20 +8,20 @@ If you would like to expand on this story, or adjust the answers to the FAQ, fee ## The story ### Problem -Barbara is a professor of physics at the University of Rustville. She needed to build a tool to solve hydrodynamics simulations; there is a common method for this that subdivides a region into a grid and computes the solution for each grid patch. All the patches in a grid for a point in time are independent and can be computed in parallel, but they are dependent on neighboring patches in the previously computed frame in time. This is a well known computational model and the patterns for basic parallelization are well established. +Niklaus is a professor of physics at the University of Rustville. He needed to build a tool to solve hydrodynamics simulations; there is a common method for this that subdivides a region into a grid and computes the solution for each grid patch. All the patches in a grid for a point in time are independent and can be computed in parallel, but they are dependent on neighboring patches in the previously computed frame in time. This is a well known computational model and the patterns for basic parallelization are well established. 
-Barbara wanted to write a performant tool to compute the solutions to the simulations of her research. She chose Rust because she needed high performance but she also wanted something that could be maintained by her students, who are not professional programmers. Rust's safety guarantees give her confidence that her results are not going to be corrupted by data races or other programming errors. After implementing the core mathematical formulas, Barbara began implementing the parallelization architecture.
+Niklaus wanted to write a performant tool to compute the solutions to the simulations of his research. He chose Rust because he needed high performance but he also wanted something that could be maintained by his students, who are not professional programmers. Rust's safety guarantees give him confidence that his results are not going to be corrupted by data races or other programming errors. After implementing the core mathematical formulas, Niklaus began implementing the parallelization architecture.

-Her first attempt was to emulate a common CFD design pattern: using message passing to communicate between processes that are each assigned a specific patch in the grid. So she assigned one thread to each patch and used messages to communicate solution state to dependent patches. With one thread per patch this usually meant that there were 5-10x more threads than CPU cores.
+His first attempt was to emulate a common CFD design pattern: using message passing to communicate between processes that are each assigned a specific patch in the grid. So he assigned one thread to each patch and used messages to communicate solution state to dependent patches. With one thread per patch this usually meant that there were 5-10x more threads than CPU cores.

-This solution worked, but Barbara had two problems with it. First, it gave her no control over CPU usage so the solution would greedily use all available CPU resources.
Second, using messages to communicate solution values between patches did not scale when her team added a new feature (tracer particles) that added additional solution data; the additional messages caused by this change created so much overhead that parallel processing was no faster than serial. So, Barbara decided to find a better solution.
+This solution worked, but Niklaus had two problems with it. First, it gave him no control over CPU usage so the solution would greedily use all available CPU resources. Second, using messages to communicate solution values between patches did not scale when his team added a new feature (tracer particles); the additional messages caused by this change created so much overhead that parallel processing was no faster than serial. So, Niklaus decided to find a better solution.

### Solution Path
+To address the first problem: Niklaus would decouple the work that needed to be done (solving each patch) from the workers (threads); this would allow him to more finely control how many resources were used. So, he began looking for a tool in Rust that would meet this design pattern.
When he read about `async` and how it allowed the user to define units of work, called tasks, and send those to an executor which would manage the execution of those tasks across a set of workers, he thought he'd found exactly what he needed. Further reading indicated that `tokio` was the runtime of choice for `async` in the community and, so, he began building a new CFD tool with `async` and `tokio`.

-As Barbara began working on her new design with `tokio`, her use of `async` went from a general (from the textbook) use of basic `async` features to a more specific implementation leveraging exactly the features that were most suited for her needs. At first, Barbara was under a false impression about what async executors do. She had assumed that a multi-threaded executor could automatically move the execution of an async block to a worker thread. When this turned out to be wrong, she went to Stackoverflow and learned that async tasks must be explicitly spawned into a thread pool if they are to be executed on a worker thread. This meant that the algorithm to be parallelized became strongly coupled to both the spawner and the executor. Code that used to cleanly express a physics algorithm now had interspersed references to the task spawner, not only making it harder to understand, but also making it impossible to try different execution strategies, since with Tokio the spawner and executor are the same object (the Tokio runtime). Barbara felt that a better design for data parallelism would enable better separation of concerns: a group of interdependent compute tasks, and a strategy to execute them in parallel.
+As Niklaus began working on his new design with `tokio`, his use of `async` went from a general (from the textbook) use of basic `async` features to a more specific implementation leveraging exactly the features that were most suited for his needs. At first, Niklaus was under a false impression about what `async` executors do.
He had assumed that a multi-threaded executor could automatically move the execution of an `async` block to a worker thread. When this turned out to be wrong, he went to Stackoverflow and learned that async tasks must be explicitly spawned into a thread pool if they are to be executed on a worker thread. This meant that the algorithm to be parallelized became strongly coupled to both the spawner and the executor. Code that used to cleanly express a physics algorithm now had interspersed references to the task spawner, not only making it harder to understand, but also making it impossible to try different execution strategies, since with Tokio the spawner and executor are the same object (the Tokio runtime). Niklaus felt that a better design for data parallelism would enable better separation of concerns: a group of interdependent compute tasks, and a strategy to execute them in parallel.

-With the move to `async`, Barbara saw an opportunity to solve her second problem. Rather than using message passing to coordinate patch computation, she used the `async` API to define dependencies between patches so that a patch would only begin computing its solution when its neighboring patches had completed. She set up a shared data structure to track the solutions for each patch now that messages would not be passing that data. Learning how to properly use shared data with `async` was a new challenge. The initial design:
+With the move to `async`, Niklaus saw an opportunity to solve his second problem. Rather than using message passing to coordinate patch computation, he used the `async` API to define dependencies between patches so that a patch would only begin computing its solution when its neighboring patches had completed. He set up a shared data structure to track the solutions for each patch now that messages would not be passing that data. Learning how to properly use shared data with `async` was a new challenge.
The initial design:
 ```rust
 let mut stage_primitive_and_scalar = |index: BlockIndex, state: BlockState, hydro: H, geometry: GridGeometry| {
     let stage = async move {
@@ -32,17 +32,17 @@ With the move to `async`, Barbara saw an opportunity to solve her second program
     stage_map.insert(index, runtime.spawn(stage).map(|f| f.unwrap()).shared());
 };
 ```
-lacked performance because she needed to clone the value for every task. So, Barbara switched over to using `Arc` to keep a thread-safe, reference-counted pointer to the shared data. But this change introduced a lot of `.map` and `.unwrap` function calls, making the code much harder to read. She realized that managing the dependency graph was not intuitive when using `async` for concurrency.
+lacked performance because he needed to clone the value for every task. So, Niklaus switched over to using `Arc` to keep a thread-safe, reference-counted pointer to the shared data. But this change introduced a lot of `.map` and `.unwrap` function calls, making the code much harder to read. He realized that managing the dependency graph was not intuitive when using `async` for concurrency.

-A new problem arose during the move to `async`: a steep learning curve with error handling. The initial version of her design used `panic!`s to fail the program if an error was encountered, but the stack traces were almost unreadable.
+A new problem arose during the move to `async`: a steep learning curve with error handling. The initial version of his design used `panic!`s to fail the program if an error was encountered, but the stack traces were almost unreadable.
He asked his teammate Grace to migrate over to using the `Result` idiom for error handling and Grace found a major inconvenience. Rust's type inference inconsistently breaks when propagating `Result` in `async` blocks. Grace frequently found that she had to specify the type of the error when creating a result value:
 ```rust
 Ok::<_, HydroError>( ( p.to_shared(), s.to_shared() ) )
 ```
 And she could not figure out why she had to add the `::<_, HydroError>` to some of the `Result` values.

-Finally, once her team began using the new `async` design for their simulations, they noticed an important issue that impacted productivity: compilation time had now increased to between 30 and 60 seconds. The nature of their work requires frequent changes to code and recompilation and 30-60 seconds is long enough to have a noticeable impact on their quality of life. What her and her team want is for compilation to be 2 to 3 seconds. Barbara believes that the use of `async` is a major contributor to the long compilation times.
+Finally, once Niklaus' team began using the new `async` design for their simulations, they noticed an important issue that impacted productivity: compilation time had now increased to between 30 and 60 seconds. The nature of their work requires frequent changes to code and recompilation and 30-60 seconds is long enough to have a noticeable impact on their quality of life. What he and his team want is for compilation to be 2 to 3 seconds. Niklaus believes that the use of `async` is a major contributor to the long compilation times.

-This new solution works, but Barbara is not satisfied with how complex her code became after the move to `async` and that compilation time is now 30-60 seconds. The state sharing added a large amount of cruft with `Arc`, and `async` is not well suited to using a dependency graph to schedule tasks, so implementing this solution created a key component of her program that was pervasive and difficult to understand.
Ultimately, her conclusion was that `async` is not appropriate for parallelizing computational tasks. She will be trying a new design based upon Rayon in the next version of her application.
+This new solution works, but Niklaus is not satisfied with how complex his code became after the move to `async` and that compilation time is now 30-60 seconds. The state sharing added a large amount of cruft with `Arc`, and `async` is not well suited to using a dependency graph to schedule tasks, so implementing this solution created a key component of his program that was pervasive and difficult to understand. Ultimately, his conclusion was that `async` is not appropriate for parallelizing computational tasks. He will be trying a new design based upon Rayon in the next version of his application.

## 🤔 Frequently Asked Questions

@@ -55,7 +55,7 @@ This new solution works, but Niklaus is not satisfied with how complex his code
 ### **What are the sources for this story?**
 This story is based on the experience of building the [kilonova](https://github.com/clemson-cal/app-kilonova) hydrodynamics simulation solver.

 ### **Why did you choose Barbara and Grace to tell this story?**
-I chose Barbara as the primary character in this story because this work was driven by someone with experience in Rust specifically but does not have much systems level experience. Grace was chosen as a supporting character because of that persons experience with C/C++ programming and to avoid repeating characters.
+I chose Niklaus as the primary character in this story because this work was driven by someone who only uses programming for a small part of their work. Grace was chosen as a supporting character because of that person's experience with C/C++ programming and to avoid repeating characters.
### **How would this story have played out differently for the other characters?**
- Alan: there's a good chance he would have already had experience working with either async workflows in another language or doing parallelization of compute bound tasks; and so would already know from experience that `async` was not the right place to start.

From 2d9f732f74dfec2ce10a163204e06bd6a08df88a Mon Sep 17 00:00:00 2001
From: Erich Ess
Date: Mon, 12 Apr 2021 11:01:28 -0400
Subject: [PATCH 10/12] Improving clarity

---
 src/vision/status_quo/barbara_simulates_hydrodynamics.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/vision/status_quo/barbara_simulates_hydrodynamics.md b/src/vision/status_quo/barbara_simulates_hydrodynamics.md
index 8778815e..c0a3c25d 100644
--- a/src/vision/status_quo/barbara_simulates_hydrodynamics.md
+++ b/src/vision/status_quo/barbara_simulates_hydrodynamics.md
@@ -17,7 +17,7 @@ His first attempt to was to emulate a common CFD design p
 This solution worked, but Niklaus had two problems with it. First, it gave him no control over CPU usage so the solution would greedily use all available CPU resources. Second, using messages to communicate solution values between patches did not scale when his team added a new feature (tracer particles); the additional messages caused by this change created so much overhead that parallel processing was no faster than serial. So, Niklaus decided to find a better solution.
 
 ### Solution Path
-To address the first problem: Niklaus would decouple the work that needed to be done (solving each patch) from the workers (threads); this would allow him to more finely control how many resources were used. So, he began looking for a tool in Rust that would meet this design pattern.
When he read about `async` and how it allowed the user to define units of work, called tasks, and send those to an executor which would manage the execution of those tasks across a set of workers, he thought he'd found exactly what he needed. Further reading indicated that `tokio` was the runtime of choice for `async` in the community and, so, he began building a new CFD tool with `async` and `tokio`.
+To address the first problem: Niklaus' new design decoupled the work that needed to be done (solving physics equations for each patch in the grid) from the workers (threads); this would allow him to set the number of threads and not use all the CPU resources. So, he began looking for a tool in Rust that would meet this design pattern. When he read about `async` and how it allowed the user to define units of work and send those to an executor which would manage the execution of those tasks across a set of workers, he thought he'd found exactly what he needed. He also thought that the `.await` semantics would give a much better way of coordinating dependencies between patches. Further reading indicated that `tokio` was the runtime of choice for `async` in the community and, so, he began building a new CFD solver with `async` and `tokio`.

As Niklaus began working on his new design with `tokio`, his use of `async` went from a general (from the textbook) use of basic `async` features to a more specific implementation leveraging exactly the features that were most suited for his needs. At first, Niklaus was under a false impression about what `async` executors do. He had assumed that a multi-threaded executor could automatically move the execution of an `async` block to a worker thread. When this turned out to be wrong, he went to Stackoverflow and learned that async tasks must be explicitly spawned into a thread pool if they are to be executed on a worker thread. This meant that the algorithm to be parallelized became strongly coupled to both the spawner and the executor.
Code that used to cleanly express a physics algorithm now had interspersed references to the task spawner, not only making it harder to understand, but also making it impossible to try different execution strategies, since with Tokio the spawner and executor are the same object (the Tokio runtime). Niklaus felt that a better design for data parallelism would enable better separation of concerns: a group of interdependent compute tasks, and a strategy to execute them in parallel.

From 968d911c5f89dd9092e294bc52b5415d5e1c1c3f Mon Sep 17 00:00:00 2001
From: Erich Ess
Date: Mon, 12 Apr 2021 11:07:32 -0400
Subject: [PATCH 11/12] improving clarity

---
 src/vision/status_quo/barbara_simulates_hydrodynamics.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/vision/status_quo/barbara_simulates_hydrodynamics.md b/src/vision/status_quo/barbara_simulates_hydrodynamics.md
index c0a3c25d..b22e5a09 100644
--- a/src/vision/status_quo/barbara_simulates_hydrodynamics.md
+++ b/src/vision/status_quo/barbara_simulates_hydrodynamics.md
@@ -19,9 +19,9 @@ This solution worked, but Niklaus had two problems with it. First, it gave him n
 ### Solution Path
 To address the first problem: Niklaus' new design decoupled the work that needed to be done (solving physics equations for each patch in the grid) from the workers (threads); this would allow him to set the number of threads and not use all the CPU resources. So, he began looking for a tool in Rust that would meet this design pattern. When he read about `async` and how it allowed the user to define units of work and send those to an executor which would manage the execution of those tasks across a set of workers, he thought he'd found exactly what he needed. He also thought that the `.await` semantics would give a much better way of coordinating dependencies between patches.
Further reading indicated that `tokio` was the runtime of choice for `async` in the community and, so, he began building a new CFD solver with `async` and `tokio`.

-As Niklaus began working on his new design with `tokio`, his use of `async` went from a general (from the textbook) use of basic `async` features to a more specific implementation leveraging exactly the features that were most suited for his needs. At first, Niklaus was under a false impression about what `async` executors do. He had assumed that a multi-threaded executor could automatically move the execution of an `async` block to a worker thread. When this turned out to be wrong, he went to Stackoverflow and learned that async tasks must be explicitly spawned into a thread pool if they are to be executed on a worker thread. This meant that the algorithm to be parallelized became strongly coupled to both the spawner and the executor.
+After making some progress, Niklaus ran into his first problem. Niklaus had been under a false impression about what `async` executors do. He had assumed that a multi-threaded executor could automatically move the execution of an `async` block to a worker thread. When this turned out to be wrong, he went to Stackoverflow and learned that async tasks must be explicitly spawned into a thread pool if they are to be executed on a worker thread. This meant that the algorithm to be parallelized became strongly coupled to both the spawner and the executor.
Code that used to cleanly express a physics algorithm now had interspersed references to the task spawner, not only making it harder to understand, but also making it impossible to try different execution strategies, since with Tokio the spawner and executor are the same object (the Tokio runtime). Niklaus felt that a better design for data parallelism would enable better separation of concerns: a group of interdependent compute tasks, and a strategy to execute them in parallel.

-With the move to `async`, Niklaus saw an opportunity to solve his second problem. Rather than using message passing to coordinate patch computation, he used the `async` API to define dependencies between patches so that a patch would only begin computing its solution when its neighboring patches had completed. He set up a shared data structure to track the solutions for each patch now that messages would not be passing that data. Learning how to properly use shared data with `async` was a new challenge. The initial design:
+Niklaus' second problem came as he tried to fully replace the message passing from the first design: sharing data between tasks. He used the `async` API to coordinate computation of patches so that a patch would only go to a worker when all its dependencies had completed. But he also needed to account for the solution data which was passed in the messages. He set up a shared data structure to track the solutions for each patch now that messages would not be passing that data. Learning how to properly use shared data with `async` was a new challenge. The initial design:
 ```rust
 let mut stage_primitive_and_scalar = |index: BlockIndex, state: BlockState, hydro: H, geometry: GridGeometry| {
     let stage = async move {
@@ -34,7 +34,7 @@ With the move to `async`, Niklaus saw an opportunity to solve his second program
 ```
 lacked performance because he needed to clone the value for every task. So, Niklaus switched over to using `Arc` to keep a thread-safe, reference-counted pointer to the shared data.
But this change introduced a lot of `.map` and `.unwrap` function calls, making the code much harder to read. He realized that managing the dependency graph was not intuitive when using `async` for concurrency.
-A new problem arose during the move to `async`: a steep learning curve with error handling. The initial version of his design used `panic!`s to fail the program if an error was encountered, but the stack traces were almost unreadable. He asked his teammate Grace to migrate over to using the `Result` idiom for error handling and Grace found a major inconvenience. The Rust type inference inconsistently breaks when propagating `Result` in `async` blocks.
+As the program matured, a new problem arose: a steep learning curve with error handling. The initial version of his design used `panic!`s to fail the program if an error was encountered, but the stack traces were almost unreadable. He asked his teammate Grace to migrate over to using the `Result` idiom for error handling, and Grace found a major inconvenience: Rust's type inference inconsistently breaks when propagating `Result` in `async` blocks.
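A minimal reproduction of that inference break, shown here with a plain closure for brevity (the story's code hits it inside `async` blocks) and a hypothetical `HydroError` standing in for the solver's real error type:

```rust
// Hypothetical error type standing in for the solver's HydroError.
#[derive(Debug)]
struct HydroError;

fn main() {
    // Without the `::<_, HydroError>` turbofish, the compiler cannot infer
    // the error type of this Result from the body alone, and compilation
    // fails with "type annotations needed".
    let stage = || Ok::<_, HydroError>((1.0_f64, 2.0_f64));
    let (p, s) = stage().unwrap();
    assert_eq!(p + s, 3.0);
}
```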
Grace frequently found that she had to specify the type of the error when creating a result value:
```rust
Ok::<_, HydroError>( ( p.to_shared(), s.to_shared() ) )
```

From bfe59f53e645d50c2f505ce199fee9c97b0a1338 Mon Sep 17 00:00:00 2001
From: Erich Ess
Date: Mon, 12 Apr 2021 17:42:40 -0400
Subject: [PATCH 12/12] Missed some occurances of Barbara that needed to be renamed to Niklaus

---
 ..._hydrodynamics.md => niklaus_simulates_hydrodynamics.md} | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
 rename src/vision/status_quo/{barbara_simulates_hydrodynamics.md => niklaus_simulates_hydrodynamics.md} (97%)

diff --git a/src/vision/status_quo/barbara_simulates_hydrodynamics.md b/src/vision/status_quo/niklaus_simulates_hydrodynamics.md
similarity index 97%
rename from src/vision/status_quo/barbara_simulates_hydrodynamics.md
rename to src/vision/status_quo/niklaus_simulates_hydrodynamics.md
index b22e5a09..cfe04a92 100644
--- a/src/vision/status_quo/barbara_simulates_hydrodynamics.md
+++ b/src/vision/status_quo/niklaus_simulates_hydrodynamics.md
@@ -1,4 +1,4 @@
-# 😱 Status quo stories: Barbara Builds a Hydrodynamics Simulator
+# 😱 Status quo stories: Niklaus Builds a Hydrodynamics Simulator

## 🚧 Warning: Draft status 🚧

@@ -54,13 +54,13 @@ This new solution works, but Niklaus is not satisfied with how complex his code
### **What are the sources for this story?**
This story is based on the experience of building the [kilonova](https://github.com/clemson-cal/app-kilonova) hydrodynamics simulation solver.
-### **Why did you choose Barbara and Grace to tell this story?**
+### **Why did you choose Niklaus and Grace to tell this story?**
I chose Niklaus as the primary character in this story because this work was driven by someone who only uses programming for a small part of their work. Grace was chosen as a supporting character because of that person's experience with C/C++ programming and to avoid repeating characters.
### **How would this story have played out differently for the other characters?**
- Alan: there's a good chance he would have already had experience working with either async workflows in another language or doing parallelization of compute-bound tasks, and so would already know from experience that `async` was not the right place to start.
- Grace: likewise, might already have experience with problems like this and would know what to look for when searching for tools.
-- Niklaus: the experience would probably be the same, as it's very easy to assume that `tokio` is the starting place for concurrency in Rust.
+- Barbara: the experience would likely be fairly similar, since the actual subject of this story is quite familiar with Rust by now.

[character]: ../characters.md
[status quo stories]: ./status_quo.md