New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mention dangers around Task
and sending a lot of data along
#13173
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The `Task` module is one of the coolest modules in elixir and is probably the first contact and experience of a lot of beginners with parallelism in elixir. I hence find it worthwhile to warn about the memory copying and its impacts here as it might easily lead to unwelcome results, so it's worth pointing out. This is only a first draft, not sure if this is the best place to put it - feedback very welcome. Honestly `Task.async`/`Task.await` may be too magical/too much of a nice high level magical abstraction. I'm somewhat ashamed to admit that while I knew that sending a message would copy the data I never made the connection to `Task.async` myself and I had an "oh shit" moment just yesterday :sweat-smile: Which, totally should have occurred to me before as I knew they were processes and wouldn't magically change the rules on how processes work. I also asked some elixir friends and saw similiar reactions in some of them. I think it's also fair to assume that the BEAM wouldn't need to copy immutable data (cos it's immutable anyhow), I know why it does but it's not obvious. We might want to/need to work on making it more obvious in other parts of the documentation as well but I thought this was the best starting point.
PragTob
changed the title
Point out dangers around
Mention dangers around Dec 12, 2023
Task
and sending a lot of data alongTask
and sending a lot of data along
💚 💙 💜 💛 ❤️ |
Good call. And no worries, it happens from time to time. For example, folks referencing their whole |
josevalim
pushed a commit
that referenced
this pull request
Dec 12, 2023
The `Task` module is one of the coolest modules in elixir and is probably the first contact and experience of a lot of beginners with parallelism in elixir. I hence find it worthwhile to warn about the memory copying and its impacts here as it might easily lead to unwelcome results, so it's worth pointing out.
PragTob
added a commit
to PragTob/elixir
that referenced
this pull request
Dec 17, 2023
Follow up to/extension of elixir-lang#13173 First draft, happy to adjust and extend it in many ways. I had trouble coming up with a simple example as we needed a bunch of data to make sure it's not good. I could have employed the first anti pattern myself and did repeated statistics calculation, but that'd have been worse :sweat-smile: I remembered José's comment around sending along the `conn` and figured it's central enough in elixir to not throw anyone off. If someone has a better example, happy to redo it! Thanks y'all! :green-heart:
PragTob
added a commit
to PragTob/elixir
that referenced
this pull request
Dec 17, 2023
Follow up to/extension of elixir-lang#13173 First draft, happy to adjust and extend it in many ways. I had trouble coming up with a simple example as we needed a bunch of data to make sure it's not good. I could have employed the first anti pattern myself and did repeated statistics calculation, but that'd have been worse :sweat-smile: I remembered José's comment around sending along the `conn` and figured it's central enough in elixir to not throw anyone off. If someone has a better example, happy to redo it! Thanks y'all! :green-heart:
josevalim
pushed a commit
that referenced
this pull request
Dec 21, 2023
Follow up to/extension of #13173
josevalim
pushed a commit
that referenced
this pull request
Dec 21, 2023
Follow up to/extension of #13173
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
First, as always, thanks to y'all for elixir and all your great work on it 💚 I'm truly happy to be a part of this community where discussion & contributions are always welcome, where we (well, more y'all) push the boundaries of what was previously thought possible and somehow it all happens without breaking things all the time 🚀
The
Task
module is one of the coolest modules in elixir and is probably the first contact and experience of a lot of beginners with parallelism in elixir. I hence find it worthwhile to warn about the memory copying and its impacts here as it might easily lead to unwelcome results, so it's worth pointing out.This is only a first draft, not sure if this is the best place to put it - feedback very welcome.
Honestly
Task.async
/Task.await
may be too magical/too much of a nice high level magical abstraction. I'm somewhat ashamed to admit that while I knew that sending a message would copy the data I never made the connection toTask.async
myself and I had an "oh shit" moment just yesterday :sweat-smile: Which, totally should have occurred to me before as I knew they were processes and wouldn't magically change the rules on how processes work. I also asked some elixir friends and saw similiar reactions in some of them.I think it's also fair to assume that the BEAM wouldn't need to copy immutable data (cos it's immutable anyhow), I know why it does but it's not obvious. We might want to/need to work on making it more obvious in other parts of the documentation as well but I thought this was the best starting point.
To show my own pile of shame where I'm coming from with this: Benchee works mostly on a big
Suite
struct that over the different stages acummulates more data. It's fine, works well with piping. However, it also holds the benchmarking function and the inputs used. Which means if you benchmark with a ton of data and Benchee goes off to calculate the statistics in parallel or run the formatters in parallel memory consumption explodes and it also takes forever. That all was a big 🤦♂️ moment that I'm honestly a bit ashamed about 😅 I want to help others to avoid making the same mistake (will also probably blog soon).Fixes in benchee should be easy - be more selective about the data to put into tasks/not just pass the whole Suite along (so no need to reach for ETS/persistent terms). Fix version hopefully dropping this week :)
Just, as background 🙈 🙈 🙈
Again, happy to have feedback and ideas about other/better places to put this or also word it better et. al.
Go have a bunny picture.
Thanks! 💚