Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mention dangers around Task and sending a lot of data along #13173

Merged
merged 2 commits into from Dec 12, 2023

Conversation

PragTob
Copy link
Contributor

@PragTob PragTob commented Dec 12, 2023

First, as always, thanks to y'all for elixir and all your great work on it 💚 I'm truly happy to be a part of this community where discussion & contributions are always welcome, where we (well, more y'all) push the boundaries of what was previously thought possible and somehow it all happens without breaking things all the time 🚀

The Task module is one of the coolest modules in elixir and is probably the first contact and experience of a lot of beginners with parallelism in elixir. I hence find it worthwhile to warn about the memory copying and its impacts here as it might easily lead to unwelcome results, so it's worth pointing out.

This is only a first draft, not sure if this is the best place to put it - feedback very welcome.

Honestly Task.async/Task.await may be too magical/too much of a nice high level magical abstraction. I'm somewhat ashamed to admit that while I knew that sending a message would copy the data I never made the connection to Task.async myself and I had an "oh shit" moment just yesterday :sweat-smile: Which, totally should have occurred to me before as I knew they were processes and wouldn't magically change the rules on how processes work. I also asked some elixir friends and saw similiar reactions in some of them.

I think it's also fair to assume that the BEAM wouldn't need to copy immutable data (cos it's immutable anyhow), I know why it does but it's not obvious. We might want to/need to work on making it more obvious in other parts of the documentation as well but I thought this was the best starting point.


To show my own pile of shame where I'm coming from with this: Benchee works mostly on a big Suite struct that over the different stages acummulates more data. It's fine, works well with piping. However, it also holds the benchmarking function and the inputs used. Which means if you benchmark with a ton of data and Benchee goes off to calculate the statistics in parallel or run the formatters in parallel memory consumption explodes and it also takes forever. That all was a big 🤦‍♂️ moment that I'm honestly a bit ashamed about 😅 I want to help others to avoid making the same mistake (will also probably blog soon).

Fixes in benchee should be easy - be more selective about the data to put into tasks/not just pass the whole Suite along (so no need to reach for ETS/persistent terms). Fix version hopefully dropping this week :)

Just, as background 🙈 🙈 🙈

Again, happy to have feedback and ideas about other/better places to put this or also word it better et. al.

Go have a bunny picture.

Thanks! 💚

IMG20230429191639

The `Task` module is one of the coolest modules in elixir and
is probably the first contact and experience of a lot of
beginners with parallelism in elixir. I hence find it worthwhile
to warn about the memory copying and its impacts here as it might
easily lead to unwelcome results, so it's worth pointing out.

This is only a first draft, not sure if this is the best place
to put it - feedback very welcome.

Honestly `Task.async`/`Task.await` may be too magical/too much of
a nice high level magical abstraction. I'm somewhat ashamed to
admit that while I knew that sending a message would copy the
data I never made the connection to `Task.async` myself and I
had an "oh shit" moment just yesterday :sweat-smile: Which,
totally should have occurred to me before as I knew they were
processes and wouldn't magically change the rules on how
processes work. I also asked some elixir friends and saw similiar
reactions in some of them.

I think it's also fair to assume that the BEAM wouldn't need to
copy immutable data (cos it's immutable anyhow), I know why it
does but it's not obvious. We might want to/need to work on making
it more obvious in other parts of the documentation as well but
I thought this was the best starting point.
@PragTob PragTob changed the title Point out dangers around Task and sending a lot of data along Mention dangers around Task and sending a lot of data along Dec 12, 2023
@josevalim josevalim merged commit 0fdb0f8 into elixir-lang:main Dec 12, 2023
8 checks passed
@josevalim
Copy link
Member

💚 💙 💜 💛 ❤️

@josevalim
Copy link
Member

Good call. And no worries, it happens from time to time. For example, folks referencing their whole conn inside a Task only to access a certain field. This would actually be good as an anti-pattern related to processes in our guides, in case anyone wants to send a PR :)

josevalim pushed a commit that referenced this pull request Dec 12, 2023
The `Task` module is one of the coolest modules in elixir and
is probably the first contact and experience of a lot of
beginners with parallelism in elixir. I hence find it worthwhile
to warn about the memory copying and its impacts here as it might
easily lead to unwelcome results, so it's worth pointing out.
@PragTob PragTob deleted the task-async-and-data branch December 12, 2023 17:55
PragTob added a commit to PragTob/elixir that referenced this pull request Dec 17, 2023
Follow up to/extension of elixir-lang#13173

First draft, happy to adjust and extend it in many ways. I had
trouble coming up with a simple example as we needed a bunch of
data to make sure it's not good. I could have employed the first
anti pattern myself and did repeated statistics calculation, but
that'd have been worse :sweat-smile:

I remembered José's comment around sending along the `conn` and
figured it's central enough in elixir to not throw anyone off.

If someone has a better example, happy to redo it!

Thanks y'all! :green-heart:
PragTob added a commit to PragTob/elixir that referenced this pull request Dec 17, 2023
Follow up to/extension of elixir-lang#13173

First draft, happy to adjust and extend it in many ways. I had
trouble coming up with a simple example as we needed a bunch of
data to make sure it's not good. I could have employed the first
anti pattern myself and did repeated statistics calculation, but
that'd have been worse :sweat-smile:

I remembered José's comment around sending along the `conn` and
figured it's central enough in elixir to not throw anyone off.

If someone has a better example, happy to redo it!

Thanks y'all! :green-heart:
josevalim pushed a commit that referenced this pull request Dec 21, 2023
josevalim pushed a commit that referenced this pull request Dec 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants