Document the process anti pattern of sending large data #13194

Merged
Changes from 2 commits
39 changes: 39 additions & 0 deletions lib/elixir/pages/anti-patterns/process-anti-patterns.md
@@ -290,3 +290,42 @@ iex> Supervisor.restart_child(App.Supervisor, Counter)
iex> Counter.get(Counter) # After the restart, this process can be used again
0
```

## Sending unnecessary data

#### Problem

Sending a large message to a process can be expensive, because the message is fully copied into the receiving process's memory, which costs both CPU time and memory. This follows from Erlang's "shared nothing" architecture, in which each process has its own heap, an arrangement that simplifies and speeds up garbage collection.

The copying is obvious when using `send/2`, `GenServer.call/3`, or the initial data passed to `GenServer.start_link/3`. It also happens, less visibly, with `spawn/1`, `Task.async/1`, `Task.async_stream/3`, and similar functions: the anonymous function passed to them captures every variable it references in its closure, and the captured data is copied to the new process as well. As a result, you can accidentally send far more data to a process than it needs.
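
To get a rough feel for the cost, the sketch below compares how many machine words a whole term occupies with how many a single field occupies. It assumes a plain map as a hypothetical stand-in for a connection (not a real `Plug.Conn`) and uses `:erts_debug.flat_size/1`, a debugging helper rather than a stable API. The large field is a list on purpose, since large binaries are reference counted and shared between processes instead of being copied.

```elixir
defmodule CopyCost do
  # Size of `term` in machine words, counted as if it were fully
  # copied (no sharing), which approximates what a send would copy.
  def words(term), do: :erts_debug.flat_size(term)
end

# Hypothetical stand-in for a connection, used only for illustration.
conn = %{remote_ip: {127, 0, 0, 1}, params: Enum.to_list(1..10_000)}

whole = CopyCost.words(conn)
ip_only = CopyCost.words(conn.remote_ip)

IO.puts("whole conn: #{whole} words, remote_ip only: #{ip_only} words")
```

The exact numbers depend on the data, but the gap between the two grows with everything else stored alongside the field you actually need.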

#### Example

To illustrate the problem, imagine you want to report the IP addresses that made requests against your application. You want to do this asynchronously so that it does not block request processing, so you decide to use `spawn/1`. It may seem like a good idea to hand over the whole connection ("We might need more data later!"), but doing so copies a lot of unnecessary data (request body, params, and so on).

```elixir
# log_request_ip/1 sends the IP address to some external service
spawn(fn -> log_request_ip(conn) end)
```

Note that this problem also occurs when the closure accesses only the relevant field:

```elixir
spawn(fn -> log_request_ip(conn.remote_ip) end)
```

This still copies all of `conn`, because the closure references the `conn` variable itself and the field access only happens inside the new process.
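
One way to check what a closure carries is to inspect its environment with `Function.info/2`. The following is a minimal sketch, assuming a plain map as a hypothetical stand-in for `conn` and a hypothetical `ClosureDemo` module. The functions live in a module because closures built in compiled code expose exactly their captured variables, whereas closures created in `iex` are evaluated differently and report a different environment.

```elixir
defmodule ClosureDemo do
  # Returns the list of values captured in the closure's environment.
  def captured(fun) do
    {:env, env} = Function.info(fun, :env)
    env
  end

  # Mirrors `fn -> log_request_ip(conn.remote_ip) end`:
  # the closure references the `conn` variable.
  def via_conn(conn), do: fn -> conn.remote_ip end

  # Extracts the field first, so the closure only references `ip`.
  def via_ip(conn) do
    ip = conn.remote_ip
    fn -> ip end
  end
end

# Hypothetical stand-in for a connection, used only for illustration.
conn = %{remote_ip: {127, 0, 0, 1}, params: Enum.to_list(1..10_000)}

IO.inspect(conn in ClosureDemo.captured(ClosureDemo.via_conn(conn)), label: "via_conn captures conn")
IO.inspect(conn in ClosureDemo.captured(ClosureDemo.via_ip(conn)), label: "via_ip captures conn")
```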

#### Refactoring

This anti-pattern has many potential remedies:

* Limit the data you send to the necessary minimum instead of sending a whole struct. For example, don't send an entire `Plug.Conn` struct if all you need is a couple of fields.
* If only the receiving process needs the data, let that process fetch the data itself instead of receiving it in a message.
* Some storage is shared between processes: [persistent_term](https://www.erlang.org/doc/man/persistent_term.html) values can be read without copying, and [ets](https://www.erlang.org/doc/man/ets) tables let a process look up only the entries it needs, although each lookup copies the returned entry into the calling process.
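
The shared-storage remedy could look roughly like the sketch below. It assumes a hypothetical `SharedLookup` module, a made-up key, and a made-up list of blocked addresses; none of these come from the original example. Keep in mind that `:persistent_term` is intended for data that changes rarely, since updating or erasing a term is expensive.

```elixir
defmodule SharedLookup do
  # Hypothetical key, chosen only for this illustration.
  @key {__MODULE__, :blocked_ips}

  # Stores the (potentially large) list once, outside any process heap.
  def store(ips), do: :persistent_term.put(@key, ips)

  def blocked?(ip) do
    # The closure captures only the small `ip` tuple. The task reads
    # the large list itself, without it being copied into a message.
    task = Task.async(fn -> ip in :persistent_term.get(@key) end)
    Task.await(task)
  end
end

# Made-up data: 10_000 addresses in the 10.0.x.y range.
SharedLookup.store(for n <- 1..10_000, do: {10, 0, div(n, 256), rem(n, 256)})

IO.inspect(SharedLookup.blocked?({10, 0, 0, 1}), label: "blocked?")
```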

In our case the first, and most common, strategy applies. If all we need _right now_ is the IP address, then we should extract it first and make sure it is the only thing captured by the closure:

```elixir
ip_address = conn.remote_ip
spawn(fn -> log_request_ip(ip_address) end)
```
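
The same refactoring carries over to the `Task` functions mentioned earlier. As a rough sketch, assuming a hypothetical `ReportDemo` module and a stand-in `log_request_ip/1` that just returns a tuple, the fields can be extracted before calling `Task.async_stream/2`, so that only the small values are handed to the task processes:

```elixir
defmodule ReportDemo do
  # Hypothetical stand-in for the real reporting call.
  defp log_request_ip(ip), do: {:logged, ip}

  def report_all(conns) do
    # Extract the small values first. Each element of the enumerable
    # is sent to a task process, so the stream should hold IPs rather
    # than whole connections.
    ips = Enum.map(conns, fn conn -> conn.remote_ip end)

    ips
    |> Task.async_stream(fn ip -> log_request_ip(ip) end)
    |> Enum.map(fn {:ok, result} -> result end)
  end
end

# Hypothetical stand-ins for connections, used only for illustration.
conns = [
  %{remote_ip: {127, 0, 0, 1}, params: Enum.to_list(1..1_000)},
  %{remote_ip: {10, 0, 0, 2}, params: Enum.to_list(1..1_000)}
]

IO.inspect(ReportDemo.report_all(conns))
```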