-
Notifications
You must be signed in to change notification settings - Fork 3.3k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Document the process anti pattern of sending large data (#13194)
Follow up to/extension of #13173
- Loading branch information
Showing
1 changed file
with
52 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -188,6 +188,58 @@ iex> Foo.Bucket.get(bucket, "milk") | |
|
||
This anti-pattern was formerly known as [Agent obsession](https://github.com/lucasvegi/Elixir-Code-Smells/tree/main#agent-obsession). | ||
|
||
## Sending unnecessary data | ||
|
||
#### Problem | ||
|
||
Sending a message to a process can be an expensive operation if the message is big enough. That's because that message will be fully copied to the receiving process, which may be CPU and memory intensive. This is due to Erlang's "share nothing" architecture, where each process has its own memory, which simplifies and speeds up garbage collection. | ||
|
||
This is more obvious when using `send/2`, `GenServer.call/3`, or the initial data in `GenServer.start_link/3`. Notably this also happens when using `spawn/1`, `Task.async/1`, `Task.async_stream/3`, and so on. It is more subtle here as the anonymous function passed to these functions captures the variables it references, and all captured variables will be copied over. By doing this, you can accidentally send way more data to a process than you actually need. | ||
|
||
#### Example | ||
|
||
Imagine you were to implement some simple reporting of IP addresses that made requests against your application. You want to do this asynchronously and not block processing, so you decide to use `spawn/1`. It may seem like a good idea to hand over the whole connection because we might need more data later. However passing the connection results in copying a lot of unnecessary data like the request body, params, etc. | ||
|
||
```elixir | ||
# log_request_ip send the ip to some external service | ||
spawn(fn -> log_request_ip(conn) end) | ||
``` | ||
|
||
This problem also occurs when accessing only the relevant parts: | ||
|
||
```elixir | ||
spawn(fn -> log_request_ip(conn.remote_ip) end) | ||
``` | ||
|
||
This will still copy over all of `conn`, because the `conn` variable is being captured inside the spawned function. The function then extracts the `remote_ip` field, but only after the whole `conn` has been copied over. | ||
|
||
`send/2` and the `GenServer` APIs also rely on message passing. In the example below, the `conn` is once again copied to the underlying `GenServer`: | ||
|
||
```elixir | ||
GenServer.cast(pid, {:report_ip_address, conn}) | ||
``` | ||
|
||
#### Refactoring | ||
|
||
This anti-pattern has many potential remedies: | ||
|
||
* Limit the data you send to the absolute necessary minimum instead of sending an entire struct. For example, don't send an entire `conn` struct if all you need is a couple of fields. | ||
Check failure on line 226 in lib/elixir/pages/anti-patterns/process-anti-patterns.md GitHub Actions / Lint Markdown contentUnordered list indentation
|
||
* If the only process that needs data is the one you are sending to, consider making the process fetch that data instead of passing it. | ||
Check failure on line 227 in lib/elixir/pages/anti-patterns/process-anti-patterns.md GitHub Actions / Lint Markdown contentUnordered list indentation
|
||
* Some abstractions, such as [`:persistent_term`](https://www.erlang.org/doc/man/persistent_term.html), allows you to share data between processes, as long as such data changes infrequently. | ||
Check failure on line 228 in lib/elixir/pages/anti-patterns/process-anti-patterns.md GitHub Actions / Lint Markdown contentUnordered list indentation
|
||
|
||
In our case, limiting the input data is a reasonable strategy. If all we need *right now* is the IP address, then let's only work with that and make sure we're only passing the IP address into the closure, like so: | ||
|
||
```elixir | ||
ip_address = conn.remote_ip | ||
spawn(fn -> log_request_ip(ip_address) end) | ||
``` | ||
|
||
Or in the `GenServer` case: | ||
|
||
```elixir | ||
GenServer.cast(pid, {:report_ip_address, conn.remote_ip}) | ||
``` | ||
|
||
## Unsupervised processes | ||
|
||
#### Problem | ||
|