Document the process anti pattern of sending large data #13194

Merged
52 changes: 52 additions & 0 deletions lib/elixir/pages/anti-patterns/process-anti-patterns.md
@@ -188,6 +188,58 @@

This anti-pattern was formerly known as [Agent obsession](https://github.com/lucasvegi/Elixir-Code-Smells/tree/main#agent-obsession).

## Sending unnecessary data

#### Problem

Sending a message to a process can be an expensive operation if the message is large enough. That's because the message is fully copied to the receiving process, which may be CPU and memory intensive. This is due to Erlang's "share nothing" architecture, where each process has its own memory, which simplifies and speeds up garbage collection. The main exception is binaries larger than 64 bytes, which are stored outside of process heaps and shared by reference rather than copied.

This is most obvious when using `send/2`, `GenServer.call/3`, or when passing initial data to `GenServer.start_link/3`. Notably, this also happens when using `spawn/1`, `Task.async/1`, `Task.async_stream/3`, and so on. It is more subtle here, as the anonymous function passed to these functions captures the variables it references, and all captured variables are copied over. As a result, you can accidentally send far more data to a process than you actually need.
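To get a rough idea of how much data a message would copy, one option is `:erts_debug.flat_size/1`, which returns the size of a term in machine words as if it were copied with no sharing. `:erts_debug` is an undocumented debugging module, so the sketch below (built around a hypothetical `CopyCost` helper) is meant for exploration in IEx or scripts, not for production code:

```elixir
defmodule CopyCost do
  # Hypothetical helper: approximate number of heap words copied if `term`
  # were sent to another process. :erts_debug is undocumented, so use this
  # for exploration only. Binaries over 64 bytes are shared by reference,
  # so only their small on-heap reference is counted here.
  def words(term), do: :erts_debug.flat_size(term)
end

small = {127, 0, 0, 1}
large = Enum.to_list(1..100_000)

IO.puts("small message: #{CopyCost.words(small)} words")
IO.puts("large message: #{CopyCost.words(large)} words")
```

Every element of the large list becomes a cons cell on the receiver's heap, which is exactly the copying cost described above.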

#### Example

Imagine you were to implement some simple reporting of the IP addresses that made requests against your application. You want to do this asynchronously, without blocking request processing, so you decide to use `spawn/1`. It may seem like a good idea to hand over the whole connection because you might need more data later. However, passing the connection results in copying a lot of unnecessary data, such as the request body, params, etc.

```elixir
# log_request_ip sends the IP to some external service
spawn(fn -> log_request_ip(conn) end)
```

This problem also occurs when accessing only the relevant parts:

```elixir
spawn(fn -> log_request_ip(conn.remote_ip) end)
```

This will still copy over all of `conn`, because the `conn` variable is being captured inside the spawned function. The function then extracts the `remote_ip` field, but only after the whole `conn` has been copied over.
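One way to observe this is `Function.info/2` with the `:env` item, which lists the values a closure has captured. The sketch below assumes a plain map as a hypothetical stand-in for a `Plug.Conn`, and a hypothetical `log_request_ip/1` that just returns the IP; the closure is built inside a module so it is compiled rather than evaluated:

```elixir
defmodule CaptureDemo do
  # Hypothetical stand-in for a Plug.Conn: the field we need plus a
  # large payload we do not need.
  def fake_conn do
    %{remote_ip: {127, 0, 0, 1}, params: Map.new(1..10_000, &{&1, &1})}
  end

  # Same shape as the example above: only `remote_ip` is used, but the
  # closure references the `conn` variable.
  def closure(conn), do: fn -> log_request_ip(conn.remote_ip) end

  # Hypothetical placeholder for the real reporting function.
  defp log_request_ip(ip), do: ip
end

conn = CaptureDemo.fake_conn()
fun = CaptureDemo.closure(conn)

# The captured environment holds the entire map, params included.
{:env, captured} = Function.info(fun, :env)
IO.inspect(map_size(hd(captured).params), label: "captured params")
```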

`send/2` and the `GenServer` APIs also rely on message passing. In the example below, the `conn` is once again copied to the underlying `GenServer`:

```elixir
GenServer.cast(pid, {:report_ip_address, conn})
```

#### Refactoring

This anti-pattern has many potential remedies:

  * Limit the data you send to the bare minimum necessary instead of sending an entire struct. For example, don't send an entire `conn` struct if all you need is a couple of fields.

  * If the only process that needs data is the one you are sending to, consider making the process fetch that data instead of passing it.

  * Some abstractions, such as [`:persistent_term`](https://www.erlang.org/doc/man/persistent_term.html), allow you to share data between processes, as long as such data changes infrequently.

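As a minimal sketch of the second remedy, assume a hypothetical `load_report/1` standing in for a database query or file read. In the eager version the caller loads the report and the closure captures it, so every row is copied into the task; in the lazy version only the small `id` crosses the process boundary and the task loads the data itself:

```elixir
defmodule ReportWorker do
  # Hypothetical loader standing in for a database query or file read.
  def load_report(id), do: %{id: id, rows: Enum.to_list(1..100_000)}

  # Anti-pattern: `report` is captured, so all rows are copied to the task.
  def process_eagerly(id) do
    report = load_report(id)
    Task.async(fn -> length(report.rows) end)
  end

  # Refactored: only `id` is captured; the task fetches what it needs.
  def process_lazily(id) do
    Task.async(fn -> id |> load_report() |> Map.fetch!(:rows) |> length() end)
  end
end

ReportWorker.process_lazily(42) |> Task.await() |> IO.inspect()
```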
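For the `:persistent_term` remedy, here is a sketch assuming a hypothetical `GeoConfig` module that stores a rarely changing lookup table once and reads it from other processes. Reads do not copy the stored term onto the reader's heap, but updating or erasing a persistent term can trigger a scan of all processes, which is why it only suits data that changes infrequently:

```elixir
defmodule GeoConfig do
  # Hypothetical key, namespaced with the owning module to avoid collisions.
  @key {__MODULE__, :country_ranges}

  # Store the data once, e.g. at application start. Avoid frequent writes:
  # each update or erase may force a global scan of all processes.
  def load do
    ranges = Map.new(1..1_000, &{&1, "country-#{&1}"})
    :persistent_term.put(@key, ranges)
  end

  # Any process can read the term without it being copied to its heap.
  def lookup(n), do: @key |> :persistent_term.get() |> Map.get(n)
end

GeoConfig.load()

result = Task.async(fn -> GeoConfig.lookup(7) end) |> Task.await()
IO.inspect(result)
```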
In our case, limiting the input data is a reasonable strategy. If all we need *right now* is the IP address, then let's work only with that and make sure we pass only the IP address into the closure, like so:

```elixir
ip_address = conn.remote_ip
spawn(fn -> log_request_ip(ip_address) end)
```

Or in the `GenServer` case:

```elixir
GenServer.cast(pid, {:report_ip_address, conn.remote_ip})
```
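For context, a receiving server for the refactored cast might look like the sketch below. `IPReporter`, its `report/2` and `reported/1` wrappers, and the in-memory list of IPs are all hypothetical; the point is that callers pass only `conn.remote_ip`, so only a small tuple is copied into the server:

```elixir
defmodule IPReporter do
  use GenServer

  # Hypothetical server that collects reported IP addresses in memory.
  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, [], opts)

  # Callers pass only the IP, never the whole connection.
  def report(pid, ip), do: GenServer.cast(pid, {:report_ip_address, ip})

  def reported(pid), do: GenServer.call(pid, :reported)

  @impl true
  def init(ips), do: {:ok, ips}

  @impl true
  def handle_cast({:report_ip_address, ip}, ips), do: {:noreply, [ip | ips]}

  @impl true
  def handle_call(:reported, _from, ips), do: {:reply, Enum.reverse(ips), ips}
end

{:ok, pid} = IPReporter.start_link()

# Hypothetical stand-in for a Plug.Conn.
conn = %{remote_ip: {127, 0, 0, 1}, params: %{"page" => "1"}}
IPReporter.report(pid, conn.remote_ip)
IO.inspect(IPReporter.reported(pid))
```

Because messages between the same pair of processes arrive in order, the cast is handled before the subsequent `reported/1` call.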

## Unsupervised processes

#### Problem