Document the process anti pattern of sending large data
Follow up to/extension of #13173

First draft, happy to adjust and extend it in many ways. I had
trouble coming up with a simple example as we needed a bunch of
data to make sure it's not good. I could have employed the first
anti pattern myself and did repeated statistics calculation, but
that'd have been worse :sweat-smile:

I remembered José's comment around sending along the `conn` and
figured it's central enough in elixir to not throw anyone off.

If someone has a better example, happy to redo it!

Thanks y'all! :green-heart:
PragTob committed Dec 17, 2023
1 parent 80723f5 commit 1e22915
Showing 1 changed file with 80 additions and 0 deletions.
80 changes: 80 additions & 0 deletions lib/elixir/pages/anti-patterns/process-anti-patterns.md
@@ -290,3 +290,83 @@ iex> Supervisor.restart_child(App.Supervisor, Counter)
iex> Counter.get(Counter) # After the restart, this process can be used again
0
```

## Sending unnecessary data between processes

#### Problem

Sending a lot of data between processes is not an anti-pattern by itself; sometimes it is necessary. However, it is costly: a message is fully copied to the receiving process, which is both CPU and memory intensive. This is due to Erlang's "shared nothing" architecture, where each process has its own memory, which simplifies and speeds up garbage collection.
Notably, you don't need to call `send/2` to trigger this anti-pattern: the anonymous functions passed to `spawn/1`, `Task.async/1`, and similar functions capture the data they reference, and that data is copied to the new process in the same way.
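
As a minimal sketch of the closure case, the two functions below start the same task, but only the second limits what the closure captures. The `ClosureCapture` module, its function names, and the map standing in for a connection are hypothetical and exist only for illustration.

```elixir
defmodule ClosureCapture do
  # Hypothetical helper: `request` stands in for any large map or struct
  # of which the task needs a single field.

  # The closure references `request`, so the whole term is captured and
  # copied to the task process.
  def copies_everything(request) do
    Task.async(fn -> request.remote_ip |> :inet.ntoa() |> to_string() end)
  end

  # The field is extracted first, so the closure captures only `ip`.
  def copies_only_ip(request) do
    ip = request.remote_ip
    Task.async(fn -> ip |> :inet.ntoa() |> to_string() end)
  end
end

request = %{remote_ip: {127, 0, 0, 1}, body: Enum.to_list(1..10_000)}

everything = Task.await(ClosureCapture.copies_everything(request))
only_ip = Task.await(ClosureCapture.copies_only_ip(request))
```

Both calls produce the same result; the difference is only in how much data is copied to the task process.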

#### Example

To illustrate the problem, imagine implementing a simple rate limiter based on the IP address of a connection. It may seem like a good idea to hand over the whole connection ("We might need more data later!"), but doing so copies a lot of unnecessary data (request body, params, etc.).

```elixir
defmodule RateLimiter do
  use GenServer

  def report_request(conn, pid) do
    GenServer.call(pid, {:report_request, conn})
  end

  @impl GenServer
  def init(init_arg) do
    {:ok, init_arg}
  end

  @impl GenServer
  def handle_call({:report_request, conn}, _from, state) do
    ip = conn.remote_ip
    # actual logic irrelevant for example, but involves ip

    {:reply, :ok, state}
  end
end
```

```elixir
iex(1)> {:ok, pid} = GenServer.start_link(RateLimiter, :init)
{:ok, #PID<0.286.0>}
iex(2)> RateLimiter.report_request(%Plug.Conn{remote_ip: {127,0,0,1}}, pid)
:ok
```
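
To get a rough sense of the difference, the sketch below compares the size of a whole term with the size of the one field the rate limiter needs. It assumes a plain map as a hypothetical stand-in for the connection (so it runs without Plug) and relies on `:erts_debug.flat_size/1`, an undocumented debugging function that reports a term's size in machine words without accounting for shared subterms.

```elixir
# Hypothetical stand-in for a connection: a plain map with a few bulky fields.
conn_like = %{
  remote_ip: {127, 0, 0, 1},
  params: Map.new(1..100, fn i -> {"key#{i}", "value#{i}"} end),
  req_headers: List.duplicate({"x-example", "value"}, 50)
}

# Sizes in machine words, roughly what a message send would have to copy.
whole = :erts_debug.flat_size(conn_like)
ip_only = :erts_debug.flat_size(conn_like.remote_ip)

IO.puts("whole map: #{whole} words, IP tuple: #{ip_only} words")
```

The exact numbers depend on the data and the runtime, so treat them as an indication rather than a benchmark.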

#### Refactoring

This anti-pattern has many potential remedies:

* Limit the data you send to the necessary minimum instead of sending a whole struct. For example, don't send an entire `Plug.Conn` struct if all you need is a couple of fields.
* If only the receiving process needs the data, let that process fetch the data itself.
* Some storage is accessible from every process, such as [ets](https://www.erlang.org/doc/man/ets) and [persistent_term](https://www.erlang.org/doc/man/persistent_term.html). Terms in `persistent_term` can be read without copying, which suits data that rarely changes. Terms read from an ETS table are still copied to the reading process, so ETS helps mainly when each process needs only a small part of the data.

In our case the first, and most common, strategy applies. If all we need _right now_ is the IP address, then let's only work with that.

```elixir
defmodule RateLimiter do
  use GenServer

  def report_request(ip, pid) do
    GenServer.call(pid, {:report_request, ip})
  end

  @impl GenServer
  def init(init_arg) do
    {:ok, init_arg}
  end

  @impl GenServer
  def handle_call({:report_request, ip}, _from, state) do
    # actual logic irrelevant for example, but involves ip

    {:reply, :ok, state}
  end
end
```

```elixir
iex(1)> {:ok, pid} = GenServer.start_link(RateLimiter, :init)
{:ok, #PID<0.286.0>}
iex(2)> RateLimiter.report_request({127,0,0,1}, pid)
:ok
```
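
The shared-storage remedy listed above can be sketched with `:persistent_term`. This is a minimal illustration under stated assumptions, not part of the rate limiter: the `RateLimiter.Config` module, its key, and the `max_requests`/`window_ms` settings are hypothetical.

```elixir
defmodule RateLimiter.Config do
  # Hypothetical module and key, used only for illustration.
  @key {__MODULE__, :limits}

  # Store the term once, for example at application start. Updating a
  # persistent term is expensive, so this suits data that rarely changes.
  def put(limits), do: :persistent_term.put(@key, limits)

  # Any process can read the term without it being copied to its heap.
  def get, do: :persistent_term.get(@key)
end

:ok = RateLimiter.Config.put(%{max_requests: 100, window_ms: 60_000})

# The task reads the shared term itself instead of receiving a copy of it
# through a message or a captured variable.
task = Task.async(fn -> RateLimiter.Config.get().max_requests end)
max_requests = Task.await(task)
```

Because the closure references no large variable, nothing beyond the function itself is copied to the task process.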
