From 5806d5a3e175e58e1880b2b356fbe2689b735031 Mon Sep 17 00:00:00 2001
From: Tobias Pfeiffer
Date: Sun, 17 Dec 2023 11:03:56 +0100
Subject: [PATCH] Document the process anti pattern of sending large data
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Follow up to/extension of #13173

First draft, happy to adjust and extend it in many ways.

I had trouble coming up with a simple example as we needed a bunch of data to make sure it's not good. I could have employed the first anti pattern myself and did repeated statistics calculation, but that'd have been worse :sweat-smile: I remembered José's comment around sending along the `conn` and figured it's central enough in Elixir to not throw anyone off. If someone has a better example, happy to redo it!

Thanks y'all! :green-heart:
---
 .../anti-patterns/process-anti-patterns.md | 81 +++++++++++++++++++
 1 file changed, 81 insertions(+)

diff --git a/lib/elixir/pages/anti-patterns/process-anti-patterns.md b/lib/elixir/pages/anti-patterns/process-anti-patterns.md
index 7a7378279bc..2250b578992 100644
--- a/lib/elixir/pages/anti-patterns/process-anti-patterns.md
+++ b/lib/elixir/pages/anti-patterns/process-anti-patterns.md
@@ -290,3 +290,84 @@ iex> Supervisor.restart_child(App.Supervisor, Counter)
 iex> Counter.get(Counter) # After the restart, this process can be used again
 0
 ```
+
+## Sending unnecessary data between processes
+
+#### Problem
+
+Sending a lot of data between processes is not an anti-pattern by itself; it may be necessary. However, it is costly because messages are fully copied to the receiving process, which is both CPU and memory intensive. This is due to Erlang's "shared nothing" architecture, where each process has its own memory, which simplifies and speeds up garbage collection.
+Notably, you don't need to use `send/2` to trigger this anti-pattern: the anonymous functions passed to `spawn/1`, `Task.async/1`, and similar functions capture the data they reference, which triggers the same copying.
+
+#### Example
+
+To illustrate the problem, imagine implementing a simple rate limiter based on the IP address of a connection. It may seem like a good idea to hand over the whole connection ("We might need more data later!"), but doing so copies a lot of unnecessary data (request body, params, etc.).
+
+```elixir
+defmodule RateLimiter do
+  use GenServer
+
+  def report_request(conn, pid) do
+    GenServer.call(pid, {:report_request, conn})
+  end
+
+  @impl GenServer
+  def init(init_arg) do
+    {:ok, init_arg}
+  end
+
+  @impl GenServer
+  def handle_call({:report_request, conn}, _from, state) do
+    ip = conn.remote_ip
+    # actual logic irrelevant for example, but involves ip
+
+    {:reply, :ok, state}
+  end
+end
+```
+
+```elixir
+iex(1)> {:ok, pid} = GenServer.start_link(RateLimiter, :init)
+{:ok, #PID<0.286.0>}
+iex(2)> RateLimiter.report_request(%Plug.Conn{remote_ip: {127, 0, 0, 1}}, pid)
+:ok
+```
+
+
+#### Refactoring
+
+This anti-pattern has many potential remedies:
+
+* Limit the data you send to the necessary minimum instead of sending the whole struct. For example, don't send an entire `Plug.Conn` struct if all you need is a couple of fields.
+* If the receiving process is the only one that needs the data, it may fetch the data itself instead of having it sent over.
+* Some data structures are shared between processes and hence don't need copying when passed around, such as [ETS](https://www.erlang.org/doc/man/ets) and [persistent_term](https://www.erlang.org/doc/man/persistent_term.html).
+
+In our case the first, and most common, strategy is applicable. If all we need _right now_ is the IP address, then let's only work with that.
+
+```elixir
+defmodule RateLimiter do
+  use GenServer
+
+  def report_request(ip, pid) do
+    GenServer.call(pid, {:report_request, ip})
+  end
+
+  @impl GenServer
+  def init(init_arg) do
+    {:ok, init_arg}
+  end
+
+  @impl GenServer
+  def handle_call({:report_request, ip}, _from, state) do
+    # actual logic irrelevant for example, but involves ip
+
+    {:reply, :ok, state}
+  end
+end
+```
+
+```elixir
+iex(1)> {:ok, pid} = GenServer.start_link(RateLimiter, :init)
+{:ok, #PID<0.286.0>}
+iex(2)> RateLimiter.report_request({127, 0, 0, 1}, pid)
+:ok
+```
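
The Problem section notes that anonymous functions passed to `spawn/1` or `Task.async/1` trigger the same copying, but the patch only shows the `GenServer.call/2` case. A minimal sketch of the closure case and its refactoring could look like the following. It assumes a plain map as a hypothetical stand-in for `Plug.Conn`; the `ClosureCapture` module and its function names are illustrative only and are not part of the patch.

```elixir
defmodule ClosureCapture do
  # Hypothetical stand-in for a large struct such as `Plug.Conn`.
  # The :params list exists only to make the term big.
  def build_conn do
    %{remote_ip: {127, 0, 0, 1}, params: Enum.to_list(1..100_000)}
  end

  # Anti-pattern: the closure references `conn`, so the whole map is
  # captured and copied to the task process.
  def ip_via_whole_conn(conn) do
    task = Task.async(fn -> conn.remote_ip end)
    Task.await(task)
  end

  # Refactoring: extract the field first, so the closure captures only `ip`.
  def ip_via_field(conn) do
    ip = conn.remote_ip
    task = Task.async(fn -> ip end)
    Task.await(task)
  end
end
```

Both functions return the same value; the difference is only in how much data the closure carries into the spawned process, which is the same trade-off as in the `RateLimiter` refactoring.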