Improve Keyword performance#15378
Conversation
> Disclosure: I found, benchmarked, and refactored the performance improvements here with Claude Opus 4.7, but I understand the proposed changes, benchmarks, and results. I wrote the description below myself.

I refactored some of the functions in `Keyword` and achieved roughly the following speed-ups over the baseline. `n` is the number of items in the input list(s). Time is the mean iteration time of the optimized code. The values at `n=10` seem slower than the baseline, but given their nanosecond execution times, variations here are largely due to noise (deviations for `n=10` were typically 50-300 ns).

| Function | n=10 | n=100 | n=1000 |
|---|---:|---:|---:|
| `new/2` | 1.10x · 441 ns | 1.10x · 6.32 µs | **3.70x** · 83 µs |
| `merge/2` | 0.83x · 253 ns | 1.49x · 4.96 µs | **6.88x** · 95 µs |
| `merge/3` | 0.64x · 483 ns | 2.91x · 9.83 µs | **14.17x** · 230 µs |
| `pop/3` | 1.40x · 41 ns | **1.65x** · 548 ns | **1.56x** · 5.49 µs |
| `take/2` | 1.04x · 101 ns | 1.04x · 3.35 µs | **7.89x** · 47.1 µs |
| `drop/2` | 0.88x · 135 ns | 1.07x · 3.48 µs | **7.05x** · 53.1 µs |
| `split/2` | 1.22x · 137 ns | 1.05x · 3.50 µs | **6.85x** · 55.4 µs |

An overview of the changes I've applied to the functions:

### `new/2`

The original code would call `put_new/3` on every key-value pair, which runs `:lists.keyfind(key, 1, acc)` under the hood to check whether a key is already present. As the accumulator grows, `:lists.keyfind/3` would walk the entire accumulator for every item.

The new code builds a `seen` map of keys and pattern-matches against it to check for duplicates. I also considered using `Map.get(seen, k)` here, but my assumption was that the pattern match would be slightly faster.
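A condensed, hedged sketch of that approach (the full version is `Optimised.new/2` in the benchmark script below; the map value is irrelevant, only key membership matters):

```elixir
# Sketch of the seen-map dedup: fold over the reversed input and skip keys that
# were already seen, so for duplicate keys the last value in the input wins.
pairs = [{"a", 1}, {"b", 2}, {"a", 3}]
transform = fn {k, v} -> {String.to_atom(k), v} end

{result, _seen} =
  Enum.reduce(Enum.reverse(pairs), {[], %{}}, fn el, {acc, seen} ->
    {k, v} = transform.(el)

    case seen do
      # Pattern matching on the map doubles as the membership check.
      %{^k => _} -> {acc, seen}
      _ -> {[{k, v} | acc], Map.put(seen, k, true)}
    end
  end)

result
#=> [b: 2, a: 3]
```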
### `merge/2`

The original code would first call `keyword?(keywords2)`, which walks the entire `keywords2` list to confirm it's a keyword list. It then called `has_key?(keywords2, key)` for every key-value pair in `keywords1`, which calls `:lists.keymember(key, 1, keywords)` under the hood, walking all of `keywords2` again for every entry.

The new code collects all keys in `keywords2` and validates that it is a proper keyword list in a single pass. It then uses `Map.has_key?(keys2, key)` on the resulting key map to check for duplicates.
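A condensed, hedged sketch of the approach (the full version is `Optimised.merge/2` in the benchmark script below):

```elixir
# Hypothetical sketch: build a key set from keywords2 once, then filter
# keywords1 against it with O(1) lookups instead of :lists.keymember/3.
keywords1 = [a: 1, b: 2]
keywords2 = [a: 10, c: 3]

keys2 = Map.new(keywords2, fn {k, _v} -> {k, true} end)

:lists.filter(fn {k, _v} -> not Map.has_key?(keys2, k) end, keywords1) ++ keywords2
#=> [b: 2, a: 10, c: 3]
```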
### `merge/3`

The original code would walk `keywords2` only once, but walk `keywords1` up to three times for every entry of `keywords2`. First, it would walk it as `original` with `:lists.keyfind(key, 1, original)` to check for a duplicate. If the key exists, it would walk it again with `:lists.keydelete(key, 1, original)` and once more with `delete(rest, key)`. The `:lists.keydelete/3` call removes only the first occurrence of the duplicate key from `original`, but `delete/2` removes **all** duplicates from `rest`. If `keywords1` had two occurrences of the same key, the original code would call `:lists.keydelete(key, 1, original)` again, but `delete(rest, key)` would be a no-op.

The new code does three linear passes. First, it walks `keywords2` to collect all keys into a map with an empty list for each key, validating the keys along the way. Second, `partition_left/4` walks `keywords1` once. It groups the values of keys that also occur in `keywords2`, builds a list of the pairs whose keys don't occur in `keywords2`, and tracks the duplicate keys in `keywords1`. If `keywords1` has any duplicate keys, it reverses the grouped values of those keys so that they are back in their original order.

Third, `emit_right` walks `keywords2` once more and, for keys that also appeared in `keywords1`, merges the grouped `keywords1` values with the `keywords2` values left-to-right via the merge function. It also emits the `keywords2` pairs whose keys did not appear in `keywords1`. Lastly, we concatenate the `keywords1` pairs whose keys did not occur in `keywords2` with the emitted `keywords2` entries.
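For reference, the behaviour both versions must preserve, derived from the baseline implementation in the benchmark script below (on a conflict the resolver receives the key and both values):

```elixir
# Expected merge/3 semantics: conflicting keys are resolved via the function,
# keys only in keywords1 keep their position, keys only in keywords2 are appended.
resolver = fn _key, v1, v2 -> v1 + v2 end

Keyword.merge([a: 1, b: 2], [a: 10, c: 3], resolver)
#=> [b: 2, a: 11, c: 3]
```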
# ----
# Comparison bench for Keyword findings.
#
# Usage:
# elixir .private/bench_keyword_compare.exs # all findings
# elixir .private/bench_keyword_compare.exs merge2 # just one
#
# Findings: new2, merge2, merge3, pop, take, drop, split. (replace/3 variants are
# included in the modules for reference but have no scenario wired up.)
#
# `Baseline` holds the current upstream implementations verbatim. `Optimised`
# holds the proposed alternatives. Function names and arities match upstream
# exactly, so each scenario calls e.g. Baseline.merge/2 vs Optimised.merge/2.
Mix.install([{:benchee, "~> 1.3"}])
defmodule Baseline do
# ---- F064-001: merge/2 -------------------------------------------------
def merge(keywords1, []) when is_list(keywords1), do: keywords1
def merge([], keywords2) when is_list(keywords2), do: keywords2
def merge(keywords1, keywords2) when is_list(keywords1) and is_list(keywords2) do
if keyword?(keywords2) do
fun = fn
{key, _value} when is_atom(key) ->
not has_key?(keywords2, key)
_ ->
raise ArgumentError,
"expected a keyword list as the first argument, got: #{inspect(keywords1)}"
end
:lists.filter(fun, keywords1) ++ keywords2
else
raise ArgumentError,
"expected a keyword list as the second argument, got: #{inspect(keywords2)}"
end
end
# ---- F064-004: merge/3 -------------------------------------------------
def merge(keywords1, keywords2, fun)
when is_list(keywords1) and is_list(keywords2) and is_function(fun, 3) do
if keyword?(keywords1) do
do_merge(keywords2, [], keywords1, keywords1, fun, keywords2)
else
raise ArgumentError,
"expected a keyword list as the first argument, got: #{inspect(keywords1)}"
end
end
defp do_merge([{key, value2} | tail], acc, rest, original, fun, keywords2) when is_atom(key) do
case :lists.keyfind(key, 1, original) do
{^key, value1} ->
acc = [{key, fun.(key, value1, value2)} | acc]
original = :lists.keydelete(key, 1, original)
do_merge(tail, acc, delete(rest, key), original, fun, keywords2)
false ->
do_merge(tail, [{key, value2} | acc], rest, original, fun, keywords2)
end
end
defp do_merge([], acc, rest, _original, _fun, _keywords2) do
rest ++ :lists.reverse(acc)
end
defp do_merge(_other, _acc, _rest, _original, _fun, keywords2) do
raise ArgumentError,
"expected a keyword list as the second argument, got: #{inspect(keywords2)}"
end
# ---- F064-002: new/2 ---------------------------------------------------
def new(pairs, transform) when is_function(transform, 1) do
fun = fn el, acc ->
{k, v} = transform.(el)
put_new(acc, k, v)
end
:lists.foldl(fun, [], Enum.reverse(pairs))
end
# ---- F064-003: pop -----------------------------------------------------
def pop(keywords, key, default \\ nil) when is_list(keywords) and is_atom(key) do
case fetch(keywords, key) do
{:ok, value} -> {value, delete(keywords, key)}
:error -> {default, keywords}
end
end
# ---- F064-005: replace -------------------------------------------------
def replace(keywords, key, value) when is_list(keywords) and is_atom(key) do
do_replace(keywords, key, value)
end
defp do_replace([{key, _} | keywords], key, value) do
[{key, value} | delete(keywords, key)]
end
defp do_replace([{_, _} = e | keywords], key, value) do
[e | do_replace(keywords, key, value)]
end
defp do_replace([], _key, _value) do
[]
end
# ---- F064-006: take / drop / split ------------------------------------
def take(keywords, keys) when is_list(keywords) and is_list(keys) do
:lists.filter(fn {k, _} -> :lists.member(k, keys) end, keywords)
end
def drop(keywords, keys) when is_list(keywords) and is_list(keys) do
:lists.filter(fn {k, _} -> k not in keys end, keywords)
end
def split(keywords, keys) when is_list(keywords) and is_list(keys) do
fun = fn {k, v}, {take, drop} ->
case k in keys do
true -> {[{k, v} | take], drop}
false -> {take, [{k, v} | drop]}
end
end
acc = {[], []}
{take, drop} = :lists.foldl(fun, acc, keywords)
{:lists.reverse(take), :lists.reverse(drop)}
end
# ---- Internals reused by Baseline functions ---------------------------
def keyword?([{key, _value} | rest]) when is_atom(key), do: keyword?(rest)
def keyword?([]), do: true
def keyword?(_other), do: false
def has_key?(keywords, key) when is_list(keywords) and is_atom(key) do
:lists.keymember(key, 1, keywords)
end
def fetch(keywords, key) when is_list(keywords) and is_atom(key) do
case :lists.keyfind(key, 1, keywords) do
{^key, value} -> {:ok, value}
false -> :error
end
end
def delete(keywords, key) when is_list(keywords) and is_atom(key) do
case :lists.keymember(key, 1, keywords) do
true -> delete_key(keywords, key)
_ -> keywords
end
end
defp delete_key([{key, _} | tail], key), do: delete_key(tail, key)
defp delete_key([{_, _} = pair | tail], key), do: [pair | delete_key(tail, key)]
defp delete_key([], _key), do: []
def put_new(keywords, key, value) when is_list(keywords) and is_atom(key) do
case :lists.keyfind(key, 1, keywords) do
{^key, _} -> keywords
false -> [{key, value} | keywords]
end
end
end
defmodule Optimised do
# F064-001 — merge/2: collapse O(n·m) to O(n+m) using a key-set of keywords2.
def merge(keywords1, []) when is_list(keywords1), do: keywords1
def merge([], keywords2) when is_list(keywords2), do: keywords2
def merge(keywords1, keywords2) when is_list(keywords1) and is_list(keywords2) do
keys2 = keys_of!(keywords2)
fun = fn
{k, _} when is_atom(k) -> not Map.has_key?(keys2, k)
_ -> raise_first_arg!(keywords1)
end
:lists.filter(fun, keywords1) ++ keywords2
end
# F064-004 — merge/3: matches the current keyword.ex implementation
# (MapSet-tracked duplicate keys, targeted reverse, three linear passes).
def merge(keywords1, keywords2, fun)
when is_list(keywords1) and is_list(keywords2) and is_function(fun, 3) do
if not keyword?(keywords1), do: raise_first_arg!(keywords1)
keys2 = keys_of!(keywords2)
{non_matching_rev, keys2, duplicate_keys} =
partition_left_merge3(keywords1, [], keys2, MapSet.new())
keys2 =
Enum.reduce(duplicate_keys, keys2, fn key, acc ->
Map.update!(acc, key, &:lists.reverse/1)
end)
emitted_rev = emit_right_merge3(keywords2, [], keys2, fun)
:lists.reverse(non_matching_rev) ++ :lists.reverse(emitted_rev)
end
defp partition_left_merge3([{key, value} | rest], non_matching, keys2, duplicate_keys) do
case keys2 do
%{^key => []} ->
partition_left_merge3(rest, non_matching, Map.put(keys2, key, [value]), duplicate_keys)
%{^key => current} ->
partition_left_merge3(
rest,
non_matching,
Map.put(keys2, key, [value | current]),
MapSet.put(duplicate_keys, key)
)
_ ->
partition_left_merge3(rest, [{key, value} | non_matching], keys2, duplicate_keys)
end
end
defp partition_left_merge3([], non_matching, keys2, duplicate_keys),
do: {non_matching, keys2, duplicate_keys}
defp emit_right_merge3([{key, value2} | rest], emitted, keys2, fun) do
case keys2 do
%{^key => [value1 | remaining]} ->
emit_right_merge3(
rest,
[{key, fun.(key, value1, value2)} | emitted],
Map.put(keys2, key, remaining),
fun
)
_ ->
emit_right_merge3(rest, [{key, value2} | emitted], keys2, fun)
end
end
defp emit_right_merge3([], emitted, _keys2, _fun), do: emitted
# F064-002 — new/2: O(n) backward traversal with a seen-keys map. Upstream's
# put_new walks the growing acc via :lists.keyfind, giving O(n·unique).
def new(pairs, transform) when is_function(transform, 1) do
{result, _seen} =
:lists.foldl(
fn el, {acc, seen} ->
{k, v} = transform.(el)
case seen do
%{^k => _} -> {acc, seen}
_ -> {[{k, v} | acc], Map.put(seen, k, [])}
end
end,
{[], %{}},
Enum.reverse(pairs)
)
result
end
# F064-003 — pop/3: single-pass extract. Baseline fetches then deletes, doing
# 3 traversals on hit; here we walk once to find the value, then once over
# the tail to strip any remaining duplicates.
def pop(keywords, key, default \\ nil) when is_list(keywords) and is_atom(key) do
do_pop(keywords, key, default, [])
end
defp do_pop([{key, value} | tail], key, _default, acc),
do: {value, :lists.reverse(acc, delete_key(tail, key))}
defp do_pop([{_, _} = pair | tail], key, default, acc),
do: do_pop(tail, key, default, [pair | acc])
defp do_pop([], _key, default, acc), do: {default, :lists.reverse(acc)}
# F064-005 — replace/3: :lists.keymember is a C BIF (~7-10x faster than BEAM
# walking), so detect miss via BIF first and return the input unchanged with
# zero allocation. On hit we re-walk to rebuild — cheap relative to the
# baseline's full spine rebuild on every call.
def replace(keywords, key, value) when is_list(keywords) and is_atom(key) do
if :lists.keymember(key, 1, keywords) do
do_replace(keywords, key, value)
else
keywords
end
end
defp do_replace([{key, _} | tail], key, value),
do: [{key, value} | delete(tail, key)]
defp do_replace([{_, _} = pair | tail], key, value),
do: [pair | do_replace(tail, key, value)]
defp do_replace([], _key, _value), do: []
# F064-006 — take/drop/split: build a key-set for O(1) lookups. For ≤ 5 keys
# the map-build cost exceeds the savings, so we fall back to `:lists.member`
# — detected by matching the 6th cell without walking the list.
def take(keywords, keys) when is_list(keywords) and is_list(keys) do
:lists.filter(in_keys_pred(keys), keywords)
end
def drop(keywords, keys) when is_list(keywords) and is_list(keys) do
pred = in_keys_pred(keys)
:lists.filter(fn pair -> not pred.(pair) end, keywords)
end
def split(keywords, keys) when is_list(keywords) and is_list(keys) do
pred = in_keys_pred(keys)
{take, drop} =
:lists.foldl(
fn pair, {take, drop} ->
if pred.(pair), do: {[pair | take], drop}, else: {take, [pair | drop]}
end,
{[], []},
keywords
)
{:lists.reverse(take), :lists.reverse(drop)}
end
defp in_keys_pred([_, _, _, _, _, _ | _] = keys) do
set = :lists.foldl(fn k, acc -> Map.put(acc, k, []) end, %{}, keys)
fn {k, _} -> Map.has_key?(set, k) end
end
defp in_keys_pred(keys), do: fn {k, _} -> :lists.member(k, keys) end
# --- Shared helpers ---
# Build a {key => []} lookup map from a keyword list. Raises with the full
# original list if it isn't a keyword list — matching upstream's error shape.
defp keys_of!(keywords), do: do_keys_of(keywords, %{}, keywords)
defp do_keys_of([{k, _} | rest], acc, orig) when is_atom(k),
do: do_keys_of(rest, Map.put(acc, k, []), orig)
defp do_keys_of([], acc, _orig), do: acc
defp do_keys_of(_other, _acc, orig) do
raise ArgumentError,
"expected a keyword list as the second argument, got: #{inspect(orig)}"
end
defp delete(keywords, key) do
if :lists.keymember(key, 1, keywords), do: delete_key(keywords, key), else: keywords
end
defp delete_key([{key, _} | tail], key), do: delete_key(tail, key)
defp delete_key([{_, _} = pair | tail], key), do: [pair | delete_key(tail, key)]
defp delete_key([], _key), do: []
defp keyword?([{k, _} | rest]) when is_atom(k), do: keyword?(rest)
defp keyword?([]), do: true
defp keyword?(_), do: false
defp raise_first_arg!(kws) do
raise ArgumentError,
"expected a keyword list as the first argument, got: #{inspect(kws)}"
end
end
# ---------------------------------------------------------------------------
# Inputs and scenarios.
# ---------------------------------------------------------------------------
all_findings = ~w(new2 merge2 merge3 pop take drop split)
findings =
case System.argv() do
[] ->
all_findings
["all"] ->
all_findings
[one] ->
[one]
other ->
raise "Usage: elixir bench_keyword_compare.exs [all | #{Enum.join(all_findings, " | ")}]\nGot: #{inspect(other)}"
end
# All scenarios share the same percentage dimension. The meaning of "pct" is
# function-specific (see the bench comments inside each case-branch).
pcts = [0, 25, 50]
# Build a merge input: keywords1 has size n with unique keys :k_0.. :k_(n-1).
# keywords2 has size n; the first `pct%` of its entries reuse :k_i keys
# (overlap), the rest use :x_i keys (no overlap).
merge_input = fn n, pct ->
k1 = for i <- 0..(n - 1), do: {String.to_atom("k_#{i}"), i}
overlap_count = div(n * pct, 100)
k2 =
for i <- 0..(n - 1) do
key =
if i < overlap_count do
String.to_atom("k_#{i}")
else
String.to_atom("x_#{i}")
end
{key, i * 10}
end
{k1, k2}
end
# Build a new/2 input: list of `n` `{string_key, int}` pairs. The first
# `n - dup_count` entries have unique keys; the remaining `dup_count`
# entries reuse keys from the unique block (so they're duplicates).
new_input = fn n, pct ->
dup_count = div(n * pct, 100)
unique_n = max(1, n - dup_count)
for i <- 0..(n - 1) do
base = if i < unique_n, do: i, else: rem(i, unique_n)
{"k_#{base}", i}
end
end
# Build a pop input: list of size n with unique keys; target key `:t` is placed
# at position `pct%` of the list (e.g. pct=25 → position n/4). This varies the
# scan distance to the first (and only) hit, with no duplicates involved.
pop_input = fn n, pct ->
target_pos = min(n - 1, div(n * pct, 100))
kws =
for i <- 0..(n - 1) do
if i == target_pos do
{:t, i}
else
{String.to_atom("k_#{i}"), i}
end
end
{kws, :t}
end
# Build a take/drop/split input: keywords list of size n with unique keys.
# The `keys` lookup list contains `pct%` of those same keys (overlap = pct%).
# pct=0 → keys is empty (degenerate but well-defined).
take_input = fn n, pct ->
kws = for i <- 0..(n - 1), do: {String.to_atom("k_#{i}"), i}
hit_count = div(n * pct, 100)
keys = for i <- 0..(hit_count - 1)//1, do: String.to_atom("k_#{i}")
{kws, keys}
end
assert_equal = fn label, a, b ->
if a != b do
raise "Baseline/Optimised disagree on #{label}:\n baseline: #{inspect(a)}\n optimised: #{inspect(b)}"
end
end
for finding <- findings do
IO.puts("\n#{String.duplicate("═", 56)}")
IO.puts("=== BENCH_F=#{finding} ===")
IO.puts(String.duplicate("═", 56))
case finding do
"merge2" ->
sizes = [10, 100, 1000]
inputs =
for n <- sizes, pct <- pcts, into: %{} do
{"n=#{n} overlap=#{pct}%", merge_input.(n, pct)}
end
for {label, {k1, k2}} <- inputs do
assert_equal.("merge/2 #{label}", Baseline.merge(k1, k2), Optimised.merge(k1, k2))
end
Benchee.run(
%{
"Baseline.merge/2" => fn {k1, k2} -> Baseline.merge(k1, k2) end,
"Optimised.merge/2" => fn {k1, k2} -> Optimised.merge(k1, k2) end
},
inputs: inputs,
warmup: 1,
time: 3,
exclude_outliers: true,
print: [fast_warning: false, benchmarking: false]
)
"merge3" ->
sizes = [10, 100, 1000]
f = fn _k, v1, v2 -> v1 + v2 end
inputs =
for n <- sizes, pct <- pcts, into: %{} do
{"n=#{n} overlap=#{pct}%", merge_input.(n, pct)}
end
for {label, {k1, k2}} <- inputs do
assert_equal.("merge/3 #{label}", Baseline.merge(k1, k2, f), Optimised.merge(k1, k2, f))
end
Benchee.run(
%{
"Baseline.merge/3" => fn {k1, k2} -> Baseline.merge(k1, k2, f) end,
"Optimised.merge/3" => fn {k1, k2} -> Optimised.merge(k1, k2, f) end
},
inputs: inputs,
warmup: 1,
time: 3,
exclude_outliers: true,
print: [fast_warning: false, benchmarking: false]
)
"new2" ->
sizes = [10, 100, 1000]
f = fn {k, v} -> {String.to_atom(k), v} end
inputs =
for n <- sizes, pct <- pcts, into: %{} do
{"n=#{n} dup=#{pct}%", new_input.(n, pct)}
end
for {label, pairs} <- inputs do
assert_equal.("new/2 #{label}", Baseline.new(pairs, f), Optimised.new(pairs, f))
end
Benchee.run(
%{
"Baseline.new/2" => fn pairs -> Baseline.new(pairs, f) end,
"Optimised.new/2" => fn pairs -> Optimised.new(pairs, f) end
},
inputs: inputs,
warmup: 1,
time: 3,
exclude_outliers: true,
print: [fast_warning: false, benchmarking: false]
)
"pop" ->
sizes = [10, 100, 1000]
# Vary target position (% from head) rather than duplicate ratio.
pop_pcts = [25, 50, 75]
inputs =
for n <- sizes, pct <- pop_pcts, into: %{} do
{"n=#{n} pos=#{pct}%", pop_input.(n, pct)}
end
for {label, {kws, key}} <- inputs do
assert_equal.("pop/3 #{label}", Baseline.pop(kws, key), Optimised.pop(kws, key))
end
Benchee.run(
%{
"Baseline.pop/3" => fn {kws, key} -> Baseline.pop(kws, key) end,
"Optimised.pop/3" => fn {kws, key} -> Optimised.pop(kws, key) end
},
inputs: inputs,
warmup: 1,
time: 3,
exclude_outliers: true,
print: [fast_warning: false, benchmarking: false]
)
"take" ->
sizes = [5, 10, 100, 1000]
inputs =
for n <- sizes, pct <- pcts, into: %{} do
{"n=#{n} overlap=#{pct}%", take_input.(n, pct)}
end
for {label, {kws, keys}} <- inputs do
assert_equal.("take/2 #{label}", Baseline.take(kws, keys), Optimised.take(kws, keys))
end
Benchee.run(
%{
"Baseline.take/2" => fn {kws, keys} -> Baseline.take(kws, keys) end,
"Optimised.take/2" => fn {kws, keys} -> Optimised.take(kws, keys) end
},
inputs: inputs,
warmup: 1,
time: 3,
exclude_outliers: true,
print: [fast_warning: false, benchmarking: false]
)
"drop" ->
sizes = [5, 10, 100, 1000]
inputs =
for n <- sizes, pct <- pcts, into: %{} do
{"n=#{n} overlap=#{pct}%", take_input.(n, pct)}
end
for {label, {kws, keys}} <- inputs do
assert_equal.("drop/2 #{label}", Baseline.drop(kws, keys), Optimised.drop(kws, keys))
end
Benchee.run(
%{
"Baseline.drop/2" => fn {kws, keys} -> Baseline.drop(kws, keys) end,
"Optimised.drop/2" => fn {kws, keys} -> Optimised.drop(kws, keys) end
},
inputs: inputs,
warmup: 1,
time: 3,
exclude_outliers: true,
print: [fast_warning: false, benchmarking: false]
)
"split" ->
sizes = [5, 10, 100, 1000]
inputs =
for n <- sizes, pct <- pcts, into: %{} do
{"n=#{n} overlap=#{pct}%", take_input.(n, pct)}
end
for {label, {kws, keys}} <- inputs do
assert_equal.("split/2 #{label}", Baseline.split(kws, keys), Optimised.split(kws, keys))
end
Benchee.run(
%{
"Baseline.split/2" => fn {kws, keys} -> Baseline.split(kws, keys) end,
"Optimised.split/2" => fn {kws, keys} -> Optimised.split(kws, keys) end
},
inputs: inputs,
warmup: 1,
time: 3,
exclude_outliers: true,
print: [fast_warning: false, benchmarking: false]
)
other ->
raise "Unknown BENCH_F=#{other}"
end
end
Results
|
@PJUllrich thank you! Instead of MapSet, let's please use a map.
|
I replaced the MapSet with a Map. Noticeable changes appeared for `merge/3`, `split/2`, `take/2`, and `drop/2`; new benchmark results for those functions are attached.
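For illustration, one way that swap could look inside the `merge/3` helper (a hypothetical sketch, not the actual diff; note that iterating a plain map yields `{key, value}` tuples, so the reduce over the duplicate keys needs a small adjustment):

```elixir
# Hypothetical sketch: track keywords1 keys seen more than once in a plain map
# (only key membership matters, the stored value is ignored).
duplicate_keys = Map.put(%{}, :a, true)   # was: MapSet.put(MapSet.new(), :a)

# Grouped keywords1 values per keywords2 key, most recent first.
keys2 = %{a: [2, 1], b: [3]}

# Map iteration yields {key, value} pairs, so destructure the tuple here,
# where the MapSet version received the bare key.
keys2 =
  Enum.reduce(duplicate_keys, keys2, fn {key, _}, acc ->
    Map.update!(acc, key, &:lists.reverse/1)
  end)

keys2[:a]
#=> [1, 2]
```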
|
CI is failing. You can reproduce it with |
|
I replaced the Map again with a MapSet and re-ran the benchmarks as a sanity check. It looks like the differences between MapSet and Map were just noise. I'll stick to Map and fix the CI.
|
|
💚 💙 💜 💛 ❤️
|
I tried some quick benchmarks; I'm a bit concerned these will make things slower for typical use cases (keywords should typically be quite small, given these are the wrong data structures otherwise). `Keyword.merge/2`: 1.20x slower, 5.17x memory.
|
I have reverted this for now in main. @PJUllrich can you please do those again? We need to consider keyword lists with 2, 5, 10 and 20 elements as well.
|
@josevalim no problem. That was also a concern on my mind. I re-ran the benchmarks and added the 2, 5, 10, and 20 element inputs. Because each iteration takes only nanoseconds and is hard to measure, I measured 100 iterations per sample for the small sizes and 1 iteration for the large sizes (100, 1000); a sketch of that harness is at the end of this comment.

New Benchmark Results

We're consistently slower for small lists, unfortunately. Maybe we need to switch to the optimised logic only for larger lists.
|
take/2 per 100 calls
| n | overlap | Baseline | Optimised | Delta | Ratio |
|---|---|---|---|---|---|
| 2 | 0% | 1.85 µs | 3.33 µs | +1.48 µs | 1.80x slower |
| 2 | 25% | 1.86 µs | 3.32 µs | +1.46 µs | 1.79x slower |
| 2 | 50% | 2.19 µs | 4.42 µs | +2.23 µs | 2.02x slower |
| 5 | 0% | 3.38 µs | 4.87 µs | +1.49 µs | 1.44x slower |
| 5 | 25% | 4.11 µs | 5.92 µs | +1.81 µs | 1.44x slower |
| 5 | 50% | 4.34 µs | 7.28 µs | +2.94 µs | 1.68x slower |
| 10 | 0% | 6.55 µs | 7.88 µs | +1.33 µs | 1.20x slower |
| 10 | 25% | 7.98 µs | 9.96 µs | +1.98 µs | 1.25x slower |
| 10 | 50% | 8.71 µs | 14.37 µs | +5.67 µs | 1.65x slower |
| 20 | 0% | 11.69 µs | 12.98 µs | +1.29 µs | 1.11x slower |
| 20 | 25% | 16.80 µs | 21.70 µs | +4.90 µs | 1.29x slower |
| 20 | 50% | 19.72 µs | 30.88 µs | +11.16 µs | 1.57x slower |
drop/2 per 100 calls
| n | overlap | Baseline | Optimised | Delta | Ratio |
|---|---|---|---|---|---|
| 2 | 0% | 2.61 µs | 3.81 µs | +1.20 µs | 1.46x slower |
| 2 | 25% | 2.60 µs | 3.80 µs | +1.20 µs | 1.46x slower |
| 2 | 50% | 2.49 µs | 4.65 µs | +2.16 µs | 1.87x slower |
| 5 | 0% | 4.88 µs | 5.99 µs | +1.11 µs | 1.23x slower |
| 5 | 25% | 5.12 µs | 7.17 µs | +2.05 µs | 1.40x slower |
| 5 | 50% | 5.03 µs | 8.70 µs | +3.67 µs | 1.73x slower |
| 10 | 0% | 9.22 µs | 10.15 µs | +0.93 µs | 1.10x slower |
| 10 | 25% | 9.85 µs | 14.71 µs | +4.86 µs | 1.49x slower |
| 10 | 50% | 11.09 µs | 17.44 µs | +6.35 µs | 1.57x slower |
| 20 | 0% | 16.64 µs | 17.47 µs | +0.83 µs | 1.05x (tied) |
| 20 | 25% | 20.89 µs | 27.07 µs | +6.18 µs | 1.30x slower |
| 20 | 50% | 23.02 µs | 35.38 µs | +12.36 µs | 1.54x slower |
split/2 per 100 calls
| n | overlap | Baseline | Optimised | Delta | Ratio |
|---|---|---|---|---|---|
| 2 | 0% | 3.58 µs | 4.98 µs | +1.40 µs | 1.39x slower |
| 2 | 25% | 3.63 µs | 5.04 µs | +1.41 µs | 1.39x slower |
| 2 | 50% | 3.08 µs | 5.40 µs | +2.32 µs | 1.75x slower |
| 5 | 0% | 7.24 µs | 6.86 µs | −0.38 µs | 1.06x faster (WIN) |
| 5 | 25% | 5.86 µs | 8.54 µs | +2.68 µs | 1.46x slower |
| 5 | 50% | 5.81 µs | 8.81 µs | +3.00 µs | 1.52x slower |
| 10 | 0% | 10.31 µs | 11.36 µs | +1.05 µs | 1.10x slower |
| 10 | 25% | 10.83 µs | 13.37 µs | +2.54 µs | 1.23x slower |
| 10 | 50% | 12.32 µs | 18.02 µs | +5.70 µs | 1.46x slower |
| 20 | 0% | 18.65 µs | 19.30 µs | +0.65 µs | 1.03x (tied) |
| 20 | 25% | 22.24 µs | 27.42 µs | +5.18 µs | 1.23x slower |
| 20 | 50% | 24.87 µs | 35.94 µs | +11.07 µs | 1.45x slower |
| Function | Optimised slower | Tied (within ~5%) | Optimised faster | Wins at n=1000 |
|---|---|---|---|---|
| `pop/3` | — | — | n=10 already | 1.3x – 2.2x |
| `new/2` | n=10 dup=0% (1.07x slower, +31 ns) | n=10 dup=25% | n=10 dup=50% (1.06x), n=100+ | 3.8x – 4.4x |
| `merge/2` | n=10 (all overlaps, +2–26 ns) | — | n=100 (1.57x – 2.16x) | 6.8x – 9.4x |
| `merge/3` | n=10 (all overlaps, 1.44x – 1.76x, +112–146 ns) | — | n=100 (1.81x – 4.75x) | 7.5x – 21.7x |
| `take/2` | n=2–20 (1.11x – 2.02x slower) | n=100 (within 4%) | n=1000 with overlap≥25% | 6.6x – 7.0x |
| `drop/2` | n=2–20 (1.05x – 1.87x slower) | n=100 (within 4%) | n=1000 with overlap≥25% | 5.9x – 6.8x |
| `split/2` | n=2–20 (1.03x – 1.75x slower) | n=100 (within 5%) | n=1000 with overlap≥25% | 5.9x – 6.4x |
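For transparency, a hedged sketch of how the per-100-calls timings above could be collected (illustrative only; `Baseline`, `Optimised`, and `take_input` refer to the benchmark script posted earlier, and the actual harness may differ in its details):

```elixir
# Hypothetical harness: for tiny inputs a single call takes nanoseconds, so each
# Benchee invocation runs the function 100 times and the reported time is per batch.
batch = fn fun -> Enum.each(1..100, fn _ -> fun.() end) end

Benchee.run(
  %{
    "Baseline.take/2 x100" => fn {kws, keys} -> batch.(fn -> Baseline.take(kws, keys) end) end,
    "Optimised.take/2 x100" => fn {kws, keys} -> batch.(fn -> Optimised.take(kws, keys) end) end
  },
  inputs: %{"n=10 overlap=25%" => take_input.(10, 25)},
  warmup: 1,
  time: 3
)
```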
|
@PJUllrich so it seems the only consistent win is with `pop/3`? Shall we have a PR for that?
|
Yes, that sounds good given the context that keywords are commonly short instead of hundreds of elements long. I'll open a PR for just `pop` tomorrow :)
|
I refactored some functions in `Keyword` and achieved the following speed-ups over the baseline. `n` is the number of items in the input list(s). Time is the average iteration time of the optimized code. The speed-up multiplier is the best-case result (e.g. 50% of duplicate keys in the input of `Keyword.merge/2`). The values at `n=5` and `n=10` seem slower than the baseline; given their nanosecond execution times, variations here are mostly due to noise, but I think some regressions are real, for example for `merge/3` with `n=10`, so caution is advised.

Benchmarked functions: `new/2`, `merge/2`, `merge/3`, `pop/3`, `take/2`, `drop/2`, `split/2`.

An overview of the changes I've applied to the functions:

### `new/2`

The original code would call `put_new/3` on every key-value pair, which runs `:lists.keyfind(key, 1, acc)` under the hood to check whether a key is already present. As the accumulator grows, `:lists.keyfind/3` would walk the growing accumulator for every item.

The new code builds a `seen` map of keys and pattern-matches against it to check for duplicates. I also considered using `Map.get(seen, k)` here, but my assumption was that the pattern match would be slightly faster.

### `merge/2`

The original code would first call `keyword?(keywords2)`, which walks the entire `keywords2` list to confirm it's a keyword list. It then called `has_key?(keywords2, key)` on every key-value pair, which calls `:lists.keymember(key, 1, keywords)` under the hood, again walking all of `keywords2` for every entry of `keywords1`.

The new code first collects all keys in `keywords2` and validates that it is a proper keyword list in a single pass. It then uses `Map.has_key?(keys2, key)` to check for duplicates.

### `merge/3`

The original code would walk `keywords2` only once, but walk `keywords1` up to three times for every entry. First, it would walk it as `original` with `:lists.keyfind(key, 1, original)` to check for a duplicate. If the key exists, it would walk it again with `:lists.keydelete(key, 1, original)` and once more with `delete(rest, key)`. The `:lists.keydelete/3` call removes only the first occurrence of the duplicate key from `original`, but `delete/2` removes all duplicates from `rest`. If `keywords1` had two occurrences of the same key, the original code would call `:lists.keydelete(key, 1, original)` again, but `delete(rest, key)` would be a no-op.

The new code does three linear passes. First, it walks `keywords2` to collect all keys into a map with an empty list for each key, validating the keys along the way. Second, `partition_left/4` walks `keywords1` once. It groups the values of keys that also occur in `keywords2`, builds a list of the pairs whose keys don't occur in `keywords2`, and tracks the duplicate keys in a MapSet. If `keywords1` has any duplicate keys, it reverses the grouped values of those keys so that they are back in their original order.

Third, `emit_right` walks `keywords2` once more and merges the grouped `keywords1` values with the `keywords2` values left-to-right. It also emits the `keywords2` pairs whose keys did not appear in `keywords1`. Lastly, we concatenate the `keywords1` pairs whose keys did not occur in `keywords2` with the emitted `keywords2` entries.

### `split/2` - `take/2` - `drop/2`

All three of these functions would run `k in keys` for every entry of the keyword list. This is fine for a small `keys` list, but if that list grows to e.g. 50% of the keyword list, it slows these functions down significantly.

The new code builds a MapSet from the keys and uses `MapSet.member?/2` for every entry. I'm not sure whether this is "safe" though. Can we assume that all `keys` are always valid MapSet entries? The `key in keys` check previously only checked for equality and didn't assume any validity of the keys.

### `pop/2`

The original code would walk the `keywords` list three times: once to check whether the list contains the `key` with `fetch(keywords, key)`, then a second time inside `delete(keywords, key)`, which checks again that the key exists with `:lists.keymember(key, 1, keywords)`, and then `delete_key/2` would walk the `keywords` list a third time to remove the key from the list.

The new code walks the `keywords` list only once. If it finds the key, it runs `delete_key(tail, key)` to remove it from the tail and reverses the now-clean new list. If it doesn't find the key, it reverses the `keywords` list once. This logic is very similar to `replace/3`.

I will add the benchmark and results in separate comments below.
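For clarity, the `pop` behaviour both versions preserve, derived from the baseline implementation in the benchmark script above:

```elixir
# pop returns the value of the first matching key and removes every occurrence
# of that key; when the key is absent, the default is returned and the list is unchanged.
Keyword.pop([a: 1, b: 2, a: 3], :a)
#=> {1, [b: 2]}

Keyword.pop([b: 2], :a, :default)
#=> {:default, [b: 2]}
```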