Improve Keyword performance #15378

Merged
josevalim merged 8 commits into elixir-lang:main from PJUllrich:improve-keyword-performance
May 14, 2026

@PJUllrich
Contributor

Disclosure: I found, benchmarked, and refactored the performance improvements here with Claude Opus 4.7 but I rewrote most of Claude's code by hand and I understand the proposed changes, benchmarks, and results. I wrote the description below myself as well.

I refactored some functions in Keyword and achieved the following speed-ups over the baseline. n is the number of items in the input list(s). Time is the average iteration time of the optimized code. The speed-up multiplier is the best-case result (e.g. 50% of duplicate keys in the input of Keyword.merge/2).

Some values at n=5 and n=10 are slower than the baseline. Given their nanosecond execution times, most of this variation is noise, but I think some regressions are real, for example merge/3 at n=10, so caution is advised.

| Function | n=5 | n=10 | n=100 | n=1000 |
|---|---:|---:|---:|---:|
| new/2 | - | 1.10x · 395 ns | 1.11x · 6.05 µs | 3.88x · 76.03 µs |
| merge/2 | - | 0.90x · 208 ns | 1.62x · 4.42 µs | 7.13x · 88.80 µs |
| merge/3 | - | 0.68x · 394 ns | 3.32x · 8.49 µs | 16.01x · 200 µs |
| pop/3 | - | 2.22x · 30 ns | 3.03x · 291 ns | 2.20x · 3.97 µs |
| take/2 | 0.80x · 42 ns | 0.98x · 87 ns | 1.13x · 2.98 µs | 8.17x · 44.55 µs |
| drop/2 | 1.32x · 51 ns | 0.88x · 109 ns | 1.05x · 3.34 µs | 7.22x · 50.53 µs |
| split/2 | 0.87x · 83 ns | 0.88x · 138 ns | 1.10x · 3.17 µs | 6.43x · 58.22 µs |
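
As a reading aid, each multiplier is the baseline's average time divided by the optimised average for the same input. A minimal check, using the new/2 averages for n=1000 dup=50% from the Benchee results posted further down (the variable names are illustrative only):

```elixir
# Averages in µs for new/2 at n=1000, dup=50%, copied from the Benchee output.
baseline_avg = 295.09
optimised_avg = 76.03

# Baseline time divided by optimised time gives the speed-up multiplier.
speedup = Float.round(baseline_avg / optimised_avg, 2)
IO.puts("#{speedup}x")
```

This matches the 3.88x shown for new/2 at n=1000 in the table.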

An overview of the changes I've applied to the functions:

new/2

The original code would call put_new/3 on every key-value pair which runs :lists.keyfind(key, 1, acc) under the hood to check whether a key is already present. As the accumulator grows, the :lists.keyfind/3 would walk the growing accumulator for every item.

The new code builds a seen map of keys and pattern-matches against new keys to check for duplicates. I also considered using Map.get(seen, k) here but my assumption was that the pattern-match would be slightly faster.
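
A minimal sketch of the seen-map idea, assuming the same "last value wins" behaviour as Keyword.new/2. The module name SeenSketch and the sample data are hypothetical and not taken from the PR:

```elixir
defmodule SeenSketch do
  # Hypothetical helper, not the PR's code: walks the reversed input once and
  # uses a map as a set of keys that have already been emitted.
  def new(pairs, transform) when is_function(transform, 1) do
    {result, _seen} =
      pairs
      |> Enum.reverse()
      |> Enum.reduce({[], %{}}, fn el, {acc, seen} ->
        {k, v} = transform.(el)

        case seen do
          # Key already emitted by a later pair, so this earlier one is skipped.
          %{^k => _} -> {acc, seen}
          _ -> {[{k, v} | acc], Map.put(seen, k, true)}
        end
      end)

    result
  end
end

pairs = [{"a", 1}, {"b", 2}, {"a", 3}]
to_atom_key = fn {k, v} -> {String.to_atom(k), v} end

IO.inspect(SeenSketch.new(pairs, to_atom_key))
IO.inspect(SeenSketch.new(pairs, to_atom_key) == Keyword.new(pairs, to_atom_key))
```

The map lookup replaces the :lists.keyfind/3 scan, so each pair costs roughly constant time instead of time proportional to the accumulator length.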

merge/2

The original code would first call keyword?(keywords2) which walks the entire keywords2 list to confirm it's a keyword list. It then called has_key?(keywords2, key) on every key-value pair of keywords1, which calls :lists.keymember(key, 1, keywords2) under the hood, again walking keywords2 for every entry.

The new code collects all keys in keywords2 and validates the list in the same pass. It then uses Map.has_key?(keys2, key) to check for duplicates.
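
A rough sketch of that shape, under the assumption that a plain map is used as the key set. MergeSketch and collect_keys/2 are hypothetical names, and the sketch skips the validation of keywords1 that the real function performs:

```elixir
defmodule MergeSketch do
  # Hypothetical sketch, not the PR's code: one pass over keywords2 both
  # validates it and collects its keys into a map used as a set.
  def merge(keywords1, keywords2) when is_list(keywords1) and is_list(keywords2) do
    keys2 = collect_keys(keywords2, %{})
    Enum.reject(keywords1, fn {k, _v} -> is_map_key(keys2, k) end) ++ keywords2
  end

  defp collect_keys([{k, _v} | rest], acc) when is_atom(k),
    do: collect_keys(rest, Map.put(acc, k, true))

  defp collect_keys([], acc), do: acc

  defp collect_keys(_other, _acc),
    do: raise(ArgumentError, "expected a keyword list as the second argument")
end

left = [a: 1, b: 2, a: 3]
right = [a: 10, c: 30]

IO.inspect(MergeSketch.merge(left, right))
IO.inspect(MergeSketch.merge(left, right) == Keyword.merge(left, right))
```

Every :a entry on the left is dropped because :a appears on the right, and the right-hand list is appended unchanged.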

merge/3

The original code would walk keywords2 only once, but then walk keywords1 three times for every entry. First, it would walk it as original with :lists.keyfind(key, 1, original) to check for a duplicate. If a duplicate exists, it would walk it again with :lists.keydelete(key, 1, original) and once more with delete(rest, key). The :lists.keydelete/3 call would remove only the first occurrence of the duplicate key from original, while delete/2 would remove all occurrences from rest. If keywords1 had two occurrences of the same key, the original code would call :lists.keydelete(key, 1, original) again, but delete(rest, key) would be a no-op.

The new code does three linear passes. First, it walks keywords2 to collect all keys into a map with an empty list for each key. It also validates the keys. Second, partition_left/4 walks keywords1 once. It groups the values of keys also occurring in keywords2, builds a list of the entries whose keys don't occur in keywords2, and tracks the duplicate keys in a MapSet. If keywords1 has any duplicate keys, it reverses the values of the duplicate keys to make sure their values are back in order.

Third, emit_right walks keywords2 once more and joins the values of keywords2 of keys that also appeared in keywords1 left-to-right. It then adds the key-value pairs that didn't have a duplicate key in keywords1. Lastly, we concatenate the key-value pairs from keywords1 whose key did not occur in keywords2 with the values from keywords2 that did or did not have duplicate keys in keywords1.
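
The pairing rules that both versions have to preserve are easiest to see on small inputs. The calls below are an illustration using the public Keyword.merge/3; the inputs and the sum function are made up for this example:

```elixir
sum = fn _key, v1, v2 -> v1 + v2 end

# Only the first :a on the right finds a partner on the left; the second is kept as-is.
result1 = Keyword.merge([a: 1, b: 2], [a: 3, d: 4, a: 5], sum)
IO.inspect(result1)
# [b: 2, a: 4, d: 4, a: 5]

# Two :a entries on each side are paired in order: 1 with 10, then 2 with 20.
result2 = Keyword.merge([a: 1, a: 2], [a: 10, a: 20], sum)
IO.inspect(result2)
# [a: 11, a: 22]

# A surplus duplicate on the left is dropped once the right side has no more :a entries.
result3 = Keyword.merge([a: 1, a: 2, b: 0], [a: 10], sum)
IO.inspect(result3)
# [b: 0, a: 11]
```

The second case is the one that requires reversing the grouped values: they are collected by prepending, so they must be flipped before being consumed left-to-right.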

split/2 - take/2 - drop/2

All three of these functions would run k in keys for every entry of the keyword list. This would be fine for a small list of keys but if that list grows to e.g. 50% of the keyword list, it would slow down these functions significantly.

The new code builds a MapSet from the keys and uses MapSet.member?/2 for every entry. I'm not sure whether this is "safe" though. Can we assume that all keys are always valid MapSet entries? The key in keys check previously only checked for equality and didn't assume any validity of the keys.
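
A minimal sketch of the MapSet variant, with a keys list that deliberately contains non-atom terms. TakeSketch is a hypothetical module, and the example rests on the assumption that MapSet membership and the list membership check both use strict equality, which is worth verifying independently:

```elixir
defmodule TakeSketch do
  # Hypothetical sketch, not the PR's code: membership via a MapSet built once.
  def take(keywords, keys) when is_list(keywords) and is_list(keys) do
    set = MapSet.new(keys)
    Enum.filter(keywords, fn {k, _v} -> MapSet.member?(set, k) end)
  end
end

kws = [a: 1, b: 2, c: 3, a: 4]

# The keys list may hold arbitrary terms; ones that match no key are ignored.
keys = [:a, "b", 1.0, {:c, 1}]

IO.inspect(TakeSketch.take(kws, keys))
IO.inspect(TakeSketch.take(kws, keys) == Keyword.take(kws, keys))
```

On this input both approaches return the two :a entries and nothing else.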

pop/3

The original code would walk the keywords list three times: once to check whether the list contains the key with fetch(keywords, key), then a second time inside delete(keywords, key) which checks again that the key exists with :lists.keymember(key, 1, keywords), and then delete_key/2 would walk the keywords list a third time to remove the key from the list.

The new code walks the keywords list only once. If it finds the key, it runs delete_key(tail, key) to remove it from the tail and reverses the now clean new list. If it doesn't find the key, it reverses the keywords list once. This logic is very similar to replace/3.
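
The reason the tail still needs delete_key/2 after the first hit is that pop removes every occurrence of the key while returning only the first value. A short illustration with made-up inputs:

```elixir
# The first :a supplies the value; the later :a entry is removed as well.
hit = Keyword.pop([a: 1, b: 2, a: 3], :a)
IO.inspect(hit)
# {1, [b: 2]}

# A missing key returns the default and leaves the list unchanged.
miss = Keyword.pop([a: 1, b: 2], :c, :none)
IO.inspect(miss)
# {:none, [a: 1, b: 2]}
```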

I will add the benchmark and results in separate comments below.

@PJUllrich
Contributor Author

Benchmark
# > Disclosure: I found, benchmarked, and refactored the performance improvements here with Claude Opus 4.7 but I understand the proposed changes, benchmarks, and results. I wrote the description below myself.

# I refactored some of the functions in `Keyword` and achieved roughly the following speed-up over the baseline. `n` is the number of items in the input list(s). Time is the mean iteration time of the optimized code. The values at `n=10` seem slower than the baseline, but given their nanosecond execution times, variations here are largely due to noise (deviations for `n=10` were typically 50-300ns)

# | Function | n=10 | n=100 | n=1000 |
# |---|---:|---:|---:|
# | `new/2`    | 1.10x · 441 ns | 1.10x · 6.32 µs | **3.70x** · 83 µs |
# | `merge/2`  | 0.83x · 253 ns | 1.49x · 4.96 µs | **6.88x** · 95 µs |
# | `merge/3`  | 0.64x · 483 ns | 2.91x · 9.83 µs | **14.17x** · 230 µs |
# | `pop/3`    | 1.40x · 41 ns  | **1.65x** · 548 ns | **1.56x** · 5.49 µs |
# | `take/2`   | 1.04x · 101 ns | 1.04x · 3.35 µs | **7.89x** · 47.1 µs |
# | `drop/2`   | 0.88x · 135 ns | 1.07x · 3.48 µs | **7.05x** · 53.1 µs |
# | `split/2`  | 1.22x · 137 ns | 1.05x · 3.50 µs | **6.85x** · 55.4 µs |

# An overview of the changes I've applied to the functions:

# ### `new/2`
# The original code would call `put_new/3` on every key-value pair which runs `:lists.keyfind(key, 1, acc)` under the hood to check whether a key is already present. As the accumulator grows, the `:lists.keyfind/3` would walk the entire accumulator for every item.

# The new code builds a `seen` map of keys and pattern-matches against new keys to check for duplicates. I also considered using `Map.get(seen, k)` here but my assumption was that the pattern-match would be slightly faster.

# ### `merge/2`
# The original code would first call `keyword?(keywords2)` which walks the entire `keywords2` list to confirm it's a keyword list. It then called `has_key?(keywords2, key)` on every key-value pair of `keywords1`, which calls `:lists.keymember(key, 1, keywords2)` under the hood, again walking `keywords2` for every entry.

# The new code first collects all keys in `keywords2` and validates that the keyword list is valid. It then uses `Map.has_key?(keys2, key)` on the output map of keys to check for duplicates.

# ### `merge/3`
# The original code would walk `keywords2` only once, but would walk `keywords1` three times for every entry. First, it would walk it as `original` with `:lists.keyfind(key, 1, original)` to check for a duplicate. If a duplicate exists, it would walk it again with `:lists.keydelete(key, 1, original)` and once more with `delete(rest, key)`. The `:lists.keydelete/3` call would remove only the first occurrence of the duplicate key from `original`, while `delete/2` would remove **all** occurrences from `rest`. If `keywords1` had two occurrences of the same key, the original code would call `:lists.keydelete(key, 1, original)` again, but `delete(rest, key)` would be a no-op.

# The new code does three linear passes. First, it walks `keywords2` to collect all keys into a map with an empty list for each key. It also validates the keys. Second, `partition_left/4` walks `keywords1` once. It groups the values of keys occurring in `keywords2`, builds a list of the entries whose keys don't occur in `keywords2`, and tracks the duplicate keys in `keywords1`. If `keywords1` has any duplicate keys, it reverses the values of the duplicate keys to make sure their values are back in order.

# Third, `emit_right` walks `keywords2` once more and joins the values of `keywords2` of keys that also appeared in `keywords1` left-to-right. It also adds the key-value pairs that didn't have a duplicate key in `keywords1`. Lastly, we concatenate the key-value pairs from `keywords1` whose key did not occur in `keywords2` with the values from `keywords2` that did or did not have duplicate keys in `keywords1`.
# ----

### `split/2`

# Comparison bench for Keyword findings.
#
# Usage:
#   elixir .private/bench_keyword_compare.exs            # all findings
#   elixir .private/bench_keyword_compare.exs merge2     # just one
#
# Findings: new2, merge2, merge3, pop, take, drop, split
#
# `Baseline` holds the current upstream implementations verbatim. `Optimised`
# holds the proposed alternatives. Function names and arities match upstream
# exactly, so each scenario calls e.g. Baseline.merge/2 vs Optimised.merge/2.

Mix.install([{:benchee, "~> 1.3"}])

defmodule Baseline do
  # ---- F064-001: merge/2 -------------------------------------------------
  def merge(keywords1, []) when is_list(keywords1), do: keywords1
  def merge([], keywords2) when is_list(keywords2), do: keywords2

  def merge(keywords1, keywords2) when is_list(keywords1) and is_list(keywords2) do
    if keyword?(keywords2) do
      fun = fn
        {key, _value} when is_atom(key) ->
          not has_key?(keywords2, key)

        _ ->
          raise ArgumentError,
                "expected a keyword list as the first argument, got: #{inspect(keywords1)}"
      end

      :lists.filter(fun, keywords1) ++ keywords2
    else
      raise ArgumentError,
            "expected a keyword list as the second argument, got: #{inspect(keywords2)}"
    end
  end

  # ---- F064-004: merge/3 -------------------------------------------------
  def merge(keywords1, keywords2, fun)
      when is_list(keywords1) and is_list(keywords2) and is_function(fun, 3) do
    if keyword?(keywords1) do
      do_merge(keywords2, [], keywords1, keywords1, fun, keywords2)
    else
      raise ArgumentError,
            "expected a keyword list as the first argument, got: #{inspect(keywords1)}"
    end
  end

  defp do_merge([{key, value2} | tail], acc, rest, original, fun, keywords2) when is_atom(key) do
    case :lists.keyfind(key, 1, original) do
      {^key, value1} ->
        acc = [{key, fun.(key, value1, value2)} | acc]
        original = :lists.keydelete(key, 1, original)
        do_merge(tail, acc, delete(rest, key), original, fun, keywords2)

      false ->
        do_merge(tail, [{key, value2} | acc], rest, original, fun, keywords2)
    end
  end

  defp do_merge([], acc, rest, _original, _fun, _keywords2) do
    rest ++ :lists.reverse(acc)
  end

  defp do_merge(_other, _acc, _rest, _original, _fun, keywords2) do
    raise ArgumentError,
          "expected a keyword list as the second argument, got: #{inspect(keywords2)}"
  end

  # ---- F064-002: new/2 ---------------------------------------------------
  def new(pairs, transform) when is_function(transform, 1) do
    fun = fn el, acc ->
      {k, v} = transform.(el)
      put_new(acc, k, v)
    end

    :lists.foldl(fun, [], Enum.reverse(pairs))
  end

  # ---- F064-003: pop -----------------------------------------------------
  def pop(keywords, key, default \\ nil) when is_list(keywords) and is_atom(key) do
    case fetch(keywords, key) do
      {:ok, value} -> {value, delete(keywords, key)}
      :error -> {default, keywords}
    end
  end

  # ---- F064-005: replace -------------------------------------------------
  def replace(keywords, key, value) when is_list(keywords) and is_atom(key) do
    do_replace(keywords, key, value)
  end

  defp do_replace([{key, _} | keywords], key, value) do
    [{key, value} | delete(keywords, key)]
  end

  defp do_replace([{_, _} = e | keywords], key, value) do
    [e | do_replace(keywords, key, value)]
  end

  defp do_replace([], _key, _value) do
    []
  end

  # ---- F064-006: take / drop / split ------------------------------------
  def take(keywords, keys) when is_list(keywords) and is_list(keys) do
    :lists.filter(fn {k, _} -> :lists.member(k, keys) end, keywords)
  end

  def drop(keywords, keys) when is_list(keywords) and is_list(keys) do
    :lists.filter(fn {k, _} -> k not in keys end, keywords)
  end

  def split(keywords, keys) when is_list(keywords) and is_list(keys) do
    fun = fn {k, v}, {take, drop} ->
      case k in keys do
        true -> {[{k, v} | take], drop}
        false -> {take, [{k, v} | drop]}
      end
    end

    acc = {[], []}
    {take, drop} = :lists.foldl(fun, acc, keywords)
    {:lists.reverse(take), :lists.reverse(drop)}
  end

  # ---- Internals reused by Baseline functions ---------------------------
  def keyword?([{key, _value} | rest]) when is_atom(key), do: keyword?(rest)
  def keyword?([]), do: true
  def keyword?(_other), do: false

  def has_key?(keywords, key) when is_list(keywords) and is_atom(key) do
    :lists.keymember(key, 1, keywords)
  end

  def fetch(keywords, key) when is_list(keywords) and is_atom(key) do
    case :lists.keyfind(key, 1, keywords) do
      {^key, value} -> {:ok, value}
      false -> :error
    end
  end

  def delete(keywords, key) when is_list(keywords) and is_atom(key) do
    case :lists.keymember(key, 1, keywords) do
      true -> delete_key(keywords, key)
      _ -> keywords
    end
  end

  defp delete_key([{key, _} | tail], key), do: delete_key(tail, key)
  defp delete_key([{_, _} = pair | tail], key), do: [pair | delete_key(tail, key)]
  defp delete_key([], _key), do: []

  def put_new(keywords, key, value) when is_list(keywords) and is_atom(key) do
    case :lists.keyfind(key, 1, keywords) do
      {^key, _} -> keywords
      false -> [{key, value} | keywords]
    end
  end
end

defmodule Optimised do
  # F064-001 — merge/2: collapse O(n·m) to O(n+m) using a key-set of keywords2.
  def merge(keywords1, []) when is_list(keywords1), do: keywords1
  def merge([], keywords2) when is_list(keywords2), do: keywords2

  def merge(keywords1, keywords2) when is_list(keywords1) and is_list(keywords2) do
    keys2 = keys_of!(keywords2)

    fun = fn
      {k, _} when is_atom(k) -> not Map.has_key?(keys2, k)
      _ -> raise_first_arg!(keywords1)
    end

    :lists.filter(fun, keywords1) ++ keywords2
  end

  # F064-004 — merge/3: matches the current keyword.ex implementation
  # (MapSet-tracked duplicate keys, targeted reverse, three linear passes).
  def merge(keywords1, keywords2, fun)
      when is_list(keywords1) and is_list(keywords2) and is_function(fun, 3) do
    if not keyword?(keywords1), do: raise_first_arg!(keywords1)

    keys2 = keys_of!(keywords2)

    {non_matching_rev, keys2, duplicate_keys} =
      partition_left_merge3(keywords1, [], keys2, MapSet.new())

    keys2 =
      Enum.reduce(duplicate_keys, keys2, fn key, acc ->
        Map.update!(acc, key, &:lists.reverse/1)
      end)

    emitted_rev = emit_right_merge3(keywords2, [], keys2, fun)
    :lists.reverse(non_matching_rev) ++ :lists.reverse(emitted_rev)
  end

  defp partition_left_merge3([{key, value} | rest], non_matching, keys2, duplicate_keys) do
    case keys2 do
      %{^key => []} ->
        partition_left_merge3(rest, non_matching, Map.put(keys2, key, [value]), duplicate_keys)

      %{^key => current} ->
        partition_left_merge3(
          rest,
          non_matching,
          Map.put(keys2, key, [value | current]),
          MapSet.put(duplicate_keys, key)
        )

      _ ->
        partition_left_merge3(rest, [{key, value} | non_matching], keys2, duplicate_keys)
    end
  end

  defp partition_left_merge3([], non_matching, keys2, duplicate_keys),
    do: {non_matching, keys2, duplicate_keys}

  defp emit_right_merge3([{key, value2} | rest], emitted, keys2, fun) do
    case keys2 do
      %{^key => [value1 | remaining]} ->
        emit_right_merge3(
          rest,
          [{key, fun.(key, value1, value2)} | emitted],
          Map.put(keys2, key, remaining),
          fun
        )

      _ ->
        emit_right_merge3(rest, [{key, value2} | emitted], keys2, fun)
    end
  end

  defp emit_right_merge3([], emitted, _keys2, _fun), do: emitted

  # F064-002 — new/2: O(n) backward traversal with a seen-keys map. Upstream's
  # put_new walks the growing acc via :lists.keyfind, giving O(n·unique).
  def new(pairs, transform) when is_function(transform, 1) do
    {result, _seen} =
      :lists.foldl(
        fn el, {acc, seen} ->
          {k, v} = transform.(el)

          case seen do
            %{^k => _} -> {acc, seen}
            _ -> {[{k, v} | acc], Map.put(seen, k, [])}
          end
        end,
        {[], %{}},
        Enum.reverse(pairs)
      )

    result
  end

  # F064-003 — pop/3: single-pass extract. Baseline fetches then deletes, doing
  # 3 traversals on hit; here we walk once to find the value, then once over
  # the tail to strip any remaining duplicates.
  def pop(keywords, key, default \\ nil) when is_list(keywords) and is_atom(key) do
    do_pop(keywords, key, default, [])
  end

  defp do_pop([{key, value} | tail], key, _default, acc),
    do: {value, :lists.reverse(acc, delete_key(tail, key))}

  defp do_pop([{_, _} = pair | tail], key, default, acc),
    do: do_pop(tail, key, default, [pair | acc])

  defp do_pop([], _key, default, acc), do: {default, :lists.reverse(acc)}

  # F064-005 — replace/3: :lists.keymember is a C BIF (~7-10x faster than BEAM
  # walking), so detect miss via BIF first and return the input unchanged with
  # zero allocation. On hit we re-walk to rebuild — cheap relative to the
  # baseline's full spine rebuild on every call.
  def replace(keywords, key, value) when is_list(keywords) and is_atom(key) do
    if :lists.keymember(key, 1, keywords) do
      do_replace(keywords, key, value)
    else
      keywords
    end
  end

  defp do_replace([{key, _} | tail], key, value),
    do: [{key, value} | delete(tail, key)]

  defp do_replace([{_, _} = pair | tail], key, value),
    do: [pair | do_replace(tail, key, value)]

  defp do_replace([], _key, _value), do: []

  # F064-006 — take/drop/split: build a key-set for O(1) lookups. For ≤ 5 keys
  # the map-build cost exceeds the savings, so we fall back to `:lists.member`
  # — detected by matching the 6th cell without walking the list.
  def take(keywords, keys) when is_list(keywords) and is_list(keys) do
    :lists.filter(in_keys_pred(keys), keywords)
  end

  def drop(keywords, keys) when is_list(keywords) and is_list(keys) do
    pred = in_keys_pred(keys)
    :lists.filter(fn pair -> not pred.(pair) end, keywords)
  end

  def split(keywords, keys) when is_list(keywords) and is_list(keys) do
    pred = in_keys_pred(keys)

    {take, drop} =
      :lists.foldl(
        fn pair, {take, drop} ->
          if pred.(pair), do: {[pair | take], drop}, else: {take, [pair | drop]}
        end,
        {[], []},
        keywords
      )

    {:lists.reverse(take), :lists.reverse(drop)}
  end

  defp in_keys_pred([_, _, _, _, _, _ | _] = keys) do
    set = :lists.foldl(fn k, acc -> Map.put(acc, k, []) end, %{}, keys)
    fn {k, _} -> Map.has_key?(set, k) end
  end

  defp in_keys_pred(keys), do: fn {k, _} -> :lists.member(k, keys) end

  # --- Shared helpers ---

  # Build a {key => []} lookup map from a keyword list. Raises with the full
  # original list if it isn't a keyword list — matching upstream's error shape.
  defp keys_of!(keywords), do: do_keys_of(keywords, %{}, keywords)

  defp do_keys_of([{k, _} | rest], acc, orig) when is_atom(k),
    do: do_keys_of(rest, Map.put(acc, k, []), orig)

  defp do_keys_of([], acc, _orig), do: acc

  defp do_keys_of(_other, _acc, orig) do
    raise ArgumentError,
          "expected a keyword list as the second argument, got: #{inspect(orig)}"
  end

  defp delete(keywords, key) do
    if :lists.keymember(key, 1, keywords), do: delete_key(keywords, key), else: keywords
  end

  defp delete_key([{key, _} | tail], key), do: delete_key(tail, key)
  defp delete_key([{_, _} = pair | tail], key), do: [pair | delete_key(tail, key)]
  defp delete_key([], _key), do: []

  defp keyword?([{k, _} | rest]) when is_atom(k), do: keyword?(rest)
  defp keyword?([]), do: true
  defp keyword?(_), do: false

  defp raise_first_arg!(kws) do
    raise ArgumentError,
          "expected a keyword list as the first argument, got: #{inspect(kws)}"
  end
end

# ---------------------------------------------------------------------------
# Inputs and scenarios.
# ---------------------------------------------------------------------------

all_findings = ~w(new2 merge2 merge3 pop take drop split)

findings =
  case System.argv() do
    [] ->
      all_findings

    ["all"] ->
      all_findings

    [one] ->
      [one]

    other ->
      raise "Usage: elixir bench_keyword_compare.exs [all | #{Enum.join(all_findings, " | ")}]\nGot: #{inspect(other)}"
  end

# All scenarios share the same percentage dimension. The meaning of "pct" is
# function-specific (see the bench comments inside each case-branch).
pcts = [0, 25, 50]

# Build a merge input: keywords1 has size n with unique keys :k_0.. :k_(n-1).
# keywords2 has size n; the first `pct%` of its entries reuse :k_i keys
# (overlap), the rest use :x_i keys (no overlap).
merge_input = fn n, pct ->
  k1 = for i <- 0..(n - 1), do: {String.to_atom("k_#{i}"), i}
  overlap_count = div(n * pct, 100)

  k2 =
    for i <- 0..(n - 1) do
      key =
        if i < overlap_count do
          String.to_atom("k_#{i}")
        else
          String.to_atom("x_#{i}")
        end

      {key, i * 10}
    end

  {k1, k2}
end

# Build a new/2 input: list of `n` `{string_key, int}` pairs. The first
# `n - dup_count` entries have unique keys; the remaining `dup_count`
# entries reuse keys from the unique block (so they're duplicates).
new_input = fn n, pct ->
  dup_count = div(n * pct, 100)
  unique_n = max(1, n - dup_count)

  for i <- 0..(n - 1) do
    base = if i < unique_n, do: i, else: rem(i, unique_n)
    {"k_#{base}", i}
  end
end

# Build a pop input: list of size n with unique keys; target key `:t` is placed
# at position `pct%` of the list (e.g. pct=25 → position n/4). This varies the
# scan distance to the first (and only) hit, with no duplicates involved.
pop_input = fn n, pct ->
  target_pos = min(n - 1, div(n * pct, 100))

  kws =
    for i <- 0..(n - 1) do
      if i == target_pos do
        {:t, i}
      else
        {String.to_atom("k_#{i}"), i}
      end
    end

  {kws, :t}
end

# Build a take/drop/split input: keywords list of size n with unique keys.
# The `keys` lookup list contains `pct%` of those same keys (overlap = pct%).
# pct=0 → keys is empty (degenerate but well-defined).
take_input = fn n, pct ->
  kws = for i <- 0..(n - 1), do: {String.to_atom("k_#{i}"), i}
  hit_count = div(n * pct, 100)
  keys = for i <- 0..(hit_count - 1)//1, do: String.to_atom("k_#{i}")
  {kws, keys}
end

assert_equal = fn label, a, b ->
  if a != b do
    raise "Baseline/Optimised disagree on #{label}:\n  baseline:  #{inspect(a)}\n  optimised: #{inspect(b)}"
  end
end

for finding <- findings do
  IO.puts("\n#{String.duplicate("═", 56)}")
  IO.puts("=== BENCH_F=#{finding} ===")
  IO.puts(String.duplicate("═", 56))

  case finding do
    "merge2" ->
      sizes = [10, 100, 1000]

      inputs =
        for n <- sizes, pct <- pcts, into: %{} do
          {"n=#{n} overlap=#{pct}%", merge_input.(n, pct)}
        end

      for {label, {k1, k2}} <- inputs do
        assert_equal.("merge/2 #{label}", Baseline.merge(k1, k2), Optimised.merge(k1, k2))
      end

      Benchee.run(
        %{
          "Baseline.merge/2" => fn {k1, k2} -> Baseline.merge(k1, k2) end,
          "Optimised.merge/2" => fn {k1, k2} -> Optimised.merge(k1, k2) end
        },
        inputs: inputs,
        warmup: 1,
        time: 3,
        exclude_outliers: true,
        print: [fast_warning: false, benchmarking: false]
      )

    "merge3" ->
      sizes = [10, 100, 1000]
      f = fn _k, v1, v2 -> v1 + v2 end

      inputs =
        for n <- sizes, pct <- pcts, into: %{} do
          {"n=#{n} overlap=#{pct}%", merge_input.(n, pct)}
        end

      for {label, {k1, k2}} <- inputs do
        assert_equal.("merge/3 #{label}", Baseline.merge(k1, k2, f), Optimised.merge(k1, k2, f))
      end

      Benchee.run(
        %{
          "Baseline.merge/3" => fn {k1, k2} -> Baseline.merge(k1, k2, f) end,
          "Optimised.merge/3" => fn {k1, k2} -> Optimised.merge(k1, k2, f) end
        },
        inputs: inputs,
        warmup: 1,
        time: 3,
        exclude_outliers: true,
        print: [fast_warning: false, benchmarking: false]
      )

    "new2" ->
      sizes = [10, 100, 1000]
      f = fn {k, v} -> {String.to_atom(k), v} end

      inputs =
        for n <- sizes, pct <- pcts, into: %{} do
          {"n=#{n} dup=#{pct}%", new_input.(n, pct)}
        end

      for {label, pairs} <- inputs do
        assert_equal.("new/2 #{label}", Baseline.new(pairs, f), Optimised.new(pairs, f))
      end

      Benchee.run(
        %{
          "Baseline.new/2" => fn pairs -> Baseline.new(pairs, f) end,
          "Optimised.new/2" => fn pairs -> Optimised.new(pairs, f) end
        },
        inputs: inputs,
        warmup: 1,
        time: 3,
        exclude_outliers: true,
        print: [fast_warning: false, benchmarking: false]
      )

    "pop" ->
      sizes = [10, 100, 1000]
      # Vary target position (% from head) rather than duplicate ratio.
      pop_pcts = [25, 50, 75]

      inputs =
        for n <- sizes, pct <- pop_pcts, into: %{} do
          {"n=#{n} pos=#{pct}%", pop_input.(n, pct)}
        end

      for {label, {kws, key}} <- inputs do
        assert_equal.("pop/3 #{label}", Baseline.pop(kws, key), Optimised.pop(kws, key))
      end

      Benchee.run(
        %{
          "Baseline.pop/3" => fn {kws, key} -> Baseline.pop(kws, key) end,
          "Optimised.pop/3" => fn {kws, key} -> Optimised.pop(kws, key) end
        },
        inputs: inputs,
        warmup: 1,
        time: 3,
        exclude_outliers: true,
        print: [fast_warning: false, benchmarking: false]
      )

    "take" ->
      sizes = [5, 10, 100, 1000]

      inputs =
        for n <- sizes, pct <- pcts, into: %{} do
          {"n=#{n} overlap=#{pct}%", take_input.(n, pct)}
        end

      for {label, {kws, keys}} <- inputs do
        assert_equal.("take/2 #{label}", Baseline.take(kws, keys), Optimised.take(kws, keys))
      end

      Benchee.run(
        %{
          "Baseline.take/2" => fn {kws, keys} -> Baseline.take(kws, keys) end,
          "Optimised.take/2" => fn {kws, keys} -> Optimised.take(kws, keys) end
        },
        inputs: inputs,
        warmup: 1,
        time: 3,
        exclude_outliers: true,
        print: [fast_warning: false, benchmarking: false]
      )

    "drop" ->
      sizes = [5, 10, 100, 1000]

      inputs =
        for n <- sizes, pct <- pcts, into: %{} do
          {"n=#{n} overlap=#{pct}%", take_input.(n, pct)}
        end

      for {label, {kws, keys}} <- inputs do
        assert_equal.("drop/2 #{label}", Baseline.drop(kws, keys), Optimised.drop(kws, keys))
      end

      Benchee.run(
        %{
          "Baseline.drop/2" => fn {kws, keys} -> Baseline.drop(kws, keys) end,
          "Optimised.drop/2" => fn {kws, keys} -> Optimised.drop(kws, keys) end
        },
        inputs: inputs,
        warmup: 1,
        time: 3,
        exclude_outliers: true,
        print: [fast_warning: false, benchmarking: false]
      )

    "split" ->
      sizes = [5, 10, 100, 1000]

      inputs =
        for n <- sizes, pct <- pcts, into: %{} do
          {"n=#{n} overlap=#{pct}%", take_input.(n, pct)}
        end

      for {label, {kws, keys}} <- inputs do
        assert_equal.("split/2 #{label}", Baseline.split(kws, keys), Optimised.split(kws, keys))
      end

      Benchee.run(
        %{
          "Baseline.split/2" => fn {kws, keys} -> Baseline.split(kws, keys) end,
          "Optimised.split/2" => fn {kws, keys} -> Optimised.split(kws, keys) end
        },
        inputs: inputs,
        warmup: 1,
        time: 3,
        exclude_outliers: true,
        print: [fast_warning: false, benchmarking: false]
      )

    other ->
      raise "Unknown BENCH_F=#{other}"
  end
end

@PJUllrich
Contributor Author

Results
════════════════════════════════════════════════════════
=== BENCH_F=new2 ===
════════════════════════════════════════════════════════
Operating System: macOS
CPU Information: Apple M2 Pro
Number of Available Cores: 10
Available memory: 32 GB
Elixir 1.19.4
Erlang 28.2
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 1 s
time: 3 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: n=10 dup=0%, n=10 dup=25%, n=10 dup=50%, n=100 dup=0%, n=100 dup=25%, n=100 dup=50%, n=1000 dup=0%, n=1000 dup=25%, n=1000 dup=50%
Estimated total run time: 1 min 12 s
Excluding outliers: true


##### With input n=10 dup=0% #####
Name                      ips        average  deviation         median         99th %
Optimised.new/2        2.18 M      458.33 ns     ±0.10%         458 ns         459 ns
Baseline.new/2         2.10 M      476.70 ns     ±4.55%         459 ns         541 ns

Comparison:
Optimised.new/2        2.18 M
Baseline.new/2         2.10 M - 1.04x slower +18.37 ns

##### With input n=10 dup=25% #####
Name                      ips        average  deviation         median         99th %
Optimised.new/2        2.24 M      447.13 ns     ±4.66%         458 ns         500 ns
Baseline.new/2         2.13 M      468.62 ns     ±4.10%         458 ns         500 ns

Comparison:
Optimised.new/2        2.24 M
Baseline.new/2         2.13 M - 1.05x slower +21.49 ns

##### With input n=10 dup=50% #####
Name                      ips        average  deviation         median         99th %
Optimised.new/2        2.53 M      395.99 ns     ±5.52%         375 ns         458 ns
Baseline.new/2         2.29 M      435.82 ns     ±4.97%         417 ns         459 ns

Comparison:
Optimised.new/2        2.53 M
Baseline.new/2         2.29 M - 1.10x slower +39.83 ns

##### With input n=100 dup=0% #####
Name                      ips        average  deviation         median         99th %
Optimised.new/2      129.54 K        7.72 μs     ±1.38%        7.71 μs        8.04 μs
Baseline.new/2       108.47 K        9.22 μs     ±1.31%        9.21 μs        9.58 μs

Comparison:
Optimised.new/2      129.54 K
Baseline.new/2       108.47 K - 1.19x slower +1.50 μs

##### With input n=100 dup=25% #####
Name                      ips        average  deviation         median         99th %
Optimised.new/2      144.83 K        6.90 μs     ±1.38%        6.88 μs        7.17 μs
Baseline.new/2       116.92 K        8.55 μs     ±1.37%        8.54 μs        8.83 μs

Comparison:
Optimised.new/2      144.83 K
Baseline.new/2       116.92 K - 1.24x slower +1.65 μs

##### With input n=100 dup=50% #####
Name                      ips        average  deviation         median         99th %
Optimised.new/2      165.18 K        6.05 μs     ±1.57%        6.04 μs        6.33 μs
Baseline.new/2       149.36 K        6.70 μs     ±1.54%        6.67 μs           7 μs

Comparison:
Optimised.new/2      165.18 K
Baseline.new/2       149.36 K - 1.11x slower +0.64 μs

##### With input n=1000 dup=0% #####
Name                      ips        average  deviation         median         99th %
Optimised.new/2        7.60 K      131.63 μs    ±15.42%      129.54 μs      188.47 μs
Baseline.new/2         1.87 K      534.42 μs     ±0.86%      533.46 μs      546.92 μs

Comparison:
Optimised.new/2        7.60 K
Baseline.new/2         1.87 K - 4.06x slower +402.79 μs

##### With input n=1000 dup=25% #####
Name                      ips        average  deviation         median         99th %
Optimised.new/2        9.65 K      103.62 μs     ±8.55%      102.58 μs      127.46 μs
Baseline.new/2         2.11 K      474.71 μs     ±1.54%      472.67 μs      495.70 μs

Comparison:
Optimised.new/2        9.65 K
Baseline.new/2         2.11 K - 4.58x slower +371.10 μs

##### With input n=1000 dup=50% #####
Name                      ips        average  deviation         median         99th %
Optimised.new/2       13.15 K       76.03 μs     ±3.28%       75.17 μs       84.88 μs
Baseline.new/2         3.39 K      295.09 μs     ±0.64%      294.63 μs      301.08 μs

Comparison:
Optimised.new/2       13.15 K
Baseline.new/2         3.39 K - 3.88x slower +219.06 μs

════════════════════════════════════════════════════════
=== BENCH_F=merge2 ===
════════════════════════════════════════════════════════
Operating System: macOS
CPU Information: Apple M2 Pro
Number of Available Cores: 10
Available memory: 32 GB
Elixir 1.19.4
Erlang 28.2
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 1 s
time: 3 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: n=10 overlap=0%, n=10 overlap=25%, n=10 overlap=50%, n=100 overlap=0%, n=100 overlap=25%, n=100 overlap=50%, n=1000 overlap=0%, n=1000 overlap=25%, n=1000 overlap=50%
Estimated total run time: 1 min 12 s
Excluding outliers: true


##### With input n=10 overlap=0% #####
Name                        ips        average  deviation         median         99th %
Baseline.merge/2         4.45 M      224.80 ns    ±10.20%         209 ns         292 ns
Optimised.merge/2        4.44 M      225.13 ns     ±9.75%         209 ns         292 ns

Comparison:
Baseline.merge/2         4.45 M
Optimised.merge/2        4.44 M - 1.00x slower +0.34 ns

##### With input n=10 overlap=25% #####
Name                        ips        average  deviation         median         99th %
Baseline.merge/2         4.80 M      208.33 ns     ±0.23%         208 ns         209 ns
Optimised.merge/2        4.51 M      221.87 ns     ±9.39%         209 ns         291 ns

Comparison:
Baseline.merge/2         4.80 M
Optimised.merge/2        4.51 M - 1.06x slower +13.54 ns

##### With input n=10 overlap=50% #####
Name                        ips        average  deviation         median         99th %
Baseline.merge/2         5.27 M      189.87 ns    ±11.83%         208 ns         250 ns
Optimised.merge/2        4.80 M      208.33 ns     ±0.23%         208 ns         209 ns

Comparison:
Baseline.merge/2         5.27 M
Optimised.merge/2        4.80 M - 1.10x slower +18.47 ns

##### With input n=100 overlap=0% #####
Name                        ips        average  deviation         median         99th %
Optimised.merge/2      207.80 K        4.81 μs     ±1.75%        4.79 μs        5.04 μs
Baseline.merge/2        91.91 K       10.88 μs     ±0.68%       10.88 μs       11.08 μs

Comparison:
Optimised.merge/2      207.80 K
Baseline.merge/2        91.91 K - 2.26x slower +6.07 μs

##### With input n=100 overlap=25% #####
Name                        ips        average  deviation         median         99th %
Optimised.merge/2      217.98 K        4.59 μs     ±1.93%        4.58 μs        4.83 μs
Baseline.merge/2       116.89 K        8.56 μs     ±0.77%        8.54 μs        8.75 μs

Comparison:
Optimised.merge/2      217.98 K
Baseline.merge/2       116.89 K - 1.86x slower +3.97 μs

##### With input n=100 overlap=50% #####
Name                        ips        average  deviation         median         99th %
Optimised.merge/2      226.22 K        4.42 μs     ±1.88%        4.42 μs        4.67 μs
Baseline.merge/2       139.54 K        7.17 μs     ±1.06%        7.13 μs        7.42 μs

Comparison:
Optimised.merge/2      226.22 K
Baseline.merge/2       139.54 K - 1.62x slower +2.75 μs

##### With input n=1000 overlap=0% #####
Name                        ips        average  deviation         median         99th %
Optimised.merge/2        9.51 K       0.105 ms     ±0.74%       0.105 ms       0.107 ms
Baseline.merge/2         0.99 K        1.01 ms     ±1.61%        1.01 ms        1.06 ms

Comparison:
Optimised.merge/2        9.51 K
Baseline.merge/2         0.99 K - 9.60x slower +0.90 ms

##### With input n=1000 overlap=25% #####
Name                        ips        average  deviation         median         99th %
Optimised.merge/2       11.65 K       85.81 μs     ±3.61%       84.63 μs       98.42 μs
Baseline.merge/2         1.24 K      804.66 μs     ±2.58%      798.56 μs      862.54 μs

Comparison:
Optimised.merge/2       11.65 K
Baseline.merge/2         1.24 K - 9.38x slower +718.85 μs

##### With input n=1000 overlap=50% #####
Name                        ips        average  deviation         median         99th %
Optimised.merge/2       11.26 K       88.80 μs     ±7.92%       86.17 μs      106.71 μs
Baseline.merge/2         1.58 K      632.99 μs     ±1.04%      630.58 μs      652.05 μs

Comparison:
Optimised.merge/2       11.26 K
Baseline.merge/2         1.58 K - 7.13x slower +544.18 μs

════════════════════════════════════════════════════════
=== BENCH_F=merge3 ===
════════════════════════════════════════════════════════
Operating System: macOS
CPU Information: Apple M2 Pro
Number of Available Cores: 10
Available memory: 32 GB
Elixir 1.19.4
Erlang 28.2
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 1 s
time: 3 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: n=10 overlap=0%, n=10 overlap=25%, n=10 overlap=50%, n=100 overlap=0%, n=100 overlap=25%, n=100 overlap=50%, n=1000 overlap=0%, n=1000 overlap=25%, n=1000 overlap=50%
Estimated total run time: 1 min 12 s
Excluding outliers: true


##### With input n=10 overlap=0% #####
Name                        ips        average  deviation         median         99th %
Baseline.merge/3         4.80 M      208.33 ns     ±0.23%         208 ns         209 ns
Optimised.merge/3        3.12 M      320.44 ns     ±6.76%         333 ns         375 ns

Comparison:
Baseline.merge/3         4.80 M
Optimised.merge/3        3.12 M - 1.54x slower +112.11 ns

##### With input n=10 overlap=25% #####
Name                        ips        average  deviation         median         99th %
Baseline.merge/3         4.25 M      235.10 ns    ±10.08%         250 ns         292 ns
Optimised.merge/3        2.91 M      343.34 ns     ±6.13%         333 ns         417 ns

Comparison:
Baseline.merge/3         4.25 M
Optimised.merge/3        2.91 M - 1.46x slower +108.24 ns

##### With input n=10 overlap=50% #####
Name                        ips        average  deviation         median         99th %
Baseline.merge/3         3.71 M      269.39 ns     ±8.88%         250 ns         334 ns
Optimised.merge/3        2.53 M      394.60 ns     ±5.65%         375 ns         458 ns

Comparison:
Baseline.merge/3         3.71 M
Optimised.merge/3        2.53 M - 1.46x slower +125.21 ns

##### With input n=100 overlap=0% #####
Name                        ips        average  deviation         median         99th %
Optimised.merge/3      200.05 K        5.00 μs     ±1.76%           5 μs        5.25 μs
Baseline.merge/3        96.49 K       10.36 μs     ±0.61%       10.33 μs       10.54 μs

Comparison:
Optimised.merge/3      200.05 K
Baseline.merge/3        96.49 K - 2.07x slower +5.36 μs

##### With input n=100 overlap=25% #####
Name                        ips        average  deviation         median         99th %
Optimised.merge/3      149.22 K        6.70 μs     ±2.73%        6.67 μs        7.21 μs
Baseline.merge/3        47.64 K       20.99 μs     ±1.48%       20.92 μs       21.83 μs

Comparison:
Optimised.merge/3      149.22 K
Baseline.merge/3        47.64 K - 3.13x slower +14.29 μs

##### With input n=100 overlap=50% #####
Name                        ips        average  deviation         median         99th %
Optimised.merge/3      117.77 K        8.49 μs     ±4.17%        8.38 μs        9.71 μs
Baseline.merge/3        35.46 K       28.20 μs     ±2.03%       28.13 μs       29.54 μs

Comparison:
Optimised.merge/3      117.77 K
Baseline.merge/3        35.46 K - 3.32x slower +19.71 μs

##### With input n=1000 overlap=0% #####
Name                        ips        average  deviation         median         99th %
Optimised.merge/3        9.75 K       0.103 ms    ±20.52%      0.0900 ms       0.146 ms
Baseline.merge/3         1.00 K        1.00 ms     ±1.19%        1.00 ms        1.04 ms

Comparison:
Optimised.merge/3        9.75 K
Baseline.merge/3         1.00 K - 9.79x slower +0.90 ms

##### With input n=1000 overlap=25% #####
Name                        ips        average  deviation         median         99th %
Optimised.merge/3        6.09 K       0.164 ms    ±11.48%       0.168 ms        0.21 ms
Baseline.merge/3         0.43 K        2.35 ms     ±1.45%        2.35 ms        2.43 ms

Comparison:
Optimised.merge/3        6.09 K
Baseline.merge/3         0.43 K - 14.30x slower +2.18 ms

##### With input n=1000 overlap=50% #####
Name                        ips        average  deviation         median         99th %
Optimised.merge/3        4.94 K        0.20 ms    ±14.19%       0.192 ms        0.28 ms
Baseline.merge/3         0.31 K        3.24 ms     ±1.26%        3.24 ms        3.34 ms

Comparison:
Optimised.merge/3        4.94 K
Baseline.merge/3         0.31 K - 16.01x slower +3.04 ms

════════════════════════════════════════════════════════
=== BENCH_F=pop ===
════════════════════════════════════════════════════════
Operating System: macOS
CPU Information: Apple M2 Pro
Number of Available Cores: 10
Available memory: 32 GB
Elixir 1.19.4
Erlang 28.2
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 1 s
time: 3 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: n=10 pos=25%, n=10 pos=50%, n=10 pos=75%, n=100 pos=25%, n=100 pos=50%, n=100 pos=75%, n=1000 pos=25%, n=1000 pos=50%, n=1000 pos=75%
Estimated total run time: 1 min 12 s
Excluding outliers: true


##### With input n=10 pos=25% #####
Name                      ips        average  deviation         median         99th %
Optimised.pop/3       24.00 M       41.67 ns     ±1.13%          42 ns          42 ns
Baseline.pop/3        23.68 M       42.23 ns     ±5.18%       41.70 ns          50 ns

Comparison:
Optimised.pop/3       24.00 M
Baseline.pop/3        23.68 M - 1.01x slower +0.57 ns

##### With input n=10 pos=50% #####
Name                      ips        average  deviation         median         99th %
Optimised.pop/3       32.86 M       30.44 ns     ±7.04%       29.20 ns       37.50 ns
Baseline.pop/3        15.72 M       63.61 ns    ±33.01%          83 ns          84 ns

Comparison:
Optimised.pop/3       32.86 M
Baseline.pop/3        15.72 M - 2.09x slower +33.17 ns

##### With input n=10 pos=75% #####
Name                      ips        average  deviation         median         99th %
Optimised.pop/3       32.98 M       30.32 ns     ±7.37%       29.20 ns       37.50 ns
Baseline.pop/3        14.89 M       67.17 ns    ±30.67%          83 ns          84 ns

Comparison:
Optimised.pop/3       32.98 M
Baseline.pop/3        14.89 M - 2.22x slower +36.86 ns

##### With input n=100 pos=25% #####
Name                      ips        average  deviation         median         99th %
Optimised.pop/3           2 M         500 ns     ±0.00%         500 ns         500 ns
Baseline.pop/3         1.26 M      791.67 ns     ±0.06%         792 ns         792 ns

Comparison:
Optimised.pop/3           2 M
Baseline.pop/3         1.26 M - 1.58x slower +291.67 ns

##### With input n=100 pos=50% #####
Name                      ips        average  deviation         median         99th %
Optimised.pop/3        1.93 M      517.25 ns     ±5.00%         500 ns         584 ns
Baseline.pop/3         1.18 M      844.10 ns     ±2.43%         833 ns         916 ns

Comparison:
Optimised.pop/3        1.93 M
Baseline.pop/3         1.18 M - 1.63x slower +326.85 ns

##### With input n=100 pos=75% #####
Name                      ips        average  deviation         median         99th %
Optimised.pop/3        3.43 M      291.67 ns     ±0.16%         292 ns         292 ns
Baseline.pop/3         1.13 M      883.30 ns     ±2.43%         875 ns         958 ns

Comparison:
Optimised.pop/3        3.43 M
Baseline.pop/3         1.13 M - 3.03x slower +591.64 ns

##### With input n=1000 pos=25% #####
Name                      ips        average  deviation         median         99th %
Optimised.pop/3      162.86 K        6.14 μs     ±2.59%        6.13 μs        6.58 μs
Baseline.pop/3       129.09 K        7.75 μs     ±1.32%        7.75 μs           8 μs

Comparison:
Optimised.pop/3      162.86 K
Baseline.pop/3       129.09 K - 1.26x slower +1.61 μs

##### With input n=1000 pos=50% #####
Name                      ips        average  deviation         median         99th %
Optimised.pop/3      198.74 K        5.03 μs     ±2.68%        5.04 μs        5.38 μs
Baseline.pop/3       121.51 K        8.23 μs     ±1.55%        8.21 μs        8.54 μs

Comparison:
Optimised.pop/3      198.74 K
Baseline.pop/3       121.51 K - 1.64x slower +3.20 μs

##### With input n=1000 pos=75% #####
Name                      ips        average  deviation         median         99th %
Optimised.pop/3      251.71 K        3.97 μs     ±3.50%        3.96 μs        4.33 μs
Baseline.pop/3       114.30 K        8.75 μs     ±1.84%        8.75 μs        9.17 μs

Comparison:
Optimised.pop/3      251.71 K
Baseline.pop/3       114.30 K - 2.20x slower +4.78 μs

════════════════════════════════════════════════════════
=== BENCH_F=take ===
════════════════════════════════════════════════════════
Operating System: macOS
CPU Information: Apple M2 Pro
Number of Available Cores: 10
Available memory: 32 GB
Elixir 1.19.4
Erlang 28.2
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 1 s
time: 3 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: n=10 overlap=0%, n=10 overlap=25%, n=10 overlap=50%, n=100 overlap=0%, n=100 overlap=25%, n=100 overlap=50%, n=1000 overlap=0%, n=1000 overlap=25%, n=1000 overlap=50%, n=5 overlap=0%, n=5 overlap=25%, n=5 overlap=50%
Estimated total run time: 1 min 36 s
Excluding outliers: true


##### With input n=10 overlap=0% #####
Name                       ips        average  deviation         median         99th %
Baseline.take/2        15.40 M       64.95 ns     ±3.54%       66.60 ns       70.80 ns
Optimised.take/2       15.00 M       66.67 ns     ±0.07%       66.70 ns       66.70 ns

Comparison:
Baseline.take/2        15.40 M
Optimised.take/2       15.00 M - 1.03x slower +1.72 ns

##### With input n=10 overlap=25% #####
Name                       ips        average  deviation         median         99th %
Baseline.take/2        12.63 M       79.17 ns     ±0.06%       79.20 ns       79.20 ns
Optimised.take/2       12.41 M       80.57 ns     ±2.79%       79.20 ns       87.50 ns

Comparison:
Baseline.take/2        12.63 M
Optimised.take/2       12.41 M - 1.02x slower +1.41 ns

##### With input n=10 overlap=50% #####
Name                       ips        average  deviation         median         99th %
Baseline.take/2        11.66 M       85.76 ns     ±2.66%       87.50 ns       91.70 ns
Optimised.take/2       11.43 M       87.50 ns     ±0.00%       87.50 ns       87.50 ns

Comparison:
Baseline.take/2        11.66 M
Optimised.take/2       11.43 M - 1.02x slower +1.74 ns

##### With input n=100 overlap=0% #####
Name                       ips        average  deviation         median         99th %
Optimised.take/2        1.85 M      541.66 ns     ±0.09%         542 ns         542 ns
Baseline.take/2         1.85 M      541.67 ns     ±0.09%         542 ns         542 ns

Comparison:
Optimised.take/2        1.85 M
Baseline.take/2         1.85 M - 1.00x slower +0.00300 ns

##### With input n=100 overlap=25% #####
Name                       ips        average  deviation         median         99th %
Optimised.take/2      657.15 K        1.52 μs     ±1.66%        1.50 μs        1.58 μs
Baseline.take/2       599.82 K        1.67 μs     ±1.98%        1.67 μs        1.75 μs

Comparison:
Optimised.take/2      657.15 K
Baseline.take/2       599.82 K - 1.10x slower +0.145 μs

##### With input n=100 overlap=50% #####
Name                       ips        average  deviation         median         99th %
Optimised.take/2      335.57 K        2.98 μs     ±2.32%        2.96 μs        3.21 μs
Baseline.take/2       296.26 K        3.38 μs     ±2.00%        3.33 μs        3.58 μs

Comparison:
Optimised.take/2      335.57 K
Baseline.take/2       296.26 K - 1.13x slower +0.40 μs

##### With input n=1000 overlap=0% #####
Name                       ips        average  deviation         median         99th %
Optimised.take/2      193.65 K        5.16 μs     ±0.90%        5.17 μs        5.29 μs
Baseline.take/2       191.84 K        5.21 μs     ±1.68%        5.17 μs        5.46 μs

Comparison:
Optimised.take/2      193.65 K
Baseline.take/2       191.84 K - 1.01x slower +0.0489 μs

##### With input n=1000 overlap=25% #####
Name                       ips        average  deviation         median         99th %
Optimised.take/2       41.81 K       23.92 μs    ±17.65%       21.58 μs       33.29 μs
Baseline.take/2         4.62 K      216.60 μs     ±1.01%      215.91 μs      223.87 μs

Comparison:
Optimised.take/2       41.81 K
Baseline.take/2         4.62 K - 9.06x slower +192.68 μs

##### With input n=1000 overlap=50% #####
Name                       ips        average  deviation         median         99th %
Optimised.take/2       22.44 K       44.55 μs    ±24.20%       49.42 μs       67.04 μs
Baseline.take/2         2.75 K      364.12 μs     ±1.16%      361.92 μs      376.62 μs

Comparison:
Optimised.take/2       22.44 K
Baseline.take/2         2.75 K - 8.17x slower +319.57 μs

##### With input n=5 overlap=0% #####
Name                       ips        average  deviation         median         99th %
Baseline.take/2        30.00 M       33.33 ns     ±0.14%       33.30 ns       33.40 ns
Optimised.take/2       23.99 M       41.68 ns     ±1.17%          42 ns          43 ns

Comparison:
Baseline.take/2        30.00 M
Optimised.take/2       23.99 M - 1.25x slower +8.35 ns

##### With input n=5 overlap=25% #####
Name                       ips        average  deviation         median         99th %
Baseline.take/2        24.00 M       41.67 ns     ±0.11%       41.70 ns       41.70 ns
Optimised.take/2       17.27 M       57.91 ns    ±35.21%          42 ns          84 ns

Comparison:
Baseline.take/2        24.00 M
Optimised.take/2       17.27 M - 1.39x slower +16.25 ns

##### With input n=5 overlap=50% #####
Name                       ips        average  deviation         median         99th %
Baseline.take/2        24.00 M       41.67 ns     ±0.11%       41.70 ns       41.70 ns
Optimised.take/2       23.23 M       43.05 ns     ±4.77%       41.70 ns       45.90 ns

Comparison:
Baseline.take/2        24.00 M
Optimised.take/2       23.23 M - 1.03x slower +1.39 ns

════════════════════════════════════════════════════════
=== BENCH_F=drop ===
════════════════════════════════════════════════════════
Operating System: macOS
CPU Information: Apple M2 Pro
Number of Available Cores: 10
Available memory: 32 GB
Elixir 1.19.4
Erlang 28.2
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 1 s
time: 3 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: n=10 overlap=0%, n=10 overlap=25%, n=10 overlap=50%, n=100 overlap=0%, n=100 overlap=25%, n=100 overlap=50%, n=1000 overlap=0%, n=1000 overlap=25%, n=1000 overlap=50%, n=5 overlap=0%, n=5 overlap=25%, n=5 overlap=50%
Estimated total run time: 1 min 36 s
Excluding outliers: true


##### With input n=10 overlap=0% #####
Name                       ips        average  deviation         median         99th %
Optimised.drop/2        9.62 M      103.94 ns    ±21.00%          84 ns         166 ns
Baseline.drop/2         9.24 M      108.26 ns    ±20.71%         125 ns         167 ns

Comparison:
Optimised.drop/2        9.62 M
Baseline.drop/2         9.24 M - 1.04x slower +4.32 ns

##### With input n=10 overlap=25% #####
Name                       ips        average  deviation         median         99th %
Baseline.drop/2         8.79 M      113.75 ns    ±17.12%         125 ns         166 ns
Optimised.drop/2        8.74 M      114.38 ns    ±17.71%         125 ns         167 ns

Comparison:
Baseline.drop/2         8.79 M
Optimised.drop/2        8.74 M - 1.01x slower +0.63 ns

##### With input n=10 overlap=50% #####
Name                       ips        average  deviation         median         99th %
Baseline.drop/2         9.15 M      109.26 ns     ±4.96%      108.30 ns      120.90 ns
Optimised.drop/2           8 M         125 ns     ±0.00%         125 ns         125 ns

Comparison:
Baseline.drop/2         9.15 M
Optimised.drop/2           8 M - 1.14x slower +15.74 ns

##### With input n=100 overlap=0% #####
Name                       ips        average  deviation         median         99th %
Optimised.drop/2      861.06 K        1.16 μs     ±4.57%        1.13 μs        1.29 μs
Baseline.drop/2       845.39 K        1.18 μs     ±6.11%        1.17 μs        1.42 μs

Comparison:
Optimised.drop/2      861.06 K
Baseline.drop/2       845.39 K - 1.02x slower +0.0215 μs

##### With input n=100 overlap=25% #####
Name                       ips        average  deviation         median         99th %
Optimised.drop/2      512.53 K        1.95 μs     ±2.94%        1.92 μs        2.08 μs
Baseline.drop/2       500.06 K        2.00 μs     ±2.96%        1.96 μs        2.17 μs

Comparison:
Optimised.drop/2      512.53 K
Baseline.drop/2       500.06 K - 1.02x slower +0.0486 μs

##### With input n=100 overlap=50% #####
Name                       ips        average  deviation         median         99th %
Optimised.drop/2      299.13 K        3.34 μs     ±4.07%        3.33 μs        3.67 μs
Baseline.drop/2       284.29 K        3.52 μs     ±1.86%        3.50 μs        3.71 μs

Comparison:
Optimised.drop/2      299.13 K
Baseline.drop/2       284.29 K - 1.05x slower +0.174 μs

##### With input n=1000 overlap=0% #####
Name                       ips        average  deviation         median         99th %
Baseline.drop/2        81.16 K       12.32 μs     ±1.90%       12.29 μs          13 μs
Optimised.drop/2       78.17 K       12.79 μs     ±3.57%       12.71 μs       14.17 μs

Comparison:
Baseline.drop/2        81.16 K
Optimised.drop/2       78.17 K - 1.04x slower +0.47 μs

##### With input n=1000 overlap=25% #####
Name                       ips        average  deviation         median         99th %
Optimised.drop/2       32.02 K       31.23 μs    ±12.23%       30.71 μs       42.88 μs
Baseline.drop/2         4.54 K      220.35 μs     ±0.70%      220.12 μs      225.29 μs

Comparison:
Optimised.drop/2       32.02 K
Baseline.drop/2         4.54 K - 7.06x slower +189.12 μs

##### With input n=1000 overlap=50% #####
Name                       ips        average  deviation         median         99th %
Optimised.drop/2       19.79 K       50.53 μs    ±21.73%       53.46 μs       74.50 μs
Baseline.drop/2         2.74 K      365.02 μs     ±0.69%      364.33 μs      373.37 μs

Comparison:
Optimised.drop/2       19.79 K
Baseline.drop/2         2.74 K - 7.22x slower +314.50 μs

##### With input n=5 overlap=0% #####
Name                       ips        average  deviation         median         99th %
Optimised.drop/2       21.16 M       47.25 ns     ±5.76%       45.90 ns       54.20 ns
Baseline.drop/2        15.04 M       66.48 ns    ±32.68%          83 ns         125 ns

Comparison:
Optimised.drop/2       21.16 M
Baseline.drop/2        15.04 M - 1.41x slower +19.22 ns

##### With input n=5 overlap=25% #####
Name                       ips        average  deviation         median         99th %
Optimised.drop/2       19.21 M       52.05 ns     ±4.16%          50 ns       54.20 ns
Baseline.drop/2        14.61 M       68.44 ns    ±29.83%          83 ns          84 ns

Comparison:
Optimised.drop/2       19.21 M
Baseline.drop/2        14.61 M - 1.31x slower +16.39 ns

##### With input n=5 overlap=50% #####
Name                       ips        average  deviation         median         99th %
Optimised.drop/2       19.43 M       51.47 ns     ±4.11%          50 ns       54.20 ns
Baseline.drop/2        14.69 M       68.06 ns    ±30.04%          83 ns          84 ns

Comparison:
Optimised.drop/2       19.43 M
Baseline.drop/2        14.69 M - 1.32x slower +16.59 ns

════════════════════════════════════════════════════════
=== BENCH_F=split ===
════════════════════════════════════════════════════════
Operating System: macOS
CPU Information: Apple M2 Pro
Number of Available Cores: 10
Available memory: 32 GB
Elixir 1.19.4
Erlang 28.2
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 1 s
time: 3 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: n=10 overlap=0%, n=10 overlap=25%, n=10 overlap=50%, n=100 overlap=0%, n=100 overlap=25%, n=100 overlap=50%, n=1000 overlap=0%, n=1000 overlap=25%, n=1000 overlap=50%, n=5 overlap=0%, n=5 overlap=25%, n=5 overlap=50%
Estimated total run time: 1 min 36 s
Excluding outliers: true


##### With input n=10 overlap=0% #####
Name                        ips        average  deviation         median         99th %
Baseline.split/2         9.68 M      103.31 ns     ±5.71%         100 ns      116.70 ns
Optimised.split/2        8.75 M      114.25 ns    ±17.58%         125 ns         167 ns

Comparison:
Baseline.split/2         9.68 M
Optimised.split/2        8.75 M - 1.11x slower +10.94 ns

##### With input n=10 overlap=25% #####
Name                        ips        average  deviation         median         99th %
Baseline.split/2         9.30 M      107.56 ns     ±5.84%      104.20 ns         125 ns
Optimised.split/2        9.25 M      108.14 ns     ±5.36%      104.20 ns      120.90 ns

Comparison:
Baseline.split/2         9.30 M
Optimised.split/2        9.25 M - 1.01x slower +0.59 ns

##### With input n=10 overlap=50% #####
Name                        ips        average  deviation         median         99th %
Baseline.split/2         8.23 M      121.51 ns     ±4.52%      120.80 ns      133.40 ns
Optimised.split/2        7.21 M      138.64 ns    ±14.63%         125 ns         167 ns

Comparison:
Baseline.split/2         8.23 M
Optimised.split/2        7.21 M - 1.14x slower +17.13 ns

##### With input n=100 overlap=0% #####
Name                        ips        average  deviation         median         99th %
Optimised.split/2        1.18 M      845.64 ns     ±3.66%         834 ns         917 ns
Baseline.split/2         1.13 M      884.13 ns     ±6.70%         875 ns        1041 ns

Comparison:
Optimised.split/2        1.18 M
Baseline.split/2         1.13 M - 1.05x slower +38.49 ns

##### With input n=100 overlap=25% #####
Name                        ips        average  deviation         median         99th %
Optimised.split/2      538.83 K        1.86 μs     ±3.80%        1.83 μs        2.08 μs
Baseline.split/2       514.06 K        1.95 μs     ±3.21%        1.92 μs        2.13 μs

Comparison:
Optimised.split/2      538.83 K
Baseline.split/2       514.06 K - 1.05x slower +0.0894 μs

##### With input n=100 overlap=50% #####
Name                        ips        average  deviation         median         99th %
Optimised.split/2      315.45 K        3.17 μs     ±2.81%        3.13 μs        3.46 μs
Baseline.split/2       287.76 K        3.48 μs     ±2.18%        3.46 μs        3.71 μs

Comparison:
Optimised.split/2      315.45 K
Baseline.split/2       287.76 K - 1.10x slower +0.31 μs

##### With input n=1000 overlap=0% #####
Name                        ips        average  deviation         median         99th %
Optimised.split/2      123.05 K        8.13 μs     ±6.09%        8.08 μs        9.63 μs
Baseline.split/2       117.63 K        8.50 μs    ±10.27%        8.25 μs       11.54 μs

Comparison:
Optimised.split/2      123.05 K
Baseline.split/2       117.63 K - 1.05x slower +0.38 μs

##### With input n=1000 overlap=25% #####
Name                        ips        average  deviation         median         99th %
Optimised.split/2       38.70 K       25.84 μs     ±9.32%       24.46 μs       34.17 μs
Baseline.split/2         4.52 K      221.40 μs     ±1.22%      221.33 μs      229.63 μs

Comparison:
Optimised.split/2       38.70 K
Baseline.split/2         4.52 K - 8.57x slower +195.56 μs

##### With input n=1000 overlap=50% #####
Name                        ips        average  deviation         median         99th %
Optimised.split/2       17.18 K       58.22 μs    ±13.34%       56.17 μs       78.46 μs
Baseline.split/2         2.67 K      374.65 μs     ±2.71%      371.50 μs      402.71 μs

Comparison:
Optimised.split/2       17.18 K
Baseline.split/2         2.67 K - 6.43x slower +316.42 μs

##### With input n=5 overlap=0% #####
Name                        ips        average  deviation         median         99th %
Baseline.split/2        13.77 M       72.60 ns    ±27.08%          83 ns         125 ns
Optimised.split/2       12.00 M       83.33 ns     ±0.57%          83 ns          84 ns

Comparison:
Baseline.split/2        13.77 M
Optimised.split/2       12.00 M - 1.15x slower +10.73 ns

##### With input n=5 overlap=25% #####
Name                        ips        average  deviation         median         99th %
Baseline.split/2        12.00 M       83.33 ns     ±0.57%          83 ns          84 ns
Optimised.split/2       12.00 M       83.33 ns     ±0.57%          83 ns          84 ns

Comparison:
Baseline.split/2        12.00 M
Optimised.split/2       12.00 M - 1.00x slower +0.00022 ns

##### With input n=5 overlap=50% #####
Name                        ips        average  deviation         median         99th %
Baseline.split/2        17.54 M       57.00 ns     ±3.85%       58.30 ns       62.50 ns
Optimised.split/2       17.19 M       58.16 ns     ±4.01%       58.30 ns       66.70 ns

Comparison:
Baseline.split/2        17.54 M
Optimised.split/2       17.19 M - 1.02x slower +1.16 ns

@josevalim
Member

@PJUllrich thank you! Instead of MapSet, let's please use Map (with true as the value). It should be more efficient for high-performance algorithms like these. Then we can ship it!

@PJUllrich
Contributor Author

I replaced the MapSet with a Map. This affected merge/3 and the split/take/drop functions. To be honest, the benchmarks look largely the same to me, maybe with a slight regression, but that could also be my computer being slightly slower than before.

Noticeable changes for n=1000, overlap=50%:

merge/3 went from 16.01x + median: 192 μs to 15.93x + median: 200 μs (slower)
split/2 went from 6.43x + median: 56.17 μs to 6.64x + median: 57.92 μs (faster)
take/2 went from 8.17x + median: 49.42 μs to 7.71x + median: 53.96 μs (slower)
drop/2 went from 7.22x + median: 53.46 μs to 6.81x + median: 55.42 μs (slower)
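
For readers following along, here is a minimal sketch of the "Map with true as a value" idea discussed above. It is not the code merged in this PR: the module name `KeywordSketch` and its `take/2` are hypothetical and only illustrate building a lookup map once and then filtering the keyword list in a single pass.

```elixir
defmodule KeywordSketch do
  # Hypothetical illustration only, not the implementation from this PR.
  # Builds %{key => true} once, then keeps every pair whose key is in that map.
  def take(keywords, keys) when is_list(keywords) and is_list(keys) do
    wanted = Map.new(keys, fn key -> {key, true} end)

    # is_map_key/2 is a constant-time membership check on the map,
    # so the whole pass is linear in the length of `keywords`.
    for {key, _value} = pair <- keywords, is_map_key(wanted, key), do: pair
  end
end

KeywordSketch.take([a: 1, b: 2, a: 3], [:a])
# → [a: 1, a: 3]
```

A plain map avoids the struct wrapper that MapSet adds around its internal map, which is presumably why it was suggested here, although the benchmarks above show the difference is small in practice.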

New Benchmark Results for `merge/3`, `split/2`, `take/2`, and `drop/2`
════════════════════════════════════════════════════════
=== BENCH_F=merge3 ===
════════════════════════════════════════════════════════
Operating System: macOS
CPU Information: Apple M2 Pro
Number of Available Cores: 10
Available memory: 32 GB
Elixir 1.19.4
Erlang 28.2
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 1 s
time: 3 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: n=10 overlap=0%, n=10 overlap=25%, n=10 overlap=50%, n=100 overlap=0%, n=100 overlap=25%, n=100 overlap=50%, n=1000 overlap=0%, n=1000 overlap=25%, n=1000 overlap=50%
Estimated total run time: 1 min 12 s
Excluding outliers: true


##### With input n=10 overlap=0% #####
Name                        ips        average  deviation         median         99th %
Baseline.merge/3         4.80 M      208.33 ns     ±0.23%         208 ns         209 ns
Optimised.merge/3        3.27 M      305.72 ns     ±7.03%         292 ns         375 ns

Comparison:
Baseline.merge/3         4.80 M
Optimised.merge/3        3.27 M - 1.47x slower +97.39 ns

##### With input n=10 overlap=25% #####
Name                        ips        average  deviation         median         99th %
Baseline.merge/3         4.26 M      234.97 ns     ±9.48%         250 ns         292 ns
Optimised.merge/3        2.89 M      346.61 ns     ±7.97%         334 ns         417 ns

Comparison:
Baseline.merge/3         4.26 M
Optimised.merge/3        2.89 M - 1.48x slower +111.64 ns

##### With input n=10 overlap=50% #####
Name                        ips        average  deviation         median         99th %
Baseline.merge/3         3.69 M      270.96 ns     ±8.62%         250 ns         334 ns
Optimised.merge/3        2.59 M      385.56 ns     ±5.88%         375 ns         459 ns

Comparison:
Baseline.merge/3         3.69 M
Optimised.merge/3        2.59 M - 1.42x slower +114.60 ns

##### With input n=100 overlap=0% #####
Name                        ips        average  deviation         median         99th %
Optimised.merge/3      186.02 K        5.38 μs     ±3.75%        5.33 μs        5.96 μs
Baseline.merge/3        96.16 K       10.40 μs     ±0.71%       10.38 μs       10.63 μs

Comparison:
Optimised.merge/3      186.02 K
Baseline.merge/3        96.16 K - 1.93x slower +5.02 μs

##### With input n=100 overlap=25% #####
Name                        ips        average  deviation         median         99th %
Optimised.merge/3      139.69 K        7.16 μs     ±2.83%        7.13 μs        7.71 μs
Baseline.merge/3        47.61 K       21.00 μs     ±1.64%       20.92 μs          22 μs

Comparison:
Optimised.merge/3      139.69 K
Baseline.merge/3        47.61 K - 2.93x slower +13.84 μs

##### With input n=100 overlap=50% #####
Name                        ips        average  deviation         median         99th %
Optimised.merge/3      110.91 K        9.02 μs     ±4.27%        8.92 μs       10.21 μs
Baseline.merge/3        35.50 K       28.17 μs     ±2.16%       28.13 μs       29.63 μs

Comparison:
Optimised.merge/3      110.91 K
Baseline.merge/3        35.50 K - 3.12x slower +19.15 μs

##### With input n=1000 overlap=0% #####
Name                        ips        average  deviation         median         99th %
Optimised.merge/3        8.60 K       0.116 ms     ±9.83%       0.115 ms       0.143 ms
Baseline.merge/3         0.98 K        1.02 ms     ±3.15%        1.01 ms        1.11 ms

Comparison:
Optimised.merge/3        8.60 K
Baseline.merge/3         0.98 K - 8.81x slower +0.91 ms

##### With input n=1000 overlap=25% #####
Name                        ips        average  deviation         median         99th %
Optimised.merge/3        6.26 K       0.160 ms    ±12.70%       0.160 ms        0.21 ms
Baseline.merge/3         0.42 K        2.38 ms     ±2.22%        2.37 ms        2.52 ms

Comparison:
Optimised.merge/3        6.26 K
Baseline.merge/3         0.42 K - 14.90x slower +2.22 ms

##### With input n=1000 overlap=50% #####
Name                        ips        average  deviation         median         99th %
Optimised.merge/3        4.81 K        0.21 ms    ±15.35%       0.200 ms        0.29 ms
Baseline.merge/3         0.30 K        3.31 ms     ±1.85%        3.31 ms        3.47 ms

Comparison:
Optimised.merge/3        4.81 K
Baseline.merge/3         0.30 K - 15.93x slower +3.10 ms

════════════════════════════════════════════════════════
=== BENCH_F=split ===
════════════════════════════════════════════════════════
Operating System: macOS
CPU Information: Apple M2 Pro
Number of Available Cores: 10
Available memory: 32 GB
Elixir 1.19.4
Erlang 28.2
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 1 s
time: 3 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: n=10 overlap=0%, n=10 overlap=25%, n=10 overlap=50%, n=100 overlap=0%, n=100 overlap=25%, n=100 overlap=50%, n=1000 overlap=0%, n=1000 overlap=25%, n=1000 overlap=50%, n=5 overlap=0%, n=5 overlap=25%, n=5 overlap=50%
Estimated total run time: 1 min 36 s
Excluding outliers: true


##### With input n=10 overlap=0% #####
Name                        ips        average  deviation         median         99th %
Baseline.split/2            8 M         125 ns     ±0.00%         125 ns         125 ns
Optimised.split/2        7.35 M      136.01 ns    ±14.04%         125 ns         167 ns

Comparison:
Baseline.split/2            8 M
Optimised.split/2        7.35 M - 1.09x slower +11.01 ns

##### With input n=10 overlap=25% #####
Name                        ips        average  deviation         median         99th %
Baseline.split/2            8 M         125 ns     ±0.00%         125 ns         125 ns
Optimised.split/2        6.54 M      152.97 ns    ±14.51%         166 ns         209 ns

Comparison:
Baseline.split/2            8 M
Optimised.split/2        6.54 M - 1.22x slower +27.97 ns

##### With input n=10 overlap=50% #####
Name                        ips        average  deviation         median         99th %
Baseline.split/2         6.85 M      146.06 ns    ±14.81%         125 ns         167 ns
Optimised.split/2        5.40 M      185.04 ns    ±12.19%         167 ns         250 ns

Comparison:
Baseline.split/2         6.85 M
Optimised.split/2        5.40 M - 1.27x slower +38.98 ns

##### With input n=100 overlap=0% #####
Name                        ips        average  deviation         median         99th %
Baseline.split/2         1.10 M      909.99 ns     ±6.53%         916 ns        1042 ns
Optimised.split/2        1.09 M      920.44 ns     ±5.42%         917 ns        1042 ns

Comparison:
Baseline.split/2         1.10 M
Optimised.split/2        1.09 M - 1.01x slower +10.45 ns

##### With input n=100 overlap=25% #####
Name                        ips        average  deviation         median         99th %
Optimised.split/2      528.77 K        1.89 μs     ±5.27%        1.92 μs        2.13 μs
Baseline.split/2       495.52 K        2.02 μs     ±3.07%           2 μs        2.17 μs

Comparison:
Optimised.split/2      528.77 K
Baseline.split/2       495.52 K - 1.07x slower +0.127 μs

##### With input n=100 overlap=50% #####
Name                        ips        average  deviation         median         99th %
Baseline.split/2       279.06 K        3.58 μs     ±3.05%        3.58 μs        3.88 μs
Optimised.split/2      258.80 K        3.86 μs     ±1.67%        3.83 μs        4.04 μs

Comparison:
Baseline.split/2       279.06 K
Optimised.split/2      258.80 K - 1.08x slower +0.28 μs

##### With input n=1000 overlap=0% #####
Name                        ips        average  deviation         median         99th %
Baseline.split/2       118.00 K        8.47 μs     ±6.27%        8.46 μs       10.04 μs
Optimised.split/2      110.90 K        9.02 μs     ±2.93%        9.08 μs        9.79 μs

Comparison:
Baseline.split/2       118.00 K
Optimised.split/2      110.90 K - 1.06x slower +0.54 μs

##### With input n=1000 overlap=25% #####
Name                        ips        average  deviation         median         99th %
Optimised.split/2       25.85 K       38.68 μs     ±5.95%       38.67 μs       44.67 μs
Baseline.split/2         4.35 K      230.14 μs     ±2.23%      230.92 μs      241.93 μs

Comparison:
Optimised.split/2       25.85 K
Baseline.split/2         4.35 K - 5.95x slower +191.46 μs

##### With input n=1000 overlap=50% #####
Name                        ips        average  deviation         median         99th %
Optimised.split/2       17.26 K       57.94 μs     ±4.89%       57.92 μs       64.79 μs
Baseline.split/2         2.60 K      384.94 μs     ±1.50%      383.13 μs      400.36 μs

Comparison:
Optimised.split/2       17.26 K
Baseline.split/2         2.60 K - 6.64x slower +326.99 μs

##### With input n=5 overlap=0% #####
Name                        ips        average  deviation         median         99th %
Baseline.split/2        12.00 M       83.33 ns     ±0.57%          83 ns          84 ns
Optimised.split/2       12.00 M       83.33 ns     ±0.57%          83 ns          84 ns

Comparison:
Baseline.split/2        12.00 M
Optimised.split/2       12.00 M - 1.00x slower +0.00071 ns

##### With input n=5 overlap=25% #####
Name                        ips        average  deviation         median         99th %
Baseline.split/2        12.00 M       83.33 ns     ±0.57%          83 ns          84 ns
Optimised.split/2       10.17 M       98.36 ns    ±20.63%          84 ns         125 ns

Comparison:
Baseline.split/2        12.00 M
Optimised.split/2       10.17 M - 1.18x slower +15.02 ns

##### With input n=5 overlap=50% #####
Name                        ips        average  deviation         median         99th %
Baseline.split/2        12.00 M       83.33 ns     ±0.57%          83 ns          84 ns
Optimised.split/2        9.24 M      108.24 ns    ±20.03%         125 ns         166 ns

Comparison:
Baseline.split/2        12.00 M
Optimised.split/2        9.24 M - 1.30x slower +24.91 ns

════════════════════════════════════════════════════════
=== BENCH_F=take ===
════════════════════════════════════════════════════════
Operating System: macOS
CPU Information: Apple M2 Pro
Number of Available Cores: 10
Available memory: 32 GB
Elixir 1.19.4
Erlang 28.2
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 1 s
time: 3 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: n=10 overlap=0%, n=10 overlap=25%, n=10 overlap=50%, n=100 overlap=0%, n=100 overlap=25%, n=100 overlap=50%, n=1000 overlap=0%, n=1000 overlap=25%, n=1000 overlap=50%, n=5 overlap=0%, n=5 overlap=25%, n=5 overlap=50%
Estimated total run time: 1 min 36 s
Excluding outliers: true


##### With input n=10 overlap=0% #####
Name                       ips        average  deviation         median         99th %
Baseline.take/2        15.13 M       66.08 ns     ±4.11%       66.70 ns          75 ns
Optimised.take/2        9.86 M      101.39 ns    ±21.06%          84 ns         126 ns

Comparison:
Baseline.take/2        15.13 M
Optimised.take/2        9.86 M - 1.53x slower +35.31 ns

##### With input n=10 overlap=25% #####
Name                       ips        average  deviation         median         99th %
Baseline.take/2         9.76 M      102.49 ns    ±20.95%          84 ns         125 ns
Optimised.take/2        8.52 M      117.36 ns    ±20.07%         125 ns         167 ns

Comparison:
Baseline.take/2         9.76 M
Optimised.take/2        8.52 M - 1.15x slower +14.87 ns

##### With input n=10 overlap=50% #####
Name                       ips        average  deviation         median         99th %
Baseline.take/2         9.67 M      103.40 ns    ±20.70%          84 ns         125 ns
Optimised.take/2        6.58 M      151.88 ns    ±17.07%         166 ns         209 ns

Comparison:
Baseline.take/2         9.67 M
Optimised.take/2        6.58 M - 1.47x slower +48.47 ns

##### With input n=100 overlap=0% #####
Name                       ips        average  deviation         median         99th %
Baseline.take/2         1.85 M      541.67 ns     ±0.09%         542 ns         542 ns
Optimised.take/2        1.58 M      632.72 ns    ±12.86%         584 ns         875 ns

Comparison:
Baseline.take/2         1.85 M
Optimised.take/2        1.58 M - 1.17x slower +91.06 ns

##### With input n=100 overlap=25% #####
Name                       ips        average  deviation         median         99th %
Optimised.take/2      626.78 K        1.60 μs     ±8.45%        1.58 μs           2 μs
Baseline.take/2       556.55 K        1.80 μs     ±3.80%        1.75 μs           2 μs

Comparison:
Optimised.take/2      626.78 K
Baseline.take/2       556.55 K - 1.13x slower +0.20 μs

##### With input n=100 overlap=50% #####
Name                       ips        average  deviation         median         99th %
Baseline.take/2       281.69 K        3.55 μs     ±2.35%        3.54 μs        3.83 μs
Optimised.take/2      259.72 K        3.85 μs     ±1.82%        3.83 μs        4.04 μs

Comparison:
Baseline.take/2       281.69 K
Optimised.take/2      259.72 K - 1.08x slower +0.30 μs

##### With input n=1000 overlap=0% #####
Name                       ips        average  deviation         median         99th %
Optimised.take/2      104.88 K        9.53 μs    ±60.23%        6.83 μs       21.71 μs
Baseline.take/2        98.90 K       10.11 μs    ±58.82%        6.63 μs       21.58 μs

Comparison:
Optimised.take/2      104.88 K
Baseline.take/2        98.90 K - 1.06x slower +0.58 μs

##### With input n=1000 overlap=25% #####
Name                       ips        average  deviation         median         99th %
Optimised.take/2       28.69 K       34.85 μs     ±3.70%       34.50 μs       39.50 μs
Baseline.take/2         3.95 K      253.25 μs     ±4.16%      252.08 μs      280.10 μs

Comparison:
Optimised.take/2       28.69 K
Baseline.take/2         3.95 K - 7.27x slower +218.40 μs

##### With input n=1000 overlap=50% #####
Name                       ips        average  deviation         median         99th %
Optimised.take/2       18.29 K       54.67 μs     ±4.12%       53.96 μs       61.13 μs
Baseline.take/2         2.37 K      421.52 μs     ±3.21%      419.17 μs      455.04 μs

Comparison:
Optimised.take/2       18.29 K
Baseline.take/2         2.37 K - 7.71x slower +366.85 μs

##### With input n=5 overlap=0% #####
Name                       ips        average  deviation         median         99th %
Baseline.take/2        19.05 M       52.49 ns    ±35.25%          42 ns          84 ns
Optimised.take/2       15.91 M       62.86 ns    ±33.37%          83 ns          84 ns

Comparison:
Baseline.take/2        19.05 M
Optimised.take/2       15.91 M - 1.20x slower +10.37 ns

##### With input n=5 overlap=25% #####
Name                       ips        average  deviation         median         99th %
Baseline.take/2        16.20 M       61.72 ns    ±34.22%          42 ns          84 ns
Optimised.take/2       14.11 M       70.87 ns    ±27.89%          83 ns          84 ns

Comparison:
Baseline.take/2        16.20 M
Optimised.take/2       14.11 M - 1.15x slower +9.15 ns

##### With input n=5 overlap=50% #####
Name                       ips        average  deviation         median         99th %
Baseline.take/2        20.66 M       48.39 ns    ±14.86%       45.90 ns       70.80 ns
Optimised.take/2       12.00 M       83.33 ns     ±0.57%          83 ns          84 ns

Comparison:
Baseline.take/2        20.66 M
Optimised.take/2       12.00 M - 1.72x slower +34.94 ns

════════════════════════════════════════════════════════
=== BENCH_F=drop ===
════════════════════════════════════════════════════════
Operating System: macOS
CPU Information: Apple M2 Pro
Number of Available Cores: 10
Available memory: 32 GB
Elixir 1.19.4
Erlang 28.2
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 1 s
time: 3 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: n=10 overlap=0%, n=10 overlap=25%, n=10 overlap=50%, n=100 overlap=0%, n=100 overlap=25%, n=100 overlap=50%, n=1000 overlap=0%, n=1000 overlap=25%, n=1000 overlap=50%, n=5 overlap=0%, n=5 overlap=25%, n=5 overlap=50%
Estimated total run time: 1 min 36 s
Excluding outliers: true


##### With input n=10 overlap=0% #####
Name                       ips        average  deviation         median         99th %
Baseline.drop/2         9.35 M      106.98 ns    ±20.70%         125 ns         167 ns
Optimised.drop/2           8 M         125 ns     ±0.00%         125 ns         125 ns

Comparison:
Baseline.drop/2         9.35 M
Optimised.drop/2           8 M - 1.17x slower +18.02 ns

##### With input n=10 overlap=25% #####
Name                       ips        average  deviation         median         99th %
Baseline.drop/2         9.91 M      100.96 ns     ±5.67%         100 ns      112.50 ns
Optimised.drop/2        6.72 M      148.83 ns    ±19.65%         125 ns         209 ns

Comparison:
Baseline.drop/2         9.91 M
Optimised.drop/2        6.72 M - 1.47x slower +47.87 ns

##### With input n=10 overlap=50% #####
Name                       ips        average  deviation         median         99th %
Baseline.drop/2            8 M         125 ns     ±0.00%         125 ns         125 ns
Optimised.drop/2        6.00 M      166.67 ns     ±0.28%         167 ns         167 ns

Comparison:
Baseline.drop/2            8 M
Optimised.drop/2        6.00 M - 1.33x slower +41.67 ns

##### With input n=100 overlap=0% #####
Name                       ips        average  deviation         median         99th %
Optimised.drop/2      827.59 K        1.21 μs     ±0.04%        1.21 μs        1.21 μs
Baseline.drop/2       807.69 K        1.24 μs     ±8.32%        1.21 μs        1.54 μs

Comparison:
Optimised.drop/2      827.59 K
Baseline.drop/2       807.69 K - 1.02x slower +0.0298 μs

##### With input n=100 overlap=25% #####
Name                       ips        average  deviation         median         99th %
Optimised.drop/2      510.29 K        1.96 μs     ±5.41%           2 μs        2.21 μs
Baseline.drop/2       474.13 K        2.11 μs     ±2.71%        2.08 μs        2.25 μs

Comparison:
Optimised.drop/2      510.29 K
Baseline.drop/2       474.13 K - 1.08x slower +0.149 μs

##### With input n=100 overlap=50% #####
Name                       ips        average  deviation         median         99th %
Baseline.drop/2       270.11 K        3.70 μs     ±1.90%        3.67 μs        3.88 μs
Optimised.drop/2      261.79 K        3.82 μs     ±1.69%        3.79 μs           4 μs

Comparison:
Baseline.drop/2       270.11 K
Optimised.drop/2      261.79 K - 1.03x slower +0.118 μs

##### With input n=1000 overlap=0% #####
Name                       ips        average  deviation         median         99th %
Optimised.drop/2       78.64 K       12.72 μs     ±1.00%       12.71 μs       13.08 μs
Baseline.drop/2        77.81 K       12.85 μs     ±2.33%       12.79 μs       13.75 μs

Comparison:
Optimised.drop/2       78.64 K
Baseline.drop/2        77.81 K - 1.01x slower +0.137 μs

##### With input n=1000 overlap=25% #####
Name                       ips        average  deviation         median         99th %
Optimised.drop/2       25.26 K       39.59 μs     ±2.44%       39.33 μs       42.21 μs
Baseline.drop/2         4.33 K      231.18 μs     ±1.55%      230.50 μs      241.66 μs

Comparison:
Optimised.drop/2       25.26 K
Baseline.drop/2         4.33 K - 5.84x slower +191.59 μs

##### With input n=1000 overlap=50% #####
Name                       ips        average  deviation         median         99th %
Optimised.drop/2       17.81 K       56.15 μs     ±3.81%       55.42 μs       62.33 μs
Baseline.drop/2         2.62 K      382.36 μs     ±2.18%      380.33 μs      404.13 μs

Comparison:
Optimised.drop/2       17.81 K
Baseline.drop/2         2.62 K - 6.81x slower +326.21 μs

##### With input n=5 overlap=0% #####
Name                       ips        average  deviation         median         99th %
Baseline.drop/2        14.95 M       66.91 ns    ±31.68%          83 ns          84 ns
Optimised.drop/2       12.00 M       83.33 ns     ±0.57%          83 ns          84 ns

Comparison:
Baseline.drop/2        14.95 M
Optimised.drop/2       12.00 M - 1.25x slower +16.42 ns

##### With input n=5 overlap=25% #####
Name                       ips        average  deviation         median         99th %
Baseline.drop/2        14.82 M       67.47 ns    ±30.58%          83 ns          84 ns
Optimised.drop/2       12.00 M       83.33 ns     ±0.57%          83 ns          84 ns

Comparison:
Baseline.drop/2        14.82 M
Optimised.drop/2       12.00 M - 1.24x slower +15.86 ns

##### With input n=5 overlap=50% #####
Name                       ips        average  deviation         median         99th %
Baseline.drop/2        19.66 M       50.86 ns     ±3.70%          50 ns       54.20 ns
Optimised.drop/2       10.07 M       99.30 ns    ±22.00%          84 ns         167 ns

Comparison:
Baseline.drop/2        19.66 M
Optimised.drop/2       10.07 M - 1.95x slower +48.44 ns

@josevalim
Member

CI is failing. You can reproduce it with make clean compile. We cannot use MapSet during bootstrap, so there are likely a few other usages around.

@PJUllrich
Contributor Author

I replaced the Map again with a MapSet and re-ran the benchmarks as a sanity check. It looks like the differences between MapSet and Map were just noise. I'll stick to Map and fix the CI.

merge/3 now again 16.22x + median: 210 μs
split/2 now again 6.48x + median: 57.25 μs
take/2 now again 7.16x + median: 51.96 μs
drop/2 now again 6.72x + median: 53.79 μs

@josevalim merged commit d50b032 into elixir-lang:main May 14, 2026
15 checks passed
@josevalim
Member

💚 💙 💜 💛 ❤️

@PJUllrich deleted the improve-keyword-performance branch May 14, 2026 13:28
@sabiwara
Contributor

sabiwara commented May 14, 2026

I tried some quick benchmarks, and I'm a bit concerned these changes will make things slower for typical use cases (keyword lists should typically be quite small, since they are the wrong data structure otherwise).
The increase in memory use is especially concerning.

Keyword.merge/2: 1.20x slower, 5.17x memory
Keyword.take/2: 1.63x slower, 6.83x memory
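
One rough way to see where extra memory can come from is to look at the size of an auxiliary lookup map next to the key list it is built from. This is only a sketch under assumptions: the `seen` map below is a stand-in for whatever helper structure a map-based implementation builds, and `:erts_debug.flat_size/1` reports the size of the final term in machine words, not the total allocated while building it, so it is at best a lower bound.

```elixir
keys = [:a, :b, :c, :d, :e]

# Stand-in for an auxiliary lookup structure (assumption for this sketch).
seen = Map.new(keys, fn key -> {key, true} end)

# Size of each term on the heap, in machine words.
seen_words = :erts_debug.flat_size(seen)
list_words = :erts_debug.flat_size(keys)

IO.puts("seen map: #{seen_words} words, key list: #{list_words} words")
```

For a keyword list with only a handful of entries, any such helper structure is pure overhead compared with simply walking the list, which matches the concern about small inputs.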

josevalim added a commit that referenced this pull request May 14, 2026
@josevalim
Member

I have reverted this for now in main. @PJUllrich can you please run the benchmarks again? We need to consider keyword lists with 2, 5, 10 and 20 elements as well.

@PJUllrich
Contributor Author

@josevalim no problem. That was also a concern on my mind.

I re-ran the benchmarks and added the 2, 5, 10, and 20 element inputs. Because each iteration takes only nanoseconds and is hard to measure accurately, I measured 100 iterations for the small sizes and 1 iteration for the large sizes (100, 1000).
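
The idea of measuring 100 iterations at once can be sketched as follows. This is an illustration only, using `:timer.tc/1` from the Erlang standard library rather than the actual benchmark script; `BatchTiming` and `avg_ns/2` are hypothetical names.

```elixir
defmodule BatchTiming do
  # Hypothetical helper: runs `fun` `iterations` times in one timed batch
  # and returns the average time per call in nanoseconds.
  def avg_ns(fun, iterations) when is_function(fun, 0) and iterations > 0 do
    # :timer.tc/1 returns {elapsed_microseconds, result}.
    {micros, :ok} = :timer.tc(fn -> repeat(fun, iterations) end)
    micros * 1000 / iterations
  end

  defp repeat(_fun, 0), do: :ok

  defp repeat(fun, n) do
    fun.()
    repeat(fun, n - 1)
  end
end

# A small keyword list, as in the small-input cases.
small = Enum.map(1..5, fn i -> {:"key#{i}", i} end)
keys = [:key1, :key3]

avg = BatchTiming.avg_ns(fn -> Keyword.take(small, keys) end, 100)
IO.puts("average per call: #{avg} ns")
```

Timing a batch spreads the timer's resolution and call overhead over many calls, which matters when a single call takes well under a microsecond.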

New Benchmark Results
════════════════════════════════════════════════════════
=== BENCH_F=new2 ===
════════════════════════════════════════════════════════
Operating System: macOS
CPU Information: Apple M2 Pro
Number of Available Cores: 10
Available memory: 32 GB
Elixir 1.19.4
Erlang 28.2
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 1 s
time: 3 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: n=10 dup=0%, n=10 dup=25%, n=10 dup=50%, n=100 dup=0%, n=100 dup=25%, n=100 dup=50%, n=1000 dup=0%, n=1000 dup=25%, n=1000 dup=50%
Estimated total run time: 1 min 12 s
Excluding outliers: true


##### With input n=10 dup=0% #####
Name                      ips        average  deviation         median         99th %
Baseline.new/2         2.10 M      476.42 ns     ±4.76%         459 ns         542 ns
Optimised.new/2        1.97 M      507.87 ns     ±4.21%         500 ns         583 ns

Comparison:
Baseline.new/2         2.10 M
Optimised.new/2        1.97 M - 1.07x slower +31.44 ns

##### With input n=10 dup=25% #####
Name                      ips        average  deviation         median         99th %
Baseline.new/2         2.13 M      470.23 ns     ±4.61%         459 ns         542 ns
Optimised.new/2        2.12 M      470.90 ns     ±4.67%         459 ns         542 ns

Comparison:
Baseline.new/2         2.13 M
Optimised.new/2        2.12 M - 1.00x slower +0.67 ns

##### With input n=10 dup=50% #####
Name                      ips        average  deviation         median         99th %
Optimised.new/2        2.40 M      416.67 ns     ±0.11%         417 ns         417 ns
Baseline.new/2         2.27 M      440.40 ns     ±5.21%         458 ns         500 ns

Comparison:
Optimised.new/2        2.40 M
Baseline.new/2         2.27 M - 1.06x slower +23.73 ns

##### With input n=100 dup=0% #####
Name                      ips        average  deviation         median         99th %
Optimised.new/2      119.83 K        8.35 μs     ±1.59%        8.33 μs        8.71 μs
Baseline.new/2       109.14 K        9.16 μs     ±2.03%        9.13 μs        9.75 μs

Comparison:
Optimised.new/2      119.83 K
Baseline.new/2       109.14 K - 1.10x slower +0.82 μs

##### With input n=100 dup=25% #####
Name                      ips        average  deviation         median         99th %
Optimised.new/2      140.25 K        7.13 μs     ±1.96%        7.13 μs        7.54 μs
Baseline.new/2       116.86 K        8.56 μs     ±2.29%        8.54 μs        9.08 μs

Comparison:
Optimised.new/2      140.25 K
Baseline.new/2       116.86 K - 1.20x slower +1.43 μs

##### With input n=100 dup=50% #####
Name                      ips        average  deviation         median         99th %
Optimised.new/2      160.32 K        6.24 μs     ±1.71%        6.21 μs        6.54 μs
Baseline.new/2       147.61 K        6.77 μs     ±2.49%        6.75 μs        7.25 μs

Comparison:
Optimised.new/2      160.32 K
Baseline.new/2       147.61 K - 1.09x slower +0.54 μs

##### With input n=1000 dup=0% #####
Name                      ips        average  deviation         median         99th %
Optimised.new/2        7.33 K      136.36 μs    ±12.58%      133.96 μs      184.33 μs
Baseline.new/2         1.77 K      563.80 μs     ±3.18%      560.54 μs      612.70 μs

Comparison:
Optimised.new/2        7.33 K
Baseline.new/2         1.77 K - 4.13x slower +427.44 μs

##### With input n=1000 dup=25% #####
Name                      ips        average  deviation         median         99th %
Optimised.new/2        8.89 K      112.53 μs     ±8.84%      112.38 μs      137.88 μs
Baseline.new/2         2.01 K      498.39 μs     ±3.33%      495.25 μs      544.45 μs

Comparison:
Optimised.new/2        8.89 K
Baseline.new/2         2.01 K - 4.43x slower +385.86 μs

##### With input n=1000 dup=50% #####
Name                      ips        average  deviation         median         99th %
Optimised.new/2       12.24 K       81.71 μs     ±5.84%       79.21 μs       95.83 μs
Baseline.new/2         3.22 K      310.27 μs     ±3.11%      308.96 μs      336.54 μs

Comparison:
Optimised.new/2       12.24 K
Baseline.new/2         3.22 K - 3.80x slower +228.56 μs

════════════════════════════════════════════════════════
=== BENCH_F=merge2 ===
════════════════════════════════════════════════════════
Operating System: macOS
CPU Information: Apple M2 Pro
Number of Available Cores: 10
Available memory: 32 GB
Elixir 1.19.4
Erlang 28.2
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 1 s
time: 3 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: n=10 overlap=0%, n=10 overlap=25%, n=10 overlap=50%, n=100 overlap=0%, n=100 overlap=25%, n=100 overlap=50%, n=1000 overlap=0%, n=1000 overlap=25%, n=1000 overlap=50%
Estimated total run time: 1 min 12 s
Excluding outliers: true


##### With input n=10 overlap=0% #####
Name                        ips        average  deviation         median         99th %
Baseline.merge/2         4.37 M      229.02 ns     ±9.86%         209 ns         292 ns
Optimised.merge/2        4.33 M      231.09 ns    ±10.17%         250 ns         292 ns

Comparison:
Baseline.merge/2         4.37 M
Optimised.merge/2        4.33 M - 1.01x slower +2.07 ns

##### With input n=10 overlap=25% #####
Name                        ips        average  deviation         median         99th %
Baseline.merge/2         4.80 M      208.33 ns     ±0.23%         208 ns         209 ns
Optimised.merge/2        4.37 M      228.94 ns    ±10.53%         209 ns         292 ns

Comparison:
Baseline.merge/2         4.80 M
Optimised.merge/2        4.37 M - 1.10x slower +20.61 ns

##### With input n=10 overlap=50% #####
Name                        ips        average  deviation         median         99th %
Baseline.merge/2         5.13 M      194.79 ns    ±11.16%         208 ns         250 ns
Optimised.merge/2        4.53 M      220.54 ns     ±9.94%         209 ns         292 ns

Comparison:
Baseline.merge/2         5.13 M
Optimised.merge/2        4.53 M - 1.13x slower +25.76 ns

##### With input n=100 overlap=0% #####
Name                        ips        average  deviation         median         99th %
Optimised.merge/2      192.52 K        5.19 μs     ±2.04%        5.17 μs        5.50 μs
Baseline.merge/2        89.16 K       11.22 μs     ±1.00%       11.21 μs       11.50 μs

Comparison:
Optimised.merge/2      192.52 K
Baseline.merge/2        89.16 K - 2.16x slower +6.02 μs

##### With input n=100 overlap=25% #####
Name                        ips        average  deviation         median         99th %
Optimised.merge/2      214.85 K        4.65 μs     ±2.52%        4.63 μs        5.04 μs
Baseline.merge/2       113.68 K        8.80 μs     ±1.29%        8.79 μs        9.08 μs

Comparison:
Optimised.merge/2      214.85 K
Baseline.merge/2       113.68 K - 1.89x slower +4.14 μs

##### With input n=100 overlap=50% #####
Name                        ips        average  deviation         median         99th %
Optimised.merge/2      213.67 K        4.68 μs     ±2.30%        4.67 μs           5 μs
Baseline.merge/2       136.24 K        7.34 μs     ±1.28%        7.33 μs        7.58 μs

Comparison:
Optimised.merge/2      213.67 K
Baseline.merge/2       136.24 K - 1.57x slower +2.66 μs

##### With input n=1000 overlap=0% #####
Name                        ips        average  deviation         median         99th %
Optimised.merge/2        8.64 K       0.116 ms     ±3.59%       0.115 ms       0.128 ms
Baseline.merge/2         0.95 K        1.05 ms     ±2.34%        1.04 ms        1.11 ms

Comparison:
Optimised.merge/2        8.64 K
Baseline.merge/2         0.95 K - 9.06x slower +0.93 ms

##### With input n=1000 overlap=25% #####
Name                        ips        average  deviation         median         99th %
Optimised.merge/2       11.35 K       88.12 μs     ±5.30%       86.38 μs      104.71 μs
Baseline.merge/2         1.21 K      825.85 μs     ±2.40%      822.29 μs      878.35 μs

Comparison:
Optimised.merge/2       11.35 K
Baseline.merge/2         1.21 K - 9.37x slower +737.73 μs

##### With input n=1000 overlap=50% #####
Name                        ips        average  deviation         median         99th %
Optimised.merge/2       10.34 K       96.69 μs    ±11.68%       94.63 μs      127.71 μs
Baseline.merge/2         1.51 K      661.74 μs     ±2.47%      658.31 μs      707.14 μs

Comparison:
Optimised.merge/2       10.34 K
Baseline.merge/2         1.51 K - 6.84x slower +565.05 μs

════════════════════════════════════════════════════════
=== BENCH_F=merge3 ===
════════════════════════════════════════════════════════
Operating System: macOS
CPU Information: Apple M2 Pro
Number of Available Cores: 10
Available memory: 32 GB
Elixir 1.19.4
Erlang 28.2
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 1 s
time: 3 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: n=10 overlap=0%, n=10 overlap=25%, n=10 overlap=50%, n=100 overlap=0%, n=100 overlap=25%, n=100 overlap=50%, n=1000 overlap=0%, n=1000 overlap=25%, n=1000 overlap=50%
Estimated total run time: 1 min 12 s
Excluding outliers: true


##### With input n=10 overlap=0% #####
Name                        ips        average  deviation         median         99th %
Baseline.merge/3         5.23 M      191.28 ns     ±2.67%      191.60 ns      204.20 ns
Optimised.merge/3        2.96 M      337.29 ns    ±12.79%         333 ns         458 ns

Comparison:
Baseline.merge/3         5.23 M
Optimised.merge/3        2.96 M - 1.76x slower +146.01 ns

##### With input n=10 overlap=25% #####
Name                        ips        average  deviation         median         99th %
Baseline.merge/3            4 M         250 ns     ±0.00%         250 ns         250 ns
Optimised.merge/3        2.76 M      362.75 ns    ±12.42%         334 ns         500 ns

Comparison:
Baseline.merge/3            4 M
Optimised.merge/3        2.76 M - 1.45x slower +112.75 ns

##### With input n=10 overlap=50% #####
Name                        ips        average  deviation         median         99th %
Baseline.merge/3         3.43 M      291.67 ns     ±0.16%         292 ns         292 ns
Optimised.merge/3        2.38 M      420.55 ns    ±11.93%         416 ns         542 ns

Comparison:
Baseline.merge/3         3.43 M
Optimised.merge/3        2.38 M - 1.44x slower +128.89 ns

##### With input n=100 overlap=0% #####
Name                        ips        average  deviation         median         99th %
Optimised.merge/3      168.99 K        5.92 μs     ±9.20%        5.67 μs        7.54 μs
Baseline.merge/3        93.19 K       10.73 μs     ±1.19%       10.71 μs       11.17 μs

Comparison:
Optimised.merge/3      168.99 K
Baseline.merge/3        93.19 K - 1.81x slower +4.81 μs

##### With input n=100 overlap=25% #####
Name                        ips        average  deviation         median         99th %
Optimised.merge/3      136.01 K        7.35 μs     ±8.79%        7.13 μs        9.29 μs
Baseline.merge/3        35.82 K       27.92 μs    ±24.97%          25 μs       44.04 μs

Comparison:
Optimised.merge/3      136.01 K
Baseline.merge/3        35.82 K - 3.80x slower +20.57 μs

##### With input n=100 overlap=50% #####
Name                        ips        average  deviation         median         99th %
Optimised.merge/3      107.84 K        9.27 μs    ±10.46%        8.92 μs       12.58 μs
Baseline.merge/3        22.71 K       44.03 μs    ±10.34%       44.29 μs       54.79 μs

Comparison:
Optimised.merge/3      107.84 K
Baseline.merge/3        22.71 K - 4.75x slower +34.76 μs

##### With input n=1000 overlap=0% #####
Name                        ips        average  deviation         median         99th %
Optimised.merge/3        7.14 K       0.140 ms    ±13.29%       0.140 ms       0.186 ms
Baseline.merge/3         0.95 K        1.06 ms     ±2.59%        1.05 ms        1.13 ms

Comparison:
Optimised.merge/3        7.14 K
Baseline.merge/3         0.95 K - 7.54x slower +0.92 ms

##### With input n=1000 overlap=25% #####
Name                        ips        average  deviation         median         99th %
Optimised.merge/3        5.91 K       0.169 ms    ±12.15%       0.172 ms        0.23 ms
Baseline.merge/3         0.29 K        3.41 ms     ±5.11%        3.40 ms        3.83 ms

Comparison:
Optimised.merge/3        5.91 K
Baseline.merge/3         0.29 K - 20.12x slower +3.24 ms

##### With input n=1000 overlap=50% #####
Name                        ips        average  deviation         median         99th %
Optimised.merge/3        4.45 K        0.22 ms    ±17.54%        0.21 ms        0.34 ms
Baseline.merge/3         0.20 K        4.88 ms     ±5.03%        4.92 ms        5.35 ms

Comparison:
Optimised.merge/3        4.45 K
Baseline.merge/3         0.20 K - 21.72x slower +4.66 ms

════════════════════════════════════════════════════════
=== BENCH_F=pop ===
════════════════════════════════════════════════════════
Operating System: macOS
CPU Information: Apple M2 Pro
Number of Available Cores: 10
Available memory: 32 GB
Elixir 1.19.4
Erlang 28.2
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 1 s
time: 3 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: n=10 pos=25%, n=10 pos=50%, n=10 pos=75%, n=100 pos=25%, n=100 pos=50%, n=100 pos=75%, n=1000 pos=25%, n=1000 pos=50%, n=1000 pos=75%
Estimated total run time: 1 min 12 s
Excluding outliers: true


##### With input n=10 pos=25% #####
Name                      ips        average  deviation         median         99th %
Optimised.pop/3       31.51 M       31.74 ns     ±7.43%       33.30 ns       37.50 ns
Baseline.pop/3        21.06 M       47.47 ns     ±4.75%       45.90 ns       54.20 ns

Comparison:
Optimised.pop/3       31.51 M
Baseline.pop/3        21.06 M - 1.50x slower +15.74 ns

##### With input n=10 pos=50% #####
Name                      ips        average  deviation         median         99th %
Optimised.pop/3       32.08 M       31.17 ns     ±7.85%       29.20 ns       37.50 ns
Baseline.pop/3        14.68 M       68.12 ns    ±30.05%          83 ns          84 ns

Comparison:
Optimised.pop/3       32.08 M
Baseline.pop/3        14.68 M - 2.19x slower +36.95 ns

##### With input n=10 pos=75% #####
Name                      ips        average  deviation         median         99th %
Optimised.pop/3       24.00 M       41.67 ns     ±1.13%          42 ns          42 ns
Baseline.pop/3        17.97 M       55.65 ns     ±3.95%       54.20 ns       62.50 ns

Comparison:
Optimised.pop/3       24.00 M
Baseline.pop/3        17.97 M - 1.34x slower +13.99 ns

##### With input n=100 pos=25% #####
Name                      ips        average  deviation         median         99th %
Optimised.pop/3           2 M         500 ns     ±0.00%         500 ns         500 ns
Baseline.pop/3         1.25 M      802.43 ns     ±3.31%         792 ns         875 ns

Comparison:
Optimised.pop/3           2 M
Baseline.pop/3         1.25 M - 1.60x slower +302.43 ns

##### With input n=100 pos=50% #####
Name                      ips        average  deviation         median         99th %
Optimised.pop/3        1.94 M      515.36 ns     ±5.32%         500 ns         584 ns
Baseline.pop/3         1.17 M      856.53 ns     ±3.06%         875 ns         917 ns

Comparison:
Optimised.pop/3        1.94 M
Baseline.pop/3         1.17 M - 1.66x slower +341.17 ns

##### With input n=100 pos=75% #####
Name                      ips        average  deviation         median         99th %
Optimised.pop/3        3.31 M      302.15 ns     ±9.17%         292 ns         375 ns
Baseline.pop/3         1.11 M      902.85 ns     ±2.84%         916 ns         959 ns

Comparison:
Optimised.pop/3        3.31 M
Baseline.pop/3         1.11 M - 2.99x slower +600.70 ns

##### With input n=1000 pos=25% #####
Name                      ips        average  deviation         median         99th %
Optimised.pop/3      162.72 K        6.15 μs     ±2.57%        6.13 μs        6.58 μs
Baseline.pop/3       128.42 K        7.79 μs     ±2.02%        7.75 μs        8.25 μs

Comparison:
Optimised.pop/3      162.72 K
Baseline.pop/3       128.42 K - 1.27x slower +1.64 μs

##### With input n=1000 pos=50% #####
Name                      ips        average  deviation         median         99th %
Optimised.pop/3      197.22 K        5.07 μs     ±3.35%        5.04 μs        5.54 μs
Baseline.pop/3       117.25 K        8.53 μs     ±4.99%        8.42 μs        9.92 μs

Comparison:
Optimised.pop/3      197.22 K
Baseline.pop/3       117.25 K - 1.68x slower +3.46 μs

##### With input n=1000 pos=75% #####
Name                      ips        average  deviation         median         99th %
Optimised.pop/3      247.95 K        4.03 μs     ±4.12%           4 μs        4.50 μs
Baseline.pop/3       110.89 K        9.02 μs     ±5.03%        8.83 μs       10.33 μs

Comparison:
Optimised.pop/3      247.95 K
Baseline.pop/3       110.89 K - 2.24x slower +4.98 μs

════════════════════════════════════════════════════════
=== BENCH_F=take ===
════════════════════════════════════════════════════════
Operating System: macOS
CPU Information: Apple M2 Pro
Number of Available Cores: 10
Available memory: 32 GB
Elixir 1.19.4
Erlang 28.2
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 1 s
time: 3 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: n=10 overlap=0% (x100), n=10 overlap=25% (x100), n=10 overlap=50% (x100), n=2 overlap=0% (x100), n=2 overlap=25% (x100), n=2 overlap=50% (x100), n=20 overlap=0% (x100), n=20 overlap=25% (x100), n=20 overlap=50% (x100), n=5 overlap=0% (x100), n=5 overlap=25% (x100), n=5 overlap=50% (x100)
Estimated total run time: 1 min 36 s
Excluding outliers: true


##### With input n=10 overlap=0% (x100) #####
Name                            ips        average  deviation         median         99th %
Baseline.take/2 x100       152.77 K        6.55 μs     ±1.93%        6.50 μs        6.92 μs
Optimised.take/2 x100      126.91 K        7.88 μs     ±1.03%        7.88 μs        8.13 μs

Comparison:
Baseline.take/2 x100       152.77 K
Optimised.take/2 x100      126.91 K - 1.20x slower +1.33 μs

##### With input n=10 overlap=25% (x100) #####
Name                            ips        average  deviation         median         99th %
Baseline.take/2 x100       125.30 K        7.98 μs     ±1.16%           8 μs        8.21 μs
Optimised.take/2 x100      100.40 K        9.96 μs     ±2.44%        9.96 μs       10.58 μs

Comparison:
Baseline.take/2 x100       125.30 K
Optimised.take/2 x100      100.40 K - 1.25x slower +1.98 μs

##### With input n=10 overlap=50% (x100) #####
Name                            ips        average  deviation         median         99th %
Baseline.take/2 x100       114.87 K        8.71 μs     ±1.35%        8.71 μs           9 μs
Optimised.take/2 x100       69.57 K       14.37 μs     ±2.00%       14.38 μs       15.13 μs

Comparison:
Baseline.take/2 x100       114.87 K
Optimised.take/2 x100       69.57 K - 1.65x slower +5.67 μs

##### With input n=2 overlap=0% (x100) #####
Name                            ips        average  deviation         median         99th %
Baseline.take/2 x100       540.29 K        1.85 μs     ±4.52%        1.83 μs        2.04 μs
Optimised.take/2 x100      300.44 K        3.33 μs     ±2.74%        3.33 μs        3.58 μs

Comparison:
Baseline.take/2 x100       540.29 K
Optimised.take/2 x100      300.44 K - 1.80x slower +1.48 μs

##### With input n=2 overlap=25% (x100) #####
Name                            ips        average  deviation         median         99th %
Baseline.take/2 x100       537.59 K        1.86 μs     ±5.22%        1.83 μs        2.17 μs
Optimised.take/2 x100      300.89 K        3.32 μs     ±2.50%        3.33 μs        3.54 μs

Comparison:
Baseline.take/2 x100       537.59 K
Optimised.take/2 x100      300.89 K - 1.79x slower +1.46 μs

##### With input n=2 overlap=50% (x100) #####
Name                            ips        average  deviation         median         99th %
Baseline.take/2 x100       456.35 K        2.19 μs     ±3.76%        2.21 μs        2.38 μs
Optimised.take/2 x100      226.34 K        4.42 μs     ±7.90%        4.38 μs        5.38 μs

Comparison:
Baseline.take/2 x100       456.35 K
Optimised.take/2 x100      226.34 K - 2.02x slower +2.23 μs

##### With input n=20 overlap=0% (x100) #####
Name                            ips        average  deviation         median         99th %
Baseline.take/2 x100        85.56 K       11.69 μs     ±1.71%       11.63 μs       12.25 μs
Optimised.take/2 x100       77.06 K       12.98 μs     ±0.87%       12.96 μs       13.29 μs

Comparison:
Baseline.take/2 x100        85.56 K
Optimised.take/2 x100       77.06 K - 1.11x slower +1.29 μs

##### With input n=20 overlap=25% (x100) #####
Name                            ips        average  deviation         median         99th %
Baseline.take/2 x100        59.52 K       16.80 μs     ±0.67%       16.79 μs       17.13 μs
Optimised.take/2 x100       46.07 K       21.70 μs     ±1.67%       21.71 μs       22.75 μs

Comparison:
Baseline.take/2 x100        59.52 K
Optimised.take/2 x100       46.07 K - 1.29x slower +4.90 μs

##### With input n=20 overlap=50% (x100) #####
Name                            ips        average  deviation         median         99th %
Baseline.take/2 x100        50.71 K       19.72 μs     ±1.62%       19.75 μs       20.50 μs
Optimised.take/2 x100       32.39 K       30.88 μs     ±1.69%       30.88 μs       32.38 μs

Comparison:
Baseline.take/2 x100        50.71 K
Optimised.take/2 x100       32.39 K - 1.57x slower +11.16 μs

##### With input n=5 overlap=0% (x100) #####
Name                            ips        average  deviation         median         99th %
Baseline.take/2 x100       295.74 K        3.38 μs     ±2.44%        3.38 μs        3.58 μs
Optimised.take/2 x100      205.17 K        4.87 μs     ±1.85%        4.88 μs        5.13 μs

Comparison:
Baseline.take/2 x100       295.74 K
Optimised.take/2 x100      205.17 K - 1.44x slower +1.49 μs

##### With input n=5 overlap=25% (x100) #####
Name                            ips        average  deviation         median         99th %
Baseline.take/2 x100       243.14 K        4.11 μs     ±1.46%        4.13 μs        4.25 μs
Optimised.take/2 x100      168.80 K        5.92 μs     ±4.39%        5.88 μs        6.67 μs

Comparison:
Baseline.take/2 x100       243.14 K
Optimised.take/2 x100      168.80 K - 1.44x slower +1.81 μs

##### With input n=5 overlap=50% (x100) #####
Name                            ips        average  deviation         median         99th %
Baseline.take/2 x100       230.39 K        4.34 μs     ±6.83%        4.21 μs        5.17 μs
Optimised.take/2 x100      137.29 K        7.28 μs     ±5.96%        7.13 μs        8.58 μs

Comparison:
Baseline.take/2 x100       230.39 K
Optimised.take/2 x100      137.29 K - 1.68x slower +2.94 μs

Operating System: macOS
CPU Information: Apple M2 Pro
Number of Available Cores: 10
Available memory: 32 GB
Elixir 1.19.4
Erlang 28.2
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 1 s
time: 3 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: n=100 overlap=0%, n=100 overlap=25%, n=100 overlap=50%, n=1000 overlap=0%, n=1000 overlap=25%, n=1000 overlap=50%
Estimated total run time: 48 s
Excluding outliers: true


##### With input n=100 overlap=0% #####
Name                       ips        average  deviation         median         99th %
Baseline.take/2         1.85 M      541.68 ns     ±0.09%         542 ns         542 ns
Optimised.take/2        1.77 M      563.76 ns     ±4.12%         583 ns         625 ns

Comparison:
Baseline.take/2         1.85 M
Optimised.take/2        1.77 M - 1.04x slower +22.08 ns

##### With input n=100 overlap=25% #####
Name                       ips        average  deviation         median         99th %
Optimised.take/2      587.04 K        1.70 μs     ±6.00%        1.71 μs        1.96 μs
Baseline.take/2       576.19 K        1.74 μs     ±3.59%        1.71 μs        1.92 μs

Comparison:
Optimised.take/2      587.04 K
Baseline.take/2       576.19 K - 1.02x slower +0.0321 μs

##### With input n=100 overlap=50% #####
Name                       ips        average  deviation         median         99th %
Baseline.take/2       289.29 K        3.46 μs     ±2.60%        3.46 μs        3.71 μs
Optimised.take/2      280.45 K        3.57 μs     ±2.09%        3.54 μs        3.75 μs

Comparison:
Baseline.take/2       289.29 K
Optimised.take/2      280.45 K - 1.03x slower +0.109 μs

##### With input n=1000 overlap=0% #####
Name                       ips        average  deviation         median         99th %
Baseline.take/2       190.71 K        5.24 μs     ±2.07%        5.21 μs        5.58 μs
Optimised.take/2      185.41 K        5.39 μs     ±1.55%        5.42 μs        5.58 μs

Comparison:
Baseline.take/2       190.71 K
Optimised.take/2      185.41 K - 1.03x slower +0.150 μs

##### With input n=1000 overlap=25% #####
Name                       ips        average  deviation         median         99th %
Optimised.take/2       29.62 K       33.76 μs     ±3.02%       33.58 μs       36.88 μs
Baseline.take/2         4.46 K      224.21 μs     ±3.68%      222.25 μs      248.13 μs

Comparison:
Optimised.take/2       29.62 K
Baseline.take/2         4.46 K - 6.64x slower +190.45 μs

##### With input n=1000 overlap=50% #####
Name                       ips        average  deviation         median         99th %
Optimised.take/2       18.85 K       53.04 μs     ±4.89%       52.25 μs       60.33 μs
Baseline.take/2         2.68 K      372.88 μs     ±3.02%      369.72 μs      405.30 μs

Comparison:
Optimised.take/2       18.85 K
Baseline.take/2         2.68 K - 7.03x slower +319.83 μs

════════════════════════════════════════════════════════
=== BENCH_F=drop ===
════════════════════════════════════════════════════════
Operating System: macOS
CPU Information: Apple M2 Pro
Number of Available Cores: 10
Available memory: 32 GB
Elixir 1.19.4
Erlang 28.2
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 1 s
time: 3 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: n=10 overlap=0% (x100), n=10 overlap=25% (x100), n=10 overlap=50% (x100), n=2 overlap=0% (x100), n=2 overlap=25% (x100), n=2 overlap=50% (x100), n=20 overlap=0% (x100), n=20 overlap=25% (x100), n=20 overlap=50% (x100), n=5 overlap=0% (x100), n=5 overlap=25% (x100), n=5 overlap=50% (x100)
Estimated total run time: 1 min 36 s
Excluding outliers: true


##### With input n=10 overlap=0% (x100) #####
Name                            ips        average  deviation         median         99th %
Baseline.drop/2 x100       108.45 K        9.22 μs     ±2.94%        9.21 μs        9.96 μs
Optimised.drop/2 x100       98.55 K       10.15 μs     ±1.63%       10.13 μs       10.58 μs

Comparison:
Baseline.drop/2 x100       108.45 K
Optimised.drop/2 x100       98.55 K - 1.10x slower +0.93 μs

##### With input n=10 overlap=25% (x100) #####
Name                            ips        average  deviation         median         99th %
Baseline.drop/2 x100       101.53 K        9.85 μs     ±2.23%        9.79 μs       10.79 μs
Optimised.drop/2 x100       67.96 K       14.71 μs     ±4.32%       14.75 μs       16.20 μs

Comparison:
Baseline.drop/2 x100       101.53 K
Optimised.drop/2 x100       67.96 K - 1.49x slower +4.87 μs

##### With input n=10 overlap=50% (x100) #####
Name                            ips        average  deviation         median         99th %
Baseline.drop/2 x100        90.17 K       11.09 μs     ±2.05%       11.08 μs       11.71 μs
Optimised.drop/2 x100       57.33 K       17.44 μs     ±3.78%       17.54 μs          19 μs

Comparison:
Baseline.drop/2 x100        90.17 K
Optimised.drop/2 x100       57.33 K - 1.57x slower +6.35 μs

##### With input n=2 overlap=0% (x100) #####
Name                            ips        average  deviation         median         99th %
Baseline.drop/2 x100       383.73 K        2.61 μs     ±7.43%        2.58 μs        3.13 μs
Optimised.drop/2 x100      262.59 K        3.81 μs     ±2.43%        3.83 μs        4.04 μs

Comparison:
Baseline.drop/2 x100       383.73 K
Optimised.drop/2 x100      262.59 K - 1.46x slower +1.20 μs

##### With input n=2 overlap=25% (x100) #####
Name                            ips        average  deviation         median         99th %
Baseline.drop/2 x100       384.47 K        2.60 μs     ±7.37%        2.58 μs        3.13 μs
Optimised.drop/2 x100      262.87 K        3.80 μs     ±2.29%        3.83 μs        4.04 μs

Comparison:
Baseline.drop/2 x100       384.47 K
Optimised.drop/2 x100      262.87 K - 1.46x slower +1.20 μs

##### With input n=2 overlap=50% (x100) #####
Name                            ips        average  deviation         median         99th %
Baseline.drop/2 x100       401.84 K        2.49 μs     ±3.34%        2.50 μs        2.71 μs
Optimised.drop/2 x100      214.93 K        4.65 μs     ±5.36%        4.63 μs        5.38 μs

Comparison:
Baseline.drop/2 x100       401.84 K
Optimised.drop/2 x100      214.93 K - 1.87x slower +2.16 μs

##### With input n=20 overlap=0% (x100) #####
Name                            ips        average  deviation         median         99th %
Baseline.drop/2 x100        60.11 K       16.64 μs     ±2.08%       16.63 μs       17.58 μs
Optimised.drop/2 x100       57.23 K       17.47 μs     ±1.77%       17.46 μs       18.25 μs

Comparison:
Baseline.drop/2 x100        60.11 K
Optimised.drop/2 x100       57.23 K - 1.05x slower +0.84 μs

##### With input n=20 overlap=25% (x100) #####
Name                            ips        average  deviation         median         99th %
Baseline.drop/2 x100        47.88 K       20.89 μs     ±1.49%       20.92 μs       21.67 μs
Optimised.drop/2 x100       36.95 K       27.07 μs     ±3.69%       27.25 μs       29.54 μs

Comparison:
Baseline.drop/2 x100        47.88 K
Optimised.drop/2 x100       36.95 K - 1.30x slower +6.18 μs

##### With input n=20 overlap=50% (x100) #####
Name                            ips        average  deviation         median         99th %
Baseline.drop/2 x100        43.43 K       23.02 μs     ±1.47%       23.04 μs       23.88 μs
Optimised.drop/2 x100       28.27 K       35.38 μs     ±3.86%       35.08 μs       39.67 μs

Comparison:
Baseline.drop/2 x100        43.43 K
Optimised.drop/2 x100       28.27 K - 1.54x slower +12.35 μs

##### With input n=5 overlap=0% (x100) #####
Name                            ips        average  deviation         median         99th %
Baseline.drop/2 x100       204.89 K        4.88 μs     ±4.88%        4.83 μs        5.54 μs
Optimised.drop/2 x100      167.00 K        5.99 μs     ±2.08%        5.96 μs        6.33 μs

Comparison:
Baseline.drop/2 x100       204.89 K
Optimised.drop/2 x100      167.00 K - 1.23x slower +1.11 μs

##### With input n=5 overlap=25% (x100) #####
Name                            ips        average  deviation         median         99th %
Baseline.drop/2 x100       195.48 K        5.12 μs     ±2.23%        5.13 μs        5.42 μs
Optimised.drop/2 x100      139.50 K        7.17 μs     ±3.75%        7.13 μs        7.92 μs

Comparison:
Baseline.drop/2 x100       195.48 K
Optimised.drop/2 x100      139.50 K - 1.40x slower +2.05 μs

##### With input n=5 overlap=50% (x100) #####
Name                            ips        average  deviation         median         99th %
Baseline.drop/2 x100       198.99 K        5.03 μs     ±2.05%        5.04 μs        5.29 μs
Optimised.drop/2 x100      115.00 K        8.70 μs     ±4.78%        8.71 μs        9.75 μs

Comparison:
Baseline.drop/2 x100       198.99 K
Optimised.drop/2 x100      115.00 K - 1.73x slower +3.67 μs

Operating System: macOS
CPU Information: Apple M2 Pro
Number of Available Cores: 10
Available memory: 32 GB
Elixir 1.19.4
Erlang 28.2
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 1 s
time: 3 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: n=100 overlap=0%, n=100 overlap=25%, n=100 overlap=50%, n=1000 overlap=0%, n=1000 overlap=25%, n=1000 overlap=50%
Estimated total run time: 48 s
Excluding outliers: true


##### With input n=100 overlap=0% #####
Name                       ips        average  deviation         median         99th %
Optimised.drop/2      853.50 K        1.17 μs     ±3.36%        1.17 μs        1.25 μs
Baseline.drop/2       831.08 K        1.20 μs     ±4.83%        1.21 μs        1.33 μs

Comparison:
Optimised.drop/2      853.50 K
Baseline.drop/2       831.08 K - 1.03x slower +0.0316 μs

##### With input n=100 overlap=25% #####
Name                       ips        average  deviation         median         99th %
Baseline.drop/2       490.95 K        2.04 μs     ±3.25%        2.04 μs        2.21 μs
Optimised.drop/2      478.55 K        2.09 μs     ±5.17%        2.08 μs        2.38 μs

Comparison:
Baseline.drop/2       490.95 K
Optimised.drop/2      478.55 K - 1.03x slower +0.0528 μs

##### With input n=100 overlap=50% #####
Name                       ips        average  deviation         median         99th %
Baseline.drop/2       280.86 K        3.56 μs     ±2.39%        3.58 μs        3.79 μs
Optimised.drop/2      270.08 K        3.70 μs     ±2.50%        3.71 μs        3.96 μs

Comparison:
Baseline.drop/2       280.86 K
Optimised.drop/2      270.08 K - 1.04x slower +0.142 μs

##### With input n=1000 overlap=0% #####
Name                       ips        average  deviation         median         99th %
Optimised.drop/2       81.60 K       12.25 μs     ±1.81%       12.21 μs       13.04 μs
Baseline.drop/2        79.46 K       12.58 μs     ±2.83%       12.54 μs       13.71 μs

Comparison:
Optimised.drop/2       81.60 K
Baseline.drop/2        79.46 K - 1.03x slower +0.33 μs

##### With input n=1000 overlap=25% #####
Name                       ips        average  deviation         median         99th %
Optimised.drop/2       25.81 K       38.75 μs     ±4.44%       38.29 μs       43.83 μs
Baseline.drop/2         4.39 K      227.73 μs     ±3.28%      225.71 μs      249.29 μs

Comparison:
Optimised.drop/2       25.81 K
Baseline.drop/2         4.39 K - 5.88x slower +188.98 μs

##### With input n=1000 overlap=50% #####
Name                       ips        average  deviation         median         99th %
Optimised.drop/2       17.99 K       55.58 μs     ±5.07%       54.79 μs       63.50 μs
Baseline.drop/2         2.67 K      375.18 μs     ±3.20%      372.00 μs      409.34 μs

Comparison:
Optimised.drop/2       17.99 K
Baseline.drop/2         2.67 K - 6.75x slower +319.60 μs

════════════════════════════════════════════════════════
=== BENCH_F=split ===
════════════════════════════════════════════════════════
Operating System: macOS
CPU Information: Apple M2 Pro
Number of Available Cores: 10
Available memory: 32 GB
Elixir 1.19.4
Erlang 28.2
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 1 s
time: 3 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: n=10 overlap=0% (x100), n=10 overlap=25% (x100), n=10 overlap=50% (x100), n=2 overlap=0% (x100), n=2 overlap=25% (x100), n=2 overlap=50% (x100), n=20 overlap=0% (x100), n=20 overlap=25% (x100), n=20 overlap=50% (x100), n=5 overlap=0% (x100), n=5 overlap=25% (x100), n=5 overlap=50% (x100)
Estimated total run time: 1 min 36 s
Excluding outliers: true


##### With input n=10 overlap=0% (x100) #####
Name                             ips        average  deviation         median         99th %
Baseline.split/2 x100        96.98 K       10.31 μs     ±3.54%       10.33 μs       11.21 μs
Optimised.split/2 x100       88.05 K       11.36 μs     ±2.30%       11.29 μs       12.08 μs

Comparison:
Baseline.split/2 x100        96.98 K
Optimised.split/2 x100       88.05 K - 1.10x slower +1.05 μs

##### With input n=10 overlap=25% (x100) #####
Name                             ips        average  deviation         median         99th %
Baseline.split/2 x100        92.30 K       10.83 μs     ±2.15%       10.79 μs       11.46 μs
Optimised.split/2 x100       74.81 K       13.37 μs     ±2.32%       13.29 μs       14.25 μs

Comparison:
Baseline.split/2 x100        92.30 K
Optimised.split/2 x100       74.81 K - 1.23x slower +2.53 μs

##### With input n=10 overlap=50% (x100) #####
Name                             ips        average  deviation         median         99th %
Baseline.split/2 x100        81.14 K       12.32 μs     ±2.11%       12.25 μs          13 μs
Optimised.split/2 x100       55.49 K       18.02 μs     ±1.55%          18 μs       18.79 μs

Comparison:
Baseline.split/2 x100        81.14 K
Optimised.split/2 x100       55.49 K - 1.46x slower +5.70 μs

##### With input n=2 overlap=0% (x100) #####
Name                             ips        average  deviation         median         99th %
Baseline.split/2 x100       278.95 K        3.58 μs    ±17.93%        3.83 μs        5.13 μs
Optimised.split/2 x100      200.69 K        4.98 μs    ±16.69%        5.50 μs        7.08 μs

Comparison:
Baseline.split/2 x100       278.95 K
Optimised.split/2 x100      200.69 K - 1.39x slower +1.40 μs

##### With input n=2 overlap=25% (x100) #####
Name                             ips        average  deviation         median         99th %
Baseline.split/2 x100       275.20 K        3.63 μs    ±16.89%        3.88 μs        5.13 μs
Optimised.split/2 x100      198.54 K        5.04 μs    ±16.51%        5.50 μs        7.17 μs

Comparison:
Baseline.split/2 x100       275.20 K
Optimised.split/2 x100      198.54 K - 1.39x slower +1.40 μs

##### With input n=2 overlap=50% (x100) #####
Name                             ips        average  deviation         median         99th %
Baseline.split/2 x100       324.33 K        3.08 μs    ±16.95%        2.83 μs        4.25 μs
Optimised.split/2 x100      185.10 K        5.40 μs     ±9.18%        5.21 μs        6.50 μs

Comparison:
Baseline.split/2 x100       324.33 K
Optimised.split/2 x100      185.10 K - 1.75x slower +2.32 μs

##### With input n=20 overlap=0% (x100) #####
Name                             ips        average  deviation         median         99th %
Baseline.split/2 x100        53.62 K       18.65 μs     ±3.11%       18.63 μs       20.08 μs
Optimised.split/2 x100       51.82 K       19.30 μs     ±1.38%       19.33 μs       19.96 μs

Comparison:
Baseline.split/2 x100        53.62 K
Optimised.split/2 x100       51.82 K - 1.03x slower +0.65 μs

##### With input n=20 overlap=25% (x100) #####
Name                             ips        average  deviation         median         99th %
Baseline.split/2 x100        44.96 K       22.24 μs     ±1.93%       22.21 μs       23.46 μs
Optimised.split/2 x100       36.46 K       27.42 μs     ±1.35%       27.42 μs       28.42 μs

Comparison:
Baseline.split/2 x100        44.96 K
Optimised.split/2 x100       36.46 K - 1.23x slower +5.18 μs

##### With input n=20 overlap=50% (x100) #####
Name                             ips        average  deviation         median         99th %
Baseline.split/2 x100        40.21 K       24.87 μs     ±1.55%       24.88 μs       25.88 μs
Optimised.split/2 x100       27.82 K       35.94 μs     ±1.55%       35.92 μs       37.54 μs

Comparison:
Baseline.split/2 x100        40.21 K
Optimised.split/2 x100       27.82 K - 1.45x slower +11.07 μs

##### With input n=5 overlap=0% (x100) #####
Name                             ips        average  deviation         median         99th %
Optimised.split/2 x100      145.77 K        6.86 μs     ±1.48%        6.83 μs        7.17 μs
Baseline.split/2 x100       138.04 K        7.24 μs    ±24.57%        6.08 μs       12.08 μs

Comparison:
Optimised.split/2 x100      145.77 K
Baseline.split/2 x100       138.04 K - 1.06x slower +0.38 μs

##### With input n=5 overlap=25% (x100) #####
Name                             ips        average  deviation         median         99th %
Baseline.split/2 x100       170.51 K        5.86 μs     ±2.74%        5.79 μs        6.42 μs
Optimised.split/2 x100      117.14 K        8.54 μs    ±16.69%        7.83 μs       11.46 μs

Comparison:
Baseline.split/2 x100       170.51 K
Optimised.split/2 x100      117.14 K - 1.46x slower +2.67 μs

##### With input n=5 overlap=50% (x100) #####
Name                             ips        average  deviation         median         99th %
Baseline.split/2 x100       172.14 K        5.81 μs     ±2.09%        5.75 μs        6.21 μs
Optimised.split/2 x100      113.51 K        8.81 μs     ±2.13%        8.79 μs        9.29 μs

Comparison:
Baseline.split/2 x100       172.14 K
Optimised.split/2 x100      113.51 K - 1.52x slower +3.00 μs

Operating System: macOS
CPU Information: Apple M2 Pro
Number of Available Cores: 10
Available memory: 32 GB
Elixir 1.19.4
Erlang 28.2
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 1 s
time: 3 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: n=100 overlap=0%, n=100 overlap=25%, n=100 overlap=50%, n=1000 overlap=0%, n=1000 overlap=25%, n=1000 overlap=50%
Estimated total run time: 48 s
Excluding outliers: true


##### With input n=100 overlap=0% #####
Name                        ips        average  deviation         median         99th %
Optimised.split/2        1.13 M      887.07 ns     ±4.06%         875 ns         959 ns
Baseline.split/2         1.08 M      926.97 ns     ±6.99%         917 ns        1084 ns

Comparison:
Optimised.split/2        1.13 M
Baseline.split/2         1.08 M - 1.04x slower +39.90 ns

##### With input n=100 overlap=25% #####
Name                        ips        average  deviation         median         99th %
Baseline.split/2       511.19 K        1.96 μs     ±2.93%        1.96 μs        2.08 μs
Optimised.split/2      508.30 K        1.97 μs     ±4.86%           2 μs        2.25 μs

Comparison:
Baseline.split/2       511.19 K
Optimised.split/2      508.30 K - 1.01x slower +0.0111 μs

##### With input n=100 overlap=50% #####
Name                        ips        average  deviation         median         99th %
Baseline.split/2       285.48 K        3.50 μs     ±2.33%        3.46 μs        3.71 μs
Optimised.split/2      273.01 K        3.66 μs     ±2.21%        3.67 μs        3.88 μs

Comparison:
Baseline.split/2       285.48 K
Optimised.split/2      273.01 K - 1.05x slower +0.160 μs

##### With input n=1000 overlap=0% #####
Name                        ips        average  deviation         median         99th %
Baseline.split/2       119.48 K        8.37 μs     ±5.65%        8.38 μs        9.75 μs
Optimised.split/2      114.37 K        8.74 μs     ±3.19%        8.83 μs        9.38 μs

Comparison:
Baseline.split/2       119.48 K
Optimised.split/2      114.37 K - 1.04x slower +0.37 μs

##### With input n=1000 overlap=25% #####
Name                        ips        average  deviation         median         99th %
Optimised.split/2       26.44 K       37.82 μs     ±4.77%       38.04 μs       42.50 μs
Baseline.split/2         4.47 K      223.56 μs     ±1.68%      222.71 μs      234.59 μs

Comparison:
Optimised.split/2       26.44 K
Baseline.split/2         4.47 K - 5.91x slower +185.75 μs

##### With input n=1000 overlap=50% #####
Name                        ips        average  deviation         median         99th %
Optimised.split/2       17.52 K       57.07 μs     ±4.70%       57.08 μs       63.71 μs
Baseline.split/2         2.73 K      366.52 μs     ±0.75%      366.00 μs      374.88 μs

Comparison:
Optimised.split/2       17.52 K
Baseline.split/2         2.73 K - 6.42x slower +309.45 μs

We're consistently slower for small lists unfortunately. Maybe we need to switch to the optimised logic only for larger lists.
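
A minimal sketch of what such a size-based switch could look like for `take/2`. The module name `KeywordSketch`, the helper names, and the cut-over value `32` are all hypothetical. The tables below only suggest that the crossover lies somewhere between n=20 and n=1000, so the real threshold would have to be measured. Dispatching on the length of `keys` (rather than `keywords`) is an assumption as well. The bounded length check avoids walking a long list in full just to decide which path to take.

```elixir
defmodule KeywordSketch do
  @moduledoc false

  # Hypothetical cut-over point, not a measured value.
  @threshold 32

  def take(keywords, keys) when is_list(keywords) and is_list(keys) do
    if longer_than?(keys, @threshold) do
      take_with_map(keywords, keys)
    else
      take_with_scan(keywords, keys)
    end
  end

  # Bounded length check: stops after walking at most `limit + 1` cells.
  defp longer_than?([], _limit), do: false
  defp longer_than?([_ | _], 0), do: true
  defp longer_than?([_ | tail], limit), do: longer_than?(tail, limit - 1)

  # Small inputs: linear membership scan, no extra data structure to build.
  defp take_with_scan(keywords, keys) do
    :lists.filter(fn {key, _value} -> :lists.member(key, keys) end, keywords)
  end

  # Large inputs: build a lookup map once, then test membership per pair.
  defp take_with_map(keywords, keys) do
    wanted = Map.from_keys(keys, true)
    :lists.filter(fn {key, _value} -> is_map_key(wanted, key) end, keywords)
  end
end
```

Both paths keep the original order and any duplicate keys of `keywords`, so the switch would only affect speed, not results. The same shape could apply to `drop/2` and `split/2`.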

take/2 per 100 calls

| n | overlap | Baseline | Optimised | Delta | Ratio |
|---|---------|----------|-----------|-------|-------|
| 2 | 0% | 1.85 µs | 3.33 µs | +1.48 µs | 1.80x slower |
| 2 | 25% | 1.86 µs | 3.32 µs | +1.46 µs | 1.79x slower |
| 2 | 50% | 2.19 µs | 4.42 µs | +2.23 µs | 2.02x slower |
| 5 | 0% | 3.38 µs | 4.87 µs | +1.49 µs | 1.44x slower |
| 5 | 25% | 4.11 µs | 5.92 µs | +1.81 µs | 1.44x slower |
| 5 | 50% | 4.34 µs | 7.28 µs | +2.94 µs | 1.68x slower |
| 10 | 0% | 6.55 µs | 7.88 µs | +1.33 µs | 1.20x slower |
| 10 | 25% | 7.98 µs | 9.96 µs | +1.98 µs | 1.25x slower |
| 10 | 50% | 8.71 µs | 14.37 µs | +5.67 µs | 1.65x slower |
| 20 | 0% | 11.69 µs | 12.98 µs | +1.29 µs | 1.11x slower |
| 20 | 25% | 16.80 µs | 21.70 µs | +4.90 µs | 1.29x slower |
| 20 | 50% | 19.72 µs | 30.88 µs | +11.16 µs | 1.57x slower |

drop/2 per 100 calls

| n | overlap | Baseline | Optimised | Delta | Ratio |
|---|---------|----------|-----------|-------|-------|
| 2 | 0% | 2.61 µs | 3.81 µs | +1.20 µs | 1.46x slower |
| 2 | 25% | 2.60 µs | 3.80 µs | +1.20 µs | 1.46x slower |
| 2 | 50% | 2.49 µs | 4.65 µs | +2.16 µs | 1.87x slower |
| 5 | 0% | 4.88 µs | 5.99 µs | +1.11 µs | 1.23x slower |
| 5 | 25% | 5.12 µs | 7.17 µs | +2.05 µs | 1.40x slower |
| 5 | 50% | 5.03 µs | 8.70 µs | +3.67 µs | 1.73x slower |
| 10 | 0% | 9.22 µs | 10.15 µs | +0.93 µs | 1.10x slower |
| 10 | 25% | 9.85 µs | 14.71 µs | +4.86 µs | 1.49x slower |
| 10 | 50% | 11.09 µs | 17.44 µs | +6.35 µs | 1.57x slower |
| 20 | 0% | 16.64 µs | 17.47 µs | +0.83 µs | 1.05x (tied) |
| 20 | 25% | 20.89 µs | 27.07 µs | +6.18 µs | 1.30x slower |
| 20 | 50% | 23.02 µs | 35.38 µs | +12.36 µs | 1.54x slower |

split/2 per 100 calls

| n | overlap | Baseline | Optimised | Delta | Ratio |
|---|---------|----------|-----------|-------|-------|
| 2 | 0% | 3.58 µs | 4.98 µs | +1.40 µs | 1.39x slower |
| 2 | 25% | 3.63 µs | 5.04 µs | +1.41 µs | 1.39x slower |
| 2 | 50% | 3.08 µs | 5.40 µs | +2.32 µs | 1.75x slower |
| 5 | 0% | 7.24 µs | 6.86 µs | −0.38 µs | 1.06x faster (WIN) |
| 5 | 25% | 5.86 µs | 8.54 µs | +2.68 µs | 1.46x slower |
| 5 | 50% | 5.81 µs | 8.81 µs | +3.00 µs | 1.52x slower |
| 10 | 0% | 10.31 µs | 11.36 µs | +1.05 µs | 1.10x slower |
| 10 | 25% | 10.83 µs | 13.37 µs | +2.54 µs | 1.23x slower |
| 10 | 50% | 12.32 µs | 18.02 µs | +5.70 µs | 1.46x slower |
| 20 | 0% | 18.65 µs | 19.30 µs | +0.65 µs | 1.03x (tied) |
| 20 | 25% | 22.24 µs | 27.42 µs | +5.18 µs | 1.23x slower |
| 20 | 50% | 24.87 µs | 35.94 µs | +11.07 µs | 1.45x slower |

| Function | Optimised slower | Tied (within ~5%) | Optimised faster | Wins at n=1000 |
|----------|------------------|-------------------|------------------|----------------|
| pop/3 | | | n=10 already | 1.3x – 2.2x |
| new/2 | n=10 dup=0% (1.07x slower, +31 ns) | n=10 dup=25% | n=10 dup=50% (1.06x), n=100+ | 3.8x – 4.4x |
| merge/2 | n=10 (all overlaps, +2–26 ns) | | n=100 (1.57x – 2.16x) | 6.8x – 9.4x |
| merge/3 | n=10 (all overlaps, 1.44x – 1.76x, +112–146 ns) | | n=100 (1.81x – 4.75x) | 7.5x – 21.7x |
| take/2 | n=2–20 (1.11x – 2.02x slower) | n=100 (within 4%) | n=1000 with overlap≥25% | 6.6x – 7.0x |
| drop/2 | n=2–20 (1.05x – 1.87x slower) | n=100 (within 4%) | n=1000 with overlap≥25% | 5.9x – 6.8x |
| split/2 | n=2–20 (1.03x – 1.75x slower) | n=100 (within 5%) | n=1000 with overlap≥25% | 5.9x – 6.4x |

@josevalim
Member

@PJUllrich so it seems the only consistent win is with pop/3? Shall we have a PR for that?

@PJUllrich
Contributor Author

PJUllrich commented May 14, 2026 via email
