Faster Base decoding#15337
Conversation
|
Can you please break those into distinct benchmarks? Also, for removing whitespace, have you tried using the result of binary:match to skip the initial traversal? Finally, you have one commit from the other branch. |
|
Oh, and please make sure you measure smaller payloads too (let’s say 1kb). |
Unroll validate16XXX for 8/4/2 bytes
Fast-path remove_ignored/2
|
@josevalim yessir! I grouped the benchmarked functions by the change that affects them, and also tested payloads from 1KiB up to 1MiB. Each cell contains the before → after median times and the speedup ratio. The groups are:
- decode_name/1 (decode hot paths)
- validate16 (valid16?)
- remove_ignored
|
| op | 1KiB | 10KiB | 50KiB | 100KiB | 1MiB |
|---|---|---|---|---|---|
| valid64? CLEAN | 14.21 → 2.92μs (4.87×) | 137.88 → 22.17μs (6.22×) | 694.34 → 107.71μs (6.45×) | 1.39 → 0.22ms (6.32×) | 14.80 → 2.22ms (6.67×) |
| decode64! CLEAN | 16.63 → 4.00μs (4.16×) | 162.83 → 31.75μs (5.13×) | 819.58 → 155.75μs (5.26×) | 1.63 → 0.31ms (5.26×) | 16.86 → 3.33ms (5.06×) |
| valid64? DIRTY | 14.50 → 15.29μs (0.95×) | 141.00 → 142.38μs (0.99×) | 703.42 → 719.71μs (0.98×) | 1.42 → 1.44ms (0.99×) | 14.78 → 15.17ms (0.97×) |
| decode64! DIRTY | 16.75 → 16.42μs (1.02×) | 165.46 → 151.88μs (1.09×) | 816.29 → 771.83μs (1.06×) | 1.63 → 1.51ms (1.08×) | 17.22 → 15.84ms (1.09×) |
| valid64? (no ignore, ref) | 1.00 → 1.54μs | 9.38 → 18.63μs | 46.46 → 65.92μs | 93.54 → 93.54μs | 0.97 → 1.35ms |
| decode64! (no ignore, ref) | 3.13 → 2.08μs | 30.29 → 19.04μs | 147.00 → 94.67μs | 290.00 → 188.92μs | *3.10 → 2.06ms |
In the remove_ignored benchmark above, you can see the reference times for valid64? and decode64!, which are always called, even if we don't provide ignore: :whitespace. So, these times are the baselines if remove_ignored/2 is not called. Now, if you look at the before-and-after times of valid64? CLEAN, you can see that before the fix, it would take ~14.21µs to call remove_ignored/2 with a string that had no whitespace. That's the same time as valid64? DIRTY, which filters a string that does have whitespace. After the change, valid64? DIRTY took the same amount of time (within noise), but valid64? CLEAN dropped from 14.21µs to 2.92µs, because the binary match returns :nomatch and we don't build the binary unnecessarily.
The benchmark script
# Microbench for Base findings. Run with the SYSTEM elixir from anywhere:
#   BENCH_F=decode_name elixir bench_base.exs
#
# Selects which finding to bench via BENCH_F:
#   decode_name    — Inline decode_name/1 for base 16/32/64 (decode hot paths)
#   validate16     — Unroll validate16(upper|lower|mixed) for 4 and 8 bytes
#   remove_ignored — remove_ignored fast path via :binary.match for :whitespace
#
# Each scenario runs across payload sizes: 1KiB, 10KiB, 50KiB, 100KiB, 1MiB.
# Hot-loads the in-tree base.ex so patches take effect without `make stdlib`.
Mix.install([{:benchee, "~> 1.3"}])

# The system Base module is already loaded, so silence the redefinition
# warning and let Code.compile_file/1 replace it with the patched version.
Code.put_compiler_option(:ignore_module_conflict, true)

src = System.get_env("BENCH_SRC", "./lib/elixir/lib/base.ex")
IO.puts("# Benching Base from: #{src}")
Code.compile_file(src)

finding = System.get_env("BENCH_F", "decode_name")

sizes = [
  {"1KiB", 1 * 1024},
  {"10KiB", 10 * 1024},
  {"50KiB", 50 * 1024},
  {"100KiB", 100 * 1024},
  {"1MiB", 1024 * 1024}
]

# Shared Benchee options so every finding is measured identically.
bench_opts = [warmup: 2, time: 5, print: [fast_warning: false]]

# Builds the Benchee `inputs` map: one entry per payload size, where the
# payload is derived from fresh random bytes by `payload_fun`.
build_inputs = fn payload_fun ->
  for {label, n} <- sizes, into: %{} do
    {label, payload_fun.(:crypto.strong_rand_bytes(n))}
  end
end

# Sprinkle ~1 whitespace char per 76 chars (MIME-style line wrapping).
sprinkle_ws = fn b64 ->
  b64
  |> :erlang.binary_to_list()
  |> Enum.chunk_every(76)
  |> Enum.map_join("\n", &List.to_string/1)
end

case finding do
  "decode_name" ->
    inputs =
      build_inputs.(fn data ->
        %{
          hex_upper: Base.encode16(data),
          hex_lower: Base.encode16(data, case: :lower),
          b64: Base.encode64(data),
          b64_url: Base.url_encode64(data),
          b32: Base.encode32(data)
        }
      end)

    Benchee.run(
      %{
        "decode16! upper" => fn %{hex_upper: s} -> Base.decode16!(s) end,
        "decode16! lower" => fn %{hex_lower: s} -> Base.decode16!(s, case: :lower) end,
        "decode16! mixed" => fn %{hex_upper: s} -> Base.decode16!(s, case: :mixed) end,
        "decode64!" => fn %{b64: s} -> Base.decode64!(s) end,
        "url_decode64!" => fn %{b64_url: s} -> Base.url_decode64!(s) end,
        "decode32!" => fn %{b32: s} -> Base.decode32!(s) end
      },
      [inputs: inputs] ++ bench_opts
    )

  "validate16" ->
    inputs =
      build_inputs.(fn data ->
        %{
          hex_upper: Base.encode16(data),
          hex_lower: Base.encode16(data, case: :lower)
        }
      end)

    Benchee.run(
      %{
        "valid16? upper" => fn %{hex_upper: s} -> Base.valid16?(s) end,
        "valid16? lower" => fn %{hex_lower: s} -> Base.valid16?(s, case: :lower) end,
        "valid16? mixed" => fn %{hex_upper: s} -> Base.valid16?(s, case: :mixed) end
      },
      [inputs: inputs] ++ bench_opts
    )

  "remove_ignored" ->
    inputs =
      build_inputs.(fn data ->
        b64 = Base.encode64(data)
        %{clean: b64, dirty: sprinkle_ws.(b64)}
      end)

    Benchee.run(
      %{
        "decode64! ignore:ws CLEAN" => fn %{clean: s} ->
          Base.decode64!(s, ignore: :whitespace)
        end,
        "decode64! ignore:ws DIRTY" => fn %{dirty: s} ->
          Base.decode64!(s, ignore: :whitespace)
        end,
        "valid64? ignore:ws CLEAN" => fn %{clean: s} ->
          Base.valid64?(s, ignore: :whitespace)
        end,
        "valid64? ignore:ws DIRTY" => fn %{dirty: s} ->
          Base.valid64?(s, ignore: :whitespace)
        end,
        # Reference rows: same ops without `ignore: :whitespace` to show the
        # floor cost (no remove_ignored work at all).
        "decode64! (no ignore, ref)" => fn %{clean: s} -> Base.decode64!(s) end,
        "valid64? (no ignore, ref)" => fn %{clean: s} -> Base.valid64?(s) end
      },
      [inputs: inputs] ++ bench_opts
    )

  other ->
    raise "Unknown BENCH_F=#{other} (use decode_name, validate16, remove_ignored)"
end
And could you please elaborate what you mean with: |
|
I'm currently trying to understand why. EDIT: I re-ran the test only for the following case; the results are below:
|
| size | baseline | patched | ratio |
|---|---|---|---|
| 1KiB | 1.00μs | 1.00μs | 1.00× |
| 10KiB | 9.13μs | 9.13μs | 1.00× |
| 50KiB | 45.17μs | 45.17μs | 1.00× |
| 100KiB | 90.96μs | 91.04μs | 1.00× |
| 1MiB | 941.88μs | 945.29μs | 1.00× |
|
@PJUllrich I mean something like this: It is not going to be exactly the above but it will give a good approximation of what I meant. It may not be better though or it may require further adjustments. |
|
Also, inlining decode_name largely increases the byte code size of the module (~20%). If we want to inline it, we need to trim it. |
|
@PJUllrich if you want to optimise this module, ask your coding agent to explore SWAR techniques, as in this commit: #15255 You can first explore it for validation. It will remove the tuple lookups and should be quite more efficient (you can do 7 bytes at a time). You can do all validation cases first. If that works (which you can PR!), then we can start exploring decoding. For this PR in particular, the change for validate16 looks good, the other ones I'd not merge for now cause I worry it may be worse in other cases we haven't considered yet. |
|
@josevalim I benchmarked your suggestion, but it did not make a difference for removing the whitespaces. This might be because in my "random" data with whitespaces, the first whitespace occurs after 76 characters already, so only the first 76 bytes would be skipped. If the whitespace would be further towards the end, it might make a difference indeed. Sounds, good. I will look together with my agent into SWAR and propose changes if I can find significant improvements in another PR. I'll remove the changes other than the validate16 headers. EDIT: This was my (horrible, first draft) implementation of your suggestion if you want to validate it: defp remove_ignored(string, :whitespace) do
# Fast path: locate the first whitespace byte with :binary.match/2. If the
# string contains none (:nomatch), return it unchanged — no new binary built.
case :binary.match(string, [<<?\s>>, <<?\t>>, <<?\r>>, <<?\n>>]) do
# `prefix` is the byte offset of the first whitespace character, so the
# leading `prefix` bytes are known-clean and copied wholesale; only the
# tail (after skipping the one matched byte) is filtered byte-by-byte.
{prefix, _} ->
binary_part(string, 0, prefix) <>
for(
<<char::8 <- binary_part(string, prefix + 1, byte_size(string) - prefix - 1)>>,
# Keep every byte that is not space/tab/CR/LF (`\s` is the space escape).
char not in ~c"\s\t\r\n",
into: <<>>,
do: <<char::8>>
)
# No whitespace anywhere: reuse the original binary as-is.
:nomatch ->
string
end
end |
|
@PJUllrich your patch is correct but it makes sense we won’t see a difference for that. We need to check smaller payloads and do some white space distribution (at none, beginning, middle, and end). |
|
💚 💙 💜 💛 ❤️ |
Changes
- Inline `decode_name/1` for all three compile-time blocks (base 16/32/64), since it seems that was missing. All other functions (e.g. `validate_char_name`) were already inlined.
- Unroll `validate16(upper|lower|mixed)` for 4 and 8 bytes. It previously had only one function head, for 2 bytes; other functions (e.g. `decode_name/1`) have all of the 2/4/8 function heads. This allows for larger chunks when iterating through the byte list (I think).
- In `remove_ignored(string, :whitespace)`, check whether the string contains whitespace first, before building a new binary. This walks the string twice, but runs the costly path of building a new binary only if necessary.

Benchmarks
I generated 64 random kilobytes and encoded them in the formats 16/32/64 + 64 (URL) for `Base.url_decode64!/1`. I used Benchee to measure one iteration of each `Base.decodeX` function on the random 64 KiB. The times below are the median iteration time. I tested the `:lower`, `:mixed`, and `:upper` cases, and `ignore: :whitespace` with (dirty) and without (clean) whitespaces in the random string.

(Benchee, 1s warmup, 3s measurement; M2 Pro, 64 KiB random input)