
Conversation

@jessestimpson (Collaborator) commented Oct 24, 2025

Problem

I found performance problems with erlfdb_tuple:enc_null_terminated/1 and erlfdb_tuple:dec_null_terminated/1.

The problem is that the current implementation searches for ?NULL and ?ESCAPE bytes with an Erlang function call per byte. The overhead becomes very noticeable as strings grow in size or as the frequency of 0 bytes increases.

Solution

Avoid iterating over each byte in Erlang code. Instead, use the optimized binary:match/3.
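
To make the idea concrete, here is a minimal sketch of a decoder in that style. This is an illustration only, not necessarily the code merged in this PR; the function name is made up, and ?NULL/?ESCAPE are assumed to be the Tuple layer's 0x00 and 0xFF macros. binary:match/3 finds the next 0 byte with a single native scan, so the Erlang-level work is proportional to the number of 0 bytes rather than to the string length.

-define(NULL, 16#00).
-define(ESCAPE, 16#FF).

% Sketch decoder: jump from 0 byte to 0 byte with binary:match/3 instead
% of examining every byte with an Erlang function call.
dec_null_terminated_match(Bin) ->
    dec_null_terminated_match(Bin, 0, []).

dec_null_terminated_match(Bin, Offset, Acc) when Offset >= byte_size(Bin) ->
    % Ran off the end without seeing an unescaped terminator.
    {iolist_to_binary(lists:reverse(Acc)), <<>>};
dec_null_terminated_match(Bin, Offset, Acc) ->
    Scope = [{scope, {Offset, byte_size(Bin) - Offset}}],
    case binary:match(Bin, <<?NULL>>, Scope) of
        nomatch ->
            % No 0 byte remains.
            Chunk = binary:part(Bin, Offset, byte_size(Bin) - Offset),
            {iolist_to_binary(lists:reverse([Chunk | Acc])), <<>>};
        {Pos, 1} when Pos + 1 < byte_size(Bin) ->
            case binary:at(Bin, Pos + 1) of
                ?ESCAPE ->
                    % Escaped 0 byte: keep the 0, skip the escape, keep scanning.
                    Chunk = binary:part(Bin, Offset, Pos + 1 - Offset),
                    dec_null_terminated_match(Bin, Pos + 2, [Chunk | Acc]);
                _ ->
                    % Unescaped 0 byte: this is the terminator.
                    Chunk = binary:part(Bin, Offset, Pos - Offset),
                    Rest = binary:part(Bin, Pos + 1, byte_size(Bin) - Pos - 1),
                    {iolist_to_binary(lists:reverse([Chunk | Acc])), Rest}
            end;
        {Pos, 1} ->
            % The 0 byte is the last byte: terminator with nothing after it.
            Chunk = binary:part(Bin, Offset, Pos - Offset),
            {iolist_to_binary(lists:reverse([Chunk | Acc])), <<>>}
    end.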

Aside -- why can't we do even better?

In many existing binary protocols, a string would be encoded with its length prefixed and without escaping. In the Tuple layer, however, this is not possible: the encoding must preserve the correct ordering of variable-length strings, and prefixing the length would break that contract. We will always have to scan for ?NULL and ?ESCAPE bytes; it turns out the OTP team has already optimized that scan for us in binary:match/3.
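
The escaping on the encoding side can also be driven by a single native scan. A minimal sketch (again illustrative, not necessarily the merged code; the function name is made up) that uses binary:split/3 to find the 0 bytes and rejoins the pieces with the escape sequence:

-define(NULL, 16#00).
-define(ESCAPE, 16#FF).

% Sketch encoder: escape every 0 byte as <<?NULL, ?ESCAPE>> and append the
% terminating ?NULL. binary:split/3 locates the 0 bytes in native code, so
% the Erlang-level work scales with the number of 0 bytes, not the length.
enc_null_terminated_split(Bin) when is_binary(Bin) ->
    Parts = binary:split(Bin, <<?NULL>>, [global]),
    iolist_to_binary([lists:join(<<?NULL, ?ESCAPE>>, Parts), <<?NULL>>]).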

Benchmark

For each run:

  1. Generate 10000 binaries, each containing at least one 0 byte.
  2. Encode those binaries to Base64, which removes the 0 bytes.
  3. Pack and unpack both the raw binaries (bin) and the Base64 binaries (b64).
  4. Print the average time per operation in microseconds.

Result

Summary

For a 32-byte string with no escaped 0 bytes, execution time is reduced by roughly 50%.
For such a 512-byte string, it is reduced by roughly 98%.

Details

Results from current erlfdb master

1> erlfdb_tuple_string_benchmark:run().
bin 32 pack=1.0916 us/sample, unpack=0.6854 us/sample
b64 32 pack=0.9595 us/sample, unpack=0.7501 us/sample

bin 64 pack=2.0956 us/sample, unpack=2.7483 us/sample
b64 64 pack=3.4701 us/sample, unpack=5.3966 us/sample

bin 128 pack=5.0393 us/sample, unpack=6.9414 us/sample
b64 128 pack=6.1951 us/sample, unpack=8.9551 us/sample

bin 256 pack=9.1127 us/sample, unpack=12.0444 us/sample
b64 256 pack=11.4351 us/sample, unpack=16.8047 us/sample

bin 512 pack=9.916 us/sample, unpack=23.3905 us/sample
b64 512 pack=21.2903 us/sample, unpack=32.6878 us/sample

Results with the proposed changes

1> erlfdb_tuple_string_benchmark:run().
bin 32 pack=1.0937 us/sample, unpack=0.7254 us/sample
b64 32 pack=0.4165 us/sample, unpack=0.3535 us/sample

bin 64 pack=0.6203 us/sample, unpack=0.4815 us/sample
b64 64 pack=0.3352 us/sample, unpack=0.2736 us/sample

bin 128 pack=0.9087 us/sample, unpack=1.1618 us/sample
b64 128 pack=0.5095 us/sample, unpack=0.4823 us/sample

bin 256 pack=0.9778 us/sample, unpack=1.514 us/sample
b64 256 pack=0.3356 us/sample, unpack=0.2985 us/sample

bin 512 pack=1.1391 us/sample, unpack=1.9582 us/sample
b64 512 pack=0.5569 us/sample, unpack=0.3962 us/sample

Benchmark Code

-module(erlfdb_tuple_string_benchmark).

-export([run/0, binary_samples/3]).

% Creates N binaries with Size bytes each.
% We guarantee that each binary has at least 1 zero byte, by inserting it at the middle if not already encountered.
binary_samples(Seed, Size, N) ->
    S = rand:seed_s(default, Seed),
    Sample = fun(State) ->
        lists:foldl(
            fun
                (Idx, {false, Acc, State0}) when Idx == (Size div 2) ->
                    {true, [0 | Acc], State0};
                (_, {HasZero, Acc, State0}) ->
                    {X, State1} = rand:uniform_s(256, State0),
                    {HasZero orelse (X - 1 == 0), [X - 1 | Acc], State1}
            end,
            {false, [], State},
            lists:seq(1, Size)
        )
    end,
    {Bins, _} = lists:foldl(
        fun(_, {Acc, State0}) ->
            {true, Bin, State1} = Sample(State0),
            {[Bin | Acc], State1}
        end,
        {[], S},
        lists:seq(1, N)
    ),
    [iolist_to_binary(X) || X <- Bins].

run() ->
    run(32),
    run(64),
    run(128),
    run(256),
    run(512).

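% Benchmarks pack/unpack of 10000 Size-byte samples, both raw (bin) and Base64-encoded (b64).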
run(Size) ->
    BinSamples = binary_samples(Size, Size, 10000),
    B64Samples = [base64:encode(X) || X <- BinSamples],
    run("bin " ++ integer_to_list(Size), BinSamples),
    run("b64 " ++ integer_to_list(Size), B64Samples),
    io:format("~n"),
    ok.

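% Packs and unpacks every sample, verifies the round trip, and prints the average time per operation.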
run(Label, Samples) ->
    Tuples = [{X} || X <- Samples],
    {TP, Packed} = timer:tc(fun() -> [erlfdb_tuple:pack({X}) || X <- Samples] end),
    {TU, Unpacked} = timer:tc(fun() -> [erlfdb_tuple:unpack(X) || X <- Packed] end),
    true = lists:all(fun({A, B}) -> A =:= B end, lists:zip(Tuples, Unpacked)),
    io:format("~s pack=~p us/sample, unpack=~p us/sample~n", [
        Label, TP / length(Samples), TU / length(Samples)
    ]),
    ok.

@jessestimpson jessestimpson merged commit 062ed99 into foundationdb-beam:main Oct 24, 2025
9 of 10 checks passed
@jessestimpson (Collaborator, Author) commented:

A simpler alternative uses standard recommended iteration and accumulation over the binary:

dec_null_terminated2(Bin) ->
    dec_null_terminated2(Bin, <<>>).
    
dec_null_terminated2(Bin, Acc) ->
    case Bin of
        <<?NULL, ?ESCAPE, Rest/binary>> ->
            dec_null_terminated2(Rest, <<Acc/binary, ?NULL>>);
        <<?NULL, Rest/binary>> ->
            {Acc, Rest};
        <<Byte, Rest/binary>> ->
            dec_null_terminated2(Rest, <<Acc/binary, Byte>>);
        <<>> ->
            {Acc, <<>>}
    end.

This has the advantage of being much, much simpler than what's in this PR, but its worst case is about 4 us per sample vs about 0.4 us using binary:match, since it still pays an Erlang-level pattern match per byte instead of a single native scan.
