Optimize binary_to_integer/1 and friends #7426

bjorng · 2023-06-21T07:13:37Z

No description provided.

When compiling 64-bit programs, goth gcc and clang support an 128-bit unsigned integer type. Using it to define the `ErtsDoubleDigit` type will speed up multiplication and division.

mikpe · 2023-06-21T09:33:47Z

erts/emulator/beam/big.h

@@ -37,8 +37,11 @@ typedef Uint16   ErtsHalfDigit;
 #undef  BIG_HAVE_DOUBLE_DIGIT
 typedef Uint16   ErtsHalfDigit;

+#elif (SIZEOF_VOID_P == 8) && defined(__GNUC__) && (__GNUC__ >= 4)
+typedef __uint128_t ErtsDoubleDigit;


Is this universally available on all 64-bit targets?

It seems to be. I tried all 64-bit gcc and clang targets I could find on godbolt.org.

mikpe · 2023-06-21T09:35:19Z

erts/emulator/beam/erl_term.h

+#  define BIG_ARITY_MAX		((1 << 16)-1)
+#else
+#  define BIG_ARITY_MAX		((1 << 17)-1)
+#endif


What's the reason for this restriction? The commit doesn't say.

Yes, and it looks a bit suspicious that the limit is larger on 64b architectures than on 32b architectures.

It's the same in bytes though, as these values are in words (pointer-sized integers).

We wanted to decrease the maximum execution time for a single arithmetic operation. We chose a new limit that we hope don't affect real applications of large integer arithmetic, but that will produce a system_limit exception faster for accidental use of large integers.

bjorng · 2023-06-21T13:13:07Z

I've pushed a fixup commit that simplifies the last part of the Karatsuba calculation.

rgrig · 2023-06-21T14:34:46Z

erts/preloaded/src/erlang.erl

+            %% Too large even for base 2.
+            erlang:exit(system_limit);
+        Size > 1262611, Base >= 10 ->
+            erlang:exit(system_limit);


How are these limits chosen?
Actually, what I'd like to know is this: what's the longest it can take (on a typical computer), within these limits?

Well under a second, at least on 64-bit systems.

On my slowest computer (MacBook Pro from 2013, 2.6 GHz, Intel Core i5) the conversion finishes in about 0.63 seconds. On a HP laptop running Linux (11th Gen Intel, i7-1185G7 @ 3.00 GHz) the time is 0.34 seconds.

rgrig · 2023-06-21T14:46:53Z

erts/preloaded/src/erlang.erl

+            Acc
+    end.
+
+combine(L0, Radix) ->


Thank you so much for rewriting it this way! It's nice to have better asymptotics.

I wonder if it would be faster to revert to linear scan for relatively short integers. It has worse asymptotics but maybe better constants and more cache friendliness.

What I mean is this: instead of starting with single digits, start with numbers made out of N digits (where N can be tuned based on benchmarks, but probably in the region of 10 or 100). And to get these N-digit numbers, do the linear loop (with result:=10*result+digit), as it was before in C.

(I see you start not from single digits but from what fits in a word. I suppose I'm asking if it's not worth doing linear scan algo on those "small" integers, as long as the scan isn't too long, because constants&cache.)

The helper BIF will convert decimal numbers with up to 17 digits efficiently. We think that will take care of the majority of real-world use cases. Therefore, I have optimized the helper BIFs for correctness and simplicity.

However, after discussions with @jhogberg and @garazdawi I now think that spawning a temporary process is a bad idea and I have now eliminated that extra process. Not spawning the process will improve performance for binaries with more than 17 decimal digits.

fwiw, i tried my proposal on some toy code, and it's not only more complicated but also slower

The helper BIF will convert decimal numbers with up to 17 digits efficiently. We think that will take care of the majority of real-world use cases.

I think 64b integers are a common case, especially when interacting with other languages. Probably not worth optimizing for it at this point.

Not spawning the process will improve performance [...]

I didn't understand why there's a process spawned there, but I assumed it's my lack of Erlang knowledge. (Was it perhaps to make sure that timeouts apply to the main process even if the parsing process somwhow gets stuck?? Just a wild guess.)

I think 64b integers are a common case, especially when interacting with other languages. Probably not worth optimizing for it at this point.

If someone can show real-world use cases where integers whose absolute value are greater than 99999999999999999 are frequently used we could optimize this in OTP 27.

I didn't understand why there's a process spawned there, but I assumed it's my lack of Erlang knowledge. (Was it perhaps to make sure that timeouts apply to the main process even if the parsing process somwhow gets stuck?? Just a wild guess.)

It was an attempt at optimization (reducing garbage collection times), but after some more thinking it became clear that it in most cases it would be a pessimization.

rgrig · 2023-06-21T14:54:24Z

erts/preloaded/src/erlang.erl

@@ -1371,15 +1543,52 @@ list_to_float(_String) ->
 %% list_to_integer/1
 -spec list_to_integer(String) -> integer() when
      String :: string().
-list_to_integer(_String) ->
-    erlang:nif_error(undefined).
+list_to_integer(String) ->


I'm curious: Why is this not implemented as list_to_integer(String, 10)?

That would result in a stacktrace pointing out the wrong function if it fails.

rgrig · 2023-06-21T14:57:52Z

lib/stdlib/test/string_SUITE.erl

    {-1,""} = test_to_integer("-1"),
    {1,"="} = test_to_integer("1="),
    {7,"F"} = test_to_integer("7F"),
    {709,""} = test_to_integer("709"),
    {709,"*2"} = test_to_integer("709*2"),
    {0,"xAB"} = test_to_integer("0xAB"),
    {16,"#FF"} = test_to_integer("16#FF"),
+
+    %% Test a bignum.
+    Big = 12385792987438978973984398348974398593,


nit: Would be nice to have more than one number. Maybe some powers of 2, powers of 10, etc.

string:list_to_integer/1 is implemented using binary_to_integer/1, which is thoroughly tested by num_bif_SUITE. Here I just test that handling of signs and trailing non-digits are handled correctly.

michalmuskala · 2023-06-25T14:05:04Z

erts/preloaded/src/erlang.erl

+list_to_integer(String) ->
+    Base = 10,
+    case erts_internal:list_to_integer(String, Base) of
+        {Int,[]} ->


I wonder if it would make sense to just return the integer here to avoid the overhead of the unconditional allocation. The string function could wrap the result in a tuple, so we only "pay" for the allocation where necessary.
Shouldn't be a big impact, but for such low-level functions IMO every little thing matters

This is not a change I want to make now, in case someone is using the BIF directly despite it being undocumented.

This new 4 Mbit limit should still be sufficiently large for practical applications.

These will be needed in a future commit when we will rewrite some exsisting BIFs in Erlang.

Use a divide-and-conquer algorithm to lower the complexity for the BIFs that convert binaries or strings to integers: * erlang:list_to_integer/1 * erlang:binary_to_integer/1 * erlang:binary_to_integer/2 * erlang:list_to_integer/2 * string:list_to_integer/1

Optimize bignum arithmetic on 64-bit platforms

8f40158

When compiling 64-bit programs, goth gcc and clang support an 128-bit unsigned integer type. Using it to define the `ErtsDoubleDigit` type will speed up multiplication and division.

bjorng self-assigned this Jun 21, 2023

bjorng requested review from garazdawi, sverker and jhogberg June 21, 2023 07:14

bjorng added team:VM Assigned to OTP team VM enhancement testing currently being tested, tag is used by OTP internal CI labels Jun 21, 2023

bjorng force-pushed the bjorn/binary_to_integer/otp26/OTP-18659 branch 2 times, most recently from 550768e to 858c4f1 Compare June 21, 2023 08:05

mikpe reviewed Jun 21, 2023

View reviewed changes

rgrig reviewed Jun 21, 2023

View reviewed changes

bjorng force-pushed the bjorn/binary_to_integer/otp26/OTP-18659 branch 2 times, most recently from 2523e03 to 354cfe9 Compare June 22, 2023 12:04

michalmuskala reviewed Jun 25, 2023

View reviewed changes

Optimize bignum multiplication using the Karatsuba algorithm

0cf242f

bjorng force-pushed the bjorn/binary_to_integer/otp26/OTP-18659 branch from 4d6c483 to a769c1f Compare June 27, 2023 05:23

bjorng added 4 commits June 27, 2023 08:12

Lower size limit for integers to 4 Mbit

0833411

This new 4 Mbit limit should still be sufficiently large for practical applications.

Add new helper BIFs in erts_internal

166c920

These will be needed in a future commit when we will rewrite some exsisting BIFs in Erlang.

Update preloaded modules

86fedb2

bjorng force-pushed the bjorn/binary_to_integer/otp26/OTP-18659 branch from a769c1f to 86fedb2 Compare June 27, 2023 06:12

bjorng merged commit a37afd1 into erlang:maint Jun 28, 2023

bjorng deleted the bjorn/binary_to_integer/otp26/OTP-18659 branch September 18, 2023 06:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize binary_to_integer/1 and friends #7426

Optimize binary_to_integer/1 and friends #7426

bjorng commented Jun 21, 2023

mikpe Jun 21, 2023

bjorng Jun 21, 2023

mikpe Jun 21, 2023

rgrig Jun 21, 2023

mikpe Jun 21, 2023

bjorng Jun 22, 2023

bjorng commented Jun 21, 2023

rgrig Jun 21, 2023

bjorng Jun 22, 2023

rgrig Jun 21, 2023 •

edited

rgrig Jun 21, 2023 •

edited

bjorng Jun 22, 2023

rgrig Jun 26, 2023

bjorng Jun 26, 2023

rgrig Jun 21, 2023

bjorng Jun 22, 2023

rgrig Jun 21, 2023

bjorng Jun 22, 2023

michalmuskala Jun 25, 2023

bjorng Jun 26, 2023

Optimize binary_to_integer/1 and friends #7426

Optimize binary_to_integer/1 and friends #7426

Conversation

bjorng commented Jun 21, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bjorng commented Jun 21, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rgrig Jun 21, 2023 • edited

Choose a reason for hiding this comment

rgrig Jun 21, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rgrig Jun 21, 2023 •

edited

rgrig Jun 21, 2023 •

edited