New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize binary_to_integer/1 and friends #7426
Optimize binary_to_integer/1 and friends #7426
Conversation
When compiling 64-bit programs, goth gcc and clang support an 128-bit unsigned integer type. Using it to define the `ErtsDoubleDigit` type will speed up multiplication and division.
550768e
to
858c4f1
Compare
@@ -37,8 +37,11 @@ typedef Uint16 ErtsHalfDigit; | |||
#undef BIG_HAVE_DOUBLE_DIGIT | |||
typedef Uint16 ErtsHalfDigit; | |||
|
|||
#elif (SIZEOF_VOID_P == 8) && defined(__GNUC__) && (__GNUC__ >= 4) | |||
typedef __uint128_t ErtsDoubleDigit; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this universally available on all 64-bit targets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to be. I tried all 64-bit gcc and clang targets I could find on godbolt.org.
# define BIG_ARITY_MAX ((1 << 16)-1) | ||
#else | ||
# define BIG_ARITY_MAX ((1 << 17)-1) | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason for this restriction? The commit doesn't say.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and it looks a bit suspicious that the limit is larger on 64b architectures than on 32b architectures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the same in bytes though, as these values are in words (pointer-sized integers).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We wanted to decrease the maximum execution time for a single arithmetic operation. We chose a new limit that we hope don't affect real applications of large integer arithmetic, but that will produce a system_limit
exception faster for accidental use of large integers.
I've pushed a fixup commit that simplifies the last part of the Karatsuba calculation. |
erts/preloaded/src/erlang.erl
Outdated
%% Too large even for base 2. | ||
erlang:exit(system_limit); | ||
Size > 1262611, Base >= 10 -> | ||
erlang:exit(system_limit); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are these limits chosen?
Actually, what I'd like to know is this: what's the longest it can take (on a typical computer), within these limits?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well under a second, at least on 64-bit systems.
On my slowest computer (MacBook Pro from 2013, 2.6 GHz, Intel Core i5) the conversion finishes in about 0.63 seconds. On a HP laptop running Linux (11th Gen Intel, i7-1185G7 @ 3.00 GHz) the time is 0.34 seconds.
Acc | ||
end. | ||
|
||
combine(L0, Radix) -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for rewriting it this way! It's nice to have better asymptotics.
I wonder if it would be faster to revert to linear scan for relatively short integers. It has worse asymptotics but maybe better constants and more cache friendliness.
What I mean is this: instead of starting with single digits, start with numbers made out of N digits (where N can be tuned based on benchmarks, but probably in the region of 10 or 100). And to get these N-digit numbers, do the linear loop (with result:=10*result+digit), as it was before in C.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I see you start not from single digits but from what fits in a word. I suppose I'm asking if it's not worth doing linear scan algo on those "small" integers, as long as the scan isn't too long, because constants&cache.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The helper BIF will convert decimal numbers with up to 17 digits efficiently. We think that will take care of the majority of real-world use cases. Therefore, I have optimized the helper BIFs for correctness and simplicity.
However, after discussions with @jhogberg and @garazdawi I now think that spawning a temporary process is a bad idea and I have now eliminated that extra process. Not spawning the process will improve performance for binaries with more than 17 decimal digits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fwiw, i tried my proposal on some toy code, and it's not only more complicated but also slower
The helper BIF will convert decimal numbers with up to 17 digits efficiently. We think that will take care of the majority of real-world use cases.
I think 64b integers are a common case, especially when interacting with other languages. Probably not worth optimizing for it at this point.
Not spawning the process will improve performance [...]
I didn't understand why there's a process spawned there, but I assumed it's my lack of Erlang knowledge. (Was it perhaps to make sure that timeouts apply to the main process even if the parsing process somwhow gets stuck?? Just a wild guess.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think 64b integers are a common case, especially when interacting with other languages. Probably not worth optimizing for it at this point.
If someone can show real-world use cases where integers whose absolute value are greater than 99999999999999999 are frequently used we could optimize this in OTP 27.
I didn't understand why there's a process spawned there, but I assumed it's my lack of Erlang knowledge. (Was it perhaps to make sure that timeouts apply to the main process even if the parsing process somwhow gets stuck?? Just a wild guess.)
It was an attempt at optimization (reducing garbage collection times), but after some more thinking it became clear that it in most cases it would be a pessimization.
@@ -1371,15 +1543,52 @@ list_to_float(_String) -> | |||
%% list_to_integer/1 | |||
-spec list_to_integer(String) -> integer() when | |||
String :: string(). | |||
list_to_integer(_String) -> | |||
erlang:nif_error(undefined). | |||
list_to_integer(String) -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious: Why is this not implemented as list_to_integer(String, 10)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would result in a stacktrace pointing out the wrong function if it fails.
{-1,""} = test_to_integer("-1"), | ||
{1,"="} = test_to_integer("1="), | ||
{7,"F"} = test_to_integer("7F"), | ||
{709,""} = test_to_integer("709"), | ||
{709,"*2"} = test_to_integer("709*2"), | ||
{0,"xAB"} = test_to_integer("0xAB"), | ||
{16,"#FF"} = test_to_integer("16#FF"), | ||
|
||
%% Test a bignum. | ||
Big = 12385792987438978973984398348974398593, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Would be nice to have more than one number. Maybe some powers of 2, powers of 10, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
string:list_to_integer/1
is implemented using binary_to_integer/1
, which is thoroughly tested by num_bif_SUITE
. Here I just test that handling of signs and trailing non-digits are handled correctly.
2523e03
to
354cfe9
Compare
list_to_integer(String) -> | ||
Base = 10, | ||
case erts_internal:list_to_integer(String, Base) of | ||
{Int,[]} -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it would make sense to just return the integer here to avoid the overhead of the unconditional allocation. The string
function could wrap the result in a tuple, so we only "pay" for the allocation where necessary.
Shouldn't be a big impact, but for such low-level functions IMO every little thing matters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a change I want to make now, in case someone is using the BIF directly despite it being undocumented.
4d6c483
to
a769c1f
Compare
This new 4 Mbit limit should still be sufficiently large for practical applications.
These will be needed in a future commit when we will rewrite some exsisting BIFs in Erlang.
Use a divide-and-conquer algorithm to lower the complexity for the BIFs that convert binaries or strings to integers: * erlang:list_to_integer/1 * erlang:binary_to_integer/1 * erlang:binary_to_integer/2 * erlang:list_to_integer/2 * string:list_to_integer/1
a769c1f
to
86fedb2
Compare
No description provided.