
Performance OTP27 on binary_to_term slower #8329

Open · ThomasArts opened this issue Apr 2, 2024 · 11 comments
Assignees
Labels
bug Issue is reported as a bug team:VM Assigned to OTP team VM

Comments

@ThomasArts (Contributor)

Describe the bug
It seems that binary_to_term got slower in OTP 27.

To Reproduce
I used the following simple approach: generate a rather large term, then put N copies of that term, as binaries, in a list.
We run binary_to_term on each copy. The expectation is that it is as fast as on OTP 26.1.2, but it isn't.

On OTP26, roughly:

Erlang/OTP 26 [erts-14.2.1] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [jit]

Eshell V14.2.1 (press Ctrl+G to abort, type help(). for help)
3> [ppp:test(100) || _ <- lists:seq(1, 10)], ok.
Took 108.98 ms
Took 80.54 ms
Took 80.28 ms
Took 80.18 ms
...
ok
4> [ppp:test(1000) || _ <- lists:seq(1, 10)], ok.
Took 921.97 ms
Took 908.09 ms
Took 907.46 ms
...
ok
5> [ppp:test(10000) || _ <- lists:seq(1, 10)], ok.
Took 12371.10 ms
Took 12916.74 ms
Took 12741.55 ms
...

But on the latest master (source-8504d0e0b8), the 10k test gets much slower!

Erlang/OTP 27 [RELEASE CANDIDATE 2] [erts-14.2.3] [source-8504d0e0b8] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [jit]

Eshell V14.2.3 (press Ctrl+G to abort, type help(). for help)
1> [ppp:test(100) || _ <- lists:seq(1, 10)], ok.
Took 104.34 ms
Took 96.72 ms
Took 97.04 ms
...
ok
2> [ppp:test(1000) || _ <- lists:seq(1, 10)], ok.
Took 1083.05 ms
Took 1079.13 ms
Took 1075.11 ms
...
ok
3> [ppp:test(10000) || _ <- lists:seq(1, 10)], ok.
Took 14786.23 ms
Took 14768.29 ms
Took 15070.95 ms
...

Test program

-module(ppp).
-compile([export_all, nowarn_export_all]).

%% Decode N copies of a largish term and print the elapsed time.
test(N) ->
  erlang:garbage_collect(),
  Terms = terms(10, 3),
  %% N copies of the same binary
  Bins = lists:duplicate(N, term_to_binary(Terms)),
  {T, _} = timer:tc(fun() -> b2t(Bins) end),
  io:format("Took ~.2f ms\n", [T / 1000]).

%% Build a nested list of depth Depth with L {Index, Binary} tuples at the leaves.
terms(L, 0) ->
  [ {I, crypto:strong_rand_bytes(8)} || I <- lists:seq(1, L)];
terms(L, Depth) ->
  [ terms(L, Depth-1) || _ <- lists:seq(1, L) ].

%% Decode each binary, keeping the results.
b2t([]) ->
  [];  %% was `ok`, which made the result an improper list
b2t([B | Bs]) ->
  T = binary_to_term(B),
  [T | b2t(Bs)].
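As a sketch (a hypothetical helper, not part of the original report), a variant of b2t/1 that discards the decoded terms; this isolates the binary_to_term cost from building the result list:

```erlang
%% Hypothetical variant: decode each binary and throw the term away, so the
%% timing reflects binary_to_term itself rather than list construction.
b2t_drop([]) ->
    ok;
b2t_drop([B | Bs]) ->
    _ = binary_to_term(B),
    b2t_drop(Bs).
```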

Expected behavior
Speeds comparable to OTP 26 are expected.

Affected versions
This seems introduced in commit: 24ef4cb for OTP27.
The last commit with faster times is: 49024e8

Additional context
This is observed when testing riak and may be related to issue: #8229

@ThomasArts ThomasArts added the bug Issue is reported as a bug label Apr 2, 2024
@ThomasArts ThomasArts changed the title Performance OTP27 on binary_to_term slower and heavy non-linear Performance OTP27 on binary_to_term slower Apr 2, 2024
@jhogberg jhogberg self-assigned this Apr 2, 2024
@jhogberg jhogberg added the team:VM Assigned to OTP team VM label Apr 2, 2024
@jhogberg (Contributor) commented Apr 2, 2024

Thanks, I have a rough idea of what might have caused this. I'll have a deeper look at this later in the week. :-)

@jhogberg (Contributor) commented Apr 4, 2024

I can reproduce a slight slowdown with 24ef4cb, but I'm really puzzled to see it on 8504d0e as 3fcd74c fixes it completely for me.

What platform is this? Have you turned off all kinds of CPU frequency scaling?

@jhogberg (Contributor) commented Apr 5, 2024

I've opened #8347, which should improve things a bit. I'm seeing slightly better results locally, but I'm still a bit puzzled as to why it was so much slower on your machine.

@mkuratczyk (Contributor)

I've run Thomas's test on my machines:

Linux, i9-10900X CPU @ 3.70GHz (times in ms):

| N     | 26.2.3 | master | pr8347 |
|-------|--------|--------|--------|
| 100   | 139    | 143    | 123    |
| 1000  | 1580   | 1615   | 1780   |
| 10000 | 18500  | 18160  | 17250  |

M1 Pro, macOS 14.4.1 (times in ms):

| N     | 26.2.2 | master | pr8347 |
|-------|--------|--------|--------|
| 100   | 170    | 174    | 173    |
| 1000  | 2240   | 2330   | 2340   |
| 10000 | 29400  | 30500  | 30400  |

@ThomasArts (Contributor, Author) commented Apr 8, 2024

Just to be sure, I ran the test again on a Mac M2 Pro (Ventura 13.6), with results comparable to the earlier run:

M2 Pro, macOS 13.6 (32 GB), times in ms:

| N     | 26.2.1 | 26.2.2 | master (8504d0e) | pr8347 (436568a) |
|-------|--------|--------|------------------|------------------|
| 100   | 80     | 81     | 95               | 95               |
| 1000  | 920    | 930    | 1050             | 1060             |
| 10000 | 13400  | 13400  | 15700            | 15200            |

I'll try to build pr8347 as well to see if I can relate it to the above numbers.
It might, of course, be something M2-specific. I built from source.

@mkuratczyk (Contributor)

This is tangential to the main topic here, but how come your M2 Pro is twice as fast as my M1 Pro @ThomasArts? :)
According to generic benchmarks, there should be no more than a 10-20% performance difference between these chips. After I saw your results, I compiled Erlang without extra microstate accounting and without DTrace support, to avoid any overhead from these features. I also made sure to test without using the shell (erlperf -pa . 'ppp:test(100).'), but I still get ~170ms results. Do you do anything special when running your tests? What flags did you compile Erlang with?

I don't see you on the Erlanger Slack. If you don't mind joining and chatting there for a bit, I'd appreciate it. Hopefully we can all learn something about Erlang/OTP performance on Apple Silicon (or in general, based on what we find).

@bjorng (Contributor) commented Apr 8, 2024

This is tangential to the main topic here, but how come your M2 Pro is twice as fast as my M1 Pro

Have you compiled all Erlang code you are running with the Erlang compiler in Erlang/OTP 27?

The format of the type information in BEAM files is different in OTP 26 and OTP 27. If the format of the type information is not the expected one, the JIT cannot do any type-guided optimizations and will potentially emit worse code.

This is just a wild guess.
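As a sketch of how one might check this guess: the type information lives in the "Type" chunk of a BEAM file (present since OTP 25), and beam_lib can report whether a module carries it. The function below is an illustration, not an official check; `has_type_chunk/1` is a made-up name.

```erlang
%% Sketch: report whether a loaded module's BEAM file contains the "Type"
%% chunk that the JIT's type-guided optimizations rely on. Modules compiled
%% by an older compiler may lack it, or carry it in an unexpected format.
has_type_chunk(Mod) ->
    Beam = code:which(Mod),
    case beam_lib:chunks(Beam, ["Type"]) of
        {ok, {Mod, [{"Type", _Data}]}} -> true;
        {error, beam_lib, _Reason}     -> false
    end.
```

Note this only shows whether the chunk exists, not whether its format matches what the running JIT expects.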

@mkuratczyk (Contributor)

Yes, I compile the code with the version I use at runtime. I built from master today, compiled ppp.erl with this version, and ran it with erlperf as above; my times are ~170ms, which is about twice what Thomas gets.

@mkuratczyk (Contributor) commented Apr 8, 2024

Oh, I found the culprit. When debugging a different issue recently, I set CFLAGS to -O0. After setting it back to -O3, I get ~87ms.

Just as an anecdote: the issue I was debugging turned out to be a problem running a Docker image built for x86 on an ARM MacBook. That generally works thanks to all the Docker/Rosetta 2 magic, but recently we made some changes in RabbitMQ that probably triggered some bug (in Rosetta 2, I guess? Not sure; too much low-level magic). Anyway, we got some crazy failures where the reported stacktraces didn't match the source code, which is why I suspected that GCC optimisations might have something to do with it.
The problem still exists, but since one needs to run an image of the "incorrect" architecture, I haven't reported it. Instead, we are working on having ARM builds of the Docker images in our pipelines (they've been available for the final releases for a long time; I'm just talking about our internal builds).

@max-au (Contributor) commented Apr 15, 2024

I'm interested in figuring out a way to tell which options were used to build ERTS. Specifically, I want erlperf to output some sort of warning: "your ERTS is built with these flags". That would prevent situations like the one above.

@nickva (Contributor) commented Apr 15, 2024

@max-au there is erlang:system_info(compile_info) that returns a proplist with cflags, ldflags and config_h flags. Maybe that could help?

1> proplists:get_value(cflags, erlang:system_info(compile_info)).
"-Werror=undef -Werror=implicit -Werror=return-type  -fno-common -g -O2 
-I...asdf_24.3.4.16/otp_src_24.3.4.16/erts/x86_64-apple-darwin22.6.0    
-DHAVE_CONFIG_H 
-Wall -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Wdeclaration-after-statement 
-DUSE_THREADS -D_THREAD_SAFE -D_REENTRANT -DPOSIX_THREADS   -DBEAMASM=1"
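Building on that, a sketch of the kind of check @max-au describes: scan the recorded cflags for a low optimization level and warn. The function name and warning text are illustrative, not part of erlperf.

```erlang
%% Sketch: warn if the running ERTS appears to have been built without
%% optimization, using the cflags recorded at build time in
%% erlang:system_info(compile_info).
check_erts_build() ->
    CFlags = proplists:get_value(cflags, erlang:system_info(compile_info), ""),
    case string:find(CFlags, "-O0") of
        nomatch ->
            ok;
        _ ->
            {warning, "ERTS was built with -O0; benchmark results may be misleading"}
    end.
```

A tool could print the whole cflags/ldflags strings as well, so users can spot debug-oriented builds at a glance.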


6 participants