ERL-972: Long-Term SSL/TLS Performance Regression #4308

Closed
OTP-Maintainer opened this issue Jun 11, 2019 · 8 comments
Labels
bug (Issue is reported as a bug), cannot reproduce (Cannot be reproduced by OTP), priority:medium, team:PS (Assigned to OTP team PS)

Comments


Original reporter: ferd
Affected version: OTP-22.0.2
Component: ssl
Migrated from: https://bugs.erlang.org/browse/ERL-972


I was recently doing some fiddling with scripts and applications and noticed a rather significant performance regression when comparing older Erlang versions (19.3.1 and to a lesser extent 20.3.1) with newer ones (21.3.8.3 and 22.0).

All the measurements in this issue were taken by running the scripts at https://github.com/ferd/snit/tree/perf-regression (the branch is important) with various versions, on Linux. The script listens on port 8080 and expects `localhost` to map to `127.0.0.1`.

1. Compile the `snit` application with rebar3 on OTP-19.3.1 (the last one I checked to be fast enough)
2. Run the application by calling `rebar3 as demo shell` and then boot the test server by calling `snit_infinite_stream:run().`
3. In another terminal window, run the client script by calling `escript tls_script.erl` on OTP-19.3.1, and note the delays per 10,000 packets sent after a few seconds
4. Stop the script
5. Change the version of the current terminal window (the one running the client) to OTP-22.0
6. Run the client script again by calling `escript tls_script.erl` on OTP-22.0, and note the delays per 10,000 packets sent after a few seconds (the full sequence is condensed in the sketch below)
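
The same steps, condensed into the two terminal sessions (the OTP activation lines are illustrative; substitute whatever you use to switch versions locally):

```
# Terminal 1: server, on OTP-19.3.1
$ rebar3 as demo shell
1> snit_infinite_stream:run().

# Terminal 2: client, once per OTP version under test
$ . ~/erlangs/19.3.1/activate    # kerl-style activation script; path is illustrative
$ escript tls_script.erl
$ . ~/erlangs/22.0/activate
$ escript tls_script.erl
```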

The end result is that, according to my experiments, the OTP-22 client is 2x-5x slower than the OTP-19 version. OTP-20.3.1 appeared to be a _bit_ slower than 19, but not significantly so.

For example, in my case, OTP-19.3.1 gives runs ranging from 100ms to 175ms per 10,000 packets sent over the loopback interface, while OTP-22.0 gives me 380ms to 600ms per 10,000 packets. The effect is more significant when packets are smaller, which to me points towards a cost in packet handling rather than in any cryptographic functions, but I haven't dug into it in much detail.

I've also attached snapshots of the `msacc` values over 1 second that appeared representative of each run. The more detailed view (with the dirty schedulers) is from OTP-22. You can see in particular that port activity is eating into emulator time.
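
For reference, a snapshot like that can be gathered straight from the Erlang shell of the node under test with the `msacc` module from runtime_tools (standard OTP tooling, not part of the test scripts):

```erlang
%% Enable microstate accounting for ~1 second, then print the
%% per-thread breakdown (emulator, port, GC time, etc.).
msacc:start(1000),
msacc:print().
```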

I would usually not worry too much about small regressions, but a 2x-5x slowdown is worrisome.

This might also be related to https://bugs.erlang.org/browse/ERL-934? That issue was marked as fixed before 21.3.8.3, but I reproduced the performance drop on that version as well.

ingela said:

What platform have you run the tests on? We will of course look into this and try to run your tests too.


ferd said:

I ran these more directly on an old-ish VPS with 2GB RAM because that's the only one I had around:

```
$ uname -a
Linux stacktrace 2.6.32-042stable136.1 #1 SMP Wed Feb 27 09:04:24 MSK 2019 x86_64 x86_64 x86_64 GNU/Linux
```


That's the instance the results in my report come from. I've had similar (but not as dire) observations on a MacBook Air running OS X.


ingela said:

Just wanted to make sure it was not Windows, as there is a bug on Windows with `{active, N}` that can manifest itself as performance degradation.
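
For context, "active n" refers to the `{active, N}` socket delivery mode; roughly something like this on an established TLS socket, where `Socket` is a placeholder:

```erlang
%% Re-arm the socket to deliver up to 100 messages before it drops
%% back to passive mode. {active, N} is supported for TLS sockets
%% since OTP 21.3; Socket stands for an already-established ssl socket.
ok = ssl:setopts(Socket, [{active, 100}]).
```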


ingela said:

Trying to follow the instructions above, I get the following result:

```
> rebar3 as demo shell

===> Verifying dependencies...
===> Compiling snit
Erlang/OTP 22 [erts-10.4.1] [source-898b9b35f2] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [hipe]

Eshell V10.4.1  (abort with ^G)
1> snit_infinite_stream:run().
** exception error: undefined function snit_infinite_stream:run/0
2>
```


ferd said:

Did you check out the right branch (perf-regression)?  The file is there: https://github.com/ferd/snit/blob/perf-regression/src/snit_infinite_stream.erl#L57
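
For anyone reproducing this, fetching that branch is just standard git usage:

```
$ git clone https://github.com/ferd/snit.git
$ cd snit
$ git checkout perf-regression
```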

I only used `snit` as a test bed because it would normalize all the options for SSL over many versions for me, but that's the one relevant file.


ingela said:

Ah, I checked your test program and I found a problem with it: it treats the TCP stream as if it were packet-based, which it is not.

If you change your test to receive a fixed number of bytes on each pass through the loop, instead of whatever happens to be currently available, you will get a different result.


```erlang
loop(S, N, T0) ->
    case ssl:recv(S, 1000) of
        {ok, _Data} ->
            case N >= 10000 of
                true ->
                    T1 = erlang:monotonic_time(millisecond),
                    io:format("10000 packets in ~pms~n", [T1-T0]),
                    loop(S, 0, T1);
                false ->
                    loop(S, N+1, T0)
            end;
        {error, R} ->
            io:format("Stopping for reason ~p~n", [R]),
            halt(0)
    end.
```


ferd said:

You're right. I've tried the alternative of just counting bytes and measuring how long each N kilobytes takes (to avoid accidentally testing accumulation rather than the low-latency "give what you have" path), and the performance across OTP-19 and OTP-22 is roughly the same.
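
A minimal sketch of that byte-counting variant (a reconstruction rather than the exact script, assuming a binary-mode socket):

```erlang
%% Time how long it takes to receive each 10 MB, regardless of how the
%% stream is chunked. ssl:recv(S, 0) returns whatever data is currently
%% available; the socket is assumed to be in binary mode.
-define(CHUNK, 10 * 1024 * 1024).

byte_loop(S, Bytes, T0) when Bytes >= ?CHUNK ->
    T1 = erlang:monotonic_time(millisecond),
    io:format("~p bytes in ~pms~n", [Bytes, T1 - T0]),
    byte_loop(S, 0, T1);
byte_loop(S, Bytes, T0) ->
    case ssl:recv(S, 0) of
        {ok, Data} ->
            byte_loop(S, Bytes + byte_size(Data), T0);
        {error, R} ->
            io:format("Stopping for reason ~p~n", [R]),
            halt(0)
    end.
```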

OTP-22 seemed a bit more "jumpy", but that could very well be down to small changes in scheduling or BIF yielding, considering I'm testing with a single process and a node as a whole might be healthier.

You can consider this closed. I guess I've just been surprised by the change of packet handling mechanism. I don't think any code I use would be hurt by that, but that's a subtle difference :)


ingela said:

We have made quite a lot of changes to increase throughput lately, especially for Erlang distribution over TLS, but many of those changes should have a positive impact on normal TLS connections too. Depending on the use case, though, it might not be a big improvement.

But timing has definitely changed a lot, and I would say the old implementation was probably more predictable but had higher latency.

OTP-Maintainer added the cannot reproduce, bug, team:PS, and priority:medium labels on Feb 10, 2021