
ERL-366: Given large binaries, base64:mime_decode/1 has 58% throughput of base64:decode/1 #3455

Closed
OTP-Maintainer opened this issue Feb 23, 2017 · 3 comments
Labels: bug (Issue is reported as a bug), priority:low, team:VM (Assigned to OTP team VM)

Original reporter: jessestimpson
Affected version: OTP-20.0
Fixed in version: OTP-20.0
Component: stdlib
Migrated from: https://bugs.erlang.org/browse/ERL-366


While profiling my application, which handles many large base64 binaries from an external entity (it's an SMTP server), I noticed that base64:mime_decode/1 is significantly slower than base64:decode/1.

On my local machine, I've produced the following timings from a 400000 byte base64-encoded binary. I decoded it 1000 times and computed the average.

Function                | Avg exec   |   Throughput
------------------------+------------+---------------
base64:decode/1         | 0.017 s    |   22.502 MiB/s
base64:mime_decode/1    | 0.029 s    |   12.953 MiB/s

CPU: 2.6 GHz Intel Core i7
Memory: 16 GB 1600 MHz DDR3
OS: macOS 10.11.4
Erlang: Erlang/OTP 20 [DEVELOPMENT] [erts-8.2.2].

You can see that mime_decode/1 runs at 58% of the throughput of decode/1. I've attached the module I used to produce these results.
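The attached module isn't reproduced in this migrated issue, so as a hypothetical reconstruction (module and function names are my own, not the reporter's), a benchmark along these lines could look like:

```erlang
-module(b64_bench).
-export([run/0, run/2]).

%% Hypothetical reconstruction of the benchmark described above:
%% decode a ~400000-byte base64 binary N times and report the average.
run() ->
    run(300000, 1000).  %% 300000 raw bytes encode to 400000 base64 bytes

run(RawSize, N) ->
    Data = base64:encode(crypto:strong_rand_bytes(RawSize)),
    bench("base64:decode/1", fun base64:decode/1, Data, N),
    bench("base64:mime_decode/1", fun base64:mime_decode/1, Data, N),
    ok.

bench(Name, Fun, Data, N) ->
    {Micros, ok} = timer:tc(fun() -> loop(Fun, Data, N) end),
    AvgSecs = Micros / N / 1.0e6,
    MiBPerSec = byte_size(Data) / (1024 * 1024) / AvgSecs,
    io:format("~-22s | ~.3f s | ~.3f MiB/s~n", [Name, AvgSecs, MiBPerSec]).

loop(_Fun, _Data, 0) -> ok;
loop(Fun, Data, N) ->
    _ = Fun(Data),
    loop(Fun, Data, N - 1).
```

Absolute numbers will of course vary by machine and OTP version; only the ratio between the two functions matters here.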

I dug into the source and saw that mime_decode/1 isn't doing much more work, but its implementation is structurally different. I have a hunch that the compiler is unable to fully optimize binary matching with the implementation structured this way.

I plan on submitting a Pull Request that includes a discussion of this implementation, and my suggestions, but I also wanted to include a short version here, so you can see the full picture in one place.

I was able to isolate the performance degradation to the pattern matching that occurs after the call to function tail_contains_more/2. When I restructured the implementation of mime_decode/1 to avoid this, I saw a dramatic improvement, using the same test module:

Function                | Avg exec   |   Throughput
------------------------+------------+---------------
base64:decode/1         | 0.018 s    |   21.170 MiB/s
base64:mime_decode/1    | 0.016 s    |   23.519 MiB/s
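To illustrate the kind of structural issue being described (this is a toy sketch, not the actual base64.erl code): the compiler's delayed sub-binary optimization applies only when every function on the recursion path begins with a binary match, so routing the tail through a helper that matches a plain variable forces a sub-binary to be built on each step.

```erlang
-module(binopt_demo).
-export([strip_eq_slow/1, strip_eq_fast/1]).
-compile(bin_opt_info).  %% ask the compiler to report on binary-match optimization

%% Toy example: strip leading '=' bytes from a binary.

%% Slow shape: the recursion goes through continue/1, whose clause does
%% not begin with a binary match, so the match context cannot be reused.
strip_eq_slow(<<$=, Rest/binary>>) -> continue(Rest);
strip_eq_slow(Bin) -> Bin.

continue(Rest) ->
    strip_eq_slow(Rest).

%% Fast shape: every clause on the recursion path begins with a binary
%% match, so the compiler can delay sub-binary creation.
strip_eq_fast(<<$=, Rest/binary>>) -> strip_eq_fast(Rest);
strip_eq_fast(Bin) -> Bin.
```

Both functions return the same result; the difference shows up only in the bin_opt_info output and in throughput on large binaries. Exact warning text depends on the OTP version.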

This will be my first Pull Request to Erlang/OTP, so it may take me some time to format it correctly, run the tests, etc. Please bear with me as I get up to speed!

bjorn said:

Thanks!

We will wait for your pull request.

A tip: You can use the option bin_opt_info to verify whether optimizations of binary matching are applied or not (see the Efficiency Guide).
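For reference (assuming a standard OTP toolchain), the option can be passed either on the erlc command line or as a module attribute; a minimal sketch:

```erlang
%% On the command line:   erlc +bin_opt_info mymod.erl
%% Or inside the module itself:
-module(mymod).
-compile(bin_opt_info).
-export([count/1]).

%% Any binary-matching function will do: the compiler emits a warning
%% line for each binary match it could not optimize.
count(Bin) -> count(Bin, 0).

count(<<_, Rest/binary>>, N) -> count(Rest, N + 1);
count(<<>>, N) -> N.
```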


jessestimpson said:

Indeed, compiling with bin_opt_info reveals the problem:

lib/stdlib/src/base64.erl:220: Warning: NOT OPTIMIZED: called function tail_contains_more/2 does not begin with a suitable binary matching instruction

Thanks so much for the tip! I will include this information in the PR.


jessestimpson said:

My PR can be found here:
https://github.com/erlang/otp/pull/1354

@OTP-Maintainer OTP-Maintainer added bug Issue is reported as a bug team:VM Assigned to OTP team VM priority:low labels Feb 10, 2021
@OTP-Maintainer OTP-Maintainer added this to the OTP-20.0 milestone Feb 10, 2021