Inline masking #83
Inlining the (trivial) |
On second thought, I've pushed UTF-8 validation improvements upstream: elixir-lang/elixir#12354. This looks great! Merging! |
Bad news: this is actually quite a bit slower on x86: Benchmark
I've rescinded the upstream PR, and think I'll likely back this one out too. WDYT? |
In general it's surprising how often things end up benching differently between Apple Silicon and x86. |
Looks like it is not just between Apple and x86, but even between Intel and AMD:
AMD Benchmark
Operating System: Windows
CPU Information: AMD Ryzen 7 3700X 8-Core Processor
Number of Available Cores: 16
Available memory: 15.93 GB
Elixir 1.13.2
Erlang 24.1.7
##### With input huge #####
Name ips average deviation median 99th %
adaptive_512 168.38 5.94 ms ±5.96% 5.94 ms 7.17 ms
new_512 161.73 6.18 ms ±5.81% 6.25 ms 7.48 ms
old 27.98 35.74 ms ±2.79% 35.53 ms 39.01 ms
Comparison:
adaptive_512 168.38
new_512 161.73 - 1.04x slower +0.24 ms
old 27.98 - 6.02x slower +29.80 ms
Memory usage statistics:
Name Memory usage
adaptive_512 4.11 MB
new_512 4.41 MB - 1.07x memory usage +0.30 MB
old 22.89 MB - 5.57x memory usage +18.78 MB
**All measurements for memory usage were the same**
##### With input large #####
Name ips average deviation median 99th %
adaptive_512 1.70 K 587.95 μs ±19.27% 614.40 μs 921.60 μs
new_512 1.63 K 611.64 μs ±16.68% 614.40 μs 921.60 μs
old 0.37 K 2708.38 μs ±7.54% 2662.40 μs 3481.60 μs
Comparison:
adaptive_512 1.70 K
new_512 1.63 K - 1.04x slower +23.69 μs
old 0.37 K - 4.61x slower +2120.43 μs
Memory usage statistics:
Name Memory usage
adaptive_512 422.02 KB
new_512 452.69 KB - 1.07x memory usage +30.66 KB
old 2344.01 KB - 5.55x memory usage +1921.98 KB
**All measurements for memory usage were the same**
##### With input medium #####
Name ips average deviation median 99th %
adaptive_512 15.27 K 65.48 μs ±18.60% 61.44 μs 92.16 μs
new_512 14.89 K 67.18 μs ±192.73% 0 μs 819.20 μs
old 3.53 K 283.30 μs ±48.07% 307.20 μs 614.40 μs
Comparison:
adaptive_512 15.27 K
new_512 14.89 K - 1.03x slower +1.69 μs
old 3.53 K - 4.33x slower +217.82 μs
Memory usage statistics:
Name Memory usage
adaptive_512 42.77 KB
new_512 45.89 KB - 1.07x memory usage +3.13 KB
old 234.74 KB - 5.49x memory usage +191.98 KB
**All measurements for memory usage were the same**
##### With input micro #####
Name ips average deviation median 99th %
old 1.31 M 761.90 ns ±26.03% 716.80 ns 1228.80 ns
new_512 1.12 M 890.21 ns ±24.17% 921.60 ns 1331.20 ns
adaptive_512 0.99 M 1005.94 ns ±210.98% 1024 ns 12288 ns
Comparison:
old 1.31 M
new_512 1.12 M - 1.17x slower +128.31 ns
adaptive_512 0.99 M - 1.32x slower +244.05 ns
Memory usage statistics:
Name Memory usage
old 528 B
new_512 608 B - 1.15x memory usage +80 B
adaptive_512 568 B - 1.08x memory usage +40 B
**All measurements for memory usage were the same**
##### With input small #####
Name ips average deviation median 99th %
new_512 108.55 K 9.21 μs ±215.74% 10.24 μs 122.88 μs
adaptive_512 81.60 K 12.26 μs ±1243.51% 0 μs 102.40 μs
old 29.47 K 33.93 μs ±48.36% 30.72 μs 71.68 μs
Comparison:
new_512 108.55 K
adaptive_512 81.60 K - 1.33x slower +3.04 μs
old 29.47 K - 3.68x slower +24.72 μs
Memory usage statistics:
Name Memory usage
new_512 5.70 KB
adaptive_512 5.23 KB - 0.92x memory usage -0.46875 KB
old 23.78 KB - 4.18x memory usage +18.09 KB
**All measurements for memory usage were the same**
##### With input tiny #####
Name ips average deviation median 99th %
adaptive_512 460.18 K 2.17 μs ±91.32% 1.02 μs 9.22 μs
new_512 452.29 K 2.21 μs ±84.12% 1.02 μs 8.19 μs
old 241.77 K 4.14 μs ±35.99% 4.10 μs 8.19 μs
Comparison:
adaptive_512 460.18 K
new_512 452.29 K - 1.02x slower +0.0379 μs
old 241.77 K - 1.90x slower +1.96 μs
Memory usage statistics:
Name Memory usage
adaptive_512 1.36 KB
new_512 1.55 KB - 1.14x memory usage +0.195 KB
old 2.69 KB - 1.98x memory usage +1.33 KB
**All measurements for memory usage were the same** |
Wild. In any case there's enough of a downside on enough platforms that I think we ought to pull this. Too bad. 😞 |
One thing that jumps out is the difference in available cores. The Apple test shows 10 cores, AMD shows 16 cores, and Intel Xeon shows 2 cores. I don't know enough about the benchmark you're running but if it's heavy on parallelization that might explain the differences. |
Reverted at 934106f |
Inlining the mask function seems to yield some more performance benefits.
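For readers unfamiliar with inlining in Elixir, here is a minimal sketch of one way to do it. The thread does not show the actual mask function, so everything below is an assumption for illustration: the module name `MaskSketch`, the helper `mask_word/2`, and the idea that the mask is an XOR against a repeating 32-bit key. It uses the `@compile {:inline, ...}` module attribute; the PR itself may instead have pasted the function body into the call site by hand.

```elixir
defmodule MaskSketch do
  # Hypothetical module: illustrates inlining a trivial helper, not the PR's code.
  import Bitwise

  # Ask the compiler to inline mask_word/2 at its call sites.
  @compile {:inline, mask_word: 2}

  # XOR a binary payload against a repeating 32-bit integer key.
  def mask(payload, key) when is_binary(payload) and is_integer(key) do
    do_mask(payload, key, [])
  end

  # Fast path: consume four bytes at a time.
  defp do_mask(<<word::32, rest::binary>>, key, acc) do
    do_mask(rest, key, [acc, <<mask_word(word, key)::32>>])
  end

  # Fewer than four bytes remain: XOR them against the leading bytes of the key.
  defp do_mask(<<tail::binary>>, key, acc) do
    size = bit_size(tail)
    <<tail_int::size(size)>> = tail
    <<key_part::size(size), _::bitstring>> = <<key::32>>
    IO.iodata_to_binary([acc, <<bxor(tail_int, key_part)::size(size)>>])
  end

  # The trivial helper that gets inlined (assumed to be a plain XOR).
  defp mask_word(word, key), do: bxor(word, key)
end
```

Because XOR is its own inverse, calling `MaskSketch.mask/2` twice with the same key returns the original payload. Whether inlining a helper like this helps is platform-dependent, as the benchmarks in this thread show.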
Benchmark