Base64 Zig test has been added. #188
Conversation
Here is an improved version:
```zig
const std = @import("std");
const time = std.time;
const Timer = time.Timer;

const str_size = 10000000;
const tries = 100;

var str1: [str_size]u8 = undefined;
var str2: [std.base64.Base64Encoder.calcSize(str1.len)]u8 = undefined;
var str3: [str_size]u8 = undefined;

pub fn main() !void {
    var stdout_file = try std.io.getStdOut();
    const stdout = &stdout_file.outStream().stream;

    std.mem.set(u8, &str1, 'a');

    var timer = try Timer.start();
    var s: usize = 0;
    var i: usize = 0;
    var start = timer.lap();
    while (i < tries) : (i += 1) {
        std.base64.standard_encoder.encode(&str2, &str1);
        s += str2.len;
    }
    var end = timer.read();
    var elapsed_s = @intToFloat(f64, end - start) / time.ns_per_s;
    try stdout.print("encode: {}, {d:.2}\n", s, elapsed_s);

    i = 0;
    s = 0;
    start = timer.lap();
    while (i < tries) : (i += 1) {
        std.base64.standard_decoder.decode(&str3, &str2) catch unreachable;
        s += str3.len;
    }
    end = timer.read();
    elapsed_s = @intToFloat(f64, end - start) / time.ns_per_s;
    try stdout.print("decode: {}, {d:.2}\n", s, elapsed_s);
}
```
Results:

```
direct_allocator:
encode: 1333333600, 1.13
decode: 1000000000, 0.12

no allocator:
encode: 1333333600, 0.46
decode: 1000000000, 0.14
```
```zig
var i: usize = 0;
var start = timer.lap();
while (i < tries) : (i += 1) {
    const str2 = try allocator.alloc(u8, std.base64.Base64Encoder.calcSize(str1.len));
```
Using `std.heap.direct_allocator` inside the hot path is no good. This is doing an `mmap` per iteration; no wonder it is so slow.

https://ziglang.org/documentation/master/#Choosing-an-Allocator
That's intentional - the benchmark is about measuring allocation plus encoding/decoding at once. Certainly, low-level languages could allocate memory in advance, use memory pools and other advanced techniques, and that would give them an unfair advantage compared with garbage-collected languages.

Moreover, `direct_allocator` using `mmap` is an implementation detail that could change in the future. The same way `malloc` uses either `sbrk` or `mmap`, the allocator could implement more advanced methods in the future, making the general use case tested in this benchmark (allocate a lot of memory and encode the blob into it) faster.
> That's intentional - the benchmark is about measuring allocation plus encoding/decoding at once.

This is a poorly defined benchmark. Your `malloc` call in the C benchmark might or might not make a syscall, for example. The appropriate allocator for this code is none. If you want to measure allocation performance, then you need to come up with an example where allocating heap memory actually makes sense. And again, your C code is not guaranteed to make syscalls either, because there is code that runs before main() which uses malloc/free.
> Moreover, `direct_allocator` using `mmap` is an implementation detail that could change in the future

No it's not; it's defined to make a syscall for every allocation and free.
The benchmark is about how average programmers would code the given task. Undoubtedly, you would write the optimized version, but the average Joe uses an NPM package for left padding (https://www.davidhaney.io/npm-left-pad-have-we-forgotten-how-to-program/). Certainly, I don't like the current state of the industry either, and would prefer that everyone wrote optimized and carefully thought-out code.

The same goes for the other tests - like bf2. It's not an optimized version, but it resembles the code that the average Joe would write. And good compilers try to optimize code like that (a good example is loop unrolling; in a perfect world the compiler wouldn't need to care about it, as decent developers could implement it on their own).

As for `direct_allocator` - sorry, it's the first day I'm using your language, so I don't know all the peculiarities yet. I meant that in the future you may have some smart allocator by default that has better heuristics than `malloc`'s.
No, this is a disingenuous argument. I did not write an optimized version. My version is clearly simpler than yours, both to read and to write.

> it's the first day I'm using your language, so I don't know all the peculiarities yet

The Zig community welcomes newbies with open arms. However, newbies should not be writing benchmarks that publicly represent the language if they are not willing to read the documentation for the functions they are calling.

I suggest you take my advice and fix the benchmark, because if you merge this pull request as is, your reputation and this project's reputation will be harmed by being associated with dishonest benchmarks.
Not using unnecessary memory allocations in a loop isn't "fancy optimizing," it's basic logic. I was going to comment that there was no way that the other implementations were this bad but some of them are. As is, these benchmarks are entirely useless, because you're not measuring the speed of the language, you're measuring completely different variables for different languages.
The base64 benchmark currently is more of a test of the allocator (Zig, C, etc) and/or VM/GC (JS, etc) than it is of actual base64 performance, and I have no doubt the rest of the benchmarks are equally off-base from what they claim to be.
If you're going to do this, at least document it. (Edit: for instance, even call it a "base64/allocation" benchmark. That'd at least make it clear this isn't just raw performance. Preferably, if "the benchmark is about how average programmers would code the given task," just say so. If you only claim that on a PR for a specific language, it doesn't look very good. Literally just add "The benchmark is about how average programmers would code the given task, and does not reflect the performance of well-written, optimized code" to the README. Problem solved.)
As @nuald said, we are measuring something like what a real task would look like; if you don't like this, I guess we are not going to merge this.
> like what a real task would look like

A real task will not use `std.heap.direct_allocator`.

> if you don't like this, I guess we are not going to merge this.

Thank you
```zig
while (i < tries) : (i += 1) {
    const str3 = try allocator.alloc(u8, try std.base64.standard_decoder.calcSize(str2));
    defer allocator.free(str3);
    try std.base64.standard_decoder.decode(str3, str2);
```
Also, here we can use `catch unreachable` (which you can see in my proposed code), because this is guaranteed not to fail: the input just came from base64-encoded output and is therefore guaranteed to be valid. This provides additional guarantees to the optimizer. I did not notice performance differences in this case, however, but it is still correct, and could improve performance in future versions of LLVM.
Partially addressing #185. Used test.c as a reference, except using the standard base64 implementation within the Zig language. I've tried both 0.5.0 and the master version - no differences whatsoever - hence I've decided to use the master version, as it looks like for non-stable releases we use the nightly builds (like for V).