Base64 Zig test has been added. #188

Closed
wants to merge 1 commit

Conversation

@nuald (Collaborator) commented Oct 20, 2019

Partially addressing #185. Used test.c as a reference, except using the standard base64 implementation within the Zig language. I've tried both 0.5.0 and the master version; no differences whatsoever, hence I've decided to use the master version, as it looks like we use nightly builds for non-stable releases (like for V).

@andrewrk left a comment

Here is an improved version:

const std = @import("std");
const time = std.time;
const Timer = time.Timer;

const str_size = 10000000;
const tries = 100;

// Fixed global buffers: the encoded buffer's size is computed at compile
// time via calcSize, so the hot path does no heap allocation.
var str1: [str_size]u8 = undefined;
var str2: [std.base64.Base64Encoder.calcSize(str1.len)]u8 = undefined;
var str3: [str_size]u8 = undefined;

pub fn main() !void {
    var stdout_file = try std.io.getStdOut();
    const stdout = &stdout_file.outStream().stream;

    std.mem.set(u8, &str1, 'a');
    var timer = try Timer.start();
    var s: usize = 0;

    var i: usize = 0;
    var start = timer.lap();
    while (i < tries) : (i += 1) {
        std.base64.standard_encoder.encode(&str2, &str1);
        s += str2.len;
    }
    var end = timer.read();
    var elapsed_s = @intToFloat(f64, end - start) / time.ns_per_s;
    try stdout.print("encode: {}, {d:.2}\n", s, elapsed_s);

    i = 0;
    s = 0;
    start = timer.lap();
    while (i < tries) : (i += 1) {
        std.base64.standard_decoder.decode(&str3, &str2) catch unreachable;
        s += str3.len;
    }
    end = timer.read();
    elapsed_s = @intToFloat(f64, end - start) / time.ns_per_s;
    try stdout.print("decode: {}, {d:.2}\n", s, elapsed_s);
}

Results:

direct_allocator:
encode: 1333333600, 1.13
decode: 1000000000, 0.12

no allocator:
encode: 1333333600, 0.46
decode: 1000000000, 0.14

var i: usize = 0;
var start = timer.lap();
while (i < tries) : (i += 1) {
const str2 = try allocator.alloc(u8, std.base64.Base64Encoder.calcSize(str1.len));
@andrewrk commented Oct 20, 2019

Using std.heap.direct_allocator inside the hot path is no good. It does an mmap per iteration; no wonder it is so slow.

https://ziglang.org/documentation/master/#Choosing-an-Allocator

@nuald (Collaborator, Author):

That's intentional: the benchmark is about measuring allocation plus encoding/decoding at once. Surely, low-level languages could allocate memory in advance, use memory pools and other advanced techniques, and that would give them an unfair advantage compared with programming languages with a GC.

Moreover, direct_allocator using mmap is an implementation detail that could change in the future. The same way malloc uses either sbrk or mmap, the allocator could implement more advanced methods later, thus making the general use case tested in this benchmark (allocate a lot of memory and encode the blob into it) faster.

@andrewrk:

That's intentionally - the benchmark is about measuring allocating + coding/decoding at once.

This is a poorly defined benchmark. Your malloc call in the C benchmark might or might not make a syscall, for example. The appropriate allocator for this code is none. If you want to measure allocation performance, then you need to come up with an example where allocating heap memory actually makes sense. And again, your C code is not guaranteed to make syscalls either, because there is code that runs before main() which uses malloc/free.

Moreover, direct_allocator using mmap is an implementation detail that could change in the future

No it's not, it's defined to make a syscall for every allocation and free.

@nuald (Collaborator, Author):

The benchmark is about how average programmers would code the given task. Undoubtedly, you would write the optimized version, but the average Joe uses an NPM package for left padding (https://www.davidhaney.io/npm-left-pad-have-we-forgotten-how-to-program/). Surely, I don't like the current state of the industry and would prefer everyone write optimized and carefully thought-out code.

The same goes for the other tests, like bf2. It's not an optimized version, but it resembles the code an average Joe would write. And good compilers try to optimize code like that (a good example is loop unrolling; in a perfect world the compiler wouldn't care about it, as decent developers could implement that on their own).

As for direct_allocator, sorry, it's the first day I'm using your language, so I don't know all the peculiarities yet. I meant that in the future you may have some smart allocator by default that has better heuristics than malloc's.

@andrewrk:

No, this is a disingenuous argument. I did not write an optimized version. My version is clearly simpler than yours, both to read and to write.

it's the first day I'm using your language, so I don't know all the peculiarities yet

The zig community welcomes newbies with open arms. However, newbies should not be writing benchmarks that publicly represent the language, if they are not willing to read the documentation for the functions they are calling.

I suggest you take my advice to fix the benchmark because if you merge this pull request as is, your reputation and this project's reputation will be harmed, being associated with dishonest benchmarks.

Another participant:

Not using unnecessary memory allocations in a loop isn't "fancy optimizing," it's basic logic. I was going to comment that there was no way that the other implementations were this bad but some of them are. As is, these benchmarks are entirely useless, because you're not measuring the speed of the language, you're measuring completely different variables for different languages.

The base64 benchmark currently is more of a test of the allocator (Zig, C, etc) and/or VM/GC (JS, etc) than it is of actual base64 performance, and I have no doubt the rest of the benchmarks are equally off-base from what they claim to be.

If you're going to do this, at least document it. (Edit: for instance, even call it a "base64/allocation" benchmark. That'd at least make it clear this isn't just raw performance. Preferably, if "the benchmark is about how average programmers would code the given task," just say so. If you only claim that on a PR for a specific language, it doesn't look very good. Literally just add "The benchmark is about how average programmers would code the given task, and does not reflect the performance of well-written, optimized code" to the README. Problem solved.)

@kostya (Owner) commented Oct 21, 2019

As @nuald said, we are measuring something like what a real task would look like. If you don't like this, I guess we are not going to merge this.

@andrewrk:

like what a real task would look like

A real task will not use std.heap.direct_allocator.

if you don't like this, I guess we are not going to merge this

Thank you.

while (i < tries) : (i += 1) {
const str3 = try allocator.alloc(u8, try std.base64.standard_decoder.calcSize(str2));
defer allocator.free(str3);
try std.base64.standard_decoder.decode(str3, str2);
@andrewrk commented Oct 20, 2019
Also, here we can use catch unreachable (which you can see in my proposed code): this is guaranteed not to fail, because the input just came from the base64 encoder and is therefore guaranteed to be valid. That provides additional guarantees to the optimizer. I did not notice a performance difference in this case, but it is still correct, and could improve performance with future versions of LLVM.

@nuald nuald closed this Oct 21, 2019
@nuald nuald mentioned this pull request Aug 17, 2020
4 participants