C implementation for addition to benchmark #53

nkurz · 2014-12-22T23:35:50Z

I offer a version written in C for inclusion in the benchmark:
https://gist.github.com/nkurz/5e49ba0ddb04e23de03f

I've only tested on Haswell under Linux, but results are good when compiled with gcc-4.8 -O2:
// 8981 LANGUAGE C 623
// 8981 LANGUAGE C++/clang 734
// 8981 LANGUAGE C++/gcc 755

I have tested, but I think the implementation should continue to perform well with larger graph sizes.

logicchains · 2014-12-23T11:59:31Z

Cool, I've added it to the makefile and run script. I found the Use Highbit one to be fastest.

nkurz · 2014-12-26T22:01:09Z

It might be interesting to add the following x64 benchmarks to the writeup. These were done on a Haswell i7-4770 CPU @ 3.40GHz (Turbo Off, Hyperthreading Off) running Linux 3.13, which I think is a representative current generation desktop processor.

nate@haswell:~/git/LPATHBench$ make run-all
for prog in "fs.exe" "./cpp_gcc" "./cpp_clang" "./cpp_cached" "java jv" "./ml" "./lisp" "./rs" "./rs_unsafe" "./go" "./gccgo" "./d-dmd" "./nim" "luajit lj.lua" "./f03" "./c-highbit" "./c-bitmap" "./c-branchless" ; do $prog; done

8981 LANGUAGE C++Cached 18
8981 LANGUAGE C/branchless 618
8981 LANGUAGE C/highbit 749
8981 LANGUAGE C++/gcc 755
8981 LANGUAGE C++/clang 735
8981 LANGUAGE Nim 756
8981 LANGUAGE Rust 877
8981 LANGUAGE RustUnsafe 869
8981 LANGUAGE C/bitmap 899
8981 LANGUAGE Java 909
8981 LANGUAGE Fortran_2003 986
8981 LANGUAGE Ocaml 997
8981 LANGUAGE Lisp 1004
8981 LANGUAGE Go 1021
8981 LANGUAGE FSharp 1088
8981 LANGUAGE D 1203
8981 LANGUAGE GCCGo 1502
8981 LANGUAGE LuaJit 2088

My main conclusions would be:

Modern processors have improved more than clockspeed alone might imply.
Implementation and algorithm probably matter more than choice of language.

I feel the C/branchless implementation I did is well optimized for this processor. My guess would be that assembly tweaks might get it down to 500 ms, but probably not much more. It's fairly incredible to me that the much higher level languages are able to stay as close to this speed as they do.

logicchains · 2014-12-27T11:47:08Z

I added them. I don't suppose you have a different D compiler handy? Dmd is
generally the slowest by a significant margin, so it's not the fairest for
benchmarking

While language doesn't seem to matter, note that fast code was much easier
to write in D/C++/Rust than in the managed languages. C# for instance
required changing a foreach-style loop to a for(i = 0; i< len; i++) style
loop, as C# iterators aren't cost-free. Java required changing the node
class to an []int, to avoid boxing. Common Lisp required adding many type
annotations and it wasn't initially obvious which to add.

Still pretty impressive how fast the higher-level languages are though.

On 27 December 2014 at 09:01, nkurz notifications@github.com wrote:

It might be interesting to add the following x64 benchmarks to the
writeup. These were done on a Haswell i7-4770 CPU @ 3.40GHz (Turbo Off,
Hyperthreading Off) running Linux 3.13, which I think is a representative
current generation desktop processor.

nate@haswell:~/git/LPATHBench$ make run-all
for prog in "fs.exe" "./cpp_gcc" "./cpp_clang" "./cpp_cached" "java jv"
"./ml" "./lisp" "./rs" "./rs_unsafe" "./go" "./gccgo" "./d-dmd" "./nim"
"luajit lj.lua" "./f03" "./c-highbit" "./c-bitmap" "./c-branchless" ; do
$prog; done

8981 LANGUAGE C++Cached 18
8981 LANGUAGE C/branchless 618
8981 LANGUAGE C/highbit 749
8981 LANGUAGE C++/gcc 755
8981 LANGUAGE C++/clang 735
8981 LANGUAGE Nim 756
8981 LANGUAGE Rust 877
8981 LANGUAGE RustUnsafe 869
8981 LANGUAGE C/bitmap 899
8981 LANGUAGE Java 909
8981 LANGUAGE Fortran_2003 986
8981 LANGUAGE Ocaml 997
8981 LANGUAGE Lisp 1004
8981 LANGUAGE Go 1021
8981 LANGUAGE FSharp 1088
8981 LANGUAGE D 1203
8981 LANGUAGE GCCGo 1502
8981 LANGUAGE LuaJit 2088

My main conclusions would be:

Modern processors have improved more than clockspeed alone might imply.

Implementation and algorithm probably matter more than choice of
language.

I feel the C/branchless implementation I did is well optimized for this
processor. My guess would be that assembly tweaks might get it down to 500
ms, but probably not much more. It's fairly incredible to me that the much
higher level languages are able to stay as close to this speed as they do.

—
Reply to this email directly or view it on GitHub
#53 (comment)
.

nkurz · 2014-12-27T18:39:07Z

While language doesn't seem to matter, note that fast code was much easier
to write in D/C++/Rust than in the managed languages.

I'm not sure. I'd guess there are more ways to do have poor performance in each language than there are to do it fast, and it will always be a debate as to which approach is most idiomatic. Also, I think that choice of which is fastest will depend on the target architecture and change with compiler/interpreter versions. It would be interesting to have many versions written in each language to show the spread of performance within that language for plausible implementations, and then plot with the overlaps.

I don't suppose you have a different D compiler handy? Dmd is
generally the slowest by a significant margin, so it's not the fairest for
benchmarking

I'm installing compilers to try this, and can easily try to install others. I started with gdc-4.8, but failed with the following error:

gdc d.d -o d-ddc -O3 -frelease -finline -fno-bounds-check
d.d:26: error: pure function 'd.readPlaces' cannot call impure function 'std.algorithm.splitter!(string, string).splitter'
d.d:26: error: pure function 'd.readPlaces' cannot call impure function 'std.array.array!(Result).array'

I didn't know if this was a version problem, something easy to change in the source, or a known incompatibility. I can try again if you suggest a fix.

But just tried ldc2, and indeed the results are among the fastest

8981 LANGUAGE D 1201  // dmd DMD64 D Compiler v2.066.1
8981 LANGUAGE D 739 // ldc2 0.15.1 (DMD v2.066.1, LLVM 3.5.0)

While I've got you, one more language I'm unfamiliar with but had errors was Julia.

nate@haswell:~/git/LPATHBench$ julia -v
julia version 0.2.1
nate@haswell:~/git/LPATHBench$ julia julia.jl
ERROR: no method Node(Array{None,1},)
 in readPlaces at /home/nate/git/LPATHBench/julia.jl:16
 in open at io.jl:342
 in main at /home/nate/git/LPATHBench/julia.jl:45
 in include at boot.jl:238
 at /home/nate/git/LPATHBench/julia.jl:55

Do you know if this means I should move up or down in version? Or if there is an easy fix?

logicchains · 2014-12-28T09:56:41Z

It would be interesting to have many versions written in each language to
show the spread of performance within that language for plausible
implementations, and then plot with the overlaps.

With the next benchmark I might take a more functional approach like this,
leaving older versions in rather than replacing them. The issue would be
deciding which versions were worth keeping, and how to give them meaningful
names so that readers could distinguish between two different, similar
implementations in the same language.

That gdc issue could be fixed either by updating gdc, or by removing all
'pure' annotations from the code. Presumably 4.8 has different, impure
implementations of a couple of functions that are marked pure in more
recent versions, so it's not possible to mark a function calling those
functions pure in 4.8.

I don't have much experience with Julia either; someone else submitted that
implementation. So the only way to fix it would probably be to upgrade; I'm
currently using version 0.3.3, which I believe has had a fair few changes
since 0.2.1.

On 28 December 2014 at 05:39, nkurz notifications@github.com wrote:

While language doesn't seem to matter, note that fast code was much easier
to write in D/C++/Rust than in the managed languages.

I'm not sure. I'd guess there are more ways to do have poor performance in
each language than there are to do it fast, and it will always be a debate
as to which approach is most idiomatic. Also, I think that choice of which
is fastest will depend on the target architecture and change with
compiler/interpreter versions. It would be interesting to have many
versions written in each language to show the spread of performance within
that language for plausible implementations, and then plot with the
overlaps.

I don't suppose you have a different D compiler handy? Dmd is
generally the slowest by a significant margin, so it's not the fairest for
benchmarking

I'm installing compilers to try this, and can easily try to install
others. I started with gdc-4.8, but failed with the following error:

gdc d.d -o d-ddc -O3 -frelease -finline -fno-bounds-check
d.d:26: error: pure function 'd.readPlaces' cannot call impure function 'std.algorithm.splitter!(string, string).splitter'
d.d:26: error: pure function 'd.readPlaces' cannot call impure function 'std.array.array!(Result).array'

I didn't know if this was a version problem, something easy to change in
the source, or a known incompatibility. I can try again if you suggest a
fix.

But just tried ldc2, and indeed the results are among the fastest

8981 LANGUAGE D 1201 // dmd DMD64 D Compiler v2.066.1
8981 LANGUAGE D 739 // ldc2 0.15.1 (DMD v2.066.1, LLVM 3.5.0)

While I've got you, one more language I'm unfamiliar with but had errors
was Julia.

nate@haswell:/git/LPATHBench$ julia -v
julia version 0.2.1
nate@haswell:/git/LPATHBench$ julia julia.jl
ERROR: no method Node(Array{None,1},)
in readPlaces at /home/nate/git/LPATHBench/julia.jl:16
in open at io.jl:342
in main at /home/nate/git/LPATHBench/julia.jl:45
in include at boot.jl:238
at /home/nate/git/LPATHBench/julia.jl:55

Do you know if this means I should move up or down in version? Or if there
is an easy fix?

—
Reply to this email directly or view it on GitHub
#53 (comment)
.

This was referenced Aug 2, 2019

What's new in CPUs since the 80s? guevara/read-it-later#5022

Open

What's new in CPUs since the 80s? guevara/read-it-later#5023

Open

What's new in CPUs since the 80s? guevara/read-it-later#5024

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

C implementation for addition to benchmark #53

C implementation for addition to benchmark #53

nkurz commented Dec 22, 2014

logicchains commented Dec 23, 2014

nkurz commented Dec 26, 2014

logicchains commented Dec 27, 2014

nkurz commented Dec 27, 2014

logicchains commented Dec 28, 2014

C implementation for addition to benchmark #53

C implementation for addition to benchmark #53

Comments

nkurz commented Dec 22, 2014

logicchains commented Dec 23, 2014

nkurz commented Dec 26, 2014

logicchains commented Dec 27, 2014

nkurz commented Dec 27, 2014

logicchains commented Dec 28, 2014