C++ memory results might be a bit misleading #43

Closed · PMunch opened this issue May 16, 2018 · 13 comments
@PMunch (Contributor) commented May 16, 2018

During my work on the Nim version of this code (which in its most optimised version beats the C++ versions on speed), I did some runs of the C++ code as well. One thing I noticed is that I regularly got 504 KiB memory usage, and not the listed 0.3 MiB (so somewhere around the 300 KiB mark). I did sometimes see it dip to 376 KiB however, so I went to see how often this would happen.

To get lots of measurements I went with:

```shell
for i in {0..10000}; do ~/div/cgmemtime/cgmemtime --force-empty -t ./main-gcc 2>>memtest2; done
```

However, I aborted the run after 5826 samples as it was starting to drag on a bit. The average memory consumption of all the runs was 524 KiB, and the breakdown of samples was as follows:

| Memory usage (KiB) | Occurrences |
| --- | --- |
| 376 | 29 |
| 380 | 42 |
| 500 | 1 |
| 504 | 4512 |
| 508 | 295 |
| 632 | 891 |
| 636 | 37 |
| 760 | 16 |
| 764 | 2 |
| Total | 5825 |

As we can see, only 71 of the 5825 runs used less than 500 KiB, a mere 1.22%.
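For reproducibility, the figures quoted above (total runs, sub-500 KiB count, and the weighted average) can be rechecked directly from the occurrence table; the counts below are copied from the table, not re-measured:

```python
from collections import Counter

# Occurrence counts copied from the breakdown table above (memory in KiB).
samples = Counter({376: 29, 380: 42, 500: 1, 504: 4512, 508: 295,
                   632: 891, 636: 37, 760: 16, 764: 2})

total = sum(samples.values())
below_500 = sum(n for kib, n in samples.items() if kib < 500)
average = sum(kib * n for kib, n in samples.items()) / total

print(total)                              # 5825 runs
print(below_500)                          # 71 runs under 500 KiB
print(round(100 * below_500 / total, 2))  # 1.22 (percent)
print(round(average))                     # 524 KiB average
```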

Running the Nim version with manual memory allocation, we actually get a better average of 513 KiB, with a breakdown of:

| Memory usage (KiB) | Occurrences |
| --- | --- |
| 500 | 1 |
| 504 | 880 |
| 508 | 53 |
| 632 | 60 |
| 636 | 4 |
| 760 | 2 |
| Total | 1000 |

And just for comparison, the Nim version using the GC gets an average of 899 KiB, with a breakdown of:

| Memory usage (KiB) | Occurrences |
| --- | --- |
| 888 | 868 |
| 892 | 48 |
| 1016 | 77 |
| 1020 | 4 |
| 1144 | 3 |
| Total | 1000 |

The speed seems to be affected as well, although not as much:

| Metric | C++ raw pointers | Nim manual memory | Nim GC |
| --- | --- | --- | --- |
| Average time (s) | 0.314 | 0.280 | 0.282 |
| Minimum time (s) | 0.241 | 0.190 | 0.193 |

(C++ version run 5825 times, the Nim versions only 1000 times.)

@frol (Owner) commented May 16, 2018

Memory tracking with cgroups is a bit tricky and not exactly correct (cgmemtime uses it as there is simply nothing better). The issue is that it counts caches and shared objects (.so) if those were not already cached/loaded by other processes. Thus, I take the lowest of the high watermarks. It is not ideal, and it is even less informative on such small footprints, so I am going to drop the "normalized memory" column from the table.
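For context, the "high watermark" cgmemtime reports comes from the kernel's cgroup memory controller; on cgroup v1 it is exposed as `memory.max_usage_in_bytes` (cgroup v2 has a similar `memory.peak`). A minimal sketch of reading such a file, where the path is an assumption about the local cgroup layout:

```python
from pathlib import Path

def read_watermark_kib(path):
    """Read a cgroup memory high-watermark file (a plain byte count) as KiB."""
    return int(Path(path).read_text().strip()) // 1024

# Hypothetical cgroup v1 path; adjust to the local cgroup hierarchy:
# peak = read_watermark_kib(
#     "/sys/fs/cgroup/memory/mygroup/memory.max_usage_in_bytes")
```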

@PMunch (Contributor, author) commented May 16, 2018

Yeah, that's why I used the average instead of the minimum in all of my testing, normally doing 100 runs at a time (works fine for C++ and Nim; I probably wouldn't do it for the slower candidates, as it would take a really long time).

@frol (Owner) commented May 16, 2018

I believe the minimum value is fair enough given that the algorithm requires a constant number of allocations between executions. Averaging catches some irrelevant fluctuations.

@PMunch (Contributor, author) commented May 16, 2018

Yeah, but as you say, it will also count less if something else happened to have a shared library cached, so you catch irrelevant fluctuations both ways. We probably shouldn't use an average though, but rather a median.

Just calculated some medians for my samples:

| Metric | C++ raw pointers | Nim manual memory |
| --- | --- | --- |
| Median time (s) | 0.330 | 0.272 |
| Median memory (KiB) | 504 | 504 |

Medians are good for this sort of thing since (quote from Wikipedia):

> The basic advantage of the median in describing data compared to the mean (often simply described as the "average") is that it is not skewed so much by extremely large or small values, and so it may give a better idea of a "typical" value.

What you are basically doing by choosing the minimum is choosing an outlier, something that doesn't represent a typical run.
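The difference between the three summaries is easy to see on invented numbers (the samples below are made up for illustration, not taken from the benchmark):

```python
import statistics

# Invented timing samples (seconds): mostly typical runs, plus one
# unusually fast and one unusually slow outlier.
runs = [0.33, 0.33, 0.34, 0.33, 0.35, 0.33, 0.24, 0.61]

print(min(runs))                # 0.24: the fast outlier, not a typical run
print(statistics.median(runs))  # 0.33: a typical run
print(statistics.mean(runs))    # pulled upwards by the slow outlier
```

The minimum picks out the lucky outlier, the mean is dragged around by the unlucky one, and the median lands on a typical run.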

Note that this was run mostly to get memory data, so the machine was being used at the same time and as such the timing might be off.

@frol (Owner) commented May 16, 2018

If I compile the C++, D, and Nim solutions statically, I see the following stable memory footprints:

| Solution | Time (s) | Memory (MB) |
| --- | --- | --- |
| C++ | 0.208 | 0.248 |
| D (no RT) | 0.198 | 0.248 |
| Nim (manual) | 0.179 (OH MY GOD) | 0.376 |

@PMunch (Contributor, author) commented May 16, 2018

Oh nice! What switches did you use to compile them statically?

@frol (Owner) commented May 16, 2018

```shell
clang++ -O3 -s --std=c++17 -static -flto -o main-raw-clang main-raw.cpp

ldc2 main_nort.d -O3 -release -Xcc -flto -Xcc -s -static -of=main_nort-ldc

nim compile -d:release --passC:-flto --passL:-static --passL:-s --gc:none --out:main_manual-nim-release main_manual.nim
```

@PMunch (Contributor, author) commented May 16, 2018

Really interesting! I wasn't expecting to see much more speed increase on this without having to use some really dirty tricks. I reran all three versions on my machine with static linking.
Speed, in seconds:

| Metric | C++ raw | Nim manual | Nim GC |
| --- | --- | --- | --- |
| Average | 0.243 | 0.211 | 0.217 |
| Median | 0.242 | 0.210 | 0.215 |
| Min | 0.227 | 0.197 | 0.199 |
| Max | 0.264 | 0.233 | 0.302 |

Memory, in KiB:

| Metric | C++ raw | Nim manual | Nim GC |
| --- | --- | --- | --- |
| Average | 249 | 376 | 760 |
| Median | 248 | 376 | 760 |
| Min | 248 | 376 | 760 |
| Max | 376 | 380 | 764 |

Very interesting to see that Nim with a garbage collector is able to keep up this well, albeit using a bit more memory.
The memory results also seem to have stabilised a lot now that no external things are loaded; C++ raw for some reason has 1 in 100 samples coming in higher, but that's probably an anomaly. All of these are 100-run measurements, by the way.
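The four summary rows above (average, median, min, max) are straightforward to compute from a file of raw samples; a sketch, with invented sample values standing in for the real 100-run data:

```python
import statistics

def summarize(samples):
    """Return (average, median, min, max) of the samples, rounded to 3 places."""
    return (round(statistics.mean(samples), 3),
            round(statistics.median(samples), 3),
            round(min(samples), 3),
            round(max(samples), 3))

# Invented timing samples in seconds (not the actual benchmark output).
times = [0.242, 0.243, 0.241, 0.245, 0.240]
print(summarize(times))
```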

@PMunch (Contributor, author) commented May 16, 2018

I still think the median is the way to go though; it's more "correct" for this kind of thing.

@mratsim commented May 16, 2018

It's probably less accurate, but Kostya uses this short Ruby script for memory + time tracking: https://github.com/kostya/benchmarks/blob/master/xtime.rb

Given how fast the tasks are it might not be precise enough though.

@PMunch (Contributor, author) commented May 16, 2018

Hmm, interesting. I tried running the fast version of the Nim code with the default garbage collector, and it achieved the same 376 KiB as the manual-memory version, but took 0.535 s (median again, 100 samples). So it appears that, compared to the manual version, you can choose either near-perfect memory usage or near-perfect speed. Good to know.

@frol (Owner) commented May 16, 2018

I have just updated the scoreboards!

@frol (Owner) commented May 20, 2018

I still believe that the minimum memory consumption is the best way to estimate the solution memory footprint since the memory footprints of statically linked solutions do not fluctuate (i.e. the fluctuations are caused by something external to the solution itself).

frol closed this as completed May 20, 2018