C++ memory results might be a bit misleading #43

Closed · PMunch opened this issue May 16, 2018 · 13 comments
@PMunch (Contributor) commented May 16, 2018

During my work on the Nim version of this code (which in its most optimised version beats the C++ versions on speed), I did some runs of the C++ code as well. One thing I noticed is that I regularly got 504 KiB memory usage, and not the listed 0.3 MiB (so somewhere around the 300 KiB mark). I did sometimes see it dip to 376 KiB however, so I went to see how often this would happen.

To get lots of measurements I went with:

```shell
for i in {0..10000}; do ~/div/cgmemtime/cgmemtime --force-empty -t ./main-gcc 2>>memtest2; done
```

However, I aborted the run after 5826 samples as it was starting to drag on a bit. The average memory consumption of all the runs was 524 KiB, and the breakdown of samples was as follows:

| Memory usage (KiB) | Occurrences |
| --- | --- |
| 376 | 29 |
| 380 | 42 |
| 500 | 1 |
| 504 | 4512 |
| 508 | 295 |
| 632 | 891 |
| 636 | 37 |
| 760 | 16 |
| 764 | 2 |
| Total | 5825 |

As we can see, only 71 of the 5825 runs used less than 500 KiB, a mere 1.22%.
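For reproducibility, the figures quoted above (total runs, sub-500 KiB count, and the weighted average) can be rechecked directly from the occurrence table; the counts below are copied from the table, not re-measured:

```python
from collections import Counter

# Occurrence counts copied from the breakdown table above (memory in KiB).
samples = Counter({376: 29, 380: 42, 500: 1, 504: 4512, 508: 295,
                   632: 891, 636: 37, 760: 16, 764: 2})

total = sum(samples.values())
below_500 = sum(n for kib, n in samples.items() if kib < 500)
average = sum(kib * n for kib, n in samples.items()) / total

print(total)                              # 5825 runs
print(below_500)                          # 71 runs under 500 KiB
print(round(100 * below_500 / total, 2))  # 1.22 (percent)
print(round(average))                     # 524 KiB average
```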

Running the Nim version with manual memory allocation, we actually get a better average of 513 KiB, with a breakdown of:

| Memory usage (KiB) | Occurrences |
| --- | --- |
| 500 | 1 |
| 504 | 880 |
| 508 | 53 |
| 632 | 60 |
| 636 | 4 |
| 760 | 2 |
| Total | 1000 |

And just for comparison, the Nim version using the GC gets an average of 899 KiB, with a breakdown of:

| Memory usage (KiB) | Occurrences |
| --- | --- |
| 888 | 868 |
| 892 | 48 |
| 1016 | 77 |
| 1020 | 4 |
| 1144 | 3 |
| Total | 1000 |

The speed seems to be affected as well, although not as much:

| Metric | C++ raw pointers | Nim manual memory | Nim GC |
| --- | --- | --- | --- |
| Average time (s) | 0.314 | 0.280 | 0.282 |
| Minimum time (s) | 0.241 | 0.190 | 0.193 |

(C++ version run 5825 times, the Nim versions only 1000 times.)

@frol (Owner) commented May 16, 2018

Memory tracking with cgroups is a bit tricky and not exactly correct (cgmemtime uses it as there is simply nothing better). The issue is that it counts caches and shared objects (.so) if those were not already cached/loaded by other processes. Thus, I take the lowest of the high watermarks. It is not ideal, and it is even less informative on such small footprints, so I am going to drop the "normalized memory" column from the table.
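For context, the "high watermark" cgmemtime reports comes from the kernel's cgroup memory controller; on cgroup v1 it is exposed as `memory.max_usage_in_bytes` (cgroup v2 has a similar `memory.peak`). A minimal sketch of reading such a file, where the path is an assumption about the local cgroup layout:

```python
from pathlib import Path

def read_watermark_kib(path):
    """Read a cgroup memory high-watermark file (a plain byte count) as KiB."""
    return int(Path(path).read_text().strip()) // 1024

# Hypothetical cgroup v1 path; adjust to the local cgroup hierarchy:
# peak = read_watermark_kib(
#     "/sys/fs/cgroup/memory/mygroup/memory.max_usage_in_bytes")
```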

@PMunch (Contributor, author) commented May 16, 2018

Yeah, that's why I used the average instead of the minimum in all of my testing, normally doing 100 runs at a time (works fine for C++ and Nim; I probably wouldn't do it for the slower candidates, as it would take a really long time).

@frol (Owner) commented May 16, 2018

I believe the minimum value is fair enough given that the algorithm requires a constant number of allocations between executions. Averaging catches some irrelevant fluctuations.

@PMunch (Contributor, author) commented May 16, 2018

Yeah, but as you say, it will also count less if something else happened to have a shared library cached, so you catch irrelevant fluctuations both ways. We probably shouldn't use an average though, but rather a median.

Just calculated some medians for my samples:

| Metric | C++ raw pointers | Nim manual memory |
| --- | --- | --- |
| Median time (s) | 0.330 | 0.272 |
| Median memory (KiB) | 504 | 504 |

Medians are good for this sort of thing since (quote from Wikipedia):

> The basic advantage of the median in describing data compared to the mean (often simply described as the "average") is that it is not skewed so much by extremely large or small values, and so it may give a better idea of a "typical" value.

What you are basically doing by choosing the minimum is choosing an outlier, something that doesn't represent a typical run.
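The difference between the three summaries is easy to see on invented numbers (the samples below are made up for illustration, not taken from the benchmark):

```python
import statistics

# Invented timing samples (seconds): mostly typical runs, plus one
# unusually fast and one unusually slow outlier.
runs = [0.33, 0.33, 0.34, 0.33, 0.35, 0.33, 0.24, 0.61]

print(min(runs))                # 0.24: the fast outlier, not a typical run
print(statistics.median(runs))  # 0.33: a typical run
print(statistics.mean(runs))    # pulled upwards by the slow outlier
```

The minimum picks out the lucky outlier, the mean is dragged around by the unlucky one, and the median lands on a typical run.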

Note that this was run mostly to get memory data, so the machine was being used at the same time and as such the timing might be off.

@frol (Owner) commented May 16, 2018

If I compile the C++, D, and Nim solutions statically, I see the following stable memory footprints:

| Solution | Time (s) | Memory (MB) |
| --- | --- | --- |
| C++ | 0.208 | 0.248 |
| D (no RT) | 0.198 | 0.248 |
| Nim (manual) | 0.179 (OH MY GOD) | 0.376 |

@PMunch (Contributor, author) commented May 16, 2018

Oh nice! What switches did you use to compile them statically?

@frol (Owner) commented May 16, 2018

```shell
clang++ -O3 -s --std=c++17 -static -flto -o main-raw-clang main-raw.cpp

ldc2 main_nort.d -O3 -release -Xcc -flto -Xcc -s -static -of=main_nort-ldc

nim compile -d:release --passC:-flto --passL:-static --passL:-s --gc:none --out:main_manual-nim-release main_manual.nim
```

@PMunch (Contributor, author) commented May 16, 2018

Really interesting! I wasn't expecting to see much more speed increase on this without having to use some really dirty tricks. I reran all three versions on my machine with static linking.
Speed, in seconds:

| Metric | C++ raw | Nim manual | Nim GC |
| --- | --- | --- | --- |
| Average | 0.243 | 0.211 | 0.217 |
| Median | 0.242 | 0.210 | 0.215 |
| Min | 0.227 | 0.197 | 0.199 |
| Max | 0.264 | 0.233 | 0.302 |

Memory, in KiB:

| Metric | C++ raw | Nim manual | Nim GC |
| --- | --- | --- | --- |
| Average | 249 | 376 | 760 |
| Median | 248 | 376 | 760 |
| Min | 248 | 376 | 760 |
| Max | 376 | 380 | 764 |

Very interesting to see that Nim with a garbage collector is able to keep up this well, albeit using a bit more memory.
The memory results also seem to have stabilised a lot now that no external things are loaded; C++ raw for some reason has 1 in 100 samples coming in higher, but that's probably an anomaly. All of these are 100-run measurements, by the way.
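The four summary rows above (average, median, min, max) are straightforward to compute from a file of raw samples; a sketch, with invented sample values standing in for the real 100-run data:

```python
import statistics

def summarize(samples):
    """Return (average, median, min, max) of the samples, rounded to 3 places."""
    return (round(statistics.mean(samples), 3),
            round(statistics.median(samples), 3),
            round(min(samples), 3),
            round(max(samples), 3))

# Invented timing samples in seconds (not the actual benchmark output).
times = [0.242, 0.243, 0.241, 0.245, 0.240]
print(summarize(times))
```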

@PMunch (Contributor, author) commented May 16, 2018

I still think the median is the way to go though; it's more "correct" for this kind of thing.

@mratsim commented May 16, 2018

It's probably less accurate, but Kostya uses this short Ruby script for memory + time tracking: https://github.com/kostya/benchmarks/blob/master/xtime.rb

Given how fast the tasks are it might not be precise enough though.

@PMunch (Contributor, author) commented May 16, 2018

Hmm, interesting. I tried running the fast version of the Nim code with the default garbage collector, and it achieved the same 376 KiB as the manual-memory version, but took 0.535 s (median again, 100 samples). So it appears that, compared to the manual version, you can choose either near-perfect memory usage or near-perfect speed. Good to know.

@frol (Owner) commented May 16, 2018

I have just updated the scoreboards!

@frol (Owner) commented May 20, 2018

I still believe that the minimum memory consumption is the best way to estimate the solution memory footprint since the memory footprints of statically linked solutions do not fluctuate (i.e. the fluctuations are caused by something external to the solution itself).

frol closed this as completed May 20, 2018