Description
[Please don't close as "not Nim problem; won't fix", at least not before some discussion]
motivation
valgrind's callgrind likely offers the most accurate profiling results (e.g. see http://gernotklingler.com/blog/gprof-valgrind-gperftools-evaluation-tools-application-level-cpu-profiling-linux/), at the expense of a large slowdown, e.g. 50x.
In short, it uses valgrind's VM with JIT recompilation of x86 => RISC-like Ucode.
IMO we should have good support for it, since it can tell us more accurately than other tools where the actual hotspots of the Nim compiler (or any other Nim program) are. That matters when optimizing code (e.g. it helps avoid blindly optimizing code that isn't an actual hotspot). For Nim itself, that means making the Nim compiler faster!
While I'm making some improvements to --profile:on (e.g. #10119), I'm not yet convinced that Nim's profiling story starts and ends with --profile:on, at least not until I can compare its results with valgrind's callgrind, for several reasons:
- (most important point) it's really not clear how accurate the results are, given the sampling strategy described in the docs:
These calls are injected at every loop end (except perhaps loops that have no side-effects). At every Nth call a stack trace is taken.
- no wall/cpu time info generated by nimprof
- not sure how well it supports shared libraries / syscalls (not clear from https://nim-lang.github.io/Nim/estp.html)
- not sure how well it works with recursive calls given that it reports stacktraces
- stdlib has a few {.push profiler: off.} here and there that could affect profiling results
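To make the accuracy concern in the first bullet concrete, here is a toy model of the quoted strategy (a hook at every loop end, a stack trace at every Nth hook call). This is only a sketch: `LoopEndSampler`, `run_workload`, the loop names, and the per-iteration costs are all hypothetical and are not taken from nimprof's implementation.

```python
from collections import Counter


class LoopEndSampler:
    """Toy model: a hook runs at every loop end, and every Nth hook call
    records the current stack trace. Not nimprof's actual code."""

    def __init__(self, every_n):
        self.every_n = every_n
        self.calls = 0
        self.samples = Counter()

    def hook(self, stack):
        self.calls += 1
        if self.calls % self.every_n == 0:
            self.samples[tuple(stack)] += 1


def run_workload(sampler):
    # Hypothetical workload: many cheap iterations, then few expensive ones.
    for _ in range(900):
        sampler.hook(["main", "cheapLoop"])      # assume ~1 time unit each
    for _ in range(100):
        sampler.hook(["main", "expensiveLoop"])  # assume ~100 time units each


sampler = LoopEndSampler(every_n=10)
run_workload(sampler)
total = sum(sampler.samples.values())
for stack, count in sampler.samples.most_common():
    print(" -> ".join(stack), f"{100 * count / total:.0f}% of samples")
```

Under these assumed costs, cheapLoop gets 90% of the samples although expensiveLoop would account for roughly 92% of the time (10000 of 10900 units). Whether the real profiler behaves like this is exactly what a comparison against callgrind could show.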
Besides accuracy, valgrind's callgrind has other advantages:
- works without needing recompilation with specially instrumented profiling code (maybe just needs --passC:-g, but that's less intrusive and, IIUC, optional)
- kcachegrind for visualization
- many more features
- nimprof shows stacktraces instead of a callgraph, which has pros and cons, but oftentimes callgraph may be more useful
callgrind usage example
# note that `--passC:-g` isn't required, but helps, since it shows debugging info
nim c --passC:-g main.nim
valgrind --tool=callgrind ./main
# inspect callgrind.out.$pid using text editor or kcachegrind or `callgrind_annotate callgrind.out.$pid` (see [1])
callgrind issues
- it would be nice to have tooling to convert callgrind_annotate output (see [1]) back to Nim source code, with line info and un-mangled names (IIRC some tool (?) generated a file with a mapping between mangled Nim names and C names); EDIT: probably --genMapping
- valgrind --tool=callgrind nim c main.nim fails: see [2] (#10121 (comment)). Note that it works with some other programs built by Nim, just not nim itself
links
- https://nim-lang.github.io/Nim/estp.html
- http://gernotklingler.com/blog/gprof-valgrind-gperftools-evaluation-tools-application-level-cpu-profiling-linux/
[1] example output with callgrind_annotate
... other info
10,198,980,164 build/nimcache/timn_t0088.c:nimFrame [/Users/timothee/git_clone//nim//timn//bugs/all/t0088]
5,249,475,000 build/nimcache/timn_t0088.c:fun_BlHd55MPDKaD6djg1BFZKw [/Users/timothee/git_clone//nim//timn//bugs/all/t0088]
4,500,450,000 build/nimcache/timn_t0088.c:addInt [/Users/timothee/git_clone//nim//timn//bugs/all/t0088]
3,899,610,000 build/nimcache/timn_t0088.c:pluseq__7kHiltrvRlcg6wSYR3CxAwt0088 [/Users/timothee/git_clone//nim//timn//bugs/all/t0088]
3,024,705,957 ???:exp$fenv_access_off [/usr/lib/system/libsystem_m.dylib]
2,399,760,040 build/nimcache/timn_t0088.c:popFrame [/Users/timothee/git_clone//nim//timn//bugs/all/t0088]
2,300,310,108 build/nimcache/timn_t0088.c:main_JpOAt9ckLsMzNQ7rIf9bUW9bw [/Users/timothee/git_clone//nim//timn//bugs/all/t0088]
1,150,155,054 build/nimcache/timn_t0088.c:main2_JpOAt9ckLsMzNQ7rIf9bUW9bw_2 [/Users/timothee/git_clone//nim//timn//bugs/all/t0088]
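As a starting point for the tooling wished for above, the per-function lines in [1] could be parsed roughly as follows. This is a sketch that assumes the `cost file:symbol [object]` layout seen in the sample output. `Entry`, `parse_line`, `relabel`, and the contents of `mapping` are hypothetical. A real mapping from C symbols to Nim names would have to come from somewhere else (possibly --genMapping) and is not derived here.

```python
import re
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    cost: int     # instruction count reported by callgrind_annotate
    c_file: str   # generated C file, or "???" when unknown
    symbol: str   # C-level (possibly mangled) function name
    obj: str      # binary or shared library the symbol lives in


# Matches lines such as: "4,500,450,000 some/file.c:addInt [/path/to/binary]"
LINE_RE = re.compile(r"^\s*([\d,]+)\s+(\S+?):(\S+)\s+\[(.*)\]\s*$")


def parse_line(line: str) -> Optional[Entry]:
    """Return an Entry for a per-function line, or None for anything else."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    cost, c_file, symbol, obj = m.groups()
    return Entry(int(cost.replace(",", "")), c_file, symbol, obj)


def relabel(entry: Entry, mapping: dict) -> Entry:
    """Swap the C symbol for a Nim name when the mapping knows it."""
    return entry._replace(symbol=mapping.get(entry.symbol, entry.symbol))


sample = [
    "... other info",
    "10,198,980,164 build/nimcache/timn_t0088.c:nimFrame [/Users/timothee/git_clone//nim//timn//bugs/all/t0088]",
    "5,249,475,000 build/nimcache/timn_t0088.c:fun_BlHd55MPDKaD6djg1BFZKw [/Users/timothee/git_clone//nim//timn//bugs/all/t0088]",
    "3,024,705,957 ???:exp$fenv_access_off [/usr/lib/system/libsystem_m.dylib]",
]
# Hypothetical mapping, written by hand for illustration only.
mapping = {"fun_BlHd55MPDKaD6djg1BFZKw": "fun"}

entries = [relabel(e, mapping) for e in map(parse_line, sample) if e is not None]
for e in entries:
    print(f"{e.cost:>15,} {e.symbol} ({e.c_file})")
```

Lines that don't match the pattern (headers, "... other info") are skipped rather than treated as errors, since callgrind_annotate prints other sections around this table.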