Spacetime: a new memory profiler #585
Discover the dark secrets of your OCaml programs with this new memory profiler, Spacetime, designed to identify hard-to-find memory leaks and excessive memory consumption. Spacetime, which can instrument industrial-scale applications, records how your program executes so it can reliably tell you the full stack backtrace at every point in the program that caused an allocation. Spacetime is not a statistical profiler. It is capable of recording allocations that happened in C stubs together with allocations that happened in code loaded via
We have a version of this pull request based on 4.03 which can be tried out:
To profile a program:
The profiling information is currently written into a file called
To read the profiling information there are new libraries provided in
Then in the directory containing the Spacetime file, run the binary produced (
The browser-based visualisation can be slow to display the graph, so have patience; we hope to rectify this shortly. We may also produce a curses-based interface and some kind of query language associated with Spacetime_lib.
There is not yet support for visualising the total number of words allocated at each program point across the lifetime of the program, although the instrumentation code for this is complete. We plan to fix this deficiency pretty soon.
We propose the compiler patch shown on this GPR for inclusion in trunk. Our intention is that only basic support for reading the profiles is provided within the compiler distribution; complicated visualisation can live outside. The compiler patch is largely orthogonal to existing code. The vast majority of changes not in the backend are actually related to propagating extra location information, which we also need for enhanced debugging information. A (slightly modified) version of these changes will be presented shortly by @lpw25 . Subject to those being accepted this diff will be greatly simplified.
Spacetime works by instrumenting OCaml code such that it builds the dynamic call graph of the program, outside of the OCaml heap, at runtime. The majority of nodes in this graph usually correspond to invocations of OCaml functions. An edge from one node to another indicates a function call. A path from the root to one of these nodes gives the stack backtrace at that function invocation. (There may be multiple nodes for a given function, of course.) Nodes that correspond to OCaml functions that might allocate have space within them for the recording of the number of times that allocation point has been passed. They also contain space for unique identifiers that will be written into spare space in values' headers; these identifiers are read from the heap and correlated with the graph to produce the human-readable profile. (The technique of using extra bits in the header was independently discovered some years ago by myself and Fabrice Le Fessant's team.)
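To make the shape of this graph concrete, here is a minimal toy model in OCaml. It is only a sketch: the type and function names (`node`, `call`, `record_alloc`, `backtrace`) are hypothetical, and it keeps the graph on the OCaml heap in hash tables, whereas the real graph lives outside the heap. It shows the three properties described above: an edge per call, a counter per allocation point, and the backtrace recovered as the path from the root.

```ocaml
(* Toy model of a Spacetime-style call graph. All names are hypothetical;
   the real graph is built outside the OCaml heap. *)
type node = {
  name : string;                        (* function this node stands for *)
  parent : node option;                 (* caller's node; None at the root *)
  children : (string, node) Hashtbl.t;  (* one edge per distinct callee *)
  alloc_counts : (int, int) Hashtbl.t;  (* allocation point id -> hit count *)
}

let make_node ?parent name =
  { name; parent; children = Hashtbl.create 4; alloc_counts = Hashtbl.create 4 }

(* Follow the edge for a call from [caller] to [callee], creating the
   callee's node only if this call has not been seen in this context. *)
let call caller callee =
  match Hashtbl.find_opt caller.children callee with
  | Some n -> n
  | None ->
    let n = make_node ~parent:caller callee in
    Hashtbl.add caller.children callee n;
    n

(* Record that allocation point [point] of [node] was passed once more. *)
let record_alloc node point =
  let count =
    Option.value (Hashtbl.find_opt node.alloc_counts point) ~default:0 in
  Hashtbl.replace node.alloc_counts point (count + 1)

(* The path from the root to [node] is the backtrace, outermost first. *)
let backtrace node =
  let rec go acc n =
    match n.parent with
    | None -> n.name :: acc
    | Some p -> go (n.name :: acc) p
  in
  go [] node
```

In this model the same function called from two different callers gets two distinct nodes, which is why a function may appear several times in the graph.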
When building the call graph of an OCaml program, one has to be careful about tail calls: a tail call reuses the caller's stack frame, so treating it as an ordinary call would grow the graph without bound. Spacetime handles this by forming cycles in the graph corresponding to tail calls. Self-recursive calls (i.e. recursive calls to the function currently being defined) are also treated as tail calls to simplify the graph. The information loss here is minimal.
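One way to picture how tail calls turn into cycles is the toy model below. It is a sketch under assumptions, not the mechanism used by the patch: every node belongs to a hypothetical "tail group" of nodes reached from one another by tail calls, and a tail call to a function already in the group reuses that function's node instead of creating a new one.

```ocaml
(* Toy model only; the names and the "tail group" idea are illustrative. *)
type node = {
  name : string;
  mutable calls : node list;       (* edges for ordinary (non-tail) calls *)
  mutable tail_calls : node list;  (* edges for tail calls *)
  group : node list ref;           (* nodes linked to this one by tail calls *)
}

(* A node in a brand-new tail group containing only itself. *)
let fresh name =
  let group = ref [] in
  let n = { name; calls = []; tail_calls = []; group } in
  group := [n];
  n

let find name nodes = List.find_opt (fun n -> n.name = name) nodes

(* Non-tail call: the callee starts a new tail group under the caller. *)
let non_tail_call caller callee =
  match find callee caller.calls with
  | Some n -> n
  | None ->
    let n = fresh callee in
    caller.calls <- n :: caller.calls;
    n

(* Tail call: reuse the callee's node if it is already in the caller's
   group, which closes a cycle; otherwise add a new node to the group. *)
let tail_call caller callee =
  let target =
    match find callee !(caller.group) with
    | Some n -> n
    | None ->
      let n =
        { name = callee; calls = []; tail_calls = []; group = caller.group } in
      caller.group := n :: !(caller.group);
      n
  in
  if not (List.memq target caller.tail_calls) then
    caller.tail_calls <- target :: caller.tail_calls;
  target
```

With two mutually tail-recursive functions `even` and `odd`, the group ends up holding exactly two nodes joined by a cycle, however many iterations run. A self-recursive call handled through `tail_call` becomes a self-edge on a single node.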
Each thread has its own graph. There is also a single distinguished graph used for asynchronous execution of finalisers and signal handlers.
Nodes in the graph not corresponding to OCaml functions correspond to C functions. Spacetime uses the
Spacetime does not rebuild the call graph if it already exists: if a given function in a given backtrace context has already been called, the nodes will be reused. This means that whilst programs may see an initial performance penalty (running at perhaps half of normal speed), programs that run for longer periods of time should speed up substantially once the graph of hot paths has been built.
The instrumentation code is partially emitted directly as assembly and partially implemented in C. An extra register (to keep track of where in the graph we are) is required when functions are called, which makes it imperative that all parts of a program using Spacetime are compiled with such. The emission of instrumentation is cunning: it requires information (specifically as to whether calls will be tail or non-tail) only deduced during instruction selection---yet we do not want to write Mach code when describing the instrumentation. Instead, there are callbacks that generate more Cmm code on the fly from
The call graph has a compact representation which is not uniform: each OCaml function generates a different shape of node depending on its pattern of call and allocation points. These representations are described in shape tables, which parallel the frame tables. For decoding locations, Spacetime uses the frame tables when possible, which helps with cross-platform portability. However resolution of symbols in C stubs is going to require platform-dependent code; it is proposed to use the
The backend changes required for Spacetime, which are fairly minor, have so far been implemented only for x86-64. It works very well on Linux; on the Mac, it may be rather slow (we suspect this is due to libunwind, and we may either need to emit compact unwind info or write a frame-pointer-based unwinder instead). As it stands, it should function on Windows (although without support for recording allocations in C), but we have not yet tested it. 32-bit platforms are not supported at all, as there is insufficient space in values' headers for the profiling information words.
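The 32-bit limitation can be illustrated with a little arithmetic. A standard OCaml block header uses 8 bits for the tag and 2 for the GC colour, and the rest of the word for the block size. The sketch below computes how many size bits remain once a profiling identifier is carved out of the header. The width of that identifier is passed as a parameter; the value 26 used in the usage note is an assumption for illustration, not a figure taken from this pull request.

```ocaml
(* Standard OCaml header fields: 8 tag bits and 2 colour bits; the
   remainder of the word holds the block size in words. *)
let tag_bits = 8
let colour_bits = 2

(* Bits left for the block size once [profinfo_bits] (a hypothetical
   width for the profiling identifier) are taken out of the header. *)
let size_bits ~word_bits ~profinfo_bits =
  word_bits - tag_bits - colour_bits - profinfo_bits

(* Largest block, in words, that such a header could describe;
   0 when no size bits remain at all. *)
let max_block_words ~word_bits ~profinfo_bits =
  let bits = size_bits ~word_bits ~profinfo_bits in
  if bits <= 0 then 0 else (1 lsl bits) - 1
```

For example, `size_bits ~word_bits:64 ~profinfo_bits:26` leaves 28 bits for the size, whereas the same identifier width on a 32-bit word leaves a negative number of bits. Even a much narrower identifier on 32 bits would cap block sizes severely, since only 22 size bits are available to begin with.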
Code to snapshot the heap is currently quite naive (there is at least one extant bug relating to the "hole in the minor heap"): in particular it performs a linear scan of the minor heap rather than traversing from roots (we intend to fix this now that we have support for recording total allocations across the lifetime of the program; previously it was important to scan rather than work from roots on the minor heap or some very short-lived values might continually be missed out of the profile). It may also record values in the major heap that are about to be swept up. However neither of these deficiencies appears to hinder its usefulness.
Spacetime's instrumentation, possibly extended, may well be useful for other analyses. Two such might be analysis of write barrier hits, and profile-directed feedback for optimisation.
This is still a work in progress, although mostly finished. One major item remaining relates to the lack of cross-compilation support in the compiler. At present, if you configure with
I imagine there will be a number of questions about this work, so I will leave it at that for now.
This merge caused quite a few failures on the CI servers.
linux32, arm32, ppc32,openbsd32:
@mshinwell Could you fix this quickly enough? It's not a good period to have our CI testing ineffective.
New failure on ppc-32 (with flambda):
(This is in
We don't (yet) have a policy for cleaning up history before merging. What problem does it cause in practice?
On Monday, 1 August 2016 at 14:59, Damien Doligez wrote:
Bisecting the compiler becomes more painful, since it increases the number of commits at which the compiler may not build.
Does it always make bisection worse? I thought git-bisect could be directed down a particular arm of a merge based on a numerical index; if that index is consistent (as maybe it is if Github is always used for merging) it seems like it should work. It's maybe not very robust though. I think I favour squashing them in general.
We arguably do, in the Clean patch series section of the CONTRIBUTING.md document.
I don't mean to imply that this document should be taken as word of law (especially as it would be presumptuous given that I wrote most of it); it is intended rather as advice, in particular for external contributors. But I do think that this particular advice should be followed strictly -- and that in general frequent contributors should be expected to respect the same quality standards as infrequent contributors.
I'm not commenting on this particular PR that I have not had the occasion to review or study the design of.
I would note however that @mshinwell did make an effort to send many prerequisite PRs that could be reviewed and merged independently. In an ideal world, a big merge would be formed of a series of well-defined patches or patch groups that are held to the same quality standards as those smaller prerequisite PRs.
(The Linux kernel handles massively more contributions than the OCaml distribution, some fairly large, and was able to uphold high quality standards for patches.)