-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] Eventlog tracing system #9082
Conversation
This looks very nice, thanks! (cc @damiendoligez, @jhjourdan) I think it could be nice to push the removal of the instrumented runtime instructions (I don't really care about cleaning up the build system part etc., the commit could remove just the runtime changes) as an extra commit. This makes it easy, looking at the diff, to get an idea of the maintainability impact of this change if we consider going the "full route" that your propose. (But it remains possible to review what is in the current PR, without the CAML_INSTR removal, by doing a diff without that new commit.) |
runtime/unix.c
Outdated
(int64_t)t.tv_sec * (int64_t)1000000000 + | ||
(int64_t)t.tv_usec * (int64_t)1000; | ||
#else | ||
# error "No timesource available" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On Windows, I think you can use the functions QueryPerformanceCounter + QueryPerformanceFrequency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also please don't fallback on gettimeofday
it can go back in time which can only mislead. On macOS use mach_absolute_time
, clock_gettime
is not supported. Here's code for this that supports all major platforms, IIRC the windows path has been tested at some point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much, I will as well test the Windows codepath.
@kayceesrk this function was imported from OCaml multicore, should I backport it later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reused the code for OSX and Windows, and it seems to work fine after testing it on both platforms.
One thing I'm still unsure about is how to handle failure and how to handle other platforms:
For now it returns 0 on "unsupported" and failure (initialization of state in win32 and MacOS).
I'm thinking that the right way to do it would be to have every tracing code behind macros and be explicit we request a proper monotonic clock for it to work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now it returns 0 on "unsupported"
For that one you could at least emit a compilation warning.
I'm thinking that the right way to do it would be to have every tracing code behind macros and be explicit we request a proper monotonic clock for it to work.
Not sure I understand what you mean by that and how it solves the problem. It seems to me that returning 0
on failure is ok. No program correctness depends on that and it should make it quite clear when perusing the resulting data set you get that something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The instrumentation code looks good.
(In term of potential performance regression, I would look at the minor-gc instrumentation. It looks like everything are in cold path, but I haven't reviewed this part carefully.)
I think that minor improvements could be made to the API. In particular:
-
It would be nice if tracing code (in "user" modules) never had to check for
caml_eventlog_enabled
directly (this requires adding API support for the idea of counting things during instrumentation, to supportcaml_instr_alloc_jump
; unfortunately the natural name "counter" is used for something else). -
There should be an easy way to tell which functions are no-ops when tracing is not enabled (or is paused), and which are not.
For example, one option would be to guarantee that all caml_ev_*
functions are no-ops, and name caml_eventlog_*
the few that are not: caml_eventlog_{init,resume,pause,disable}
. Remark: Using "init" rather than "setup" makes more sense when called from caml_gc_init.
I also included some questions on the choice of integer sizes for information, which look a bit small to me. How easy will it be to extend the wire format in the future if we want to support tracking more kind of runtime events?
runtime/gen_primitives.sh
Outdated
alloc array compare eventlog extern floats gc_ctrl hash intern \ | ||
interp ints io lexing md5 meta memprof obj parsing signals str sys \ | ||
callback weak finalise stacks dynlink backtrace_byt backtrace \ | ||
spacetime_byt afl bigarray |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Maybe we should have one name per line to avoid this kind of noisy diffs?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @gasche. Or at least when adding things, just cut the overlong lines but do not reformat the block.
runtime/freelist.c
Outdated
@@ -200,6 +203,8 @@ static header_t *nf_allocate (mlsize_t wo_sz) | |||
} | |||
prev = cur; | |||
cur = Next_small (prev); | |||
if (caml_eventlog_enabled) | |||
++ caml_instr_alloc_jump; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe there could be another macro that avoids those naked checks to caml_eventlog_enabled
in traced code?
runtime/eventlog.c
Outdated
ocaml_eventlog_filename, eventlog_startup_pid); | ||
} else { | ||
filename = malloc(64); | ||
sprintf(filename, "caml-eventlog-%d", eventlog_startup_pid); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see comment above on sprintf
runtime/eventlog.c
Outdated
uint8_t phase; /* for GC events */ | ||
uint8_t counter_kind; /* misc counter name */ | ||
uint8_t alloc_bucket; /* for alloc counters */ | ||
uint32_t count; /* for misc counters */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if the size restrictions here are future-proof. Wouldn't it be safer to use int16 for phase
, counter_kind
and alloc_bucket
? For "misc counters", can't we have above-32bit values so wouldn't int64 be safer? (Eg. memory usage counters for large programs.) You already use 128bits of header metadata, so it's not clear that int8 vs. int16 is so important there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
phase
and counter_kind
here refer not just to toplevel notions of phase/kind (mark, sweep, clean, etc.), but also the name finer subdivisions; there is in fact a hierarchy of moments with a tree structure. It seems fairly conceivable that we would end up with more than 2^8 "counter kinds" if we extend to support more runtime tracing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think counters should be pushed to 64bits indeed but I think the packet header could be shortened further and made more efficient. (so I think the payload size topic is not quite irrelevant)
For instance the pid
field can be removed and put into the stream header itself, since for now we do not plan to have multiple process/domains emitting to a same stream. (rather, I believe it make more sense to have a separate stream per process/domain).
As for the phase and counter kind, I think you are right.
I'm wondering if this is the right categorization model however, surely an ever-growing enumeration is not adviceable? Maybe things could be sorted in a better way and split in various enumerations.
I added to the repository a few days ago the TSDL declaration, to make discussions on the schema a bit simpler. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a high-level perspective a natural categorization I think would be hierarchical, with "intervals" as nodes in the tree and "events" as child-less leaves. But It's not clear to me whether the event format supports this, and given that the killer feature is the availability of nice visualization tools we don't want to go for something non-standard. Flattening all interval and event names into basically a single enum, as you did for GC instrumentation, may be a way to go to fit the common format -- and then we can indeed expect an explosion of enumeration size.
runtime/eventlog.c
Outdated
static FILE* output; | ||
static uintnat eventlog_startup_timestamp = 0; | ||
static uint32_t eventlog_startup_pid = 0; | ||
static uintnat eventlog_paused = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be part of the domain state structure? Eventually, I would like each domain to be able to start and stop logging independently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a good idea,
Is it too much state to be added there? I don't think we'll require much more of these assuming we stay on a one stream per domain logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was done in d4f25e7
runtime/eventlog.c
Outdated
struct event ev = eb->events[i]; | ||
ev.header.pid = eventlog_startup_pid; | ||
|
||
fwrite(&ev.header, sizeof(struct ctf_event_header), 1, out); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fwrite
returns the number of items actually written, so you need to check the return value of all of these invocations to ensure that it is equal to the value you intended to write and that it is also not zero (EOF).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went on to check fwrite failures and I'm not entirely satisfied with the solution.
I think retry mechanisms on partial writes would add much complexity (in the current form of the implementation) and in the event of a failure eventlog should be disabled.
I think the current code structure doesn't help with finer grained error handling, not relying on fwrite buffering would make life easier (vs handling our own buffer and write logic).
At least logging a message when eventlog is forcibly disabled should make it acceptable?
runtime/eventlog.c
Outdated
|
||
if (!caml_eventlog_enabled) return; | ||
if (eventlog_paused) return; | ||
if (!evbuf) setup_evbuf(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where you assume that setup_evbuf
always succeeds in initialising evbuf
to non-NULL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is now handled, failure to allocate event buffer results in a caml_fatal_error
call
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to avoid calling caml_fatal_error
(even if only theoretically) wherever it is possible to raise an exception instead. You are allocating at the first occurring event, where you do not know whether you can raise. I see two solutions to avoid this:
- you could call
setup_evbuf
at the moment eventlog is enabled. For now it is incaml_gc_init
so it is still necessary to callcaml_fatal_error
, but I find it more sensible to fail there than during execution. But if later you wanted to allow enabling eventlog at runtime, you could raise then (out of the C stub). - if you really want to delay its allocation, you could delay the call to
setup_evbuf
to the first time eventlog is resumed, whenever it starts paused. Then it is safe to raise at that point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think delaying allocation is an interesting property to keep and it make state a bit too fuzzy to track.
I will go with the first idea, whether it is enabled or paused on startup, allocating if eventlog was explicitly enabled seems fair enough.
With regards to the Control Plane, right now the environment variable I think it's worth considering a production use-case where we'd want tracing available if we needed to diagnose a potential issue with a running application but don't want it started initially. Might it make sense to have three states for The last one would require |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only checked for asynchronous exceptions. Overall, the scoped instruments are a nice use of the new exception-safety guarantees in the runtime, but a few cases remain to handle.
Related to @sadiqj comment about the control plane I think something needs to be addressed regarding custom path for traces (see https://github.com/ocaml/ocaml/pull/9082/files#diff-7e12f3a136d56d50b3f4e2beafb30368R75): This feature was added as a part of our tests using sandmark, to simplify trace extraction from a large run. Any suggestions? |
runtime/eventlog.c
Outdated
{ | ||
char_os *filename = NULL; | ||
char_os *eventlog_filename; | ||
value tmp; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be registered as root of the GC -- which probably requires using some CAMLfoo macro on the function declaration as well.
runtime/eventlog.c
Outdated
fwrite(&header, sizeof(struct ctf_stream_header), 1, output); | ||
fflush(output); | ||
} else { | ||
caml_eventlog_enabled = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the API would be slightly nicer if this function returned a boolean (0
when it failed to setup the event log, 1
if the output is correctly initialized), and the caller was in charge of disabling event-logging in the failure case.
4a81b61
to
8072d2b
Compare
runtime/caml/eventlog.h
Outdated
void caml_ev_alloc(uintnat size); | ||
void caml_ev_alloc_flush(void); | ||
void caml_ev_flush(void); | ||
void caml_ev_disable(void); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This header should document the locking requirements for using this API. Since I don't see any explicit locking in the implementation, my guess is that this API must only be called while holding the runtime lock (that is, do not call caml_ev_foo
from within a blocking section).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a mention of the locking requirements here: 0f07748
runtime/major_gc.c
Outdated
@@ -416,6 +417,8 @@ static void mark_slice (intnat work) | |||
INSTR (slice_fields += end - start;) | |||
INSTR (if (size > end) | |||
CAML_INSTR_INT ("major/mark/slice/remain", size - end);) | |||
if (size > end) | |||
caml_ev_counter (EV_C_MAJOR_MARK_SLICE_REMAIN, size - end); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not understand this code. (I realise it's adapted from the CAML_INSTR_INT
logic above, but I don't understand that either).
This event is only posted in the size > end
case. However, the code just below asserts that this case never occurs (either end < size
or end == size
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds to me the logic is ok, I'm maybe missing something?
size > end
and end < size
sounds equivalent in these locations appart. I think the counter could be moved behind the first case to make it more palatable.
Generally, this looks great. CTF looks much nicer than either catapult or threadscope. |
We discussed this at our OCaml developper meeting today. There is support for the idea of replacing the existing GC implementation by the current proposal (so: you shold implement my suggestion of getting rid of Damien's code), but we feel that the "runtime code" (the supporting library) needs a thorough review by someone familiar with the C runtime style. (@stedolan may or may not volunteer himself, otherwise we need to find someone else.) There as a discussion on whether the runtime overhead is really reasonable, or whether we do want the IFDEFs again. More analysis is needed there. Note: there was no consensus on the idea of using this approach for arbitrary / userland OCaml program tracing, just on the GC-instrumentation part. We will need to come back to that issue later, separately. |
@gasche Thank you very much for the follow up. |
I attempted a broader approach which somewhat hides the eventlog state from the user through a I think the alternative approach to expose a better counter API would require rework of the Do you think this is reasonable @gasche ? The evbuf logic needs rework (write failure scenarios are hard to handle right now), so this part may move in the future. |
Hi, Thank you for your feedbacks, I tried to address the issues that were brought up to me during review, let me know if there's more improvements to be done. I will update tomorrow the performance report that was attached with this PR. |
Hello, Sorry for the time elapsed since the last update on this PR. After meeting with the team at Cambium in early December (with @Octachron @damiendoligez @shindere and @fpottier ) and in light of some numbers gathered regarding the performance impact of this PR in some real-world programs ( I landed a few commits recently isolating the code in its own runtime, I took the freedom of suffixing the runtime with Only missing change for the eventlog runtime to be complete would be adjusting the configure step in search of |
Engil (2020/01/07 08:10 -0800):
Only missing change for the eventlog runtime to be complete would be
adjusting the configure step in search of `clock_gettime` (as the
instrumented runtime was doing), since the macos and windows platforms
use a different set of functions, I will need to revisit it.
Likely the simplest approach would be to leave this outside of
autoconf checks and have the runtime default to a dummy
`caml_time_counter` function returning 0, associated with a
preprocessor warning. Eventlog would build on each platforms nicely,
but would generate slightly unusable traces (counters to zero would
not introduce bad behaviors however).
Well, if your work gets integrated, then I think `configure` should make
its best to make things work on every platform we support. Let me know
if you need help with this later.
|
tools/Makefile
Outdated
# Eventlog metadata file | ||
|
||
eventlog_metadata: $(EVENTLOG_METADATA) | ||
cat eventlog_metadata.in | sed "s/@endianness/$(EVENTLOG_ENDIANNESS)/g" > eventlog_metadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove unnecessary use of cat
. You can use sed "s/@endianness/$(EVENTLOG_ENDIANNESS)/g" eventlog_metadata.in > eventlog_metadata
instead?
Hello, I spent a bit of time cleaning up the PR and rebasing it on trunk. I also exchanged with @shindere and @dra27 to ready the eventlog runtime. (thank you very much!) I think @gasche wanted specifically another serious pass at reviewing the runtime code, although last time it wasn't yet a separated runtime. Some more details should be discussed (mostly configure time decision making, and naming), but I think this can wait after the runtime part was validated and considered sane enough. |
Engil (2020/04/03 06:36 -0700):
As this new runtime is meant to replace the instrumented runtime
currently in `trunk` (as discussed in December), maybe @damiendoligez
wants to take a look at it and see if any piece is missing and not
being removed?
Some more details should be discussed (mostly configure time decision
making, and naming), but I think this can wait after the runtime part
was validated and considered sane enough.
In particular, we should decide whether the runtime should be
renamed (current implementation) or whether we should stick to
the former name.
Inmy opinion it would be better to keep the name we have currently
(instrumented) and not change the letter used to designate it from 'i'
to 'e'. I think @ Engil thinks differently, but in an off-line
discussion @damiendoligez seemed to agree with keeping 'instrumented".
Well that needs to be discussed among all of us, I guess.
|
I would prefer keeping the name |
KC Sivaramakrishnan (2020/04/06 07:17 -0700):
I would prefer keeping the name `instrumented_runtime` as the name is
more meaningful than `eventlog_runtime`; instrumented tells me that I
should expect some overhead due to instrumentation. `eventlog` doesn't
offer any such clue.
Agreed.
Also, I suspect that there are not many users of
instrumented runtime, so anyone caught unaware may only be pleasantly
surprised with the additional information that eventlog offers.
Unless, of course, there are some incompatibilities, but I'm not sure
that's a strong enough reason to change the name just to make people's
code break at configure-time rather thanlater.
|
I think it is fine to keep the instrumented runtime name then since it is the most popular opinion out there. |
Engil (2020/04/06 07:23 -0700):
I think it is fine to keep the instrumented runtime name then since it is the most popular opinion out there.
While my thinking was that the generated format was different, it was
better to change the name accordingly, the feature itself was perhaps
not widely spread enough to warrant a new word in the build
terminology. :)
Well if we wanted to have both runtimes coexist in the codebase then
yes, one could ocnsider two names.
I will revert the name to "instrumented".
Great, thanks!
Don't hesitate to leave a note here when you're done please so that we
know that we can start reviewing.
|
4d379a7
to
7eef707
Compare
Hello,
As discussed yesterday, I reverted the name of the runtime to "instrumented". Changes were squashed for proper history. Have a great day. |
I'm working on this at the moment - sorry, I thought this had been through precheck. The filename isn't known, so I'm not sure how check output actions could be used? |
Well, it feels to me one reason why the hook is complicated is that I'm sorry I didn't catch it rarlier, but to me there is something More precisely, the user has no control on the In addition, given that this suffix is enforced, I find the We can wait to see what other developers think, but I really think If you woul'nt add the suffix, you would not even need |
David Allsopp (2020/04/30 07:40 -0700):
The filename isn't known, so I'm not sure how check output actions
could be used?
See my previous comment. I think it's bad that we can't know the name of
the file. I believe this should be modified to be under user's full
ocntrol.
|
Thank you @dra27 !
While I agree that it is indeed not optimal to not know the file, and may lead to a few hacks to be able to properly test it, I don't think we should hand full control over to the user. For now the instrumented runtime indeed only outputs one tracefile. However the Common Trace Format was chosen partly because it allows to define traces from multiple streams (and multiple files or data points, possibly). When Multicore domains are available, they could each output their own tracefile, which can then be demuxed by the CTF codec. I think the current behavior is also a pretty common one among software generating logfiles, embedding timestamps, hashes or pids in filenames, and avoids overwriting by inadvertence, or fights over a trace filename. |
David Allsopp (2020/04/30 07:54 -0700):
It's nice if it works, because at least trunk won't be broken on CI, but
in my opinion, the problem I reporte remains and I'd find it nice if the
fact that "it works" does not prevent us from sdiscussing what I
reported.
|
@shindere - indeed, I'm not disputing that discussion can and should continue, just fixing trunk/4.11 CI in the soonest possible time. |
The behaviour of the log file feels perfectly natural to me (and believe that I didn't spend several hours fighting with Cygwin to get the test to be able to detect without considering whether changing the filename might be easier!). I agree that perhaps the name of the environment variable could be altered? |
Indeed the name I picked was perhaps not the wisest option and was misleading, I'm sorry about this. |
Engil (2020/04/30 08:18 -0700):
While I agree that it is indeed not optimal to not know the file, and may lead to a few hacks to be able to properly test it, I don't think we should hand full control over to the user.
For now the instrumented runtime indeed only outputs one tracefile. However the Common Trace Format was chosen partly because it allows to define traces from multiple streams (and multiple files or data points, possibly). When Multicore domains are available, they could each output their own tracefile, which can then be demuxed by the CTF codec.
I think this versatility is an interesting feature point, and I don't see an easy and non-hacky way to allow user full control over the names of domains-generated tracefiles.
I think the current behavior is also a pretty common one among
software generating logfiles, embedding timestamps, hashes or pids in
filenames, and avoids overwriting by inadvertence, or fights over a
trace filename.
Well even if that is kept, I think that at least the environemnt
variable could/should be renamed to get a less misleading name. For
instance one could add `_PRFIX` to its name.
And also, in the current test I think there will be only one file ending
with `.eventlog`, and this could be used to simplify the test script.
|
The problem is that to me sounds more like the |
David Allsopp (2020/04/30 08:20 -0700):
@shindere - indeed, I'm not disputing that discussion can and should
continue, just fixing trunk/4.11 CI in the soonest possible time.
Yeah yeah, the remark was not addressed to you in particular, rather to
all of us.
|
Engil (2020/04/30 08:27 -0700):
Indeed the name I picked was perhaps not the wisest option and was misleading, I'm sorry about this.
Maybe it could be replaced by `OCAML_EVENTLOG_PREFIX` ?
That looks better to me, yes. Or `OCAML_EVENTLOG_FILENAME_PREFIX`...
|
Given that it now works, I think further alterations to the test should be about improving what it tests, not polishing how it tests... |
David Allsopp (2020/04/30 08:39 -0700):
`_ocamltest` directories are not always cleared on Windows,
But normally the test directory itself is, by ocamltset. No?
|
Normally they're deleted yes - but not, for example, when you're working on the test and have a terminal in the directory, or other (log) files open. There was certainly an instance I saw of the previous files still being there, so given that the test is checking exactly for the presence of a new file, it defensively ensures that they are not there in the first place. |
(amusingly, it is an example of why |
Initially ocamltest didn't delete directories, but now it does, see
main.ml line 160:
```
(Filename.quote_command "rm" ["-rf"; test_build_directory_prefix]))
```
But then if Windows is not reliable in deleting directories, well that's
sad. :)
|
Reading back the original post of this PR (to review the #9541 documentation PR, on which more eyes are welcome), I notice that it suggested a "mean overhead of 1%". However later during the review people estimated that the overhead would be too high, and decided to move the instrumentation to a separate runtime. Does this mean that the overhead is more than 1%? I would like to understand this better:
|
Thank you very much for your review. To note, a few trace points were added to the runtime during this PR, some in a particularly hot loop within the major gc code. The methodology applied to check for regressions during further development (after opening this PR) was to check against well known "worst offenders". However, with the addition of further counters (imported from what was traced by the instrumented runtime), the performance profile of eventlog changed, and with it, the results. As for the actual numbers, I have a few profiles extracted from various sandmark runs. As it is further developed (with additions to what is traced), more work will be done on understanding and improving its performance profile, which was the main motivation behind pushing it as a separate runtime. I am sorry to not have made any more visible updates on the performance issues encountered along the way outside of the December meeting. |
@Engil proposed documentation as promised as a separate PR, #9541. For now I have been the only one doing the work of reading the proposal and giving feedback. It would be nice if other people who pushed for the implementation part of the feature (@dra27 ?) could also help reviewing the documentation. |
Gabriel Scherer (2020/06/02 05:20 -0700):
@Engil proposed documentation as promised as a separate PR, #9541. For
now I have been the only one doing the work of reading the proposal
and giving feedback. It would be nice if other people who pushed for
the implementation part of the feature ***@***.*** ?) could also help
reviewing the documentation.
Sure, I'll review when it's ready.
|
Just to clarify: it is ready to review now. |
Gabriel Scherer (2020/06/02 08:38 -0700):
Just to clarify: it is ready to review now.
Well my opinion is that it would be better to review it once the code
that goes with the documentaion has been included, too.
|
I just pushed the changes onto the documentation branch. |
[RFC] Eventlog tracing system
OCaml currently ships with an instrumented version of the runtime which collects counters and timings for GC events and outputs logs in a custom text format.
Eventlog is a proposed replacement framework, which preserves the existing metrics and adds support for GC trace events in a low-overhead implementation based on a standardized binary trace format (CTF).
Based on an initial design used in OCaml Multicore (inspired by GHC's ThreadScope), this proposal includes an implementation providing:
We provide as well a script to convert such traces to the catapult event format.
A catapult trace take the form of a
json
file (see the format description),that can then be read using Google Chrome's own trace viewer,
chrome://tracing
.Screenshot from
chrome://tracing
A zoomed-in view on a particular major collection event
Duration stats on major collections during the programs lifetime
Overview of the trace viewer window on the
zarith
sandmark benchmarkOverview of the trace viewer window on the
js_of_ocaml
sandmark benchmarkThe statistics gathered, coupled with the available tooling (
chrome://tracing
),enable core OCaml and application developers to profile the runtime activity: said profile can then be correlated with an application profile to get a full overview of an application's lifetime.
An overview of the garbage collector's timeline also proved very useful to the
Multicore OCaml developers by enabling ways to visualize contention points, relationship between GC phases, and collection event timings.
The implementation is further discussed in the Implementation section.
The Tooling section provides samples and screenshots of the tooling available with our implementation.
We then discuss the future of our developments on this project in the last section.
Implementation
Our eventlog port to trunk remains in spirit the same as the one found in OCaml Multicore.
This PR provide our initial implementation of the eventlog framework.
Overview
The event tracing system adds functions to feed an event buffer that is flushed
to disk once full, and at exit.
In our reference implementation, event tracing functions usually take an
event id
defined in
caml/eventlog.h
and counters where due.An entry is added into the buffer, and if the buffer is full, a flush is triggered.
Events are flushed in a single CTF stream.
The initial target was to collect enough information to be able to match insight
provided by the instrumented runtime.
As such, basic tooling was provided in the spirit of the helpers found
with the source distribution, to be used with the instrumented runtime. (
ocaml-instr-graph
andocaml-instr-report
)Minor/major collections distribution over the
lifetime of the program, metrics on allocations, and timing on the various
phases within GC cycles can be easily visualized with the evailable tooling.
Format
The main difference will be that instead of emitting traces in the aforementioned
Catapult format, the Common Trace Format is used in our current prototype.
We decided to work with CTF for the following reasons:
on the serializing front.
This approach maps well with OCaml Multicore (where each domain could stream its own set of events).
The notion of stream is also unrelated to the transport mechanism chosen by the implementor.
As such, it is possible to have a tracer communicate over the network with very low overhead.
(as a MirageOS program would do, for example.)
C reference implementation bundled with a simple to use Python library.
CTF is a straightforward and generic format. Several users reported using it
successfully within their own projects.
We could attempt at providing a unified tracing user experience for the OCaml system.
Our initial implementation aims to stay simple and as such the runtime will output those traces in files.
However we could evolve the tracing infrastructure in the future and provide application-level control over trace distribution or a pub-sub system available from the runtime itself.
OCaml CTF metadata
In CTF, a metadata stream is required in order to allow a CTF codec to parse
the various binary streams composing a trace.
This metadata stream is itself written using CTF's own
TSDL
description format.An annotated version can be found at this address: https://github.com/Engil/gc-tracing/blob/master/metadata
Some questions are still open in our implementation:
A complete CTF trace should contain the metadata stream as well.
In its current form (and implementation), its definition is not fully static: the
byte_order
metadata field depends on which platform the runtime has been executed on.
It has not been decided yet which approach should be taken to distribute the metadata file.
Proposed solutions are to have the runtime generate metadata files, or distribute them as a part of the
compiler installation.
Control plane
Our implementation provides only simple controls over eventlog's behavior.
Any program built with the OCaml compiler on this branch can be traced by executing the said program with
OCAML_EVENTLOG_ENABLED=1
.This will output the trace in a
caml-eventlog-pid
file in the running directory. The user can also specify an alternative name for the file using theOCAML_EVENTLOG_FILENAME
env variable.Two primitives are exposed by the runtime to pause and resume tracing:
Gc.eventlog_pause
andGc.eventlog_resume
.Performance measurements
Sandmark was employed to measure the performance impact of eventlog on the runtime. The report can be found here.
The associated Jupyter notebook can be found as well for further analysis in this repository. A Docker image can be built for simpler setup.
Early numbers showed a mean slowdown of 1% (comparing an active traced program and the current
trunk
) and a mean slowdown of 0.01% (comparing an non actively traced program with the currenttrunk
).Tooling
The instrumented runtime ships with two scripts allowing to display and graph various datapoints from the GC's lifetime.
We provide a set of scripts that can be used as a base to run data analysis on our implementation's traces.
ctf_to_catapult.py
acts as a converter from our trace format to Chrome's eventlog format. Once converted, such traces can then be loaded inside Chrome'schrome://tracing
.We provide as well an example
Jupyter
notebook(rendered PDF here).This notebook showcases basic interactions with our implementation's trace format using the babeltrace library.
We made a few example accessible in the gc-tracing repository.
json
files can be loaded into Chrome and are extraced from the related traces found in the variousctf
subdirectories.The scripts are prototypes written by reusing the existing ecosystem for CTF traces analysis, mainly using the reference implementation for CTF,
babeltrace
.An OCaml binding to the
babeltrace
library is planned and would enable trace processing and analysis from OCaml programs.Eventlog future timeline
Deprecating the instrumented runtime
We aim to merge the tracing runtime with a feature level at least matching the instrumented runtime's.
We believe the tracing runtime can have a sufficiently low overhead so that it
can sit in the regular runtime without overly impacting performances in programs
that do not use the feature.
Removing it would as well speed up OCaml's compilation, by not having to build an extra runtime.
A more straightforward approach to statistics gathering sitting in the main runtime would also simplify maintenance:
While working on porting the
eventlog
framework to trunk we found a long standinginconsistency within the instrumented runtime's output. (see GPR #9004)
The inclusion of the eventlog framework adds as well a bit of duplicate noise in the codebase, as it trace similar codepaths as the instrumented runtime.
As such we will submit a second PR to deprecate the instrumented runtime from the OCaml distribution.
Future: user-based events, user-controlled streaming mechanisms
We plan to extend the eventlog framework further by adding support for user defined events and extra mechanism for traces extraction.
User defined events would allow developers to implement their own set of events from the application level. Such events could then be easily correlated with the runtime's own set of events.
We also had discussions about opening new extraction channels for eventlog's streams. This would prove useful for production systems or Mirage application (extracting traces via network or a serial port... etc).
The timeline for these features is undefined as discussions about their respective designs are still ongoing.