Custom events for eventring#11474
Conversation
18d4379 to
8bf0787
Compare
c-cube
left a comment
There was a problem hiding this comment.
Wow, I really like this.
An example of project where I would potentially use this, long term, is catapult. It allows the user to instrument manually the code and output traces in the chrome tracing format (a json thing). This is the same format that dune uses, and also maybe multicore (I feel like it was demonstrated in some monthly update)?
Currently it uses several different techniques to implement the "backend" responsible for actually collecting data, either into a file (not process-safe), or via a network protocol with a demon (safe but cumbersome). With this, I think another domain or process could obtain the custom events corresponding to user traces and just emit them, which is lovely.
| (* the data structure is primarily managed in C *) | ||
| type[@warning "-unused-field"] 'a custom = { | ||
| serialize: 'a -> bytes; | ||
| deserialize: bytes -> 'a; |
There was a problem hiding this comment.
should this perhaps return a result?
There was a problem hiding this comment.
deserialize is supposed to work in the same sense as Marshal.from_string, meaning that failure should be exceptional and only occur as the result of a version mismatch between the produced and the consumer. As a consequence an Error value would be quite rare and can be implemented using exceptions.
It's not a strong opinion and I'm all ears for use case of the result type.
sadiqj
left a comment
There was a problem hiding this comment.
This is great, thanks @TheLortex
First on the general design, I think it's along the right lines though one of the properties I tried to have of the existing design is that there's a C API underneath and the OCaml API sits on top. That means if people want to write their own even lower overhead bindings they can do so but it also enables you to externally monitor an OCaml process without you necessarily needing to have an OCaml runtime going.
I think with the design as it stands we lose that. The user_event callback only works if there's an OCaml runtime going. We may be able to change where caml_runtime_events_user_resolve happens to later on in the OCaml bindings to achieve this.
Also given we know the format, for the C bindings it might be good to just have user_counter, user_event, user_custom. The latter could just give you a buffer.
Smaller details:
- I think at the moment
user_resolvegets slower the more user events you have registered since it's iterating through a linked list on each resolve. I sawskiplist.hwas imported but not used, I'm guessing the intention was to use that there - we'd need to protect it with a mutex. - The on-disk format was organise in the way it was so that if you didn't use
MAX_DOMAINSthen most of the file was sparse and actual disk usage was very low. We should double check that this is still the case if we put the custom event ids at the end of the file, if not we might want to put them before the ring buffers. - Is there a reason we're setting this while holding the mutex? I'm unsure if there's some concurrent subtlety I'm missing or whether it's a mistake.
This is a really good start, thanks. Am happy to provide help moving it along.
Makes sense, I'll update the PR with that constraint in mind.
Indeed my plan was to cache the
I'm not familiar with file sparsity optimisations for disk usage. Is that a file-system feature ? |
Yes, on Linux it's supported on most of the commonly used filesystems. I think APFS on macOS does as well. |
|
The PR is updated with some changes:
About the |
54a39c4 to
cf98783
Compare
9e8dc27 to
da9c7e5
Compare
|
I have finally figured out something for the ID => event resolution. This structure is part of the opaque |
abbysmal
left a comment
There was a problem hiding this comment.
I was asked to take a look at the OCaml part of the PR: I feel this is a pretty straightforward API and I don't have much to say about it.
I feel like however another review of the overall PR (specifically the C side) from a non-Tarides developer would be much welcomed, if someone around have time.
|
I will do another pass of review this week, and I'd like to ping people who took a look at the C side in the original eventring PR at #10964, if they have time. :-) (Maybe @RichardWarburton, @xavierleroy or @avsm can get a look?) @TheLortex I see CI is failing on MacOS, do you have a fix? |
I'm going to rebase the PR and fix CI issues this week. |
76ff6c5 to
d589bf7
Compare
|
I think I'd really like to get this merged by the end of next week as there's already tooling that relies on the implementation in this PR (e.g https://github.com/patricoferris/meio and https://github.com/sadiqj/runtime_events_tools has a WIP). If anyone else is planning a review or has feedback on the API (which will be more of a pain to change post-merge) then please let us know. |
and separate span function in C API
a3d6764 to
71c9c90
Compare
|
Thanks @TheLortex ! |
This PR exposes features to enable libraries to provide additional events containing arbitrary payloads. This will be useful to build libraries and tools to facilitate exploring the runtime behavior of OCaml programs.
Custom events
Eventring is the ring-buffer based tracing system introduced by @sadiqj in OCaml 5. The goal of this PR is to expose in the
runtime_eventslibrary additional features to enable libraries to provide additional events of containing arbitrary payloads.This will be useful to build libraries and tools to facilitate exploring the runtime behavior of OCaml programs, such as eio-console.
Constraints
Identifying events
Producer and consumer programs are not necessarily built with the same set of libraries, so consumer and producer have to agree on what they mean by "event id #13" in the serialization format. To do that @sadiqj suggested that the user has to provide a string as a globally unique ID (
eio.context-switch,cohttp.request, ...) when declaring an event.Unknown events
Consumers should be able to consume events that they don't know about (for example eio events even if the consumer hasn't been built with eio), as long as they know the data type (event, counter, span). This way we can for example build a monitor for all the counters that a program provides.
Design
The wire format for events is not changed. Indeed an
ev_categorybit was already set up with custom events in mind. However the ring buffer structure is extended to include a mapping from event index to its globally unique name. I introduce for this purpose a 1MB table containing character sequences of maximum size 128, meaning that at most 8192 custom events can be registered. This number is arbitrary and can obviously be discussed.Most of the work in this PR is about finding a nice API to describe events and event types, and provide C APIs for encoding and decoding these events.
API
Event types
The
Typemodule provides common types (int and unit) and a way to create a custom event type by providing a serializer and a deserializer.The consumer can then choose to listen for events of a specific type.
Events
The
Usermodule provides theregister : string -> 'a tag -> 'a Type.t -> 'a tfunction for user events. User events have an extensible tag field, it's useful to match against it when one wants to listen for a specific user event, filtering other events of the same type.write : 'a t -> 'a -> unitcan then be used to send data for the event.Callbacks
The
add : 'a Type.t -> (int -> Timestamp.t -> 'a User.t -> 'a -> unit) -> unitfunction mutates the callback structure to register an additional listener for a chosen custom event type.Example
Disclaimer
This is a first working example ! Now I think it's useful to gather feedback and fix the TODOs.