Skip to content

Custom events for eventring#11474

Merged
sadiqj merged 27 commits into
ocaml:trunkfrom
TheLortex:custom-events
Jan 17, 2023
Merged

Custom events for eventring#11474
sadiqj merged 27 commits into
ocaml:trunkfrom
TheLortex:custom-events

Conversation

@TheLortex

@TheLortex TheLortex commented Aug 3, 2022

Copy link
Copy Markdown
Contributor

Custom events

Eventring is the ring-buffer based tracing system introduced by @sadiqj in OCaml 5. The goal of this PR is to expose in the runtime_events library additional features to enable libraries to provide additional events of containing arbitrary payloads.

This will be useful to build libraries and tools to facilitate exploring the runtime behavior of OCaml programs, such as eio-console.

Constraints

Identifying events

Producer and consumer programs are not necessarily built with the same set of libraries, so consumer and producer have to agree on what they mean by "event id #13" in the serialization format. To do that @sadiqj suggested that the user has to provide a string as a globally unique ID (eio.context-switch, cohttp.request, ...) when declaring an event.

Unknown events

Consumers should be able to consume events that they don't know about (for example eio events even if the consumer hasn't been built with eio), as long as they know the data type (event, counter, span). This way we can for example build a monitor for all the counters that a program provides.

Design

The wire format for events is not changed. Indeed an ev_category bit was already set up with custom events in mind. However the ring buffer structure is extended to include a mapping from event index to its globally unique name. I introduce for this purpose a 1MB table containing character sequences of maximum size 128, meaning that at most 8192 custom events can be registered. This number is arbitrary and can obviously be discussed.

Most of the work in this PR is about finding a nice API to describe events and event types, and provide C APIs for encoding and decoding these events.

API

Event types

The Type module provides common types (int and unit) and a way to create a custom event type by providing a serializer and a deserializer.

The consumer can then choose to listen for events of a specific type.

Events

The User module provides the register : string -> 'a tag -> 'a Type.t -> 'a t function for user events. User events have an extensible tag field, it's useful to match against it when one wants to listen for a specific user event, filtering other events of the same type.

write : 'a t -> 'a -> unit can then be used to send data for the event.

Callbacks

The add : 'a Type.t -> (int -> Timestamp.t -> 'a User.t -> 'a -> unit) -> unit function mutates the callback structure to register an additional listener for a chosen custom event type.

Example

open Runtime_events

(* let's register a custom event for http requests *)
type User.tag += Request : User.tag

let request_event = User.register "cohttp.request" Request Type.int

(* producer *)

let () =
  start ();
  User.write request_event 404;
  User.write request_event 200;
  User.write request_event 200

(* consumer *)

let event_handler domain_id ts event code =
  (* we're only interested in our Request event *)
  match User.tag event with
  | Request -> Printf.printf "Request resulted in %d\n" code
  | _ -> ()

let () =
  let cursor = create_cursor None in
  let callbacks = 
    Callbacks.create ()
    |> Callbacks.add Type.int event_handler
  in
  while true do
    ignore(read_poll cursor callbacks None)
  done

Disclaimer

This is a first working example ! Now I think it's useful to gather feedback and fix the TODOs.

@TheLortex TheLortex force-pushed the custom-events branch 3 times, most recently from 18d4379 to 8bf0787 Compare August 4, 2022 09:06
Comment thread runtime/runtime_events.c Outdated

@c-cube c-cube left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, I really like this.

An example of project where I would potentially use this, long term, is catapult. It allows the user to instrument manually the code and output traces in the chrome tracing format (a json thing). This is the same format that dune uses, and also maybe multicore (I feel like it was demonstrated in some monthly update)?

Currently it uses several different techniques to implement the "backend" responsible for actually collecting data, either into a file (not process-safe), or via a network protocol with a demon (safe but cumbersome). With this, I think another domain or process could obtain the custom events corresponding to user traces and just emit them, which is lovely.

(* the data structure is primarily managed in C *)
type[@warning "-unused-field"] 'a custom = {
serialize: 'a -> bytes;
deserialize: bytes -> 'a;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this perhaps return a result?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deserialize is supposed to work in the same sense as Marshal.from_string, meaning that failure should be exceptional and only occur as the result of a version mismatch between the produced and the consumer. As a consequence an Error value would be quite rare and can be implemented using exceptions.

It's not a strong opinion and I'm all ears for use case of the result type.

Comment thread otherlibs/runtime_events/runtime_events.mli Outdated
Comment thread otherlibs/runtime_events/runtime_events.ml Outdated
@sadiqj sadiqj self-assigned this Aug 8, 2022

@sadiqj sadiqj left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, thanks @TheLortex

First on the general design, I think it's along the right lines though one of the properties I tried to have of the existing design is that there's a C API underneath and the OCaml API sits on top. That means if people want to write their own even lower overhead bindings they can do so but it also enables you to externally monitor an OCaml process without you necessarily needing to have an OCaml runtime going.

I think with the design as it stands we lose that. The user_event callback only works if there's an OCaml runtime going. We may be able to change where caml_runtime_events_user_resolve happens to later on in the OCaml bindings to achieve this.

Also given we know the format, for the C bindings it might be good to just have user_counter, user_event, user_custom. The latter could just give you a buffer.

Smaller details:

  • I think at the moment user_resolve gets slower the more user events you have registered since it's iterating through a linked list on each resolve. I saw skiplist.h was imported but not used, I'm guessing the intention was to use that there - we'd need to protect it with a mutex.
  • The on-disk format was organise in the way it was so that if you didn't use MAX_DOMAINS then most of the file was sparse and actual disk usage was very low. We should double check that this is still the case if we put the custom event ids at the end of the file, if not we might want to put them before the ring buffers.
  • Is there a reason we're setting this while holding the mutex? I'm unsure if there's some concurrent subtlety I'm missing or whether it's a mistake.

This is a really good start, thanks. Am happy to provide help moving it along.

@TheLortex

Copy link
Copy Markdown
Contributor Author

First on the general design, I think it's along the right lines though one of the properties I tried to have of the existing design is that there's a C API underneath and the OCaml API sits on top. That means if people want to write their own even lower overhead bindings they can do so but it also enables you to externally monitor an OCaml process without you necessarily needing to have an OCaml runtime going.

Makes sense, I'll update the PR with that constraint in mind.

I think at the moment user_resolve gets slower the more user events you have registered since it's iterating through a linked list on each resolve. I saw skiplist.h was imported but not used, I'm guessing the intention was to use that there - we'd need to protect it with a mutex.

Indeed my plan was to cache the user_resolve function results by maintaining a skip-list. I'll include that in my PR update !

The on-disk format was organise in the way it was so that if you didn't use MAX_DOMAINS then most of the file was sparse and actual disk usage was very low. We should double check that this is still the case if we put the custom event ids at the end of the file, if not we might want to put them before the ring buffers.

I'm not familiar with file sparsity optimisations for disk usage. Is that a file-system feature ?

@sadiqj

sadiqj commented Aug 23, 2022

Copy link
Copy Markdown
Contributor

I'm not familiar with file sparsity optimisations for disk usage. Is that a file-system feature ?

Yes, on Linux it's supported on most of the commonly used filesystems. I think APFS on macOS does as well.

@TheLortex

Copy link
Copy Markdown
Contributor Author

The PR is updated with some changes:

  • C / OCaml API separation
  • labels for custom types declaration
  • reuse buffers for performance

About the user_resolve function, it's a bit more complex than expected as the skiplist structure cannot store OCaml values.
Instead we probably need an intermediate array containing the values, and skiplist contains a mapping from either string hashes of event names or local event id to indices in the array.

@TheLortex

Copy link
Copy Markdown
Contributor Author

I have finally figured out something for the ID => event resolution.
My last commits adds the implementation of a basic memoization system using an array of Event.t values.
Indices in the array directly maps to indices of the producer. The array is resized dynamically as larger event IDs are received, but it cannot grow too big as there is currently a maximum of 8192 events (RUNTIME_EVENTS_MAX_CUSTOM_EVENTS).

This structure is part of the opaque cursor type that is managed in C.

Comment thread otherlibs/runtime_events/runtime_events.mli Outdated
Comment thread otherlibs/runtime_events/runtime_events.mli Outdated

@abbysmal abbysmal left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was asked to take a look at the OCaml part of the PR: I feel this is a pretty straightforward API and I don't have much to say about it.

I feel like however another review of the overall PR (specifically the C side) from a non-Tarides developer would be much welcomed, if someone around have time.

@abbysmal

abbysmal commented Dec 5, 2022

Copy link
Copy Markdown
Contributor

I will do another pass of review this week, and I'd like to ping people who took a look at the C side in the original eventring PR at #10964, if they have time. :-) (Maybe @RichardWarburton, @xavierleroy or @avsm can get a look?)

@TheLortex I see CI is failing on MacOS, do you have a fix?

@TheLortex

Copy link
Copy Markdown
Contributor Author

@TheLortex I see CI is failing on MacOS, do you have a fix?

I'm going to rebase the PR and fix CI issues this week.

@TheLortex TheLortex force-pushed the custom-events branch 2 times, most recently from 76ff6c5 to d589bf7 Compare December 5, 2022 17:46
@sadiqj

sadiqj commented Dec 9, 2022

Copy link
Copy Markdown
Contributor

I think I'd really like to get this merged by the end of next week as there's already tooling that relies on the implementation in this PR (e.g https://github.com/patricoferris/meio and https://github.com/sadiqj/runtime_events_tools has a WIP).

If anyone else is planning a review or has feedback on the API (which will be more of a pain to change post-merge) then please let us know.

Comment thread otherlibs/runtime_events/runtime_events.mli Outdated
Comment thread otherlibs/runtime_events/runtime_events.mli Outdated
Comment thread otherlibs/runtime_events/runtime_events.mli Outdated
Comment thread otherlibs/runtime_events/runtime_events.mli Outdated
Comment thread otherlibs/runtime_events/runtime_events.mli Outdated
Comment thread otherlibs/runtime_events/runtime_events.mli Outdated
@sadiqj sadiqj merged commit e64bafd into ocaml:trunk Jan 17, 2023
@sadiqj

sadiqj commented Jan 17, 2023

Copy link
Copy Markdown
Contributor

Thanks @TheLortex !

TheLortex pushed a commit to TheLortex/ocaml that referenced this pull request Mar 1, 2023
This PR exposes features to enable libraries to provide additional events containing arbitrary payloads. This will be useful to build libraries and tools to facilitate exploring the runtime behavior of OCaml programs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants