@OlivierNicole
Contributor

@OlivierNicole OlivierNicole commented Dec 18, 2023

This PR intends to be the new #12335, taking over @TheLortex and @sadiqj to address reviewers’ requests.

It contains the same commits as the original PR, with the following changes in the last two commits:

  • As suggested by @sadiqj, I moved the description of custom events to the end, after the documentation of built-in events. I find that it makes the presentation more progressive and clearer.
  • I attempted to clarify and justify the API of custom runtime events by explaining what is stored, clarifying the role of tags, and spelling out the sequence of steps involved in registering and using custom types and/or custom events.
  • Nit: I made section labels uniform, consistent with the rest of the manual.
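The sequence of steps mentioned in the second point can be sketched as follows. This is a minimal sketch that assumes OCaml 5.1 or later and its `Runtime_events` module (linked through the `runtime_events` library); the tag `Request_span` and the event name `"example.request"` are hypothetical.

```ocaml
(* Sketch of the custom-event workflow. Assumes OCaml >= 5.1, linked with
   the runtime_events library (e.g. dune's (libraries runtime_events)). *)

(* Step 1: add a hypothetical tag to the extensible tag type. *)
type Runtime_events.User.tag += Request_span

(* Step 2: register an event with a name, a tag and a value type. *)
let request_span =
  Runtime_events.User.register "example.request" Request_span
    Runtime_events.Type.span

let begins = ref 0
let ends = ref 0

(* Called for every span-typed user event read back from a ring buffer. *)
let on_span _domain_id _ts ev (value : Runtime_events.span) =
  match Runtime_events.User.tag ev, value with
  | Request_span, Runtime_events.Begin -> incr begins
  | Request_span, Runtime_events.End -> incr ends
  | _ -> ()

let () =
  (* Step 3: start collecting events in this process. *)
  Runtime_events.start ();
  (* Step 4: emit a begin/end pair around some work. *)
  Runtime_events.User.write request_span Runtime_events.Begin;
  Runtime_events.User.write request_span Runtime_events.End;
  (* Step 5: read the events back through a cursor and a callbacks value. *)
  let cursor = Runtime_events.create_cursor None in
  let callbacks =
    Runtime_events.Callbacks.create ()
    |> Runtime_events.Callbacks.add_user_event Runtime_events.Type.span on_span
  in
  ignore (Runtime_events.read_poll cursor callbacks None);
  Runtime_events.free_cursor cursor;
  Printf.printf "begins=%d ends=%d\n" !begins !ends
```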

(Since there was discussion about possible future improvements to the API: having done my best to clarify it, I have the impression that there may be room for simplification regarding tags. I am not sure what the benefit is of exposing them, rather than exposing the event names directly.)

@OlivierNicole
Contributor Author

I also replaced the term “probe” with the more self-explanatory “event source” as suggested by @xavierleroy, but I will revert it if it is deemed unnecessary.

Member

@dra27 dra27 left a comment

Thank you for taking this on - it's looking very good. I didn't review the original PR, so only did a brief diff of it against the original. I liked the text on "vanilla" reading of the diff, and I then liked the re-structuring (and the new summary clarification), having compared it to the original.

Most of my comments are language or other nits - the only major part for me is to rework the explanation in the summary when talking about add_user_event to relate it to Runtime_events.Callbacks.create and that that is then passed with a cursor to consume the events. At the moment, the text read to me as though the callback is attached to the cursor, but the types told me otherwise 🙂
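To illustrate the point about the types: a `Runtime_events.Callbacks.t` value is built on its own with `Callbacks.create` and `Callbacks.add_user_event`, and it only meets the cursor when both are passed to `Runtime_events.read_poll`. A minimal sketch assuming OCaml 5.1 or later; the tag `Queue_length` and the event name are hypothetical.

```ocaml
(* Assumes OCaml >= 5.1, linked with the runtime_events library. *)
type Runtime_events.User.tag += Queue_length

let queue_length =
  Runtime_events.User.register "example.queue_length" Queue_length
    Runtime_events.Type.int

let last_seen = ref (-1)

(* The callbacks value is independent of any cursor: it only records
   which handler to run for which event type. *)
let callbacks =
  Runtime_events.Callbacks.create ()
  |> Runtime_events.Callbacks.add_user_event Runtime_events.Type.int
       (fun _domain_id _ts ev v ->
          match Runtime_events.User.tag ev with
          | Queue_length -> last_seen := v
          | _ -> ())

let () =
  Runtime_events.start ();
  Runtime_events.User.write queue_length 42;
  (* The cursor says where to read from (here: the current process)... *)
  let cursor = Runtime_events.create_cursor None in
  (* ...and read_poll is where cursor and callbacks are combined. *)
  let n = Runtime_events.read_poll cursor callbacks None in
  Runtime_events.free_cursor cursor;
  Printf.printf "read %d events, last queue length = %d\n" n !last_seen
```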

Copy link
Contributor

@tmcgilchrist tmcgilchrist left a comment

Some general impressions and questions from reading this document:

  • Ring buffer per domain: is there always a domain present, a.k.a. the root domain? What is the correct name for it?

    There is a unique ring buffer for every running domain and, on domain termination, ring buffers may be re-used for newly spawned domains.

Maybe some more context on what domain is present at startup, or a pointer to the domain behaviour section in the manual? I expect there is a default domain started initially but the manual section Chapter 9 Parallel programming and Domain module documentation don't have much to say about this topic.

  • Can I connect to the events API after an OCaml process has started? Perhaps by enabling runtime events via signals, so that they can be turned on dynamically? Future work?

To receive events that are only available in the instrumented runtime, the OCaml program needs to be compiled and linked against the instrumented runtime.

Does this apply for user defined events?

General question about spans and timestamps, and their properties. Are timestamps strictly monotonic or regular monotonic? Taking this definition of monotonic:

Strict monotonic counters or clocks are guaranteed to return always increasing values (or always decreasing values). The sequence 1, 2, 3, 4, 5 is strictly monotonic.
Regular (non-strict) monotonic counters otherwise only require to return non-decreasing values (or non-increasing values). The sequence 1, 2, 2, 2, 3, 4 is monotonic, but not strictly monotonic.
definition from https://learnyousomeerlang.com/time

What is the relationship between domains and their respective timestamps? If I have the same timestamp value for two different domains, did the events happen at the same time? Can I infer whether a timestamp from Domain 1 is before or after a second timestamp from Domain 2?

How do I observe the events for a particular domain vs all domains? This is implied in the signature for span_event_handler and int_event_handler but might be worth mentioning explicitly.
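On the last question: handlers passed to `Callbacks.add_user_event` receive as their first argument the index of the domain (ring buffer) the event was read from, and a single cursor is expected to read the ring buffers of all domains of the process. Observing one domain then amounts to filtering on that argument. A sketch assuming OCaml 5.1 or later; the `Tick` tag and event name are hypothetical, and we do not assume that this index coincides with the value returned by `Domain.self`.

```ocaml
(* Assumes OCaml >= 5.1, linked with the runtime_events library. *)
type Runtime_events.User.tag += Tick

let tick =
  Runtime_events.User.register "example.tick" Tick Runtime_events.Type.unit

(* Tick counts keyed by the domain (ring-buffer) index that the consumer
   passes as the first argument of every callback. *)
let per_domain : (int, int) Hashtbl.t = Hashtbl.create 8

let on_unit domain_id _ts ev () =
  match Runtime_events.User.tag ev with
  | Tick ->
    let n = Option.value ~default:0 (Hashtbl.find_opt per_domain domain_id) in
    Hashtbl.replace per_domain domain_id (n + 1)
  | _ -> ()

let () =
  Runtime_events.start ();
  Runtime_events.User.write tick ();
  let d = Domain.spawn (fun () -> Runtime_events.User.write tick ()) in
  Domain.join d;
  (* One cursor over the current process, covering every domain. *)
  let cursor = Runtime_events.create_cursor None in
  let callbacks =
    Runtime_events.Callbacks.create ()
    |> Runtime_events.Callbacks.add_user_event Runtime_events.Type.unit on_unit
  in
  ignore (Runtime_events.read_poll cursor callbacks None);
  Runtime_events.free_cursor cursor;
  (* To follow a single domain, a handler would ignore the other indices. *)
  Hashtbl.iter (fun d n -> Printf.printf "domain %d: %d tick(s)\n" d n) per_domain
```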

Comment on lines 347 to 348
To summarize, to emit and consume custom events of a custom type (for example),
the user needs to:
Contributor

This could be simplified as:

Suggested change
To summarize, to emit and consume custom events of a custom type (for example),
the user needs to:
For a user to emit and consume custom events using a custom type they need to:

or

To emit and consume custom events using a custom type:

Contributor Author

Thank you, I integrated your suggestion in 9d1768b. However I kept “To summarize” because I feel that it informs clearly that the list of steps is merely a more condensed recap of previously given information.

@kayceesrk
Contributor

It would be good to get a review from a current user of custom events. @talex5, would you have time to review this?

Contributor

@talex5 talex5 left a comment

It would be good to get a review from a current user of custom events. @talex5, would you have time to review this?

I think my earlier comment still applies: something should be said about thread-safety. e.g.

  • Is it OK if my encoder function allocates (possibly triggering a GC event)?
  • What happens if two sys-threads write an event at the same time?
  • Does my decoder need to be robust against the event being overwritten as it's being decoded?

Regarding the API itself:

  • It would be good to have a fast (noalloc) function to check whether tracing is on. That would allow us to avoid some work in the common case of no tracing.
  • A lot of the complexity in the API and in the documentation is due to the module providing a marshalling system for user types. I think it would be simpler just to let users attach arbitrary bytes to a custom event.
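For context on the marshalling point, this is roughly what a user-defined type involves with the current API: an `encode` function that writes the value into a runtime-provided `bytes` buffer and returns the number of bytes written, and a `decode` function that rebuilds the value from that many bytes. A minimal sketch assuming OCaml 5.1's `Runtime_events.Type.register`; a type carrying a plain string is close to the "arbitrary bytes" alternative suggested above. The tag and event name are hypothetical.

```ocaml
(* Assumes OCaml >= 5.1, linked with the runtime_events library. *)
type Runtime_events.User.tag += Log_message

(* Hypothetical custom type carrying a string payload. [encode] copies the
   string into [buf] (truncating if it does not fit) and returns the number
   of bytes written; [decode] reads back the first [len] bytes. *)
let string_type : string Runtime_events.Type.t =
  Runtime_events.Type.register
    ~encode:(fun buf s ->
        let len = min (String.length s) (Bytes.length buf) in
        Bytes.blit_string s 0 buf 0 len;
        len)
    ~decode:(fun buf len -> Bytes.sub_string buf 0 len)

let log_message =
  Runtime_events.User.register "example.log" Log_message string_type

let received : string list ref = ref []

let () =
  Runtime_events.start ();
  Runtime_events.User.write log_message "hello";
  let cursor = Runtime_events.create_cursor None in
  let callbacks =
    Runtime_events.Callbacks.create ()
    |> Runtime_events.Callbacks.add_user_event string_type
         (fun _domain_id _ts ev msg ->
            match Runtime_events.User.tag ev with
            | Log_message -> received := msg :: !received
            | _ -> ())
  in
  ignore (Runtime_events.read_poll cursor callbacks None);
  Runtime_events.free_cursor cursor;
  List.iter print_endline (List.rev !received)
```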

@kayceesrk
Contributor

Thanks @talex5

@OlivierNicole
Contributor Author

OlivierNicole commented Dec 20, 2023

Thanks @talex5 for your review.

something should be said about thread-safety. e.g.

* Is it OK if my encoder function allocates (possibly triggering a GC event)?

Yes, calls to serializers/deserializers are ordinary callbacks into OCaml with properly registered GC roots.

* What happens if two sys-threads write an event at the same time?

This should be fine I think, writing to the ring buffer seems designed to be thread-safe, but maybe @sadiqj should fact-check me.

* Does my decoder need to be robust against the event being overwritten as it's being decoded?

No, the byte string given to the decoder is a copy.

Should these points be mentioned in the manual? I would be in favor of sticking to the implicit convention that if the manual is silent about these issues (thread safety, permission to allocate, etc.), the API is safe (thread-safe, GC-safe, etc.).

@sadiqj
Contributor

sadiqj commented Dec 20, 2023

  • What happens if two sys-threads write an event at the same time?

This should be fine I think, writing to the ring buffer seems designed to be thread-safe, but maybe @sadiqj should fact-check me.

The ring-buffers are all single-writer multiple-reader so you must be holding the runtime lock in order to write events to the ring buffer. It would be bad if two systhreads on the same domain wrote simultaneously (without holding the runtime lock).

@OlivierNicole
Contributor Author

Some general impressions and questions from reading this document:

* Ring buffer per domain, Is there always a domain present aka root domain? What is the correct name for it?
  > There is a unique ring buffer for every running domain and, on domain termination, ring buffers may be re-used for newly spawned domains.

Maybe some more context on what domain is present at startup, or a pointer to the domain behaviour section in the manual? I expect there is a default domain started initially but the manual section Chapter 9 Parallel programming and Domain module documentation don't have much to say about this topic.

There is indeed a main domain created at startup. I attempt to clarify the situation in 858fc6a.

* Can I connect to the events API after an OCaml process has started? Perhaps enabling runtime events via signals so it can be turned on dynamically? Future work?

Yes, via Runtime_events.start. It’s part of the example and also discussed at the end of the “With OCaml APIs” section.
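A small illustration of this answer: an event written before `Runtime_events.start` has been called is not recorded, and collection can be switched on from within the program at any later point. This sketch assumes OCaml 5.1 or later; the tag and event name are hypothetical, and it does not address attaching to an already running process from the outside.

```ocaml
(* Assumes OCaml >= 5.1, linked with the runtime_events library. *)
type Runtime_events.User.tag += Phase

let phase =
  Runtime_events.User.register "example.phase" Phase Runtime_events.Type.int

let seen : int list ref = ref []

let () =
  (* Collection has not been started yet, so this event is not recorded. *)
  Runtime_events.User.write phase 1;
  (* Start collecting while the program is already running. *)
  Runtime_events.start ();
  Runtime_events.User.write phase 2;
  let cursor = Runtime_events.create_cursor None in
  let callbacks =
    Runtime_events.Callbacks.create ()
    |> Runtime_events.Callbacks.add_user_event Runtime_events.Type.int
         (fun _domain_id _ts ev v ->
            match Runtime_events.User.tag ev with
            | Phase -> seen := v :: !seen
            | _ -> ())
  in
  ignore (Runtime_events.read_poll cursor callbacks None);
  Runtime_events.free_cursor cursor;
  List.iter (Printf.printf "phase %d\n") (List.rev !seen)
```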

To receive events that are only available in the instrumented runtime, the OCaml program needs to be compiled and linked against the instrumented runtime.

Does this apply for user defined events?

No. Clarification attempt: 8a29ebb.

General question about spans and timestamps, and their properties. Are timestamps strictly monotonic or regular monotonic? Taking this definition of monotonic:

Strict monotonic counters or clocks are guaranteed to return always increasing values (or always decreasing values). The sequence 1, 2, 3, 4, 5 is strictly monotonic.
Regular (non-strict) monotonic counters otherwise only require to return non-decreasing values (or non-increasing values). The sequence 1, 2, 2, 2, 3, 4 is monotonic, but not strictly monotonic.
definition from https://learnyousomeerlang.com/time

The clock is caml_time_counter, which I believe is “whatever best clock the system provides”, so I assume it will be strictly monotonic on modern systems? @sadiqj would be best placed to confirm this.

How do I observe the events for a particular domain vs all domains? This is implied in the signature for span_event_handler and int_event_handler but might be worth mentioning explicitly.

I agree. Clarified in e847dfe.

@OlivierNicole
Contributor Author

The ring-buffers are all single-writer multiple-reader so you must be holding the runtime lock in order to write events to the ring buffer. It would be bad if two systhreads on the same domain wrote simultaneously (without holding the runtime lock).

I see, so no problem can arise from OCaml code (the runtime lock is always held). I propose to document this in the C API (08d3b4a).

@sadiqj
Contributor

sadiqj commented Dec 20, 2023

The ring-buffers are all single-writer multiple-reader so you must be holding the runtime lock in order to write events to the ring buffer. It would be bad if two systhreads on the same domain wrote simultaneously (without holding the runtime lock).

I see, so no problem can arise from OCaml code (the runtime lock is always held). I propose to document this in the C API (08d3b4a).

This is a good idea, thanks.

The clock is caml_time_counter, which I believe is “whatever best clock the system provides”, so I assume it will be strictly monotonic on modern systems? @sadiqj would be best placed to confirm this.

This I actually don't know.

@OlivierNicole
Contributor Author

(And thanks to @tmcgilchrist as well for the reviewing work)

\end{itemize}

Additional events can be declared and
consumed, providing higher-level dynamic introspection capabilities to OCaml
Member

Nit: I would say "monitoring" (or maybe "observability") rather than "introspection", and I am not sure what "dynamic" means and brings in this context. (Also there is a weird space at the beginning of line 18?)

Contributor Author

Addressed in 4538ac4.

\texttt{int}, \texttt{span}) and user-defined types. To understand the
manipulation of custom events, it is useful to know how they are transported and
stored: their representation consists of a name string (in fact, an index into
an array of all custom names) and an arbitrary array of bytes. Custom event
Member

"array of bytes": does this means a byte array? If it is just bytes, then "an arbitrary byte sequence" would feel more natural to me.

Contributor Author

Changed in 4538ac4.

Member

I suspect that there are other occurrences of "array of bytes" in the documentation that probably need to be fixed as well.

Contributor Author

True. 155849b

Member

@gasche gasche left a comment

I am happy with the current state of the manual: I did a reading pass and I did not get any major unresolved question (except the one about shared/duplicated declaration of tags and events, which has been clarified in words and in code), which suggests to me that it is reasonable documentation. The other reviewers (thanks!) also seemed to have some comments but no blocker.

It would be nice if we could articulate the value of tags (I think that there is some on the programming side: unlike string literals, tags will warn you if you make a typo, etc.), and maybe make some of the design choices more regular or clearly justified. But I think that this is fine for a first iteration of a manual for the feature as-is.
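To make the typo argument concrete: a handler can dispatch either on an event's tag, via `Runtime_events.User.tag`, or on its registered name, via `Runtime_events.User.name`. A misspelled tag constructor is rejected by the compiler, whereas a misspelled name string compiles and silently never matches. A sketch assuming OCaml 5.1 or later; the tags and names are hypothetical.

```ocaml
(* Assumes OCaml >= 5.1, linked with the runtime_events library. *)
type Runtime_events.User.tag += Cache_hit | Cache_miss

let cache_hit =
  Runtime_events.User.register "example.cache_hit" Cache_hit
    Runtime_events.Type.unit

let cache_miss =
  Runtime_events.User.register "example.cache_miss" Cache_miss
    Runtime_events.Type.unit

let hits_by_tag = ref 0
let hits_by_name = ref 0

let on_unit _domain_id _ts ev () =
  (* Tag-based dispatch: writing [Cache_hti] here would not compile. *)
  (match Runtime_events.User.tag ev with
   | Cache_hit -> incr hits_by_tag
   | _ -> ());
  (* Name-based dispatch: a typo in the string would compile and never match. *)
  if Runtime_events.User.name ev = "example.cache_hit" then incr hits_by_name

let () =
  Runtime_events.start ();
  Runtime_events.User.write cache_hit ();
  Runtime_events.User.write cache_miss ();
  Runtime_events.User.write cache_hit ();
  let cursor = Runtime_events.create_cursor None in
  let callbacks =
    Runtime_events.Callbacks.create ()
    |> Runtime_events.Callbacks.add_user_event Runtime_events.Type.unit on_unit
  in
  ignore (Runtime_events.read_poll cursor callbacks None);
  Runtime_events.free_cursor cursor;
  Printf.printf "by tag: %d, by name: %d\n" !hits_by_tag !hits_by_name
```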

I propose to wait until @OlivierNicole considers that he is done integrating all the feedback, and then merge.

The commit history is not great with a lot of minor back-and-forth commits. I think that @OlivierNicole could squash together the commits of each author -- so we would have three commits, one for each different person who took this PR for a ride. Or maybe some finer split of commits if you think they make sense. Could you take care of this and force-push?

Thanks!

perform steps 1 to 3 to register custom events and custom event types (if any).
Note that the tag values need not be the same in both programs; the only value
that should be identical the same is the name.
that should match are the names.
Member

the only values that should match (apologies for not catching this earlier)

Contributor Author

My bad. 773fbc4

Lucas Pluvinage and others added 2 commits December 21, 2023 17:16
Co-authored-by: Olivier Nicole <olivier.github.com@chnik.fr>
@OlivierNicole OlivierNicole force-pushed the custom-events-manual branch 2 times, most recently from 2522c46 to 6170eba December 21, 2023 16:19
@OlivierNicole
Contributor Author

Thank you for your review. I squashed as suggested and added a Changes entry.

@OlivierNicole
Contributor Author

I don’t really understand what the Hygiene failure message means.

@gasche
Member

gasche commented Dec 21, 2023

The CI failure message says:

custom-events-manual branched from trunk at: 6be361ffd75a2fce5f4daa21fd5073b9a18ce31c
The parsetree has been modified.
Please assign the label parsetree-change to the PR

This appears to be a bug in the CI script deciding if a PR needs the parsetree-change label. This is highly likely to come from the recent Github Actions churn by @dra27 that I diligently approved today. Let's wait for @dra27 to look at it.

@gasche
Member

gasche commented Dec 21, 2023

(The bug appears to come from the branch-point / most-recent-common-ancestor computation.)

@dra27
Member

dra27 commented Dec 21, 2023

The churn on trunk wasn't in any of the hygiene jobs! Regardless, this was already failing from as soon as the PR was opened, which is before I started messing around. I'd noticed it from when the PR was first opened. I will try to have a look at it tomorrow.

@dra27
Member

dra27 commented Dec 21, 2023

(the bug was two-fold - the parsetree-change check is failing, but the no-change-entry-needed was passing even when there was no Changes entry, but that's unsurprising because they share the same analysis of the commits)

@dra27 dra27 merged commit 2433ab3 into ocaml:trunk Dec 22, 2023
@dra27
Member

dra27 commented Dec 22, 2023

Thank you all!

@Octachron
Member

As a documentation update, don't forget to cherry-pick it onto 5.2.

dra27 added a commit that referenced this pull request Dec 22, 2023
 manual: update runtime tracing chapter for custom events (ex #12335)

(cherry picked from commit 2433ab3)
dra27 added a commit that referenced this pull request Dec 23, 2023
 manual: update runtime tracing chapter for custom events (ex #12335)

(cherry picked from commit 2433ab3)
@dra27
Member

dra27 commented Dec 23, 2023

I pushed it to 5.1 as well, as it allows the possibility for the manual on ocaml.org to be recompiled with the updated chapter.

8 participants