Multicore statmemprof API #12381

NickBarnes · 2023-07-17T13:00:27Z

This is a small change to the statmemprof API, part of preparing it for multicore (#11911).

For details, see the comments in gc.mli, or my comment here.

I am breaking the large statmemprof PR (#12379) into a number of small separate ones, some of which are wholly independent. This is the first.

jhjourdan

This API change is a breaking API change. This is fine, since we declared this API as experimental, but still, I would like to understand why it is necessary to add this profile type only for discard. Also, since this is a breaking change, I would prefer not presenting it as a "minor API change", but rather make explicit in the CHANGES file that the function Gc.Memprof.start has changed its type.

stdlib/gc.mli

jhjourdan · 2023-07-17T14:43:42Z

stdlib/gc.mli


-        All the already tracked blocks are discarded. If there are
-        pending postponed callbacks, they may be discarded.
+    val discard : t -> unit


This API is somewhat incoherent, in that it requires a profile value while all the other functions use the "current profile". Any reason for this? Why would discard not work with the current profile?

I have to speculate on how this API is implemented, but reasoning in terms of ownership I would expect stop to transfer ownership of the callback state, rather than start; i.e.

val start : [...] -> unit val stop : unit -> t

This would address your point. Again this is making assumptions on what is natural from an implementation standpoint but we don't have the implementation.

Clearly more documentation and/or a better interface is required here. The intention is to allow uses such as a sequence of start/stop pairs, followed by subsequent discard operations on some or all of the resulting profiles. "The current profile" is only meaningful between start and stop.

For example, some highly-phased system (such as a traditional compiler) which is interested in the long-term fate of objects allocated during some particular phase or phases (say: objects allocated during some specific compiler optimization phase). It will call start and stop around the phase, but may not call discard until much later, giving time for allocated objects to be promoted and eventually GCed.

In such a use case, discard is still necessary to give the profiling tool author more control (before discard, as callbacks may run, that profile may cause overhead; after discard, it will not).

The primary motivation of this change is to allow callbacks to run after sampling has stopped. The rest of this design follows from that. (I agree with @gadmm that stop could return the t. Would that be better?)

Speculation isn't really necessary; #12379 (although still messy and buggy) fleshes this interface out. t is a disguised integer, the serial number of an underlying C structure which links to domain- and thread-specific data structures (and inherits objects from them if they terminate).

I had looked at #12379 but I find it hard to read at the moment, and having an overview is helpful anyway.

I have more questions and possible suggestions but I'd normally look at the implementation first. Can you please pinpoint the relevant commits implementing this change? (either now or after you finish your cleanup of #12379, there is no rush.) (I am thinking about another strategy to fully decouple stopping sampling and stopping running callbacks: implementing t as an atomic flag stored on the OCaml heap. The callbacks are not actually removed from the queues but simply ignored if the flag is set. Then it even makes sense to call discard before calling stop. But I might be missing some details at the moment.)

jhjourdan · 2023-07-17T15:27:52Z

One thing which is unclear with this new API: when spawning a new domain, does the new domain share the profile with the parent, or does it have a copy of it? It's not clear when reading the documentation, and should be clearly stated. In particular, what happens if we stop/discard on a profile inherited from a parent domain? Is this stopping/discarding operation also performed on the parent?

If it only has a copy of it, then it should be clearly stated, and this is a bit weird because there is no way to end up in this "profile sharing" situation if the profile is not created before spawning. Also, this means that a profile should be a memory-thread-safe data structure (with atomics, etc...), which seems to somewhat lower the point of domain-local profiles.

gadmm · 2023-07-17T15:42:23Z

@jhjourdan I found the wording clear that a newly-spawned domain has the same "profile", and therefore stopping the profile stops the sampling on both domains. (If not, there is a problem with the wording!) I agree with the implication in terms of thread-safe data structures, it is a bit hard to evaluate the API without the implementation.

gadmm · 2023-07-17T16:07:33Z

IIUC, once Memprof is running in a domain, it is not possible to spawn a domain that would be tracked with a different profile. This is not unlike the global aspects of Memprof in OCaml 4 but worth noting (and perhaps unavoidable).

@NickBarnes I think it has been helpful to go piecewise with this PR, but we're stuck with evaluating the API choices wrt. implementation strategies. Could you please give us an overall description of how the implementation of this design would go?

NickBarnes · 2023-07-17T16:15:59Z

@jhjourdan I found the wording clear that a newly-spawned domain has the same "profile", and therefore stopping the profile stops the sampling on both domains. (If not, there is a problem with the wording!) I agree with the implication in terms of thread-safe data structures, it is a bit hard to evaluate the API without the implementation.

This is correct.

NickBarnes · 2023-07-17T16:23:27Z

IIUC, once Memprof is running in a domain, it is not possible to spawn a domain that would be tracked with a different profile. This is not unlike the global aspects of Memprof in OCaml 4 but worth noting (and perhaps unavoidable).

This is a really interesting observation, thank you! The current implementation would allow the API to be broadened to enable use cases like that. Perhaps this:

  current: unit -> t option
  resume: t -> unit

Or this would also be sufficient:

   switch: t option -> t option (* new -> old *)

NickBarnes · 2023-07-17T16:26:16Z

One thing which is unclear with this new API: when spawning a new domain, does the new domain share the profile with the parent, or does it have a copy of it?

Yes, it shares the profile. Yes, this means there are some atomics etc in the implementation (which is doubtless where most of the remaining bugs are lurking).

NickBarnes · 2023-07-17T16:27:27Z

One thing which is unclear with this new API: when spawning a new domain, does the new domain share the profile with the parent, or does it have a copy of it?

Yes, it shares the profile. Yes, this means there are some atomics etc in the implementation (which is doubtless where most of the remaining bugs are lurking).

(although most of the dynamic data is kept in per-domain and per-thread data structures, for as long as the domains and threads exist, so there's not much contention on the profile objects themselves).

NickBarnes · 2023-07-17T16:29:16Z

   switch: t option -> t option (* new -> old *)

Note: this interface would also provide another route to get to the profile-sharing state mentioned by @jhjourdan above.

gadmm · 2023-07-17T17:14:01Z

The current implementation would allow the API to be broadened to enable use cases like that. Perhaps this:
  current: unit -> t option
  resume: t -> unit

I am thinking about how memprof-limits could be implemented on the new API. Currently the programmer is required to start memprof at the very top-level of the program, but having a function to query the current state as above could help make it more on-demand. (It is not a goal of my review of the present PR to have an API ready for memprof-limits, but I'll think about it over the course of the several PRs you are going to introduce.)

gasche · 2023-07-18T08:23:56Z

I see how having a value to denote the statmemprof state can be useful. In the future the state could even incorporate some user-specific state by having a parametrized type 's Statmemprof.t, where start takes an 's value which is passed along to all callbacks.

.

My understanding of the stop/discard discussion is that you are trying to separate two different things:

tracking the cohort: statmemprof follows a cohort of memory words, over their lifetime, with various callbacks for significant life events
sampling allocations: statmemprof grows its cohort by sampling new allocations at a given rate

The 4.x statmemprof API forces the tracking period and the sampling period to be the same. You are proposing an API where one may sample only for a short time, at the beginning of the tracking period, and keep tracking for longer. Natural extensions include:

pausing and resuming the sampling, to have several sampling periods within a tracking period
thinking a bit more generally about what "profiles" (probes?) are sampling which allocations (currently we are discussing spatial separations, where separate probes/profiles would sample separate domains; would it make sense for several profiles to be sampling the same allocations?)

NickBarnes · 2023-07-18T08:43:45Z

My understanding of the stop/discard discussion is that you are trying to separate two different things:
* tracking the cohort: statmemprof follows a cohort of memory words, over their lifetime, with various callbacks for significant life events

* sampling allocations: statmemprof grows its cohort by sampling new allocations at a given rate

Yes, this is exactly it.

Natural extensions include:

Yes, lots of extensions are possible and pretty easy, once we are content with the underlying system.

would it make sense for several profiles to be sampling the same allocations?

I thought about this but decided it was a step too far for the implementation at this stage (several simultaneous Bernouilli processes). Once we have the abstraction of "a profile", we could revisit this.

NickBarnes · 2023-07-18T09:17:21Z

Build / extra (debug) (pull_request) Failing after 19m Details

For the record, I don't believe this failure (in ephe-weak-final/finaliser_handover (native) can be caused by this PR:

> ### begin stdout ###
> [03] file runtime/domain.c; line 1746 ### Assertion failed: caml_gc_phase == Phase_sweep_and_mark_main
> ### end stdout ###

Although other statmemprof PRs will affect this code; this PR does not.

kayceesrk · 2023-07-18T09:25:41Z

Finaliser handover is known to be flaky: #12345.

gadmm · 2023-07-19T16:18:26Z

I wonder if you intend this as a definitive PR to discuss the Memprof API for OCaml 5. If it is the case, it might be the right place to discuss making type allocation_source future-proof for possible additions (e.g. by making it an extensible variant, in order to force the handling the default case).

(I have in mind the tracking of stack usage: in OCaml 5 the stack is now allocated like normal memory, so there is an opportunity to profile it like normal memory allocation. For context, we have established that running callbacks from the stack extension code would be a bit of work, however it would be easy as a first implementation to queue allocation callbacks to track the stack allocation at the next safe point.)

If you want, for the moment I would simply like to know what's the best course of action to discuss such additional changes to the API. (Once there is a consensus I can work on a PR myself.)

jhjourdan · 2023-07-20T13:03:15Z

Ok, so now I better understand this API change, and I agree that given the proposed extensions, it is better if start returns the profile rather than stop. So, as far as I'm concerned, I'm happy with this PR.

jhjourdan · 2023-07-20T13:04:52Z

I see how having a value to denote the statmemprof state can be useful. In the future the state could even incorporate some user-specific state by having a parametrized type 's Statmemprof.t, where start takes an 's value which is passed along to all callbacks.

Well, you know, in functional programming languages, we have a feature called "closures" for that. ;)

gasche · 2023-07-20T13:08:39Z

I agree, but I wonder if making this existential state explicit in the API could make it easier for users to think about.

NickBarnes · 2023-07-20T15:19:46Z

I wonder if you intend this as a definitive PR to discuss the Memprof API for OCaml 5.

I'm hoping this will be a small (and easy-to-merge) PR as one step towards getting statmemprof for OCaml 5 merged at all. Further refinements to the API, which are certainly possible, I'd like us to postpone until after we have trunk statmemprof. For example, I think we should defer work on functions along the lines of current, switch, and resume. So although I'm not opposed to allocation_source becoming an extensible variant type, it shouldn't happen in this PR. The discussions are interesting, and it's OK for them to take place here, as long as they don't turn into wishlists for this PR to address.

gadmm

Returning t from start rather than stop makes sense to me for now and further discussion can happen when reviewing the implementation. Therefore this looks good to go, as a step forward rather than a final decision on the Memprof API.

NickBarnes mentioned this pull request Jul 17, 2023

Statmemprof for multicore [do not merge] #12379

Closed

jhjourdan reviewed Jul 17, 2023

View reviewed changes

kayceesrk added the statmemprof PRs and issues related to statmemprof label Jul 18, 2023

gasche added the flaky-ci-failure label Jul 18, 2023

gasche mentioned this pull request Jul 18, 2023

tests/weak-ephe-final/finaliser_handover is flaky #12345

Closed

Multicore statmemprof API.

ffdd06b

NickBarnes force-pushed the nick-statmemprof-api branch from 39031a8 to ffdd06b Compare July 18, 2023 14:13

jhjourdan self-requested a review July 20, 2023 14:29

jhjourdan approved these changes Jul 20, 2023

View reviewed changes

Merge remote-tracking branch 'origin/trunk' into nick-statmemprof-api

10310e1

gadmm approved these changes Jul 22, 2023

View reviewed changes

Merge remote-tracking branch 'upstream/trunk' into nick-statmemprof-api

bbc47d9

Restore blank line.

2dd9dfe

kayceesrk merged commit 41ccfb7 into ocaml:trunk Aug 4, 2023
9 checks passed

NickBarnes mentioned this pull request Jan 19, 2024

Statmemprof resurrected #12923

Merged

mbarbin mentioned this pull request May 14, 2024

Release v0.17 of Core and dependencies ocaml/opam-repository#25818

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Multicore statmemprof API #12381

Multicore statmemprof API #12381

NickBarnes commented Jul 17, 2023

jhjourdan left a comment

jhjourdan Jul 17, 2023

gadmm Jul 17, 2023 •

edited

NickBarnes Jul 17, 2023

gadmm Jul 17, 2023

jhjourdan commented Jul 17, 2023

gadmm commented Jul 17, 2023

gadmm commented Jul 17, 2023

NickBarnes commented Jul 17, 2023

NickBarnes commented Jul 17, 2023

NickBarnes commented Jul 17, 2023

NickBarnes commented Jul 17, 2023

NickBarnes commented Jul 17, 2023

gadmm commented Jul 17, 2023

gasche commented Jul 18, 2023

NickBarnes commented Jul 18, 2023

NickBarnes commented Jul 18, 2023

kayceesrk commented Jul 18, 2023

gadmm commented Jul 19, 2023

jhjourdan commented Jul 20, 2023

jhjourdan commented Jul 20, 2023

gasche commented Jul 20, 2023

NickBarnes commented Jul 20, 2023

gadmm left a comment

Multicore statmemprof API #12381

Multicore statmemprof API #12381

Conversation

NickBarnes commented Jul 17, 2023

jhjourdan left a comment

Choose a reason for hiding this comment

jhjourdan Jul 17, 2023

Choose a reason for hiding this comment

gadmm Jul 17, 2023 • edited

Choose a reason for hiding this comment

NickBarnes Jul 17, 2023

Choose a reason for hiding this comment

gadmm Jul 17, 2023

Choose a reason for hiding this comment

jhjourdan commented Jul 17, 2023

gadmm commented Jul 17, 2023

gadmm commented Jul 17, 2023

NickBarnes commented Jul 17, 2023

NickBarnes commented Jul 17, 2023

NickBarnes commented Jul 17, 2023

NickBarnes commented Jul 17, 2023

NickBarnes commented Jul 17, 2023

gadmm commented Jul 17, 2023

gasche commented Jul 18, 2023

NickBarnes commented Jul 18, 2023

NickBarnes commented Jul 18, 2023

kayceesrk commented Jul 18, 2023

gadmm commented Jul 19, 2023

jhjourdan commented Jul 20, 2023

jhjourdan commented Jul 20, 2023

gasche commented Jul 20, 2023

NickBarnes commented Jul 20, 2023

gadmm left a comment

Choose a reason for hiding this comment

gadmm Jul 17, 2023 •

edited