Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multicore statmemprof API #12381

Merged
merged 4 commits into from
Aug 4, 2023
Merged

Conversation

NickBarnes
Copy link
Contributor

This is a small change to the statmemprof API, part of preparing it for multicore (#11911).

For details, see the comments in gc.mli, or my comment here.

I am breaking the large statmemprof PR (#12379) into a number of small separate ones, some of which are wholly independent. This is the first.

Copy link
Contributor

@jhjourdan jhjourdan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API change is a breaking API change. This is fine, since we declared this API as experimental, but still, I would like to understand why it is necessary to add this profile type only for discard. Also, since this is a breaking change, I would prefer not presenting it as a "minor API change", but rather make explicit in the CHANGES file that the function Gc.Memprof.start has changed its type.

stdlib/gc.mli Outdated Show resolved Hide resolved

All the already tracked blocks are discarded. If there are
pending postponed callbacks, they may be discarded.
val discard : t -> unit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API is somewhat incoherent, in that it requires a profile value while all the other functions use the "current profile". Any reason for this? Why would discard not work with the current profile?

Copy link
Contributor

@gadmm gadmm Jul 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to speculate on how this API is implemented, but reasoning in terms of ownership I would expect stop to transfer ownership of the callback state, rather than start; i.e.

val start : [...] -> unit
val stop : unit -> t

This would address your point. Again this is making assumptions on what is natural from an implementation standpoint but we don't have the implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clearly more documentation and/or a better interface is required here. The intention is to allow uses such as a sequence of start/stop pairs, followed by subsequent discard operations on some or all of the resulting profiles. "The current profile" is only meaningful between start and stop.

For example, some highly-phased system (such as a traditional compiler) which is interested in the long-term fate of objects allocated during some particular phase or phases (say: objects allocated during some specific compiler optimization phase). It will call start and stop around the phase, but may not call discard until much later, giving time for allocated objects to be promoted and eventually GCed.

In such a use case, discard is still necessary to give the profiling tool author more control (before discard, as callbacks may run, that profile may cause overhead; after discard, it will not).

The primary motivation of this change is to allow callbacks to run after sampling has stopped. The rest of this design follows from that. (I agree with @gadmm that stop could return the t. Would that be better?)

Speculation isn't really necessary; #12379 (although still messy and buggy) fleshes this interface out. t is a disguised integer, the serial number of an underlying C structure which links to domain- and thread-specific data structures (and inherits objects from them if they terminate).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had looked at #12379 but I find it hard to read at the moment, and having an overview is helpful anyway.

I have more questions and possible suggestions but I'd normally look at the implementation first. Can you please pinpoint the relevant commits implementing this change? (either now or after you finish your cleanup of #12379, there is no rush.) (I am thinking about another strategy to fully decouple stopping sampling and stopping running callbacks: implementing t as an atomic flag stored on the OCaml heap. The callbacks are not actually removed from the queues but simply ignored if the flag is set. Then it even makes sense to call discard before calling stop. But I might be missing some details at the moment.)

@jhjourdan
Copy link
Contributor

One thing which is unclear with this new API: when spawning a new domain, does the new domain share the profile with the parent, or does it have a copy of it? It's not clear when reading the documentation, and should be clearly stated. In particular, what happens if we stop/discard on a profile inherited from a parent domain? Is this stopping/discarding operation also performed on the parent?

If it only has a copy of it, then it should be clearly stated, and this is a bit weird because there is no way to end up in this "profile sharing" situation if the profile is not created before spawning. Also, this means that a profile should be a memory-thread-safe data structure (with atomics, etc...), which seems to somewhat lower the point of domain-local profiles.

@gadmm
Copy link
Contributor

gadmm commented Jul 17, 2023

@jhjourdan I found the wording clear that a newly-spawned domain has the same "profile", and therefore stopping the profile stops the sampling on both domains. (If not, there is a problem with the wording!) I agree with the implication in terms of thread-safe data structures, it is a bit hard to evaluate the API without the implementation.

@gadmm
Copy link
Contributor

gadmm commented Jul 17, 2023

IIUC, once Memprof is running in a domain, it is not possible to spawn a domain that would be tracked with a different profile. This is not unlike the global aspects of Memprof in OCaml 4 but worth noting (and perhaps unavoidable).

@NickBarnes I think it has been helpful to go piecewise with this PR, but we're stuck with evaluating the API choices wrt. implementation strategies. Could you please give us an overall description of how the implementation of this design would go?

@NickBarnes
Copy link
Contributor Author

@jhjourdan I found the wording clear that a newly-spawned domain has the same "profile", and therefore stopping the profile stops the sampling on both domains. (If not, there is a problem with the wording!) I agree with the implication in terms of thread-safe data structures, it is a bit hard to evaluate the API without the implementation.

This is correct.

@NickBarnes
Copy link
Contributor Author

IIUC, once Memprof is running in a domain, it is not possible to spawn a domain that would be tracked with a different profile. This is not unlike the global aspects of Memprof in OCaml 4 but worth noting (and perhaps unavoidable).

This is a really interesting observation, thank you! The current implementation would allow the API to be broadened to enable use cases like that. Perhaps this:

  current: unit -> t option
  resume: t -> unit

Or this would also be sufficient:

   switch: t option -> t option (* new -> old *)

@NickBarnes
Copy link
Contributor Author

One thing which is unclear with this new API: when spawning a new domain, does the new domain share the profile with the parent, or does it have a copy of it?

Yes, it shares the profile. Yes, this means there are some atomics etc in the implementation (which is doubtless where most of the remaining bugs are lurking).

@NickBarnes
Copy link
Contributor Author

One thing which is unclear with this new API: when spawning a new domain, does the new domain share the profile with the parent, or does it have a copy of it?

Yes, it shares the profile. Yes, this means there are some atomics etc in the implementation (which is doubtless where most of the remaining bugs are lurking).

(although most of the dynamic data is kept in per-domain and per-thread data structures, for as long as the domains and threads exist, so there's not much contention on the profile objects themselves).

@NickBarnes
Copy link
Contributor Author

   switch: t option -> t option (* new -> old *)

Note: this interface would also provide another route to get to the profile-sharing state mentioned by @jhjourdan above.

@gadmm
Copy link
Contributor

gadmm commented Jul 17, 2023

The current implementation would allow the API to be broadened to enable use cases like that. Perhaps this:

  current: unit -> t option
  resume: t -> unit

I am thinking about how memprof-limits could be implemented on the new API. Currently the programmer is required to start memprof at the very top-level of the program, but having a function to query the current state as above could help make it more on-demand. (It is not a goal of my review of the present PR to have an API ready for memprof-limits, but I'll think about it over the course of the several PRs you are going to introduce.)

@kayceesrk kayceesrk added the statmemprof PRs and issues related to statmemprof label Jul 18, 2023
@gasche
Copy link
Member

gasche commented Jul 18, 2023

I see how having a value to denote the statmemprof state can be useful. In the future the state could even incorporate some user-specific state by having a parametrized type 's Statmemprof.t, where start takes an 's value which is passed along to all callbacks.

.

My understanding of the stop/discard discussion is that you are trying to separate two different things:

  • tracking the cohort: statmemprof follows a cohort of memory words, over their lifetime, with various callbacks for significant life events
  • sampling allocations: statmemprof grows its cohort by sampling new allocations at a given rate

The 4.x statmemprof API forces the tracking period and the sampling period to be the same. You are proposing an API where one may sample only for a short time, at the beginning of the tracking period, and keep tracking for longer. Natural extensions include:

  • pausing and resuming the sampling, to have several sampling periods within a tracking period
  • thinking a bit more generally about what "profiles" (probes?) are sampling which allocations (currently we are discussing spatial separations, where separate probes/profiles would sample separate domains; would it make sense for several profiles to be sampling the same allocations?)

@NickBarnes
Copy link
Contributor Author

My understanding of the stop/discard discussion is that you are trying to separate two different things:

* tracking the cohort: statmemprof follows a cohort of memory words, over their lifetime, with various callbacks for significant life events

* sampling allocations: statmemprof grows its cohort by sampling new allocations at a given rate

Yes, this is exactly it.

Natural extensions include:

Yes, lots of extensions are possible and pretty easy, once we are content with the underlying system.

would it make sense for several profiles to be sampling the same allocations?

I thought about this but decided it was a step too far for the implementation at this stage (several simultaneous Bernouilli processes). Once we have the abstraction of "a profile", we could revisit this.

@NickBarnes
Copy link
Contributor Author

Build / extra (debug) (pull_request) Failing after 19m Details

For the record, I don't believe this failure (in ephe-weak-final/finaliser_handover (native) can be caused by this PR:

> ### begin stdout ###
> [03] file runtime/domain.c; line 1746 ### Assertion failed: caml_gc_phase == Phase_sweep_and_mark_main
> ### end stdout ###

Although other statmemprof PRs will affect this code; this PR does not.

@kayceesrk
Copy link
Contributor

Finaliser handover is known to be flaky: #12345.

@gadmm
Copy link
Contributor

gadmm commented Jul 19, 2023

I wonder if you intend this as a definitive PR to discuss the Memprof API for OCaml 5. If it is the case, it might be the right place to discuss making type allocation_source future-proof for possible additions (e.g. by making it an extensible variant, in order to force the handling the default case).

(I have in mind the tracking of stack usage: in OCaml 5 the stack is now allocated like normal memory, so there is an opportunity to profile it like normal memory allocation. For context, we have established that running callbacks from the stack extension code would be a bit of work, however it would be easy as a first implementation to queue allocation callbacks to track the stack allocation at the next safe point.)

If you want, for the moment I would simply like to know what's the best course of action to discuss such additional changes to the API. (Once there is a consensus I can work on a PR myself.)

@jhjourdan
Copy link
Contributor

Ok, so now I better understand this API change, and I agree that given the proposed extensions, it is better if start returns the profile rather than stop. So, as far as I'm concerned, I'm happy with this PR.

@jhjourdan
Copy link
Contributor

I see how having a value to denote the statmemprof state can be useful. In the future the state could even incorporate some user-specific state by having a parametrized type 's Statmemprof.t, where start takes an 's value which is passed along to all callbacks.

Well, you know, in functional programming languages, we have a feature called "closures" for that. ;)

@gasche
Copy link
Member

gasche commented Jul 20, 2023

I agree, but I wonder if making this existential state explicit in the API could make it easier for users to think about.

@jhjourdan jhjourdan self-requested a review July 20, 2023 14:29
@NickBarnes
Copy link
Contributor Author

I wonder if you intend this as a definitive PR to discuss the Memprof API for OCaml 5.

I'm hoping this will be a small (and easy-to-merge) PR as one step towards getting statmemprof for OCaml 5 merged at all. Further refinements to the API, which are certainly possible, I'd like us to postpone until after we have trunk statmemprof. For example, I think we should defer work on functions along the lines of current, switch, and resume. So although I'm not opposed to allocation_source becoming an extensible variant type, it shouldn't happen in this PR. The discussions are interesting, and it's OK for them to take place here, as long as they don't turn into wishlists for this PR to address.

Copy link
Contributor

@gadmm gadmm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning t from start rather than stop makes sense to me for now and further discussion can happen when reviewing the implementation. Therefore this looks good to go, as a step forward rather than a final decision on the Memprof API.

@kayceesrk kayceesrk merged commit 41ccfb7 into ocaml:trunk Aug 4, 2023
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flaky-ci-failure statmemprof PRs and issues related to statmemprof
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants