-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multicore statmemprof API #12381
Multicore statmemprof API #12381
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This API change is a breaking API change. This is fine, since we declared this API as experimental, but still, I would like to understand why it is necessary to add this profile type only for discard
. Also, since this is a breaking change, I would prefer not presenting it as a "minor API change", but rather make explicit in the CHANGES file that the function Gc.Memprof.start
has changed its type.
|
||
All the already tracked blocks are discarded. If there are | ||
pending postponed callbacks, they may be discarded. | ||
val discard : t -> unit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This API is somewhat incoherent, in that it requires a profile value while all the other functions use the "current profile". Any reason for this? Why would discard not work with the current profile?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to speculate on how this API is implemented, but reasoning in terms of ownership I would expect stop
to transfer ownership of the callback state, rather than start
; i.e.
val start : [...] -> unit
val stop : unit -> t
This would address your point. Again this is making assumptions on what is natural from an implementation standpoint but we don't have the implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clearly more documentation and/or a better interface is required here. The intention is to allow uses such as a sequence of start
/stop
pairs, followed by subsequent discard
operations on some or all of the resulting profiles. "The current profile" is only meaningful between start
and stop
.
For example, some highly-phased system (such as a traditional compiler) which is interested in the long-term fate of objects allocated during some particular phase or phases (say: objects allocated during some specific compiler optimization phase). It will call start
and stop
around the phase, but may not call discard
until much later, giving time for allocated objects to be promoted and eventually GCed.
In such a use case, discard
is still necessary to give the profiling tool author more control (before discard
, as callbacks may run, that profile may cause overhead; after discard
, it will not).
The primary motivation of this change is to allow callbacks to run after sampling has stopped. The rest of this design follows from that. (I agree with @gadmm that stop
could return the t
. Would that be better?)
Speculation isn't really necessary; #12379 (although still messy and buggy) fleshes this interface out. t
is a disguised integer, the serial number of an underlying C structure which links to domain- and thread-specific data structures (and inherits objects from them if they terminate).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had looked at #12379 but I find it hard to read at the moment, and having an overview is helpful anyway.
I have more questions and possible suggestions but I'd normally look at the implementation first. Can you please pinpoint the relevant commits implementing this change? (either now or after you finish your cleanup of #12379, there is no rush.) (I am thinking about another strategy to fully decouple stopping sampling and stopping running callbacks: implementing t
as an atomic flag stored on the OCaml heap. The callbacks are not actually removed from the queues but simply ignored if the flag is set. Then it even makes sense to call discard
before calling stop
. But I might be missing some details at the moment.)
One thing which is unclear with this new API: when spawning a new domain, does the new domain share the profile with the parent, or does it have a copy of it? It's not clear when reading the documentation, and should be clearly stated. In particular, what happens if we stop/discard on a profile inherited from a parent domain? Is this stopping/discarding operation also performed on the parent? If it only has a copy of it, then it should be clearly stated, and this is a bit weird because there is no way to end up in this "profile sharing" situation if the profile is not created before spawning. Also, this means that a profile should be a memory-thread-safe data structure (with atomics, etc...), which seems to somewhat lower the point of domain-local profiles. |
@jhjourdan I found the wording clear that a newly-spawned domain has the same "profile", and therefore stopping the profile stops the sampling on both domains. (If not, there is a problem with the wording!) I agree with the implication in terms of thread-safe data structures, it is a bit hard to evaluate the API without the implementation. |
IIUC, once Memprof is running in a domain, it is not possible to spawn a domain that would be tracked with a different profile. This is not unlike the global aspects of Memprof in OCaml 4 but worth noting (and perhaps unavoidable). @NickBarnes I think it has been helpful to go piecewise with this PR, but we're stuck with evaluating the API choices wrt. implementation strategies. Could you please give us an overall description of how the implementation of this design would go? |
This is correct. |
This is a really interesting observation, thank you! The current implementation would allow the API to be broadened to enable use cases like that. Perhaps this:
Or this would also be sufficient:
|
Yes, it shares the profile. Yes, this means there are some atomics etc in the implementation (which is doubtless where most of the remaining bugs are lurking). |
(although most of the dynamic data is kept in per-domain and per-thread data structures, for as long as the domains and threads exist, so there's not much contention on the profile objects themselves). |
Note: this interface would also provide another route to get to the profile-sharing state mentioned by @jhjourdan above. |
I am thinking about how memprof-limits could be implemented on the new API. Currently the programmer is required to start memprof at the very top-level of the program, but having a function to query the current state as above could help make it more on-demand. (It is not a goal of my review of the present PR to have an API ready for memprof-limits, but I'll think about it over the course of the several PRs you are going to introduce.) |
I see how having a value to denote the statmemprof state can be useful. In the future the state could even incorporate some user-specific state by having a parametrized type . My understanding of the stop/discard discussion is that you are trying to separate two different things:
The 4.x statmemprof API forces the tracking period and the sampling period to be the same. You are proposing an API where one may sample only for a short time, at the beginning of the tracking period, and keep tracking for longer. Natural extensions include:
|
Yes, this is exactly it.
Yes, lots of extensions are possible and pretty easy, once we are content with the underlying system.
I thought about this but decided it was a step too far for the implementation at this stage (several simultaneous Bernouilli processes). Once we have the abstraction of "a profile", we could revisit this. |
For the record, I don't believe this failure (in
Although other statmemprof PRs will affect this code; this PR does not. |
Finaliser handover is known to be flaky: #12345. |
39031a8
to
ffdd06b
Compare
I wonder if you intend this as a definitive PR to discuss the Memprof API for OCaml 5. If it is the case, it might be the right place to discuss making (I have in mind the tracking of stack usage: in OCaml 5 the stack is now allocated like normal memory, so there is an opportunity to profile it like normal memory allocation. For context, we have established that running callbacks from the stack extension code would be a bit of work, however it would be easy as a first implementation to queue allocation callbacks to track the stack allocation at the next safe point.) If you want, for the moment I would simply like to know what's the best course of action to discuss such additional changes to the API. (Once there is a consensus I can work on a PR myself.) |
Ok, so now I better understand this API change, and I agree that given the proposed extensions, it is better if |
Well, you know, in functional programming languages, we have a feature called "closures" for that. ;) |
I agree, but I wonder if making this existential state explicit in the API could make it easier for users to think about. |
I'm hoping this will be a small (and easy-to-merge) PR as one step towards getting statmemprof for OCaml 5 merged at all. Further refinements to the API, which are certainly possible, I'd like us to postpone until after we have trunk statmemprof. For example, I think we should defer work on functions along the lines of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning t
from start
rather than stop makes sense to me for now and further discussion can happen when reviewing the implementation. Therefore this looks good to go, as a step forward rather than a final decision on the Memprof API.
This is a small change to the statmemprof API, part of preparing it for multicore (#11911).
For details, see the comments in gc.mli, or my comment here.
I am breaking the large statmemprof PR (#12379) into a number of small separate ones, some of which are wholly independent. This is the first.