-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Restructure thread event handling #1796
Conversation
Great. No test failed, another evidence for the correctness, in addition to the logical reasoning. Let me write some more commits to do a larger scale rewriting for the thread event module... |
Restructured the thread event handling logic. The key change is the last commit. On a high level, the event update logic is now completely internal to the thread event module, and it's only executed at event handling time. Therefore I can "unzip" the event handling logic -
|
77ede4e
to
8c08340
Compare
Slightly improved the last commit, plus stacked two more commits. Of the two new commits, the first is a slight refactoring and the second is the real change: for prof sample event, whenever we want to postpone (meaning |
8c08340
to
c32a253
Compare
Well, strictly speaking, the correctness is still not guaranteed: we're losing samples when |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the fundamental claim here is right (i.e. that wait times belong in thread_event and not in their constituent parts). The right division of responsibilities seems like:
- thread_event manages the logic for determining which callbacks to invoke
- Individual events choose what to do and when to do it
- Some caller actually invokes the callbacks.
By analogy, the ticker_t doesn't know what event it's ticking down to; the thing tracking the ticker does. The thread_event stuff is really just a multi-ticker (i.e. it maintains some set of tickers, and counts them all down at the same rate).
src/thread_event.c
Outdated
static void | ||
prof_sample_threshold_update(tsd_t *tsd) { | ||
static uint64_t | ||
prof_sample_new_event_wait(tsd_t *tsd) { | ||
#ifdef JEMALLOC_PROF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better in these sorts of things is just a cassert(config_prof)
if we can get away with it.
(In general, we try to avoid #define
-ing unused functionality away as much as we can, since it means that we don't find compilation breakages until we send stuff away to CI).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you meant. Were you suggesting getting rid of the #ifdef JEMALLOC_PROF
and have a cassert()
? I was just copying things. I thought what the comment above meant was that the #ifdef JEMALLOC_PROF
was necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. I didn't see the comment. That makes sense.
Yes, that makes sense. I was also slightly leaning towards this way, but I chose the easier alternative, because all events except the prof sampling event have the event timing within the thread event module. I'll do some restructuring. |
Rebase, and pushed the "when" part into the modules owning each event.
The caller is still the thread event module itself. There doesn't seem to me to be an appealing reason why the thread event module should surface up which events should be triggered and let the caller trigger them. Instead of the ticker, my analogy is the buffered writer - it should just flush itself when needed instead of asking the caller to flush for it. If we really want a neat structure, we can completely get rid of I haven't yet got an answer regarding your point on the |
c32a253
to
60bf461
Compare
(Ideally I should have also pushed the triggering functions to individual modules, but some of them are dependent on some thread event counters so I kept these triggering functions in thread event for now.) |
Sorry - just removing the commit d2a19dd in the middle: figured that it would better be handled in a later stack. All other commits stay the same. |
60bf461
to
7f10580
Compare
I think I'm now more determined to push the triggering functions to individual modules. How about this: the function will have a signature of |
Added a couple of more commits. This PR should be in its final shape now. The first new commit pulls the event handler logic into their constituent modules, so the thread event handler module only contains thread event specific content. It's possible that we could further do some restructuring tricks, but things are now in a sufficiently satisfactory shape to me. The next few commits ensure proper counter initialization in all cases. Without these commits, we previous failed to initialize the counters when a new thread deallocates before making any allocation. We were lucky that the deallocation path only had one event, the tcache GC event, which does no harm; otherwise every single deallocation event would be triggered on the very first deallocation call. My current approach is to initialize both the allocation counters and the deallocation counters in TSD full init, and only initialize the deallocation counters in the TSD minimal init. Alternatively, I could also avoid calling any counter initialization at all in the TSD init functions, and instead put it right before the counters are changed (in A few other minor changes / benefits:
|
It seems getting rid of the TSD static initializer for the counters didn't work for the background thread. Let me look into it... |
There's something that I still don't understand, but the background thread can be in reincarnated state and it allocates without initializing the counters. I end up just initializing both the allocation counters and the deallocation counters in both TSC full init and TSD minimal init. The counter init is so cheap anyway. Also got rid of the other commits trying to divide allocation and deallocation. |
0fef298
to
2f34ff6
Compare
Realized that it was not entirely right: the TSD minimal init labels reentrancy on the TSD, so the events would not be triggered. However, there does exist downside, perhaps a more serious one: the thread event logic would keep postponing the event to the next deallocation call, ending up routing all subsequent calls to the slow path (until an allocation call comes). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stamping because I think this is an improvement relative to the status quo, but I think this has left me even more convinced that we ought to push more of the invocation logic into the callers of this module and have some sort of multi-ticker abstraction. This feels very event-loop-y to me in the sense that I have a very hard time tracking the logic of what gets invoked when and why.
Rebase on top of #1819 and simplify. |
2f34ff6
to
df98954
Compare
Rebase. |
df98954
to
c1c237d
Compare
Figured that a7c27fd is not entirely correct: to be more rigorous the prng seed needs to be initialized before |
c1c237d
to
0512336
Compare
On a logical level, the sample wait time belongs to the thread event module, and should not be determined according to the state of the prof
tdata
. It is a "prof -> thread event" dependency that we should remove. Thetdata
holds the temporary storage space for the backtracing result, so we have to have a validtdata
when we determine that we should sample. This is their only relationship, and it's an "allocation caller -> thread event lookahead" dependency followed by an "allocation caller ->tdata
" dependency, and does not involve any "prof -> thread event" dependency.However, this dependency was present before the refactoring of #1779, so it could either be truly unnecessary, or it tried to solve some edge cases I failed to consider. Let's see if any tests fail...