
Restructure thread event handling #1796

Merged

yinan1048576 merged 11 commits into jemalloc:dev on May 12, 2020

Conversation

@yinan1048576 (Contributor)

On a logical level, the sample wait time belongs to the thread event module, and should not be determined according to the state of the prof tdata. It is a "prof -> thread event" dependency that we should remove. The tdata holds the temporary storage space for the backtracing result, so we must have a valid tdata when we determine that we should sample. This is their only relationship, and it decomposes into an "allocation caller -> thread event lookahead" dependency followed by an "allocation caller -> tdata" dependency; it does not involve any "prof -> thread event" dependency.
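
To make the intended dependency direction concrete, here is a sketch; the names (should_sample_alloc, te_prof_sample_event_lookahead, prof_tdata_get) are loose stand-ins for the internals rather than exact signatures from the tree:

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in types and declarations; illustrative, not the exact API. */
typedef struct tsd_s tsd_t;
typedef struct prof_tdata_s prof_tdata_t;
bool te_prof_sample_event_lookahead(tsd_t *tsd, size_t usize);
prof_tdata_t *prof_tdata_get(tsd_t *tsd, bool create);

/*
 * The only coupling: the allocation caller first asks the thread
 * event module (lookahead), and only then touches the tdata, which
 * merely provides scratch space for the backtrace result.
 */
static bool
should_sample_alloc(tsd_t *tsd, size_t usize) {
	if (!te_prof_sample_event_lookahead(tsd, usize)) {
		/* No "prof -> thread event" query involved. */
		return false;
	}
	return prof_tdata_get(tsd, true) != NULL;
}
```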

However, this dependency was present before the refactoring in #1779, so it could either be truly unnecessary, or it may have been covering some edge case I failed to consider. Let's see if any tests fail...

@yinan1048576

Great: no test failed, which is further evidence of correctness in addition to the logical reasoning. Let me write some more commits to do a larger-scale rewrite of the thread event module...

@yinan1048576

Restructured the thread event handling logic.

The key change is in the last commit. At a high level, the event update logic is now completely internal to the thread event module, and it's only executed at event handling time. Therefore I can "unzip" the event handling logic (sketched right after this list):

  • Previously, we iterated over all events, and for each event we first updated the wait time, then recomputed the global threshold, and finally triggered the event. Now we first update the wait time for all events, then recompute the global threshold once, and finally trigger all the events.
  • The main effect: we can now finalize all the counters before any actual events are triggered. Event triggering can reenter, which is why we previously had to deal with the possibility of the allocation counter having already jumped above the threshold at event update time (via the delay_event trick).
  • A by-product of the "unzipping": we only recompute the global threshold once per trigger, rather than once per event.
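
Here is a self-contained sketch of the "unzipping", with stand-in types and a fixed placeholder wait; this is the shape of the change, not jemalloc's actual code:

```c
#include <stdbool.h>
#include <stdint.h>

#define NEVENTS 3
#define NEW_WAIT 1024	/* placeholder; each event really draws its own wait */

typedef struct {
	uint64_t wait;		/* bytes remaining until this event fires */
	void (*handler)(void);	/* may allocate, i.e. may reenter */
} event_t;

static event_t events[NEVENTS];
static uint64_t global_threshold;

static uint64_t
min_wait(void) {
	uint64_t min = UINT64_MAX;
	for (int i = 0; i < NEVENTS; i++) {
		if (events[i].wait < min) {
			min = events[i].wait;
		}
	}
	return min;
}

/* Before: update, recompute, and trigger are interleaved per event. */
static void
handle_events_zipped(uint64_t accumbytes) {
	for (int i = 0; i < NEVENTS; i++) {
		if (events[i].wait <= accumbytes) {
			events[i].wait = NEW_WAIT;
			global_threshold = min_wait();	/* once per event */
			events[i].handler();	/* reentrancy can observe
						 * half-updated counters */
		}
	}
}

/* After: all counters are finalized before any handler can reenter. */
static void
handle_events_unzipped(uint64_t accumbytes) {
	bool fired[NEVENTS];
	for (int i = 0; i < NEVENTS; i++) {	/* 1. update all wait times */
		fired[i] = events[i].wait <= accumbytes;
		if (fired[i]) {
			events[i].wait = NEW_WAIT;
		}
	}
	global_threshold = min_wait();		/* 2. recompute once */
	for (int i = 0; i < NEVENTS; i++) {	/* 3. trigger all events */
		if (fired[i]) {
			events[i].handler();
		}
	}
}
```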

@yinan1048576 yinan1048576 changed the title Do not reset sample wait time when re-initing tdata Restructure thread event handling Mar 25, 2020
@yinan1048576

Slightly improved the last commit, plus stacked two more commits.

Of the two new commits, the first is a slight refactoring and the second is the real change: for the prof sample event, whenever we want to postpone (meaning the tsd_nominal(tsd) && tsd_reentrancy_level_get(tsd) == 0 check failed), instead of always sampling the immediately next allocation, we draw a fresh new wait time (see the sketch below). This avoids sampling bias, so that we can guarantee the correctness of our solution to #1751.
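
In sketch form, assuming wait times are drawn geometrically with mean `interval` bytes (illustrative names and PRNG; jemalloc uses its own PRNG and lg_prof_sample, but the shape of the draw is the same idea):

```c
#include <math.h>
#include <stdint.h>
#include <stdlib.h>

/* Draw a fresh wait from a geometric distribution with mean `interval`
 * bytes.  Assumes interval > 1. */
static uint64_t
sample_new_event_wait(uint64_t interval) {
	/* u uniform in (0, 1]. */
	double u = ((double)rand() + 1.0) / ((double)RAND_MAX + 1.0);
	return (uint64_t)(log(u) / log(1.0 - 1.0 / (double)interval)) + 1;
}

/*
 * On postponement, the old behavior amounted to a wait of 1 (always
 * sample the immediate next allocation), which over-samples allocations
 * right after reentrant regions.  Drawing afresh keeps every byte
 * equally likely to be sampled.
 */
static uint64_t
sample_postponed_event_wait(uint64_t interval) {
	return sample_new_event_wait(interval);
}
```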

@yinan1048576

Well, strictly speaking, correctness is still not guaranteed: we lose samples when tsd_nominal(tsd) fails, but such cases are very rare.

@davidtgoldblatt (Member) left a comment

I don't think the fundamental claim here is right (i.e. that wait times belong in thread_event rather than in its constituent event modules). The right division of responsibilities seems to be:

  • thread_event manages the logic for determining which callbacks to invoke
  • Individual events choose what to do and when to do it
  • Some caller actually invokes the callbacks.

By analogy, a ticker_t doesn't know what event it's ticking down to; the thing tracking the ticker does. The thread_event stuff is really just a multi-ticker (i.e. it maintains a set of tickers and counts them all down at the same rate).
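
Roughly, a multi-ticker could look like this (hypothetical sketch, not a type in the tree; it owns only the countdown logic, and callers decide what each slot means and invoke the callbacks themselves):

```c
#include <stdbool.h>
#include <stdint.h>

#define MULTI_TICKER_NSLOTS 4

typedef struct {
	int64_t nticks[MULTI_TICKER_NSLOTS];	/* remaining ticks per slot */
} multi_ticker_t;

static void
multi_ticker_init(multi_ticker_t *mt,
    const int64_t nticks[MULTI_TICKER_NSLOTS]) {
	for (int i = 0; i < MULTI_TICKER_NSLOTS; i++) {
		mt->nticks[i] = nticks[i];
	}
}

/*
 * Count every slot down by n and report which slots fired via a
 * bitmask.  Like ticker_t, the multi-ticker has no idea what events
 * live in its slots; the caller resets fired slots and invokes the
 * corresponding callbacks.
 */
static bool
multi_ticker_ticks(multi_ticker_t *mt, int64_t n, unsigned *fired) {
	*fired = 0;
	for (int i = 0; i < MULTI_TICKER_NSLOTS; i++) {
		mt->nticks[i] -= n;
		if (mt->nticks[i] <= 0) {
			*fired |= 1U << i;
		}
	}
	return *fired != 0;
}
```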

```diff
-static void
-prof_sample_threshold_update(tsd_t *tsd) {
+static uint64_t
+prof_sample_new_event_wait(tsd_t *tsd) {
 #ifdef JEMALLOC_PROF
```
@davidtgoldblatt (Member)

Better in these sorts of places is just a cassert(config_prof), if we can get away with it.

(In general, we try to avoid #define-ing unused functionality away as much as we can, since it means we don't find compilation breakages until we send stuff away to CI.)
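
For illustration, the two styles side by side (cassert() and config_prof here are simplified stand-ins for jemalloc's actual macros, and draw_wait() is a placeholder):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct tsd_s tsd_t;	/* stand-in */
uint64_t draw_wait(tsd_t *tsd);	/* stand-in */

#ifdef JEMALLOC_PROF
static const bool config_prof = true;
#else
static const bool config_prof = false;
#endif
#define cassert(c) assert(c)	/* simplified stand-in */

/* Style 1: the profiling body is invisible to non-prof builds, so a
 * breakage inside it only surfaces once CI builds with prof enabled. */
uint64_t
new_event_wait_ifdef(tsd_t *tsd) {
#ifdef JEMALLOC_PROF
	return draw_wait(tsd);
#else
	(void)tsd;
	return 0;
#endif
}

/* Style 2: the body always compiles; the assertion documents (and, in
 * debug builds, enforces) that it's unreachable without prof. */
uint64_t
new_event_wait_cassert(tsd_t *tsd) {
	cassert(config_prof);
	return draw_wait(tsd);
}
```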

@yinan1048576

I'm not sure what you meant. Were you suggesting getting rid of the #ifdef JEMALLOC_PROF in favor of a cassert()? I was just copying existing code; I thought the comment above meant that the #ifdef JEMALLOC_PROF was necessary.

@davidtgoldblatt (Member)

Ah, I see. I didn't see the comment. That makes sense.

@yinan1048576

Yes, that makes sense. I was also slightly leaning this way, but I chose the easier alternative, because all events except the prof sampling event already had their timing logic inside the thread event module. I'll do some restructuring.

@yinan1048576 changed the title from "Restructure thread event handling" to "[In progress] Restructure thread event handling" on Apr 14, 2020
@yinan1048576

Rebased, and pushed the "when" part into the modules owning each event.

> Some caller actually invokes the callbacks.

The caller is still the thread event module itself. I don't see a compelling reason for the thread event module to surface which events should be triggered and let the caller trigger them. Instead of the ticker, my analogy is the buffered writer: it should just flush itself when needed instead of asking the caller to flush for it.

If we really want a neat structure, we could get rid of tsd_te_init() entirely and instead let each module register its triggering and timing callbacks in its own init function, but it seems we'd then need to store the callback pointers in the TSD (roughly as sketched below).
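
Roughly like this (entirely hypothetical sketch; not what this PR does):

```c
#include <stdint.h>

typedef struct tsd_s tsd_t;			/* stand-in */
typedef uint64_t (*te_new_wait_fn)(tsd_t *);	/* timing callback */
typedef void (*te_handler_fn)(tsd_t *, uint64_t); /* triggering callback */

#define TE_MAX_EVENTS 8

/* In the real layout these slots would have to live in TSD. */
typedef struct {
	te_new_wait_fn new_wait[TE_MAX_EVENTS];
	te_handler_fn handler[TE_MAX_EVENTS];
	unsigned nevents;
} te_registry_t;

/* Called from each event-owning module's own init function (e.g. prof
 * or tcache), instead of being hardwired in tsd_te_init(). */
static void
te_register(te_registry_t *reg, te_new_wait_fn new_wait,
    te_handler_fn handler) {
	reg->new_wait[reg->nevents] = new_wait;
	reg->handler[reg->nevents] = handler;
	reg->nevents++;
}
```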

I don't yet have an answer regarding your point on the cassert(config_prof) stuff; it's not really related to this PR anyway. We can discuss more later.

@yinan1048576

(Ideally I should also have pushed the triggering functions into the individual modules, but some of them depend on thread event counters, so I kept those triggering functions in thread event for now.)

@yinan1048576 changed the title from "[In progress] Restructure thread event handling" to "Restructure thread event handling" on Apr 16, 2020
@yinan1048576

Sorry - just removed the commit d2a19dd in the middle: figured it would be better handled in a later stack. All other commits stay the same.

@yinan1048576 commented Apr 16, 2020

I think I'm now more determined to push the triggering functions into the individual modules. How about this: each handler will have a signature of void event#_event_handler(tsd_t *, uint64_t), where the second parameter is the accumulated bytes since the last event of the same type?
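
For example, with tcache GC (placeholder body, not the actual implementation):

```c
#include <stdint.h>

typedef struct tsd_s tsd_t;	/* stand-in */

/*
 * elapsed: accumulated bytes since the last event of the same type.
 * It can exceed the event's wait time, e.g. when a single large
 * allocation overshoots the threshold.
 */
static void
tcache_gc_event_handler(tsd_t *tsd, uint64_t elapsed) {
	(void)tsd;
	(void)elapsed;
	/* ... one incremental tcache GC pass would go here ... */
}
```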

@yinan1048576

Added a couple more commits. This PR should now be in its final shape.

The first new commit pulls the event handler logic into the constituent modules, so that the thread event module only contains thread-event-specific content. We could possibly do some further restructuring tricks, but things are now in a sufficiently satisfactory shape to me.

The next few commits ensure proper counter initialization in all cases. Without them, we previously failed to initialize the counters when a new thread deallocated before making any allocation. We were lucky that the deallocation path only had one event, the tcache GC event, which does no harm; otherwise, every single deallocation event would have been triggered on the very first deallocation call.

My current approach is to initialize both the allocation counters and the deallocation counters in the TSD full init, and only the deallocation counters in the TSD minimal init (sketched below).
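
In sketch form (illustrative stand-in names and initial values; the real counters live in TSD):

```c
#include <stdint.h>

/* Stand-in for the thread event counters stored in TSD. */
typedef struct {
	uint64_t alloc_wait;	/* allocation-side counters */
	uint64_t dalloc_wait;	/* deallocation-side counters */
} te_counters_t;

/* Minimal init: only the deallocation path may run, so only the
 * deallocation counters need to be valid. */
static void
te_tsd_init_minimal(te_counters_t *te) {
	te->dalloc_wait = UINT64_MAX;	/* placeholder initial wait */
}

/* Full init: both paths may run. */
static void
te_tsd_init_full(te_counters_t *te) {
	te->alloc_wait = UINT64_MAX;	/* placeholder initial wait */
	te->dalloc_wait = UINT64_MAX;
}
```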

Alternatively, I could avoid calling any counter initialization in the TSD init functions at all, and instead put it right before the counters are changed (in te_event_advance()). However, I realized that this would invalidate the look-ahead calls: they might see a completely uninitialized set of counters.

A few other minor changes / benefits:

  • I also divided the assertions into allocation-related and deallocation-related parts - they are conceptually distinct.
  • The counter fields in TSD no longer need to be statically initialized, so the backward TSD -> thread event dependency can finally be removed.

@yinan1048576

It seems getting rid of the TSD static initializer for the counters didn't work for the background thread. Let me look into it...

@yinan1048576 changed the title from "Restructure thread event handling" to "[In progress] Restructure thread event handling" on Apr 17, 2020
@yinan1048576

There's still something I don't understand, but the background thread can be in a reincarnated state, in which it allocates without initializing the counters.

I ended up just initializing both the allocation counters and the deallocation counters in both the TSD full init and the TSD minimal init; the counter init is so cheap anyway. I also got rid of the other commits that tried to divide allocation and deallocation.

@yinan1048576 changed the title from "[In progress] Restructure thread event handling" to "Restructure thread event handling" on Apr 17, 2020
@yinan1048576

> Without these commits, we previously failed to initialize the counters when a new thread deallocated before making any allocation. We were lucky that the deallocation path only had one event, the tcache GC event, which does no harm; otherwise, every single deallocation event would have been triggered on the very first deallocation call.

Realized that this was not entirely right: the TSD minimal init labels reentrancy on the TSD, so the events would not be triggered. However, there is still a downside, perhaps a more serious one: the thread event logic would keep postponing the event to the next deallocation call, ending up routing all subsequent calls to the slow path (until an allocation call comes).

@davidtgoldblatt (Member) left a comment

Stamping because I think this is an improvement relative to the status quo, but I think this has left me even more convinced that we ought to push more of the invocation logic into the callers of this module and have some sort of multi-ticker abstraction. This feels very event-loop-y to me in the sense that I have a very hard time tracking the logic of what gets invoked when and why.

@yinan1048576

Rebased on top of #1819 and simplified.

@yinan1048576

Rebased.

@yinan1048576

Figured that a7c27fd is not entirely correct: to be more rigorous, the PRNG seed needs to be initialized before tsd_te_init() is called. Fixed it.

@yinan1048576 merged commit dcea2c0 into jemalloc:dev on May 12, 2020