
Restructure thread event handling #1796

Merged

yinan1048576 merged 11 commits into jemalloc:dev on May 12, 2020

Conversation

@yinan1048576 (Contributor)

On a logical level, the sample wait time belongs to the thread event module, and should not be determined according to the state of the prof tdata. It is a "prof -> thread event" dependency that we should remove. The tdata holds the temporary storage space for the backtracing result, so we must have a valid tdata when we determine that we should sample. This is their only relationship, and it decomposes into an "allocation caller -> thread event lookahead" dependency followed by an "allocation caller -> tdata" dependency; it does not involve any "prof -> thread event" dependency.
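
To make the intended dependency direction concrete, here is a sketch; the names (should_sample_alloc, te_prof_sample_event_lookahead, prof_tdata_get) are loose stand-ins for the internals rather than exact signatures from the tree:

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in types and declarations; illustrative, not the exact API. */
typedef struct tsd_s tsd_t;
typedef struct prof_tdata_s prof_tdata_t;
bool te_prof_sample_event_lookahead(tsd_t *tsd, size_t usize);
prof_tdata_t *prof_tdata_get(tsd_t *tsd, bool create);

/*
 * The only coupling: the allocation caller first asks the thread
 * event module (lookahead), and only then touches the tdata, which
 * merely provides scratch space for the backtrace result.
 */
static bool
should_sample_alloc(tsd_t *tsd, size_t usize) {
	if (!te_prof_sample_event_lookahead(tsd, usize)) {
		/* No "prof -> thread event" query involved. */
		return false;
	}
	return prof_tdata_get(tsd, true) != NULL;
}
```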

However, this dependency was present before the refactoring in #1779, so it could either be truly unnecessary, or it may have been covering some edge case I failed to consider. Let's see if any tests fail...

@yinan1048576

Great: no test failed, which is further evidence of correctness in addition to the logical reasoning. Let me write some more commits to do a larger-scale rewrite of the thread event module...

@yinan1048576

Restructured the thread event handling logic.

The key change is in the last commit. At a high level, the event update logic is now completely internal to the thread event module, and it's only executed at event handling time. Therefore I can "unzip" the event handling logic (sketched right after this list):

  • Previously, we iterated over all events, and for each event we first updated the wait time, then recomputed the global threshold, and finally triggered the event. Now we first update the wait time for all events, then recompute the global threshold once, and finally trigger all the events.
  • The main effect: we can now finalize all the counters before any actual events are triggered. Event triggering can reenter, which is why we previously had to deal with the possibility of the allocation counter having already jumped above the threshold at event update time (via the delay_event trick).
  • A by-product of the "unzipping": we only recompute the global threshold once per trigger, rather than once per event.
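
Here is a self-contained sketch of the "unzipping", with stand-in types and a fixed placeholder wait; this is the shape of the change, not jemalloc's actual code:

```c
#include <stdbool.h>
#include <stdint.h>

#define NEVENTS 3
#define NEW_WAIT 1024	/* placeholder; each event really draws its own wait */

typedef struct {
	uint64_t wait;		/* bytes remaining until this event fires */
	void (*handler)(void);	/* may allocate, i.e. may reenter */
} event_t;

static event_t events[NEVENTS];
static uint64_t global_threshold;

static uint64_t
min_wait(void) {
	uint64_t min = UINT64_MAX;
	for (int i = 0; i < NEVENTS; i++) {
		if (events[i].wait < min) {
			min = events[i].wait;
		}
	}
	return min;
}

/* Before: update, recompute, and trigger are interleaved per event. */
static void
handle_events_zipped(uint64_t accumbytes) {
	for (int i = 0; i < NEVENTS; i++) {
		if (events[i].wait <= accumbytes) {
			events[i].wait = NEW_WAIT;
			global_threshold = min_wait();	/* once per event */
			events[i].handler();	/* reentrancy can observe
						 * half-updated counters */
		}
	}
}

/* After: all counters are finalized before any handler can reenter. */
static void
handle_events_unzipped(uint64_t accumbytes) {
	bool fired[NEVENTS];
	for (int i = 0; i < NEVENTS; i++) {	/* 1. update all wait times */
		fired[i] = events[i].wait <= accumbytes;
		if (fired[i]) {
			events[i].wait = NEW_WAIT;
		}
	}
	global_threshold = min_wait();		/* 2. recompute once */
	for (int i = 0; i < NEVENTS; i++) {	/* 3. trigger all events */
		if (fired[i]) {
			events[i].handler();
		}
	}
}
```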

@yinan1048576 yinan1048576 changed the title Do not reset sample wait time when re-initing tdata Restructure thread event handling Mar 25, 2020
@yinan1048576

Slightly improved the last commit, plus stacked two more commits.

Of the two new commits, the first is a slight refactoring and the second is the real change: for the prof sample event, whenever we want to postpone (meaning the tsd_nominal(tsd) && tsd_reentrancy_level_get(tsd) == 0 check failed), instead of always sampling the immediately next allocation, we draw a fresh new wait time (see the sketch below). This avoids sampling bias, so that we can guarantee the correctness of our solution to #1751.
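
In sketch form, assuming wait times are drawn geometrically with mean `interval` bytes (illustrative names and PRNG; jemalloc uses its own PRNG and lg_prof_sample, but the shape of the draw is the same idea):

```c
#include <math.h>
#include <stdint.h>
#include <stdlib.h>

/* Draw a fresh wait from a geometric distribution with mean `interval`
 * bytes.  Assumes interval > 1. */
static uint64_t
sample_new_event_wait(uint64_t interval) {
	/* u uniform in (0, 1]. */
	double u = ((double)rand() + 1.0) / ((double)RAND_MAX + 1.0);
	return (uint64_t)(log(u) / log(1.0 - 1.0 / (double)interval)) + 1;
}

/*
 * On postponement, the old behavior amounted to a wait of 1 (always
 * sample the immediate next allocation), which over-samples allocations
 * right after reentrant regions.  Drawing afresh keeps every byte
 * equally likely to be sampled.
 */
static uint64_t
sample_postponed_event_wait(uint64_t interval) {
	return sample_new_event_wait(interval);
}
```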

@yinan1048576

Well, strictly speaking, correctness is still not guaranteed: we lose samples when tsd_nominal(tsd) fails, but such cases are very rare.

@davidtgoldblatt (Member) left a comment

I don't think the fundamental claim here is right (i.e. that wait times belong in thread_event rather than in its constituent event modules). The right division of responsibilities seems to be:

  • thread_event manages the logic for determining which callbacks to invoke
  • Individual events choose what to do and when to do it
  • Some caller actually invokes the callbacks.

By analogy, a ticker_t doesn't know what event it's ticking down to; the thing tracking the ticker does. The thread_event stuff is really just a multi-ticker (i.e. it maintains a set of tickers and counts them all down at the same rate).
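
Roughly, a multi-ticker could look like this (hypothetical sketch, not a type in the tree; it owns only the countdown logic, and callers decide what each slot means and invoke the callbacks themselves):

```c
#include <stdbool.h>
#include <stdint.h>

#define MULTI_TICKER_NSLOTS 4

typedef struct {
	int64_t nticks[MULTI_TICKER_NSLOTS];	/* remaining ticks per slot */
} multi_ticker_t;

static void
multi_ticker_init(multi_ticker_t *mt,
    const int64_t nticks[MULTI_TICKER_NSLOTS]) {
	for (int i = 0; i < MULTI_TICKER_NSLOTS; i++) {
		mt->nticks[i] = nticks[i];
	}
}

/*
 * Count every slot down by n and report which slots fired via a
 * bitmask.  Like ticker_t, the multi-ticker has no idea what events
 * live in its slots; the caller resets fired slots and invokes the
 * corresponding callbacks.
 */
static bool
multi_ticker_ticks(multi_ticker_t *mt, int64_t n, unsigned *fired) {
	*fired = 0;
	for (int i = 0; i < MULTI_TICKER_NSLOTS; i++) {
		mt->nticks[i] -= n;
		if (mt->nticks[i] <= 0) {
			*fired |= 1U << i;
		}
	}
	return *fired != 0;
}
```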

```diff
-static void
-prof_sample_threshold_update(tsd_t *tsd) {
+static uint64_t
+prof_sample_new_event_wait(tsd_t *tsd) {
 #ifdef JEMALLOC_PROF
```
@davidtgoldblatt (Member)

Better in these sorts of places is just a cassert(config_prof), if we can get away with it.

(In general, we try to avoid #define-ing unused functionality away as much as we can, since it means we don't find compilation breakages until we send stuff away to CI.)
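
For illustration, the two styles side by side (cassert() and config_prof here are simplified stand-ins for jemalloc's actual macros, and draw_wait() is a placeholder):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct tsd_s tsd_t;	/* stand-in */
uint64_t draw_wait(tsd_t *tsd);	/* stand-in */

#ifdef JEMALLOC_PROF
static const bool config_prof = true;
#else
static const bool config_prof = false;
#endif
#define cassert(c) assert(c)	/* simplified stand-in */

/* Style 1: the profiling body is invisible to non-prof builds, so a
 * breakage inside it only surfaces once CI builds with prof enabled. */
uint64_t
new_event_wait_ifdef(tsd_t *tsd) {
#ifdef JEMALLOC_PROF
	return draw_wait(tsd);
#else
	(void)tsd;
	return 0;
#endif
}

/* Style 2: the body always compiles; the assertion documents (and, in
 * debug builds, enforces) that it's unreachable without prof. */
uint64_t
new_event_wait_cassert(tsd_t *tsd) {
	cassert(config_prof);
	return draw_wait(tsd);
}
```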

@yinan1048576

I'm not sure what you meant. Were you suggesting getting rid of the #ifdef JEMALLOC_PROF in favor of a cassert()? I was just copying existing code; I thought the comment above meant that the #ifdef JEMALLOC_PROF was necessary.

@davidtgoldblatt (Member)

Ah, I see. I didn't see the comment. That makes sense.

@yinan1048576

Yes, that makes sense. I was also slightly leaning this way, but I chose the easier alternative, because all events except the prof sampling event already had their timing logic inside the thread event module. I'll do some restructuring.

@yinan1048576 changed the title from "Restructure thread event handling" to "[In progress] Restructure thread event handling" on Apr 14, 2020
@yinan1048576

Rebased, and pushed the "when" part into the modules owning each event.

> Some caller actually invokes the callbacks.

The caller is still the thread event module itself. I don't see a compelling reason for the thread event module to surface which events should be triggered and let the caller trigger them. Instead of the ticker, my analogy is the buffered writer: it should just flush itself when needed instead of asking the caller to flush for it.

If we really want a neat structure, we could get rid of tsd_te_init() entirely and instead let each module register its triggering and timing callbacks in its own init function, but it seems we'd then need to store the callback pointers in the TSD (roughly as sketched below).
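
Roughly like this (entirely hypothetical sketch; not what this PR does):

```c
#include <stdint.h>

typedef struct tsd_s tsd_t;			/* stand-in */
typedef uint64_t (*te_new_wait_fn)(tsd_t *);	/* timing callback */
typedef void (*te_handler_fn)(tsd_t *, uint64_t); /* triggering callback */

#define TE_MAX_EVENTS 8

/* In the real layout these slots would have to live in TSD. */
typedef struct {
	te_new_wait_fn new_wait[TE_MAX_EVENTS];
	te_handler_fn handler[TE_MAX_EVENTS];
	unsigned nevents;
} te_registry_t;

/* Called from each event-owning module's own init function (e.g. prof
 * or tcache), instead of being hardwired in tsd_te_init(). */
static void
te_register(te_registry_t *reg, te_new_wait_fn new_wait,
    te_handler_fn handler) {
	reg->new_wait[reg->nevents] = new_wait;
	reg->handler[reg->nevents] = handler;
	reg->nevents++;
}
```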

I don't yet have an answer regarding your point on the cassert(config_prof) stuff; it's not really related to this PR anyway. We can discuss more later.

@yinan1048576

(Ideally I should also have pushed the triggering functions into the individual modules, but some of them depend on thread event counters, so I kept those triggering functions in thread event for now.)

@yinan1048576 changed the title from "[In progress] Restructure thread event handling" to "Restructure thread event handling" on Apr 16, 2020
@yinan1048576

Sorry - just removed the commit d2a19dd in the middle: figured it would be better handled in a later stack. All other commits stay the same.

@yinan1048576 commented Apr 16, 2020

I think I'm now more determined to push the triggering functions into the individual modules. How about this: each handler will have a signature of void event#_event_handler(tsd_t *, uint64_t), where the second parameter is the accumulated bytes since the last event of the same type?
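
For example, with tcache GC (placeholder body, not the actual implementation):

```c
#include <stdint.h>

typedef struct tsd_s tsd_t;	/* stand-in */

/*
 * elapsed: accumulated bytes since the last event of the same type.
 * It can exceed the event's wait time, e.g. when a single large
 * allocation overshoots the threshold.
 */
static void
tcache_gc_event_handler(tsd_t *tsd, uint64_t elapsed) {
	(void)tsd;
	(void)elapsed;
	/* ... one incremental tcache GC pass would go here ... */
}
```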

@yinan1048576

Added a couple more commits. This PR should now be in its final shape.

The first new commit pulls the event handler logic into the constituent modules, so that the thread event module only contains thread-event-specific content. We could possibly do some further restructuring tricks, but things are now in a sufficiently satisfactory shape to me.

The next few commits ensure proper counter initialization in all cases. Without them, we previously failed to initialize the counters when a new thread deallocated before making any allocation. We were lucky that the deallocation path only had one event, the tcache GC event, which does no harm; otherwise, every single deallocation event would have been triggered on the very first deallocation call.

My current approach is to initialize both the allocation counters and the deallocation counters in the TSD full init, and only the deallocation counters in the TSD minimal init (sketched below).
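
In sketch form (illustrative stand-in names and initial values; the real counters live in TSD):

```c
#include <stdint.h>

/* Stand-in for the thread event counters stored in TSD. */
typedef struct {
	uint64_t alloc_wait;	/* allocation-side counters */
	uint64_t dalloc_wait;	/* deallocation-side counters */
} te_counters_t;

/* Minimal init: only the deallocation path may run, so only the
 * deallocation counters need to be valid. */
static void
te_tsd_init_minimal(te_counters_t *te) {
	te->dalloc_wait = UINT64_MAX;	/* placeholder initial wait */
}

/* Full init: both paths may run. */
static void
te_tsd_init_full(te_counters_t *te) {
	te->alloc_wait = UINT64_MAX;	/* placeholder initial wait */
	te->dalloc_wait = UINT64_MAX;
}
```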

Alternatively, I could avoid calling any counter initialization in the TSD init functions at all, and instead put it right before the counters are changed (in te_event_advance()). However, I realized that this would invalidate the look-ahead calls: they might see a completely uninitialized set of counters.

A few other minor changes / benefits:

  • I also divided the assertions into allocation-related and deallocation-related parts - they are conceptually distinct.
  • The counter fields in TSD no longer need to be statically initialized, so the backward TSD -> thread event dependency can finally be removed.

@yinan1048576

It seems getting rid of the TSD static initializer for the counters didn't work for the background thread. Let me look into it...

@yinan1048576 changed the title from "Restructure thread event handling" to "[In progress] Restructure thread event handling" on Apr 17, 2020
@yinan1048576

There's still something I don't understand, but the background thread can be in a reincarnated state, in which it allocates without initializing the counters.

I ended up just initializing both the allocation counters and the deallocation counters in both the TSD full init and the TSD minimal init; the counter init is so cheap anyway. I also got rid of the other commits that tried to divide allocation and deallocation.

@yinan1048576 changed the title from "[In progress] Restructure thread event handling" to "Restructure thread event handling" on Apr 17, 2020
@yinan1048576

> Without these commits, we previously failed to initialize the counters when a new thread deallocated before making any allocation. We were lucky that the deallocation path only had one event, the tcache GC event, which does no harm; otherwise, every single deallocation event would have been triggered on the very first deallocation call.

Realized that this was not entirely right: the TSD minimal init labels reentrancy on the TSD, so the events would not be triggered. However, there is still a downside, perhaps a more serious one: the thread event logic would keep postponing the event to the next deallocation call, ending up routing all subsequent calls to the slow path (until an allocation call comes).

@davidtgoldblatt (Member) left a comment

Stamping because I think this is an improvement relative to the status quo, but I think this has left me even more convinced that we ought to push more of the invocation logic into the callers of this module and have some sort of multi-ticker abstraction. This feels very event-loop-y to me in the sense that I have a very hard time tracking the logic of what gets invoked when and why.

@yinan1048576

Rebased on top of #1819 and simplified.

@yinan1048576

Rebased.

@yinan1048576

Figured that a7c27fd is not entirely correct: to be more rigorous, the PRNG seed needs to be initialized before tsd_te_init() is called. Fixed it.

@yinan1048576 merged commit dcea2c0 into jemalloc:dev on May 12, 2020