-
Notifications
You must be signed in to change notification settings - Fork 10.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement callback CQ at C++ layer using shared thread pool if non-background iomgr #25169
Conversation
c43f110
to
97bfef8
Compare
a5ecca2
to
846199e
Compare
846199e
to
b22b0f7
Compare
This is now ready for review! Passes 1.2 million client_callback_end2end_tests without flakes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few comments, looks good to me overall
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for the review! I've addressed all your comments.
eb644a2
to
8e04628
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks good!
Thanks for the review! I'm going to squash and merge and then work on performance and other designs in subsequent PRs. |
8e04628
to
dba8bb5
Compare
// Use the raw Core next function rather than the C++ Next since | ||
// Next incorporates FinalizeResult and we actually want that | ||
// called from the callback functor itself. | ||
// TODO(vjpai): Migrate below to next without a timeout or idle |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @vjpai. I know you aren't at G anymore, but do you have some more context on this comment? I would like to remove the idle phase, but I don't know what you were running into that caused you to leave this comment here. Any pointers at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the TODO
is still addressed to me, so I don't mind the q. The context is that my goal was to get something working even it didn't have awesome performance because G's internal performance doesn't depend on this code (uses the EM) and I knew that this section of code would be removed soon once @drfloob et al get the EventEngine fully integrated (an OSS EM in some ways). That was with a pretty strong defn of working fwiw - @jacobsa created this great tool caled blase
which runs a test 100K times in each of 12 operating modes, and I didn't consider this working until it passed that for client_callback_end2end_test
and end2end_test
without flakes, leaks, or races. So at some points I would see a rare (like 2/1.2M) flake caused by a 10s timeout in the resolver which indicated that some backup polling was getting kicked off. So I put in this timeout to try to avoid that, but then lock contention got noticeable, so I also added an idle phase. I also tried at least a few dozen different attempts to try to improve the performance (tuning the lengths of these phases, trying to go in at a deeper level below the core CQ, trying to remove this timeout altogether, etc), but none of them really proved effective. And then I didn't put much more into it because we had decided that performance wouldn't gate the de-experimentalization and again because I knew this code was hopefully short-lived anyway. HTH!
This PR enables the C++ callback API without a background iomgr by wrapping a ref-counted nextable CQ and having multiple threads polling it. Functionally correct, no promises re performance at the current time.