
Implement callback CQ at C++ layer using shared thread pool if non-background iomgr #25169

Merged (1 commit) on Feb 17, 2021

Conversation

vjpai (Member) commented Jan 14, 2021

This PR enables the C++ callback API without a background iomgr by wrapping a ref-counted nextable CQ and having multiple threads poll it. It is functionally correct, but there are no promises regarding performance at the current time.
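For illustration, here is a minimal sketch of the shape this describes: a next-based CQ drained by a few threads, with each completed tag treated as a callback functor. The class name, the fixed thread count, and the omission of the ref-counting and shared-pool plumbing are simplifications of mine, not the code in this PR.

```cpp
#include <grpc/grpc.h>
#include <grpc/support/time.h>
#include <grpcpp/grpcpp.h>

#include <thread>
#include <vector>

// Hypothetical shim: a plain next-based CompletionQueue is drained by a small
// pool of threads, and each completed event carries a callback functor as its
// tag, which the polling thread runs directly. The class in the PR is
// additionally ref-counted so it stays alive until the last user releases it.
class CallbackPollingCQ {
 public:
  explicit CallbackPollingCQ(int num_threads) {
    for (int i = 0; i < num_threads; ++i) {
      threads_.emplace_back([this] { PollLoop(); });
    }
  }

  // Destroyed from outside the polling threads: shut the CQ down and wait for
  // every poller to drain and exit.
  ~CallbackPollingCQ() {
    cq_.Shutdown();
    for (auto& t : threads_) t.join();
  }

  grpc::CompletionQueue* cq() { return &cq_; }

 private:
  void PollLoop() {
    for (;;) {
      // Raw core next rather than grpc::CompletionQueue::Next, since the C++
      // Next also runs FinalizeResult and here that should happen inside the
      // functor itself (see the code comment quoted later in this thread).
      grpc_event ev = grpc_completion_queue_next(
          cq_.cq(), gpr_inf_future(GPR_CLOCK_REALTIME), /*reserved=*/nullptr);
      if (ev.type == GRPC_QUEUE_SHUTDOWN) return;
      if (ev.type != GRPC_OP_COMPLETE) continue;
      // The tag is the (then-experimental) completion-queue functor type.
      auto* functor =
          static_cast<grpc_experimental_completion_queue_functor*>(ev.tag);
      functor->functor_run(functor, ev.success);
    }
  }

  grpc::CompletionQueue cq_;
  std::vector<std::thread> threads_;
};
```

Calling the raw core next keeps FinalizeResult out of the polling thread's hands and leaves it to the functor, which is the point of the code comment discussed further down.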

@vjpai vjpai added the lang/c++ and release notes: no labels Jan 14, 2021
@vjpai vjpai changed the title Implemnent callback CQ at C++ layer using shared thread pool if needed Implement callback CQ at C++ layer using shared thread pool if needed Jan 15, 2021
@vjpai vjpai changed the title Implement callback CQ at C++ layer using shared thread pool if needed Implement callback CQ at C++ layer using shared thread pool if non-background iomgr Feb 12, 2021
vjpai (Member, Author) commented Feb 12, 2021

This is now ready for review! It passes 1.2 million runs of client_callback_end2end_test without flakes.

@vjpai vjpai marked this pull request as ready for review February 12, 2021 23:18
@vjpai vjpai requested review from yang-g and ananda1066 and removed request for yang-g February 12, 2021 23:18
ananda1066 (Contributor) left a comment

Just a few comments, looks good to me overall

Four review threads on src/cpp/common/completion_queue_cc.cc (all resolved; two marked outdated)
vjpai (Member, Author) left a comment

Thanks so much for the review! I've addressed all your comments.

ananda1066 (Contributor) left a comment

Thanks, looks good!

vjpai (Member, Author) commented Feb 17, 2021

Thanks for the review! I'm going to squash and merge and then work on performance and other designs in subsequent PRs.

@vjpai vjpai moved this from In progress to Done in C++ callback API Feb 25, 2021
// Use the raw Core next function rather than the C++ Next since
// Next incorporates FinalizeResult and we actually want that
// called from the callback functor itself.
// TODO(vjpai): Migrate below to next without a timeout or idle
A contributor commented:
Hey @vjpai. I know you aren't at G anymore, but do you have some more context on this comment? I would like to remove the idle phase, but I don't know what you were running into that caused you to leave this comment here. Any pointers at all?

vjpai (Member, Author) replied:

Yeah, the TODO is still addressed to me, so I don't mind the question. The context is that my goal was to get something working even if it didn't have awesome performance, because G's internal performance doesn't depend on this code (it uses the EM), and I knew that this section of code would be removed soon once @drfloob et al. get the EventEngine fully integrated (an OSS EM in some ways).

That was with a pretty strong definition of "working," fwiw: @jacobsa created a great tool called blase which runs a test 100K times in each of 12 operating modes, and I didn't consider this working until it passed that for client_callback_end2end_test and end2end_test without flakes, leaks, or races. At some point I would see a rare flake (roughly 2 in 1.2M runs) caused by a 10s timeout in the resolver, which indicated that some backup polling was getting kicked off. So I put in this timeout to try to avoid that, but then lock contention got noticeable, so I also added an idle phase.

I also made at least a few dozen attempts to improve the performance (tuning the lengths of these phases, trying to go in at a deeper level below the core CQ, trying to remove this timeout altogether, etc.), but none of them really proved effective. And then I didn't put much more into it, because we had decided that performance wouldn't gate the de-experimentalization, and again because I knew this code was hopefully short-lived anyway. HTH!
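For illustration, a rough sketch of the poll-loop shape described here: a short deadline on each next call plus an idle backoff after repeated empty polls. The function name, constants, and backoff policy are assumptions of mine and differ from the merged code.

```cpp
#include <grpc/grpc.h>
#include <grpc/support/time.h>

#include <atomic>

// Hypothetical poll loop with the two mitigations described above: a short
// deadline on each next call (so a poller never parks inside the core CQ long
// enough for backup polling to kick in) and an idle phase that backs off after
// repeated empty polls to reduce lock contention on a quiet queue. All
// constants are illustrative only.
void PollLoopWithIdle(grpc_completion_queue* core_cq,
                      std::atomic<bool>* shutdown) {
  int consecutive_timeouts = 0;
  while (!shutdown->load(std::memory_order_relaxed)) {
    gpr_timespec deadline =
        gpr_time_add(gpr_now(GPR_CLOCK_MONOTONIC),
                     gpr_time_from_millis(10, GPR_TIMESPAN));
    grpc_event ev = grpc_completion_queue_next(core_cq, deadline, nullptr);
    if (ev.type == GRPC_QUEUE_SHUTDOWN) return;
    if (ev.type == GRPC_QUEUE_TIMEOUT) {
      // Idle phase: once the queue has been empty for a while, sleep briefly
      // before contending for the CQ again.
      if (++consecutive_timeouts > 10) {
        gpr_sleep_until(gpr_time_add(gpr_now(GPR_CLOCK_MONOTONIC),
                                     gpr_time_from_millis(1, GPR_TIMESPAN)));
      }
      continue;
    }
    consecutive_timeouts = 0;
    // GRPC_OP_COMPLETE: run the callback functor stored in the tag; this is
    // where FinalizeResult and the user callback ultimately happen.
    auto* functor =
        static_cast<grpc_experimental_completion_queue_functor*>(ev.tag);
    functor->functor_run(functor, ev.success);
  }
}
```

The tradeoff sketched here matches the explanation above: the short deadline keeps pollers from sitting idle long enough to trigger backup polling, while the idle sleep trades a little wakeup latency for less contention on the CQ lock.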
