Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unifying error handling within core #6352

Merged
merged 206 commits into from
Jun 18, 2016
Merged

Unifying error handling within core #6352

merged 206 commits into from
Jun 18, 2016

Conversation

ctiller
Copy link
Member

@ctiller ctiller commented Apr 28, 2016

Fixes #49
Built on #6911, #6949

CC @a11r @dklempner @dgquintas @makdharma @markdroth @nicolasnoble @sreecha @vjpai @yang-g @y-zeng

Folks, I'd like some considered feedback on this, and potentially a volunteer or two.

We have a real problem in core right now in that it's hard to diagnose faults.
This is not pretending to be a complete solution, but a building block towards that.

The big idea is to define some exception type grpc_error, that can be propagated up the stack and (importantly) combined with other grpc_error objects to form a trace of events that prevented some operation from occurring.

From there we can (for instance) pretty-print some trace information into trailing metadata received on the client, or log actually interesting data to the console - with it all bound together.

I've sketched out an initial interface to grpc_error, and converted socket utils, tcp_client_posix and tcp_posix over to using it.

I've also proposed changes to closure_list and exec_ctx in this PR: the single success bit would transform into a grpc_error object, with a special GRPC_ERROR_NONE value for success.

What remains: Before going to far, I'd like to get feedback on how this looks to people (too hard to use? doesn't capture everything we'd need? performance concerns?)

I'd also like to identify a volunteer or two to help start converting the codebase over one module at a time. I suspect for an approach like this to work we need to go all the way with it.

This would be an excellent project for someone that wants to start piecing together the entire codebase in their head.

@ctiller
Copy link
Member Author

ctiller commented Apr 29, 2016

Things that are likely missing:

  • capturing file/line information in grpc_error_create() and similar functions
  • capturing timestamps when errors are thrown (I'd like to be able to say that this RPC failed because this channel failed to connect 10 seconds ago)

@jboeuf
Copy link
Contributor

jboeuf commented Jun 16, 2016

Looks like some unrelated stuff were added to this PR such as a timeout in the handshake. I don't think that it is such a big deal but on principle, it would be better not to add unrelated things (unless I'm mistaken) to a PR that is already huge.

@ctiller
Copy link
Member Author

ctiller commented Jun 16, 2016

Timeout on handshake actually addresses a bug that showed up testing this.
We're still not done too, but the final problems ought to be able to be
tackled separately.

On Thu, Jun 16, 2016, 6:18 AM jboeuf notifications@github.com wrote:

Looks like some unrelated stuff were added to this PR such as a timeout in
the handshake. I don't think that it is such a big deal but on principle,
it would be better not to add unrelated things (unless I'm mistaken) to a
PR that is already huge.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#6352 (comment), or mute
the thread
https://github.com/notifications/unsubscribe/AJpudcdLBMc3i22eDquiZAqGi7hJYaVSks5qMU0tgaJpZM4ISWXL
.

@dgquintas
Copy link
Contributor

@jcanizales
Copy link
Contributor

@ctiller for changes that modify interfaces used by the code that's duplicated internally, please file a bug in the "import pony saddlebag" per go/grpc-import-pony. Otherwise the import gets delayed unnecessarily.

Is @bogdandrutu active on GitHub?

@lock lock bot locked as resolved and limited conversation to collaborators Jan 27, 2019
@lock lock bot unassigned dgquintas and sreecha Jan 27, 2019
@thelinuxfoundation
Copy link

Thank you for your pull request. Before we can look at your contribution, we need to ensure all contributors are covered by a Contributor License Agreement.

After the following items are addressed, please respond with a new comment here, and the automated system will re-verify.

Regards,
CLA GitHub bot

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.