grpc provides uninformative error messages, even when set to "block" #3406

Closed
sethp-nr opened this issue Feb 28, 2020 · 5 comments

@sethp-nr
Contributor

What version of gRPC are you using?

v1.23.1, but am open to other versions

What version of Go are you using (go version)?

go version go1.13.5 darwin/amd64

What operating system (Linux, Windows, …) and version?

MacOS and Linux

What did you do?

I misconfigured my client to expect a CA certificate that matched the CA that signed my server's certificate in content, but not in key. The error I got back from grpc-go was:

context deadline exceeded

What did you expect to see?

When I modified the very end of DialContext to return any connection errors that occurred:

// A blocking dial blocks until the clientConn is ready.
if cc.dopts.block {
	for {
		s := cc.GetState()
		if s == connectivity.Ready {
			break
		} else if cc.dopts.copts.FailOnNonTempDialError && s == connectivity.TransientFailure {
			if err = cc.blockingpicker.connectionError(); err != nil {
				terr, ok := err.(interface {
					Temporary() bool
				})
				if ok && !terr.Temporary() {
					return nil, err
				}
			}
		}
		if err = cc.blockingpicker.connectionError(); err != nil {
			return nil, err
		}

		if !cc.WaitForStateChange(ctx, s) {
			// ctx got timeout or canceled.
			if err = cc.blockingpicker.connectionError(); err != nil {
				return nil, err
			}
			return nil, ctx.Err()
		}
	}
}

I got the much more helpful error message:

connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")"

But, that breaks retries. Is there some other way that I could be retrieving the underlying connection error after my timeout occurs?

@sethp-nr
Contributor Author

Related to: kubernetes-sigs/cluster-api#2454

@easwars
Contributor

easwars commented Mar 2, 2020

Couple of questions:

  1. If you actually did a non-blocking dial and tried an RPC (with and without failfast), does the RPC failure surface the underlying handshake error?
  2. What do you mean by "But, that breaks retries"?

@sethp-nr
Contributor Author

sethp-nr commented Mar 6, 2020

  1. No, it does not. I see a log message with the underlying error, but the actual error returned to me is context deadline exceeded (of type context.deadlineExceededError).

  2. In the sample code I was returning eagerly, as soon as any connection error occurred. That turned out to be necessary because a defer earlier in the function would overwrite any returned error with the context deadline exceeded message if the dial timeout expired before returning. In my PR "client: surface connection errors to callers" (#3412), I addressed that earlier defer so that it's possible to retry up to the timeout, and then return the last connection error.

@sethp-nr
Contributor Author

sethp-nr commented Mar 6, 2020

For what it's worth – I'm currently testing with a lot of moving parts, but I see pretty similar behavior with

ctx, cancel := context.WithTimeout(context.Background(), 1*time.Second)
defer cancel()
grpc.DialContext(ctx, "", grpc.WithInsecure(), grpc.WithContextDialer(func(_ context.Context, _ string) (net.Conn, error) {
    return nil, errors.New("hello")
}))

@dfawley
Member

dfawley commented Mar 6, 2020

This is essentially a duplicate of #2031.

Please read this comment: #2031 (comment), and let's continue further discussion in that issue. Thanks!
