Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

etcd3/watcher: cancelling context shouldn't return error #24638

Merged
merged 1 commit into from
Apr 22, 2016

Conversation

hongchaodeng
Copy link
Contributor

@hongchaodeng hongchaodeng commented Apr 21, 2016

Fixes #24528

@hongchaodeng
Copy link
Contributor Author

This is blocking other people running PR tests. Please prioritize it.

@xiang90
Copy link
Contributor

xiang90 commented Apr 21, 2016

LGTM

@xiang90 xiang90 added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Apr 21, 2016
@k8s-github-robot k8s-github-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. release-note-label-needed labels Apr 21, 2016
@gmarek gmarek added lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. and removed release-note-label-needed labels Apr 21, 2016
@xiang90 xiang90 assigned gmarek and unassigned dchen1107 Apr 21, 2016
@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 22, 2016
@hongchaodeng
Copy link
Contributor Author

hongchaodeng commented Apr 22, 2016

Updated the test in the PR.
The reason is to reproduce the failure without the fix:

  • The previous change is probabilistic to fail.
  • Well, actually the new test is also probabilistic by itself, but more probable.
    The probability happens in
    select {
    case wc.errChan <- err:
    case <-wc.ctx.Done():
    }

But in the new test I can repro the failure more easily (without the fix, of course).

@xiang90
Copy link
Contributor

xiang90 commented Apr 22, 2016

@hongchaodeng Can we make the test itself reliable?

@k8s-bot
Copy link

k8s-bot commented Apr 22, 2016

GCE e2e build/test passed for commit a77ddadc1c0ba3347af2f5bde20c6140517cb25a.

@k8s-bot
Copy link

k8s-bot commented Apr 22, 2016

GCE e2e build/test passed for commit dcc3cb0773b99c2ead2d12d68c2187fc5009c70f.

@hongchaodeng
Copy link
Contributor Author

Seems like context.Canceled isn't the expected error. gRPC wraps it:

https://github.com/grpc/grpc-go/blob/61e92eacc3bc5ec3b227ed739d691cfb5034b8a4/rpc_util.go#L319-L327

@hongchaodeng
Copy link
Contributor Author

@xiang90
We can't get the code unless we parse the string, which is an internal impl.

@@ -276,6 +281,11 @@ func parseError(err error) *watch.Event {
}

func (wc *watchChan) sendError(err error) {
// Context.canceled is an expected behavior.
// We should just stop all goroutines in watchChan without returning error.
if err == context.Canceled {
Copy link
Contributor

@xiang90 xiang90 Apr 22, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably

  1. type assert on grpc error and then compare the code.
  2. if not grpc, compare with context.Canceled.

I think this is kind of etcd/clientv3 issue. We should not leak gRPC error. But this is P1 for k8s, so let's fix this in k8s for now as a work around.

/cc @heyitsanthony

@hongchaodeng
Copy link
Contributor Author

Ah.. I think we can use grpc.Code(err)

@k8s-github-robot k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Apr 22, 2016
// Context.canceled is an expected behavior.
// We should just stop all goroutines in watchChan without returning error.
// TODO: etcd client should return context.Canceled instead of grpc specific error.
if grpc.Code(err) == codes.Canceled {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if grpc.Code(err) == codes.Canceled || err == context.Canceled

@k8s-bot
Copy link

k8s-bot commented Apr 22, 2016

GCE e2e build/test passed for commit b0eddb8e835f7d7bdcf4a2d5edf5c1f7e8b53c91.

@k8s-bot
Copy link

k8s-bot commented Apr 22, 2016

GCE e2e build/test passed for commit 4dcb017f46bc6f73e6cdaee6fef6908882745c97.

@xiang90 xiang90 added lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-merge labels Apr 22, 2016
@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 22, 2016
@k8s-bot
Copy link

k8s-bot commented Apr 22, 2016

GCE e2e build/test passed for commit 4dcb017f46bc6f73e6cdaee6fef6908882745c97.

@k8s-bot
Copy link

k8s-bot commented Apr 22, 2016

GCE e2e build/test passed for commit b0f4517.

// Context.canceled is an expected behavior.
// We should just stop all goroutines in watchChan without returning error.
// TODO: etcd client should return context.Canceled instead of grpc specific error.
if grpc.Code(err) == codes.Canceled || err == context.Canceled {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need grpc dependency?
We are still not using grpc anywhere in kubernetes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment.

Currently etcd client returns grpc error. It shouldn't though. We will fix it later by return "context.Canceled"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment explains the problem (which I understand) not why you are using grpc.

Currently etcd client returns grpc error. It shouldn't though. We will fix it later by return "context.Canceled"

ok - that makes sense.

@wojtek-t wojtek-t assigned wojtek-t and unassigned gmarek Apr 22, 2016
@hongchaodeng
Copy link
Contributor Author

@wojtek-t @gmarek
Can you help take a look and re-apply 'lgtm'? Also re-trigger the Travis :P

@wojtek-t wojtek-t added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 22, 2016
@wojtek-t
Copy link
Member

LGTM

Travis seems to be broken today. I'm not sure we can do much about it.

@k8s-github-robot
Copy link

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot
Copy link

k8s-bot commented Apr 22, 2016

GCE e2e build/test passed for commit b0f4517.

@k8s-github-robot
Copy link

Automatic merge from submit-queue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants