Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix `kubectl get` watches #74717

Closed
wants to merge 4 commits into from

Conversation

@tnozicka
Copy link
Contributor

commented Feb 28, 2019

What type of PR is this?
/kind bug

What this PR does / why we need it:
Watches need to restart on WATCH call timeouts or closed connections.

Which issue(s) this PR fixes:
if you know the related issues already filled, feel free to link them

Special notes for your reviewer:
Split from #50102

Requires:

/hold

Does this PR introduce a user-facing change?:

`kubectl get --watch` now handles timeouts and closed connections

@kubernetes/sig-cli-bugs

// will return an initial watch event. Starting form ~now, rather
// the rv of the object will insure that we start the watch from
// inside the watch window, which the rv of the object might not be.
rv := "0"

This comment has been minimized.

Copy link
@tnozicka

tnozicka Feb 28, 2019

Author Contributor

fyi @lavalamp @wojtek-t @liggitt one RV "0" down

This comment has been minimized.

Copy link
@tnozicka

tnozicka Feb 28, 2019

Author Contributor

for others we should never use watch calls from RV "0", see e.g. #74022

@k8s-ci-robot k8s-ci-robot requested review from eparis and smarterclayton Feb 28, 2019
@tnozicka tnozicka force-pushed the tnozicka:fix-kubectl-get-watches branch from 8b28a68 to 5f5f1ee Mar 5, 2019
@tnozicka tnozicka force-pushed the tnozicka:fix-kubectl-get-watches branch from 5f5f1ee to ffbac76 Mar 5, 2019
@@ -759,8 +772,8 @@ func TestGetMixedGenericObjects(t *testing.T) {
case "/namespaces/test/pods":
return &http.Response{StatusCode: 200, Header: cmdtesting.DefaultHeader(), Body: cmdtesting.ObjBody(codec, structuredObj)}, nil
default:
t.Fatalf("request url: %#v,and request: %#v", req.URL, req)

This comment has been minimized.

Copy link
@tnozicka

tnozicka Mar 5, 2019

Author Contributor

t.Fataf is not threadsafe

@tnozicka tnozicka force-pushed the tnozicka:fix-kubectl-get-watches branch from ffbac76 to ab084a3 Mar 5, 2019
@tnozicka tnozicka changed the title [WIP] - Fix `kubectl get` watches Fix `kubectl get` watches Mar 5, 2019
@tnozicka

This comment has been minimized.

Copy link
Contributor Author

commented Mar 5, 2019

/retest

@tnozicka tnozicka force-pushed the tnozicka:fix-kubectl-get-watches branch from ab084a3 to 9c9263e Mar 5, 2019
@tnozicka tnozicka force-pushed the tnozicka:fix-kubectl-get-watches branch from 9c9263e to 332d1af Mar 5, 2019
@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Mar 5, 2019

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: tnozicka
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: smarterclayton

If they are not already assigned, you can assign the PR to them by writing /assign @smarterclayton in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tnozicka tnozicka force-pushed the tnozicka:fix-kubectl-get-watches branch 2 times, most recently from 31a7582 to fa99f26 Mar 5, 2019
@tnozicka

This comment has been minimized.

Copy link
Contributor Author

commented Mar 6, 2019

/retest

@tnozicka tnozicka force-pushed the tnozicka:fix-kubectl-get-watches branch 2 times, most recently from 553d751 to 08c7cc1 Mar 6, 2019
Copy link
Contributor

left a comment

/hold
I don't feel too comfortable landing this right before code freeze. I need some more time to think through some of the changes and hear back the answers to my questions.

@@ -48,7 +47,6 @@ func main() {
defer logs.FlushLogs()

if err := command.Execute(); err != nil {
fmt.Fprintf(os.Stderr, "%v\n", err)

This comment has been minimized.

Copy link
@soltysh

soltysh Mar 7, 2019

Contributor

Don't change this, this should stay as is.

This comment has been minimized.

Copy link
@tnozicka

tnozicka Mar 7, 2019

Author Contributor

RunE takes precedence and those changes are needed to make it behave the same way as it did

},
SuggestFor: []string{"list", "ps"},
RunE: func(cmd *cobra.Command, args []string) error {

This comment has been minimized.

Copy link
@soltysh

soltysh Mar 7, 2019

Contributor

Why this change, what's wrong with the previous approach?

This comment has been minimized.

Copy link
@tnozicka

tnozicka Mar 7, 2019

Author Contributor

unit tests can't run Run otherwise it calls the panic handler at the end. Given watch unit tests need to run Run in another thread and it doesn't support cancellation calling the error handler means panic

This comment has been minimized.

Copy link
@soltysh

soltysh Mar 7, 2019

Contributor

We should factor out parts of Run you need to unit test and test that. I'd rather leave the main parts as they were.

This comment has been minimized.

Copy link
@tnozicka

tnozicka Mar 8, 2019

Author Contributor

It is even testing setting the flags and all the other unit tests test the whole method so this would essentially mean creating and identical function and make the real function a dummy wrapper.

Shouldn't the whole kubectl move to RunE eventually? This seems like a first step in backwards compatible way.

@@ -1185,35 +1203,67 @@ func TestWatchLabelSelector(t *testing.T) {
NegotiatedSerializer: resource.UnstructuredPlusDefaultContentConfig().NegotiatedSerializer,
Client: fake.CreateHTTPClient(func(req *http.Request) (*http.Response, error) {
if req.URL.Query().Get(metav1.LabelSelectorQueryParam("v1")) != "a=b" {
t.Fatalf("request url: %#v,and request: %#v", req.URL, req)
t.Errorf("request url: %#v,and request: %#v", req.URL, req)

This comment has been minimized.

Copy link
@soltysh

soltysh Mar 7, 2019

Contributor

You're missing return if you're switching to t.Errorf

This comment has been minimized.

Copy link
@tnozicka

tnozicka Mar 7, 2019

Author Contributor

Isn't it caller's check that fails, not fake's failure? It will fail the test still.

@@ -1236,35 +1286,66 @@ func TestWatchFieldSelector(t *testing.T) {
NegotiatedSerializer: resource.UnstructuredPlusDefaultContentConfig().NegotiatedSerializer,
Client: fake.CreateHTTPClient(func(req *http.Request) (*http.Response, error) {
if req.URL.Query().Get(metav1.FieldSelectorQueryParam("v1")) != "a=b" {
t.Fatalf("unexpected request: %#v\n%#v", req.URL, req)
t.Errorf("unexpected request: %#v\n%#v", req.URL, req)

This comment has been minimized.

Copy link
@soltysh

soltysh Mar 7, 2019

Contributor

Missing return

cmd := NewCmdGet("kubectl", tf, streams)
cmd.SetOutput(buf)

cmd.Flags().Set("watch", "true")
cmd.Flags().Set("filename", "../../../../test/e2e/testing-manifests/statefulset/cassandra/controller.yaml")

This comment has been minimized.

Copy link
@soltysh

soltysh Mar 7, 2019

Contributor

Why you're changing test contents? I'd like to see tests verifying different types of objects, not just pods.

This comment has been minimized.

Copy link
@tnozicka

tnozicka Mar 7, 2019

Author Contributor

Returning Pods for .../namespaces/test/replicationcontrollers is a) wrong b) breaks the proper fakes actually checking GVK

This comment has been minimized.

Copy link
@tnozicka

tnozicka Mar 7, 2019

Author Contributor

It took me like and hour to fix the actual watch code and 2 days fixing the unit (and other) tests because of these things

default:
t.Fatalf("request url: %#v,and request: %#v", req.URL, req)
t.Errorf("request url: %#v,and request: %#v", req.URL, req)
return nil, nil

This comment has been minimized.

Copy link
@soltysh

soltysh Mar 7, 2019

Contributor

After change to t.Errorf you should return error, no?

This comment has been minimized.

Copy link
@tnozicka

tnozicka Mar 7, 2019

Author Contributor

here I should because the fake doesn't understand the request

)

func init() {
// Enable klog which is used in dependencies

This comment has been minimized.

Copy link
@soltysh

soltysh Mar 7, 2019

Contributor

Which test needs it, why not init in that particular one?

This comment has been minimized.

Copy link
@tnozicka

tnozicka Mar 7, 2019

Author Contributor
  • any test needs it so you can trace the events like go test -race ./pkg/kubectl/cmd/get -v -run='TestWatchOnlyList' -timeout=3s -args -logtostderr=true -v=9
  • setting a global var from the particular test function seems like a race, given they run in parallel

This comment has been minimized.

Copy link
@tnozicka

tnozicka Mar 7, 2019

Author Contributor

it use to be the default with glog, not with klog

@@ -292,7 +292,7 @@ run_non_native_resource_tests() {
# Start watcher in background with process substitution,
# so we can read from stdout asynchronously.
kube::log::status "Testing CustomResource watching"
exec 3< <(kubectl "${kube_flags[@]}" get bars --request-timeout=1m --watch-only -o name & echo $! ; wait)

This comment has been minimized.

Copy link
@soltysh

soltysh Mar 7, 2019

Contributor

Hmm.... this worries me, how is request-timeout handled with the new watches of yours? This is a significant API change.

This comment has been minimized.

Copy link
@soltysh

soltysh Mar 7, 2019

Contributor

request-timeout should be orthogonal to your timeout they both should work just fine and you should be able to pass both.

This comment has been minimized.

Copy link
@tnozicka

tnozicka Mar 7, 2019

Author Contributor

I'd not be worried. I don't think is api-change.

This test case is testing that kubectl get is broken, fixing it seems like desirable bug fix.

--request-timeout means The length of time to wait before giving up on a single server request.

Watching is not a single request process. Ending a WATCH call should have never ended up watching for resource changes. It will still timeout the single watch calls as it did before but now we can recover.

This comment has been minimized.

Copy link
@soltysh

soltysh Mar 7, 2019

Contributor

Have you verified behavior before and with your fix?

This comment has been minimized.

Copy link
@tnozicka

tnozicka Mar 8, 2019

Author Contributor

I did. Failed before because there was a bug, now it doesn't.

(I am not saying the behaviour doesn't differ, jut that changing a behaviour from buggy->working isn't api change.)

@tnozicka tnozicka force-pushed the tnozicka:fix-kubectl-get-watches branch from 08c7cc1 to ed6b28e Mar 12, 2019
@k8s-ci-robot k8s-ci-robot added area/test and removed needs-rebase labels Mar 12, 2019
tnozicka added 2 commits Feb 28, 2019
@tnozicka tnozicka force-pushed the tnozicka:fix-kubectl-get-watches branch from ed6b28e to 88fd9da Mar 12, 2019
@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Apr 10, 2019

@tnozicka: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@fejta-bot

This comment has been minimized.

Copy link

commented Jul 9, 2019

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@alex-slynko

This comment has been minimized.

Copy link

commented Jul 30, 2019

Hi @tnozicka
Is there any progress on fixing watches?

@fejta-bot

This comment has been minimized.

Copy link

commented Aug 29, 2019

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@tnozicka

This comment has been minimized.

Copy link
Contributor Author

commented Sep 6, 2019

Is there any progress on fixing watches?

@alex-slynko I plan to get it rebased in few days and sync with @soltysh on the comments

// ThreadsafeBuffer is a goroutine safe bytes.Buffer
type ThreadsafeBuffer struct {
buffer bytes.Buffer
mutex sync.Mutex

This comment has been minimized.

Copy link
@cwdsuzhou

cwdsuzhou Sep 8, 2019

Member

RWMutex may be better

@fejta-bot

This comment has been minimized.

Copy link

commented Oct 8, 2019

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Oct 8, 2019

@fejta-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.