Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert the revert of #107456 - "apf: change controller to use SSA for patches" #108383

Merged
merged 1 commit into from
Mar 22, 2022

Conversation

tkashem
Copy link
Contributor

@tkashem tkashem commented Feb 28, 2022

What type of PR is this?

/kind bug

What this PR does / why we need it:

This brings back #107456 to see if it introduces flakes/timeouts in the integration tests

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Feb 28, 2022
@tkashem
Copy link
Contributor Author

tkashem commented Feb 28, 2022

cc @MikeSpreitzer

@k8s-ci-robot k8s-ci-robot added area/test sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 28, 2022
@MadhavJivrajani
Copy link
Contributor

/retest

@tkashem
Copy link
Contributor Author

tkashem commented Feb 28, 2022

/test pull-kubernetes-integration

6 similar comments
@tkashem
Copy link
Contributor Author

tkashem commented Feb 28, 2022

/test pull-kubernetes-integration

@tkashem
Copy link
Contributor Author

tkashem commented Feb 28, 2022

/test pull-kubernetes-integration

@tkashem
Copy link
Contributor Author

tkashem commented Feb 28, 2022

/test pull-kubernetes-integration

@tkashem
Copy link
Contributor Author

tkashem commented Feb 28, 2022

/test pull-kubernetes-integration

@tkashem
Copy link
Contributor Author

tkashem commented Feb 28, 2022

/test pull-kubernetes-integration

@tkashem
Copy link
Contributor Author

tkashem commented Mar 1, 2022

/test pull-kubernetes-integration

@jpbetz
Copy link
Contributor

jpbetz commented Mar 1, 2022

/triage accepted
/cc @wojtek-t @MikeSpreitzer

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 1, 2022
@tkashem
Copy link
Contributor Author

tkashem commented Mar 1, 2022

/test pull-kubernetes-integration

@tkashem
Copy link
Contributor Author

tkashem commented Mar 2, 2022

8 tries for pull-kubernetes-integration, one failure https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/108383/pull-kubernetes-integration/1498771235025195008

/test pull-kubernetes-integration

@aojea
Copy link
Member

aojea commented Mar 2, 2022

8 tries for pull-kubernetes-integration, one failure https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/108383/pull-kubernetes-integration/1498771235025195008

/test pull-kubernetes-integration

the good thing is that we store now the number of leaked goroutines

grep goroutines junit_20220301-214450.stdout                                                                                                                                                                                                                                                                                            
{"Time":"2022-03-01T21:47:35.765303573Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/apiserver/tracing","Output":"I0301 21:47:35.765240  118920 etcd.go:198] unexpected number of goroutines: before: 3 after 1541\n"}
{"Time":"2022-03-01T21:47:36.042225892Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/apiserver/podlogs","Output":"I0301 21:47:36.042152  118909 etcd.go:198] unexpected number of goroutines: before: 3 after 987\n"}
{"Time":"2022-03-01T21:48:23.455810868Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/apimachinery","Output":"I0301 21:48:23.455757  118727 etcd.go:198] unexpected number of goroutines: before: 3 after 2999\n"}
{"Time":"2022-03-01T21:49:24.644900783Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/apiserver/certreload","Output":"I0301 21:49:24.644831  118887 etcd.go:198] unexpected number of goroutines: before: 3 after 4879\n"}
{"Time":"2022-03-01T21:49:34.396372627Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/certificates","Output":"I0301 21:49:34.396148  119243 etcd.go:198] unexpected number of goroutines: before: 3 after 3556\n"}
{"Time":"2022-03-01T21:49:39.848027866Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/configmap","Output":"I0301 21:49:39.840470  119384 etcd.go:198] unexpected number of goroutines: before: 3 after 1007\n"}
{"Time":"2022-03-01T21:49:46.230610039Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/apiserver/flowcontrol","Output":"I0301 21:49:46.230537  118898 etcd.go:198] unexpected number of goroutines: before: 3 after 3101\n"}
{"Time":"2022-03-01T21:50:01.919001956Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/cronjob","Output":"I0301 21:50:01.918935  119624 etcd.go:198] unexpected number of goroutines: before: 3 after 1024\n"}
{"Time":"2022-03-01T21:50:27.469369553Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/defaulttolerationseconds","Output":"I0301 21:50:27.469289  120470 etcd.go:198] unexpected number of goroutines: before: 3 after 1007\n"}
{"Time":"2022-03-01T21:51:04.646134767Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/apiserver/admissionwebhook","Output":"I0301 21:51:04.644256  118934 etcd.go:198] unexpected number of goroutines: before: 3 after 3059\n"}
{"Time":"2022-03-01T21:59:12.016035665Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/endpoints","Output":"I0301 21:59:12.015900  121066 etcd.go:198] unexpected number of goroutines: before: 3 after 2009\n"}
{"Time":"2022-03-01T21:59:16.159152794Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/disruption","Output":"I0301 21:59:16.158942  120640 etcd.go:198] unexpected number of goroutines: before: 3 after 1132\n"}
{"Time":"2022-03-01T21:59:21.79111696Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/dryrun","Output":"I0301 21:59:21.791056  121004 etcd.go:198] unexpected number of goroutines: before: 3 after 249\n"}
{"Time":"2022-03-01T21:59:26.547260113Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/events","Output":"I0301 21:59:26.547192  121184 etcd.go:198] unexpected number of goroutines: before: 3 after 250\n"}
{"Time":"2022-03-01T21:59:50.580302255Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/evictions","Output":"I0301 21:59:50.580239  121294 etcd.go:198] unexpected number of goroutines: before: 3 after 3014\n"}
{"Time":"2022-03-01T21:59:54.337125792Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/examples","Output":"I0301 21:59:54.337064  122051 etcd.go:198] unexpected number of goroutines: before: 3 after 1311\n"}
{"Time":"2022-03-01T21:59:54.535763682Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/endpointslice","Output":"I0301 21:59:54.535359  121085 etcd.go:198] unexpected number of goroutines: before: 3 after 7010\n"}
{"Time":"2022-03-01T22:00:08.67585337Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/etcd","Output":"I0301 22:00:08.675757  121107 etcd.go:198] unexpected number of goroutines: before: 3 after 4501\n"}
{"Time":"2022-03-01T22:00:12.982432894Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/kubelet","Output":"I0301 22:00:12.982373  123146 etcd.go:198] unexpected number of goroutines: before: 3 after 1799\n"}
{"Time":"2022-03-01T22:00:26.31239131Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/dualstack","Output":"I0301 22:00:26.312325  121056 etcd.go:198] unexpected number of goroutines: before: 3 after 13975\n"}
{"Time":"2022-03-01T22:00:27.118015838Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/namespace","Output":"I0301 22:00:27.117949  123196 etcd.go:198] unexpected number of goroutines: before: 3 after 2005\n"}
{"Time":"2022-03-01T22:00:32.315697862Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/network","Output":"I0301 22:00:32.315633  123279 etcd.go:198] unexpected number of goroutines: before: 3 after 1007\n"}
{"Time":"2022-03-01T22:00:36.265988295Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/metrics","Output":"I0301 22:00:36.265897  123156 etcd.go:198] unexpected number of goroutines: before: 3 after 4991\n"}
{"Time":"2022-03-01T22:00:39.954476249Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/objectmeta","Output":"I0301 22:00:39.951263  123394 etcd.go:198] unexpected number of goroutines: before: 3 after 1007\n"}
{"Time":"2022-03-01T22:00:56.681304173Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/scale","Output":"I0301 22:00:56.681230  123663 etcd.go:198] unexpected number of goroutines: before: 3 after 243\n"}
{"Time":"2022-03-01T22:01:08.592488254Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/quota","Output":"I0301 22:01:08.592421  123620 etcd.go:198] unexpected number of goroutines: before: 3 after 3029\n"}
{"Time":"2022-03-01T22:01:13.412467178Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/node","Output":"I0301 22:01:13.412396  123300 etcd.go:198] unexpected number of goroutines: before: 3 after 4072\n"}
{"Time":"2022-03-01T22:01:28.189577469Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/secrets","Output":"I0301 22:01:28.189512  123910 etcd.go:198] unexpected number of goroutines: before: 3 after 1007\n"}
{"Time":"2022-03-01T22:01:29.149558437Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/pods","Output":"I0301 22:01:29.149495  123592 etcd.go:198] unexpected number of goroutines: before: 3 after 5987\n"}
{"Time":"2022-03-01T22:02:22.638666535Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/service","Output":"I0301 22:02:22.637945  123951 etcd.go:198] unexpected number of goroutines: before: 3 after 9202\n"}
{"Time":"2022-03-01T22:02:23.662929701Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/serviceaccount","Output":"I0301 22:02:23.662866  124182 etcd.go:198] unexpected number of goroutines: before: 3 after 3995\n"}
{"Time":"2022-03-01T22:02:28.565856594Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/replicationcontroller","Output":"I0301 22:02:28.565782  123643 etcd.go:198] unexpected number of goroutines: before: 3 after 14109\n"}
{"Time":"2022-03-01T22:02:40.556993448Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/garbagecollector","Output":"I0301 22:02:40.556910  122677 etcd.go:198] unexpected number of goroutines: before: 3 after 3089\n"}
{"Time":"2022-03-01T22:02:41.715607952Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/replicaset","Output":"I0301 22:02:41.715575  123632 etcd.go:198] unexpected number of goroutines: before: 3 after 19133\n"}
{"Time":"2022-03-01T22:02:44.831681394Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/storageclasses","Output":"I0301 22:02:44.831625  126124 etcd.go:198] unexpected number of goroutines: before: 3 after 1007\n"}
{"Time":"2022-03-01T22:02:55.018113641Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/tls","Output":"I0301 22:02:55.018041  126293 etcd.go:198] unexpected number of goroutines: before: 3 after 124\n"}
{"Time":"2022-03-01T22:03:03.992507601Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/job","Output":"I0301 22:03:03.992386  123134 etcd.go:198] unexpected number of goroutines: before: 3 after 21023\n"}
{"Time":"2022-03-01T22:03:05.208920923Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/ttlcontroller","Output":"I0301 22:03:05.208626  126304 etcd.go:198] unexpected number of goroutines: before: 3 after 1007\n"}
{"Time":"2022-03-01T22:03:12.927648568Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/storageversion","Output":"I0301 22:03:12.927576  126146 etcd.go:198] unexpected number of goroutines: before: 3 after 729\n"}
{"Time":"2022-03-01T22:03:13.391092536Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/statefulset","Output":"I0301 22:03:13.391011  126114 etcd.go:198] unexpected number of goroutines: before: 3 after 4256\n"}
{"Time":"2022-03-01T22:03:23.847942205Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/serving","Output":"I0301 22:03:23.847813  126083 etcd.go:198] unexpected number of goroutines: before: 3 after 799\n"}
{"Time":"2022-03-01T22:05:17.670386047Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/volume","Output":"I0301 22:05:17.670298  126367 etcd.go:198] unexpected number of goroutines: before: 3 after 15019\n"}
{"Time":"2022-03-01T22:07:15.35259352Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/volumescheduling","Output":"I0301 22:07:15.352351  126410 etcd.go:198] unexpected number of goroutines: before: 3 after 14290\n"}
{"Time":"2022-03-01T22:10:04.216211003Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/scheduler","Output":"I0301 22:10:04.216110  123754 etcd.go:198] unexpected number of goroutines: before: 3 after 68597\n"}

after that no more tests are scheduled

tail -2 junit_20220301-214450.stdout                                                                                                                                                                                                                                                                                                         SIGINT 
{"Time":"2022-03-01T22:10:04.504166796Z","Action":"output","Package":"k8s.io/kubernetes/test/integration/scheduler","Output":"ok  \tk8s.io/kubernetes/test/integration/scheduler\t537.040s\n"}
{"Time":"2022-03-01T22:10:04.504214527Z","Action":"pass","Package":"k8s.io/kubernetes/test/integration/scheduler","Elapsed":537.108}

some tests timeout, but they run until the end

tail -25 build-log.txt 
W0301 21:58:37.319462  120640 reflector.go:442] k8s.io/client-go/informers/factory.go:134: watch of *v1.RuntimeClass ended with: very short watch: k8s.io/client-go/informers/factory.go:134: Unexpected watch close - watch lasted less than a second and no items received
W0301 21:58:37.319494  120640 reflector.go:442] k8s.io/client-go/informers/factory.go:134: watch of *v1.StorageClass ended with: very short watch: k8s.io/client-go/informers/factory.go:134: Unexpected watch close - watch lasted less than a second and no items received
W0301 21:58:37.320235  120640 cacher.go:150] Terminating all watchers from cacher *admissionregistration.MutatingWebhookConfiguration
W0301 21:58:37.320676  120640 reflector.go:442] k8s.io/client-go/informers/factory.go:134: watch of *v1.MutatingWebhookConfiguration ended with: very short watch: k8s.io/client-go/informers/factory.go:134: Unexpected watch close - watch lasted less than a second and no items received
    testserver.go:331: failed to launch server: failed to wait for default namespace to be created: timed out waiting for the condition
    --- FAIL: TestEmptySelector/v1_should_target_all_pods (423.64s)

=== FAIL: test/integration/disruption TestEmptySelector (431.70s)

DONE 3872 tests, 53 skipped, 11 failures in 83.247s
+++ [0301 22:11:29] Saved JUnit XML test report to /logs/artifacts/junit_20220301-214450.xml
make[1]: *** [Makefile:185: test] Error 1
!!! [0301 22:11:29] Call tree:
!!! [0301 22:11:29]  1: hack/make-rules/test-integration.sh:96 runTests(...)
+++ [0301 22:11:29] Cleaning up etcd
+++ [0301 22:11:29] Integration test cleanup complete
make: *** [Makefile:204: test-integration] Error 1
+ EXIT_VALUE=2
+ set +o xtrace
Cleaning up after docker in docker.
================================================================================
Cleaning up after docker
Stopping Docker: dockerProgram process in pidfile '/var/run/docker-ssd.pid', 1 process(es), refused to die.
================================================================================
Done cleaning up after docker in docker.

The job may not be able to parse the size output to reports the results

-rw-r--r--   1 aojea aojea 2,9G mar  2 16:59  junit_20220301-214450.stdout

All the test failures show problems related to grpc connecting to etcd 🤷 , but it is possible that a lot of those orphan goroutines are etcd connections lingering

@tkashem tkashem changed the title Revert "Merge pull request #107797 from tkashem/revert-107456" Revert the revert of #107456 - "apf: change controller to use SSA for patches" Mar 7, 2022
@tkashem
Copy link
Contributor Author

tkashem commented Mar 14, 2022

/assign @wojtek-t @MikeSpreitzer

Copy link
Member

@MikeSpreitzer MikeSpreitzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/LGTM

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 18, 2022
@wojtek-t
Copy link
Member

/lgtm
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MikeSpreitzer, tkashem, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 21, 2022
@k8s-triage-robot
Copy link

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

  • The PR does have any do-not-merge/* labels
  • The PR does not have the needs-ok-to-test label
  • The PR is mergeable (does not have a needs-rebase label)
  • The PR is approved (has cncf-cla: yes, lgtm, approved labels)
  • The PR is failing tests required for merge

You can:

/retest

@MikeSpreitzer
Copy link
Member

/test pull-kubernetes-dependencies

@MikeSpreitzer
Copy link
Member

/retest

@k8s-triage-robot
Copy link

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

  • The PR does have any do-not-merge/* labels
  • The PR does not have the needs-ok-to-test label
  • The PR is mergeable (does not have a needs-rebase label)
  • The PR is approved (has cncf-cla: yes, lgtm, approved labels)
  • The PR is failing tests required for merge

You can:

/retest

2 similar comments
@k8s-triage-robot
Copy link

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

  • The PR does have any do-not-merge/* labels
  • The PR does not have the needs-ok-to-test label
  • The PR is mergeable (does not have a needs-rebase label)
  • The PR is approved (has cncf-cla: yes, lgtm, approved labels)
  • The PR is failing tests required for merge

You can:

/retest

@k8s-triage-robot
Copy link

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

  • The PR does have any do-not-merge/* labels
  • The PR does not have the needs-ok-to-test label
  • The PR is mergeable (does not have a needs-rebase label)
  • The PR is approved (has cncf-cla: yes, lgtm, approved labels)
  • The PR is failing tests required for merge

You can:

/retest

@MadhavJivrajani
Copy link
Contributor

/retest

@k8s-ci-robot k8s-ci-robot merged commit 9a8defd into kubernetes:master Mar 22, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.24 milestone Mar 22, 2022
@liggitt
Copy link
Member

liggitt commented Mar 25, 2022

This reintroduced panics in integration tests:

#109026 (comment)

@liggitt
Copy link
Member

liggitt commented Mar 25, 2022

opened a revert. please don't reintroduce it until there's a clean integration test run without panics like "Observed a panic: FieldManager must be installed to run apply" in the log.

integration tests have been very flaky since this merged (and failed 3/18 runs on this PR = ~16% failure rate)... it looks like this is causing 40k+ panics per run, and a log size of 3gb

@tkashem
Copy link
Contributor Author

tkashem commented Mar 25, 2022

okay, reverting it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants