Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

apf: change controller to use SSA for patches #110173

Merged
merged 1 commit into from Oct 17, 2022

Conversation

tkashem
Copy link
Contributor

@tkashem tkashem commented May 23, 2022

What type of PR is this?

/kind bug

What this PR does / why we need it:

This brings back #107456 to see if it introduces flakes/timeouts in the integration tests

  • reproduce the integration test timeout that this PR introduces
  • check the very large log file of the integration test and determine the root cause

Which issue(s) this PR fixes:

Fixes #107727

Special notes for your reviewer:

Does this PR introduce a user-facing change?

-->

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels May 23, 2022
@k8s-ci-robot k8s-ci-robot added area/apiserver area/test sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 23, 2022
@tkashem
Copy link
Contributor Author

tkashem commented May 23, 2022

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 23, 2022
@tkashem
Copy link
Contributor Author

tkashem commented May 23, 2022

suggestions:

if a.group.OpenAPIModels != nil && utilfeature.DefaultFeatureGate.Enabled(features.ServerSideApply) {
reqScope.FieldManager, err = fieldmanager.NewDefaultFieldManager(
a.group.TypeConverter,
a.group.UnsafeConvertor,
a.group.Defaulter,
a.group.Creater,
fqKindToRegister,
reqScope.HubGroupVersion,
subresource,
resetFields,
)
if err != nil {
return nil, nil, fmt.Errorf("failed to create field manager: %v", err)
}

we probably have to remove utilfeature.DefaultFeatureGate.Enabled(features.ServerSideApply from the conditional

OpenAPIV3: {Default: true, PreRelease: featuregate.Beta},

APIPriorityAndFairness: {Default: true, PreRelease: featuregate.Beta},

probably set LockToDefault:true

we will need to reproduce the issue first, apply these changes and show that it fixes the issue.

@tkashem
Copy link
Contributor Author

tkashem commented May 23, 2022

cc @aojea @MikeSpreitzer @wojtek-t

@tkashem
Copy link
Contributor Author

tkashem commented May 24, 2022

/test kubernetes-integration

@k8s-ci-robot
Copy link
Contributor

@tkashem: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test pull-kubernetes-conformance-kind-ga-only-parallel
  • /test pull-kubernetes-coverage-unit
  • /test pull-kubernetes-dependencies
  • /test pull-kubernetes-dependencies-go-canary
  • /test pull-kubernetes-e2e-gce
  • /test pull-kubernetes-e2e-gce-100-performance
  • /test pull-kubernetes-e2e-gce-big-performance
  • /test pull-kubernetes-e2e-gce-canary
  • /test pull-kubernetes-e2e-gce-network-proxy-http-connect
  • /test pull-kubernetes-e2e-gce-no-stage
  • /test pull-kubernetes-e2e-gce-scale-performance-manual
  • /test pull-kubernetes-e2e-gce-ubuntu-containerd
  • /test pull-kubernetes-e2e-gce-ubuntu-containerd-canary
  • /test pull-kubernetes-e2e-kind
  • /test pull-kubernetes-e2e-kind-ipv6
  • /test pull-kubernetes-files-remake
  • /test pull-kubernetes-integration
  • /test pull-kubernetes-integration-go-canary
  • /test pull-kubernetes-kubemark-e2e-gce-scale
  • /test pull-kubernetes-node-e2e-containerd
  • /test pull-kubernetes-typecheck
  • /test pull-kubernetes-unit
  • /test pull-kubernetes-unit-go-canary
  • /test pull-kubernetes-update
  • /test pull-kubernetes-verify
  • /test pull-kubernetes-verify-go-canary
  • /test pull-kubernetes-verify-govet-levee

The following commands are available to trigger optional jobs:

  • /test check-dependency-stats
  • /test pull-kubernetes-conformance-image-test
  • /test pull-kubernetes-conformance-kind-ga-only
  • /test pull-kubernetes-conformance-kind-ipv6-parallel
  • /test pull-kubernetes-cross
  • /test pull-kubernetes-e2e-capz-azure-disk
  • /test pull-kubernetes-e2e-capz-azure-disk-vmss
  • /test pull-kubernetes-e2e-capz-azure-file
  • /test pull-kubernetes-e2e-capz-azure-file-vmss
  • /test pull-kubernetes-e2e-capz-conformance
  • /test pull-kubernetes-e2e-capz-ha-control-plane
  • /test pull-kubernetes-e2e-capz-windows-containerd
  • /test pull-kubernetes-e2e-containerd-gce
  • /test pull-kubernetes-e2e-gce-alpha-features
  • /test pull-kubernetes-e2e-gce-correctness
  • /test pull-kubernetes-e2e-gce-csi-serial
  • /test pull-kubernetes-e2e-gce-device-plugin-gpu
  • /test pull-kubernetes-e2e-gce-kubetest2
  • /test pull-kubernetes-e2e-gce-network-proxy-grpc
  • /test pull-kubernetes-e2e-gce-storage-disruptive
  • /test pull-kubernetes-e2e-gce-storage-slow
  • /test pull-kubernetes-e2e-gce-storage-snapshot
  • /test pull-kubernetes-e2e-gci-gce-autoscaling
  • /test pull-kubernetes-e2e-gci-gce-ingress
  • /test pull-kubernetes-e2e-gci-gce-ipvs
  • /test pull-kubernetes-e2e-kind-canary
  • /test pull-kubernetes-e2e-kind-dual-canary
  • /test pull-kubernetes-e2e-kind-ipv6-canary
  • /test pull-kubernetes-e2e-kind-ipvs-dual-canary
  • /test pull-kubernetes-e2e-kind-multizone
  • /test pull-kubernetes-e2e-kops-aws
  • /test pull-kubernetes-e2e-ubuntu-gce-network-policies
  • /test pull-kubernetes-kubemark-e2e-gce-big
  • /test pull-kubernetes-local-e2e
  • /test pull-kubernetes-node-crio-cgrpv2-e2e
  • /test pull-kubernetes-node-crio-cgrpv2-e2e-kubetest2
  • /test pull-kubernetes-node-crio-e2e
  • /test pull-kubernetes-node-crio-e2e-kubetest2
  • /test pull-kubernetes-node-e2e-containerd-alpha-features
  • /test pull-kubernetes-node-e2e-containerd-features
  • /test pull-kubernetes-node-e2e-containerd-features-kubetest2
  • /test pull-kubernetes-node-e2e-containerd-kubetest2
  • /test pull-kubernetes-node-kubelet-credential-provider
  • /test pull-kubernetes-node-kubelet-serial-containerd
  • /test pull-kubernetes-node-kubelet-serial-containerd-kubetest2
  • /test pull-kubernetes-node-kubelet-serial-cpu-manager
  • /test pull-kubernetes-node-kubelet-serial-cpu-manager-kubetest2
  • /test pull-kubernetes-node-kubelet-serial-crio-cgroupv1
  • /test pull-kubernetes-node-kubelet-serial-crio-cgroupv2
  • /test pull-kubernetes-node-kubelet-serial-hugepages
  • /test pull-kubernetes-node-kubelet-serial-memory-manager
  • /test pull-kubernetes-node-kubelet-serial-topology-manager
  • /test pull-kubernetes-node-kubelet-serial-topology-manager-kubetest2
  • /test pull-kubernetes-node-memoryqos-cgrpv2
  • /test pull-kubernetes-node-swap-fedora
  • /test pull-kubernetes-node-swap-fedora-serial
  • /test pull-kubernetes-node-swap-ubuntu-serial
  • /test pull-kubernetes-unit-experimental
  • /test pull-publishing-bot-validate

Use /test all to run the following jobs that were automatically triggered:

  • pull-kubernetes-conformance-kind-ga-only-parallel
  • pull-kubernetes-conformance-kind-ipv6-parallel
  • pull-kubernetes-dependencies
  • pull-kubernetes-e2e-gce-100-performance
  • pull-kubernetes-e2e-gce-ubuntu-containerd
  • pull-kubernetes-e2e-kind
  • pull-kubernetes-e2e-kind-ipv6
  • pull-kubernetes-integration
  • pull-kubernetes-node-e2e-containerd
  • pull-kubernetes-typecheck
  • pull-kubernetes-unit
  • pull-kubernetes-verify
  • pull-kubernetes-verify-govet-levee

In response to this:

/test kubernetes-integration

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tkashem
Copy link
Contributor Author

tkashem commented May 24, 2022

/test pull-kubernetes-integration

1 similar comment
@tkashem
Copy link
Contributor Author

tkashem commented May 24, 2022

/test pull-kubernetes-integration

@wojtek-t
Copy link
Member

@tkashem - I don't think we should be trying to reproduce by running integration tests at this point.

IIRC, this PR was causing panics of kube-apiserver - can you please start with checking the logs that those are not happening (or maybe rather first confirm it was the case previously)?

@tkashem
Copy link
Contributor Author

tkashem commented May 24, 2022

IIRC, this PR was causing panics of kube-apiserver - can you please start with checking the logs that those are not happening (or maybe rather first confirm it was the case previously)

Yes, I already saw the panic yesterday from the first passing integration job. I am re-running the integration test to see if it has other effects.

@leilajal
Copy link
Contributor

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 24, 2022
@tkashem
Copy link
Contributor Author

tkashem commented May 29, 2022

/test pull-kubernetes-integration

1 similar comment
@tkashem
Copy link
Contributor Author

tkashem commented Jun 2, 2022

/test pull-kubernetes-integration

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 14, 2022
@tkashem
Copy link
Contributor Author

tkashem commented Sep 14, 2022

/assign @MikeSpreitzer

@@ -462,6 +458,37 @@ func (cfgCtlr *configController) digestConfigObjects(newPLs []*flowcontrol.Prior
return suggestedDelay, utilerrors.NewAggregate(errs)
}

func updateOrApply(client flowcontrolclient.FlowSchemaInterface, fsu fsStatusUpdate, asFieldManager string) error {
if utilfeature.DefaultFeatureGate.Enabled(features.ServerSideApply) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liggitt - we're generally locking GA features to true and SSA is already GA, but not locked:
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/features/kube_features.go#L234

What was the reason for it?

[I'm asking because if it was locked, then we wouldn't need this conditional logic here.]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @apelisse - I'd be in favor of locking/dropping the conditional apply gate logic

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #112748

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tkashem - this is now merged
Once the dependent PR are merged you can update this PR and get rid of this condition - this if will now always be true.

@wojtek-t wojtek-t self-assigned this Sep 14, 2022
@liggitt
Copy link
Member

liggitt commented Sep 15, 2022

I just noticed this PR referenced in #112306 (comment)

can we resolve the test flake in reset_fields_test.go and drop flowschemas from resetFieldsSkippedResources before re-enabling SSA in the APF controller?

@tkashem
Copy link
Contributor Author

tkashem commented Sep 16, 2022

I just noticed this PR referenced in #112306 (comment)

can we resolve the test flake in reset_fields_test.go and drop flowschemas from resetFieldsSkippedResources before re-enabling SSA in the APF controller?

@liggitt are you referring to adding the patchStrategy annotation in Condition slices in FlowSchemaStatus and PriorityLevelConfigurationStatus of v1beta3 so strategic merge patch does not replace the entire conditions slice? #107574

Please see - #104842 (comment)

@liggitt
Copy link
Member

liggitt commented Sep 16, 2022

@liggitt are you referring to adding the patchStrategy annotation in Condition slices in FlowSchemaStatus and PriorityLevelConfigurationStatus of v1beta3 so strategic merge patch does not replace the entire conditions slice? #107574

If that was the cause of and resolution to the patch test conflict/issue, then yes

would this order of operations make sense?

  1. v1beta3 type with strategic merge patch annotations and switch the controller to v1beta3 (add v1beta3 for Priority And Fairness #112306)
  2. remove flowschemas from resetFieldsSkippedResources (another PR)
  3. switch the controller to SSA (this PR)

@tkashem
Copy link
Contributor Author

tkashem commented Sep 19, 2022

@MikeSpreitzer
Copy link
Member

@tkashem : the reasons for the hold on this are gone, right?

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 4, 2022
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Oct 8, 2022
@tkashem
Copy link
Contributor Author

tkashem commented Oct 8, 2022

the reasons for the hold on this are gone, right?

@MikeSpreitzer I wanted to make sure that #112575 resolves #104842

It's been a week, and https://storage.googleapis.com/k8s-triage/index.html?date=2022-10-01&job=unit&test=TestApplyResetFields yields zero result, so I am confident the flake has been resolved

Also with #112748 merged, SSA is always enabled, so I removed the conditiona updateOrApply method.

Copy link
Member

@MikeSpreitzer MikeSpreitzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/LGTM
Thanks!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 10, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MikeSpreitzer, tkashem

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 10, 2022
@tkashem
Copy link
Contributor Author

tkashem commented Oct 17, 2022

/hold cancel

all dependencies have merged, and search in ci yields yields zero fake

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 17, 2022
@k8s-ci-robot k8s-ci-robot merged commit b87802b into kubernetes:master Oct 17, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.26 milestone Oct 17, 2022
@wojtek-t
Copy link
Member

wojtek-t commented Nov 2, 2022

for posterity - this LGTM.

@MikeSpreitzer
Copy link
Member

And thanks all for persistence in getting this done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
6 participants