
Update to Kubernetes v1.28.3 #1776

Merged
merged 62 commits into openshift:master on Oct 26, 2023

Conversation

bertinatto
Member

pohly and others added 30 commits September 8, 2023 21:25
When some plugin was registered as "unschedulable" for a pod in a previous scheduling
attempt, it kept that attribute forever. When that plugin then later failed with an
error that requires backoff, the pod was incorrectly moved to the "unschedulable"
queue, where it got stuck until the periodic flushing because there was no event
that the plugin was waiting for.

Here's an example where that happened:

     framework.go:1280: E0831 20:03:47.184243] Reserve/DynamicResources: Plugin failed err="Operation cannot be fulfilled on podschedulingcontexts.resource.k8s.io \"test-dragxd5c\": the object has been modified; please apply your changes to the latest version and try again" node="scheduler-perf-dra-7l2v2" plugin="DynamicResources" pod="test/test-dragxd5c"
    schedule_one.go:1001: E0831 20:03:47.184345] Error scheduling pod; retrying err="running Reserve plugin \"DynamicResources\": Operation cannot be fulfilled on podschedulingcontexts.resource.k8s.io \"test-dragxd5c\": the object has been modified; please apply your changes to the latest version and try again" pod="test/test-dragxd5c"
    ...
    scheduling_queue.go:745: I0831 20:03:47.198968] Pod moved to an internal scheduling queue pod="test/test-dragxd5c" event="ScheduleAttemptFailure" queue="Unschedulable" schedulingCycle=9576 hint="QueueSkip"

Pop still needs the information about unschedulable plugins to update the
UnschedulableReason metric. It can reset that information before returning the
PodInfo for the next scheduling attempt.
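
As a rough illustration of that ordering (a standalone sketch with simplified, hypothetical types, not the actual scheduler code): use the old per-pod state when the pod is popped, e.g. for the metric, then clear it so the next attempt starts clean.

    package main

    import "fmt"

    // queuedPodInfo is a simplified stand-in for the scheduler's QueuedPodInfo.
    type queuedPodInfo struct {
    	podName              string
    	attempts             int
    	unschedulablePlugins map[string]bool // plugins that reported Unschedulable in earlier attempts
    }

    // pop hands the next pod to a new scheduling attempt: it first uses the old
    // state (e.g. for an unschedulable-reason metric), then resets it so a later
    // error is not misattributed to a plugin from a previous attempt.
    func pop(queue []*queuedPodInfo) (*queuedPodInfo, []*queuedPodInfo) {
    	if len(queue) == 0 {
    		return nil, queue
    	}
    	pInfo := queue[0]
    	fmt.Printf("pod %s was unschedulable because of %v\n", pInfo.podName, pInfo.unschedulablePlugins)
    	pInfo.unschedulablePlugins = map[string]bool{} // clean slate for the new attempt
    	pInfo.attempts++
    	return pInfo, queue[1:]
    }

    func main() {
    	q := []*queuedPodInfo{{podName: "test/test-dragxd5c", unschedulablePlugins: map[string]bool{"DynamicResources": true}}}
    	p, _ := pop(q)
    	fmt.Printf("attempt %d starts with unschedulablePlugins=%v\n", p.attempts, p.unschedulablePlugins)
    }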
Signed-off-by: Rita Zhang <rita.z.zhang@gmail.com>
The status error was embedded inside the new error constructed by
WaitForPodsResponding's get function, but not wrapped. Therefore
`apierrors.IsServiceUnavailable(err)` didn't find it and returned false, so
there were no retries.

Wrapping fixes this, and the Gomega formatting of the error remains useful:

	err := &errors.StatusError{}
	err.ErrStatus.Code = 503
	err.ErrStatus.Message = "temporary failure"

	err2 := fmt.Errorf("Controller %s: failed to Get from replica pod %s:\n%w\nPod status:\n%s",
		"foo", "bar",
		err, "some status")
	fmt.Println(format.Object(err2, 1))
	fmt.Println(errors.IsServiceUnavailable(err2))

=>

    <*fmt.wrapError | 0xc000139340>:
    Controller foo: failed to Get from replica pod bar:
    temporary failure
    Pod status:
    some status
    {
        msg: "Controller foo: failed to Get from replica pod bar:\ntemporary failure\nPod status:\nsome status",
        err: <*errors.StatusError | 0xc0001a01e0>{
            ErrStatus: {
                TypeMeta: {Kind: "", APIVersion: ""},
                ListMeta: {
                    SelfLink: "",
                    ResourceVersion: "",
                    Continue: "",
                    RemainingItemCount: nil,
                },
                Status: "",
                Message: "temporary failure",
                Reason: "",
                Details: nil,
                Code: 503,
            },
        },
    }

    true
Change-Id: Id29b5b377989dcb5377316cfcdea367071a47365
Co-authored-by: Dave Chen <dave.chen@arm.com>
The Service API REST implementation is complex and has to use different
hooks on the REST storage. The status store was created as a shallow copy of
the storage before the hooks were added, so it did not inherit them.

The status store must have the same hooks as the REST store to correctly
handle the allocation and deallocation of ClusterIPs and nodePorts.

Change-Id: I44be21468d36017f0ec41a8f912b8490f8f13f55
Signed-off-by: Antonio Ojea <aojea@google.com>
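
A minimal sketch of the underlying Go pitfall (simplified types, not the actual Service storage code): a value copy taken before the hooks are attached never sees them, so the copy has to be made after the hooks exist, or the hooks must be shared explicitly.

    package main

    import "fmt"

    // store is a simplified stand-in for the Service REST storage.
    type store struct {
    	name  string
    	hooks []string // stands in for the ClusterIP/NodePort allocation hooks
    }

    func main() {
    	rest := &store{name: "services"}

    	statusTooEarly := *rest // shallow copy made before the hooks exist
    	rest.hooks = append(rest.hooks, "allocateClusterIPs", "allocateNodePorts")
    	statusInSync := *rest // copy made after the hooks are attached

    	fmt.Println("copy before hooks:", statusTooEarly.hooks) // []
    	fmt.Println("copy after hooks: ", statusInSync.hooks)   // [allocateClusterIPs allocateNodePorts]
    }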
…-of-#120559-origin-release-1.28

Automated cherry pick of kubernetes#120559: e2e pods: fix WaitForPodsResponding retry
…k-of-#119824-upstream-release-1.28

Automated cherry pick of kubernetes#119824: fix race on etcd client constructor for healthchecks
…ick-of-#120561-upstream-release-1.28

Automated cherry pick of kubernetes#120561: kubeadm: remove reference of
Change-Id: I7ed4b006faecf0a7e6e583c42b4d6bc4b786a164
Bumping govmomi to include an error check fix needed to work with go1.20. We
made this fix in the CI, but were reliant on text matching of error strings,
which is why it didn't catch the actual issue.

Fix in vmware/govmomi@b4eac19
PR to bump govmomi in cloud-provider-vsphere: kubernetes/cloud-provider-vsphere#738

Signed-off-by: Madhav Jivrajani <madhav.jiv@gmail.com>
Signed-off-by: Madhav Jivrajani <madhav.jiv@gmail.com>
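
The general Go point behind this (a hedged illustration, not the govmomi code): matching on error text breaks whenever wrapping or messages change, whereas errors.Is keeps working across versions.

    package main

    import (
    	"errors"
    	"fmt"
    )

    // errNotFound stands in for a sentinel error whose wrapping text may change
    // between library versions.
    var errNotFound = errors.New("object not found")

    func lookup() error {
    	return fmt.Errorf("vm %q: %w", "dc0_h0_vm0", errNotFound)
    }

    func main() {
    	err := lookup()
    	fmt.Println(err.Error() == "object not found") // brittle text match: false
    	fmt.Println(errors.Is(err, errNotFound))       // unwraps the chain: true
    }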
…p-1.28

[1.28][go1.20] .: bump govmomi to v0.30.6
- this function is used by other packages and was mistakenly removed in 397cc73
- let the resource quota controller use this constructor instead of an object
  instantiation (see the sketch below)
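
A small sketch of why the constructor matters (hypothetical names, not the actual resource quota code): the constructor initializes internal state that a bare struct literal leaves nil.

    package main

    import "fmt"

    // monitor is a simplified stand-in for the resource quota monitor.
    type monitor struct {
    	resources map[string]bool
    }

    // newMonitor plays the role of the exported constructor (NewMonitor upstream):
    // callers get a fully initialized value and don't break when fields change.
    func newMonitor() *monitor {
    	return &monitor{resources: map[string]bool{}}
    }

    func main() {
    	m := newMonitor()
    	m.resources["pods"] = true // safe: the map exists

    	// A direct instantiation would leave the map nil and panic on write:
    	// bad := &monitor{}
    	// bad.resources["pods"] = true // panic: assignment to entry in nil map
    	fmt.Println(m.resources)
    }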
…-of-#120334-origin-release-1.28

Automated cherry pick of kubernetes#120334: scheduler: start scheduling attempt with clean
…-of-#120623-upstream-release-1.28

Automated cherry pick of kubernetes#120623: sync Service API status rest storage
…k-of-#120577-upstream-release-1.28

Automated cherry pick of kubernetes#120577: Increase range of job_sync_duration_seconds
…pick-of-#120777-upstream-release-1.28

Automated cherry pick of kubernetes#120777: reintroduce resourcequota.NewMonitor
…list of cronjobs

Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
…ry-pick-of-#119317-upstream-release-1.28

Automated cherry pick of kubernetes#119317: change rolling update logic to exclude sunsetting nodes
…port

kmsv2: reload metrics bug fix backport
…y-pick-of-#120649-origin-release-1.28

Automated cherry pick of kubernetes#120649: cronjob controller: ensure already existing jobs are added to
This change switches to using isCgroup2UnifiedMode locally to ensure
that any mocked function is also used when checking the swap controller
availability.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
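
Roughly what "using isCgroup2UnifiedMode locally" means in Go terms (a simplified sketch, not the kubelet code): call a package-level variable rather than the imported function directly, so a mock installed by tests is actually exercised.

    package main

    import "fmt"

    // realCheck stands in for libcontainercgroups.IsCgroup2UnifiedMode.
    func realCheck() bool { return true }

    // isCgroup2UnifiedMode is the local indirection: production keeps the real
    // implementation, tests can swap in a fake.
    var isCgroup2UnifiedMode = realCheck

    // swapControllerAvailable must call the variable, not realCheck directly,
    // otherwise a mocked function is silently ignored.
    func swapControllerAvailable() bool {
    	return isCgroup2UnifiedMode()
    }

    func main() {
    	fmt.Println(swapControllerAvailable()) // true with the real check

    	isCgroup2UnifiedMode = func() bool { return false } // what a test might do
    	fmt.Println(swapControllerAvailable())              // false with the mock
    }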
This change bypasses all logic to set swap in the Linux container
resources if a swap controller is not available on the node. Failing
to do so may cause errors in runc when starting a container with
a swap configuration -- even if this is set to 0.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
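
A hedged sketch of that bypass (simplified types, not the actual kubelet code): leave the swap fields untouched when no swap controller is available, instead of writing any value, even 0.

    package main

    import "fmt"

    // linuxResources is a simplified stand-in for the container resources passed
    // to the runtime.
    type linuxResources struct {
    	memorySwapLimitInBytes *int64
    }

    // configureSwap skips swap configuration entirely when the node has no swap
    // controller; writing any value, including 0, can make runc fail.
    func configureSwap(res *linuxResources, swapControllerAvailable bool, limit int64) {
    	if !swapControllerAvailable {
    		return
    	}
    	res.memorySwapLimitInBytes = &limit
    }

    func main() {
    	res := &linuxResources{}
    	configureSwap(res, false, 0)
    	fmt.Println("swap configured:", res.memorySwapLimitInBytes != nil) // false
    }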
…ck-of-#120784-upstream-release-1.28

Automated cherry pick of kubernetes#120784: Use local isCgroup2UnifiedMode consistently
Signed-off-by: cpanato <ctadeu@gmail.com>
k8s-ci-robot and others added 4 commits October 16, 2023 16:07
…-pick-of-#119732-upstream-release-1.28

Automated cherry pick of kubernetes#119732: Fix to honor PDB with an empty selector `{}`
…ated-cherry-pick-of-#121142-upstream-release-1.28

Automated cherry pick of kubernetes#121142: Modify test PVC to detect concurrent map write bug
Kubernetes official release v1.28.3
@openshift-ci-robot openshift-ci-robot added the backports/unvalidated-commits Indicates that not all commits come to merged upstream PRs. label Oct 19, 2023
@openshift-ci-robot

@bertinatto: the contents of this pull request could not be automatically validated.

The following commits could not be validated and must be approved by a top-level approver:

Comment /validate-backports to re-evaluate validity of the upstream PRs, for example when they are merged upstream.

@openshift-ci openshift-ci bot added the vendor-update Touching vendor dir or related files label Oct 19, 2023
UnauthenticatedHTTP2DOSMitigation: {Default: true, PreRelease: featuregate.Beta},

Member Author


In upstream, this was disabled by kubernetes@0f33a62.

Also, switching the gate here wouldn't be enough. If this gate is to be enabled, it also needs to be switched in /pkg/features/kube_features.go

Update REBASE.openshift.md file with new RHEL 9 images.
This is a follow-up commit to enable the UnauthenticatedHTTP2DOSMitigation
feature gate in pkg/features/kube_features.go, which hadn't been done in
the previous commit because the gate didn't exist in that file yet.
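
For context, a minimal sketch of how such a gate is registered and queried with k8s.io/component-base/featuregate (a self-contained example, not the actual contents of kube_features.go, which adds the entry to its default gate map instead):

    package main

    import (
    	"fmt"

    	"k8s.io/component-base/featuregate"
    )

    const UnauthenticatedHTTP2DOSMitigation featuregate.Feature = "UnauthenticatedHTTP2DOSMitigation"

    func main() {
    	gate := featuregate.NewFeatureGate()
    	if err := gate.Add(map[featuregate.Feature]featuregate.FeatureSpec{
    		UnauthenticatedHTTP2DOSMitigation: {Default: true, PreRelease: featuregate.Beta},
    	}); err != nil {
    		panic(err)
    	}
    	fmt.Println(gate.Enabled(UnauthenticatedHTTP2DOSMitigation)) // true
    }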
@openshift-ci-robot

@bertinatto: the contents of this pull request could not be automatically validated.

The following commits are valid:

The following commits could not be validated and must be approved by a top-level approver:

Comment /validate-backports to re-evaluate validity of the upstream PRs, for example when they are merged upstream.

@bertinatto
Member Author

/test unit
/test e2e-aws-csi

@bertinatto
Member Author

/test unit

Member

@soltysh soltysh left a comment


/lgtm
/approve

@soltysh
Member

soltysh commented Oct 26, 2023

/override ci/prow/verify-commits
this never passes on k8s bump

@soltysh
Member

soltysh commented Oct 26, 2023

/remove-label backports/unvalidated-commits
/label backports/validated-commits

@openshift-ci openshift-ci bot added backports/validated-commits Indicates that all commits come to merged upstream PRs. and removed backports/unvalidated-commits Indicates that not all commits come to merged upstream PRs. labels Oct 26, 2023
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 26, 2023
@openshift-ci

openshift-ci bot commented Oct 26, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bertinatto, soltysh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 26, 2023
@openshift-ci

openshift-ci bot commented Oct 26, 2023

@soltysh: Overrode contexts on behalf of soltysh: ci/prow/verify-commits

In response to this:

/override ci/prow/verify-commits
this never passes on k8s bump

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci

openshift-ci bot commented Oct 26, 2023

@bertinatto: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-ci openshift-ci bot merged commit fa9f909 into openshift:master Oct 26, 2023
23 checks passed
Labels
approved: Indicates a PR has been approved by an approver from all required OWNERS files.
backports/validated-commits: Indicates that all commits come to merged upstream PRs.
lgtm: Indicates that a PR is ready to be merged.
vendor-update: Touching vendor dir or related files