Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep PLEG interruptions in a separate interface #113825

Merged
merged 1 commit into from Oct 17, 2023

Conversation

harche
Copy link
Contributor

@harche harche commented Nov 10, 2022

Signed-off-by: Harshal Patil harpatil@redhat.com

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

Addresses some of the minor comments from the Evented PLEG PR, #111384, especially #111384 (comment)

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Mainly trying to address this comment, #111384 (comment)

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Nov 10, 2022
@harche
Copy link
Contributor Author

harche commented Nov 10, 2022

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. area/kubelet and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 10, 2022
@bart0sh
Copy link
Contributor

bart0sh commented Dec 14, 2022

/priority important-soon
/triage accepted

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 14, 2022
@bart0sh bart0sh moved this from Triage to Needs Reviewer in SIG Node PR Triage Dec 14, 2022
Watch() chan *PodLifecycleEvent
Healthy() (bool, error)
PodLifecycleEventGeneratorHandler
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@harche Thank you for your PR!

As far as I understood @derekwaynecarr's comment PodLifecycleEventGenerator should not include PodLifecycleEventGeneratorHandler methods as this is all the rest of the kubelet needs to know about. I guess you should do it other way around - PodLifecycleEventGeneratorHandler should embed PodLifecycleEventGenerator.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@harche please, address this comment to proceed further with the PR, thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will think through this and reply here. Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GenericPLEG had been an implementation of the PodLifecycleEventGenerator interface and so is recently added EventedPLEG.. I am trying to accomodate @derekwaynecarr's request to seperate the newly added methods from existing methods without bringing in any major changes to how is it designed (especially GenericPLEG) as it stands today.

What do you think of following approach where we reduce the scope of podLifecycleEventGeneratorHandler by making it private?

type PodLifecycleEventGenerator interface {
	Start()
	Watch() chan *PodLifecycleEvent
	Healthy() (bool, error)
	podLifecycleEventGeneratorHandler
}

type podLifecycleEventGeneratorHandler interface {
	Stop()
	Update(relistDuration *RelistDuration)
	Relist()
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although the moment you put podLifecycleEventGeneratorHandler inside PodLifecycleEventGenerator it is no longer private.

Copy link
Contributor Author

@harche harche Apr 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bart0sh Just occurred to me that I could have used type assertion to achieve what we want to do instead of interface composition. I just pushed the changes, please let me know WDYT.

Copy link
Contributor Author

@harche harche Apr 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, it's not right. The latest changes don't implement the interface correctly. Let me see if we can fix it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well the interdepencies between those 6 methods is such that, if you try to truly separate them in two different interfaces it starts to mess around with the simple design of the existing Generic PLEG. I am not sure if this refactor is worth that.

But again, may be I am overlooking a simpler approach, so I am very much open to discussing the alternative approach.

Copy link
Contributor

@bart0sh bart0sh Apr 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@harche FWIW, here is what I came up with. Not sure if this would work, but it compiles and passes all Kubelet unit tests. WDYT?

diff --git a/pkg/kubelet/pleg/evented.go b/pkg/kubelet/pleg/evented.go
index 1dd176489a5..704e9e5390a 100644
--- a/pkg/kubelet/pleg/evented.go
+++ b/pkg/kubelet/pleg/evented.go
@@ -71,7 +71,7 @@ type EventedPLEG struct {
        // For testability.
        clock clock.Clock
        // GenericPLEG is used to force relist when required.
-       genericPleg PodLifecycleEventGenerator
+       genericPleg podLifecycleEventGeneratorHandler
        // The maximum number of retries when getting container events from the runtime.
        eventedPlegMaxStreamRetries int
        // Indicates relisting related parameters
@@ -93,7 +93,7 @@ func NewEventedPLEG(runtime kubecontainer.Runtime, runtimeService internalapi.Ru
                runtimeService:              runtimeService,
                eventChannel:                eventChannel,
                cache:                       cache,
-               genericPleg:                 genericPleg,
+               genericPleg:                 genericPleg.(podLifecycleEventGeneratorHandler),
                eventedPlegMaxStreamRetries: eventedPlegMaxStreamRetries,
                relistDuration:              relistDuration,
                clock:                       clock,
diff --git a/pkg/kubelet/pleg/pleg.go b/pkg/kubelet/pleg/pleg.go
index 2654f32d6fc..f972cc10f86 100644
--- a/pkg/kubelet/pleg/pleg.go
+++ b/pkg/kubelet/pleg/pleg.go
@@ -64,10 +64,14 @@ type PodLifecycleEvent struct {
 // PodLifecycleEventGenerator contains functions for generating pod life cycle events.
 type PodLifecycleEventGenerator interface {
        Start()
-       Stop()
-       Update(relistDuration *RelistDuration)
        Watch() chan *PodLifecycleEvent
        Healthy() (bool, error)
-       Relist()
        UpdateCache(*kubecontainer.Pod, types.UID) (error, bool)
 }
+
+type podLifecycleEventGeneratorHandler interface {
+       PodLifecycleEventGenerator
+       Stop()
+       Update(relistDuration *RelistDuration)
+       Relist()
+}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bart0sh Ah ha! I was trying to type cast while invoking the methods, e.genericPleg.(*GenericPLEG).Update(relistDuration) and got stuck in that. But your approach of casting Generic PLEG to podLifecycleEventGeneratorHandler works out better. Thanks for that pointer.

I verified that this change does not have any adverse effect on Evented or Generic PLEG and it does not export methods Stop(),Update(relistDuration *RelistDuration) and Relist() outside the pleg package.

PS: Considering type casting Generic PLEG to podLifecycleEventGeneratorHandler unblocked this PR, I feel it's a significant contribution and I have added you as a co-author in the commit. I hope that's alright by you.

eventedPlegRelistThreshold = time.Minute * 10
eventedPlegMaxStreamRetries = 5
genericPlegRelistPeriodWhenEventedPlegInUse = time.Second * 300
genericPlegRelistThresholdWhenEventedPlegInUse = time.Minute * 10
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you point out where this renaming was requested?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bart0sh
Copy link
Contributor

bart0sh commented Jan 14, 2023

/assign

@bart0sh bart0sh moved this from Needs Reviewer to Waiting on Author in SIG Node PR Triage Jan 17, 2023
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 20, 2023
@bart0sh
Copy link
Contributor

bart0sh commented Apr 23, 2023

/assign @SergeyKanzhelev @derekwaynecarr

@harche harche force-pushed the ep_comments branch 4 times, most recently from a6b98fc to b3e1693 Compare April 26, 2023 14:52
@@ -93,7 +93,7 @@ func NewEventedPLEG(runtime kubecontainer.Runtime, runtimeService internalapi.Ru
runtimeService: runtimeService,
eventChannel: eventChannel,
cache: cache,
genericPleg: genericPleg,
genericPleg: genericPleg.(podLifecycleEventGeneratorHandler),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for the sake of safety I'd suggest to add a casting check, e.g:

--- a/pkg/kubelet/pleg/evented.go
+++ b/pkg/kubelet/pleg/evented.go
@@ -87,17 +87,21 @@ type EventedPLEG struct {
 // NewEventedPLEG instantiates a new EventedPLEG object and return it.
 func NewEventedPLEG(runtime kubecontainer.Runtime, runtimeService internalapi.RuntimeService, eventChannel chan *PodLifecycleEvent,
        cache kubecontainer.Cache, genericPleg PodLifecycleEventGenerator, eventedPlegMaxStreamRetries int,
-       relistDuration *RelistDuration, clock clock.Clock) PodLifecycleEventGenerator {
+       relistDuration *RelistDuration, clock clock.Clock) (PodLifecycleEventGenerator, error) {
+       handler, ok := genericPleg.(podLifecycleEventGeneratorHandler)
+       if !ok {
+               return nil, fmt.Errorf("%v doesn't implement podLifecycleEventGeneratorHandler interface", genericPleg)
+       }
        return &EventedPLEG{
                runtime:                     runtime,
                runtimeService:              runtimeService,
                eventChannel:                eventChannel,
                cache:                       cache,
-               genericPleg:                 genericPleg,
+               genericPleg:                 handler,
                eventedPlegMaxStreamRetries: eventedPlegMaxStreamRetries,
                relistDuration:              relistDuration,
                clock:                       clock,
-       }
+       }, nil
 }

--- a/pkg/kubelet/kubelet.go
+++ b/pkg/kubelet/kubelet.go
@@ -734,8 +734,11 @@ func NewMainKubelet(kubeCfg *kubeletconfiginternal.KubeletConfiguration,
                        RelistPeriod:    genericPlegRelistPeriod,
                        RelistThreshold: genericPlegRelistThreshold,
                }
-               klet.eventedPleg = pleg.NewEventedPLEG(klet.containerRuntime, klet.runtimeService, eventChannel,
+               klet.eventedPleg, err = pleg.NewEventedPLEG(klet.containerRuntime, klet.runtimeService, eventChannel,
                        klet.podCache, klet.pleg, eventedPlegMaxStreamRetries, eventedRelistDuration, clock.RealClock{})
+               if err != nil {
+                       return nil, err
+               }
        } else {
                genericRelistDuration := &pleg.RelistDuration{
                        RelistPeriod:    genericPlegRelistPeriod,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bart0sh updated. Thanks.

@harche
Copy link
Contributor Author

harche commented Apr 26, 2023

/test

@k8s-ci-robot
Copy link
Contributor

@harche: The /test command needs one or more targets.
The following commands are available to trigger required jobs:

  • /test pull-cadvisor-e2e-kubernetes
  • /test pull-kubernetes-conformance-kind-ga-only-parallel
  • /test pull-kubernetes-coverage-unit
  • /test pull-kubernetes-dependencies
  • /test pull-kubernetes-dependencies-go-canary
  • /test pull-kubernetes-e2e-gce
  • /test pull-kubernetes-e2e-gce-100-performance
  • /test pull-kubernetes-e2e-gce-big-performance
  • /test pull-kubernetes-e2e-gce-canary
  • /test pull-kubernetes-e2e-gce-cos
  • /test pull-kubernetes-e2e-gce-cos-canary
  • /test pull-kubernetes-e2e-gce-cos-no-stage
  • /test pull-kubernetes-e2e-gce-network-proxy-http-connect
  • /test pull-kubernetes-e2e-gce-scale-performance-manual
  • /test pull-kubernetes-e2e-kind
  • /test pull-kubernetes-e2e-kind-ipv6
  • /test pull-kubernetes-integration
  • /test pull-kubernetes-integration-go-canary
  • /test pull-kubernetes-kubemark-e2e-gce-scale
  • /test pull-kubernetes-node-e2e-containerd
  • /test pull-kubernetes-typecheck
  • /test pull-kubernetes-unit
  • /test pull-kubernetes-unit-go-canary
  • /test pull-kubernetes-update
  • /test pull-kubernetes-verify
  • /test pull-kubernetes-verify-go-canary

The following commands are available to trigger optional jobs:

  • /test check-dependency-stats
  • /test pull-ci-kubernetes-unit-windows
  • /test pull-e2e-gce-cloud-provider-disabled
  • /test pull-kubernetes-conformance-image-test
  • /test pull-kubernetes-conformance-kind-ga-only
  • /test pull-kubernetes-conformance-kind-ipv6-parallel
  • /test pull-kubernetes-cos-cgroupv1-containerd-node-e2e
  • /test pull-kubernetes-cos-cgroupv1-containerd-node-e2e-features
  • /test pull-kubernetes-cos-cgroupv2-containerd-node-e2e
  • /test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-eviction
  • /test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-features
  • /test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-serial
  • /test pull-kubernetes-cross
  • /test pull-kubernetes-e2e-autoscaling-hpa-cm
  • /test pull-kubernetes-e2e-autoscaling-hpa-cpu
  • /test pull-kubernetes-e2e-capz-azure-disk
  • /test pull-kubernetes-e2e-capz-azure-disk-vmss
  • /test pull-kubernetes-e2e-capz-azure-file
  • /test pull-kubernetes-e2e-capz-azure-file-vmss
  • /test pull-kubernetes-e2e-capz-conformance
  • /test pull-kubernetes-e2e-capz-windows
  • /test pull-kubernetes-e2e-capz-windows-alpha-features
  • /test pull-kubernetes-e2e-capz-windows-serial-slow-hpa
  • /test pull-kubernetes-e2e-containerd-gce
  • /test pull-kubernetes-e2e-gce-correctness
  • /test pull-kubernetes-e2e-gce-cos-alpha-features
  • /test pull-kubernetes-e2e-gce-cos-kubetest2
  • /test pull-kubernetes-e2e-gce-csi-serial
  • /test pull-kubernetes-e2e-gce-device-plugin-gpu
  • /test pull-kubernetes-e2e-gce-network-proxy-grpc
  • /test pull-kubernetes-e2e-gce-serial
  • /test pull-kubernetes-e2e-gce-storage-disruptive
  • /test pull-kubernetes-e2e-gce-storage-slow
  • /test pull-kubernetes-e2e-gce-storage-snapshot
  • /test pull-kubernetes-e2e-gci-gce-autoscaling
  • /test pull-kubernetes-e2e-gci-gce-ingress
  • /test pull-kubernetes-e2e-gci-gce-ipvs
  • /test pull-kubernetes-e2e-inplace-pod-resize-containerd-main-v2
  • /test pull-kubernetes-e2e-kind-canary
  • /test pull-kubernetes-e2e-kind-dual-canary
  • /test pull-kubernetes-e2e-kind-ipv6-canary
  • /test pull-kubernetes-e2e-kind-ipvs-dual-canary
  • /test pull-kubernetes-e2e-kind-kms
  • /test pull-kubernetes-e2e-kind-multizone
  • /test pull-kubernetes-e2e-kops-aws
  • /test pull-kubernetes-e2e-kubelet-credential-provider
  • /test pull-kubernetes-e2e-ubuntu-gce-network-policies
  • /test pull-kubernetes-kind-dra
  • /test pull-kubernetes-kind-json-logging
  • /test pull-kubernetes-kind-text-logging
  • /test pull-kubernetes-kubemark-e2e-gce-big
  • /test pull-kubernetes-local-e2e
  • /test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e
  • /test pull-kubernetes-node-crio-cgrpv2-e2e
  • /test pull-kubernetes-node-crio-cgrpv2-e2e-kubetest2
  • /test pull-kubernetes-node-crio-e2e
  • /test pull-kubernetes-node-crio-e2e-kubetest2
  • /test pull-kubernetes-node-e2e-containerd-alpha-features
  • /test pull-kubernetes-node-e2e-containerd-features
  • /test pull-kubernetes-node-e2e-containerd-features-kubetest2
  • /test pull-kubernetes-node-e2e-containerd-kubetest2
  • /test pull-kubernetes-node-e2e-containerd-sidecar-containers
  • /test pull-kubernetes-node-e2e-containerd-standalone-mode
  • /test pull-kubernetes-node-e2e-containerd-standalone-mode-all-alpha
  • /test pull-kubernetes-node-kubelet-credential-provider
  • /test pull-kubernetes-node-kubelet-serial-containerd
  • /test pull-kubernetes-node-kubelet-serial-containerd-kubetest2
  • /test pull-kubernetes-node-kubelet-serial-cpu-manager
  • /test pull-kubernetes-node-kubelet-serial-cpu-manager-kubetest2
  • /test pull-kubernetes-node-kubelet-serial-crio-cgroupv1
  • /test pull-kubernetes-node-kubelet-serial-crio-cgroupv2
  • /test pull-kubernetes-node-kubelet-serial-hugepages
  • /test pull-kubernetes-node-kubelet-serial-memory-manager
  • /test pull-kubernetes-node-kubelet-serial-pod-disruption-conditions
  • /test pull-kubernetes-node-kubelet-serial-topology-manager
  • /test pull-kubernetes-node-kubelet-serial-topology-manager-kubetest2
  • /test pull-kubernetes-node-memoryqos-cgrpv2
  • /test pull-kubernetes-node-swap-fedora
  • /test pull-kubernetes-node-swap-fedora-serial
  • /test pull-kubernetes-node-swap-ubuntu-serial
  • /test pull-kubernetes-unit-experimental
  • /test pull-kubernetes-verify-strict-lint
  • /test pull-publishing-bot-validate

Use /test all to run the following jobs that were automatically triggered:

  • pull-kubernetes-conformance-kind-ga-only-parallel
  • pull-kubernetes-dependencies
  • pull-kubernetes-e2e-gce
  • pull-kubernetes-e2e-inplace-pod-resize-containerd-main-v2
  • pull-kubernetes-e2e-kind
  • pull-kubernetes-e2e-kind-ipv6
  • pull-kubernetes-integration
  • pull-kubernetes-node-e2e-containerd
  • pull-kubernetes-typecheck
  • pull-kubernetes-unit
  • pull-kubernetes-verify

In response to this:

/test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@harche
Copy link
Contributor Author

harche commented Apr 26, 2023

/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e

Signed-off-by: Harshal Patil <harpatil@redhat.com>
Co-authored-by: Ed Bartosh <eduard.bartosh@intel.com>
@harche
Copy link
Contributor Author

harche commented Apr 26, 2023

/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e

@rphillips
Copy link
Member

/remove-lifecycle stale
/lgtm

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 26, 2023
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 26, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 7529e3827a88cc5df6a2e67f6a572c732ce09426

@rphillips
Copy link
Member

/assign @mrunalp

@bart0sh bart0sh moved this from Waiting on Author to Needs Approver in SIG Node PR Triage Apr 26, 2023
@bart0sh
Copy link
Contributor

bart0sh commented Aug 4, 2023

Oh, this is still pending approval!

/assign @mrunalp @dchen1107
for the final approval

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: harche, mrunalp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 16, 2023
@k8s-ci-robot k8s-ci-robot merged commit c5815fe into kubernetes:master Oct 17, 2023
12 of 13 checks passed
SIG Node PR Triage automation moved this from Needs Approver to Done Oct 17, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.29 milestone Oct 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Development

Successfully merging this pull request may close these issues.

None yet

9 participants