Feature-gate SidecarContainers code in pkg/kubelet/kuberuntime #120281

Conversation

gjkim42
Member

@gjkim42 gjkim42 commented Aug 31, 2023

What type of PR is this?

/kind bug
/kind regression

What this PR does / why we need it:

This feature-gates SidecarContainers related code in pkg/kubelet/kuberuntime to prevent possible regressions from v1.27.

This will address #120247 with the feature gate disabled.
The case with the feature gate enabled will be addressed by #120269 after this PR.
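
The gating follows the usual kubelet pattern of checking the gate at each call site before taking the new code path. Below is a minimal, self-contained sketch of that check (the real change branches this way inside computePodActions and SyncPod; the printed strings are only illustrative):

```go
package main

import (
	"fmt"

	utilfeature "k8s.io/apiserver/pkg/util/feature"

	"k8s.io/kubernetes/pkg/features"
)

func main() {
	// With the gate off, the kubelet keeps the pre-1.28 behavior
	// (findNextInitContainerToRun / NextInitContainerToStart); with it on,
	// the sidecar-aware path (computeInitContainerActions /
	// InitContainersToStart) is used instead.
	if utilfeature.DefaultFeatureGate.Enabled(features.SidecarContainers) {
		fmt.Println("SidecarContainers enabled: use the sidecar-aware init container logic")
	} else {
		fmt.Println("SidecarContainers disabled: keep the pre-1.28 init container logic")
	}
}
```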

Which issue(s) this PR fixes:

Fixes #120247

Special notes for your reviewer:

$ git diff upstream/release-1.27 -- pkg/kubelet/kuberuntime/kuberuntime_container.go pkg/kubelet/kuberuntime/kuberuntime_manager.go
diff --git a/pkg/kubelet/kuberuntime/kuberuntime_container.go b/pkg/kubelet/kuberuntime/kuberuntime_container.go
index 3fd849e4ae2..97abc229335 100644
--- a/pkg/kubelet/kuberuntime/kuberuntime_container.go
+++ b/pkg/kubelet/kuberuntime/kuberuntime_container.go
@@ -52,6 +52,7 @@ import (
 	kubecontainer "k8s.io/kubernetes/pkg/kubelet/container"
 	"k8s.io/kubernetes/pkg/kubelet/cri/remote"
 	"k8s.io/kubernetes/pkg/kubelet/events"
+	proberesults "k8s.io/kubernetes/pkg/kubelet/prober/results"
 	"k8s.io/kubernetes/pkg/kubelet/types"
 	"k8s.io/kubernetes/pkg/kubelet/util/format"
 	"k8s.io/kubernetes/pkg/util/tail"
@@ -136,7 +137,6 @@ func calcRestartCountByLogDir(path string) (int, error) {
 	if _, err := os.Stat(path); err != nil {
 		return 0, nil
 	}
-	restartCount := int(0)
 	files, err := os.ReadDir(path)
 	if err != nil {
 		return 0, err
@@ -144,6 +144,7 @@ func calcRestartCountByLogDir(path string) (int, error) {
 	if len(files) == 0 {
 		return 0, nil
 	}
+	restartCount := 0
 	restartCountLogFileRegex := regexp.MustCompile(`^(\d+)\.log(\..*)?`)
 	for _, file := range files {
 		if file.IsDir() {
@@ -328,7 +329,7 @@ func (m *kubeGenericRuntimeManager) generateContainerConfig(ctx context.Context,
 			Name:    container.Name,
 			Attempt: restartCountUint32,
 		},
-		Image:       &runtimeapi.ImageSpec{Image: imageRef},
+		Image:       &runtimeapi.ImageSpec{Image: imageRef, UserSpecifiedImage: container.Image},
 		Command:     command,
 		Args:        args,
 		WorkingDir:  container.WorkingDir,
@@ -725,19 +726,21 @@ func (m *kubeGenericRuntimeManager) killContainer(ctx context.Context, pod *v1.P
 	}
 	m.recordContainerEvent(pod, containerSpec, containerID.ID, v1.EventTypeNormal, events.KillingContainer, message)
 
+	if gracePeriodOverride != nil {
+		gracePeriod = *gracePeriodOverride
+		klog.V(3).InfoS("Killing container with a grace period override", "pod", klog.KObj(pod), "podUID", pod.UID,
+			"containerName", containerName, "containerID", containerID.String(), "gracePeriod", gracePeriod)
+	}
+
 	// Run the pre-stop lifecycle hooks if applicable and if there is enough time to run it
 	if containerSpec.Lifecycle != nil && containerSpec.Lifecycle.PreStop != nil && gracePeriod > 0 {
 		gracePeriod = gracePeriod - m.executePreStopHook(ctx, pod, containerID, containerSpec, gracePeriod)
 	}
+
 	// always give containers a minimal shutdown window to avoid unnecessary SIGKILLs
 	if gracePeriod < minimumGracePeriodInSeconds {
 		gracePeriod = minimumGracePeriodInSeconds
 	}
-	if gracePeriodOverride != nil {
-		gracePeriod = *gracePeriodOverride
-		klog.V(3).InfoS("Killing container with a grace period override", "pod", klog.KObj(pod), "podUID", pod.UID,
-			"containerName", containerName, "containerID", containerID.String(), "gracePeriod", gracePeriod)
-	}
 
 	klog.V(2).InfoS("Killing container with a grace period", "pod", klog.KObj(pod), "podUID", pod.UID,
 		"containerName", containerName, "containerID", containerID.String(), "gracePeriod", gracePeriod)
@@ -903,6 +906,230 @@ func findNextInitContainerToRun(pod *v1.Pod, podStatus *kubecontainer.PodStatus)
 	return nil, &pod.Spec.InitContainers[0], false
 }
 
+// hasAnyRegularContainerCreated returns true if any regular container has been
+// created, which indicates all init containers have been initialized.
+func hasAnyRegularContainerCreated(pod *v1.Pod, podStatus *kubecontainer.PodStatus) bool {
+	for _, container := range pod.Spec.Containers {
+		status := podStatus.FindContainerStatusByName(container.Name)
+		if status == nil {
+			continue
+		}
+		switch status.State {
+		case kubecontainer.ContainerStateCreated,
+			kubecontainer.ContainerStateRunning,
+			kubecontainer.ContainerStateExited:
+			return true
+		default:
+			// Ignore other states
+		}
+	}
+	return false
+}
+
+// computeInitContainerActions sets the actions on the given changes that need
+// to be taken for the init containers. This includes actions to initialize the
+// init containers and actions to keep restartable init containers running.
+// computeInitContainerActions returns true if pod has been initialized.
+//
+// The actions include:
+// - Start the first init container that has not been started.
+// - Restart all restartable init containers that have started but are not running.
+// - Kill the restartable init containers that are not alive or started.
+//
+// Note that this is a function for the SidecarContainers feature.
+// Please sync with the findNextInitContainerToRun function if any changes are
+// made, as either this or that function will be called.
+func (m *kubeGenericRuntimeManager) computeInitContainerActions(pod *v1.Pod, podStatus *kubecontainer.PodStatus, changes *podActions) bool {
+	if len(pod.Spec.InitContainers) == 0 {
+		return true
+	}
+
+	// If any of the main containers have status and are Running, then all init containers must
+	// have been executed at some point in the past.  However, they could have been removed
+	// from the container runtime now, and if we proceed, it would appear as if they
+	// never ran and will re-execute improperly except for the restartable init containers.
+	podHasInitialized := hasAnyRegularContainerCreated(pod, podStatus)
+
+	// isPreviouslyInitialized indicates if the current init container is
+	// previously initialized.
+	isPreviouslyInitialized := podHasInitialized
+	restartOnFailure := shouldRestartOnFailure(pod)
+
+	// Note that we iterate through the init containers in reverse order to find
+	// the next init container to run, as the completed init containers may get
+	// removed from container runtime for various reasons. Therefore the kubelet
+	// should rely on the minimal number of init containers - the last one.
+	//
+	// Once we find the next init container to run, iterate through the rest to
+	// find the restartable init containers to restart.
+	for i := len(pod.Spec.InitContainers) - 1; i >= 0; i-- {
+		container := &pod.Spec.InitContainers[i]
+		status := podStatus.FindContainerStatusByName(container.Name)
+		klog.V(4).InfoS("Computing init container action", "pod", klog.KObj(pod), "container", container.Name, "status", status)
+		if status == nil {
+			// If the container is previously initialized but its status is not
+			// found, it means its last status is removed for some reason.
+			// Restart it if it is a restartable init container.
+			if isPreviouslyInitialized && types.IsRestartableInitContainer(container) {
+				changes.InitContainersToStart = append(changes.InitContainersToStart, i)
+			}
+			continue
+		}
+
+		if isPreviouslyInitialized && !types.IsRestartableInitContainer(container) {
+			// after initialization, only restartable init containers need to be kept
+			// running
+			continue
+		}
+
+		switch status.State {
+		case kubecontainer.ContainerStateCreated:
+			// nothing to do but wait for it to start
+
+		case kubecontainer.ContainerStateRunning:
+			if !types.IsRestartableInitContainer(container) {
+				break
+			}
+
+			if types.IsRestartableInitContainer(container) {
+				if container.StartupProbe != nil {
+					startup, found := m.startupManager.Get(status.ID)
+					if !found {
+						// If the startup probe has not been run, wait for it.
+						break
+					}
+					if startup != proberesults.Success {
+						if startup == proberesults.Failure {
+							// If the restartable init container failed the startup probe,
+							// restart it.
+							changes.ContainersToKill[status.ID] = containerToKillInfo{
+								name:      container.Name,
+								container: container,
+								message:   fmt.Sprintf("Init container %s failed startup probe", container.Name),
+								reason:    reasonStartupProbe,
+							}
+							changes.InitContainersToStart = append(changes.InitContainersToStart, i)
+						}
+						break
+					}
+				}
+
+				klog.V(4).InfoS("Init container has been initialized", "pod", klog.KObj(pod), "container", container.Name)
+				if i == (len(pod.Spec.InitContainers) - 1) {
+					podHasInitialized = true
+				} else if !isPreviouslyInitialized {
+					// this init container is initialized for the first time, start the next one
+					changes.InitContainersToStart = append(changes.InitContainersToStart, i+1)
+				}
+
+				// A restartable init container does not have to take into account its
+				// liveness probe when it determines to start the next init container.
+				if container.LivenessProbe != nil {
+					liveness, found := m.livenessManager.Get(status.ID)
+					if !found {
+						// If the liveness probe has not been run, wait for it.
+						break
+					}
+					if liveness == proberesults.Failure {
+						// If the restartable init container failed the liveness probe,
+						// restart it.
+						changes.ContainersToKill[status.ID] = containerToKillInfo{
+							name:      container.Name,
+							container: container,
+							message:   fmt.Sprintf("Init container %s failed liveness probe", container.Name),
+							reason:    reasonLivenessProbe,
+						}
+						changes.InitContainersToStart = append(changes.InitContainersToStart, i)
+					}
+				}
+			} else { // init container
+				// nothing do to but wait for it to finish
+				break
+			}
+
+		// If the init container failed and the restart policy is Never, the pod is terminal.
+		// Otherwise, restart the init container.
+		case kubecontainer.ContainerStateExited:
+			if types.IsRestartableInitContainer(container) {
+				changes.InitContainersToStart = append(changes.InitContainersToStart, i)
+			} else { // init container
+				if isInitContainerFailed(status) {
+					if !restartOnFailure {
+						changes.KillPod = true
+						changes.InitContainersToStart = nil
+						return false
+					}
+					changes.InitContainersToStart = append(changes.InitContainersToStart, i)
+					break
+				}
+
+				klog.V(4).InfoS("Init container has been initialized", "pod", klog.KObj(pod), "container", container.Name)
+				if i == (len(pod.Spec.InitContainers) - 1) {
+					podHasInitialized = true
+				} else {
+					// this init container is initialized for the first time, start the next one
+					changes.InitContainersToStart = append(changes.InitContainersToStart, i+1)
+				}
+			}
+
+		default: // kubecontainer.ContainerStatusUnknown or other unknown states
+			if types.IsRestartableInitContainer(container) {
+				// If the restartable init container is in unknown state, restart it.
+				changes.ContainersToKill[status.ID] = containerToKillInfo{
+					name:      container.Name,
+					container: container,
+					message: fmt.Sprintf("Init container is in %q state, try killing it before restart",
+						status.State),
+					reason: reasonUnknown,
+				}
+				changes.InitContainersToStart = append(changes.InitContainersToStart, i)
+			} else { // init container
+				if !isInitContainerFailed(status) {
+					klog.V(4).InfoS("This should not happen, init container is in unknown state but not failed", "pod", klog.KObj(pod), "containerStatus", status)
+				}
+
+				if !restartOnFailure {
+					changes.KillPod = true
+					changes.InitContainersToStart = nil
+					return false
+				}
+
+				// If the init container is in unknown state, restart it.
+				changes.ContainersToKill[status.ID] = containerToKillInfo{
+					name:      container.Name,
+					container: container,
+					message: fmt.Sprintf("Init container is in %q state, try killing it before restart",
+						status.State),
+					reason: reasonUnknown,
+				}
+				changes.InitContainersToStart = append(changes.InitContainersToStart, i)
+			}
+		}
+
+		if !isPreviouslyInitialized {
+			// the one before this init container has been initialized
+			isPreviouslyInitialized = true
+		}
+	}
+
+	// this means no init containers have been started,
+	// start the first one
+	if !isPreviouslyInitialized {
+		changes.InitContainersToStart = append(changes.InitContainersToStart, 0)
+	}
+
+	// reverse the InitContainersToStart, as the above loop iterated through the
+	// init containers backwards, but we want to start them as per the order in
+	// the pod spec.
+	l := len(changes.InitContainersToStart)
+	for i := 0; i < l/2; i++ {
+		changes.InitContainersToStart[i], changes.InitContainersToStart[l-1-i] =
+			changes.InitContainersToStart[l-1-i], changes.InitContainersToStart[i]
+	}
+
+	return podHasInitialized
+}
+
 // GetContainerLogs returns logs of a specific container.
 func (m *kubeGenericRuntimeManager) GetContainerLogs(ctx context.Context, pod *v1.Pod, containerID kubecontainer.ContainerID, logOptions *v1.PodLogOptions, stdout, stderr io.Writer) (err error) {
 	resp, err := m.runtimeService.ContainerStatus(ctx, containerID.ID, false)
diff --git a/pkg/kubelet/kuberuntime/kuberuntime_manager.go b/pkg/kubelet/kuberuntime/kuberuntime_manager.go
index 9c9cc56d5ae..9854b207fd0 100644
--- a/pkg/kubelet/kuberuntime/kuberuntime_manager.go
+++ b/pkg/kubelet/kuberuntime/kuberuntime_manager.go
@@ -26,6 +26,7 @@ import (
 	"time"
 
 	cadvisorapi "github.com/google/cadvisor/info/v1"
+	"github.com/google/go-cmp/cmp"
 	"go.opentelemetry.io/otel/trace"
 	crierror "k8s.io/cri-api/pkg/errors"
 	"k8s.io/klog/v2"
@@ -34,7 +35,6 @@ import (
 	"k8s.io/apimachinery/pkg/api/resource"
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 	kubetypes "k8s.io/apimachinery/pkg/types"
-	"k8s.io/apimachinery/pkg/util/diff"
 	utilruntime "k8s.io/apimachinery/pkg/util/runtime"
 	utilversion "k8s.io/apimachinery/pkg/util/version"
 	utilfeature "k8s.io/apiserver/pkg/util/feature"
@@ -494,9 +494,15 @@ type podActions struct {
 
 	// The next init container to start.
 	NextInitContainerToStart *v1.Container
+	// InitContainersToStart keeps a list of indexes for the init containers to
+	// start, where the index is the index of the specific init container in the
+	// pod spec (pod.Spec.InitContainers).
+	// NOTE: This is a field for SidecarContainers feature. Either this or
+	// NextInitContainerToStart will be set.
+	InitContainersToStart []int
 	// ContainersToStart keeps a list of indexes for the containers to start,
 	// where the index is the index of the specific container in the pod spec (
-	// pod.Spec.Containers.
+	// pod.Spec.Containers).
 	ContainersToStart []int
 	// ContainersToKill keeps a map of containers that need to be killed, note that
 	// the key is the container ID of the container, while
@@ -512,6 +518,11 @@ type podActions struct {
 	UpdatePodResources bool
 }
 
+func (p podActions) String() string {
+	return fmt.Sprintf("KillPod: %t, CreateSandbox: %t, UpdatePodResources: %t, Attempt: %d, InitContainersToStart: %v, ContainersToStart: %v, EphemeralContainersToStart: %v,ContainersToUpdate: %v, ContainersToKill: %v",
+		p.KillPod, p.CreateSandbox, p.UpdatePodResources, p.Attempt, p.InitContainersToStart, p.ContainersToStart, p.EphemeralContainersToStart, p.ContainersToUpdate, p.ContainersToKill)
+}
+
 func containerChanged(container *v1.Container, containerStatus *kubecontainer.Status) (uint64, uint64, bool) {
 	expectedHash := kubecontainer.HashContainer(container)
 	return expectedHash, containerStatus.Hash, containerStatus.Hash != expectedHash
@@ -553,7 +564,7 @@ func (m *kubeGenericRuntimeManager) computePodResizeAction(pod *v1.Pod, containe
 	if !exists || apiContainerStatus.State.Running == nil || apiContainerStatus.Resources == nil ||
 		kubeContainerStatus.State != kubecontainer.ContainerStateRunning ||
 		kubeContainerStatus.ID.String() != apiContainerStatus.ContainerID ||
-		len(diff.ObjectDiff(container.Resources.Requests, apiContainerStatus.AllocatedResources)) != 0 {
+		!cmp.Equal(container.Resources.Requests, apiContainerStatus.AllocatedResources) {
 		return true
 	}
 
@@ -827,7 +838,7 @@ func (m *kubeGenericRuntimeManager) computePodActions(ctx context.Context, pod *
 	if createPodSandbox {
 		if !shouldRestartOnFailure(pod) && attempt != 0 && len(podStatus.ContainerStatuses) != 0 {
 			// Should not restart the pod, just return.
-			// we should not create a sandbox for a pod if it is already done.
+			// we should not create a sandbox, and just kill the pod if it is already done.
 			// if all containers are done and should not be started, there is no need to create a new sandbox.
 			// this stops confusing logs on pods whose containers all have exit codes, but we recreate a sandbox before terminating it.
 			//
@@ -846,18 +857,35 @@ func (m *kubeGenericRuntimeManager) computePodActions(ctx context.Context, pod *
 			}
 			containersToStart = append(containersToStart, idx)
 		}
-		// We should not create a sandbox for a Pod if initialization is done and there is no container to start.
+
+		// We should not create a sandbox, and just kill the pod if initialization
+		// is done and there is no container to start.
 		if len(containersToStart) == 0 {
-			_, _, done := findNextInitContainerToRun(pod, podStatus)
-			if done {
+			hasInitialized := false
+			if !utilfeature.DefaultFeatureGate.Enabled(features.SidecarContainers) {
+				_, _, hasInitialized = findNextInitContainerToRun(pod, podStatus)
+			} else {
+				// If there is any regular container, it means all init containers have
+				// been initialized.
+				hasInitialized = hasAnyRegularContainerCreated(pod, podStatus)
+			}
+
+			if hasInitialized {
 				changes.CreateSandbox = false
 				return changes
 			}
 		}
 
+		// If we are creating a pod sandbox, we should restart from the initial
+		// state.
 		if len(pod.Spec.InitContainers) != 0 {
 			// Pod has init containers, return the first one.
-			changes.NextInitContainerToStart = &pod.Spec.InitContainers[0]
+			if !utilfeature.DefaultFeatureGate.Enabled(features.SidecarContainers) {
+				changes.NextInitContainerToStart = &pod.Spec.InitContainers[0]
+			} else {
+				changes.InitContainersToStart = []int{0}
+			}
+
 			return changes
 		}
 		changes.ContainersToStart = containersToStart
@@ -875,29 +903,38 @@ func (m *kubeGenericRuntimeManager) computePodActions(ctx context.Context, pod *
 	}
 
 	// Check initialization progress.
-	initLastStatus, next, done := findNextInitContainerToRun(pod, podStatus)
-	if !done {
-		if next != nil {
-			initFailed := initLastStatus != nil && isInitContainerFailed(initLastStatus)
-			if initFailed && !shouldRestartOnFailure(pod) {
-				changes.KillPod = true
-			} else {
-				// Always try to stop containers in unknown state first.
-				if initLastStatus != nil && initLastStatus.State == kubecontainer.ContainerStateUnknown {
-					changes.ContainersToKill[initLastStatus.ID] = containerToKillInfo{
-						name:      next.Name,
-						container: next,
-						message: fmt.Sprintf("Init container is in %q state, try killing it before restart",
-							initLastStatus.State),
-						reason: reasonUnknown,
+	if !utilfeature.DefaultFeatureGate.Enabled(features.SidecarContainers) {
+		initLastStatus, next, done := findNextInitContainerToRun(pod, podStatus)
+		if !done {
+			if next != nil {
+				initFailed := initLastStatus != nil && isInitContainerFailed(initLastStatus)
+				if initFailed && !shouldRestartOnFailure(pod) {
+					changes.KillPod = true
+				} else {
+					// Always try to stop containers in unknown state first.
+					if initLastStatus != nil && initLastStatus.State == kubecontainer.ContainerStateUnknown {
+						changes.ContainersToKill[initLastStatus.ID] = containerToKillInfo{
+							name:      next.Name,
+							container: next,
+							message: fmt.Sprintf("Init container is in %q state, try killing it before restart",
+								initLastStatus.State),
+							reason: reasonUnknown,
+						}
 					}
+					changes.NextInitContainerToStart = next
 				}
-				changes.NextInitContainerToStart = next
 			}
+			// Initialization failed or still in progress. Skip inspecting non-init
+			// containers.
+			return changes
+		}
+	} else {
+		hasInitialized := m.computeInitContainerActions(pod, podStatus, &changes)
+		if changes.KillPod || !hasInitialized {
+			// Initialization failed or still in progress. Skip inspecting non-init
+			// containers.
+			return changes
 		}
-		// Initialization failed or still in progress. Skip inspecting non-init
-		// containers.
-		return changes
 	}
 
 	if isInPlacePodVerticalScalingAllowed(pod) {
@@ -993,6 +1030,11 @@ func (m *kubeGenericRuntimeManager) computePodActions(ctx context.Context, pod *
 
 	if keepCount == 0 && len(changes.ContainersToStart) == 0 {
 		changes.KillPod = true
+		if utilfeature.DefaultFeatureGate.Enabled(features.SidecarContainers) {
+			// To prevent the restartable init containers to keep pod alive, we should
+			// not restart them.
+			changes.InitContainersToStart = nil
+		}
 	}
 
 	return changes
@@ -1099,7 +1141,14 @@ func (m *kubeGenericRuntimeManager) SyncPod(ctx context.Context, pod *v1.Pod, po
 
 		// Prepare resources allocated by the Dynammic Resource Allocation feature for the pod
 		if utilfeature.DefaultFeatureGate.Enabled(features.DynamicResourceAllocation) {
-			if m.runtimeHelper.PrepareDynamicResources(pod) != nil {
+			if err := m.runtimeHelper.PrepareDynamicResources(pod); err != nil {
+				ref, referr := ref.GetReference(legacyscheme.Scheme, pod)
+				if referr != nil {
+					klog.ErrorS(referr, "Couldn't make a ref to pod", "pod", klog.KObj(pod))
+					return
+				}
+				m.recorder.Eventf(ref, v1.EventTypeWarning, events.FailedPrepareDynamicResources, "Failed to prepare dynamic resources: %v", err)
+				klog.ErrorS(err, "Failed to prepare dynamic resources", "pod", klog.KObj(pod))
 				return
 			}
 		}
@@ -1225,15 +1274,34 @@ func (m *kubeGenericRuntimeManager) SyncPod(ctx context.Context, pod *v1.Pod, po
 		start(ctx, "ephemeral container", metrics.EphemeralContainer, ephemeralContainerStartSpec(&pod.Spec.EphemeralContainers[idx]))
 	}
 
-	// Step 6: start the init container.
-	if container := podContainerChanges.NextInitContainerToStart; container != nil {
-		// Start the next init container.
-		if err := start(ctx, "init container", metrics.InitContainer, containerStartSpec(container)); err != nil {
-			return
+	if !utilfeature.DefaultFeatureGate.Enabled(features.SidecarContainers) {
+		// Step 6: start the init container.
+		if container := podContainerChanges.NextInitContainerToStart; container != nil {
+			// Start the next init container.
+			if err := start(ctx, "init container", metrics.InitContainer, containerStartSpec(container)); err != nil {
+				return
+			}
+
+			// Successfully started the container; clear the entry in the failure
+			klog.V(4).InfoS("Completed init container for pod", "containerName", container.Name, "pod", klog.KObj(pod))
 		}
+	} else {
+		// Step 6: start init containers.
+		for _, idx := range podContainerChanges.InitContainersToStart {
+			container := &pod.Spec.InitContainers[idx]
+			// Start the next init container.
+			if err := start(ctx, "init container", metrics.InitContainer, containerStartSpec(container)); err != nil {
+				if types.IsRestartableInitContainer(container) {
+					klog.V(4).InfoS("Failed to start the restartable init container for the pod, skipping", "initContainerName", container.Name, "pod", klog.KObj(pod))
+					continue
+				}
+				klog.V(4).InfoS("Failed to initialize the pod, as the init container failed to start, aborting", "initContainerName", container.Name, "pod", klog.KObj(pod))
+				return
+			}
 
-		// Successfully started the container; clear the entry in the failure
-		klog.V(4).InfoS("Completed init container for pod", "containerName", container.Name, "pod", klog.KObj(pod))
+			// Successfully started the container; clear the entry in the failure
+			klog.V(4).InfoS("Completed init container for pod", "containerName", container.Name, "pod", klog.KObj(pod))
+		}
 	}
 
 	// Step 7: For containers in podContainerChanges.ContainersToUpdate[CPU,Memory] list, invoke UpdateContainerResources

Does this PR introduce a user-facing change?

Fixed a 1.28 regression around restarting init containers in the right order relative to normal containers

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/753-sidecar-containers
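
For context on what the gate toggles: under the KEP, a restartable init container ("sidecar") is an ordinary entry in pod.Spec.InitContainers whose container-level restartPolicy is Always. A small illustrative sketch using the k8s.io/api/core/v1 types (container names and images here are made up):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	// A restartable init container is declared by setting the container-level
	// restartPolicy of an init container to Always; it then keeps running
	// alongside the regular containers instead of having to exit before they
	// start.
	always := corev1.ContainerRestartPolicyAlways
	pod := corev1.Pod{
		Spec: corev1.PodSpec{
			InitContainers: []corev1.Container{
				// Hypothetical names/images, for illustration only.
				{Name: "log-shipper", Image: "example.com/log-shipper:latest", RestartPolicy: &always},
			},
			Containers: []corev1.Container{
				{Name: "app", Image: "example.com/app:latest"},
			},
		},
	}
	fmt.Printf("restartable init container: %+v\n", pod.Spec.InitContainers[0])
}
```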

/priority critical-urgent
/cc @liggitt
/cc @SergeyKanzhelev

@k8s-ci-robot k8s-ci-robot added the release-note, kind/bug, size/L, kind/regression, priority/critical-urgent, cncf-cla: yes, do-not-merge/needs-sig, needs-triage, area/kubelet, and sig/node labels, and removed the do-not-merge/needs-sig label Aug 31, 2023
@gjkim42
Member Author

gjkim42 commented Aug 31, 2023

/sig node

@matthyx
Contributor

matthyx commented Aug 31, 2023

So this one supersedes #120269?
Does it also solve #120247 with the feature gate enabled?

@gjkim42
Member Author

gjkim42 commented Aug 31, 2023

So this one supercedes #120269 ?

I think both are needed. This one addresses #120247 and the potential regressions with the feature gate disabled.

#120269 will address the problem with the feature gate enabled.

@gjkim42
Member Author

gjkim42 commented Aug 31, 2023

I'll add the e2e test here instead of including it in #120269.

@gjkim42 gjkim42 force-pushed the feature-gate-sidecar-containers-in-kuberuntime branch from 880697a to afcb601 on August 31, 2023 08:30
@k8s-ci-robot k8s-ci-robot added area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Aug 31, 2023
@gjkim42
Member Author

gjkim42 commented Aug 31, 2023

/test pull-kubernetes-node-kubelet-serial-containerd
/test pull-kubernetes-node-e2e-containerd-sidecar-containers
/test pull-kubernetes-e2e-gce-cos-alpha-features

@gjkim42
Member Author

gjkim42 commented Aug 31, 2023

/test pull-kubernetes-node-kubelet-serial-containerd

@gjkim42 gjkim42 force-pushed the feature-gate-sidecar-containers-in-kuberuntime branch from afcb601 to 2f2d983 on August 31, 2023 12:42
@gjkim42
Member Author

gjkim42 commented Aug 31, 2023

/test pull-kubernetes-node-kubelet-serial-containerd

@bart0sh bart0sh added this to Triage in SIG Node PR Triage Aug 31, 2023
@gjkim42
Member Author

gjkim42 commented Aug 31, 2023

succeeded, retest...

/test pull-kubernetes-node-kubelet-serial-containerd

This adds an e2e test to ensure that a pod restarts its containers
in the right order after the pod sandbox is changed.
@gjkim42 gjkim42 force-pushed the feature-gate-sidecar-containers-in-kuberuntime branch from 2f2d983 to 696f84a on August 31, 2023 15:13
@mrunalp
Contributor

mrunalp commented Sep 6, 2023

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 6, 2023
f := framework.NewDefaultFramework("containers-lifecycle-test-serial")
f.NamespacePodSecurityEnforceLevel = admissionapi.LevelPrivileged

ginkgo.It("should restart the containers in right order after the node reboot", func(ctx context.Context) {
Contributor


Is this test case assuming the feature gate is disabled? Would it pass with the feature gate enabled?

Member Author


No it doesn't. We'll fix the case with feature gate enabled in #120269 after this.

@dims
Member

dims commented Sep 6, 2023

/retest
/approve
/lgtm

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dims, gjkim42, klueska, mrunalp, SergeyKanzhelev

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@dims
Member

dims commented Sep 6, 2023

@SergeyKanzhelev @mrunalp is the release note expressive enough? Does it capture what we need to highlight?

@dims
Member

dims commented Sep 6, 2023

@gjkim42 is there a cherry-pick already?

@k8s-ci-robot
Contributor

k8s-ci-robot commented Sep 6, 2023

@gjkim42: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: pull-kubernetes-node-kubelet-serial-containerd
Commit: 696f84a
Required: false
Rerun command: /test pull-kubernetes-node-kubelet-serial-containerd

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@gjkim42
Member Author

gjkim42 commented Sep 6, 2023

@gjkim42 is there a cherry-pick already?

Not yet; I'll create one for release-1.27.

@gjkim42
Member Author

gjkim42 commented Sep 6, 2023

@gjkim42 is there a cherry-pick already?

Not yet; I'll create one for release-1.27.

Sorry, I meant release-1.28.

CREATED: #120440

@mrunalp
Contributor

mrunalp commented Sep 6, 2023

Maybe something like:
Feature gate the changes introduced by SideCarContainers to avoid regressions

@k8s-ci-robot k8s-ci-robot merged commit debe30d into kubernetes:master Sep 6, 2023
17 of 18 checks passed
SIG Node PR Triage automation moved this from Needs Reviewer to Done Sep 6, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.29 milestone Sep 6, 2023
@gjkim42 gjkim42 deleted the feature-gate-sidecar-containers-in-kuberuntime branch September 6, 2023 01:42
@liggitt
Member

liggitt commented Sep 6, 2023

The feature gate aspect isn't that important; clarifying that it's a regression in 1.28 is helpful, though. I updated the release note.

@sftim
Contributor

sftim commented Sep 6, 2023

Changelog suggestion

Fixed a regression around restarting init containers in the right order after a node reboot.

k8s-ci-robot added a commit that referenced this pull request Sep 6, 2023
…281-upstream-release-1.28

Automated cherry pick of #120281: Feature-gate SidecarContainers code in pkg/kubelet/kuberuntime
@gjkim42
Member Author

gjkim42 commented Sep 6, 2023

Changelog suggestion

Fixed a regression around restarting init containers in the right order after a node reboot.

Thanks, updated it.

@cwayne18

cwayne18 commented Sep 8, 2023

I assume this will go into the September releases, correct? Or would this necessitate an OOB release?

@liggitt
Member

liggitt commented Sep 8, 2023

yes, this is on track for 1.28.2, targeting release on 2023-09-13 per https://kubernetes.io/releases/patch-releases/

Labels
approved, area/kubelet, area/test, cncf-cla: yes, kind/bug, kind/regression, lgtm, priority/critical-urgent, release-note, sig/node, sig/testing, size/L, triage/accepted

Successfully merging this pull request may close these issues.

App container is started before initContainers could complete on node reboot