Skip to content

Commit

Permalink
UPSTREAM: <carry>: disable load balancing on created cgroups when man…
Browse files Browse the repository at this point in the history
…aged is enabled

Previously, cpu load balancing was enabled in cri-o by manually changing the sched_domain of cpus in sysfs.
However, RHEL 9 dropped support for this knob, instead requiring it be changed in cgroups directly.

To enable cpu load balancing on cgroupv1, the specified cgroup must have cpuset.sched_load_balance set to 0, as well as
all of that cgroup's parents, plus all of the cgroups that contain a subset of the cpus that load balancing is disabled for.

By default, all cpusets inherit the set from their parent and sched_load_balance as 1. Since we need to keep the cpus that need
load balancing disabled in the root cgroup, all slices will inherit the full cpuset.

Rather than rebalancing every cgroup whenever a new guaranteed cpuset cgroup is created, the approach this PR takes is to
set load balancing to disabled for all slices. Since slices definitionally don't have any processes in them, setting load balancing won't
affect the actual scheduling decisions of the kernel. All it will do is open the opportunity for CRI-O to set the actually set load balancing to
disabled for containers that request it.

Signed-off-by: Peter Hunt <pehunt@redhat.com>

UPSTREAM: <carry>: kubelet/cm: disable cpu load balancing on slices when using static cpu manager policy

There are situations where cpu load balance disabling is desired when the kubelet is not in managed state.
Instead of using that condition, set the cpu load balancing parameter for new slices when the cpu policy is static

Signed-off-by: Peter Hunt <pehunt@redhat.com>

UPSTREAM: <carry>: cm: reorder setting of sched_load_balance for sandbox slice

If we call mgr.Apply() first, libcontainer's cpusetCopyIfNeeded()
will copy the parent cpuset and set load balancing to 1 by default.
This causes the kernel to set the cpus to not load balanced for a brief moment
which causes churn.

instead, create the cgroup and set load balance, then have Apply() copy the values into it.

Signed-off-by: Peter Hunt <pehunt@redhat.com>

UPSTREAM: <carry>: kubelet/cm: use MkdirAll when creating cpuset to ignore file exists error

Signed-off-by: Peter Hunt <pehunt@redhat.com>
  • Loading branch information
haircommander authored and bertinatto committed Mar 6, 2024
1 parent 45e5ccf commit ec482bf
Show file tree
Hide file tree
Showing 4 changed files with 36 additions and 1 deletion.
29 changes: 28 additions & 1 deletion pkg/kubelet/cm/cgroup_manager_linux.go
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@ import (
"sync"
"time"

"github.com/opencontainers/runc/libcontainer/cgroups"
libcontainercgroups "github.com/opencontainers/runc/libcontainer/cgroups"
"github.com/opencontainers/runc/libcontainer/cgroups/fscommon"
"github.com/opencontainers/runc/libcontainer/cgroups/manager"
Expand Down Expand Up @@ -150,6 +151,10 @@ type cgroupManagerImpl struct {

// useSystemd tells if systemd cgroup manager should be used.
useSystemd bool

// cpuLoadBalanceDisable tells whether kubelet should disable
// cpu load balancing on new cgroups it creates.
cpuLoadBalanceDisable bool
}

// Make sure that cgroupManagerImpl implements the CgroupManager interface
Expand Down Expand Up @@ -460,6 +465,25 @@ func (m *cgroupManagerImpl) Create(cgroupConfig *CgroupConfig) error {
return err
}

// Disable cpuset.sched_load_balance for all cgroups Kubelet creates.
// This way, CRI can disable sched_load_balance for pods that must have load balance
// disabled, but the slices can contain all cpus (as the guaranteed cpus are known dynamically).
// Note: this should be done before Apply(-1) below, as Apply contains cpusetCopyIfNeeded(), which will
// populate the cpuset with the parent's cpuset. However, it will be initialized to sched_load_balance=1
// which will cause the kernel to move all cpusets out of their isolated sched_domain, causing unnecessary churn.
if m.cpuLoadBalanceDisable && !libcontainercgroups.IsCgroup2UnifiedMode() {
path := manager.Path("cpuset")
if path == "" {
return fmt.Errorf("Failed to find cpuset for newly created cgroup")
}
if err := os.MkdirAll(path, 0o755); err != nil {
return fmt.Errorf("failed to create cpuset for newly created cgroup: %w", err)
}
if err := cgroups.WriteFile(path, "cpuset.sched_load_balance", "0"); err != nil {
return err
}
}

// Apply(-1) is a hack to create the cgroup directories for each resource
// subsystem. The function [cgroups.Manager.apply()] applies cgroup
// configuration to the process with the specified pid.
Expand All @@ -475,7 +499,6 @@ func (m *cgroupManagerImpl) Create(cgroupConfig *CgroupConfig) error {
if err := manager.Set(libcontainerCgroupConfig.Resources); err != nil {
utilruntime.HandleError(fmt.Errorf("cgroup manager.Set failed: %w", err))
}

return nil
}

Expand Down Expand Up @@ -747,3 +770,7 @@ func (m *cgroupManagerImpl) SetCgroupConfig(name CgroupName, resource v1.Resourc
}
return nil
}

func (m *cgroupManagerImpl) SetCPULoadBalanceDisable() {
m.cpuLoadBalanceDisable = true
}
3 changes: 3 additions & 0 deletions pkg/kubelet/cm/cgroup_manager_unsupported.go
Original file line number Diff line number Diff line change
Expand Up @@ -89,6 +89,9 @@ func (m *unsupportedCgroupManager) SetCgroupConfig(name CgroupName, resource v1.
return errNotSupported
}

func (m *unsupportedCgroupManager) SetCPULoadBalanceDisable() {
}

var RootCgroupName = CgroupName([]string{})

func NewCgroupName(base CgroupName, components ...string) CgroupName {
Expand Down
3 changes: 3 additions & 0 deletions pkg/kubelet/cm/container_manager_linux.go
Original file line number Diff line number Diff line change
Expand Up @@ -248,6 +248,9 @@ func NewContainerManager(mountUtil mount.Interface, cadvisorInterface cadvisor.I
// Turn CgroupRoot from a string (in cgroupfs path format) to internal CgroupName
cgroupRoot := ParseCgroupfsToCgroupName(nodeConfig.CgroupRoot)
cgroupManager := NewCgroupManager(subsystems, nodeConfig.CgroupDriver)
if nodeConfig.CPUManagerPolicy == string(cpumanager.PolicyStatic) {
cgroupManager.SetCPULoadBalanceDisable()
}
// Check if Cgroup-root actually exists on the node
if nodeConfig.CgroupsPerQOS {
// this does default to / when enabled, but this tests against regressions.
Expand Down
2 changes: 2 additions & 0 deletions pkg/kubelet/cm/types.go
Original file line number Diff line number Diff line change
Expand Up @@ -88,6 +88,8 @@ type CgroupManager interface {
GetCgroupConfig(name CgroupName, resource v1.ResourceName) (*ResourceConfig, error)
// Set resource config for the specified resource type on the cgroup
SetCgroupConfig(name CgroupName, resource v1.ResourceName, resourceConfig *ResourceConfig) error
// Toggle whether CPU load balancing should be disabled for new cgroups the kubelet creates
SetCPULoadBalanceDisable()
}

// QOSContainersInfo stores the names of containers per qos
Expand Down

0 comments on commit ec482bf

Please sign in to comment.