-
Notifications
You must be signed in to change notification settings - Fork 18.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
daemon: stop setting container resources to zero #45704
Conversation
Switch to using t.TempDir() instead of rolling our own. Clean up mounts leaked by the tests as otherwise the tests fail due to the leaked mounts because unlike the old cleanup code, t.TempDir() cleanup does not ignore errors from os.RemoveAll. Signed-off-by: Cory Snider <csnider@mirantis.com>
// HORRIBLE HACK: clean up shm mounts leaked by some tests. Otherwise the | ||
// offending tests would fail due to the mounts blocking the temporary | ||
// directory from being cleaned up. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked: only the tests are impacted. The shm mounts are not leaked on container exit—though I'm not sure where in the code they get unmounted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I was running into this one as well in #45474 (comment)
and https://github.com/moby/moby/pull/43346/files#diff-7044cd74671fe1aa629c3374b3b65fb57e7d47c6a9a20a23b539ee1425a88f2eL19-L52
(Not sure why I didn't use c.ShmPath
though 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall looks good; one comment on the type change for Resources
// HORRIBLE HACK: clean up shm mounts leaked by some tests. Otherwise the | ||
// offending tests would fail due to the mounts blocking the temporary | ||
// directory from being cleaned up. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I was running into this one as well in #45474 (comment)
and https://github.com/moby/moby/pull/43346/files#diff-7044cd74671fe1aa629c3374b3b65fb57e7d47c6a9a20a23b539ee1425a88f2eL19-L52
(Not sure why I didn't use c.ShmPath
though 🤔
daemon/oci_linux.go
Outdated
if r.BlkioWeight != 0 { | ||
w := r.BlkioWeight | ||
specResources.BlockIO.Weight = &w | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely not a blocker, but would it make sense to have a getBlkioWeight()
func (like we do for the other options above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. The other options require non-trivial processing, unlike this one. It could make sense to factor out a getBlockIOResources()
func like getMemoryResources()
, getCPUResources()
and getPidsLimit()
, though.
daemon/oci_linux.go
Outdated
if s.Linux.Resources != nil && len(s.Linux.Resources.Devices) > 0 { | ||
specResources.Devices = s.Linux.Resources.Devices |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this, I feel like WithResources()
is doing too much, or we should split out a WithBlkIO()
- it looks like this part is here, to prevent WithResources()
from overwriting Devices
(which probably is set separately), so trying to merge multiple options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, ensuring that mutations to the spec by WithDevices()
are not clobbered is not WithResources()
doing too much. It's just WithResources()
being a good(-ish) implementation of a functional option, and what allows the functional options pattern to be used to factor distinct concerns into distinct option functions. Factoring out a WithBlkIO()
function would also have to ensure that it does not clobber changes to the spec applied by WithResources()
and other options.
That all being said, the way that WithResources()
goes about not clobbering existing LinuxResources
fields it does not otherwise touch is not particularly forward-compatible as it will zero out all other fields aside from Devices
. I'll fix that.
type Resources specs.LinuxResources | ||
type Resources = specs.LinuxResources |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this needed for this PR? I feel like this should be separate (and possibly when doing so, mark Resources
deprecated in favour of specs.LinuxResources
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It avoids having to explicitly cast types.Resources
values to specs.LinuxResources
. And the type serves a purpose: it allows the libcontainerd Task
interface to be have a single definition which is applicable for both Windows and Linux platforms.
moby/libcontainerd/types/types_windows.go
Lines 26 to 27 in 1aae940
// Resources defines updatable container resource values. | |
type Resources struct{} |
moby/libcontainerd/types/types.go
Line 96 in 1aae940
UpdateResources(ctx context.Context, resources *Resources) error |
I am sure it would have been written as a type alias originally, if it existed at the time. (Proper type aliases were added in Go 1.9.)
moby/libcontainerd/remote/client_linux.go
Lines 24 to 26 in 1aae940
// go doesn't like the alias in 1.8, this means this need to be | |
// platform specific | |
return t.Update(ctx, containerd.WithResources((*specs.LinuxResources)(resources))) |
I can split this out into its own commit if you feel strongly about it, but I don't think it's a significant enough change to warrant a separate PR. While technically a breaking change in the strictest sense, even Go's own apidiff tooling does not consider merging types to be worth flagging.
Actually; looks like there's some test-failures that may be related 😅 |
Many of the fields in LinuxResources struct are pointers to scalars for some reason, presumably to differentiate between set-to-zero and unset when unmarshaling from JSON, despite zero being outside the acceptable range for the corresponding kernel tunables. When creating the OCI spec for a container, the daemon sets the container's OCI spec CPUShares and BlkioWeight parameters to zero when the corresponding Docker container configuration values are zero, signifying unset, despite the minimum acceptable value for CPUShares being two, and BlkioWeight ten. This has gone unnoticed as runC does not distingiush set-to-zero from unset as it also uses zero internally to represent unset for those fields. However, kata-containers v3.2.0-alpha.3 tries to apply the explicit-zero resource parameters to the container, exactly as instructed, and fails loudly. The OCI runtime-spec is silent on how the runtime should handle the case when those parameters are explicitly set to out-of-range values and kata's behaviour is not unreasonable, so the daemon must therefore be in the wrong. Translate unset values in the Docker container's resources HostConfig to omit the corresponding fields in the container's OCI spec when starting and updating a container in order to maximize compatibility with runtimes. Signed-off-by: Cory Snider <csnider@mirantis.com>
Audit the OCI spec options used for Linux containers to ensure they are less order-dependent. Ensure they don't assume that any pointer fields are non-nil and that they don't unintentionally clobber mutations to the spec applied by other options. Signed-off-by: Cory Snider <csnider@mirantis.com>
3c52bbe
to
8a094fe
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Many of the fields in LinuxResources struct are pointers to scalars for some reason, presumably to differentiate between set-to-zero and unset when unmarshaling from JSON, despite zero being outside the acceptable range for the corresponding kernel tunables. When creating the OCI spec for a container, the daemon sets the container's OCI spec CPUShares and BlkioWeight parameters to zero when the corresponding Docker container configuration values are zero, signifying unset, despite the minimum acceptable value for CPUShares being two, and BlkioWeight ten. This has gone unnoticed as runC does not distingiush set-to-zero from unset as it also uses zero internally to represent unset for those fields. However, kata-containers v3.2.0-alpha.3 tries to apply the explicit-zero resource parameters to the container, exactly as instructed, and fails loudly. The OCI runtime-spec is silent on how the runtime should handle the case when those parameters are explicitly set to out-of-range values and kata's behaviour is not unreasonable, so the daemon must therefore be in the wrong.
Translate unset values in the Docker container's resources HostConfig to omit the corresponding fields in the container's OCI spec when starting and updating a container in order to maximize compatibility with runtimes.
- What I did
- How I did it
- How to verify it
- Description for the changelog
- A picture of a cute animal (not mandatory but encouraged)