Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

daemon: stop setting container resources to zero #45704

Merged
merged 3 commits into from
Jun 12, 2023

Conversation

corhere
Copy link
Contributor

@corhere corhere commented Jun 6, 2023

Many of the fields in LinuxResources struct are pointers to scalars for some reason, presumably to differentiate between set-to-zero and unset when unmarshaling from JSON, despite zero being outside the acceptable range for the corresponding kernel tunables. When creating the OCI spec for a container, the daemon sets the container's OCI spec CPUShares and BlkioWeight parameters to zero when the corresponding Docker container configuration values are zero, signifying unset, despite the minimum acceptable value for CPUShares being two, and BlkioWeight ten. This has gone unnoticed as runC does not distingiush set-to-zero from unset as it also uses zero internally to represent unset for those fields. However, kata-containers v3.2.0-alpha.3 tries to apply the explicit-zero resource parameters to the container, exactly as instructed, and fails loudly. The OCI runtime-spec is silent on how the runtime should handle the case when those parameters are explicitly set to out-of-range values and kata's behaviour is not unreasonable, so the daemon must therefore be in the wrong.

Translate unset values in the Docker container's resources HostConfig to omit the corresponding fields in the container's OCI spec when starting and updating a container in order to maximize compatibility with runtimes.

- What I did

- How I did it

- How to verify it

- Description for the changelog

  • Improved compatibility with OCI runtimes which are more strict than runC about resource constraint parameter values

- A picture of a cute animal (not mandatory but encouraged)

Switch to using t.TempDir() instead of rolling our own.

Clean up mounts leaked by the tests as otherwise the tests fail due to
the leaked mounts because unlike the old cleanup code, t.TempDir()
cleanup does not ignore errors from os.RemoveAll.

Signed-off-by: Cory Snider <csnider@mirantis.com>
Comment on lines +54 to +56
// HORRIBLE HACK: clean up shm mounts leaked by some tests. Otherwise the
// offending tests would fail due to the mounts blocking the temporary
// directory from being cleaned up.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked: only the tests are impacted. The shm mounts are not leaked on container exit—though I'm not sure where in the code they get unmounted.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I was running into this one as well in #45474 (comment)
and https://github.com/moby/moby/pull/43346/files#diff-7044cd74671fe1aa629c3374b3b65fb57e7d47c6a9a20a23b539ee1425a88f2eL19-L52

(Not sure why I didn't use c.ShmPath though 🤔

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall looks good; one comment on the type change for Resources

Comment on lines +54 to +56
// HORRIBLE HACK: clean up shm mounts leaked by some tests. Otherwise the
// offending tests would fail due to the mounts blocking the temporary
// directory from being cleaned up.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I was running into this one as well in #45474 (comment)
and https://github.com/moby/moby/pull/43346/files#diff-7044cd74671fe1aa629c3374b3b65fb57e7d47c6a9a20a23b539ee1425a88f2eL19-L52

(Not sure why I didn't use c.ShmPath though 🤔

Comment on lines 989 to 992
if r.BlkioWeight != 0 {
w := r.BlkioWeight
specResources.BlockIO.Weight = &w
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely not a blocker, but would it make sense to have a getBlkioWeight() func (like we do for the other options above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. The other options require non-trivial processing, unlike this one. It could make sense to factor out a getBlockIOResources() func like getMemoryResources(), getCPUResources() and getPidsLimit(), though.

Comment on lines 994 to 995
if s.Linux.Resources != nil && len(s.Linux.Resources.Devices) > 0 {
specResources.Devices = s.Linux.Resources.Devices
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this, I feel like WithResources() is doing too much, or we should split out a WithBlkIO() - it looks like this part is here, to prevent WithResources() from overwriting Devices (which probably is set separately), so trying to merge multiple options.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, ensuring that mutations to the spec by WithDevices() are not clobbered is not WithResources() doing too much. It's just WithResources() being a good(-ish) implementation of a functional option, and what allows the functional options pattern to be used to factor distinct concerns into distinct option functions. Factoring out a WithBlkIO() function would also have to ensure that it does not clobber changes to the spec applied by WithResources() and other options.

That all being said, the way that WithResources() goes about not clobbering existing LinuxResources fields it does not otherwise touch is not particularly forward-compatible as it will zero out all other fields aside from Devices. I'll fix that.

type Resources specs.LinuxResources
type Resources = specs.LinuxResources
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this needed for this PR? I feel like this should be separate (and possibly when doing so, mark Resources deprecated in favour of specs.LinuxResources

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It avoids having to explicitly cast types.Resources values to specs.LinuxResources. And the type serves a purpose: it allows the libcontainerd Task interface to be have a single definition which is applicable for both Windows and Linux platforms.

// Resources defines updatable container resource values.
type Resources struct{}

UpdateResources(ctx context.Context, resources *Resources) error

I am sure it would have been written as a type alias originally, if it existed at the time. (Proper type aliases were added in Go 1.9.)

// go doesn't like the alias in 1.8, this means this need to be
// platform specific
return t.Update(ctx, containerd.WithResources((*specs.LinuxResources)(resources)))

I can split this out into its own commit if you feel strongly about it, but I don't think it's a significant enough change to warrant a separate PR. While technically a breaking change in the strictest sense, even Go's own apidiff tooling does not consider merging types to be worth flagging.

@thaJeztah
Copy link
Member

Actually; looks like there's some test-failures that may be related 😅

Many of the fields in LinuxResources struct are pointers to scalars for
some reason, presumably to differentiate between set-to-zero and unset
when unmarshaling from JSON, despite zero being outside the acceptable
range for the corresponding kernel tunables. When creating the OCI spec
for a container, the daemon sets the container's OCI spec CPUShares and
BlkioWeight parameters to zero when the corresponding Docker container
configuration values are zero, signifying unset, despite the minimum
acceptable value for CPUShares being two, and BlkioWeight ten. This has
gone unnoticed as runC does not distingiush set-to-zero from unset as it
also uses zero internally to represent unset for those fields. However,
kata-containers v3.2.0-alpha.3 tries to apply the explicit-zero resource
parameters to the container, exactly as instructed, and fails loudly.
The OCI runtime-spec is silent on how the runtime should handle the case
when those parameters are explicitly set to out-of-range values and
kata's behaviour is not unreasonable, so the daemon must therefore be in
the wrong.

Translate unset values in the Docker container's resources HostConfig to
omit the corresponding fields in the container's OCI spec when starting
and updating a container in order to maximize compatibility with
runtimes.

Signed-off-by: Cory Snider <csnider@mirantis.com>
Audit the OCI spec options used for Linux containers to ensure they are
less order-dependent. Ensure they don't assume that any pointer fields
are non-nil and that they don't unintentionally clobber mutations to the
spec applied by other options.

Signed-off-by: Cory Snider <csnider@mirantis.com>
@corhere corhere force-pushed the fix-zeroes-in-linux-resources branch from 3c52bbe to 8a094fe Compare June 6, 2023 17:10
@tianon tianon requested a review from thaJeztah June 8, 2023 22:48
Copy link
Member

@cpuguy83 cpuguy83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants