Allow setting custom volume sizes for Windows containers with containerd #109702
Please note that we're already in Test Freeze for the
/hold
/cc @dcantah
/lgtm
/triage accepted
/hold cancel
@marosset: The following test failed, say
})

ginkgo.It("validate rootfs size can be set larger than 20Gb", func() {
Do we want a test that validates it can be smaller? The docs say "Some users may want to override this default and configure the free space to a smaller or larger value."
It cannot be smaller than 20GB at the moment; shrinking from the default is a harder problem. I've been thinking of some ways to do this in containerd, but currently that's not in main.
This.
I can update the comments in code to say it can be set larger than the default.
Out of curiosity, do we know of anybody who has a use case for it to be smaller?
Is that a limitation in containerd only? https://docs.microsoft.com/en-us/virtualization/windowscontainers/manage-containers/container-storage#storage-limits seems to suggest that it is possible in docker.
It's not in Docker either, IIRC. I think the call succeeds but you stay at 20GB.
OK, good to know it isn't a missing feature. We may need to get those docs updated then.
Yeah, we'll update the docs for the release (once we settle on which fields in the pod spec control this).
@@ -188,3 +196,94 @@ func testPodWithROVolume(podName string, source v1.VolumeSource, path string) *v
		},
	}
}

func getNodeContainerRuntimeAndVersion(n v1.Node) (string, semver.Version, error) {
Should this be in our utils.go?
Yea, probably :)
I'll update
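The body of getNodeContainerRuntimeAndVersion is not shown in this excerpt. Below is a minimal stdlib-only sketch of the parsing such a helper needs, assuming the node reports its runtime in Status.NodeInfo.ContainerRuntimeVersion as "<runtime>://<version>" (e.g. "containerd://1.6.2"). The parseRuntimeVersion function and the version struct are hypothetical; version stands in for semver.Version and only models major.minor.patch.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// version is a hypothetical stand-in for semver.Version (major.minor.patch only).
type version struct {
	Major, Minor, Patch uint64
}

// parseRuntimeVersion splits a ContainerRuntimeVersion string such as
// "containerd://1.6.2" into the runtime name and its parsed version.
func parseRuntimeVersion(s string) (string, version, error) {
	name, ver, ok := strings.Cut(s, "://")
	if !ok || name == "" {
		return "", version{}, fmt.Errorf("unexpected runtime version format %q", s)
	}
	// Drop any pre-release or build suffix, e.g. "1.7.0-beta.1" -> "1.7.0".
	if i := strings.IndexAny(ver, "-+"); i >= 0 {
		ver = ver[:i]
	}
	parts := strings.Split(strings.TrimPrefix(ver, "v"), ".")
	if len(parts) != 3 {
		return "", version{}, fmt.Errorf("expected major.minor.patch, got %q", ver)
	}
	var nums [3]uint64
	for i, p := range parts {
		n, err := strconv.ParseUint(p, 10, 64)
		if err != nil {
			return "", version{}, fmt.Errorf("invalid version component %q: %w", p, err)
		}
		nums[i] = n
	}
	return name, version{nums[0], nums[1], nums[2]}, nil
}

func main() {
	name, v, err := parseRuntimeVersion("containerd://1.6.2")
	fmt.Println(name, v, err)
}
```

An e2e test could then skip unless the runtime is "containerd" and the parsed version is new enough to support the feature; the minimum version is not stated in this thread.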
Command: []string{
	"powershell.exe",
	"-Command",
	"if (-not ((Get-PsDrive -Name C).Free -gt 21474836480)) { exit 1 } else { exit 0 }",
Why greater than 21GB? Is there a reason not to test for 30GB? I guess the scratch space won't be exactly the same size every time?
I picked this number because it is greater than the default (20GB), so we wouldn't get false positives.
With this PowerShell command I can only get free space, not total space, so I wanted some wiggle room.
This e2e test is primarily validating that the field on the pod spec gets passed all the way to the CRI.
I think other layers (containerd or HCS) can test that the size matches exactly, since that is much easier to do from outside of the container.
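For reference, the 21474836480 threshold in the PowerShell command is exactly 20 GiB (20 × 1024³ bytes), or roughly 21.5 GB in decimal units, which is why it reads as "21gb" above. The sketch below reproduces the same check in Go. exceedsDefault is a hypothetical helper, and its freeBytes argument stands in for (Get-PsDrive -Name C).Free.

```go
package main

import "fmt"

const (
	gib = int64(1) << 30
	// defaultScratchSize is the 20 GiB default size of a Windows container's writable layer.
	defaultScratchSize = 20 * gib
)

// exceedsDefault mirrors the e2e check: it reports whether the free space
// seen inside the container is strictly greater than the 20 GiB default.
// Free space can never exceed total size, so a true result implies the
// volume was created larger than the default.
func exceedsDefault(freeBytes int64) bool {
	return freeBytes > defaultScratchSize
}

func main() {
	fmt.Println(defaultScratchSize) // prints 21474836480
	fmt.Println(exceedsDefault(19*gib), exceedsDefault(29*gib))
}
```

Because only free space is observable, the check is one-sided: it can prove the volume grew, but not by how much. That matches the point above about leaving exact-size validation to containerd or HCS.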
OK, works for me, especially if the other layers are doing the validation.
// override this default.
// https://docs.microsoft.com/virtualization/windowscontainers/manage-containers/container-storage#scratch-space
// https://docs.microsoft.com/virtualization/windowscontainers/manage-containers/container-storage#storage-limits
ephemeralStorageLimit := container.Resources.Limits.StorageEphemeral().Value()
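The rest of this hunk is not shown. Below is a minimal sketch of the mapping under discussion. It assumes the container's ephemeral-storage limit in bytes is forwarded as the CRI RootfsSizeInBytes value only when it exceeds the 20 GiB default, because, per the discussion above, the scratch volume cannot currently be shrunk. windowsResources and rootfsSizeFor are hypothetical names and are not the kubelet's actual code.

```go
package main

import "fmt"

// defaultScratchSize is the 20 GiB default size of a Windows container's writable layer.
const defaultScratchSize int64 = 20 << 30

// windowsResources is a hypothetical stand-in for the CRI Windows container
// resources message; only the field relevant here is modelled.
type windowsResources struct {
	// RootfsSizeInBytes of 0 means "use the runtime default".
	RootfsSizeInBytes int64
}

// rootfsSizeFor converts an ephemeral-storage limit (0 when no limit is set)
// into a rootfs size. Limits at or below the default are ignored because the
// runtime can only grow the scratch volume, not shrink it.
func rootfsSizeFor(ephemeralStorageLimit int64) windowsResources {
	if ephemeralStorageLimit > defaultScratchSize {
		return windowsResources{RootfsSizeInBytes: ephemeralStorageLimit}
	}
	return windowsResources{}
}

func main() {
	fmt.Println(rootfsSizeFor(10 << 30).RootfsSizeInBytes) // 0: keep the default
	fmt.Println(rootfsSizeFor(30 << 30).RootfsSizeInBytes) // 32212254720
}
```

Quantity.Value() returns 0 when no ephemeral-storage limit is set, so the zero case falls through to the runtime default.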
Reading through https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
If the kubelet is managing local ephemeral storage as a resource, then the kubelet measures storage use in:
emptyDir volumes, except tmpfs emptyDir volumes
directories holding node-level logs
writeable container layers
If a Pod is using more ephemeral storage than you allow it to, the kubelet sets an eviction signal that triggers Pod eviction.
It seems that the kubelet tracks more than just the rootfs with the ephemeral storage limit value. RootfsSizeInBytes seems to correspond to the "writable layer" here, but would this mean pods could be evicted if they added an emptyDir volume as well as resizing the rootfs of the container?
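The concern can be made concrete with some simplified arithmetic. The sketch assumes a single-container pod, so the kubelet's pod-level check compares all three measured sources (emptyDir volumes, node-level logs, writable layer) against one ephemeral-storage limit, and that the same limit is also used as the rootfs size. The storageUsage type and wouldEvict helper are hypothetical, and the numbers are illustrative.

```go
package main

import "fmt"

const gib int64 = 1 << 30

// storageUsage is a hypothetical breakdown of what the kubelet measures
// against a pod's ephemeral-storage limit.
type storageUsage struct {
	EmptyDir      int64 // non-tmpfs emptyDir volumes
	Logs          int64 // node-level container logs
	WritableLayer int64 // the container's writable (rootfs/scratch) layer
}

func (u storageUsage) total() int64 {
	return u.EmptyDir + u.Logs + u.WritableLayer
}

// wouldEvict reports whether total usage exceeds the limit, the condition
// under which the kubelet would signal eviction in this simplified model.
func wouldEvict(u storageUsage, limit int64) bool {
	return u.total() > limit
}

func main() {
	limit := 30 * gib // also used as RootfsSizeInBytes in this scenario
	u := storageUsage{EmptyDir: 4 * gib, Logs: 1 * gib, WritableLayer: 26 * gib}
	// The writable layer still has 4 GiB of headroom, but the total is 31 GiB.
	fmt.Println(wouldEvict(u, limit)) // true
}
```

If the rootfs is sized to the full limit, the container can be evicted before its writable layer is full. This is the overlap between the two concepts that the rest of the thread debates.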
On Windows I don't think it is possible to resize the rootfs of the container.
"but would this mean pods would be evicted if they added an emptyDir volume as well as resize the rootfs of the container?"
I believe that would be the case.
Also, if the writable layer is nearly full and the container produces a lot of logs, that could trigger an eviction too.
I went back and forth trying to decide between using limit or request here.
Limit made more sense to me when looking at only the container's ephemeral storage, but I could see request making sense so users can set limits higher than requests to allow for emptyDir usage?
@dcantah - do the volumes created here use 'thin' provisioning?
One concern I have, if they do not, is that it could be possible to over-provision storage capacity on the node if we use request for the size.
@msau42 @ddebroy @mauriciopoppe - Do any of you (or anyone else from SIG-storage) have any thoughts here?
I think I am struggling a little with overloading the ephemeralStorageLimit concept with RootfsSizeInBytes. I am not sure they map 1 to 1, but this is the first time I am looking at it.
I was looking around and it doesn't look like Linux sets the scratch layer sizes anywhere. Does Linux have a similar concept, or is this Windows only?
cc @jingxu97
cc @liggitt
I remember discussing this with him (I can't remember exactly why), and he asked if it would make sense to use ephemeral storage resources here. I think it can make sense, but I'm not sure whether Linux has an equivalent, or what it would be.
The VHD itself for the volume is thin provisioned and expands as data is written; it's nowhere near 20GiB on the host's disk.
That was what I thought - thanks for confirming!
@marosset: PR needs rebase.
Thanks @marosset! In passing, I'd say things like matching parity/behaviour for density-related features can be fixed later, since we know this fixes a concrete problem people have.
@jingxu97 @ddebroy @mauriciopoppe - Can someone from SIG-storage (preferably with some Windows knowledge :P) weigh in here? On Windows the default size for the 'writable container layer' is 20GB, and this cannot be expanded after the VM is created.
/sig storage
I also have concerns about overloading the ephemeral storage request/limit for this usage. The request/limit is for the local storage capacity isolation feature and has a specific meaning there. It already has some complexity of its own, and I don't think we should overload more meaning onto it. Right now this feature is beta and we plan to promote it to GA. However, because some systems cannot correctly get disk usage information (e.g., kind rootless), they have to disable this feature. For allowing a customizable writable container layer size, do you think it is useful to set it at the pod/container level, or just at the cluster level?
The best solution would allow for this to be set per container, but I think being able to set this per pod would be acceptable.
Here are a couple of scenarios that may lead to unexpected outcomes as a result of
Unexpected pod eviction due to verbose logs:
Backward compatibility breaks:
It feels like a distinct parameter to specify the rootfs size (plus validation that it is less than the container's ephemeral storage limit) would be ideal. Today,
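A distinct field with the validation suggested above might look roughly like the sketch below. rootfsSize is a hypothetical per-container setting, since the field is not settled in this thread, and validateRootfsSize is a hypothetical helper. The sketch treats 0 as "unset" and then requires the value to be larger than the 20 GiB default, because shrinking is not supported. When an ephemeral-storage limit is set, the value must also be smaller than that limit.

```go
package main

import (
	"errors"
	"fmt"
)

const defaultScratchSize int64 = 20 << 30 // 20 GiB

// validateRootfsSize checks a hypothetical per-container rootfs size against
// the runtime default and the container's ephemeral-storage limit.
// A rootfsSize of 0 means unset; an ephemeralLimit of 0 means no limit.
func validateRootfsSize(rootfsSize, ephemeralLimit int64) error {
	if rootfsSize == 0 {
		return nil // fall back to the runtime default
	}
	if rootfsSize <= defaultScratchSize {
		return fmt.Errorf("rootfs size %d must be larger than the default of %d bytes", rootfsSize, defaultScratchSize)
	}
	if ephemeralLimit > 0 && rootfsSize >= ephemeralLimit {
		return errors.New("rootfs size must be smaller than the ephemeral-storage limit")
	}
	return nil
}

func main() {
	fmt.Println(validateRootfsSize(30<<30, 40<<30)) // <nil>
	fmt.Println(validateRootfsSize(30<<30, 25<<30))
}
```

Keeping the rootfs size strictly below the limit leaves headroom for logs and emptyDir usage. That headroom addresses the eviction scenario raised earlier in the review.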
Hi @marosset
/milestone clear
We are still having ongoing discussions about the best way to expose this field to users.
/close
@marosset: Closed this PR. In response to this:
Signed-off-by: Mark Rossetti marosset@microsoft.com
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
On Windows, containers are created with a default volume size of 20GB.
With Docker, this can be specified per container with --storage-opt "size=xxGb" or globally by specifying storage-opts in the dockerd config file: https://docs.microsoft.com/virtualization/windowscontainers/manage-containers/container-storage#storage-limits
This PR allows the container volume size to be specified in pod specs and passes that size to CRI container create calls.
Which issue(s) this PR fixes:
Related to #108894 and containerd/containerd#6694
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
/sig windows
/area kubelet