Local Ephemeral Storage Capacity Isolation #361

Open
jingxu97 opened this issue Jul 26, 2017 · 82 comments
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. sig/storage Categorizes an issue or PR as relevant to SIG Storage. stage/stable Denotes an issue tracking an enhancement targeted for Stable/GA status tracked/no Denotes an enhancement issue is NOT actively being tracked by the Release Team
Milestone

Comments

@jingxu97
Contributor

jingxu97 commented Jul 26, 2017

Feature Description

@k8s-ci-robot k8s-ci-robot added sig/storage Categorizes an issue or PR as relevant to SIG Storage. kind/feature Categorizes issue or PR as related to a new feature. labels Jul 26, 2017
@jingxu97 jingxu97 added the stage/alpha Denotes an issue tracking an enhancement targeted for Alpha status label Jul 26, 2017
@jingxu97 jingxu97 self-assigned this Jul 26, 2017
@idvoretskyi idvoretskyi added this to the 1.8 milestone Jul 26, 2017
@idvoretskyi
Member

idvoretskyi commented Sep 5, 2017

@jingxu97 @kubernetes/sig-storage-feature-requests any updates for 1.8? Is this feature still on track for the release?

@jingxu97
Contributor Author

jingxu97 commented Sep 5, 2017

This feature is on track for 1.8. Details are in #43607

@idvoretskyi
Member

idvoretskyi commented Sep 12, 2017

@jingxu97 please, update the features tracking board with the relevant data.

@jingxu97 jingxu97 changed the title Local Storage Capacity Isolation Local Ephemeral Storage Capacity Isolation Oct 27, 2017
@saad-ali saad-ali modified the milestones: v1.8, v1.10 Jan 23, 2018
@saad-ali
Member

saad-ali commented Jan 23, 2018

We intend to move local ephemeral storage to beta in 1.10.

@idvoretskyi idvoretskyi added stage/beta Denotes an issue tracking an enhancement targeted for Beta status tracked/yes Denotes an enhancement issue is actively being tracked by the Release Team and removed stage/alpha Denotes an issue tracking an enhancement targeted for Alpha status labels Jan 29, 2018
@Bradamant3
Member

Bradamant3 commented Mar 2, 2018

@jingxu97 it looks as though docs need updating for 1.10. Can you please submit a docs PR as soon as possible (it's now officially late), and update the 1.10 feature tracking spreadsheet? Thanks!

@jingxu97
Contributor Author

jingxu97 commented Mar 3, 2018

@Bradamant3
Member

Bradamant3 commented Mar 5, 2018

Hi @jingxu97 -- Thanks for the docs PR. The spreadsheet is updated. Please note that you need to rebase your docs PR against the 1.10 docs branch -- we branch docs differently from the code repos. Thanks again! Jennifer

@warmchang

warmchang commented Mar 7, 2018

Hi @jingxu97 @saad-ali, local ephemeral storage management only applies to the root partition in release-1.9 (alpha). Does it support the runtime partition in release-1.10 (beta)?

@jingxu97
Contributor Author

jingxu97 commented Mar 7, 2018

@warmchang, the beta version will be the same as alpha, which only applies to the root partition. We currently don't plan to support a separate runtime partition due to the complexity. Could you please let me know what use case you have that needs different partitions? Thanks!

@warmchang

warmchang commented Mar 7, 2018

@jingxu97 I checked the original proposal, local-storage-overview, and it includes the "Runtime Partition" description.

One scenario:
When K8s is deployed on an IaaS platform (OpenStack or VMware), the node VMs mount a cloud disk as the "Docker Root Dir" instead of using the VMs' system root partitions, based on considerations such as disk capacity.
How, then, should the ephemeral storage for the containers running on those nodes be managed? Thanks!

@dashpole
Contributor

dashpole commented Mar 7, 2018

@warmchang the runtime partition still has the same support it has had in the past. The kubelet will monitor the runtime partition, and perform evictions if space runs low based on the highest consumers of the runtime partition.
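
As a rough illustration of "evict the highest consumer when space runs low", here is a minimal Python sketch. It is not the kubelet's actual ranking logic; the function name, the threshold parameter, and the per-pod usage map are all hypothetical.

```python
from typing import Dict, Optional


def pick_eviction_candidate(
    available_bytes: int,
    threshold_bytes: int,
    usage_by_pod: Dict[str, int],
) -> Optional[str]:
    """Return the pod using the most space if the partition is low, else None.

    Simplified illustration only: the inputs are assumed to be measured
    elsewhere, and no other ranking criteria are modeled.
    """
    # Nothing to do while free space is at or above the threshold.
    if available_bytes >= threshold_bytes or not usage_by_pod:
        return None
    # Highest consumer wins; iterating over sorted names makes ties deterministic.
    return max(sorted(usage_by_pod), key=lambda pod: usage_by_pod[pod])
```

For example, with 10 bytes free against a threshold of 50, `pick_eviction_candidate(10, 50, {"a": 1, "b": 5})` selects `"b"`.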

In your example, I'm not sure why using a cloud disk requires you to split the kubelet's and the runtime's partitions.

@warmchang

warmchang commented Mar 8, 2018

@dashpole Before this Local Ephemeral Storage feature existed, a container's writable layer could write temporary files (such as logs) without limit, fill the disk, and cause the operating system to hang. To prevent this, we mount a separate partition for the Docker Root Dir.

We tried the feature in this scenario and found that it cannot limit the capacity of the container.

From a technical point of view, what is the difference between the capacity limits of the runtime partition and the root partition? Thanks!

@dashpole
Contributor

dashpole commented Mar 12, 2018

> We tried the feature in this scenario and found that it cannot limit the capacity of the container.

The behavior you describe should work regardless of this feature. Make sure you have --root-dir set correctly. Docker reports its root directory to the kubelet, so as long as your images are stored on the same partition that contains /var/lib/docker (or whatever your docker root dir is), this should work correctly.
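
One way to check whether two directories sit on the same partition is to compare the device IDs reported by `stat`. The sketch below is a minimal illustration; the helper name is hypothetical, and the paths in the example are only the common defaults, so substitute your own Docker root dir and kubelet `--root-dir`.

```python
import os


def same_filesystem(path_a: str, path_b: str) -> bool:
    """Return True if both paths are backed by the same device (same st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


if __name__ == "__main__":
    # Example paths only (assumed defaults); adjust to your node's layout.
    docker_root = "/var/lib/docker"
    kubelet_root = "/var/lib/kubelet"
    if os.path.exists(docker_root) and os.path.exists(kubelet_root):
        print(docker_root, kubelet_root, same_filesystem(docker_root, kubelet_root))
```

Note that `os.stat` follows symlinks, so a symlinked directory is reported under the device of its target.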

@warmchang

warmchang commented Mar 16, 2018

@dashpole Very useful skill!

After verification (ping @zhangxiaoyu-zidif ), the expected effect can be achieved. 👏👏

[root@k8s-master-controller:/]$ kubectl get rs
NAME                             DESIRED   CURRENT   READY     AGE
busybox-apps-v1beta1-7f8dd8d89   1         1         1         21m
[root@k8s-master-controller:/]$ kubectl get pod --show-all
NAME                                   READY     STATUS    RESTARTS   AGE
busybox-apps-v1beta1-7f8dd8d89-kh6xc   1/1       Running   0          19m
busybox-apps-v1beta1-7f8dd8d89-mg7ls   0/1       Evicted   0          21m
[root@k8s-master-controller:/]$ kubectl describe pod busybox-apps-v1beta1-7f8dd8d89-mg7ls
Name:           busybox-apps-v1beta1-7f8dd8d89-mg7ls
Namespace:      default
Node:           172.160.134.17/
Start Time:     Mon, 23 Apr 2018 09:27:02 +0800
Labels:         app=busybox-apps-v1beta1
                pod-template-hash=394884845
Annotations:    kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"busybox-apps-v1beta1-7f8dd8d89","uid":"6c817aea-4695-11e8-9103-f...
Status:         Failed
Reason:         Evicted
Message:        The node was low on resource: ephemeral-storage.
IP:
Created By:     ReplicaSet/busybox-apps-v1beta1-7f8dd8d89
Controlled By:  ReplicaSet/busybox-apps-v1beta1-7f8dd8d89
Containers:
  busybox:
    Image:  busybox
    Port:   <none>
    Command:
      sleep
      3600
    Limits:
      ephemeral-storage:  50Mi
    Requests:
      ephemeral-storage:  50Mi
    Environment:          <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-7tchh (ro)
Volumes:
  default-token-7tchh:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-7tchh
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     <none>
Events:
  Type     Reason                 Age   From                     Message
  ----     ------                 ----  ----                     -------
  Normal   Scheduled              22m   default-scheduler        Successfully assigned busybox-apps-v1beta1-7f8dd8d89-mg7ls to 172.160.134.17
  Normal   SuccessfulMountVolume  22m   kubelet, 172.160.134.17  MountVolume.SetUp succeeded for volume "default-token-7tchh"
  Normal   Pulled                 22m   kubelet, 172.160.134.17  Container image "busybox" already present on machine
  Normal   Created                22m   kubelet, 172.160.134.17  Created container
  Normal   Started                22m   kubelet, 172.160.134.17  Started container
  Warning  Evicted                19m   kubelet, 172.160.134.17  pod ephemeral local storage usage exceeds the total limit of containers {{52428800 0} {<nil>} 50Mi BinarySI}
  Normal   Killing                19m   kubelet, 172.160.134.17  Killing container with id docker://busybox:Need to kill Pod
[root@k8s-master-controller:/]$
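
The eviction event above prints the limit both as `52428800` and as `50Mi`. The following minimal Python sketch shows that these are the same value and models the "usage exceeds the total limit of containers" check as a plain sum comparison. It is a simplification for illustration, not the kubelet's code; the helper names are hypothetical and only the `Ki`/`Mi`/`Gi`/`Ti` suffixes are handled.

```python
from typing import List

# Binary (power-of-two) suffixes only; other quantity forms are not handled here.
BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def parse_binary_quantity(quantity: str) -> int:
    """Convert a quantity such as "50Mi" (or a plain integer string) to bytes."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def exceeds_pod_limit(usage_bytes: int, container_limits: List[str]) -> bool:
    """True if pod usage is above the sum of its containers' ephemeral-storage limits."""
    total_limit = sum(parse_binary_quantity(q) for q in container_limits)
    return usage_bytes > total_limit


if __name__ == "__main__":
    print(parse_binary_quantity("50Mi"))  # 52428800
```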

@zhangxiaoyu-zidif
Member

zhangxiaoyu-zidif commented Mar 16, 2018

that's great for us. thanks for your help =) @dashpole

@justaugustus
Member

justaugustus commented Apr 17, 2018

@jingxu97 @vishh
Any plans for this in 1.11?

If so, can you please ensure the feature is up-to-date with the appropriate:

  • Description
  • Milestone
  • Assignee(s)
  • Labels:
    • stage/{alpha,beta,stable}
    • sig/*
    • kind/feature

cc @idvoretskyi

@krol3

krol3 commented Jul 6, 2022

Hello @jingxu97 👋, 1.25 Release Docs shadow here.
This enhancement is marked as ‘Needs Docs’ for 1.25 release.

Please follow the steps detailed in the documentation to open a PR against dev-1.25 branch in the k/website repo. This PR can be just a placeholder at this time, and must be created by August 4.
Also, take a look at Documenting for a release to familiarize yourself with the docs requirement for the release.

Thank you!

@jingxu97
Contributor Author

jingxu97 commented Jul 11, 2022

@BenTheElder thank you for bringing up the issue. The problem here is that on some systems, such as rootless kind, it is hard (or not possible) to get storage usage information, and the kubelet will fail to start.

If storage usage information is not available on some systems, I am wondering whether the following logic can help avoid blocking GA while not breaking the existing system behavior:

  • Add a bool enableLocalStorageCapacityIsolation (default=true) to the kubelet configuration.
  • For systems that cannot support detecting root disk usage, set enableLocalStorageCapacityIsolation=false in the kubelet configuration.
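
The proposal above amounts to a boolean that defaults to on and is switched off only by an explicit setting. A minimal Python sketch of that default-and-opt-out behavior follows; the class and loader are hypothetical stand-ins, and the key name is the one proposed in this comment, not a confirmed kubelet API.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class KubeletConfigSketch:
    # Defaults to True so existing clusters keep their current behavior.
    enable_local_storage_capacity_isolation: bool = True


def load_config(raw: Dict[str, Any]) -> KubeletConfigSketch:
    """Build the sketch config from a parsed config mapping.

    A missing key means "enabled"; only an explicit false value turns it off.
    """
    enabled = bool(raw.get("enableLocalStorageCapacityIsolation", True))
    return KubeletConfigSketch(enable_local_storage_capacity_isolation=enabled)
```

With this shape, `load_config({})` leaves isolation enabled, while a rootless setup would pass `{"enableLocalStorageCapacityIsolation": False}`.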

Reference link kubernetes-sigs/kind#2411

@dchen1107
Member

dchen1107 commented Jul 19, 2022

Thanks for coming to SIG Node to review earlier today.

Re: #361 (comment)

I think local ephemeral storage support is a very critical feature for a cluster. From our earlier experience with K8s and GKE, a cluster without it enabled cannot be treated as production-ready. So far we only know of kind clusters and minikube disabling the feature, not any production offerings. In this case, can we make the feature on by default, but introduce a config to disable it for those systems?

I have a concern about introducing another kubelet config for such a critical feature. If it is really needed, can we automatically detect whether the system has this capability?

@Priyankasaggu11929
Member

Priyankasaggu11929 commented Jul 21, 2022

Hello @jingxu97 👋

Checking in once more as we approach 1.25 code freeze at 01:00 UTC on Wednesday, 3rd August 2022.

Please ensure the following items are completed:

  • All PRs to the Kubernetes repo that are related to your enhancement are linked in the above issue description (for tracking purposes).
  • All PRs are fully merged by the code freeze deadline.

I was unable to find any k/k PRs promoting KEP 361 to stable. Could you please point me to any related PRs that are currently open, or confirm whether they are yet to be raised?

The status of the enhancement is currently marked as at-risk.

Please also update the issue description with the relevant links for tracking purposes. Thank you so much!

@BenTheElder
Member

BenTheElder commented Jul 25, 2022

> So far we only know of kind clusters and minikube disabling the feature, not any production offerings.

I'm not aware of any production offerings, but so far all the tools I've found that run Kubernetes rootless require opting out of the LocalStorageCapacityIsolation feature gate. More can be found with https://grep.app/search?q=LocalStorageCapacityIsolation

https://github.com/podenv/silverkube

https://github.com/saschagrunert/kubernix

https://github.com/rootless-containers/usernetes/

k3d-io/k3d#802

canonical/microk8s#1587 (comment)

> In this case, can we make the feature on by default, but introduce a config to disable it for those systems?

I think this makes sense. It can be a kubelet config field.

> I have a concern about introducing another kubelet config for such a critical feature. If it is really needed, can we automatically detect whether the system has this capability?

I feel like it might be concerning if it turns off automatically and admins are not aware that isolation is not being respected.
With a config field, disabling it is explicit. If there's a bug in this check, it might be difficult to catch and problematic for production.

Unless there's a status on the pod or something, it might be difficult to track down when isolation is being ignored because of automatic disablement.
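
The difference between the two approaches can be sketched as follows. This is a hypothetical Python illustration of the argument, not kubelet code: in the explicit mode an unsupported system fails loudly unless the admin opts out, while in the auto-detect mode a failed (or buggy) check quietly turns isolation off.

```python
def explicit_mode(enabled_in_config: bool, usage_readable: bool) -> bool:
    """Isolation is off only when the admin says so; unsupported systems fail loudly."""
    if not enabled_in_config:
        return False  # explicit opt-out, visible in the configuration
    if not usage_readable:
        raise RuntimeError(
            "cannot read disk usage; disable isolation explicitly in the config"
        )
    return True


def auto_detect_mode(usage_readable: bool) -> bool:
    """Isolation silently follows the detection result, including a wrong one."""
    return usage_readable
```

In the second function nothing tells the admin that limits are no longer enforced, which is the tracking problem described above.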

@AnaMMedina21

AnaMMedina21 commented Jul 27, 2022

Hi @jingxu97 👋🏽

I’m one of the 1.25 Release Comms Shadows. Thanks for submitting this KEP for a Feature Blog. Code freeze is coming up, and we need to have a placeholder PR up already. Once you have it up, leave a comment here (or Slack me) and I will add it to the tracking sheet.

Let me know if you have any questions or if I can assist in any way :)

@jingxu97
Contributor Author

jingxu97 commented Jul 29, 2022

Hi @AnaMMedina21, thank you for the reminder.
The PR is ready for review: #3422

@AnaMMedina21

AnaMMedina21 commented Jul 29, 2022

@jingxu97 Awesome, Thank you! Just updated our tracking document

@Priyankasaggu11929
Member

Priyankasaggu11929 commented Aug 1, 2022

Hello @jingxu97 👋

Just a gentle reminder from the enhancement team as we approach 1.25 code freeze at 01:00 UTC on Wednesday, 3rd August 2022 (which is almost 2 days from now)

Please plan to have the open k/k PR merged before then:

The status of this enhancement is currently marked as at risk

Thank you

@jingxu97
Contributor Author

jingxu97 commented Aug 3, 2022

The PR to promote this to GA, kubernetes/kubernetes#111513, has already been approved and has LGTM.

@BenTheElder
Member

BenTheElder commented Aug 3, 2022

kubernetes/kubernetes#111513 merged (thanks @jingxu97 !)

I've sent a heads up to each of the projects I've identified that will need to migrate, besides the release note.

@jingxu97
Contributor Author

jingxu97 commented Aug 3, 2022

Many thanks to @BenTheElder for helping with this!

> kubernetes/kubernetes#111513 merged (thanks @jingxu97 !)
>
> I've sent a heads up to each of the projects I've identified that will need to migrate, besides the release note.

@cathchu

cathchu commented Aug 4, 2022

Hey there @jingxu97 👋, 1.25 Release Docs Shadow here!

This enhancement is still marked as ‘Needs Docs’ for 1.25 release.

Tomorrow (August 4th) is the deadline for opening a placeholder PR against dev-1.25 branch in the k/website repo.

Please follow the steps detailed in the documentation to open the PR. This PR can be just a placeholder for now, as final docs PRs are due August 9th.

For more info, take a look at Documenting for a release to familiarize yourself with the docs requirement for the release.

@jingxu97
Contributor Author

jingxu97 commented Aug 4, 2022

The PR is kubernetes/website#35687

@AnaMMedina21

AnaMMedina21 commented Aug 4, 2022

Hi @jingxu97
I just checked on #3422 today and we don't have any blog content on the PR.

The blog PR needs to be made against k/website . An example of a blog post can be seen on kubernetes/website#33979.

Let me know if you have any questions, and I'm happy to help!

@AnaMMedina21

AnaMMedina21 commented Aug 11, 2022

Hi @jingxu97!

Just wanted to ping you again about the PR for the feature blog post. Let me know if you have any questions/I can assist!

@jingxu97
Contributor Author

jingxu97 commented Aug 15, 2022

I posted a new PR for the doc change due to some issues with my old one: kubernetes/website#35989

@jingxu97
Contributor Author

jingxu97 commented Aug 16, 2022

@AnaMMedina21 the blog PR has been created: kubernetes/website#36025
Thank you!

@rhockenbury rhockenbury added tracked/no Denotes an enhancement issue is NOT actively being tracked by the Release Team and removed tracked/yes Denotes an enhancement issue is actively being tracked by the Release Team labels Sep 11, 2022
@KeithTt

KeithTt commented Oct 12, 2022

Same problem here; my Docker Root Dir is /opt/docker.

May I create a symlink at /var/lib/docker that points to it?

kubectl drain node
systemctl stop kubelet
systemctl stop docker

mv /var/lib/docker /var/lib/docker.bak
ln -sv /opt/docker /var/lib/docker

Or link /var/lib/kubelet to /opt/kubelet

ln -sv /opt/kubelet /var/lib/kubelet
