Ephemeral storage requests on initContainers can cause pod to be unschedulable #96083

Closed
jmcmeek opened this issue Nov 1, 2020 · 8 comments · Fixed by #96092
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling.

Comments

@jmcmeek
Contributor

jmcmeek commented Nov 1, 2020

What happened:

In a v1.18 cluster with the LocalStorageCapacityIsolation feature gate disabled, setting an ephemeral-storage request or limit causes the pod to be unschedulable:
Warning FailedScheduling 56s (x3 over 119s) default-scheduler 0/3 nodes are available: 3 Insufficient ephemeral-storage.

If the ephemeral-storage request is removed the pod is scheduled.

This behavior exists in v1.19 and a recent v1.20 build.

What you expected to happen:

In previous releases, such pods were scheduled. With the feature gate disabled, I expect ephemeral-storage requests and limits to be ignored rather than enforced.

How to reproduce it (as minimally and precisely as possible):

Disable the LocalStorageCapacityIsolation feature gate by adding --feature-gates=LocalStorageCapacityIsolation=false to the kube-scheduler.

Create a pod like this:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
    resources:
      requests:
        ephemeral-storage: 1024M
      limits:
        ephemeral-storage: 1024M
  initContainers:
  - name: init-myservice
    image: busybox:1.28
    command: ['sh', '-c', "echo waiting for myservice; sleep 7;"]
    resources:
      requests:
        cpu: 500m
        ephemeral-storage: 2M
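
For readers following along, here is a rough model of why this manifest might be rejected. Kubernetes documents a pod's effective request for a resource as the larger of (a) the sum over app containers and (b) the largest init-container request. The sketch below assumes, purely for illustration, that the feature-gate check is applied to the app-container sum but not to the init-container maximum, and that the node's allocatable ephemeral-storage is treated as 0 when the gate is off. None of this is confirmed by the thread. The function names, the `gate_enabled` flag, and the zero allocatable value are all hypothetical and are not taken from kube-scheduler.

```python
# Decimal SI suffixes used by Kubernetes quantities (small subset for this example).
_SUFFIXES = {"k": 10**3, "M": 10**6, "G": 10**9}


def parse_bytes(quantity: str) -> int:
    """Parse a limited subset of Kubernetes quantity strings (e.g. '2M') into bytes."""
    if quantity and quantity[-1] in _SUFFIXES:
        return int(quantity[:-1]) * _SUFFIXES[quantity[-1]]
    return int(quantity)


def effective_request(app_requests, init_requests):
    """Documented rule: max(sum of app containers, largest init container)."""
    return max(sum(app_requests), max(init_requests, default=0))


def buggy_request(app_requests, init_requests, gate_enabled):
    """HYPOTHETICAL failure mode: the gate only zeroes the app-container sum."""
    app_total = sum(app_requests) if gate_enabled else 0
    return max(app_total, max(init_requests, default=0))


def gated_request(app_requests, init_requests, gate_enabled):
    """HYPOTHETICAL consistent behavior: the gate zeroes the whole request."""
    if not gate_enabled:
        return 0
    return effective_request(app_requests, init_requests)


def fits(request, allocatable):
    """A pod fits when its request does not exceed the node's allocatable amount."""
    return request <= allocatable


# Values from the manifest above: app container 1024M, init container 2M.
app = [parse_bytes("1024M")]
init = [parse_bytes("2M")]
allocatable = 0  # ASSUMPTION: treated as 0 when the gate is disabled

print(fits(buggy_request(app, init, gate_enabled=False), allocatable))
print(fits(gated_request(app, init, gate_enabled=False), allocatable))
```

Under these assumptions, the first check fails because the 2M init-container request is still counted against an allocatable value of 0, while the second passes because the whole request is ignored.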

Anything else we need to know?:

The problem was originally seen after moving from IBM Cloud Red Hat OpenShift v4.4 to v4.5. Pods that worked in 4.4 (Kubernetes v1.17) remained in a Pending state in OpenShift v4.5 (Kubernetes 1.18). We discovered that the IBM Cloud OpenShift service has the LocalStorageCapacityIsolation feature-gate disabled while Red Hat has enabled the feature. We then replicated the behavior with a plain Kubernetes cluster.

Environment:

  • Kubernetes version (use kubectl version): v1.18.10, v1.19.3, v1.20.0
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:
@jmcmeek jmcmeek added the kind/bug Categorizes issue or PR as related to a bug. label Nov 1, 2020
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Nov 1, 2020
@k8s-ci-robot
Contributor

@jmcmeek: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Nov 1, 2020
@jmcmeek jmcmeek changed the title Ephereral storage requests on initContainers can cause pod to be unschedulable Ephemeral storage requests on initContainers can cause pod to be unschedulable Nov 1, 2020
@jmcmeek
Contributor Author

jmcmeek commented Nov 1, 2020

/sig sig-scheduling

@k8s-ci-robot
Contributor

@jmcmeek: The label(s) sig/sig-scheduling cannot be applied, because the repository doesn't have them

In response to this:

/sig sig-scheduling


@jmcmeek
Contributor Author

jmcmeek commented Nov 1, 2020

/sig scheduling

@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 1, 2020
@Huang-Wei
Member

I can confirm it's a bug. Will work on it.

/assign

@jmcmeek
Contributor Author

jmcmeek commented Nov 2, 2020

Clarifying the recreate steps:

I disabled the LocalStorageCapacityIsolation feature gate by adding --feature-gates=LocalStorageCapacityIsolation=false to the kube-scheduler, since this behavior seems to be within the scheduler.

@Huang-Wei
Member

Thanks @jmcmeek for confirming!

@Huang-Wei
Member

@jmcmeek This bug has been fixed in the master branch and will be backported to 1.18 and 1.19 soon.
