
Default node volume size is too small #780

Closed · stefanprodan opened this issue May 4, 2019 · 11 comments · Fixed by #2317

@stefanprodan
Contributor

Running a dozen workloads like Istio and Weave Cloud agents will result in pods being evicted due to disk pressure. I think we should increase the default volume size from 20GB to 100GB (the default size on GKE).

The --node-volume-size flag has no stated default, so you'll only find out about the 20GB limit after running into disk space issues.
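
For anyone landing here before the default changes, a minimal sketch of overriding the size explicitly; the cluster name, region, and node group values are placeholders, and volumeSize is in GiB:

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster    # placeholder
  region: us-west-2   # placeholder
nodeGroups:
  - name: ng-1
    instanceType: m5.large
    desiredCapacity: 2
    volumeSize: 100   # GiB; the GKE-sized default proposed above
    volumeType: gp2
```

The same override is available on the CLI via the --node-volume-size flag mentioned above.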

@mumoshu
Contributor

mumoshu commented May 4, 2019

Yeah this makes sense. FWIW, coincidentally I have been using 100GB for my own default volume size, which worked well in practice. So I agree with the number too 😃

@whereisaaron
Copy link

Goldilocks disagrees, 80GB is the one true size 😀👍

@errordeveloper
Contributor

I think it's plausible to increase it, but we will need to devise a way of doing it: some instance types use local disks, so attaching EBS volumes to those seems unhelpful.

@whereisaaron

Now that we have launch template support, can users configure custom disk arrangements in the template? Not just a single sized EBS volume, but also two EBS volumes, or local+EBS, e.g. where you need a whole device for Ceph?
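
For context, a launch template can already describe a multi-disk layout at the EC2 level. An illustrative CloudFormation sketch (device names and sizes are invented for the example):

```yaml
# Illustrative only: a user-managed launch template with two EBS volumes.
NodeLaunchTemplate:
  Type: AWS::EC2::LaunchTemplate
  Properties:
    LaunchTemplateData:
      BlockDeviceMappings:
        - DeviceName: /dev/xvda     # root volume for the OS and images
          Ebs:
            VolumeSize: 100         # GiB
            VolumeType: gp2
        - DeviceName: /dev/xvdb     # second, raw device, e.g. for Ceph
          Ebs:
            VolumeSize: 500         # GiB
            VolumeType: gp2
```

Whether eksctl lets users pass such a template through is the open question here.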

@errordeveloper
Contributor

errordeveloper commented May 13, 2019 via email

@urvineet

I'm getting FreeDiskSpaceFailed even though I have defined:

--- eksctl node config

```yaml
nodeGroups:
  - name: Company-private
    instanceType: m5.2xlarge
    labels:
      nodegroup-type: app-workers
    desiredCapacity: 2
    minSize: 2
    maxSize: 10
    privateNetworking: true
    volumeSize: 200
    volumeType: gp2
```

--- kubectl describe node

```
Capacity:
  attachable-volumes-aws-ebs:  25
  cpu:                         8
  ephemeral-storage:           209702892Ki
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      32120772Ki
  pods:                        29
Allocatable:
  attachable-volumes-aws-ebs:  25
  cpu:                         8
  ephemeral-storage:           193262184948
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      32018372Ki
  pods:                        29
System Info:
  Kernel Version:              4.14.106-97.85.amzn2.x86_64
  OS Image:                    Amazon Linux 2
  Operating System:            linux
  Architecture:                amd64
  Container Runtime Version:   docker://18.6.1
  Kubelet Version:             v1.12.7
  Kube-Proxy Version:          v1.12.7
ProviderID:                    aws:///regionb/i-07b9f51130ecb15c9
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests      Limits
  --------                    --------      ------
  cpu                         1810m (22%)   38 (475%)
  memory                      3328Mi (10%)  54230Mi (173%)
  attachable-volumes-aws-ebs  0             0
Events:
  Type     Reason               Age  From     Message
  ----     ------               ---  ----     -------
  Warning  FreeDiskSpaceFailed  46m  kubelet  failed to garbage collect required amount of images. Wanted to free 9439938969 bytes, but freed 2353231631 bytes
```
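
For what it's worth, when chasing FreeDiskSpaceFailed it helps to see what is actually consuming the disk. A rough sketch, run on the node itself over SSH (Docker runtime, as in the output above):

```sh
df -h /var/lib/docker    # where image layers live on a default Docker setup
docker system df         # usage broken down by images, containers, volumes
docker image prune -a    # reclaim space from unused images (destructive)
```

The kubelet's image GC behaviour is tunable via --image-gc-high-threshold and --image-gc-low-threshold, but a larger volume is usually the simpler fix.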

@philvarner

+1 for making it larger. I'm trying to deploy JupyterHub to k8s and am new to both tools. Even upping this to 50GB still caused DiskPressure warnings because of the large Docker images pulled for JupyterHub and my notebooks (which were a bloated 17GB). As a newbie, it was confusing when the DiskPressure from running out of space caused things to start crashing.
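
A quick way to confirm that disk pressure (rather than something else) is the culprit, assuming kubectl access to the cluster:

```sh
kubectl describe nodes | grep -B2 -A2 DiskPressure    # node conditions
kubectl get events --all-namespaces | grep -i evict   # recent evictions
```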

@whereisaaron

17GB image!?! 😮 I'm guessing you don't run that with AlwaysPull? 😄

In the past eksctl has generally defaulted to a minimal-cost cluster, e.g. small instance types, a public-only cluster with no NAT, or one NAT gateway rather than one per AZ. So defaulting to what some might see as a small size, but one big enough for a collection of the very commonly used nginx-, php-, and mysql-sized images, seems appropriate: the sort of images and workloads you see in k8s 'getting started' tutorials.

Whatever the default size is, it won't be right for everybody. If this is an important choice for new users with heavier requirements, then I suggest it should be highlighted in the documentation.

The Getting Started page could have a call-out box mentioning root disk size: "For large container images, you might need to specify a larger size".

The Creating Clusters and Creating Node Groups pages currently don't include disk size in the example config file/options. They could do so, so people realize it is a choice and are prompted to think about it.

@philvarner

philvarner commented Oct 11, 2019

> 17GB image!?! 😮 I'm guessing you don't run that with AlwaysPull? 😄

Well, I was 😄 (I inherited it from someone else and didn't realize how big it actually was until I built it myself and saw that the push was taking forever.)

I agree with highlighting it in the documentation, along with the instance types. I was following the tutorials that assume you're deploying a small image like nginx, and deploying JupyterHub on 3 x t3.medium with 20GB of disk basically crashed the cluster, but I didn't know enough about what to expect from the various commands to realize that was what was happening.

I was also trying to merge the zero-to-jupyterhub-k8s instructions for deploying without eksctl (with lots of manual steps and kubectl) with the eksctl getting started guide, and didn't do a good job of that. I might look into rewriting that tutorial myself.

@onprema

onprema commented Nov 23, 2019

Yeah... how do we specify the volume size we want in the YAML config file?

@onprema

onprema commented Nov 23, 2019

Oh, never mind... I see that it's

```yaml
volumeSize: 100
volumeType: gp2
```

under nodeGroups.
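
In full context, those fields sit on each node group entry; a minimal sketch with placeholder names:

```yaml
nodeGroups:
  - name: ng-1          # placeholder
    instanceType: m5.large
    volumeSize: 100     # GiB
    volumeType: gp2
```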

michaelbeaumont added a commit to michaelbeaumont/eksctl that referenced this issue Jun 10, 2020
michaelbeaumont added a commit to michaelbeaumont/eksctl that referenced this issue Jun 12, 2020
michaelbeaumont added a commit that referenced this issue Jun 12, 2020
* Fix applying default volume size to config

* Increase default volume size to 80G

Closes #780
michaelbeaumont added a commit to michaelbeaumont/eksctl that referenced this issue Jun 12, 2020