We should figure out what to do with instance storage / root disks / btrfs #429
Are we doing aufs or btrfs?
For the docker instances we're doing aufs or overlay. We should revisit that as other approaches get more testing. For using instance storage, we should use whatever is appropriate for whatever we decide to use it for :-)
Then what should we use?
Is this still on the roadmap?
Adding a +1 on the need for exposing instance storage - the new AWS i3 instances bench at 18 GB/sec+ on instance storage (NVMe-based), which is substantially higher than EBS.
We also would like to expose the instance storage, also for i3 instances. I'm not sure I agree that AWS is moving away from instance storage -- they are just moving it to a new style of instance.
Not only do the i3s have amazing IOPS performance, the d2 instance class has the most cost-efficient storage available on AWS: 6 TB for $150 a month is almost as cheap as S3.
Issues go stale after 90d of inactivity. Prevent issues from auto-closing with a /lifecycle frozen comment. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
+1 for supporting instance storage. The i3's performance is great.
/lifecycle frozen
I realise it's probably implicit, but for reliability it's really important to have the kubelet working space (logs, tmp) on a different volume from both the pod storage and the container writable layers. Whatever is done here, please preserve that (at least as an option).
Would really like to see this incorporated into kops sooner rather than later, especially now that local PVs are beta in k8s 1.10.
Are there any workarounds for using the instance storage for pods that benefit from the extra speed of the storage-optimized instances?
Would love to see this issue get some love. The NVMe instance storage on i3 instances is so fast and useful. I think that many of us would see a big jump in the utility of our instances if this were available for emptyDir and Docker image storage.
I'm interested in helping out if someone could get me pointed in the right direction. I'm pretty new to the kops codebase.
@justinsb asked on Slack for a use case for this issue, so here's mine: We are doing CI on Kubernetes, running our software builds in pods that leverage emptyDir scratch directories for code fetches and compiles. It's very I/O intensive, so we chose i3.large instances. Unfortunately, without access to the NVMe disk, these builds are slow as molasses. Without NVMe access, there's no reason to use i3 instances with kops/Kubernetes. We really need these volumes, and I'm willing to take a stab at implementing this, but I need someone to point me in the right direction because I'm not very familiar with the kops codebase. Thanks.
Another use case: We have a Kafka cluster running on Kubernetes. Kafka takes care of data replication. We stream large amounts of data onto this Kafka cluster, and the bottleneck is disk bandwidth, so we want i3 instances with NVMe to maximize our performance.
Our use case is similar to Hermain's in that we are running a pod-based Cassandra cluster and also want to maximize disk performance by using the locally attached storage rather than EBS volumes.
I'm still pretty new to kops development, so I'm hoping that someone can set me straight here. The instance types and their ephemeral storage (if any) are defined here: https://github.com/kubernetes/kops/blob/master/upup/pkg/fi/cloudup/awsup/machine_types.go It feels like nodeup should detect the presence of ephemeral disks and issue the …
Other thoughts…
I'm also wondering how to set up … Also, I've seen mongo recommends … So I think it would probably be something that should be configurable. Ideally, instead of having any default/automatic behavior here, I'd add a configuration section to the instance group that specifies what to do with extra volumes: whether or not to format them, what filesystem to use if so, and what path to mount them at, if any. On startup the instance would examine this configuration and format/mount the disks as specified. At this point, though, perhaps instead of actual new configuration options, a simpler solution might just be to add some examples to the docs showing how to use …
I think volume setup shouldn't be part of kops unless bringing the node into the cluster actually requires the setup. If you want to use ephemeral storage for the docker directory, then that should be part of kops; but using the ephemeral storage as a ceph node should not be. For applications like ceph or mongo, you should probably just run a DaemonSet which mounts a hostPath, formats it directly, and then exposes it. It's a more generic and higher-level way to configure your hosts.
As a workaround, we used additionalUserData in the IG spec to instruct cloud-init to place the ephemeral node storage of a c3.large instance at a given path, like this example:
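A minimal sketch of what such an entry can look like, assuming the ephemeral disk appears as /dev/xvdb (device names vary by instance type and AMI; the mount path is also an assumption):

```yaml
# InstanceGroup spec fragment (sketch) -- device name and mount path are assumptions
spec:
  additionalUserData:
  - name: ephemeral-storage.txt
    type: text/cloud-config
    content: |
      #cloud-config
      fs_setup:
      - label: ephemeral0
        filesystem: ext4
        device: /dev/xvdb      # instance-store device on c3.large (may differ)
        overwrite: false
      mounts:
      - [/dev/xvdb, /mnt/disks/vol1, ext4, "defaults,noatime"]
```

cloud-init's fs_setup and mounts modules then handle the format and the mount at first boot.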
Then we used the local storage provisioner (https://github.com/kubernetes-incubator/external-storage/tree/master/local-volume), specifying the parent directory.
Although the node still gets an EBS volume as its root from kops, at least the fast local storage can be used for I/O-intensive workloads, satisfying our use case.
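For reference, that provisioner discovers disks via a ConfigMap; a sketch of its shape (the storage class name and the /mnt/disks parent directory are assumptions):

```yaml
# ConfigMap consumed by the local volume static provisioner (sketch)
apiVersion: v1
kind: ConfigMap
metadata:
  name: local-provisioner-config
  namespace: default
data:
  storageClassMap: |
    local-storage:
      hostDir: /mnt/disks    # parent dir; each mount beneath it becomes a PV
      mountDir: /mnt/disks
```

Every filesystem mounted under hostDir is surfaced as a local PersistentVolume in the named storage class.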
Just a note on the complications around software RAID that I ran into: it was fairly trivial on the newest k8s stretch AMIs. I did run into issues on the jessie images because of a debconf setting calling for UI interaction in post-install hooks, which meant mdadm splashed a blue config screen the first time I tried this manually. I was unable to change this configuration in bootstrap prior to installing mdadm (although I'm sure I was missing something). Outside of unusual install hooks, though, if your node is fully ephemeral and the storage is fully ephemeral, you don't need to consider your reboot configuration settings, which was the only other option there seemed to be. There's obviously testing to be done on how this would operate consistently on various OSes (I didn't need to solve for Ubuntu, CoreOS, Amazon Linux, etc.), but the process itself was pretty trivial. I haven't run into any cases where the RAIDing process has failed. I have been using this for about a month and a half now and it really just works.
@thejosephstevens did you try setting …? I've also had to set:
apt-get --no-install-recommends --fix-broken --fix-missing --assume-yes --auto-remove --quiet -o DPkg::options::="--force-confdef" -o DPkg::options::="--force-confnew" install ...
Yeah, tried that with no success. It ended up being a non-issue once I moved to the most recent kops-1.10 stretch image (although I normally wouldn't advocate changing AMIs just to get different OS default settings). A caveat to my earlier posts, though: software RAID in one of my environments started freaking out (I ran into the md127 bug), so I ended up de-RAIDing my worker nodes. Without drilling further into that bug (not a current priority for me), I can't recommend my RAID setup from above. The non-RAIDed local drives are still working great, though, and I'd be perfectly happy if kops built support for a mapping of local drives to directory paths and a filesystem choice (or just default ext4). I think the main trick there is navigating the bootstrap ordering so you don't get any races and blow out system data anywhere.
The suggestions in this post worked for me (md127 bug): create an array entry in /etc/mdadm/mdadm.conf and run update-initramfs -u. This is what I'm using; not sure it is the most elegant way, but it's working:
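A hedged sketch of that approach -- the device names are assumptions for a two-disk i3, and the guard makes it a no-op on machines without mdadm or the disks:

```shell
#!/bin/bash
# Sketch: stripe the instance-store disks and pin the array name so it is
# assembled as /dev/md0 after reboot instead of /dev/md127.
set -euo pipefail

if command -v mdadm >/dev/null 2>&1 && [ -b /dev/nvme0n1 ]; then
  mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1
  # Record the array so the initramfs assembles it under its real name at
  # boot (this is the md127 fix)
  mdadm --detail --scan >> /etc/mdadm/mdadm.conf
  update-initramfs -u
  mkfs.ext4 -F /dev/md0
  STATUS=raided
else
  STATUS=skipped   # no mdadm or no NVMe instance-store disks present
fi
echo "$STATUS"
```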
I used ideas from this thread to get it working. This is for a single-volume NVMe drive as found on an AWS EC2 m5d.xlarge instance:
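A hedged sketch of such a script: on m5d the root EBS volume is typically nvme0n1 and the instance-store disk nvme1n1, but device naming is an assumption, and the guard makes this a no-op on machines without the disk:

```shell
#!/bin/bash
# Sketch: format (only if blank) and mount the single instance-store NVMe
set -euo pipefail

DEVICE=/dev/nvme1n1   # assumed instance-store device on m5d.xlarge
MOUNT=/mnt/ephemeral  # assumed mount point

if [ -b "$DEVICE" ]; then
  # Skip the format if the disk already carries a filesystem
  if ! blkid "$DEVICE" >/dev/null 2>&1; then
    mkfs.ext4 -F -E lazy_itable_init=1,lazy_journal_init=1 "$DEVICE"
  fi
  mkdir -p "$MOUNT"
  mount -o noatime "$DEVICE" "$MOUNT"
  STATUS=mounted
else
  STATUS=skipped   # no instance-store disk present (e.g. a plain m5)
fi
echo "$STATUS"
```

mkfs.ext4's lazy-init flags defer inode-table and journal initialization to first use, which can shave minutes off the format step on large disks.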
The downside of this approach is that mkfs(8) is slow and adds a considerable amount of time to instance launch -- at least 3-4 minutes.
FWIW, we've moved to a systemd-based solution now to avoid messing with docker and kube's storage after they start running, just got it running today.
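A sketch of what such a unit can look like, ordered before docker and kubelet so the mount exists before either writes anything (the helper-script path and device name are assumptions):

```ini
# /etc/systemd/system/mount-ephemeral.service (sketch)
[Unit]
Description=Format and mount instance-store NVMe before container runtimes
Before=docker.service kubelet.service
After=local-fs.target
# Makes the unit a clean no-op on instance types without the disk
ConditionPathExists=/dev/nvme1n1

[Service]
Type=oneshot
RemainAfterExit=yes
# Hypothetical helper that formats (if blank) and mounts the disk
ExecStart=/usr/local/bin/mount-ephemeral.sh

[Install]
WantedBy=multi-user.target
```

The ConditionPathExists guard addresses the "failing unit on differently configured machines" concern: the unit is skipped, not failed, when the disk is absent.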
There's absolutely more work to be done on this. I'd like better conditionality so it could be applied to all nodes without just resulting in a failing systemd unit on differently configured machines, but I think there's a model here to extend that doesn't require as much finagling with processes that depend on the potential mount points. I'm pretty sure this doesn't handle system restart, though, so I wouldn't buy it wholesale.
I am using a user-data solution to mount /var/lib/docker on a c5d ephemeral volume like what is posted above (thank you @chrissnell). I skipped mounting /var/lib/kubelet/pods because kubelet cannot delete the container directories.
PVCs work on c5-type instances. I thought that was because of the device-name mismatch between the AWS API and the Linux instance,
but devices have the nvme names on c5s and PVCs work there, so I'm not sure what's going on with that. Update: PVCs are working on c5d's in another cluster where I'm running a newer kops Debian AMI.
It seems mkfs.ext4 hangs on the default image for kops 1.12.1. It might be related to bargees/barge-os#76, but I'm not quite sure.
Is there any plan to address this?
Is there a good step-by-step guide available for using NVMe as PVCs in pods?
If you're talking about the ephemeral on-host disks like those in AWS, I wouldn't recommend it for anything other than scratch disk. The way that I did it in my example above was to mount the disks at the paths in the OS that docker uses for basic container process storage (…).
Just be aware that all the contents of these disks will be lost if you lose the machine, so don't use them for anything you want to persist (prometheus metrics, logs, whatever). Given my experiences with managing these disks in AWS, it's not clear to me that it was at all worth the effort. We spent a good amount of time debugging issues at runtime (see my mention of …).
Hi, we are using the NVMe drive provided by AWS with some instances. For now I use the following kops hook to mount the NVMe and to assign pods & containers onto it:
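A sketch of what such a hook can look like in the instance-group spec -- the device name, mount point, and the unconditional mkfs are assumptions, and a real hook should guard against reformatting an already-formatted disk:

```yaml
# InstanceGroup spec fragment (sketch)
spec:
  hooks:
  - name: mount-nvme.service
    before:
    - docker.service
    - kubelet.service
    manifest: |
      Type=oneshot
      ExecStart=/bin/sh -c 'mkfs.ext4 -F /dev/nvme0n1 && mkdir -p /mnt/nvme && mount /dev/nvme0n1 /mnt/nvme'
```

kops renders the manifest into a systemd unit on the node; the before list orders it ahead of docker and kubelet so the mount exists when they start.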
This does work, and we saw an improvement in our performance thanks to the local NVMe. Does anyone know if it's possible to fully move the kubelet & Docker onto the NVMe, to avoid the kubelet polling disk space from /?
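Both daemons do support relocating their state; the /mnt/nvme paths below are assumptions. Docker's data root moves via /etc/docker/daemon.json:

```json
{
  "data-root": "/mnt/nvme/docker"
}
```

That puts images and container writable layers onto the NVMe; starting the kubelet with --root-dir=/mnt/nvme/kubelet does the same for pod volumes and kubelet state. Both require the mount to exist before the services start, e.g. via a hook like the one above.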
I have some containers that need fast temporary storage (around 100 GB). We were using gp2-type AWS EBS volumes; however, they would quickly run out of burst balance. Local instance storage seemed like the perfect replacement, as it would reduce the spend on slow EBS volumes and provide fast temporary storage. However, I quickly found that Kubernetes doesn't seem to have quite implemented a way to use local instance storage yet. I wanted to use …
However, like previous posters have mentioned, I started seeing issues with disk pressure and the pods being evicted even though the local instance storage had only used 35% of capacity. Instead we have now switched to using a …
Relevant container configuration:
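A sketch of the general shape of such a configuration, using a hostPath volume on the instance-store mount (the path and names are assumptions, not the poster's actual manifest):

```yaml
# Pod fragment (sketch): fast scratch space from the instance-store mount
apiVersion: v1
kind: Pod
metadata:
  name: fast-scratch
spec:
  containers:
  - name: worker
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - name: scratch
      mountPath: /scratch
  volumes:
  - name: scratch
    hostPath:
      path: /mnt/nvme/scratch   # assumed instance-store mount point
      type: DirectoryOrCreate
```

Because the scratch data lives outside the container's writable layer and outside emptyDir, it doesn't count against the node's ephemeral-storage accounting that triggers disk-pressure evictions.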
What would be nice is to be able to specify that the root volume in kops should use the local instance storage rather than having to be backed by EBS. I think this makes sense, as the EBS volume is only used for temporary storage and is deleted when the instance is deleted.
@kxesd most likely you will have to wait for Kubernetes & kops 1.19. The root cause is a bug in cAdvisor that was only fixed recently: it made Kubernetes detect the wrong ImageFS partition, and with it report the wrong partition usage. For more info, check google/cadvisor#2586.
Did the fix to cAdvisor in 1.19 resolve the issue? We're currently on an older version.
@missinglink I have not noticed the issue since 1.19.
/remove-lifecycle frozen
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
@k8s-triage-robot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/remove-lifecycle rotten
Is this still a thing? Is this tracked upstream by any chance?
We've had a number of problems with ephemeral storage on EC2, not least that newer instance types don't include them (e.g. kubernetes/kubernetes#23787). Also symlinking /mnt/ephemeral seems to confuse the garbage collector.
We should figure out how to ensure that we have a big enough root disk, maybe how to re-enable btrfs, and then if there is anything we can do with the instance storage if we're otherwise not going to use it (maybe hostVolumes? Or some sort of caching service?)