
cloudprovider/aws: EBS attachment fails: /dev/sdba is not a valid EBS device name (v1.3.0-beta.0) #27534

Closed
simonswine opened this issue Jun 16, 2016 · 5 comments · Fixed by #27628

@simonswine (Contributor) commented Jun 16, 2016

I am having trouble attaching EBS volumes on my AWS 1.3.0-beta.0 cluster with the controller-manager based attacher.

2016-06-16T10:50:46.081028537Z I0616 10:50:46.080908       1 reconciler.go:126] Started AttachVolume for volume "kubernetes.io/aws-ebs/aws://eu-west-1a/vol-9aa4532b" to node "ip-172-20-130-252.eu-west-1.compute.internal"
2016-06-16T10:50:46.594535809Z E0616 10:50:46.594230       1 attacher.go:78] Error attaching volume "aws://eu-west-1a/vol-9aa4532b": Error attaching EBS volume: InvalidParameterValue: Value (/dev/sdba) for parameter device is invalid. /dev/sdba is not a valid EBS device name.
2016-06-16T10:50:46.594571007Z  status code: 400, request id: 
2016-06-16T10:50:46.594578574Z E0616 10:50:46.594286       1 attacher_detacher.go:124] Attach operation for device "kubernetes.io/aws-ebs/aws://eu-west-1a/vol-9aa4532b" to node "ip-172-20-130-252.eu-west-1.compute.internal" failed with: Error attaching EBS volume: InvalidParameterValue: Value (/dev/sdba) for parameter device is invalid. /dev/sdba is not a valid EBS device name.
2016-06-16T10:50:46.594585498Z  status code: 400, request id: 

The problem is here: https://github.com/kubernetes/kubernetes/blob/release-1.3/pkg/cloudprovider/providers/aws/aws.go#L1302

This fails because the kube-controller-manager is running in a pod via kubelet manifests. I think a cluster with mixed instance types runs into trouble as well, since the device names will be either sdX or xvdX and the controller-manager picks the naming scheme based on the state of the master.
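For illustration, here is a minimal sketch (not the actual aws.go code; the function name and exact paths are assumptions) of the kind of host-inspection heuristic described above. When the controller-manager runs in a container that does not mount the host's /dev, neither device node is visible and the guess can come out wrong:

package main

import (
	"fmt"
	"os"
)

// guessDevicePrefix illustrates a prefix heuristic: inspect the local /dev
// to choose between "/dev/xvd" (HVM-style) and "/dev/sd" (paravirtual-style)
// names. Inside a container without the host's /dev, both stat calls fail
// and the fallback may not match the node that actually attaches the volume.
func guessDevicePrefix() string {
	if _, err := os.Stat("/dev/xvda"); err == nil {
		return "/dev/xvd"
	}
	if _, err := os.Stat("/dev/sda"); err == nil {
		return "/dev/sd"
	}
	// Neither device is visible (e.g. KCM in a pod without a /dev hostPath
	// mount); fall back to a default that may be wrong for this cluster.
	return "/dev/sd"
}

func main() {
	fmt.Println("guessed prefix:", guessDevicePrefix())
}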

My workaround for now is to add a host mount of /dev to the controller-manager pod, but this needs to be addressed properly at some point.

    volumeMounts:
    - mountPath: /dev
      name: dev-host
  volumes:
  - hostPath:
      path: /dev
    name: dev-host
  • Is there any way we can get the correct name from the AWS API?
  • I think whether it is xvdX or sdX depends on the kernel version. Am I right about this?
  • An uglier solution could be to annotate the node during registration with whether it has xvdX or sdX device names.

@justinsb maybe you can help me find a proper solution for this. I am happy to contribute a PR.

@simonswine (Contributor, Author):
I have just found this; I think we could get this information from AWS:

aws ec2 describe-images --image-ids image_id --query Images[].RootDeviceName
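If that works from the CLI, the same lookup could presumably be done programmatically; here is a rough sketch with aws-sdk-go (the region and AMI ID are just example values, not taken from a working setup):

package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func main() {
	sess, err := session.NewSession(&aws.Config{Region: aws.String("eu-west-1")})
	if err != nil {
		log.Fatal(err)
	}
	svc := ec2.New(sess)

	// Ask EC2 for the root device name of a given AMI (placeholder ID).
	out, err := svc.DescribeImages(&ec2.DescribeImagesInput{
		ImageIds: []*string{aws.String("ami-706cfd03")},
	})
	if err != nil {
		log.Fatal(err)
	}
	for _, img := range out.Images {
		fmt.Printf("%s: root device %s\n",
			aws.StringValue(img.ImageId), aws.StringValue(img.RootDeviceName))
	}
}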

@justinsb (Member):
That code is supposed to check, but looking at it, it is quite possible I got it backwards. I also agree that the logic is going to be problematic anyway with mixed instance types etc. now that we've moved attachment to KCM.

What AMI are you using? I may have just been "lucky" so far in the AMIs I've used, in that the heuristics have worked correctly.

Also, if you have time, it would be great to know:

  1. What does /dev/ look like on one of the problematic nodes (in particular, is there /dev/sda and/or /dev/xvda)?
  2. What is the output of the describe-images command for that AMI (particularly if it is private)?

I like the describe-images approach!

I'm marking this as 1.3 P1 (sorry @davidopp), as for now my working hypothesis is that this was introduced when we moved disk attachment to KCM (I believe the kubelet previously did it, but I want to double-check).

@justinsb justinsb added this to the v1.3 milestone Jun 16, 2016
@justinsb justinsb added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. area/platform/aws labels Jun 16, 2016
simonswine added a commit to simonswine/kubernetes that referenced this issue Jun 16, 2016
@simonswine (Contributor, Author):

@justinsb I tried to set up such an environment with CoreOS AMIs (all in eu-west-1):

  • ami-706cfd03 (virtualization type PV)
  • ami-c36effb0 (virtualization type HVM)

describe-instances returns different RootDeviceNames:

aws ec2 describe-instances | jq '.Reservations[].Instances[] | .RootDeviceName , .ImageId'
"/dev/xvda"
"ami-c36effb0"
"/dev/sda"
"ami-706cfd03"

But if I ssh into the two instances, the device is named the same on both:

ip-172-20-128-52 ~ # ls -l /dev/xvda
brw-rw---- 1 root disk 202, 0 Jun 16 13:41 /dev/xvda
core@ip-172-20-130-251 ~ $ ls -l /dev/xvda
brw-rw---- 1 root disk 202, 0 Jun 16 10:14 /dev/xvda

I think we have to somehow signal the device name from the kubelet to the KCM to solve this properly.

My understanding is that to get the sdX names you have to use a pretty old kernel. Maybe it's a good idea to change the default behaviour and fall back to /dev/xvdX; see PR #27545.

@erictune (Member):
I don't see any indication that this is a regression, so I am kicking it out of the 1.3 milestone. Explain how it is a regression, or otherwise super-important, to get it back on the milestone.

@erictune erictune removed this from the v1.3 milestone Jun 17, 2016
@justinsb justinsb added this to the v1.3 milestone Jun 17, 2016
@justinsb (Member):
It is a regression: volume mounting is broken on AWS. My belief is that it happens because we moved volume mounting to KCM, and/or because KCM runs in a container. Between those two, volume mounting just doesn't work right now.

justinsb added a commit to justinsb/kubernetes that referenced this issue Jun 17, 2016
We are using HVM style names, which cannot be paravirtual style names.

See
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/device_naming.html

This also fixes problems introduced when moving volume mounting to KCM.

Fix kubernetes#27534
k8s-github-robot pushed a commit that referenced this issue Jun 19, 2016
Automatic merge from submit-queue

AWS volumes: Use /dev/xvdXX names with EC2

We are using HVM style names, which cannot be paravirtual style names.

See
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/device_naming.html

This also fixes problems introduced when moving volume mounting to KCM.

Fix #27534
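As a rough sketch of what the change amounts to (not the merged code itself; the suffix ranges follow the EC2 device-naming documentation linked above and are otherwise an assumption), generating HVM-style /dev/xvdXX names with a fixed prefix, instead of deriving the prefix from the controller-manager's own environment, might look like this:

package main

import "fmt"

// nextDeviceNames is a simplified illustration: candidate EBS attach names
// are built with a fixed HVM-style "/dev/xvd" prefix and two-letter suffixes
// ("ba", "bb", ...). A name like "/dev/sdba", produced by combining the "sd"
// prefix with a two-letter suffix, is rejected by EC2, which is the error
// reported in this issue.
func nextDeviceNames(n int) []string {
	names := make([]string, 0, n)
	for first := 'b'; first <= 'c' && len(names) < n; first++ {
		for second := 'a'; second <= 'z' && len(names) < n; second++ {
			names = append(names, fmt.Sprintf("/dev/xvd%c%c", first, second))
		}
	}
	return names
}

func main() {
	fmt.Println(nextDeviceNames(5)) // [/dev/xvdba /dev/xvdbb /dev/xvdbc /dev/xvdbd /dev/xvdbe]
}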