Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix AWS device allocator to only use valid device names #41455

Merged

Conversation

gnufied
Copy link
Member

@gnufied gnufied commented Feb 15, 2017

According to
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/device_naming.html
we can only use /dev/xvd[b-c][a-z] as device names - so we can only
allocate upto 52 ebs volumes on a node.

fixes #41453

cc @justinsb @kubernetes/sig-storage-pr-reviews

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Feb 15, 2017
@k8s-reviewable
Copy link

This change is Reviewable

@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note-label-needed labels Feb 15, 2017
Copy link
Member

@justinsb justinsb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Some nits that I would prefer that we address, but if you push back I'm not going to quibble.

@@ -1225,7 +1225,7 @@ func (c *Cloud) getMountDevice(
if deviceAllocator == nil {
// we want device names with two significant characters, starting with
// /dev/xvdba (leaving xvda - xvdz and xvdaa-xvdaz to the system)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to update this comment per #41455. xvdaa - xvdaz are not allowed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

for {
candidate = d.nextDevice(candidate)
candidate, foundIndex = d.nextDevice(candidate, foundIndex+1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for review completeness: we do skip the first device the first time "around", but I agree that it does not matter

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah - I am skipping first device on first iteration. It shouldn't matter as you though - it will get used after all 51 devices are used etc.

return mountDevice(dev)
}
dev[i] = 'a'
func (d *deviceAllocator) nextDevice(device mountDevice, nextIndex int) (mountDevice, int) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe nextDevice is only called from GetNext. Given that, and the fact that most of the complexity is not in the GetNext -> nextDevice interface, I wonder if it would be easier to inline nextDevice (i.e. eliminate nextDevice).

Copy link
Member Author

@gnufied gnufied Feb 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The wrapping around when nextIndex >= len is nice and tidy in nextDevice, but it can go either way. There was a unused parameter left from old code though - I have fixed that.

@justinsb
Copy link
Member

LGTM - nice find & fix. cc @jingxu97

According to
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/device_naming.html
we can only use /dev/xvd[b-c][a-z] as device names - so we can only
allocate upto 52 ebs volumes on a node.
@justinsb
Copy link
Member

Also cc @jsafrane as I believe you probably know the most about this code

@saad-ali saad-ali added this to the v1.5 milestone Feb 15, 2017
@saad-ali saad-ali added cherrypick-candidate cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-label-needed labels Feb 15, 2017
@jsafrane
Copy link
Member

As I wrote in #41453, I think I checked weird names like xvduu when I tested KUBE_MAX_PD_VOLS=<large number>, it seems to me that something has changed on AWS side.

LGTM, however it makes our hack not to reuse recently used device names in #38818 quite useless. Device names are going to be reused quickly.

@jsafrane
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 15, 2017
@gnufied
Copy link
Member Author

gnufied commented Feb 15, 2017

@justinsb can you do "/approve" magic on this PR?

@justinsb
Copy link
Member

/approve

@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

The following people have approved this PR: gnufied, justinsb

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 15, 2017
@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 17e7456 into kubernetes:master Feb 15, 2017
@k8s-ci-robot
Copy link
Contributor

@gnufied: The following test(s) failed:

Test name Commit Details Rerun command
Jenkins CRI GCE e2e 7337023 link @k8s-bot cri e2e test this

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@jingxu97
Copy link
Contributor

jingxu97 commented Feb 15, 2017 via email

@gnufied
Copy link
Member Author

gnufied commented Feb 15, 2017

@jingxu97 it is not that #38818 will not work after this change. We are still always picking next device in the list rather than reusing device names like before. However, what will now happen is - after 52 volumes have been attached on a node - the code will start to reuse device names.

@jsafrane's original code had much bigger device pool and hence reuse will happen a lot less frequently (26p2), but that can't helped now since AWS explicitly doesn't allow that much bigger pool.

So after my change, we have kind of middle ground situation. We are still picking next device but the pool is lot smaller owing to AWS naming restrictions. I don't see any way around that. :(

@jingxu97
Copy link
Contributor

I see. Then the chances of hitting mounting to wrong volume is still quite small. I am ok with this change for now but we need to work on something to avoid hitting this issue.

k8s-github-robot pushed a commit that referenced this pull request Feb 24, 2017
…5-upstream-release-1.5

Automatic merge from submit-queue

Automated cherry pick of #41455

Cherry pick of #41455 on release-1.5.

#41455: Fix AWS device allocator to only use valid device names
@k8s-cherrypick-bot
Copy link

Commit found in the "release-1.5" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.

k8s-github-robot pushed a commit that referenced this pull request Feb 26, 2017
…5-upstream-release-1.4

Automatic merge from submit-queue

Automated cherry pick of #41455

Cherry pick of #41455 on release-1.4.

#41455: Fix AWS device allocator to only use valid device names
@gnufied gnufied mentioned this pull request Mar 1, 2017
@dkerwin
Copy link

dkerwin commented Mar 7, 2017

Thanks for fixing this. Hit this bug with only 6 ebs volumes attached. Any chance to get an immediate 1.5.4 release? Kind of a dealbreaker for aws installations in my opinion

@zmerlynn
Copy link
Member

zmerlynn commented Mar 7, 2017

@dkerwin: I believe we're cutting 1.5.4 today or tomorrow.

@dkerwin
Copy link

dkerwin commented Mar 8, 2017

Deployed v1.5.4 and ebs volumes work as expected. Awesome!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Could not attach ebs volume more than 50 for one node