Fix AWS device allocator to only use valid device names #41455
LGTM. Some nits that I would prefer that we address, but if you push back I'm not going to quibble.
@@ -1225,7 +1225,7 @@ func (c *Cloud) getMountDevice(
	if deviceAllocator == nil {
		// we want device names with two significant characters, starting with
		// /dev/xvdba (leaving xvda - xvdz and xvdaa-xvdaz to the system)
Would be good to update this comment per #41455. xvdaa - xvdaz are not allowed
fixed.
for {
	candidate = d.nextDevice(candidate)
	candidate, foundIndex = d.nextDevice(candidate, foundIndex+1)
Just for review completeness: we do skip the first device the first time "around", but I agree that it does not matter
Yeah - I am skipping the first device on the first iteration. It shouldn't matter, as you thought - it will get used after the other 51 devices are used, etc.
return mountDevice(dev)
}
dev[i] = 'a'
func (d *deviceAllocator) nextDevice(device mountDevice, nextIndex int) (mountDevice, int) {
I believe nextDevice is only called from GetNext. Given that, and the fact that most of the complexity is not in the GetNext -> nextDevice interface, I wonder if it would be easier to inline nextDevice (i.e. eliminate nextDevice).
The wrapping around when nextIndex >= len is nice and tidy in nextDevice, but it can go either way. There was an unused parameter left over from the old code though - I have fixed that.
LGTM - nice find & fix. cc @jingxu97
According to http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/device_naming.html we can only use /dev/xvd[b-c][a-z] as device names, so we can only allocate up to 52 EBS volumes on a node.
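To make the 52 figure concrete, the following sketch enumerates every name matching /dev/xvd[b-c][a-z]. The function name validDeviceNames is hypothetical and is not taken from the PR; it only illustrates the pattern quoted above.

```go
package main

import "fmt"

// validDeviceNames lists every name matching /dev/xvd[b-c][a-z].
// The function name is hypothetical; it is not part of the PR.
func validDeviceNames() []string {
	names := make([]string, 0, 2*26)
	for _, first := range []byte{'b', 'c'} {
		for second := byte('a'); second <= 'z'; second++ {
			names = append(names, "/dev/xvd"+string([]byte{first, second}))
		}
	}
	return names
}

func main() {
	names := validDeviceNames()
	// prints: 52 /dev/xvdba /dev/xvdcz
	fmt.Println(len(names), names[0], names[len(names)-1])
}
```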
89d075f to 7337023
Also cc @jsafrane as I believe you probably know the most about this code
As I wrote in #41453, I think I checked weird names like xvduu when I tested KUBE_MAX_PD_VOLS=<large number>, it seems to me that something has changed on AWS side. LGTM, however it makes our hack not to reuse recently used device names in #38818 quite useless. Device names are going to be reused quickly.
/lgtm
@justinsb can you do "/approve" magic on this PR?
/approve
[APPROVALNOTIFIER] This PR is APPROVED. The following people have approved this PR: gnufied, justinsb
Automatic merge from submit-queue
@gnufied: The following test(s) failed:
On Wed, Feb 15, 2017 at 1:48 AM, Jan Šafránek wrote:
> As I wrote in #41453, I *think* I checked weird names like xvduu when I
> tested KUBE_MAX_PD_VOLS=<large number>, it seems to me that something has
> changed on AWS side.
> LGTM, however it makes our hack not to reuse recently used device names in
> #38818 quite useless. Device names are going to be reused quickly.

I am quite concerned about this. If #38818 does not work after this PR, we
might end up having the issue of mounting to the wrong volume again. I just had
a PR #41363 to increase the default polling period to 1 minute, which
increases the window in which data is out of sync. If device names are
reused quickly, we might hit the mounting-to-wrong-volume issue.
- Jing
@jingxu97 it is not that #38818 will not work after this change. We are still always picking the next device in the list rather than reusing device names like before. However, what will now happen is that after 52 volumes have been attached on a node, the code will start to reuse device names. @jsafrane's original code had a much bigger device pool and hence reuse happened a lot less frequently (26p2), but that can't be helped now since AWS explicitly doesn't allow that much bigger pool. So after my change, we have kind of a middle-ground situation. We are still picking the next device, but the pool is a lot smaller owing to AWS naming restrictions. I don't see any way around that. :(
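As a rough illustration of the size difference mentioned above, the sketch below compares the two pools. It assumes that "26p2" means the permutation count 26P2 (ordered pairs of distinct letters); that reading is an assumption, not something the thread confirms, and the permutations helper is hypothetical.

```go
package main

import "fmt"

// permutations returns nPk, the number of ordered selections of k items from n.
func permutations(n, k int) int {
	result := 1
	for i := 0; i < k; i++ {
		result *= n - i
	}
	return result
}

func main() {
	oldPool := permutations(26, 2) // assumed reading of "26p2"
	newPool := 2 * 26              // names matching /dev/xvd[b-c][a-z]
	// prints: 650 52
	fmt.Println(oldPool, newPool)
}
```

Under that assumption the pool shrinks by roughly a factor of twelve, so a name comes back into rotation after 52 allocations instead of several hundred.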
I see. Then the chance of hitting the mounting-to-wrong-volume issue is still quite small. I am OK with this change for now, but we need to work on something to avoid hitting this issue.
Commit found in the "release-1.5" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.
Thanks for fixing this. Hit this bug with only 6 EBS volumes attached. Any chance to get an immediate 1.5.4 release? Kind of a dealbreaker for AWS installations in my opinion.
@dkerwin: I believe we're cutting 1.5.4 today or tomorrow.
Deployed v1.5.4 and EBS volumes work as expected. Awesome!
fixes #41453
cc @justinsb @kubernetes/sig-storage-pr-reviews