-
Notifications
You must be signed in to change notification settings - Fork 4.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improving etcd volume detection logic, ensuring that root volumes are not mounted #3208
Improving etcd volume detection logic, ensuring that root volumes are not mounted #3208
Conversation
|
||
vol.Status = state | ||
func (a *AWSVolumes) findEctdVolumes(p *ec2.DescribeVolumesOutput, volumes []*Volume) []*Volume { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small typo on function name (findEctdVolumes
-> findEtcdVolumes
) and could include a comment for the function so go-lint doesn't warn on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Etc etc ;) thanks will update
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does go-lint warn on lower case funcs? Through it only earned on upper case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does go-lint warn on lower case funcs? Through it only earned on upper case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yep my mistake, only warns on upper case :D
@@ -52,6 +52,12 @@ func (k *VolumeMountController) mountMasterVolumes() ([]*Volume, error) { | |||
} | |||
|
|||
for _, v := range attached { | |||
// Do not ever try to mount root devices | |||
if v.LocalDevice == "/dev/sda1" || v.LocalDevice == "/dev/xvda" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the meat of the change? It's quite surprising to me that these volumes got through the etcd tag filters ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is was put in just in case. The filtering should take care of the problem, but we should never mount a root volume ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is just going to bite us later and it shouldn't be needed anyway. Also presumably the root volumes are always attached?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it should not be needed, but our code will go into an infinite loop if you try to mount /dev/sda1 which is the drive in the rhel launchconfig. There are more details in the issue, but we should not ever try to mount either of these partitions, and I would like code to safeguard that from occuring.
I can pull it, but 'shouldn't' has got me in trouble a bunch,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in this case an infinite loop is a good outcome - we're trying to format the root partition.
If you really want this here, how about:
// Do not ever try to mount root devices
if v.LocalDevice == "/dev/sda1" || v.LocalDevice == "/dev/xvda" {
glog.Fatalf("Attempted to mount root device")
}
But IMO, it's just not worth it ... we're as likely one day to legitimately mount on /dev/xvda
.
What are the tags that are being added by the background process? I'm thinking that either we need to filter more carefully (does this PR change the tag filters?) or if the background process is adding etcd tags, that it really can't be doing that - it's going to cause serious problems. |
The client is using the cloudwatch events to add the ec2 instance tags to the root volume of the ec2 instance. All tags that the ec2 instance has are added. The etcd tags are not on the instance and thus not added. |
|
||
skipVolume := false | ||
// skip all volumes only set to false if we find correct etcd tags | ||
skipVolume := true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@justinsb this is the meat - switched the logic to excluding everything, unless the etcd tags exist.
@@ -52,6 +52,12 @@ func (k *VolumeMountController) mountMasterVolumes() ([]*Volume, error) { | |||
} | |||
|
|||
for _, v := range attached { | |||
// Do not ever try to mount root devices | |||
if v.LocalDevice == "/dev/sda1" || v.LocalDevice == "/dev/xvda" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is just going to bite us later and it shouldn't be needed anyway. Also presumably the root volumes are always attached?
@@ -164,73 +164,83 @@ func newEc2Filter(name string, value string) *ec2.Filter { | |||
func (a *AWSVolumes) findVolumes(request *ec2.DescribeVolumesInput) ([]*Volume, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So findVolumes
is used in two places, once when querying all volumes and once when querying by ID.
Although in theory it's fine to filter both by etcd volume, I think it would be cleaner to add the filter at the end here:
func (a *AWSVolumes) FindVolumes() ([]*Volume, error) {
request := &ec2.DescribeVolumesInput{}
request.Filters = []*ec2.Filter{
newEc2Filter("tag:"+awsup.TagClusterName, a.clusterTag),
newEc2Filter("tag-key", awsup.TagNameRolePrefix+awsup.TagRoleMaster),
newEc2Filter("availability-zone", a.zone),
}
return a.findVolumes(request)
}
Incidentally, that shows that it is the master role tag that is causing the problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is fine, I will extract a third method for unit testing then. I did not mock out the unit test with the aws mocks, so that is why I left the aws code here, and extracted a func that I could test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have "k8s.io/etcd/events" and "k8s.io/etcd/main" tag keys. Not sure how we can filter on two tags. Should we make two calls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking just post-filter (i.e. in code, not pushed-down to AWS); there should only be a handful of volumes with the master tag anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may be talking in circles since from my perspective we are doing a post-filter in the code that I wrote and tested. Sorry confused
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
findVolumes is a wrapper around DescribeVolumes, which maps volumes to our abstract model. But it doesn't do any filtering of mountable volumes (except in the error-case, i.e. we don't return mistagged volumes). The idea is that we're doing the minimal mapping in the cloud abstraction layer. The reason it matters is because of AttachVolume
, which would now be broken for anything that isn't a mount of an etcd volume. Admittedly that is more layering/design than practice, because the only volumes protokube mounts (today) are etcd volumes.
I'm thinking the right place is probably actually in the caller of FindVolumes, therefore, as then it addresses GCE as well :-) (i.e. attachMasterVolumes)
But ... if you look at what's happening there, really the idea is that if you set the master tag on a volume, we're going to try mounting it on the masters. I'm still not convinced that the behaviour we have isn't desireable ... if you add a master volume tag incorrectly, we hang; we could do a lot worse :-)
@@ -93,6 +99,11 @@ func (k *VolumeMountController) mountMasterVolumes() ([]*Volume, error) { | |||
} | |||
|
|||
func (k *VolumeMountController) safeFormatAndMount(device string, mountpoint string, fstype string) error { | |||
// Do not attempt to mount root volumes ever |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here I think...
Need to test, but updated with PR review comments |
@justinsb @KashifSaadat @gambol99 can I get a review? I updated with @justinsb's recommendations. Also, I am doing the skip for the root volumes only in the AWS code, since "/dev/sda1" and "/dev/xdba" are defined as root volumes in ec2. You can define "/dev/sda1" as a non-root drive, and where Ectd is located in vSphere for instance. |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: chrislovecnm, justinsb The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
/test all [submit-queue is verifying that this PR is safe to merge] |
Automatic merge from submit-queue |
Fixes: #3167
When an AWS account has functionality that adds an ec2 instance tags to a volume automatically, protokube can attempt to mount the root volume. This PR tightens the logic for detecting etcd volumes. Also, the two volumes that AWS defines as root volume devices are never mounted. Added a unit test, which required refactoring of the code into a separate method.