Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving etcd volume detection logic, ensuring that root volumes are not mounted #3208

Merged

Conversation

chrislovecnm
Copy link
Contributor

@chrislovecnm chrislovecnm commented Aug 15, 2017

Fixes: #3167

When an AWS account has functionality that adds an ec2 instance tags to a volume automatically, protokube can attempt to mount the root volume. This PR tightens the logic for detecting etcd volumes. Also, the two volumes that AWS defines as root volume devices are never mounted. Added a unit test, which required refactoring of the code into a separate method.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 15, 2017
@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 15, 2017

vol.Status = state
func (a *AWSVolumes) findEctdVolumes(p *ec2.DescribeVolumesOutput, volumes []*Volume) []*Volume {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small typo on function name (findEctdVolumes -> findEtcdVolumes) and could include a comment for the function so go-lint doesn't warn on it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Etc etc ;) thanks will update

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does go-lint warn on lower case funcs? Through it only earned on upper case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does go-lint warn on lower case funcs? Through it only earned on upper case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yep my mistake, only warns on upper case :D

@@ -52,6 +52,12 @@ func (k *VolumeMountController) mountMasterVolumes() ([]*Volume, error) {
}

for _, v := range attached {
// Do not ever try to mount root devices
if v.LocalDevice == "/dev/sda1" || v.LocalDevice == "/dev/xvda" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the meat of the change? It's quite surprising to me that these volumes got through the etcd tag filters ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is was put in just in case. The filtering should take care of the problem, but we should never mount a root volume ;)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is just going to bite us later and it shouldn't be needed anyway. Also presumably the root volumes are always attached?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it should not be needed, but our code will go into an infinite loop if you try to mount /dev/sda1 which is the drive in the rhel launchconfig. There are more details in the issue, but we should not ever try to mount either of these partitions, and I would like code to safeguard that from occuring.

I can pull it, but 'shouldn't' has got me in trouble a bunch,

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in this case an infinite loop is a good outcome - we're trying to format the root partition.

If you really want this here, how about:

// Do not ever try to mount root devices
if v.LocalDevice == "/dev/sda1" || v.LocalDevice == "/dev/xvda" {
  glog.Fatalf("Attempted to mount root device")
}

But IMO, it's just not worth it ... we're as likely one day to legitimately mount on /dev/xvda.

@justinsb
Copy link
Member

What are the tags that are being added by the background process?

I'm thinking that either we need to filter more carefully (does this PR change the tag filters?) or if the background process is adding etcd tags, that it really can't be doing that - it's going to cause serious problems.

@chrislovecnm
Copy link
Contributor Author

The client is using the cloudwatch events to add the ec2 instance tags to the root volume of the ec2 instance. All tags that the ec2 instance has are added. The etcd tags are not on the instance and thus not added.


skipVolume := false
// skip all volumes only set to false if we find correct etcd tags
skipVolume := true
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@justinsb this is the meat - switched the logic to excluding everything, unless the etcd tags exist.

@@ -52,6 +52,12 @@ func (k *VolumeMountController) mountMasterVolumes() ([]*Volume, error) {
}

for _, v := range attached {
// Do not ever try to mount root devices
if v.LocalDevice == "/dev/sda1" || v.LocalDevice == "/dev/xvda" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is just going to bite us later and it shouldn't be needed anyway. Also presumably the root volumes are always attached?

@@ -164,73 +164,83 @@ func newEc2Filter(name string, value string) *ec2.Filter {
func (a *AWSVolumes) findVolumes(request *ec2.DescribeVolumesInput) ([]*Volume, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So findVolumes is used in two places, once when querying all volumes and once when querying by ID.

Although in theory it's fine to filter both by etcd volume, I think it would be cleaner to add the filter at the end here:

func (a *AWSVolumes) FindVolumes() ([]*Volume, error) {
	request := &ec2.DescribeVolumesInput{}
	request.Filters = []*ec2.Filter{
		newEc2Filter("tag:"+awsup.TagClusterName, a.clusterTag),
		newEc2Filter("tag-key", awsup.TagNameRolePrefix+awsup.TagRoleMaster),
		newEc2Filter("availability-zone", a.zone),
	}

	return a.findVolumes(request)
}

Incidentally, that shows that it is the master role tag that is causing the problems.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is fine, I will extract a third method for unit testing then. I did not mock out the unit test with the aws mocks, so that is why I left the aws code here, and extracted a func that I could test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have "k8s.io/etcd/events" and "k8s.io/etcd/main" tag keys. Not sure how we can filter on two tags. Should we make two calls?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking just post-filter (i.e. in code, not pushed-down to AWS); there should only be a handful of volumes with the master tag anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may be talking in circles since from my perspective we are doing a post-filter in the code that I wrote and tested. Sorry confused

Copy link
Member

@justinsb justinsb Aug 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

findVolumes is a wrapper around DescribeVolumes, which maps volumes to our abstract model. But it doesn't do any filtering of mountable volumes (except in the error-case, i.e. we don't return mistagged volumes). The idea is that we're doing the minimal mapping in the cloud abstraction layer. The reason it matters is because of AttachVolume, which would now be broken for anything that isn't a mount of an etcd volume. Admittedly that is more layering/design than practice, because the only volumes protokube mounts (today) are etcd volumes.

I'm thinking the right place is probably actually in the caller of FindVolumes, therefore, as then it addresses GCE as well :-) (i.e. attachMasterVolumes)

But ... if you look at what's happening there, really the idea is that if you set the master tag on a volume, we're going to try mounting it on the masters. I'm still not convinced that the behaviour we have isn't desireable ... if you add a master volume tag incorrectly, we hang; we could do a lot worse :-)

@@ -93,6 +99,11 @@ func (k *VolumeMountController) mountMasterVolumes() ([]*Volume, error) {
}

func (k *VolumeMountController) safeFormatAndMount(device string, mountpoint string, fstype string) error {
// Do not attempt to mount root volumes ever
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here I think...

@chrislovecnm
Copy link
Contributor Author

Need to test, but updated with PR review comments

@chrislovecnm
Copy link
Contributor Author

@justinsb @KashifSaadat @gambol99 can I get a review?

I updated with @justinsb's recommendations. Also, I am doing the skip for the root volumes only in the AWS code, since "/dev/sda1" and "/dev/xdba" are defined as root volumes in ec2. You can define "/dev/sda1" as a non-root drive, and where Ectd is located in vSphere for instance.

@justinsb
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 22, 2017
@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chrislovecnm, justinsb

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:
  • OWNERS [chrislovecnm,justinsb]

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot
Copy link

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 074e251 into kubernetes:master Aug 22, 2017
@chrislovecnm chrislovecnm deleted the protokube-volume-mount-fixes branch December 30, 2017 20:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants