
Fixes issue where PVCs using `standard` StorageClass create PDs in the wrong zone in multi-zone GKE clusters #52322

Merged: 1 commit merged into kubernetes:master on Nov 21, 2017

Conversation

@davidz627 (Contributor) commented Sep 12, 2017:

Fixes #50115

Changed GetAllZones to return only zones containing nodes that are currently running (renamed to GetAllCurrentZones). Added an E2E test to confirm this behavior.

Fix a bug in GCE multizonal clusters where PersistentVolumes were sometimes created in zones without nodes.
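
Reconstructed from the snippets reviewed below, a minimal sketch of the approach (receiver fields, imports, and error handling are assumptions; the merged code differs in details such as the missing-label handling discussed later in this review):

```go
// Sketch: derive the zone set from nodes that actually exist, instead of
// from every zone in the region.
func (gce *GCECloud) GetAllCurrentZones() (sets.String, error) {
	zones := sets.NewString()
	nodeList, err := gce.client.Core().Nodes().List(metav1.ListOptions{})
	if err != nil {
		return nil, err
	}
	for _, node := range nodeList.Items {
		// Comma-ok lookup so nodes missing the label don't contribute "".
		if zone, ok := node.ObjectMeta.Labels[kubeletapis.LabelZoneFailureDomain]; ok {
			zones.Insert(zone)
		}
	}
	return zones, nil
}
```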

@k8s-ci-robot added labels (Sep 12, 2017): size/L (a PR that changes 100-499 lines, ignoring generated files), cncf-cla: yes (the PR's author has signed the CNCF CLA), and needs-ok-to-test (an org member must verify the PR is safe to test).
@k8s-github-robot added the do-not-merge/release-note-label-needed label (the PR is missing one of the release note labels), Sep 12, 2017.
@xiangpengzhao (Contributor):

/release-note-none
/ok-to-test

@k8s-ci-robot added the release-note-none label (the PR doesn't merit a release note) and removed the do-not-merge/release-note-label-needed and needs-ok-to-test labels, Sep 12, 2017.
@davidz627 davidz627 force-pushed the multizoneWrongZone branch 2 times, most recently from 2ff6383 to 861ca18 Compare September 12, 2017 16:53

```go
for _, node := range nodeList.Items {
	labels := node.ObjectMeta.Labels
	zone := labels[kubeletapis.LabelZoneFailureDomain]
```
Member:
Check for the case where the label doesn't exist.

Contributor Author:
Fixed.
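
A hedged sketch of what the fix likely looks like, using Go's comma-ok map lookup so a missing label is distinguishable from an empty zone string (variable names follow the snippet above; the exact handling of the missing-label case is settled later in this review):

```go
// ok is false when the node has no failure-domain label; a plain map
// lookup would silently yield the empty string instead.
zone, ok := node.ObjectMeta.Labels[kubeletapis.LabelZoneFailureDomain]
if !ok {
	continue // see the log-and-skip discussion further down
}
zones.Insert(zone)
```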

```go
var extraZone string
for _, zone := range allZonesInRegion {
	if !expectedZones.Has(zone.Name) {
		extraZone = zone.Name
```
Member:
You can break after this.

Contributor Author:
Fixed.
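
The suggested early exit, sketched against the loop above; one unused zone is enough, so scanning can stop as soon as it is found:

```go
var extraZone string
for _, zone := range allZonesInRegion {
	if !expectedZones.Has(zone.Name) {
		extraZone = zone.Name
		break // a single unused zone suffices for the test
	}
}
```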


```go
// If there are no zones left in which to create an extra instance, we're
// stuck, so stop the test right here.
Expect(extraZone).NotTo(Equal(""), fmt.Sprintf("No extra zones available in region %s", region))
framework.Logf("We are going to start a compute instance in unused zone: %v\n", extraZone)
```
Member:
You can use By() here.

Contributor Author:
Fixed.

```go
gotZones, err := gceCloud.GetAllCurrentZones()
Expect(err).NotTo(HaveOccurred())

Expect(gotZones.Equal(expectedZones)).To(BeTrue(), fmt.Sprintf("Expected zones: %v, Got Zones: %v", expectedZones, gotZones))
```
Member:
Do you need to sort the zones before comparing?

Contributor Author:
This is set equality. It does a cardinality comparison, then checks whether one set is a superset of the other.
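
A small self-contained demonstration of those semantics; sets.String.Equal compares cardinality and membership, so element order never matters:

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/sets"
)

func main() {
	a := sets.NewString("us-central1-a", "us-central1-b")
	b := sets.NewString("us-central1-b", "us-central1-a")
	// Equal checks length and membership; no sorting is required.
	fmt.Println(a.Equal(b)) // true
}
```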

```go
func OnlyAllowNodeZones(f *framework.Framework, zoneCount int, image string) {
	gceCloud, err := framework.GetGCECloud()
	Expect(err).NotTo(HaveOccurred())
	gceCloud.SetClientSet(f.ClientSet)
```
Member:
Is it possible to set this earlier in e2e.go when we initialize the cloud provider?

Contributor Author:
Not without pretty significant code changes, I think. I looked at the way the framework sets this up for f.ClientSet; it's pretty involved and has some complex dependencies. Also, f.ClientSet isn't set up until the start of the specific test, so we can't use it in e2e.go.

Let me know if you want to talk about this.

Contributor Author:
I think I've actually solved this. Testing now.

@msau42 (Member) commented Sep 12, 2017:

/assign

test/e2e/e2e.go (outdated):
```diff
 if cloudConfig.Zone == "" && framework.TestContext.CloudConfig.MultiZone {
-	zones, err := gceCloud.GetAllZones()
+	zones, err := gceCloud.GetAllCurrentZones()
```
Member:
Is this going to work? This is done early in e2e initialization, where the client is not set yet.

```go
zones := sets.NewString()
nodeList, err := gce.client.Core().Nodes().List(metav1.ListOptions{})
```
Member:
Check for the client being nil.
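
A sketch of the requested guard (the error message wording is illustrative, not the merged text): fail fast if the clientset was never injected, since calling through a nil interface would panic:

```go
// Guard against a clientset that was never set on the cloud provider.
if gce.client == nil {
	return nil, fmt.Errorf("client is nil; cannot list nodes to determine zones")
}
nodeList, err := gce.client.Core().Nodes().List(metav1.ListOptions{})
```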

```
@@ -61,8 +62,87 @@ var _ = framework.KubeDescribe("Multi-AZ Clusters", func() {
	It("should schedule pods in the same zones as statically provisioned PVs", func() {
		PodsUseStaticPVsOrFail(f, (2*zoneCount)+1, image)
	})

	It("should only be allowed to provision PDs in zones where nodes exist", func() {
```
Member:
We can also control the zones that volumes are provisioned in, if we name them correctly. So maybe a test case we can add is to create PVCs with specific names, and make sure they get spread across the appropriate zones.

Contributor Author:
Should this test case be added in this same PR, or should I create a new PR/issue for it?

Member:
I would put it as part of this test, maybe instead of (or in addition to) comparing the GetAllZones call, because from a user's standpoint they would see the issues in volume provisioning. The GetAllZones call is just an implementation detail.

Contributor Author:
I added it in addition: the GetAllZones check still exists, and the PVC checking follows.

@davidz627 (Contributor Author):
Addressed comments.

@davidz627 (Contributor Author):
/retest

```go
}
nodeList, err := client.Core().Nodes().List(metav1.ListOptions{})
if err != nil {
	log.Fatalf("Failed to list nodes: %v", err)
```
Member:
We can let the caller log the error if they want to.
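
A sketch of the suggested change, assuming the enclosing function can return an error: surface the failure to the caller instead of terminating the process with log.Fatalf:

```go
nodeList, err := client.Core().Nodes().List(metav1.ListOptions{})
if err != nil {
	// Let the caller decide whether this failure is fatal in its context.
	return nil, fmt.Errorf("failed to list nodes: %v", err)
}
```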

```go
		zones.Insert(zone)
	} else {
		return nil, fmt.Errorf("Could not find zone for node %v", node.ObjectMeta.Name)
```
Member:
I think we should log and skip. I don't want to fail the whole provisioning operation just because one of the nodes is in a bad state.
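
A sketch of the log-and-skip behavior being asked for, assuming the package's existing glog import; the exact log wording is an assumption:

```go
zone, ok := node.ObjectMeta.Labels[kubeletapis.LabelZoneFailureDomain]
if !ok {
	// One node in a bad state should not fail the whole provisioning
	// operation; record it and move on.
	glog.Warningf("Node %v has no zone label; skipping", node.ObjectMeta.Name)
	continue
}
zones.Insert(zone)
```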

```go
	return zones, nil
}

// GetAllManagedZones gets all the zones managed by k8s
func (gce *GCECloud) GetAllManagedZones() (sets.String, error) {
	if gce.managedZones != nil {
```
Member:
When is this field set? Is it set during initialization and never changed after that?

Contributor Author:
Yes, it is set from the config and never changes after that. I feel like the zones returned by this method could differ from GetAllCurrentZones.

Member:
I'm not sure how useful "zones when initialized" is, considering you can add/remove zones during runtime.

Member:
Yes, that was my point when I suggested removing it. It seems @davidz627 agreed to remove this function completely in another comment I added.

Member:
Something confuses me: there are a few places that iterate through managedZones, but if managedZones never changes after initialization, how does it pick up new zones that are added later?

Contributor Author:
All the other uses of managedZones make sense to me; just this one seems related to the issue this PR is trying to solve. Can anyone confirm?

Member:
It's checking that the zone passed in is in the managedZones list, but maybe it makes more sense to compare against GetAllCurrentZones. The actual selection of the zone is done by ChooseZoneForVolumes, which already calls GetAllCurrentZones. But it's also possible for the user to explicitly set an invalid zone.

Contributor Author:
@saad-ali, we were wondering if we should restrict both CreateDisk and CreateRegionalDisk to creating disks only in zones with nodes in them (a la GetAllCurrentZones), or whether they should be left in their current state (allowing users to create disks in any zone in the region).

In the current state, we believe a user trying to create a disk in a zone without a node is most likely making a mistake (since the PD would be unusable), and that this should trigger an error. However, advanced users may want to create PDs in zones without nodes for regional disks, for manual failover.

What do you think, Saad? It seems to be a tradeoff between failing fast for standard use cases and allowing users to do hypothetical funky things manually that we may not want.

Contributor Author:
FWIW, @msau42 and I think surfacing an error for provisioning a PD in a zone without nodes is the way to go here. But I am interested to see if this may have some unintended side effects, or if we missed some other use cases.

@saad-ali (Member) commented Sep 14, 2017, quoting the question above:


I agree: provisioning a volume in a zone that does not have any active nodes should result in an error.

```go
expectedZones, err := gceCloud.GetAllCurrentZones()
Expect(err).NotTo(HaveOccurred())
framework.Logf("Expected zones: %v\n", expectedZones)
// Get all the zones in this current region
```
Member:
Can you add more newlines between operations for better readability?

test/e2e/e2e.go (outdated):
```go
if err != nil {
	glog.Fatal("Error loading client: ", err)
}
gceCloud.SetClientSet(cs)
```
Member:
In a production environment, when is the clientset set? Can we add it to CloudConfig instead of adding a new method?

Contributor Author:
It seems to be set by the Initialize method in gce.go, on the line gce.client = clientBuilder.ClientOrDie("cloud-provider").

In production code this is done separately from creating the gceCloud object itself, because Initialize is an interface method called for any of the many cloud objects. It is called here:

```go
func StartControllers(s *options.CloudControllerManagerServer, kubeconfig *restclient.Config, clientBuilder controller.ControllerClientBuilder, stop <-chan struct{}, recorder record.EventRecorder, cloud cloudprovider.Interface) error {
```

Additionally, I'm not sure the clientset is a "config" item. In production code the cloud config is generated by reading in a config file, and the clientset doesn't really fit into that paradigm.

Setting the clientset with a method (although not the same method) after the object is created is consistent with how the production code works. I think adding it to the CloudConfig for the e2e tests would actually make it inconsistent with the production code. I am open to adding it to CloudConfig if you still think that's the right move; I just wanted to provide additional info before we make a decision.

Member:
OK, that's fine then. Thanks for the detailed explanation!

```diff
@@ -422,7 +422,7 @@ var _ = SIGDescribe("Dynamic Provisioning", func() {
 Expect(err).NotTo(HaveOccurred())

 // Get all k8s managed zones
-managedZones, err = gceCloud.GetAllZones()
+managedZones, err = gceCloud.GetAllManagedZones()
```
Member:
Why can't we use GetAllCurrentZones() here?

```go
	return zones, nil
}

// GetAllManagedZones gets all the zones managed by k8s
func (gce *GCECloud) GetAllManagedZones() (sets.String, error) {
```
Member:
Do we really need this method? I would really like to get rid of it.

Member:
Currently there is only one place in the tests using this method, and I personally think we can use GetAllCurrentZones() there too.

Note that currently managed_zones can only be set to two values:

  • one particular zone
  • all zones in the region

There is no way to set it differently in production, so we probably shouldn't set it differently in tests either.

Contributor Author:
I see your point. I have removed this method and replaced its one use with GetAllCurrentZones.

```go
// TODO: Caching, but this is currently only called when we are creating a volume,
// which is a relatively infrequent operation, and this is only # zones API calls.
// GetAllCurrentZones returns all the zones in which nodes are currently running
func (gce *GCECloud) GetAllCurrentZones() (sets.String, error) {
```
Member:
Can you please add a TODO to add some caching to it? If there are too many calls to it, it may overload the apiserver.

Member:
Ideally, we would like to make it watch-based.
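
One possible watch-based shape, purely illustrative and not what this PR implements: keep a shared node informer running and answer zone queries from its local cache, so repeated calls never hit the apiserver. Here clientset (a kubernetes.Interface) and stopCh are assumed to exist:

```go
// Built once at startup: the informer watches nodes and keeps a local cache.
factory := informers.NewSharedInformerFactory(clientset, 0)
nodeLister := factory.Core().V1().Nodes().Lister()
factory.Start(stopCh)
factory.WaitForCacheSync(stopCh)

// Answered from the informer's cache, not from the apiserver directly:
nodes, err := nodeLister.List(labels.Everything())
if err != nil {
	return nil, err
}
zones := sets.NewString()
for _, node := range nodes {
	if zone, ok := node.Labels[kubeletapis.LabelZoneFailureDomain]; ok {
		zones.Insert(zone)
	}
}
return zones, nil
```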

@davidz627 (Contributor Author):
All comments addressed.

@wojtek-t (Member) left a review comment:
One minor comment. Also, please squash the commits.

Other than that, LGTM.

```diff
@@ -421,8 +421,8 @@ var _ = SIGDescribe("Dynamic Provisioning", func() {
 gceCloud, err := framework.GetGCECloud()
 Expect(err).NotTo(HaveOccurred())

-// Get all k8s managed zones
-managedZones, err = gceCloud.GetAllZones()
+// Get all k8s managed zones (in this case it is just the single zone)
```
Member:
I don't think that's actually true. If you run this test on multizone clusters, there can actually be more than one zone.

@davidz627 (Contributor Author):
Lots of unit test issues because of the change to CreateDisk. I had to modify many of them, so that may need to be looked at later.

@davidz627 (Contributor Author):
Comment addressed. I modified the unit tests and the level of indirection for GetAllCurrentZones. Please review the new commit changes.

```
@@ -37,7 +38,8 @@ func TestCreateDisk_Basic(t *testing.T) {
	/* Arrange */
	gceProjectId := "test-project"
	gceRegion := "fake-region"
	fakeManager := newFakeManager(gceProjectId, gceRegion)
	zonesWithNodes := []string{"zone1"}
```
Member:
Can we remove the managedZones setting in these tests?

Contributor Author:
Many of the tests rely on managedZones, especially the ones that delete instances. I can remove it from the ones that don't require a delete, if you think that would be useful.

Member:
Nah, I'm just trying to see if there's a way we can avoid duplicate definitions. Maybe you can change managedZones to use zonesWithNodes?

@k8s-ci-robot added the lgtm label ("Looks good to me"; the PR is ready to be merged), Nov 21, 2017.
@k8s-github-robot:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: davidz627, msau42, saad-ali, wojtek-t

Associated issue: 50115

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot:

[MILESTONENOTIFIER] Milestone Pull Request Current

@davidz627 @foxish @krousey @msau42 @saad-ali

Note: If this pull request is not resolved or labeled as priority/critical-urgent by Wed, Nov 22 it will be moved out of the v1.9 milestone.

Pull Request Labels
  • sig/auth sig/storage: Pull Request will be escalated to these SIGs if needed.
  • priority/important-soon: Escalate to the pull request owners and SIG owner; move out of milestone after several unsuccessful escalation attempts.
  • kind/bug: Fixes a bug discovered during the current release.

@davidz627 (Contributor Author):

/retest

@fejta-bot:

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

@k8s-ci-robot (Contributor) commented Nov 21, 2017:

@davidz627: The following tests failed, say /retest to rerun them all:

  • pull-kubernetes-e2e-gce-bazel (commit 38fd65b): rerun with /test pull-kubernetes-e2e-gce-bazel
  • pull-kubernetes-e2e-gce-etcd3 (commit 38fd65b): rerun with /test pull-kubernetes-e2e-gce-etcd3

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.


@k8s-github-robot:

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot:

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit 80e1c79 into kubernetes:master Nov 21, 2017
@wojtek-t wojtek-t modified the milestones: v1.9, v1.7 Nov 21, 2017
@wojtek-t (Member):
Added the cherrypick-candidate label; this should be cherry-picked to the 1.7 and 1.8 branches.

@k8s-ci-robot added the release-note label (the PR will be considered when release notes are generated) and removed the release-note-none label, Nov 28, 2017.
@wojtek-t (Member):
@davidz627, I wanted to do an automated cherry-pick of this to the 1.8 branch, but it generated conflicts. Can you please create the cherry-pick? I think that's better, since you wrote this code.

@wojtek-t (Member):
Thanks @davidz627!

k8s-github-robot pushed a commit that referenced this pull request Dec 1, 2017
…322-upstream-release-1.8

Automatic merge from submit-queue.

Automated cherry pick of #52322 upstream release 1.8

#52322: Fixes issue where PVCs using `standard` StorageClass create PDs in the wrong zone in multi-zone GKE clusters

```release-note
Fix a bug in GCE multizonal clusters where PersistentVolumes were sometimes created in zones without nodes.
```
k8s-github-robot pushed a commit that referenced this pull request Dec 2, 2017
…322-upstream-release-1.7

Automatic merge from submit-queue.

Automated cherry pick of #52322 upstream release 1.7

Cherry pick of #52322 on release-1.7.

#52322: Fixes issue where PVCs using `standard` StorageClass create PDs in the wrong zone in multi-zone GKE clusters

```release-note
Fix a bug in GCE multizonal clusters where PersistentVolumes were sometimes created in zones without nodes.
```
@k8s-cherrypick-bot:

A commit found in the "release-1.7" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error, find help to get your PR picked.

Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
  • kind/bug: Categorizes issue or PR as related to a bug.
  • lgtm: "Looks good to me"; indicates that a PR is ready to be merged.
  • priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
  • release-note: Denotes a PR that will be considered when it comes time to generate release notes.
  • sig/auth: Categorizes an issue or PR as relevant to SIG Auth.
  • sig/storage: Categorizes an issue or PR as relevant to SIG Storage.
  • size/XL: Denotes a PR that changes 500-999 lines, ignoring generated files.