
AWS Security Group rules are removed when adding/removing worker nodes #64148

Closed
zegl opened this issue May 22, 2018 · 29 comments
Labels
area/provider/aws: Issues or PRs related to aws provider.
kind/bug: Categorizes issue or PR as related to a bug.
sig/cloud-provider: Categorizes an issue or PR as relevant to SIG Cloud Provider.

Comments

@zegl
Contributor

zegl commented May 22, 2018

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:

When scaling the number of worker nodes up or down, all rules in the security group managed by the controller-manager are removed.

Here are the logs from the controller-manager when the problem started; they don't contain any errors or warnings.

I0522 12:43:25.588545       1 service_controller.go:636] Detected change in list of current cluster nodes. New node set: map[ip-10-1-128-213.eu-west-1.compute.internal:{} ip-10-1-170-254.eu-west-1.compute.internal:{} ip-10-1-149-189.eu-west-1.compute.internal:{} ip-10-1-149-238.eu-west-1.compute.internal:{}]
I0522 12:43:27.168490       1 service_controller.go:644] Successfully updated 15 out of 15 load balancers to direct traffic to the updated set of nodes
I0522 12:43:27.168682       1 event.go:218] Event(v1.ObjectReference{Kind:"Service", Namespace:"ingress-nginx", Name:"ingress-nginx", UID:"b6d3caf6-5290-11e8-a49a-0a96ccd212fe", APIVersion:"v1", ResourceVersion:"133452", FieldPath:""}): type: 'Normal' reason: 'UpdatedLoadBalancer' Updated load balancer with new hosts

I don't know where the number "15" comes from; there's only one LoadBalancer (3 ports, 4 workers, 3 AZs).

Here are the logs from the controller-manager when another node has taken over the leadership:

I0522 12:48:51.268858       1 event.go:218] Event(v1.ObjectReference{Kind:"Service", Namespace:"ingress-nginx", Name:"ingress-nginx", UID:"b6d3caf6-5290-11e8-a49a-0a96ccd212fe", APIVersion:"v1", ResourceVersion:"133452", FieldPath:""}): type: 'Normal' reason: 'EnsuringLoadBalancer' Ensuring load balancer
I0522 12:48:51.282935       1 controller_utils.go:1026] Caches are synced for disruption controller
I0522 12:48:51.282955       1 disruption.go:296] Sending events to api server.
I0522 12:48:52.414735       1 event.go:218] Event(v1.ObjectReference{Kind:"Service", Namespace:"ingress-nginx", Name:"ingress-nginx", UID:"b6d3caf6-5290-11e8-a49a-0a96ccd212fe", APIVersion:"v1", ResourceVersion:"133452", FieldPath:""}): type: 'Normal' reason: 'EnsuredLoadBalancer' Ensured load balancer

What you expected to happen:

Security group rules should not be removed when the LoadBalancer has not changed.

How to reproduce it (as minimally and precisely as possible):

  1. Run Kubernetes worker nodes on AWS.
  2. Add a LoadBalancer Service that uses an AWS NLB.
  3. Add or remove a worker node.

Anything else we need to know?:

I've only tested this with NLBs; ELBs might not be affected.

The Service has the following annotations:

service.beta.kubernetes.io/aws-load-balancer-internal: "true"
service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
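
For reference, a minimal client-go sketch of a Service carrying these annotations. The name, namespace, selector, and ports here are hypothetical, and it assumes a recent client-go where Create takes a context; on AWS, a Service of this type is what drives the controller-manager's NLB provisioning and security group management.

package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the default kubeconfig; in-cluster config would also work.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	svc := &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "ingress-nginx",
			Namespace: "ingress-nginx",
			Annotations: map[string]string{
				"service.beta.kubernetes.io/aws-load-balancer-internal":                          "true",
				"service.beta.kubernetes.io/aws-load-balancer-type":                              "nlb",
				"service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled": "true",
			},
		},
		Spec: corev1.ServiceSpec{
			Type:     corev1.ServiceTypeLoadBalancer,
			Selector: map[string]string{"app": "ingress-nginx"}, // hypothetical pod selector
			Ports: []corev1.ServicePort{
				{Name: "http", Port: 80, TargetPort: intstr.FromInt(80)},
				{Name: "https", Port: 443, TargetPort: intstr.FromInt(443)},
			},
		},
	}

	// Creating this Service is what makes the cloud provider provision the
	// NLB and start managing security group rules for its node ports.
	if _, err := client.CoreV1().Services(svc.Namespace).Create(context.TODO(), svc, metav1.CreateOptions{}); err != nil {
		panic(err)
	}
}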

Restarting the leader controller-manager solves the problem. The new leader will re-add the missing security group rules.

We're running multiple Kubernetes Clusters on the same AWS account.

Environment:

  • Kubernetes version (use kubectl version): v1.10.2
  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release): Ubuntu 16.04
  • Install tools: None
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. kind/bug Categorizes issue or PR as related to a bug. labels May 22, 2018
@zegl
Contributor Author

zegl commented May 22, 2018

/sig AWS

@k8s-ci-robot k8s-ci-robot added sig/aws and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 22, 2018
@zegl
Contributor Author

zegl commented May 22, 2018

I enabled verbose logging and reproduced the problem:

I0522 13:55:42.877392       1 aws_loadbalancer.go:657] Removing rule for client MTU discovery from the network load balancer ([0.0.0.0/0]) to instances (sg-04b4cbf9a3fa15db1)
I0522 13:55:42.877432       1 aws_loadbalancer.go:658] Removing rule for client traffic from the network load balancer ([0.0.0.0/0]) to instance (sg-04b4cbf9a3fa15db1)
I0522 13:55:42.877450       1 aws_loadbalancer.go:660] Removing rule for health check traffic from the network load balancer ([0.0.0.0/0]) to instance (sg-04b4cbf9a3fa15db1)
I0522 13:55:42.877472       1 aws_loadbalancer.go:657] Removing rule for client MTU discovery from the network load balancer ([0.0.0.0/0]) to instances (sg-04b4cbf9a3fa15db1)
I0522 13:55:42.877488       1 aws_loadbalancer.go:658] Removing rule for client traffic from the network load balancer ([0.0.0.0/0]) to instance (sg-04b4cbf9a3fa15db1)
I0522 13:55:42.877504       1 aws_loadbalancer.go:660] Removing rule for health check traffic from the network load balancer ([0.0.0.0/0]) to instance (sg-04b4cbf9a3fa15db1)
I0522 13:55:42.877524       1 aws_loadbalancer.go:657] Removing rule for client MTU discovery from the network load balancer ([0.0.0.0/0]) to instances (sg-04b4cbf9a3fa15db1)
I0522 13:55:42.877540       1 aws_loadbalancer.go:658] Removing rule for client traffic from the network load balancer ([0.0.0.0/0]) to instance (sg-04b4cbf9a3fa15db1)
I0522 13:55:42.877555       1 aws_loadbalancer.go:660] Removing rule for health check traffic from the network load balancer ([0.0.0.0/0]) to instance (sg-04b4cbf9a3fa15db1)
I0522 13:55:42.942559       1 aws.go:2879] Removing security group ingress: sg-04b4cbf9a3fa15db1 [{
  FromPort: 32551,
  IpProtocol: "tcp",
  IpRanges: [{
      CidrIp: "0.0.0.0/0",
      Description: "kubernetes.io/rule/nlb/client=ab6d3caf6529011e8a49a0a96ccd212f"
    }],
  ToPort: 32551
} {
  FromPort: 31093,
  IpProtocol: "tcp",
  IpRanges: [{
      CidrIp: "0.0.0.0/0",
      Description: "kubernetes.io/rule/nlb/client=ab6d3caf6529011e8a49a0a96ccd212f"
    }],
  ToPort: 31093
} {
  FromPort: 32030,
  IpProtocol: "tcp",
  IpRanges: [{
      CidrIp: "0.0.0.0/0",
      Description: "kubernetes.io/rule/nlb/client=ab6d3caf6529011e8a49a0a96ccd212f"
    }],
  ToPort: 32030
}]
I0522 13:55:43.064509       1 aws_loadbalancer.go:660] Removing rule for health check traffic from the network load balancer ([10.1.0.0/16]) to instance (sg-04b4cbf9a3fa15db1)
I0522 13:55:43.064542       1 aws_loadbalancer.go:660] Removing rule for health check traffic from the network load balancer ([10.1.0.0/16]) to instance (sg-04b4cbf9a3fa15db1)
I0522 13:55:43.064585       1 aws_loadbalancer.go:660] Removing rule for health check traffic from the network load balancer ([10.1.0.0/16]) to instance (sg-04b4cbf9a3fa15db1)
I0522 13:55:43.084370       1 aws.go:2879] Removing security group ingress: sg-04b4cbf9a3fa15db1 [{
  FromPort: 31093,
  IpProtocol: "tcp",
  IpRanges: [{
      CidrIp: "10.1.0.0/16",
      Description: "kubernetes.io/rule/nlb/health=ab6d3caf6529011e8a49a0a96ccd212f"
    }],
  ToPort: 31093
} {
  FromPort: 32030,
  IpProtocol: "tcp",
  IpRanges: [{
      CidrIp: "10.1.0.0/16",
      Description: "kubernetes.io/rule/nlb/health=ab6d3caf6529011e8a49a0a96ccd212f"
    }],
  ToPort: 32030
} {
  FromPort: 32551,
  IpProtocol: "tcp",
  IpRanges: [{
      CidrIp: "10.1.0.0/16",
      Description: "kubernetes.io/rule/nlb/health=ab6d3caf6529011e8a49a0a96ccd212f"
    }],
  ToPort: 32551
}]
I0522 13:55:43.164137       1 service_controller.go:326] Not persisting unchanged LoadBalancerStatus for service ingress-nginx/ingress-nginx to registry.
I0522 13:55:43.164469       1 event.go:218] Event(v1.ObjectReference{Kind:"Service", Namespace:"ingress-nginx", Name:"ingress-nginx", UID:"b6d3caf6-5290-11e8-a49a-0a96ccd212fe", APIVersion:"v1", ResourceVersion:"133452", FieldPath:""}): type: 'Normal' reason: 'EnsuredLoadBalancer' Ensured load balancer

A re-election causes the next kube-controller-manager to re-add the rules:

I0522 13:59:15.782597       1 aws_loadbalancer.go:650] Adding rule for client MTU discovery from the network load balancer ([0.0.0.0/0]) to instances (sg-04b4cbf9a3fa15db1)
I0522 13:59:15.782625       1 aws_loadbalancer.go:651] Adding rule for client traffic from the network load balancer ([0.0.0.0/0]) to instances (sg-04b4cbf9a3fa15db1)
I0522 13:59:15.782640       1 aws_loadbalancer.go:650] Adding rule for client MTU discovery from the network load balancer ([0.0.0.0/0]) to instances (sg-04b4cbf9a3fa15db1)
I0522 13:59:15.782649       1 aws_loadbalancer.go:651] Adding rule for client traffic from the network load balancer ([0.0.0.0/0]) to instances (sg-04b4cbf9a3fa15db1)
I0522 13:59:15.782659       1 aws_loadbalancer.go:650] Adding rule for client MTU discovery from the network load balancer ([0.0.0.0/0]) to instances (sg-04b4cbf9a3fa15db1)
I0522 13:59:15.782670       1 aws_loadbalancer.go:651] Adding rule for client traffic from the network load balancer ([0.0.0.0/0]) to instances (sg-04b4cbf9a3fa15db1)
I0522 13:59:15.801114       1 aws.go:2791] Existing security group ingress: sg-04b4cbf9a3fa15db1 [{
  FromPort: 3,
  IpProtocol: "icmp",
  IpRanges: [{
      CidrIp: "0.0.0.0/0",
      Description: "kubernetes.io/rule/nlb/mtu=ab6d3caf6529011e8a49a0a96ccd212f"
    }],
  ToPort: 4
}]
I0522 13:59:15.801170       1 aws.go:2819] Adding security group ingress: sg-04b4cbf9a3fa15db1 [{
  FromPort: 32551,
  IpProtocol: "tcp",
  IpRanges: [{
      CidrIp: "0.0.0.0/0",
      Description: "kubernetes.io/rule/nlb/client=ab6d3caf6529011e8a49a0a96ccd212f"
    }],
  ToPort: 32551
} {
  FromPort: 32030,
  IpProtocol: "tcp",
  IpRanges: [{
      CidrIp: "0.0.0.0/0",
      Description: "kubernetes.io/rule/nlb/client=ab6d3caf6529011e8a49a0a96ccd212f"
    }],
  ToPort: 32030
} {
  FromPort: 31093,
  IpProtocol: "tcp",
  IpRanges: [{
      CidrIp: "0.0.0.0/0",
      Description: "kubernetes.io/rule/nlb/client=ab6d3caf6529011e8a49a0a96ccd212f"
    }],
  ToPort: 31093
}]
I0522 13:59:15.941217       1 aws_loadbalancer.go:653] Adding rule for health check traffic from the network load balancer ([10.1.0.0/16]) to instances (sg-04b4cbf9a3fa15db1)
I0522 13:59:15.941249       1 aws_loadbalancer.go:653] Adding rule for health check traffic from the network load balancer ([10.1.0.0/16]) to instances (sg-04b4cbf9a3fa15db1)
I0522 13:59:15.941264       1 aws_loadbalancer.go:653] Adding rule for health check traffic from the network load balancer ([10.1.0.0/16]) to instances (sg-04b4cbf9a3fa15db1)
I0522 13:59:15.960322       1 aws.go:2791] Existing security group ingress: sg-04b4cbf9a3fa15db1 [{
  FromPort: 32551,
  IpProtocol: "tcp",
  IpRanges: [{
      CidrIp: "0.0.0.0/0",
      Description: "kubernetes.io/rule/nlb/client=ab6d3caf6529011e8a49a0a96ccd212f"
    }],
  ToPort: 32551
} {
  FromPort: 31093,
  IpProtocol: "tcp",
  IpRanges: [{
      CidrIp: "0.0.0.0/0",
      Description: "kubernetes.io/rule/nlb/client=ab6d3caf6529011e8a49a0a96ccd212f"
    }],
  ToPort: 31093
} {
  FromPort: 32030,
  IpProtocol: "tcp",
  IpRanges: [{
      CidrIp: "0.0.0.0/0",
      Description: "kubernetes.io/rule/nlb/client=ab6d3caf6529011e8a49a0a96ccd212f"
    }],
  ToPort: 32030
} {
  FromPort: 3,
  IpProtocol: "icmp",
  IpRanges: [{
      CidrIp: "0.0.0.0/0",
      Description: "kubernetes.io/rule/nlb/mtu=ab6d3caf6529011e8a49a0a96ccd212f"
    }],
  ToPort: 4
}]
I0522 13:59:15.960394       1 aws.go:2819] Adding security group ingress: sg-04b4cbf9a3fa15db1 [{
  FromPort: 32551,
  IpProtocol: "tcp",
  IpRanges: [{
      CidrIp: "10.1.0.0/16",
      Description: "kubernetes.io/rule/nlb/health=ab6d3caf6529011e8a49a0a96ccd212f"
    }],
  ToPort: 32551
} {
  FromPort: 32030,
  IpProtocol: "tcp",
  IpRanges: [{
      CidrIp: "10.1.0.0/16",
      Description: "kubernetes.io/rule/nlb/health=ab6d3caf6529011e8a49a0a96ccd212f"
    }],
  ToPort: 32030
} {
  FromPort: 31093,
  IpProtocol: "tcp",
  IpRanges: [{
      CidrIp: "10.1.0.0/16",
      Description: "kubernetes.io/rule/nlb/health=ab6d3caf6529011e8a49a0a96ccd212f"
    }],
  ToPort: 31093
}]
I0522 13:59:16.045626       1 service_controller.go:326] Not persisting unchanged LoadBalancerStatus for service ingress-nginx/ingress-nginx to registry.
I0522 13:59:16.045945       1 event.go:218] Event(v1.ObjectReference{Kind:"Service", Namespace:"ingress-nginx", Name:"ingress-nginx", UID:"b6d3caf6-5290-11e8-a49a-0a96ccd212fe", APIVersion:"v1", ResourceVersion:"133452", FieldPath:""}): type: 'Normal' reason: 'EnsuredLoadBalancer' Ensured load balancer
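
The logs suggest a set-difference reconcile: the controller computes the ingress rules it wants ("desired"), compares them with what is on the security group ("actual"), revokes actual-minus-desired, and authorizes desired-minus-actual. Below is a minimal, self-contained sketch of that pattern (illustrative names, not the actual aws.go code): if a bug makes the desired set come out empty, every owned rule lands in the revoke list, which is exactly what the removal logs above show.

package main

import "fmt"

// rule is a simplified stand-in for an EC2 IpPermission entry.
type rule struct {
	proto    string
	fromPort int64
	toPort   int64
	cidr     string
}

// reconcile returns the rules to revoke (actual minus desired) and the
// rules to authorize (desired minus actual).
func reconcile(desired, actual map[rule]bool) (toRevoke, toAuthorize []rule) {
	for r := range actual {
		if !desired[r] {
			toRevoke = append(toRevoke, r)
		}
	}
	for r := range desired {
		if !actual[r] {
			toAuthorize = append(toAuthorize, r)
		}
	}
	return toRevoke, toAuthorize
}

func main() {
	// Rules currently on the managed security group (node ports from the logs).
	actual := map[rule]bool{
		{"tcp", 31093, 31093, "0.0.0.0/0"}: true,
		{"tcp", 32030, 32030, "0.0.0.0/0"}: true,
		{"tcp", 32551, 32551, "0.0.0.0/0"}: true,
	}
	// Failure mode: the desired set is (wrongly) computed as empty, so every
	// owned rule lands in toRevoke, matching the removal logs above.
	toRevoke, toAuthorize := reconcile(map[rule]bool{}, actual)
	fmt.Printf("revoke %d rules, authorize %d rules\n", len(toRevoke), len(toAuthorize))
}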

@zegl
Contributor Author

zegl commented May 22, 2018

Honest question: We have one "k8s-managed-lb" shared between all worker nodes. Should we have one managed security group rule per instance instead? I don't know how that would work together with auto scaling groups, but it's worth asking.

@zegl
Contributor Author

zegl commented May 22, 2018

/cc @micahhausler

@FrederikNJS

I'm also testing out Kubernetes 1.10.1 with NLBs for ingress, and I'm seeing exactly the same problem.

@zegl
Contributor Author

zegl commented May 22, 2018

@FrederikNJS Thanks for letting me know.

I guess we'll have to use ELBs instead until this issue has been resolved.

@FrederikNJS

It seems that this might have something to do with using private subnets.

I have 2 clusters running the same version, with the same NLB setup. The only difference between the two clusters is that one has all worker nodes in public subnets and the other has all nodes in private subnets.

Only the private subnet cluster is experiencing this problem.

I can see from the logs that, right before the ports are removed from the security group, it outputs "Ignoring private subnet for public ELB" for all the private worker subnets.
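
A hedged illustration of how that filtering could interact badly with the reconcile sketched earlier (this is a guess at the failure mode with invented names, not the actual cloud-provider code): if every private subnet is dropped before the desired CIDRs are computed, the desired rule set comes out empty and the diff revokes everything.

package main

import "fmt"

// subnet is a simplified view of what the cloud provider sees in the VPC.
type subnet struct {
	id     string
	cidr   string
	public bool // has a route to an internet gateway
}

// desiredCIDRs mimics building the CIDR list used for the load balancer's
// security group rules after a subnet-visibility filter. If the LB is
// treated as public and every subnet is private, nothing survives.
func desiredCIDRs(subnets []subnet, internalLB bool) []string {
	var cidrs []string
	for _, s := range subnets {
		if !internalLB && !s.public {
			fmt.Printf("Ignoring private subnet %s for public ELB\n", s.id)
			continue
		}
		cidrs = append(cidrs, s.cidr)
	}
	return cidrs
}

func main() {
	private := []subnet{
		{"subnet-a", "10.1.128.0/20", false},
		{"subnet-b", "10.1.144.0/20", false},
	}
	// An all-private cluster mis-handled as "public": zero CIDRs survive,
	// so the reconcile step would revoke every existing rule.
	fmt.Println(desiredCIDRs(private, false)) // []
}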

@micahhausler
Member

Yeah, this looks like a bug. Thanks for opening this.

@vikasuy

vikasuy commented May 24, 2018

related: #60825

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 22, 2018
@vikasuy

vikasuy commented Aug 22, 2018

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 22, 2018
@kellycampbell

I think this is fixed by #68422. I tested by replacing a node and watching the security group rules in AWS before/after.
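
One way to run that check yourself, assuming aws-sdk-go v1 and AWS credentials in the environment: snapshot the managed group's ingress rules before and after replacing a node and diff the two outputs. The group ID below is the one from the logs in this issue; substitute your own.

package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func main() {
	sess := session.Must(session.NewSession(aws.NewConfig().WithRegion("eu-west-1")))
	svc := ec2.New(sess)

	// Group ID taken from the logs above; substitute your own.
	out, err := svc.DescribeSecurityGroups(&ec2.DescribeSecurityGroupsInput{
		GroupIds: []*string{aws.String("sg-04b4cbf9a3fa15db1")},
	})
	if err != nil {
		panic(err)
	}

	// One line per (protocol, port range, CIDR) so two snapshots taken
	// before and after a node replacement can be diffed.
	for _, g := range out.SecurityGroups {
		for _, p := range g.IpPermissions {
			for _, r := range p.IpRanges {
				fmt.Printf("%s %d-%d %s %s\n",
					aws.StringValue(p.IpProtocol),
					aws.Int64Value(p.FromPort), aws.Int64Value(p.ToPort),
					aws.StringValue(r.CidrIp), aws.StringValue(r.Description))
			}
		}
	}
}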

kellycampbell pushed a commit to kellycampbell/kubernetes that referenced this issue Nov 11, 2018
This corrects a problem where valid security group ports were removed
unintentionally when updating a service or when node changes occur.

Fixes kubernetes#60825, kubernetes#64148
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 6, 2018
@FrederikNJS

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 6, 2018
@FrederikNJS

I have not been able to test this as kops doesn't support 1.13 yet. And I have not heard anyone report it fixed, so I'd like to keep this open until we have confirmation.

M00nF1sh pushed a commit to M00nF1sh/kubernetes that referenced this issue Jan 9, 2019
This corrects a problem where valid security group ports were removed
unintentionally when updating a service or when node changes occur.

Fixes kubernetes#60825, kubernetes#64148
M00nF1sh pushed a commit to M00nF1sh/kubernetes that referenced this issue Jan 9, 2019
This corrects a problem where valid security group ports were removed
unintentionally when updating a service or when node changes occur.

Fixes kubernetes#60825, kubernetes#64148
marshallbrekka pushed a commit to wearefair/kubernetes that referenced this issue Jan 11, 2019
This corrects a problem where valid security group ports were removed
unintentionally when updating a service or when node changes occur.

Fixes kubernetes#60825, kubernetes#64148
M00nF1sh pushed a commit to M00nF1sh/kubernetes that referenced this issue Jan 16, 2019
This corrects a problem where valid security group ports were removed
unintentionally when updating a service or when node changes occur.

Fixes kubernetes#60825, kubernetes#64148
M00nF1sh pushed a commit to M00nF1sh/kubernetes that referenced this issue Jan 16, 2019
This corrects a problem where valid security group ports were removed
unintentionally when updating a service or when node changes occur.

Fixes kubernetes#60825, kubernetes#64148
@tuapuikia

kops 1.11 supports k8s 1.11.x. I already tested it on 1.11.7 and it's working as expected.

@arielb135

Does this work on 1.10? Was it patched?
What I've experienced: it removes the rules and puts them back after ~1 minute, which means a full minute where the service is not active.

In our production cluster it didn't even put them back.

@jochenhebbrecht

Hi,

Can somebody confirm in which version(s) of K8S this issue is fixed? We are bumping into the same issue and we are on 1.11.6.

Thanks,
Jochen

@kellycampbell

kellycampbell commented Mar 14, 2019 via email

@jochenhebbrecht

@kellycampbell thanks! I just located this file: https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.11.md, and it indicates the fix should be in 1.11.6. Unfortunately, that's the version we are on and we're still facing the issue.

From what I can read in the logs, it seems that the leader kube-controller-manager pod is stuck. It stopped updating the security group rules.

@kellycampbell

kellycampbell commented Mar 14, 2019 via email

@jochenhebbrecht

Yes, sorry, that title confused me.
OK, we'll try to upgrade our K8S cluster and verify whether the issue pops up again.

rjaini added a commit to msazurestackworkloads/kubernetes that referenced this issue Apr 14, 2019
* Fix AWS NLB security group updates

This corrects a problem where valid security group ports were removed
unintentionally when updating a service or when node changes occur.

Fixes kubernetes#60825, kubernetes#64148
rjaini added a commit to msazurestackworkloads/kubernetes that referenced this issue May 6, 2019
* Fix AWS NLB security group updates

This corrects a problem where valid security group ports were removed
unintentionally when updating a service or when node changes occur.

Fixes kubernetes#60825, kubernetes#64148
@jaybe78

jaybe78 commented May 18, 2019

I also have a Kubernetes cluster in a VPC and I still get the same issue. I'm on Kubernetes 1.11.9.
My 2 worker nodes do not pass the health check.
@jochenhebbrecht @kellycampbell @arielb135 Have you managed to make it work on your side?

@jochenhebbrecht

Hi @jaybe78 .

Yes, we managed to make it work by upgrading to 1.11.7. We're currently still on that version and we are no longer bumping into this issue.

Jochen

@M00nF1sh
Contributor

@jaybe78 I believe the original issue is already fixed by #68422 (which was backported to older versions).

Would you share your Service spec and worker node security group settings on AWS with me (@M00nF1sh in the k8s Slack channel)? I can help take a look.

@k8s-ci-robot k8s-ci-robot added area/provider/aws Issues or PRs related to aws provider needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. and removed sig/aws labels Aug 6, 2019
@dhanvi

dhanvi commented Aug 27, 2019

/sig cloud-provider

@k8s-ci-robot k8s-ci-robot added sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 27, 2019
@zegl
Contributor Author

zegl commented Aug 27, 2019

I'll close this. The bug has been fixed (and released) on v1.11, v1.12, v1.13, and v1.14.

@zegl zegl closed this as completed Aug 27, 2019
@1hanymhajna

1hanymhajna commented Jan 20, 2022

Hello,
We just saw the same behavior on v1.18.
During node replacement (scaling up new nodes and cordoning the old ones) we noticed that the security group was updated automatically and the relevant ELB rule was removed.
We are using EKS 1.18, our load balancer type is ELB, and the nodes are managed by us.

We can see the event triggered by the workers in CloudTrail, which confirms it happened automatically from the controller side (not a human mistake).
It just happened suddenly; we tried the same flow in other clusters and it didn't happen there.
I think the fix has a specific condition that triggers the bug again, or something like that.

We saw it in a dev cluster as well: when we lost some spot nodes it also happened, but again, it doesn't happen every time, so we still don't have a specific flow for reproducing it.

@zegl

@1hanymhajna

Does the fix only handle the NLB type? Shouldn't it be made global (ELB, ALB)?
