This repository has been archived by the owner on Dec 23, 2019. It is now read-only.

Add HPA to nginx-ingress-controller deployment #78

Merged
merged 11 commits into from
Jun 26, 2019

Conversation

MarcelMue
Contributor

@MarcelMue MarcelMue commented Feb 1, 2019

Out of this discussion: https://gigantic.slack.com/archives/C3C7ZQXC1/p1549018125001100

The basic idea is to add an HPA so that the nginx-ingress setup handles autoscaling better.

towards https://github.com/giantswarm/giantswarm/pull/2206
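
For context, a minimal sketch of what such an HPA manifest could look like (written against the autoscaling/v2beta1 API that was current at the time; the concrete numbers are placeholders, not the values settled on in this PR):

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-ingress-controller
  namespace: kube-system
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-ingress-controller
  minReplicas: 2
  maxReplicas: 100
  metrics:
  # Scale on CPU and memory utilization, measured against the pod requests.
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 50
  - type: Resource
    resource:
      name: memory
      targetAverageUtilization: 50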

@MarcelMue MarcelMue self-assigned this Feb 1, 2019
@@ -95,7 +95,7 @@ spec:
           exec:
             command:
             - sleep
-            - "15"
+            - "60"
Contributor Author

15 seconds seemed short to me. I saw other charts with HPA use 60 seconds, so I adapted this.

Member

cool, this makes it more graceful
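
For readers without the full file: the hunk above sits in the controller pod's preStop hook, so the longer sleep gives in-flight connections more time to drain before nginx receives SIGTERM. A sketch of the surrounding block (nesting assumed from typical nginx-ingress deployment specs, not copied verbatim from this chart):

lifecycle:
  preStop:
    exec:
      command:
      # Keep the pod in Terminating long enough for endpoints to be removed
      # and existing connections to finish before the container exits.
      - sleep
      - "60"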

    name: {{ .Values.controller.name }}
  minReplicas: 2
  # This is the maximum number of nodes we allow currently. Having more controllers than nodes should not happen.
  maxReplicas: 100
Contributor Author

To be hyper safe. We should probably set this lower, but I don't know of a sensible value.

Member

Not sure if this should be capped at the number of nodes as @calvix suggested, or maybe at a multiple of the max number of nodes? But for now I guess let's use 100 and we should somehow test this.

@@ -15,6 +15,7 @@ spec:
   template:
     metadata:
       annotations:
+        cluster-autoscaler.kubernetes.io/safe-to-evict: 'true'
Contributor Author

Sneaky: I forgot to add this annotation to the default-backend.

@MarcelMue
Contributor Author

I tested this on a tenant in gauss which has autoscaling. It seemed to work fine and the count was adjusted from 6 controllers to 2:

Name:                                                     nginx-ingress-controller
Namespace:                                                kube-system
Labels:                                                   app=nginx-ingress-controller
                                                          giantswarm.io/service-type=managed
                                                          k8s-app=nginx-ingress-controller
Annotations:                                              <none>
CreationTimestamp:                                        Fri, 01 Feb 2019 17:46:04 +0100
Reference:                                                Deployment/nginx-ingress-controller
Metrics:                                                  ( current / target )
  resource memory on pods  (as a percentage of request):  16% (103989248) / 50%
  resource cpu on pods  (as a percentage of request):     1% (5m) / 50%
Min replicas:                                             2
Max replicas:                                             100
Conditions:
  Type            Status  Reason            Message
  ----            ------  ------            -------
  AbleToScale     True    ReadyForNewScale  recommended size matches current size
  ScalingActive   True    ValidMetricFound  the HPA was able to successfully calculate a replica count from memory resource utilization (percentage of request)
  ScalingLimited  True    TooFewReplicas    the desired replica count is increasing faster than the maximum scale rate
Events:
  Type     Reason                        Age              From                       Message
  ----     ------                        ----             ----                       -------
  Warning  FailedGetResourceMetric       9m (x2 over 9m)  horizontal-pod-autoscaler  unable to get metrics for resource memory: no metrics returned from resource metrics API
  Warning  FailedComputeMetricsReplicas  9m (x2 over 9m)  horizontal-pod-autoscaler  failed to get memory utilization: unable to get metrics for resource memory: no metrics returned from resource metrics API

@MarcelMue MarcelMue requested review from a team February 1, 2019 16:59
- type: Resource
resource:
name: memory
targetAverageUtilization: 50
Member

Where did this value come from?

Contributor Author

I checked some other repos and found the same value here: https://gitlab.com/charts/gitlab/blob/master/charts/nginx/values.yaml#L155-156

Also @puja108 checked with some contacts who also recommended 50%. We should see how this works out though.

Member

Manuel, the maintainer of the IC, said (paraphrased) "nginx above 50% is a strong sign of high load because of the way nginx works; you might waste a bit of resources, but it's safer that way".

IMO it will still most probably use fewer resources than what we have now.

Member

For CPU that makes sense to me, but for memory I'm not so sure.

Member

hmm, right, definitely needs to be tuned separately, and tested

Contributor

Yeah, as Vaclav said, 50% for CPU utilization but not for memory.
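
What the thread converges on, as a sketch: CPU and memory targets can be set independently in the HPA metrics list. The 50% CPU figure comes from the discussion above; the memory number here is only a placeholder for a separately tuned value, not an agreed default.

metrics:
- type: Resource
  resource:
    name: cpu
    targetAverageUtilization: 50
- type: Resource
  resource:
    name: memory
    targetAverageUtilization: 80   # placeholder; the thread agrees memory needs its own tuning and testing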

@calvix
Member

calvix commented Feb 4, 2019

I am just a little bit scared about how this will behave on real workloads, as 50% memory can be reached very easily on clusters with many ingress definitions, and that will probably cause scaling to tons of pods just because of the many buffers nginx allocates.

Can we at least somehow limit the number of pods to the number of nodes? That would be the sanest solution for now. We could use a pod anti-affinity similar to what we do for kvm pods (see the sketch below).

HPA for critical components should probably be a completely separate story and not sneaked into other stories, as the behavior is hard to predict from test clusters.
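
A sketch of that anti-affinity idea, which would cap the effective replica count at one controller pod per node (the label key/value is taken from the labels shown in the kubectl output above; the rest is illustrative, not part of this PR):

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    # Refuse to schedule a second controller pod onto a node that already runs one.
    - labelSelector:
        matchLabels:
          k8s-app: nginx-ingress-controller
      topologyKey: kubernetes.io/hostname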

@tuommaki
Contributor

tuommaki commented Feb 4, 2019

> I am just a little bit scared about how this will behave on real workloads, as 50% memory can be reached very easily on clusters with many ingress definitions, and that will probably cause scaling to tons of pods just because of the many buffers nginx allocates.

This is a good shout: since each IC instance must create some static resources for each ingress definition, the HPA could go crazy regardless of actual load once there are enough ingresses. Another factor contributing to IC resource usage is the number of CPU cores on the machine. When the IC runs on a big machine with the default config, the number of workers equals the number of cores, which in some cases contributes a lot to static memory usage. For some customers with 128 or 256 cores per node, there has been specific configuration to reduce the number of workers, but debugging this is occasionally a bit hairy depending on the overall situation / circumstances.

Just my 2 cents.
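
The worker-count knob mentioned above is the nginx-ingress worker-processes option, which this chart already exposes through values.yaml and the controller ConfigMap (a sketch based on the values.yaml hunk further down in this PR; the comment is editorial, not from the chart):

configmap:
  # nginx defaults the worker count to the node's CPU core count ("auto"),
  # which inflates static memory usage on very large instance types.
  worker-processes: "4"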

    apiVersion: apps/v1
    kind: Deployment
    name: {{ .Values.controller.name }}
  minReplicas: 2
Member

I'm thinking we might want to make this configurable, and even if not, I'd suggest changing it to a templated value with a default in values.yaml. Same with max, and the two utilization values below (I'd do them separately).

A use case for making min user-configurable: some customers that run webscale loads know they never want to go below a certain number of ICs, at least at certain times of year if not always.

Contributor

Agree, the templates should have values with sensible defaults.

Making this user-configurable makes sense to me, since customers may need to tune the HPA settings on production clusters.
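
Roughly what the templating suggestion amounts to (the autoscaling key names are taken from the values.yaml hunk that appears later in this PR; the hpa.yaml excerpt is illustrative, not the committed template):

# values.yaml
autoscaling:
  minReplicas: 2
  maxReplicas: 15

# templates/hpa.yaml (excerpt)
  minReplicas: {{ .Values.autoscaling.minReplicas }}
  maxReplicas: {{ .Values.autoscaling.maxReplicas }}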

Member

@puja108 puja108 left a comment

Generally LGTM, besides needing more templated values in there.

We can add user configurability of min (and maybe even the max and utilization values) in a 2nd PR. /cc @giantswarm/team-batman WDYT?

Main thing before we roll this out: We need to test scenarios under load, maybe @giantswarm/sig-customer can help or has ideas?

Contributor

@rossf7 rossf7 left a comment

Direction is good and making the IC more dynamic is a big improvement. But yes, it needs careful testing before rollout.

@puja108
Copy link
Member

puja108 commented Feb 4, 2019

We also talked in product about needing to be super careful here because of the nature of our customers. We'll be rolling out cluster-autoscaling even without this, so no rush.

@@ -16,6 +16,12 @@ configmap:
   server-tokens: "false"
   worker-processes: "4"
 
+autoscaling:
+  minReplicas: 2
+  maxReplicas: 15
Member

Why did you go so low now? The thing is, the default will for now be rolled out to all customers. I'm pretty sure a lot of our customers go way higher than 15 (I've seen 64).

Contributor Author

I want to check on Asgard. I talked to vaclav and he recommended going lower. Going to test this a bit; none of these values are set in stone yet.

Contributor

> Why did you go so low now? The thing is, the default will for now be rolled out to all customers. I'm pretty sure a lot of our customers go way higher than 15 (I've seen 64).

I'm genuinely interested in the load metrics when someone wants to run even 15 nginxes, not to mention 64. I know that on one customer there can be a total of 64, but that's more of a mistake than good practice. A single nginx can drive tons of load given that the HW has enough resources, and AFAIK we don't have a single tenant with hundreds of nodes that would justify such a high number of instances. Please educate me 🙂

Contributor

Oh, one more thing here: was it the number of worker processes for a single nginx IC, or the number of IC pods? Ping @puja108

I'm asking because the default config is num_workers == num CPU cores in the node, which also often leads to wrong configuration when running on very big instance types (but this can be overridden with IC annotations).

Member

IC pods, and partly that is not because nginx is the bottleneck but because of the EC2 instance networking (which also depends on the type of machine). I think gitlab had 11, so this might be OK, but like Tuomas is saying, it is dependent on the machine it runs on. Let's just play around and we'll see with time. Maybe @giantswarm/sig-customer then needs to talk to customers and clarify that they might not need that many.

@pipo02mix pipo02mix requested a review from rossf7 June 26, 2019 14:02
@pipo02mix
Contributor

I added the HPA template flag. So by default there is no HPA; we can enable it once tests show good results and we are confident. But for now, it can be merged without affecting the current state.
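
A sketch of how the flag gates the template (the conditional key matches the hunk reviewed below; the values.yaml default and the surrounding excerpt are illustrative, not the committed files):

# values.yaml
global:
  hpa:
    enabled: false   # HPA stays off by default until load testing builds confidence

# templates/hpa.yaml wraps the whole manifest in the flag
{{- if .Values.global.hpa.enabled }}
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
# ... rest of the HPA spec ...
{{- end }}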

@rossf7 rossf7 assigned pipo02mix and unassigned MarcelMue Jun 26, 2019
Contributor

@rossf7 rossf7 left a comment

LGTM, just a query on whether we can remove global from the templating.

@@ -0,0 +1,27 @@
+{{- if .Values.global.hpa.enabled }}
Contributor

I don't think we need to use global. That was hacky and due to the k8scc migration, as we needed it to install the temp resources.

We still need to simplify the templates, but because the IC migration is disabled for all providers I think it can be removed.

Contributor

OK, I'll do an extra section.

@rossf7
Contributor

rossf7 commented Jun 26, 2019

@pipo02mix Being extra paranoid, what about bumping the release channel to 0-8-stable?

@pipo02mix
Contributor

pipo02mix commented Jun 26, 2019

Tested in ginger.

Version bumped and the flag is now a top-level field. Merging.
