KubeAPIErrorBudgetBurn warnings fire after upgrade to 1.18.6 #9698
Comments
Looks like something's trying to talk to the apiserver with non-TLS HTTP.
@demisx If you look what actually
Using those formulas I've found that in my case it's slow requests causing the problem. I've installed the API Server dashboard (for convenience) and found that some LIST requests are taking longer than they should. Apiserver logs from around the same time (combined from 3 masters):
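For reference, the formulas in question implement the multiwindow, multi-burn-rate pattern. A minimal sketch of that check, assuming the common kubernetes-mixin defaults (a 99% availability SLO and the 14.4x/6x/3x/1x window pairs); verify the exact factors against your own prometheusrules:

```python
# Sketch of the multiwindow, multi-burn-rate check behind KubeAPIErrorBudgetBurn.
# Window pairs and factors follow common kubernetes-mixin defaults
# (an assumption; check your own prometheusrules for the exact values).

SLO = 0.99              # 99% of apiserver requests succeed and are fast enough
ERROR_BUDGET = 1 - SLO  # 0.01

# (long window, short window, burn-rate factor, severity)
WINDOWS = [
    ("1h", "5m", 14.4, "critical"),
    ("6h", "30m", 6.0, "critical"),
    ("1d", "2h", 3.0, "warning"),
    ("3d", "6h", 1.0, "warning"),
]

def firing(burnrates: dict) -> list:
    """Return the severities that would fire given measured burn rates.

    burnrates maps a window label ("1h", "5m", ...) to the measured
    apiserver_request:burnrate value for that window.
    """
    alerts = []
    for long_w, short_w, factor, severity in WINDOWS:
        threshold = factor * ERROR_BUDGET
        # Both the long and the short window must exceed the threshold,
        # so a brief spike alone does not page.
        if burnrates[long_w] > threshold and burnrates[short_w] > threshold:
            alerts.append(severity)
    return alerts
```

For example, a sustained 2% error/slow-request ratio across all windows stays under the 3x one-day threshold (0.03) but exceeds the 1x three-day threshold (0.01), so only the slowest-burning warning fires.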
To investigate further I've enabled the apiserver audit log but cannot pinpoint the problem yet. I would be happy to get tips on how to proceed with debugging or how to interpret those logs.
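One way to dig into the audit log is to scan the JSON-lines events for slow LIST requests. A hedged sketch: the field names follow the audit.k8s.io/v1 Event schema, while the sample event and the 1-second threshold are made up for illustration (real logs may also omit fractional seconds in timestamps, which this simple parser does not handle):

```python
# Sketch: scan an apiserver audit log (JSON lines) for slow LIST requests.
# Field names follow the audit.k8s.io/v1 Event schema; the sample event
# and the threshold are illustrative assumptions.
import json
from datetime import datetime

def parse_ts(ts: str) -> datetime:
    # Audit timestamps look like "2020-08-05T12:00:00.000000Z".
    return datetime.strptime(ts.replace("Z", "+0000"), "%Y-%m-%dT%H:%M:%S.%f%z")

def slow_lists(lines, threshold_s=1.0):
    """Yield (duration_s, requestURI, username) for slow LIST requests."""
    for line in lines:
        ev = json.loads(line)
        if ev.get("verb") != "list" or ev.get("stage") != "ResponseComplete":
            continue
        dur = (parse_ts(ev["stageTimestamp"]) -
               parse_ts(ev["requestReceivedTimestamp"])).total_seconds()
        if dur >= threshold_s:
            yield dur, ev["requestURI"], ev.get("user", {}).get("username", "?")

# Illustrative sample event (not from the real cluster in this issue):
sample = ('{"verb":"list","stage":"ResponseComplete",'
          '"requestURI":"/api/v1/namespaces/default/pods",'
          '"requestReceivedTimestamp":"2020-08-05T12:00:00.000000Z",'
          '"stageTimestamp":"2020-08-05T12:00:02.500000Z",'
          '"user":{"username":"system:serviceaccount:flux:flux"}}')
for dur, uri, user in slow_lists([sample]):
    print(f"{dur:.1f}s LIST {uri} by {user}")
```

Grouping the output by username is a quick way to see whether one client (e.g. a controller such as Flux) accounts for most of the slow LISTs.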
I am not sure whether
I don't see any errors written to the api server log. Just these "LIST" requests from the flux daemon. Pretty lost at this point.
Has anyone been able to figure out the source of the problem? We are constantly getting hit with warning alerts from all 1.18.6 clusters.
After disabling Flux and Helm Operator, my LIST operation is around 50 ms now and consistent with the others. However, the warning is still firing. Do you guys know why the Prometheus query UI returns different values when queried by the record rule name and by the expression it corresponds to? For example, let's take
When queried by the record rule name I get one result, but when queried by the expression itself I get another. Aren't these numbers supposed to be the same?
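One way to check this outside the UI is to query the Prometheus HTTP API for both the recorded series and the raw expression and diff the results. A sketch, assuming a port-forwarded Prometheus at localhost:9090 (the URL is a placeholder, and the expressions you pass in should be copied from your own rules file):

```python
# Sketch: fetch a recording rule's current value and the raw expression's
# value from the Prometheus HTTP API, then compare them per series.
# PROM_URL is an assumption (e.g. after kubectl port-forward).
import json
import urllib.parse
import urllib.request

PROM_URL = "http://localhost:9090"

def instant_query(expr: str) -> dict:
    """Return {labels-as-json: value} for an instant query."""
    url = PROM_URL + "/api/v1/query?" + urllib.parse.urlencode({"query": expr})
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    return {json.dumps(r["metric"], sort_keys=True): float(r["value"][1])
            for r in data["data"]["result"]}

def mismatches(rule_values: dict, expr_values: dict, tol=1e-6) -> dict:
    """Series where the recorded value and the freshly evaluated one differ."""
    out = {}
    for key in rule_values.keys() & expr_values.keys():
        if abs(rule_values[key] - expr_values[key]) > tol:
            out[key] = (rule_values[key], expr_values[key])
    return out
```

Usage would be comparing instant_query("apiserver_request:burnrate1h") against instant_query of the full expression from your deployed prometheusrules; a large mismatch suggests the recorded series is being produced by a different (e.g. outdated) rule than the expression you pasted.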
One observation about the config: the etcd version is incorrect. It should be 3.4.3.
@hakman Thank you for noting that. Will this be something that kops will correct in future releases or do I need to edit the cluster manually? |
This is something you manually added to your cluster config and kops won't override it. You should change it manually to 3.4.3. |
I do not have the etcd version specified but have the same problem, so most probably that's not the root cause.
@hakman Hmm.. 🤔 I've never touched the etcd version. I've always gone through |
There was a bug with the |
@hakman What would be the proper way to upgrade an existing cluster to v3.4.3? Wait for the next kops 1.18.x release? |
Not sure this change will make it into the next release. The effects are the same as setting the version to 3.4.3 in the cluster config, nothing fancy. The upgrade should probably be done by going through 3.3.x as well, to ensure compatibility (at least that's what I think).
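For anyone pinning the version explicitly, the relevant cluster-spec fragment looks roughly like this (a sketch of the kops etcdClusters shape; the instance group names are illustrative, and as noted above an intermediate 3.3.x step may be needed when coming from an older etcd):

```yaml
# Fragment of the kops cluster spec (edited via kops edit cluster).
# Instance group names are illustrative assumptions.
etcdClusters:
- name: main
  version: 3.4.3
  etcdMembers:
  - instanceGroup: master-us-east-1a
    name: a
- name: events
  version: 3.4.3
  etcdMembers:
  - instanceGroup: master-us-east-1a
    name: a
```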
@xoxbet Do you get different values too when querying by the record rule name (e.g. |
@demisx I get the same result when using the raw expression and the pre-generated record. Where did you get that expression from? Yours is a bit different than mine ( Check whether the expression you use to query Prometheus matches the one here: Interestingly, our clusters don't share much in common; we are even using different networking components.
The fact that we have different expressions was suspicious to me, so I've updated the Prometheus operator from 8.13.8 to 9.3.1 and updated the prometheusrules too (I have the Prometheus rules versioned in my git repo because I have some of the alerts customised). For now (1 hour after the upgrade) I don't see
OK, I'm 99 percent sure that the issue is an old record expression. Compare the results of the old (left) and updated (right) rules; the results are very different. @demisx Try updating the Prometheus rules and let's see if we can close this issue.
@xoxbet Oh, great catch 👍. I must've grabbed those rules from the wrong place. Today, I've copied them into the query UI from the cluster using your |
1. What kops version are you running? The command kops version will display this information.
Version 1.18.0
2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-16T00:04:31Z", GoVersion:"go1.14.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-15T16:51:04Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
3. What cloud provider are you using?
AWS
4. What commands did you run? What is the simplest way to reproduce this issue?
Cluster upgrade from 1.16 to 1.18
5. What happened after the commands executed?
The following warning alert started to fire (see below). I am not sure where to look for the problem and how to fix it.
6. What did you expect to happen?
No warning alerts
7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.
9. Anything else do we need to know?
I am not seeing anything crazy written to the kube-apiserver log. Just this (see below). Should I check something else?