KubeAPIErrorBudgetBurn warnings fire after upgrade to 1.18.6 #9698

Closed
demisx opened this issue Aug 6, 2020 · 18 comments

demisx commented Aug 6, 2020

1. What kops version are you running? The command kops version will display
this information.

Version 1.18.0

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-16T00:04:31Z", GoVersion:"go1.14.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-15T16:51:04Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

3. What cloud provider are you using?

AWS

4. What commands did you run? What is the simplest way to reproduce this issue?

Cluster upgrade from 1.16 to 1.18

5. What happened after the commands executed?
The following warning alert started to fire (see below). I am not sure where to look for the problem and how to fix it.

alert: KubeAPIErrorBudgetBurn
expr: sum(apiserver_request:burnrate3d)
  > (1 * 0.01) and sum(apiserver_request:burnrate6h) > (1 * 0.01)
for: 3h
labels:
  severity: warning
annotations:
  message: The API server is burning too much error budget
  runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapierrorbudgetburn

6. What did you expect to happen?
No warning alerts

7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: "2019-10-22T17:23:28Z"
  generation: 10
  name: prod.dl.org
spec:
  api:
    loadBalancer:
      type: Public
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://kops-dl-org-prod/prod.dl.org
  dnsZone: Z1KK30M1ET94CJ
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: master-us-west-2a
      name: a
    - encryptedVolume: true
      instanceGroup: master-us-west-2b
      name: b
    - encryptedVolume: true
      instanceGroup: master-us-west-2d
      name: d
    memoryRequest: 100Mi
    name: main
    version: 3.2.24
  - cpuRequest: 100m
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: master-us-west-2a
      name: a
    - encryptedVolume: true
      instanceGroup: master-us-west-2b
      name: b
    - encryptedVolume: true
      instanceGroup: master-us-west-2d
      name: d
    memoryRequest: 100Mi
    name: events
    version: 3.2.24
  iam:
    allowContainerRegistry: true
    legacy: false
  kubeAPIServer:
    featureGates:
      TTLAfterFinished: "true"
  kubeControllerManager:
    featureGates:
      TTLAfterFinished: "true"
  kubeProxy:
    metricsBindAddress: 0.0.0.0
  kubelet:
    anonymousAuth: false
    featureGates:
      TTLAfterFinished: "true"
  kubernetesApiAccess:
  - 0.0.0.0/0
  kubernetesVersion: 1.18.6
  masterInternalName: api.internal.prod.dl.org
  masterPublicName: api.prod.dl.org
  networkCIDR: 172.20.0.0/16
  networking:
    calico:
      majorVersion: v3
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 0.0.0.0/0
  subnets:
  - cidr: 172.20.32.0/19
    name: us-west-2a
    type: Private
    zone: us-west-2a
  - cidr: 172.20.64.0/19
    name: us-west-2b
    type: Private
    zone: us-west-2b
  - cidr: 172.20.96.0/19
    name: us-west-2d
    type: Private
    zone: us-west-2d
  - cidr: 172.20.0.0/22
    name: utility-us-west-2a
    type: Utility
    zone: us-west-2a
  - cidr: 172.20.4.0/22
    name: utility-us-west-2b
    type: Utility
    zone: us-west-2b
  - cidr: 172.20.8.0/22
    name: utility-us-west-2d
    type: Utility
    zone: us-west-2d
  topology:
    dns:
      type: Public
    masters: private
    nodes: private

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2020-03-15T21:02:54Z"
  generation: 2
  labels:
    kops.k8s.io/cluster: prod.dl.org
  name: bastions
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20200716
  machineType: t2.micro
  maxSize: 0
  minSize: 0
  nodeLabels:
    kops.k8s.io/instancegroup: bastions
  role: Bastion
  subnets:
  - utility-us-west-2a

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2019-10-22T17:23:28Z"
  generation: 4
  labels:
    kops.k8s.io/cluster: prod.dl.org
  name: master-us-west-2a
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20200716
  machineType: t3a.medium
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-west-2a
  role: Master
  subnets:
  - us-west-2a

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2019-10-22T17:23:28Z"
  generation: 4
  labels:
    kops.k8s.io/cluster: prod.dl.org
  name: master-us-west-2b
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20200716
  machineType: t3a.medium
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-west-2b
  role: Master
  subnets:
  - us-west-2b

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2019-10-22T17:23:28Z"
  generation: 4
  labels:
    kops.k8s.io/cluster: prod.dl.org
  name: master-us-west-2d
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20200716
  machineType: t3a.medium
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-west-2d
  role: Master
  subnets:
  - us-west-2d

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2019-10-22T17:26:50Z"
  generation: 4
  labels:
    kops.k8s.io/cluster: prod.dl.org
  name: nodes-us-west-2a
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20200716
  machineType: t3a.large
  maxPrice: "0.05"
  maxSize: 5
  minSize: 1
  mixedInstancesPolicy:
    instances:
    - t3a.large
    - t3.large
    - r5a.large
    - m5.large
    onDemandAboveBase: 0
    spotInstancePools: 4
  nodeLabels:
    kops.k8s.io/instancegroup: nodes-us-west-2a
  role: Node
  subnets:
  - us-west-2a

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2019-10-22T17:27:42Z"
  generation: 4
  labels:
    kops.k8s.io/cluster: prod.dl.org
  name: nodes-us-west-2b
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20200716
  machineType: t3a.large
  maxPrice: "0.05"
  maxSize: 5
  minSize: 1
  mixedInstancesPolicy:
    instances:
    - t3a.large
    - t3.large
    - r5a.large
    - m5.large
    onDemandAboveBase: 0
    spotInstancePools: 4
  nodeLabels:
    kops.k8s.io/instancegroup: nodes-us-west-2b
  role: Node
  subnets:
  - us-west-2b

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2019-10-22T17:28:34Z"
  generation: 4
  labels:
    kops.k8s.io/cluster: prod.dl.org
  name: nodes-us-west-2d
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20200716
  machineType: t3a.large
  maxPrice: "0.05"
  maxSize: 5
  minSize: 1
  mixedInstancesPolicy:
    instances:
    - t3a.large
    - t3.large
    - r5a.large
    - m5.large
    onDemandAboveBase: 0
    spotInstancePools: 4
  nodeLabels:
    kops.k8s.io/instancegroup: nodes-us-west-2d
  role: Node
  subnets:
  - us-west-2d

9. Anything else we need to know?
I am not seeing anything alarming in the kube-apiserver log, just the entries below. Should I check something else?

$ kubectl logs -n kube-system -lk8s-app=kube-apiserver -c kube-apiserver
I0806 10:39:27.118112       1 log.go:172] http: TLS handshake error from 172.20.0.190:39560: tls: first record does not look like a TLS handshake
I0806 10:39:27.125020       1 log.go:172] http: TLS handshake error from 172.20.0.190:39562: tls: first record does not look like a TLS handshake
I0806 10:39:27.131772       1 log.go:172] http: TLS handshake error from 172.20.0.190:39564: tls: first record does not look like a TLS handshake
I0806 13:49:34.220916       1 log.go:172] http: TLS handshake error from 172.20.0.190:48782: tls: first record does not look like a TLS handshake
I0806 15:46:47.807678       1 trace.go:116] Trace[1935067621]: "Get" url:/api/v1/namespaces/kube-system/pods/kube-apiserver-ip-172-20-39-122.us-west-2.compute.internal/log,user-agent:kubectl/v1.18.6 (darwin/amd64) kubernetes/dff82dc,client:172.20.0.190 (started: 2020-08-06 15:46:27.987322174 +0000 UTC m=+78736.098618310) (total time: 19.820232218s):
Trace[1935067621]: [19.820230198s] [19.80677796s] Transformed response object
I0806 15:46:47.807717       1 trace.go:116] Trace[439512417]: "Get" url:/api/v1/namespaces/kube-system/pods/kube-apiserver-ip-172-20-97-123.us-west-2.compute.internal/log,user-agent:kubectl/v1.18.6 (darwin/amd64) kubernetes/dff82dc,client:172.20.0.190 (started: 2020-08-06 15:46:27.992094089 +0000 UTC m=+78736.103390235) (total time: 19.815537094s):
Trace[439512417]: [19.815536244s] [19.804237986s] Transformed response object
I0806 15:46:47.807847       1 trace.go:116] Trace[2088981802]: "Get" url:/api/v1/namespaces/kube-system/pods/kube-apiserver-ip-172-20-94-219.us-west-2.compute.internal/log,user-agent:kubectl/v1.18.6 (darwin/amd64) kubernetes/dff82dc,client:172.20.0.190 (started: 2020-08-06 15:46:27.988739067 +0000 UTC m=+78736.100035203) (total time: 19.819060288s):
Trace[2088981802]: [19.819059348s] [19.804137435s] Transformed response object
I0806 15:38:30.929435       1 trace.go:116] Trace[788389340]: "List" url:/api/v1/secrets,user-agent:fluxd/v0.0.0 (linux/amd64) kubernetes/$Format,client:172.20.106.218 (started: 2020-08-06 15:38:30.348145008 +0000 UTC m=+77776.473287302) (total time: 581.201989ms):
Trace[788389340]: [581.200379ms] [528.735779ms] Writing http response done count:134
I0806 15:41:34.990938       1 trace.go:116] Trace[1399877601]: "List" url:/api/v1/secrets,user-agent:fluxd/v0.0.0 (linux/amd64) kubernetes/$Format,client:172.20.106.218 (started: 2020-08-06 15:41:34.478087614 +0000 UTC m=+77960.603229898) (total time: 512.803702ms):
Trace[1399877601]: [512.802272ms] [473.454048ms] Writing http response done count:134
I0806 15:41:50.210129       1 trace.go:116] Trace[1495977473]: "List" url:/api/v1/namespaces/monitoring/secrets,user-agent:helm-operator/v0.0.0 (linux/amd64) kubernetes/$Format,client:172.20.106.218 (started: 2020-08-06 15:41:49.696962617 +0000 UTC m=+77975.822104911) (total time: 513.125004ms):
Trace[1495977473]: [513.122854ms] [468.787573ms] Writing http response done count:5
I0806 15:41:50.542065       1 trace.go:116] Trace[345410093]: "List" url:/api/v1/namespaces/monitoring/secrets,user-agent:helm-operator/v0.0.0 (linux/amd64) kubernetes/$Format,client:172.20.106.218 (started: 2020-08-06 15:41:49.995633773 +0000 UTC m=+77976.120776067) (total time: 546.356231ms):
Trace[345410093]: [546.354241ms] [475.799666ms] Writing http response done count:5
I0806 15:44:39.187617       1 trace.go:116] Trace[590956055]: "List" url:/api/v1/secrets,user-agent:fluxd/v0.0.0 (linux/amd64) kubernetes/$Format,client:172.20.106.218 (started: 2020-08-06 15:44:38.591954763 +0000 UTC m=+78144.717097057) (total time: 595.608228ms):
Trace[590956055]: [595.606578ms] [546.292841ms] Writing http response done count:134
I0806 09:51:51.369290       1 log.go:172] http: TLS handshake error from 172.20.8.25:36724: tls: first record does not look like a TLS handshake
I0806 09:51:51.375823       1 log.go:172] http: TLS handshake error from 172.20.8.25:36726: tls: first record does not look like a TLS handshake
I0806 09:51:51.414232       1 log.go:172] http: TLS handshake error from 172.20.8.25:36738: tls: first record does not look like a TLS handshake
I0806 09:51:51.420446       1 log.go:172] http: TLS handshake error from 172.20.8.25:36740: tls: first record does not look like a TLS handshake
I0806 09:51:51.426999       1 log.go:172] http: TLS handshake error from 172.20.8.25:36742: tls: first record does not look like a TLS handshake
I0806 09:51:51.433691       1 log.go:172] http: TLS handshake error from 172.20.8.25:36744: tls: first record does not look like a TLS handshake
I0806 09:51:51.440140       1 log.go:172] http: TLS handshake error from 172.20.8.25:36746: tls: first record does not look like a TLS handshake
I0806 10:40:51.908353       1 log.go:172] http: TLS handshake error from 172.20.8.25:39314: tls: no cipher suite supported by both client and server
I0806 11:13:18.864421       1 log.go:172] http: TLS handshake error from 172.20.8.25:40998: tls: first record does not look like a TLS handshake
I0806 13:46:37.659591       1 log.go:172] http: TLS handshake error from 172.20.8.25:48924: local error: tls: bad record MAC
@johngmyers (Member)

Looks like something's trying to talk to the apiserver with non-TLS HTTP.
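
One way to narrow that down is to map the client IPs from the log excerpt above (172.20.0.190 and 172.20.8.25) back to a pod or node; a rough sketch:

# Does a pod own one of the client IPs reporting TLS handshake errors?
kubectl get pods --all-namespaces -o wide | grep -E '172\.20\.0\.190|172\.20\.8\.25'
# If not, is it a node address (e.g. a host-network component)?
kubectl get nodes -o wide | grep -E '172\.20\.0\.190|172\.20\.8\.25'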


justinaslelys commented Aug 7, 2020

@demisx
I'm facing exactly the same issue after upgrading from 1.17 (what's different from your cluster: I'm using Cilium).

If you look at what apiserver_request:burnrate6h actually is, it's a combination of slow requests and 5xx errors on LIST|GET requests:

        (
          (
            # too slow
            sum(rate(apiserver_request_duration_seconds_count{job="apiserver",verb=~"LIST|GET"}[6h]))
            -
            (
              sum(rate(apiserver_request_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",scope="resource",le="0.1"}[6h])) +
              sum(rate(apiserver_request_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",scope="namespace",le="0.5"}[6h])) +
              sum(rate(apiserver_request_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",scope="cluster",le="5"}[6h]))
            )
          )
          +
          # errors
          sum(rate(apiserver_request_total{job="apiserver",verb=~"LIST|GET",code=~"5.."}[6h]))
        )
        /
        sum(rate(apiserver_request_total{job="apiserver",verb=~"LIST|GET"}[6h]))
      labels:
        verb: read
      record: apiserver_request:burnrate6h

Using those formulas, I found that in my case it's slow requests that are causing the problem.
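
As an illustration, the latency term and the error term of that rule can be queried separately to see which one dominates. A minimal sketch, assuming Prometheus is port-forwarded locally (the metric names come from the rule above; the namespace and service in the port-forward are assumptions for a kube-prometheus-style install):

# kubectl -n monitoring port-forward svc/prometheus-k8s 9090   # adjust to your setup

# error term: rate of 5xx responses on LIST/GET over the last 6h
curl -sG http://localhost:9090/api/v1/query --data-urlencode \
  'query=sum(rate(apiserver_request_total{job="apiserver",verb=~"LIST|GET",code=~"5.."}[6h]))'

# latency term: LIST/GET requests slower than the per-scope latency targets
curl -sG http://localhost:9090/api/v1/query --data-urlencode \
  'query=sum(rate(apiserver_request_duration_seconds_count{job="apiserver",verb=~"LIST|GET"}[6h]))
    - sum(rate(apiserver_request_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",scope="resource",le="0.1"}[6h]))
    - sum(rate(apiserver_request_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",scope="namespace",le="0.5"}[6h]))
    - sum(rate(apiserver_request_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",scope="cluster",le="5"}[6h]))'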

I've installed the API Server dashboard (for convenience) and found that some LIST requests are taking longer than they should:
[Screenshot 2020-08-07: API server dashboard showing elevated LIST request latencies]

API server logs from around the same time (combined from the 3 masters):

Trace[1779858798]: [3.148645316s] [3.148591286s] Object stored in database
I0807 06:30:01.109419       1 trace.go:116] Trace[2060077304]: "Get" url:/api/v1/namespaces/ingress-nginx/configmaps/ingress-controller-leader-nginx-internal,user-agent:nginx-ingress-controller/v0.0.0 (linux/amd64) kubernetes/$Format,client:10.0.193.31 (started: 2020-08-07 06:29:54.06573089 +0000 UTC m=+72866.991413790) (total time: 7.043661571s):
Trace[2060077304]: [7.04358989s] [7.04358317s] About to write a response
I0807 06:30:01.107672       1 trace.go:116] Trace[2090894920]: "Get" url:/api/v1/namespaces/ingress-nginx/configmaps/ingress-controller-leader-nginx-internal,user-agent:nginx-ingress-controller/v0.0.0 (linux/amd64) kubernetes/$Format,client:10.0.147.123 (started: 2020-08-07 06:29:54.066965029 +0000 UTC m=+72866.992647919) (total time: 7.040680386s):
Trace[2090894920]: [7.040637156s] [7.040630436s] About to write a response
I0807 06:30:01.106044       1 trace.go:116] Trace[1639175304]: "Get" url:/api/v1/namespaces/kube-system/configmaps/kops-controller-leader,user-agent:kops-controller/v0.0.0 (linux/amd64) kubernetes/$Format,client:10.0.201.143 (started: 2020-08-07 06:29:54.853707821 +0000 UTC m=+72867.779390711) (total time: 6.25231116s):
Trace[1639175304]: [6.252270029s] [6.252260849s] About to write a response
I0807 06:30:01.106478       1 trace.go:116] Trace[1483561289]: "Get" url:/api/v1/namespaces/ingress-nginx/configmaps/ingress-controller-leader-nginx-public,user-agent:nginx-ingress-controller/v0.0.0 (linux/amd64) kubernetes/$Format,client:10.0.147.123 (started: 2020-08-07 06:29:54.230118394 +0000 UTC m=+72867.155801284) (total time: 6.876333434s):
Trace[1483561289]: [6.876292123s] [6.876283183s] About to write a response
I0807 06:30:01.106697       1 trace.go:116] Trace[491217306]: "Get" url:/api/v1/namespaces/ingress-nginx/configmaps/ingress-controller-leader-nginx-acceptance-cms,user-agent:nginx-ingress-controller/v0.0.0 (linux/amd64) kubernetes/$Format,client:10.0.193.31 (started: 2020-08-07 06:29:54.065550567 +0000 UTC m=+72866.991233467) (total time: 7.041118764s):
Trace[491217306]: [7.041077133s] [7.041070722s] About to write a response
I0807 06:30:01.106882       1 trace.go:116] Trace[551629569]: "Get" url:/api/v1/namespaces/ingress-nginx/configmaps/ingress-controller-leader-nginx,user-agent:nginx-ingress-controller/v0.0.0 (linux/amd64) kubernetes/$Format,client:10.0.147.123 (started: 2020-08-07 06:29:54.224949497 +0000 UTC m=+72867.150632407) (total time: 6.881908637s):
Trace[551629569]: [6.881841536s] [6.881829806s] About to write a response
I0807 06:30:01.105047       1 trace.go:116] Trace[410011724]: "Get" url:/api/v1/namespaces/ingress-nginx/configmaps/ingress-controller-leader-nginx-public,user-agent:nginx-ingress-controller/v0.0.0 (linux/amd64) kubernetes/$Format,client:10.0.193.31 (started: 2020-08-07 06:29:59.254851022 +0000 UTC m=+72872.180533912) (total time: 1.850171504s):
Trace[410011724]: [1.850127263s] [1.850117713s] About to write a response
I0807 06:30:01.105502       1 trace.go:116] Trace[1333423977]: "Get" url:/api/v1/namespaces/kube-system/configmaps/coredns-autoscaler,user-agent:cluster-proportional-autoscaler/v0.0.0 (linux/amd64) kubernetes/$Format,client:10.0.147.123 (started: 2020-08-07 06:29:57.539666801 +0000 UTC m=+72870.465349691) (total time: 3.565809602s):
Trace[1333423977]: [3.565774231s] [3.565763121s] About to write a response
I0807 06:30:01.104580       1 trace.go:116] Trace[1237990545]: "Get" url:/api/v1/namespaces/ingress-nginx/configmaps/ingress-controller-leader-nginx-acceptance-cms,user-agent:nginx-ingress-controller/v0.0.0 (linux/amd64) kubernetes/$Format,client:10.0.147.123 (started: 2020-08-07 06:29:54.0650784 +0000 UTC m=+72866.990761290) (total time: 7.039469799s):
Trace[1237990545]: [7.039403388s] [7.039392667s] About to write a response
I0807 06:30:01.102643       1 trace.go:116] Trace[724288147]: "Get" url:/api/v1/namespaces/default,user-agent:kube-apiserver/v1.18.6 (linux/amd64) kubernetes/dff82dc,client:::1 (started: 2020-08-07 06:29:57.10929586 +0000 UTC m=+72870.034978750) (total time: 3.99332166s):
Trace[724288147]: [3.993290589s] [3.993279849s] About to write a response
I0807 06:30:01.102411       1 trace.go:116] Trace[1184972991]: "Get" url:/api/v1/namespaces/kube-system/endpoints/kube-controller-manager,user-agent:kube-controller-manager/v1.18.6 (linux/amd64) kubernetes/dff82dc/leader-election,client:127.0.0.1 (started: 2020-08-07 06:29:54.065299814 +0000 UTC m=+72866.990982714) (total time: 7.037081932s):
Trace[1184972991]: [7.037023501s] [7.037005781s] About to write a response
I0807 06:30:01.102496       1 trace.go:116] Trace[1914128814]: "Get" url:/api/v1/persistentvolumes/cms-efs-pv,user-agent:kubelet/v1.18.6 (linux/amd64) kubernetes/dff82dc,client:10.0.147.123 (started: 2020-08-07 06:29:54.085547597 +0000 UTC m=+72867.011230487) (total time: 7.016920151s):
Trace[1914128814]: [7.0168755s] [7.01686696s] About to write a response
I0807 06:30:01.101728       1 trace.go:116] Trace[1291545872]: "Get" url:/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-scheduler,user-agent:kube-scheduler/v1.18.6 (linux/amd64) kubernetes/dff82dc/leader-election,client:127.0.0.1 (started: 2020-08-07 06:29:54.09175972 +0000 UTC m=+72867.017442610) (total time: 7.009912566s):
Trace[1291545872]: [7.009850435s] [7.009824984s] About to write a response
I0807 06:30:00.944339       1 trace.go:116] Trace[1342146895]: "Get" url:/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cluster-autoscaler,user-agent:cluster-autoscaler/v0.0.0 (linux/amd64) kubernetes/$Format,client:10.0.162.134 (started: 2020-08-07 06:29:54.06772783 +0000 UTC m=+72866.993410720) (total time: 6.876582547s):
Trace[1342146895]: [6.876582547s] [6.876558217s] END
E0807 06:30:00.942843       1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
I0807 06:30:00.943405       1 trace.go:116] Trace[1137492827]: "Get" url:/api/v1/namespaces/kube-system/configmaps/cluster-autoscaler-status,user-agent:cluster-autoscaler/v0.0.0 (linux/amd64) kubernetes/$Format,client:10.0.162.134 (started: 2020-08-07 06:29:54.570202502 +0000 UTC m=+72867.495885392) (total time: 6.373164801s):
Trace[1137492827]: [6.373164801s] [6.373153501s] END
E0807 06:30:00.943847       1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
I0807 06:30:00.458438       1 healthz.go:234] [+]ping ok
[+]log ok
[-]etcd failed: reason withheld
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/bootstrap-controller ok
[+]poststarthook/rbac/bootstrap-roles ok
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
healthz check failed
I0807 06:29:59.506958       1 trace.go:116] Trace[1443714326]: "List" url:/apis/batch/v1/jobs,user-agent:kube-controller-manager/v1.18.6 (linux/amd64) kubernetes/dff82dc/system:serviceaccount:kube-system:cronjob-controller,client:127.0.0.1 (started: 2020-08-07 06:29:55.039163837 +0000 UTC m=+66196.225622958) (total time: 4.467766019s):
Trace[1443714326]: [4.466443983s] [4.466424913s] Listing from storage done
I0807 06:29:59.506073       1 trace.go:116] Trace[1011047555]: "Get" url:/api/v1/namespaces/kube-system/configmaps/kops-controller-leader,user-agent:kops-controller/v0.0.0 (linux/amd64) kubernetes/$Format,client:10.0.146.71 (started: 2020-08-07 06:29:56.328473579 +0000 UTC m=+66197.514932690) (total time: 3.177572806s):
Trace[1011047555]: [3.177524426s] [3.177515316s] About to write a response
I0807 06:29:59.505442       1 trace.go:116] Trace[965879463]: "List etcd3" key:/jobs,resourceVersion:,limit:500,continue: (started: 2020-08-07 06:29:55.039188757 +0000 UTC m=+66196.225647868) (total time: 4.466226791s):
Trace[965879463]: [4.466226791s] [4.466226791s] END
I0807 06:29:59.503467       1 trace.go:116] Trace[1073531398]: "Get" url:/api/v1/nodes/ip-10-0-162-134.eu-west-1.compute.internal,user-agent:protokube/v0.0.0 (linux/amd64) kubernetes/$Format,client:127.0.0.1 (started: 2020-08-07 06:29:58.117318155 +0000 UTC m=+66199.303777276) (total time: 1.386118519s):
Trace[1073531398]: [1.385255018s] [1.385246168s] About to write a response
I0807 06:29:59.502273       1 trace.go:116] Trace[902744542]: "Get" url:/api/v1/namespaces/default,user-agent:kube-apiserver/v1.18.6 (linux/amd64) kubernetes/dff82dc,client:::1 (started: 2020-08-07 06:29:56.648017785 +0000 UTC m=+66197.834476906) (total time: 2.854213634s):
Trace[902744542]: [2.854159534s] [2.854148124s] About to write a response
I0807 06:29:59.502722       1 trace.go:116] Trace[1296014947]: "Get" url:/api/v1/namespaces/kube-system/endpoints/kube-controller-manager,user-agent:kube-controller-manager/v1.18.6 (linux/amd64) kubernetes/dff82dc/leader-election,client:127.0.0.1 (started: 2020-08-07 06:29:54.499288654 +0000 UTC m=+66195.685747765) (total time: 5.003411821s):
Trace[1296014947]: [5.00338469s] [5.003363679s] About to write a response
E0807 06:29:59.496664       1 status.go:71] apiserver received an error that is not an metav1.Status: rpctypes.EtcdError{code:0xe, desc:"etcdserver: request timed out"}
I0807 06:29:59.497002       1 trace.go:116] Trace[1393232789]: "Get" url:/api/v1/namespaces/kube-system/configmaps/kops-controller-leader,user-agent:kops-controller/v0.0.0 (linux/amd64) kubernetes/$Format,client:10.0.162.134 (started: 2020-08-07 06:29:52.333973384 +0000 UTC m=+66193.520432495) (total time: 7.162986621s):
Trace[1393232789]: [7.162986621s] [7.162976631s] END
E0807 06:29:59.497423       1 status.go:71] apiserver received an error that is not an metav1.Status: rpctypes.EtcdError{code:0xe, desc:"etcdserver: request timed out"}
I0807 06:29:59.497987       1 trace.go:116] Trace[760182879]: "Get" url:/api/v1/namespaces/kube-system/endpoints/kube-scheduler,user-agent:kube-scheduler/v1.18.6 (linux/amd64) kubernetes/dff82dc/leader-election,client:127.0.0.1 (started: 2020-08-07 06:29:51.85116812 +0000 UTC m=+66193.037627241) (total time: 7.646787357s):
Trace[760182879]: [7.646787357s] [7.646764776s] END
I0807 06:29:58.617058       1 trace.go:116] Trace[1315296369]: "GuaranteedUpdate etcd3" type:*coordination.Lease (started: 2020-08-07 06:29:51.614274113 +0000 UTC m=+66192.800733244) (total time: 7.002733197s):
Trace[1315296369]: [7.002733197s] [7.00217732s] END
E0807 06:29:58.617196       1 status.go:71] apiserver received an error that is not an metav1.Status: rpctypes.EtcdError{code:0xe, desc:"etcdserver: request timed out, possibly due to previous leader failure"}
I0807 06:29:58.617560       1 trace.go:116] Trace[1907253726]: "Update" url:/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/ip-10-0-162-134.eu-west-1.compute.internal,user-agent:kubelet/v1.18.6 (linux/amd64) kubernetes/dff82dc,client:127.0.0.1 (started: 2020-08-07 06:29:51.614133131 +0000 UTC m=+66192.800592292) (total time: 7.003402225s):
Trace[1907253726]: [7.003402225s] [7.003305003s] END
I0807 06:29:54.081853       1 trace.go:116] Trace[656995652]: "Get" url:/api/v1/namespaces/acceptance/persistentvolumeclaims/cms-efs,user-agent:kubelet/v1.18.6 (linux/amd64) kubernetes/dff82dc,client:10.0.147.123 (started: 2020-08-07 06:29:52.840403673 +0000 UTC m=+72865.766086563) (total time: 1.241399278s):
Trace[656995652]: [1.241238086s] [1.241229686s] About to write a response
I0807 06:29:54.082774       1 trace.go:116] Trace[1352207043]: "Get" url:/api/v1/namespaces/kube-system/endpoints/kube-scheduler,user-agent:kube-scheduler/v1.18.6 (linux/amd64) kubernetes/dff82dc/leader-election,client:127.0.0.1 (started: 2020-08-07 06:29:52.070811947 +0000 UTC m=+72864.996494837) (total time: 2.011934148s):
Trace[1352207043]: [2.011896028s] [2.011876838s] About to write a response
I0807 06:29:54.088020       1 trace.go:116] Trace[1548824378]: "Update" url:/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/ip-10-0-201-143.eu-west-1.compute.internal,user-agent:kubelet/v1.18.6 (linux/amd64) kubernetes/dff82dc,client:127.0.0.1 (started: 2020-08-07 06:29:50.883341732 +0000 UTC m=+65659.906369537) (total time: 3.204643639s):
Trace[1548824378]: [3.204096422s] [3.203985161s] Object stored in database
I0807 06:29:54.087390       1 trace.go:116] Trace[1918609084]: "GuaranteedUpdate etcd3" type:*coordination.Lease (started: 2020-08-07 06:29:50.883498894 +0000 UTC m=+65659.906526709) (total time: 3.203852569s):
Trace[1918609084]: [3.203813418s] [3.20322758s] Transaction committed
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/bootstrap-controller ok
[+]poststarthook/rbac/bootstrap-roles ok
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
healthz check failed
[+]log ok
[-]etcd failed: reason withheld
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/start-apiextensions-informers ok
I0807 06:29:53.673092       1 healthz.go:234] [+]ping ok
I0807 06:29:52.498004       1 trace.go:116] Trace[163737924]: "List" url:/api/v1/pods,user-agent:cilium-operator/v0.0.0 (linux/amd64) kubernetes/$Format,client:10.0.175.74 (started: 2020-08-07 06:29:51.041842333 +0000 UTC m=+65660.064870158) (total time: 1.456138657s):
Trace[163737924]: [1.456138657s] [1.456065366s] END
I0807 06:29:52.497536       1 trace.go:116] Trace[361053280]: "List etcd3" key:/pods,resourceVersion:,limit:0,continue: (started: 2020-08-07 06:29:51.041922804 +0000 UTC m=+65660.064950609) (total time: 1.45557445s):
Trace[361053280]: [1.45557445s] [1.45557445s] END
E0807 06:29:52.497573       1 status.go:71] apiserver received an error that is not an metav1.Status: rpctypes.EtcdError{code:0xe, desc:"etcdserver: leader changed"}
E0807 06:29:52.496426       1 status.go:71] apiserver received an error that is not an metav1.Status: rpctypes.EtcdError{code:0xe, desc:"etcdserver: leader changed"}
I0807 06:29:52.496833       1 trace.go:116] Trace[1481734084]: "Get" url:/api/v1/namespaces/kube-system/endpoints/kube-controller-manager,user-agent:kube-controller-manager/v1.18.6 (linux/amd64) kubernetes/dff82dc/leader-election,client:127.0.0.1 (started: 2020-08-07 06:29:51.717795216 +0000 UTC m=+66192.904254357) (total time: 778.999938ms):
Trace[1481734084]: [778.999938ms] [778.977508ms] END
I0807 06:19:12.584734       1 trace.go:116] Trace[1233625436]: "Get" url:/api/v1/namespaces/acceptance/pods/zeus-cron-gw1s-1596781140-6pcnn,user-agent:kube-controller-manager/v1.18.6 (linux/amd64) kubernetes/dff82dc/system:serviceaccount:kube-system:generic-garbage-collector,client:127.0.0.1 (started: 2020-08-07 06:19:11.952256671 +0000 UTC m=+65553.138715792) (total time: 632.450929ms):
Trace[1233625436]: [632.409079ms] [632.401929ms] About to write a response
I0807 06:19:12.581544       1 trace.go:116] Trace[1174886116]: "Get" url:/api/v1/namespaces/acceptance/pods/zeus-cron-gw1g-1596781140-4stjt,user-agent:kube-controller-manager/v1.18.6 (linux/amd64) kubernetes/dff82dc/system:serviceaccount:kube-system:generic-garbage-collector,client:127.0.0.1 (started: 2020-08-07 06:19:11.951993878 +0000 UTC m=+65553.138452999) (total time: 629.478642ms):
Trace[1174886116]: [629.382671ms] [629.375481ms] About to write a response
I0807 06:19:12.582095       1 trace.go:116] Trace[479205559]: "Get" url:/api/v1/namespaces/acceptance/pods/zeus-cron-app-1596781140-l9f9s,user-agent:kube-controller-manager/v1.18.6 (linux/amd64) kubernetes/dff82dc/system:serviceaccount:kube-system:generic-garbage-collector,client:127.0.0.1 (started: 2020-08-07 06:19:11.95139352 +0000 UTC m=+65553.137852631) (total time: 630.645807ms):
Trace[479205559]: [630.602037ms] [630.591346ms] About to write a response
I0807 06:19:12.582576       1 trace.go:116] Trace[649348523]: "Get" url:/api/v1/namespaces/acceptance/pods/zeus-cron-gwac-1596781140-75sdt,user-agent:kube-controller-manager/v1.18.6 (linux/amd64) kubernetes/dff82dc/system:serviceaccount:kube-system:generic-garbage-collector,client:127.0.0.1 (started: 2020-08-07 06:19:11.952885719 +0000 UTC m=+65553.139344840) (total time: 629.637374ms):
Trace[649348523]: [629.592004ms] [629.584564ms] About to write a response
I0807 06:19:12.583040       1 trace.go:116] Trace[241412352]: "Get" url:/api/v1/namespaces/acceptance/pods/zeus-cron-gw2g-1596781140-q4qhs,user-agent:kube-controller-manager/v1.18.6 (linux/amd64) kubernetes/dff82dc/system:serviceaccount:kube-system:generic-garbage-collector,client:127.0.0.1 (started: 2020-08-07 06:19:11.952543555 +0000 UTC m=+65553.139002666) (total time: 630.446144ms):
Trace[241412352]: [630.404723ms] [630.397043ms] About to write a response
I0807 06:19:12.584087       1 trace.go:116] Trace[2071769037]: "Get" url:/api/v1/namespaces/acceptance/pods/zeus-cron-gw1a-1596781140-h4nfq,user-agent:kube-controller-manager/v1.18.6 (linux/amd64) kubernetes/dff82dc/system:serviceaccount:kube-system:generic-garbage-collector,client:127.0.0.1 (started: 2020-08-07 06:19:11.951707724 +0000 UTC m=+65553.138166835) (total time: 632.343758ms):
Trace[2071769037]: [632.260537ms] [632.253527ms] About to write a response
I0807 06:19:12.576826       1 trace.go:116] Trace[1716152648]: "Create" url:/api/v1/namespaces/acceptance/events,user-agent:kube-controller-manager/v1.18.6 (linux/amd64) kubernetes/dff82dc/system:serviceaccount:kube-system:cronjob-controller,client:127.0.0.1 (started: 2020-08-07 06:19:11.845850729 +0000 UTC m=+65553.032309840) (total time: 730.932733ms):
Trace[1716152648]: [730.879013ms] [730.803792ms] Object stored in database
I0807 06:13:21.511467       1 trace.go:116] Trace[182093860]: "List" url:/api/v1/namespaces/monitoring/secrets,user-agent:operator/v0.0.0 (linux/amd64) kubernetes/$Format,client:10.0.176.47 (started: 2020-08-07 06:13:20.832414292 +0000 UTC m=+71873.758097172) (total time: 678.977349ms):
Trace[182093860]: [678.974229ms] [632.95188ms] Writing http response done count:46
I0807 06:13:18.265880       1 trace.go:116] Trace[904391237]: "List" url:/api/v1/namespaces/monitoring/secrets,user-agent:operator/v0.0.0 (linux/amd64) kubernetes/$Format,client:10.0.176.47 (started: 2020-08-07 06:13:17.090494217 +0000 UTC m=+71870.016177117) (total time: 1.175335154s):
Trace[904391237]: [1.175334004s] [1.096289682s] Writing http response done count:46
I0807 06:13:17.067468       1 trace.go:116] Trace[327754048]: "Get" url:/api/v1/namespaces/ingress-nginx/configmaps/ingress-controller-leader-nginx-internal,user-agent:nginx-ingress-controller/v0.0.0 (linux/amd64) kubernetes/$Format,client:10.0.193.31 (started: 2020-08-07 06:13:06.428847134 +0000 UTC m=+71859.354530024) (total time: 10.638584087s):
Trace[327754048]: [10.638584087s] [10.638573467s] END
I0807 06:13:17.057091       1 trace.go:116] Trace[152040739]: "Get" url:/api/v1/namespaces/ingress-nginx/configmaps/ingress-controller-leader-nginx-acceptance-cms,user-agent:nginx-ingress-controller/v0.0.0 (linux/amd64) kubernetes/$Format,client:10.0.193.31 (started: 2020-08-07 06:13:06.836436342 +0000 UTC m=+71859.762119222) (total time: 10.220601274s):
Trace[152040739]: [10.220601274s] [10.220595864s] END
I0807 06:13:17.050671       1 trace.go:116] Trace[735678549]: "Get" url:/api/v1/namespaces/kube-system/configmaps/coredns-autoscaler,user-agent:cluster-proportional-autoscaler/v0.0.0 (linux/amd64) kubernetes/$Format,client:10.0.147.123 (started: 2020-08-07 06:13:07.539992527 +0000 UTC m=+71860.465675417) (total time: 9.510638823s):
Trace[735678549]: [9.510638823s] [9.510628623s] END
I0807 06:13:17.037834       1 trace.go:116] Trace[1032149158]: "Get" url:/api/v1/namespaces/ingress-nginx/configmaps/ingress-controller-leader-nginx,user-agent:nginx-ingress-controller/v0.0.0 (linux/amd64) kubernetes/$Format,client:10.0.147.123 (started: 2020-08-07 06:13:07.722480747 +0000 UTC m=+71860.648163647) (total time: 9.315313601s):
Trace[1032149158]: [9.315313601s] [9.315303171s] END
I0807 06:13:17.018368       1 trace.go:116] Trace[1936293305]: "Delete" url:/apis/cilium.io/v2/ciliumidentities/45459,user-agent:cilium-operator/v0.0.0 (linux/amd64) kubernetes/$Format,client:10.0.175.74 (started: 2020-08-07 06:13:09.9807415 +0000 UTC m=+64659.003769305) (total time: 7.037558341s):
Trace[1936293305]: [7.03748035s] [7.037381739s] Object deleted from database
I0807 06:13:17.015581       1 trace.go:116] Trace[1546788469]: "Get" url:/api/v1/namespaces/ingress-nginx/configmaps/ingress-controller-leader-nginx-public,user-agent:nginx-ingress-controller/v0.0.0 (linux/amd64) kubernetes/$Format,client:10.0.147.123 (started: 2020-08-07 06:13:07.797198705 +0000 UTC m=+71860.722881595) (total time: 9.21834301s):
Trace[1546788469]: [9.21834301s] [9.21833172s] END
I0807 06:13:17.010438       1 trace.go:116] Trace[1769020839]: "Get" url:/api/v1/namespaces/kube-system/endpoints/kube-controller-manager,user-agent:kube-controller-manager/v1.18.6 (linux/amd64) kubernetes/dff82dc/leader-election,client:127.0.0.1 (started: 2020-08-07 06:13:08.42384498 +0000 UTC m=+71861.349527860) (total time: 8.586553258s):
Trace[1769020839]: [8.586553258s] [8.586531558s] END
I0807 06:13:16.995984       1 trace.go:116] Trace[681486628]: "Get" url:/api/v1/namespaces/ingress-nginx/configmaps/ingress-controller-leader-nginx-internal,user-agent:nginx-ingress-controller/v0.0.0 (linux/amd64) kubernetes/$Format,client:10.0.147.123 (started: 2020-08-07 06:13:14.963539294 +0000 UTC m=+71867.889222194) (total time: 2.032409298s):
Trace[681486628]: [2.032330407s] [2.032320567s] About to write a response
E0807 06:13:16.996230       1 status.go:71] apiserver received an error that is not an metav1.Status: rpctypes.EtcdError{code:0xe, desc:"etcdserver: request timed out"}
E0807 06:13:16.996282       1 status.go:71] apiserver received an error that is not an metav1.Status: rpctypes.EtcdError{code:0xe, desc:"etcdserver: request timed out"}
E0807 06:13:16.996315       1 status.go:71] apiserver received an error that is not an metav1.Status: rpctypes.EtcdError{code:0xe, desc:"etcdserver: request timed out"}
E0807 06:13:16.996335       1 status.go:71] apiserver received an error that is not an metav1.Status: rpctypes.EtcdError{code:0xe, desc:"etcdserver: request timed out"}
E0807 06:13:16.996353       1 status.go:71] apiserver received an error that is not an metav1.Status: rpctypes.EtcdError{code:0xe, desc:"etcdserver: request timed out"}
I0807 06:13:16.996444       1 trace.go:116] Trace[1141472074]: "Get" url:/api/v1/namespaces/ingress-nginx/configmaps/ingress-controller-leader-nginx-public,user-agent:nginx-ingress-controller/v0.0.0 (linux/amd64) kubernetes/$Format,client:10.0.193.31 (started: 2020-08-07 06:13:11.211903795 +0000 UTC m=+71864.137586685) (total time: 5.784519084s):
Trace[1141472074]: [5.784481724s] [5.784473884s] About to write a response
I0807 06:13:16.996698       1 trace.go:116] Trace[244452943]: "Get" url:/api/v1/namespaces/ingress-nginx/configmaps/ingress-controller-leader-nginx-acceptance-cms,user-agent:nginx-ingress-controller/v0.0.0 (linux/amd64) kubernetes/$Format,client:10.0.147.123 (started: 2020-08-07 06:13:11.014513402 +0000 UTC m=+71863.940196332) (total time: 5.982158561s):
Trace[244452943]: [5.98211899s] [5.98211112s] About to write a response
I0807 06:13:16.997255       1 trace.go:116] Trace[1996820160]: "Get" url:/api/v1/namespaces/acceptance/pods/cms-68d9f76dc5-kxr89,user-agent:kubelet/v1.18.6 (linux/amd64) kubernetes/dff82dc,client:10.0.147.123 (started: 2020-08-07 06:13:07.858187447 +0000 UTC m=+71860.783870437) (total time: 9.139043154s):
Trace[1996820160]: [9.139043154s] [9.139032044s] END
E0807 06:13:16.986169       1 status.go:71] apiserver received an error that is not an metav1.Status: rpctypes.EtcdError{code:0xe, desc:"etcdserver: request timed out"}
I0807 06:13:16.983392       1 trace.go:116] Trace[783227481]: "Get" url:/api/v1/namespaces/acceptance/persistentvolumeclaims/cms-efs,user-agent:kubelet/v1.18.6 (linux/amd64) kubernetes/dff82dc,client:10.0.147.123 (started: 2020-08-07 06:13:07.874238297 +0000 UTC m=+71860.799921177) (total time: 9.109013705s):
Trace[783227481]: [9.109013705s] [9.109002825s] END
E0807 06:13:16.981072       1 status.go:71] apiserver received an error that is not an metav1.Status: rpctypes.EtcdError{code:0xe, desc:"etcdserver: request timed out"}
I0807 06:13:16.981330       1 trace.go:116] Trace[5262116]: "Get" url:/api/v1/namespaces/default,user-agent:kube-apiserver/v1.18.6 (linux/amd64) kubernetes/dff82dc,client:::1 (started: 2020-08-07 06:13:06.06229836 +0000 UTC m=+71858.987981260) (total time: 10.918992273s):
Trace[5262116]: [10.918992273s] [10.918983913s] END
E0807 06:13:16.981944       1 status.go:71] apiserver received an error that is not an metav1.Status: rpctypes.EtcdError{code:0xe, desc:"etcdserver: request timed out"}
I0807 06:13:16.982101       1 trace.go:116] Trace[1117900533]: "Get" url:/api/v1/namespaces/kube-system/pods/kube-scheduler-ip-10-0-146-71.eu-west-1.compute.internal,user-agent:kubelet/v1.18.6 (linux/amd64) kubernetes/dff82dc,client:127.0.0.1 (started: 2020-08-07 06:13:16.229272702 +0000 UTC m=+71869.154955592) (total time: 752.799553ms):
Trace[1117900533]: [752.474338ms] [752.464598ms] About to write a response
I0807 06:13:16.982532       1 trace.go:116] Trace[813457766]: "Get" url:/api/v1/namespaces/kube-system/pods/kube-controller-manager-ip-10-0-201-143.eu-west-1.compute.internal,user-agent:kubelet/v1.18.6 (linux/amd64) kubernetes/dff82dc,client:127.0.0.1 (started: 2020-08-07 06:13:16.026190106 +0000 UTC m=+64665.049217911) (total time: 956.280553ms):
Trace[813457766]: [949.780165ms] [949.770435ms] About to write a response
I0807 06:13:16.981381       1 trace.go:116] Trace[1657074742]: "List" url:/api/v1/pods,user-agent:cilium-operator/v0.0.0 (linux/amd64) kubernetes/$Format,client:10.0.175.74 (started: 2020-08-07 06:13:07.900334945 +0000 UTC m=+64656.923362770) (total time: 9.081001058s):
Trace[1657074742]: [9.081001058s] [9.080962558s] END
E0807 06:13:16.974264       1 status.go:71] apiserver received an error that is not an metav1.Status: rpctypes.EtcdError{code:0xe, desc:"etcdserver: request timed out"}
I0807 06:13:16.973425       1 trace.go:116] Trace[726184448]: "Get" url:/api/v1/namespaces/default,user-agent:kube-apiserver/v1.18.6 (linux/amd64) kubernetes/dff82dc,client:::1 (started: 2020-08-07 06:13:12.460236007 +0000 UTC m=+64661.483263812) (total time: 4.513140119s):
Trace[726184448]: [4.513077448s] [4.513049747s] About to write a response
I0807 06:13:16.970299       1 trace.go:116] Trace[161780535]: "Get" url:/api/v1/namespaces/ingress-nginx/configmaps/ingress-controller-leader-nginx,user-agent:nginx-ingress-controller/v0.0.0 (linux/amd64) kubernetes/$Format,client:10.0.175.74 (started: 2020-08-07 06:13:07.428445543 +0000 UTC m=+64656.451473348) (total time: 9.541819171s):
Trace[161780535]: [9.541819171s] [9.541807501s] END
I0807 06:13:16.968968       1 trace.go:116] Trace[475562247]: "Get" url:/api/v1/namespaces/acceptance/services/cms,user-agent:helm/v0.0.0 (linux/amd64) kubernetes/$Format,client:10.0.247.139 (started: 2020-08-07 06:13:05.590485902 +0000 UTC m=+64654.613513717) (total time: 11.378438884s):
Trace[475562247]: [11.378438884s] [11.378426934s] END
E0807 06:13:16.969489       1 status.go:71] apiserver received an error that is not an metav1.Status: rpctypes.EtcdError{code:0xe, desc:"etcdserver: request timed out"}
I0807 06:13:16.969876       1 trace.go:116] Trace[51329129]: "List etcd3" key:/pods,resourceVersion:,limit:0,continue: (started: 2020-08-07 06:13:07.900380315 +0000 UTC m=+64656.923408540) (total time: 9.069464234s):
Trace[51329129]: [9.069464234s] [9.069464234s] END
E0807 06:13:16.969910       1 status.go:71] apiserver received an error that is not an metav1.Status: rpctypes.EtcdError{code:0xe, desc:"etcdserver: request timed out"}
E0807 06:13:16.968602       1 status.go:71] apiserver received an error that is not an metav1.Status: rpctypes.EtcdError{code:0xe, desc:"etcdserver: request timed out"}
Trace[1254346185]: [10.000486815s] [10.000462865s] END
E0807 06:13:16.653972       1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
I0807 06:13:16.654850       1 trace.go:116] Trace[1254346185]: "Get" url:/api/v1/namespaces/kube-system/endpoints/kube-scheduler,user-agent:kube-scheduler/v1.18.6 (linux/amd64) kubernetes/dff82dc/leader-election,client:127.0.0.1 (started: 2020-08-07 06:13:06.654186651 +0000 UTC m=+64655.677214456) (total time: 10.000486815s):
I0807 06:13:16.529083       1 trace.go:116] Trace[795141371]: "Get" url:/api/v1/namespaces/kube-system/configmaps/cluster-autoscaler-status,user-agent:cluster-autoscaler/v0.0.0 (linux/amd64) kubernetes/$Format,client:10.0.162.134 (started: 2020-08-07 06:13:08.295100189 +0000 UTC m=+64657.318127984) (total time: 8.233948396s):
Trace[795141371]: [8.233948396s] [8.233938036s] END
I0807 06:13:16.527981       1 trace.go:116] Trace[567248037]: "Get" url:/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cluster-autoscaler,user-agent:cluster-autoscaler/v0.0.0 (linux/amd64) kubernetes/$Format,client:10.0.162.134 (started: 2020-08-07 06:13:06.508873464 +0000 UTC m=+64655.531901269) (total time: 10.019072076s):
Trace[567248037]: [10.019072076s] [10.019062736s] END
E0807 06:13:16.528721       1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
E0807 06:13:16.527581       1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
I0807 06:13:16.056040       1 trace.go:116] Trace[971316149]: "Get" url:/api/v1/namespaces/kube-system/endpoints/kube-scheduler,user-agent:kube-scheduler/v1.18.6 (linux/amd64) kubernetes/dff82dc/leader-election,client:127.0.0.1 (started: 2020-08-07 06:13:11.967211506 +0000 UTC m=+71864.892894386) (total time: 4.088788043s):
Trace[971316149]: [4.088788043s] [4.088764152s] END
E0807 06:13:16.055513       1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
I0807 06:13:15.850956       1 trace.go:116] Trace[526413577]: "Get" url:/api/v1/namespaces/kube-system/configmaps/kops-controller-leader,user-agent:kops-controller/v0.0.0 (linux/amd64) kubernetes/$Format,client:10.0.201.143 (started: 2020-08-07 06:13:05.851330883 +0000 UTC m=+64654.874358688) (total time: 9.999578604s):
Trace[526413577]: [9.999578604s] [9.999568574s] END
E0807 06:13:15.850603       1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
I0807 06:13:15.723729       1 trace.go:116] Trace[1481438935]: "List etcd3" key:/jobs,resourceVersion:,limit:500,continue: (started: 2020-08-07 06:13:10.769517388 +0000 UTC m=+64659.792545193) (total time: 4.954163027s):
Trace[1481438935]: [4.954163027s] [4.954163027s] END
E0807 06:13:15.723764       1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
E0807 06:13:15.723916       1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
E0807 06:13:15.723977       1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
I0807 06:13:15.724025       1 trace.go:116] Trace[1361963526]: "List" url:/apis/batch/v1/jobs,user-agent:kube-controller-manager/v1.18.6 (linux/amd64) kubernetes/dff82dc/system:serviceaccount:kube-system:cronjob-controller,client:127.0.0.1 (started: 2020-08-07 06:13:10.769486398 +0000 UTC m=+64659.792514203) (total time: 4.954508531s):
Trace[1361963526]: [4.954508531s] [4.954483431s] END
I0807 06:13:15.725151       1 trace.go:116] Trace[304666810]: "Get" url:/apis/batch/v1/namespaces/acceptance/jobs/zeus-cron-gw2g-1596780780,user-agent:kube-controller-manager/v1.18.6 (linux/amd64) kubernetes/dff82dc/system:serviceaccount:kube-system:job-controller,client:127.0.0.1 (started: 2020-08-07 06:13:09.977885772 +0000 UTC m=+64659.000913577) (total time: 5.747228012s):
Trace[304666810]: [5.747228012s] [5.747219762s] END
I0807 06:13:15.726188       1 trace.go:116] Trace[545611142]: "Get" url:/apis/batch/v1/namespaces/acceptance/jobs/zeus-cron-app-1596780780,user-agent:kube-controller-manager/v1.18.6 (linux/amd64) kubernetes/dff82dc/system:serviceaccount:kube-system:job-controller,client:127.0.0.1 (started: 2020-08-07 06:13:09.979131678 +0000 UTC m=+64659.002159483) (total time: 5.7470216s):
Trace[545611142]: [5.7470216s] [5.74701414s] END
E0807 06:13:15.599670       1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
I0807 06:13:15.599992       1 trace.go:116] Trace[886094614]: "Get" url:/api/v1/namespaces/kube-system/endpoints/kube-controller-manager,user-agent:kube-controller-manager/v1.18.6 (linux/amd64) kubernetes/dff82dc/leader-election,client:127.0.0.1 (started: 2020-08-07 06:13:05.599827317 +0000 UTC m=+64654.622855132) (total time: 10.000120861s):
Trace[886094614]: [10.000120861s] [10.00009684s] END
I0807 06:13:10.537937       1 healthz.go:234] [+]ping ok
[+]log ok
[-]etcd failed: reason withheld
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/bootstrap-controller ok
[+]poststarthook/rbac/bootstrap-roles ok
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
healthz check failed
I0807 06:13:10.458289       1 healthz.go:234] [+]ping ok
[+]log ok
[-]etcd failed: reason withheld
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/bootstrap-controller ok
[+]poststarthook/rbac/bootstrap-roles ok
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
healthz check failed
I0807 06:13:09.996217       1 trace.go:116] Trace[403515704]: "Delete" url:/apis/cilium.io/v2/namespaces/acceptance/ciliumendpoints/zeus-cron-gwtf-1596780780-5kmzk,user-agent:cilium-agent/v0.0.0 (linux/amd64) kubernetes/$Format,client:10.0.143.159 (started: 2020-08-07 06:13:06.112703276 +0000 UTC m=+65187.299162387) (total time: 3.883470884s):
Trace[403515704]: [3.883391923s] [3.883297552s] Object deleted from database
I0807 06:13:09.977802       1 trace.go:116] Trace[1410702708]: "GuaranteedUpdate etcd3" type:*unstructured.Unstructured (started: 2020-08-07 06:13:08.661051795 +0000 UTC m=+64657.684079620) (total time: 1.316728635s):
Trace[1410702708]: [1.316233358s] [1.31414073s] Transaction committed
I0807 06:13:09.983594       1 trace.go:116] Trace[1749693897]: "Patch" url:/apis/cilium.io/v2/namespaces/acceptance/ciliumendpoints/cms-68d9f76dc5-kxr89/status,user-agent:cilium-agent/v0.0.0 (linux/amd64) kubernetes/$Format,client:10.0.147.123 (started: 2020-08-07 06:13:08.660867023 +0000 UTC m=+64657.683894828) (total time: 1.322687755s):
Trace[1749693897]: [1.32233201s] [1.32080904s] Object stored in database
I0807 06:13:09.981203       1 trace.go:116] Trace[130153913]: "Update" url:/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/ip-10-0-201-143.eu-west-1.compute.internal,user-agent:kubelet/v1.18.6 (linux/amd64) kubernetes/dff82dc,client:127.0.0.1 (started: 2020-08-07 06:13:04.645504561 +0000 UTC m=+64653.668532366) (total time: 5.335659155s):
Trace[130153913]: [5.335590744s] [5.335474182s] Object stored in database
Trace[1956585047]: [1.937387283s] [1.937376643s] About to write a response
I0807 06:13:09.978666       1 trace.go:116] Trace[302583028]: "Get" url:/api/v1/namespaces/kube-system/endpoints/kube-controller-manager,user-agent:kube-controller-manager/v1.18.6 (linux/amd64) kubernetes/dff82dc/leader-election,client:127.0.0.1 (started: 2020-08-07 06:13:08.156852449 +0000 UTC m=+65189.343311570) (total time: 1.821781192s):
Trace[302583028]: [1.821745162s] [1.821727182s] About to write a response
I0807 06:13:09.979356       1 trace.go:116] Trace[1956585047]: "Get" url:/api/v1/namespaces/kube-system/configmaps/kops-controller-leader,user-agent:kops-controller/v0.0.0 (linux/amd64) kubernetes/$Format,client:10.0.162.134 (started: 2020-08-07 06:13:08.041895866 +0000 UTC m=+65189.228354997) (total time: 1.937434723s):
I0807 06:13:09.976964       1 trace.go:116] Trace[676313116]: "GuaranteedUpdate etcd3" type:*coordination.Lease (started: 2020-08-07 06:13:04.645737865 +0000 UTC m=+64653.668765720) (total time: 5.331192724s):
Trace[676313116]: [5.331164203s] [5.330477654s] Transaction committed
I0807 06:13:09.976842       1 trace.go:116] Trace[192067124]: "Get" url:/api/v1/namespaces/kube-system/configmaps/kops-controller-leader,user-agent:kops-controller/v0.0.0 (linux/amd64) kubernetes/$Format,client:10.0.146.71 (started: 2020-08-07 06:13:07.055196506 +0000 UTC m=+65188.241655627) (total time: 2.921623313s):
Trace[192067124]: [2.921577003s] [2.921566913s] About to write a response
I0807 06:13:09.975062       1 trace.go:116] Trace[661551568]: "Get" url:/api/v1/namespaces/default,user-agent:kube-apiserver/v1.18.6 (linux/amd64) kubernetes/dff82dc,client:::1 (started: 2020-08-07 06:13:06.629073509 +0000 UTC m=+65187.815532630) (total time: 3.345941179s):
Trace[661551568]: [3.345889458s] [3.345877818s] About to write a response
I0807 06:13:09.975897       1 trace.go:116] Trace[2101659364]: "Get" url:/api/v1/namespaces/kube-system/endpoints/kube-scheduler,user-agent:kube-scheduler/v1.18.6 (linux/amd64) kubernetes/dff82dc/leader-election,client:127.0.0.1 (started: 2020-08-07 06:13:08.508879743 +0000 UTC m=+65189.695338854) (total time: 1.466990185s):
Trace[2101659364]: [1.466957604s] [1.466939953s] About to write a response
I0807 06:13:09.970468       1 trace.go:116] Trace[2060751332]: "GuaranteedUpdate etcd3" type:*coordination.Lease (started: 2020-08-07 06:13:06.202295244 +0000 UTC m=+71859.127978134) (total time: 3.768139027s):
Trace[2060751332]: [3.768119017s] [3.767599139s] Transaction committed
I0807 06:13:09.970631       1 trace.go:116] Trace[1058289082]: "Update" url:/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/ip-10-0-147-123.eu-west-1.compute.internal,user-agent:kubelet/v1.18.6 (linux/amd64) kubernetes/dff82dc,client:10.0.147.123 (started: 2020-08-07 06:13:06.202160022 +0000 UTC m=+71859.127842912) (total time: 3.768428861s):
Trace[1058289082]: [3.76836041s] [3.768269028s] Object stored in database
E0807 06:13:09.965759       1 status.go:71] apiserver received an error that is not an metav1.Status: rpctypes.EtcdError{code:0xe, desc:"etcdserver: leader changed"}
I0807 06:13:09.966006       1 trace.go:116] Trace[2104184172]: "Get" url:/apis/batch/v1/namespaces/acceptance/jobs/zeus-cron-gwtf-1596780780,user-agent:kube-controller-manager/v1.18.6 (linux/amd64) kubernetes/dff82dc/system:serviceaccount:kube-system:job-controller,client:127.0.0.1 (started: 2020-08-07 06:13:04.739204483 +0000 UTC m=+64653.762232288) (total time: 5.226765548s):
Trace[2104184172]: [5.226765548s] [5.226756688s] END
E0807 06:13:09.964955       1 status.go:71] apiserver received an error that is not an metav1.Status: rpctypes.EtcdError{code:0xe, desc:"etcdserver: leader changed"}
I0807 06:13:09.965250       1 trace.go:116] Trace[1438200376]: "Get" url:/api/v1/namespaces/kube-system/endpoints/kube-scheduler,user-agent:kube-scheduler/v1.18.6 (linux/amd64) kubernetes/dff82dc/leader-election,client:127.0.0.1 (started: 2020-08-07 06:13:06.064100277 +0000 UTC m=+71858.989783177) (total time: 3.901113666s):
Trace[1438200376]: [3.901113666s] [3.901096376s] END
I0807 06:13:09.966167       1 trace.go:116] Trace[223385911]: "Patch" url:/apis/cilium.io/v2/ciliumidentities/39169/status,user-agent:cilium-agent/v0.0.0 (linux/amd64) kubernetes/$Format,client:10.0.143.159 (started: 2020-08-07 06:13:06.081608525 +0000 UTC m=+65187.268067656) (total time: 3.884537427s):
Trace[223385911]: [3.884537427s] [3.884129462s] END
I0807 06:13:09.965645       1 trace.go:116] Trace[1205949895]: "GuaranteedUpdate etcd3" type:*unstructured.Unstructured (started: 2020-08-07 06:13:06.081849118 +0000 UTC m=+65187.268308239) (total time: 3.883720607s):
Trace[1205949895]: [3.883720607s] [3.883554395s] END
E0807 06:13:09.965793       1 status.go:71] apiserver received an error that is not an metav1.Status: rpctypes.EtcdError{code:0xe, desc:"etcdserver: leader changed"}
I0807 06:13:08.590529       1 healthz.go:234] [+]ping ok
[+]log ok
[-]etcd failed: reason withheld
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/bootstrap-controller ok
[+]poststarthook/rbac/bootstrap-roles ok
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
healthz check failed
I0807 06:13:06.077871       1 trace.go:116] Trace[773813581]: "GuaranteedUpdate etcd3" type:*unstructured.Unstructured (started: 2020-08-07 06:13:04.793682452 +0000 UTC m=+65185.980141603) (total time: 1.284154658s):
Trace[773813581]: [1.283704723s] [1.28262773s] Transaction committed
I0807 06:13:06.078256       1 trace.go:116] Trace[81245877]: "Update" url:/apis/cilium.io/v2/ciliumidentities/45459/status,user-agent:cilium-agent/v0.0.0 (linux/amd64) kubernetes/$Format,client:10.0.143.159 (started: 2020-08-07 06:13:04.793090645 +0000 UTC m=+65185.979549766) (total time: 1.2851312s):
Trace[81245877]: [1.284832966s] [1.28434106s] Object stored in database
I0807 06:08:21.306999       1 trace.go:116] Trace[1291006070]: "List" url:/api/v1/namespaces/monitoring/secrets,user-agent:operator/v0.0.0 (linux/amd64) kubernetes/$Format,client:10.0.176.47 (started: 2020-08-07 06:08:20.621147423 +0000 UTC m=+71573.546830313) (total time: 685.811853ms):
Trace[1291006070]: [685.810623ms] [629.556012ms] Writing http response done count:46
I0807 06:08:20.087495       1 trace.go:116] Trace[213798601]: "List" url:/api/v1/namespaces/monitoring/secrets,user-agent:operator/v0.0.0 (linux/amd64) kubernetes/$Format,client:10.0.176.47 (started: 2020-08-07 06:08:19.321672335 +0000 UTC m=+71572.247355215) (total time: 765.779678ms):
Trace[213798601]: [765.777368ms] [688.677486ms] Writing http response done count:46
I0807 06:08:19.420525       1 controller.go:606] quota admission added evaluator for: ingresses.extensions
I0807 06:08:19.378044       1 controller.go:606] quota admission added evaluator for: deployments.apps

To investigate further I've enabled the apiserver audit log, but I cannot pinpoint the problem yet.

I would be happy to get tips on how to proceed with debugging or how to interpret these logs.
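
As a debugging sketch (assuming the standard kube-prometheus metric names, which are the same ones the alert's recording rules use), two Prometheus queries can help separate the error side from the latency side of the budget burn:

# 5xx responses from the apiserver (the "errors" part of the burn rate)
sum(rate(apiserver_request_total{job="apiserver",code=~"5.."}[30m])) by (verb, resource, code)

# p99 read latency by scope (the "too slow" part of the burn rate)
histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET"}[30m])) by (le, scope))

If the 5xx rate is essentially zero, it is the slow LIST/GET requests that are eating the budget.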

@demisx

demisx commented Aug 8, 2020

I am not sure whether apiserver_request:burnrate3d or apiserver_request:burnrate6h is firing in my case, since they are both present in the alert expression:

sum(apiserver_request:burnrate3d) > (1 * 0.01) and sum(apiserver_request:burnrate6h) > (1 * 0.01)

I don't see any errors written to the API server log, just these "LIST" requests from the flux daemon. Pretty lost at this point.

I0808 15:22:43.816594       1 trace.go:116] Trace[1771355195]: "List" url:/api/v1/secrets,user-agent:fluxd/v0.0.0 (linux/amd64) kubernetes/$Format,client:172.20.106.218 (started: 2020-08-08 15:22:43.145114257 +0000 UTC m=+249629.270256551) (total time: 671.436227ms):
Trace[1771355195]: [671.434607ms] [627.688687ms] Writing http response done count:134
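
As a side note, one way to see how each side of the and behaves (just a sketch built from the recording rules the alert already references) is to graph each term separately against the 0.01 budget:

sum(apiserver_request:burnrate3d)
sum(apiserver_request:burnrate6h)

Both have to exceed 0.01 for the alert to fire, but graphing them separately shows how far over the threshold each window is and when it started climbing.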

@demisx

demisx commented Aug 19, 2020

Has anyone been able to figure out the source of the problem? We are constantly getting hit with warning alerts from all 1.18.6 clusters.

@demisx

demisx commented Aug 22, 2020

After disabling Flux and Helm Operator, my LIST operation latency is around 50 ms now and consistent with the others.

[screenshot: LIST latency after disabling Flux and Helm Operator]

However, the warning is still firing. Do you guys know why the Prometheus query UI returns different values when queried by the recording rule name vs. the expression it corresponds to? For example, let's take apiserver_request:burnrate6h:

    - expr: |
        (
          (
            # too slow
            sum(rate(apiserver_request_duration_seconds_count{job="apiserver",verb=~"LIST|GET"}[6h]))
            -
            (
              sum(rate(apiserver_request_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",scope=~"resource|",le="0.1"}[6h])) +
              sum(rate(apiserver_request_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",scope="namespace",le="0.5"}[6h])) +
              sum(rate(apiserver_request_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",scope="cluster",le="5"}[6h]))
            )
          )
          +
          # errors
          sum(rate(apiserver_request_total{job="apiserver",verb=~"LIST|GET",code=~"5.."}[6h]))
        )
        /
        sum(rate(apiserver_request_total{job="apiserver",verb=~"LIST|GET"}[6h]))
      labels:
        verb: read
      record: apiserver_request:burnrate6h

When queried by the recording rule name I get 0.024...:

[screenshot: query by recording rule name returning 0.024...]

But when queried by the expression itself I get 0.000002816...:

[screenshot: query by the raw expression returning 0.000002816...]

Aren't these numbers supposed to be the same?
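
One way to sanity-check this (a sketch, assuming the operator's default prometheus-operated service and that jq is installed) is to ask Prometheus which expression it actually has loaded for the recording rule, instead of copying the rule from elsewhere:

kubectl -n monitoring port-forward svc/prometheus-operated 9090 &
curl -s http://localhost:9090/api/v1/rules | jq -r '.data.groups[].rules[] | select(.name == "apiserver_request:burnrate6h") | .query'

If the query returned here differs from the expression pasted into the UI (for example in the le buckets or the scope matchers), the recorded series and the hand-typed expression will legitimately disagree.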

@hakman

hakman commented Aug 23, 2020

One observation about the config: the etcd version is incorrect. It should be 3.4.3.

@demisx

demisx commented Aug 23, 2020

One observation about the config: the etcd version is incorrect. It should be 3.4.3.

@hakman Thank you for noting that. Will this be something that kops corrects in future releases, or do I need to edit the cluster manually?

@hakman

hakman commented Aug 24, 2020

This is something you manually added to your cluster config and kops won't override it. You should change it manually to 3.4.3.
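
For reference, the manual change would look roughly like this (a sketch of the usual kops edit/update/rolling-update cycle, with $NAME standing in for the cluster name; see the later comments before jumping versions on a production cluster):

kops edit cluster $NAME            # set version: 3.4.3 under both etcd clusters (main and events)
kops update cluster $NAME --yes
kops rolling-update cluster $NAME --yes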

@justinaslelys

I do not have the etcd version specified but have the same problem, so most probably that's not the root cause.
I'm still having the same issue, and I do not have Flux or the Helm Operator.

@demisx

demisx commented Aug 24, 2020

@hakman Hmm.. 🤔 I've never touched the etcd version. I've always gone through kops upgrade cluster && kops rolling-update cluster --yes to upgrade the cluster. Anyway, I've manually changed the etcd version to 3.4.3 and then tried to do the rolling update, but ran into a problem with the calico node pods erroring out, unable to connect to the store. I couldn't roll back to 3.2.24 either because of this error, so I am recreating my test cluster from scratch now.

@hakman

hakman commented Aug 24, 2020

There was a bug where kops create cluster --networking calico always set etcd.version: 3.2.24. I just fixed it today. Sorry for the confusion.

@demisx

demisx commented Aug 24, 2020

@hakman What would be the proper way to upgrade an existing cluster to v3.4.3? Wait for the next kops 1.18.x release?

@hakman

hakman commented Aug 24, 2020

Not sure this change will make it into the next release. The effect is the same as setting the version to 3.4.3 in the cluster config, nothing fancy. The upgrade should probably be done by going through 3.3.x as well, to ensure compatibility (at least that is what I think).
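
Purely as an illustration of that stepped approach (the intermediate release number is an example, not something prescribed in this thread, and only the changed field is shown):

# first pass: edit the cluster spec, apply, and roll the masters
etcdClusters:
- name: main
  version: 3.3.17   # example intermediate 3.3.x release
- name: events
  version: 3.3.17
# second pass: repeat the edit with version: 3.4.3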

@demisx

demisx commented Aug 24, 2020

@xoxbet Do you also get different values when querying by the recording rule name (e.g. apiserver_request:burnrate6h) vs. the expression it corresponds to, just as I've shown in my comment above?

@justinaslelys

@demisx I get the same result when using the raw expression and the pre-generated record. Where did you take your expression from? Yours is a bit different from mine (scope=~"resource|" vs scope="resource"). If I use your expression, I get a different result than the record.

Check if the expression you use to query Prometheus matches the one here:
kubectl describe prometheusrule prometheus-operator-kube-apiserver.rules -n monitoring

Interestingly, our clusters don't have much in common; we are even using different networking components.
Here is my cluster too, maybe someone can spot something:

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: "2020-02-25T21:50:18Z"
  generation: 38
  name: hostname.tld
spec:
  additionalPolicies:
    node: <some stuff>
  api:
    loadBalancer:
      type: Public
  authorization:
    rbac: {}
  channel: stable
  cloudLabels:
    env: dev
  cloudProvider: aws
  configBase: s3://bucket/hostname.tld
  dnsZone: hostname.tld
  docker:
    logDriver: json-file
    logOpt:
      - max-size=20m
      - max-file=1
  etcdClusters:
    - cpuRequest: 200m
      etcdMembers:
        - instanceGroup: master-eu-west-1a
          name: a
        - instanceGroup: master-eu-west-1b
          name: b
        - instanceGroup: master-eu-west-1c
          name: c
      memoryRequest: 100Mi
      name: main
    - cpuRequest: 100m
      etcdMembers:
        - instanceGroup: master-eu-west-1a
          name: a
        - instanceGroup: master-eu-west-1b
          name: b
        - instanceGroup: master-eu-west-1c
          name: c
      memoryRequest: 100Mi
      name: events
  fileAssets:
    - content: |
        # Log all requests at the Metadata level.
        apiVersion: audit.k8s.io/v1
        kind: Policy
        rules:
        - level: Metadata
      name: apiserver-audit
      path: /srv/kubernetes/audit.yaml
      roles:
        - Master
  iam:
    allowContainerRegistry: true
    legacy: false
  kubeAPIServer:
    auditLogMaxAge: 10
    auditLogMaxBackups: 1
    auditLogMaxSize: 100
    auditLogPath: /var/log/kube-apiserver-audit.log
    auditPolicyFile: /srv/kubernetes/audit.yaml
    disableBasicAuth: true
  kubeControllerManager:
    horizontalPodAutoscalerDownscaleDelay: 15m0s
    horizontalPodAutoscalerSyncPeriod: 8s
    horizontalPodAutoscalerUpscaleDelay: 1m0s
  kubeDNS:
    nodeLocalDNS:
      enabled: true
    provider: CoreDNS
  kubelet:
    anonymousAuth: false
    authenticationTokenWebhook: true
    authorizationMode: Webhook
    evictionHard: memory.available<900Mi,nodefs.available<10%,nodefs.inodesFree<5%,imagefs.available<10%,imagefs.inodesFree<5%
  kubernetesApiAccess:
    - 0.0.0.0/0
  kubernetesVersion: 1.18.6
  masterInternalName: api.internal.hostname.tld
  masterPublicName: api.hostname.tld
  networkCIDR: 10.0.0.0/16
  networkID: vpc-110011
  networking:
    cilium:
      IPTablesRulesNoinstall: false
      autoDirectNodeRoutes: false
      bpfCTGlobalAnyMax: 0
      bpfCTGlobalTCPMax: 0
      clusterName: ""
      cniBinPath: ""
      enableNodePort: false
      enablePrometheusMetrics: true
      enableRemoteNodeIdentity: false
      enableipv4: false
      enableipv6: false
      monitorAggregation: ""
      nodeInitBootstrapFile: ""
      preallocateBPFMaps: true
      reconfigureKubelet: false
      removeCbrBridge: false
      restartPods: false
      sidecarIstioProxyImage: ""
      toFqdnsEnablePoller: false
      version: v1.7.5
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
    - 0.0.0.0/0
  subnets:
    <skipping, long and boring>
  topology:
    bastion:
      bastionPublicName: bastion.hostname.tld
    dns:
      type: Public
    masters: private
    nodes: private

@justinaslelys

The fact that we have different expressions was suspicious to me, so I've updated the Prometheus Operator from 8.13.8 to 9.3.1 and updated the PrometheusRules too (I have the Prometheus rules versioned in my git repo because I have some of the alerts customised).

For now (1 hour after the upgrade) I don't see KubeAPIErrorBudgetBurn pending in alerts. I will keep you posted.

@justinaslelys

OK, I'm 99 percent sure the issue is the old record expression. Compare the results of the old (left) and updated (right) rules: they are very different.

[screenshot: prometheus-rule-old-vs-new]

@demisx try updating the Prometheus rules and let's see if we can close this issue.
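
For anyone landing here later, the chart upgrade confirmed below boils down to something like this (release name, repo, and namespace are guesses based on the prometheus-operator-* rule names above; adjust to your installation):

helm repo update
helm upgrade prometheus-operator stable/prometheus-operator --namespace monitoring --version 9.3.1

Updating the chart regenerates the kube-apiserver.rules PrometheusRule, which is where the burnrate recording rules come from.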

@demisx

demisx commented Aug 25, 2020

@xoxbet Oh, great catch 👍. I must've grabbed those rules from the wrong place. Today I copied them into the query UI from the cluster using your kubectl command, and the values did match. I've also upgraded to the latest v9.3.1 of the prometheus-operator helm chart, and my alerts are green so far! It feels SO GOOD not to be constantly messaged about the KubeAPIErrorBudgetBurn alert. Thank you! 🙏

@demisx demisx closed this as completed Aug 25, 2020