HCP api pod at 20 cores #175

Closed
DanielFroehlich opened this issue May 27, 2024 · 12 comments

@DanielFroehlich

While playing with ACM observability, I realised we have two HCP clusters where an API server pod is consuming more than 20 cores.
That does not feel right!
See, for example, here:
image

From: https://console-openshift-console.apps.isar.coe.muc.redhat.com/k8s/ns/rbohne-hcp-sendling/replicasets/kube-apiserver-86548cbbbf/pods

Same for rbohne-hcp-sendling-ingress

@DanielFroehlich DanielFroehlich added bug Something isn't working cluster/isar BareMetal COE Cluter labels May 27, 2024
@DanielFroehlich
Author

It's not really getting better: now we have two kube-apiserver pods at >20 cores. May I kill one of those and see what happens? @rbo, wdyt?

@rbo
Member

rbo commented Jun 10, 2024

 oc adm top pod  --sum --sort-by=cpu
NAME                                                  CPU(cores)   MEMORY(bytes)   
kube-apiserver-86548cbbbf-kzqgq                       18906m       4364Mi          
kube-apiserver-86548cbbbf-64q6x                       15647m       2307Mi          
virt-launcher-sendling-ff7bf3fd-pn5vs-q8w7v           6723m        16267Mi         
kube-apiserver-86548cbbbf-bhsfj                       3785m        3789Mi          
etcd-0                                                2127m        619Mi           
ignition-server-5b4567866-kl4xf                       1014m        414Mi           
ignition-server-5b4567866-w568x                       1013m        334Mi           
etcd-1                                                988m         878Mi           
ignition-server-5b4567866-b7ksx                       892m         372Mi           
olm-operator-54d975b8c4-xtlmk                         474m         404Mi           
etcd-2                                                303m         504Mi           
kube-controller-manager-5fc9d96bdf-d2mgv              119m         296Mi           
control-plane-operator-646886cd59-knl2g               77m          346Mi           
redhat-operators-catalog-5778b9f69d-9ppd4             19m          86Mi            
community-operators-catalog-79c49fb477-krzt8          18m          147Mi           
certified-operators-catalog-5f585b98cd-5xz5z          17m          142Mi           
redhat-marketplace-catalog-867f99df5-grcnn            15m          68Mi            
openshift-apiserver-778f9db75c-6pjb5                  15m          321Mi           
openshift-apiserver-778f9db75c-zjqcj                  14m          311Mi           
openshift-apiserver-778f9db75c-kc7m9                  13m          227Mi           
virt-launcher-sendling-10d195e8-j8czd-sjshq           13m          1495Mi          
openshift-route-controller-manager-869c7c988b-xrncf   10m          60Mi            
cluster-policy-controller-74964cb9d6-6svtg            10m          191Mi           
cluster-network-operator-9664cbc94-mcmng              9m           281Mi           
packageserver-7f976d7855-brh6g                        8m           277Mi           
hosted-cluster-config-operator-748dbf6695-6kng2       8m           156Mi           
openshift-oauth-apiserver-584f69b7f9-7c2mp            7m           69Mi            
openshift-oauth-apiserver-584f69b7f9-f4jmm            7m           102Mi           
openshift-controller-manager-5578f894bb-7dd7p         6m           174Mi           
openshift-oauth-apiserver-584f69b7f9-9qd99            5m           85Mi            
machine-approver-85cb867c5-hdr2z                      5m           97Mi            
cluster-storage-operator-7fcdf884fb-8t2mf             5m           80Mi            
packageserver-7f976d7855-tv5pb                        5m           300Mi           
kube-controller-manager-5fc9d96bdf-7wjrs              4m           57Mi            
openshift-route-controller-manager-869c7c988b-8rjtd   4m           76Mi            
cluster-api-c7b575bb4-zg64v                           3m           101Mi           
konnectivity-agent-8599fd5d6b-sr6pn                   3m           44Mi            
capi-provider-7f58f475dd-h68hg                        3m           70Mi            
openshift-controller-manager-5578f894bb-4c9vn         3m           61Mi            
konnectivity-agent-8599fd5d6b-2brl7                   2m           35Mi            
packageserver-7f976d7855-bp87j                        2m           270Mi           
kube-scheduler-5b5c9478f4-kpq4l                       2m           93Mi            
kubevirt-cloud-controller-manager-974969547-9kq6w     2m           67Mi            
openshift-route-controller-manager-869c7c988b-kp2kh   2m           77Mi            
ingress-operator-7878df55d7-mq7cm                     2m           199Mi           
multus-admission-controller-568bb6cd65-8wr2v          2m           95Mi            
kube-scheduler-5b5c9478f4-7rdng                       2m           51Mi            
csi-snapshot-controller-operator-797bf595d9-n5csv     2m           80Mi            
oauth-openshift-6b8fc486c9-q8d5m                      2m           82Mi            
kube-controller-manager-5fc9d96bdf-5x4lp              2m           60Mi            
oauth-openshift-6b8fc486c9-gl7s4                      2m           106Mi           
ignition-server-proxy-57c4f77c97-6qn9r                1m           115Mi           
ovnkube-control-plane-75bffb695c-dq6fc                1m           143Mi           
ignition-server-proxy-57c4f77c97-8n7f5                1m           135Mi           
catalog-operator-74c748d567-2h9h8                     1m           366Mi           
cluster-autoscaler-679d6fbdf6-p87xg                   1m           119Mi           
cluster-image-registry-operator-69754bbcc9-kchgs      1m           143Mi           
openshift-controller-manager-5578f894bb-6rkvb         1m           65Mi            
cluster-policy-controller-74964cb9d6-6pqmj            1m           35Mi            
kube-scheduler-5b5c9478f4-98vnz                       1m           62Mi            
ignition-server-proxy-57c4f77c97-frg8q                1m           107Mi           
cluster-policy-controller-74964cb9d6-v2jnb            1m           37Mi            
network-node-identity-8495fd79d9-sxnw9                0m           98Mi            
kubevirt-csi-controller-54d7884b4b-r8rtj              0m           142Mi           
cluster-version-operator-55f8dfbdd7-k2g4c             0m           167Mi           
ovnkube-control-plane-75bffb695c-bgm4m                0m           112Mi           
ovnkube-control-plane-75bffb695c-bxpzh                0m           106Mi           
csi-snapshot-controller-7cdd696bfd-hczrf              0m           35Mi            
network-node-identity-8495fd79d9-sqkbl                0m           102Mi           
csi-snapshot-webhook-7c66684757-br56b                 0m           36Mi            
cluster-node-tuning-operator-58764957cb-bhjwc         0m           67Mi            
network-node-identity-8495fd79d9-zw2r5                0m           159Mi           
dns-operator-5d6f5c64b9-2w8hn                         0m           52Mi            
konnectivity-agent-8599fd5d6b-f8nkz                   0m           28Mi            
oauth-openshift-6b8fc486c9-bgkpf                      0m           79Mi            
                                                      ________     ________        
                                                      52332m       39738Mi
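
To put a number on how much of the namespace's CPU the kube-apiserver pods account for, the table above can be summed per workload. This is a minimal sketch that assumes the column layout shown in this thread; `sum_cpu_by_prefix` is a hypothetical helper name, not part of `oc`.

```shell
# Sum the CPU column (millicores) of `oc adm top pod` output for all pods
# whose name starts with the given prefix. The table is read from stdin.
# Header, separator and --sum lines are skipped because their first field
# does not start with the prefix.
sum_cpu_by_prefix() {
  prefix="$1"
  awk -v p="$prefix" '
    index($1, p) == 1 && $2 ~ /^[0-9]+m$/ {
      sub(/m$/, "", $2)   # strip the millicore suffix
      total += $2
    }
    END { printf "%dm\n", total }
  '
}

# Usage against a live cluster (namespace taken from this issue):
#   oc adm top pod -n rbohne-hcp-sendling | sum_cpu_by_prefix kube-apiserver
```

Applied to the listing above, the three kube-apiserver pods add up to 38338m of the 52332m total.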

@rbo
Member

rbo commented Jun 10, 2024

Trying a quick fix:

oc delete pod --wait=false kube-apiserver-86548cbbbf-kzqgq kube-apiserver-86548cbbbf-64q6x

@rbo
Member

rbo commented Jun 10, 2024

oc adm top pod  --sum --sort-by=cpu -n rbohne-hcp-sendling
NAME                                                  CPU(cores)   MEMORY(bytes)   
kube-apiserver-86548cbbbf-r6grl                       18262m       2289Mi          
kube-apiserver-86548cbbbf-btmct                       15798m       2114Mi          
virt-launcher-sendling-ff7bf3fd-pn5vs-q8w7v           6153m        16178Mi         
kube-apiserver-86548cbbbf-bhsfj                       5517m        4611Mi          
etcd-0                                                2303m        616Mi           
openshift-apiserver-778f9db75c-6pjb5                  1254m        333Mi           
etcd-1                                                1109m        861Mi           
ignition-server-5b4567866-kl4xf                       1064m        414Mi           
ignition-server-5b4567866-w568x                       1017m        333Mi           
ignition-server-5b4567866-b7ksx                       990m         372Mi           
olm-operator-54d975b8c4-xtlmk                         800m         404Mi           
hosted-cluster-config-operator-748dbf6695-6kng2       537m         151Mi           
etcd-2                                                338m         536Mi           
konnectivity-agent-8599fd5d6b-sr6pn                   272m         45Mi            
openshift-apiserver-778f9db75c-kc7m9                  241m         272Mi           
konnectivity-agent-8599fd5d6b-f8nkz                   199m         29Mi            
openshift-oauth-apiserver-584f69b7f9-f4jmm            182m         117Mi           
openshift-oauth-apiserver-584f69b7f9-9qd99            167m         96Mi            
kube-controller-manager-5fc9d96bdf-d2mgv              153m         300Mi           
packageserver-7f976d7855-brh6g                        89m          289Mi           
packageserver-7f976d7855-bp87j                        79m          259Mi           
openshift-oauth-apiserver-584f69b7f9-7c2mp            69m          82Mi            
control-plane-operator-646886cd59-knl2g               50m          338Mi           
redhat-operators-catalog-5778b9f69d-9ppd4             21m          84Mi            
certified-operators-catalog-5f585b98cd-5xz5z          17m          133Mi           
redhat-marketplace-catalog-867f99df5-grcnn            16m          68Mi            
openshift-apiserver-778f9db75c-zjqcj                  16m          325Mi           
community-operators-catalog-79c49fb477-krzt8          15m          143Mi           
openshift-route-controller-manager-869c7c988b-xrncf   15m          60Mi            
packageserver-7f976d7855-tv5pb                        14m          258Mi           
virt-launcher-sendling-10d195e8-j8czd-sjshq           13m          1495Mi          
cluster-network-operator-9664cbc94-mcmng              10m          286Mi           
machine-approver-85cb867c5-hdr2z                      10m          99Mi            
cluster-policy-controller-74964cb9d6-6svtg            8m           192Mi           
openshift-controller-manager-5578f894bb-4c9vn         6m           60Mi            
openshift-controller-manager-5578f894bb-7dd7p         6m           175Mi           
kube-controller-manager-5fc9d96bdf-7wjrs              5m           60Mi            
kube-scheduler-5b5c9478f4-7rdng                       5m           51Mi            
capi-provider-7f58f475dd-h68hg                        4m           70Mi            
cluster-storage-operator-7fcdf884fb-8t2mf             4m           81Mi            
openshift-route-controller-manager-869c7c988b-8rjtd   4m           76Mi            
cluster-api-c7b575bb4-zg64v                           3m           101Mi           
ovnkube-control-plane-75bffb695c-dq6fc                3m           143Mi           
ignition-server-proxy-57c4f77c97-6qn9r                2m           115Mi           
csi-snapshot-controller-operator-797bf595d9-n5csv     2m           81Mi            
multus-admission-controller-568bb6cd65-8wr2v          2m           93Mi            
catalog-operator-74c748d567-2h9h8                     2m           313Mi           
kube-controller-manager-5fc9d96bdf-5x4lp              2m           60Mi            
openshift-route-controller-manager-869c7c988b-kp2kh   2m           78Mi            
kube-scheduler-5b5c9478f4-kpq4l                       2m           93Mi            
oauth-openshift-6b8fc486c9-gl7s4                      2m           106Mi           
oauth-openshift-6b8fc486c9-q8d5m                      2m           82Mi            
ingress-operator-7878df55d7-mq7cm                     2m           193Mi           
cluster-autoscaler-679d6fbdf6-p87xg                   1m           120Mi           
ignition-server-proxy-57c4f77c97-8n7f5                1m           135Mi           
cluster-image-registry-operator-69754bbcc9-kchgs      1m           141Mi           
cluster-policy-controller-74964cb9d6-v2jnb            1m           37Mi            
openshift-controller-manager-5578f894bb-6rkvb         1m           62Mi            
kube-scheduler-5b5c9478f4-98vnz                       1m           63Mi            
ignition-server-proxy-57c4f77c97-frg8q                1m           107Mi           
oauth-openshift-6b8fc486c9-bgkpf                      1m           79Mi            
kubevirt-cloud-controller-manager-974969547-9kq6w     1m           67Mi            
csi-snapshot-webhook-7c66684757-br56b                 0m           36Mi            
cluster-version-operator-55f8dfbdd7-k2g4c             0m           169Mi           
csi-snapshot-controller-7cdd696bfd-hczrf              0m           35Mi            
ovnkube-control-plane-75bffb695c-bgm4m                0m           112Mi           
ovnkube-control-plane-75bffb695c-bxpzh                0m           106Mi           
konnectivity-agent-8599fd5d6b-2brl7                   0m           34Mi            
dns-operator-5d6f5c64b9-2w8hn                         0m           53Mi            
cluster-policy-controller-74964cb9d6-6pqmj            0m           35Mi            
cluster-node-tuning-operator-58764957cb-bhjwc         0m           73Mi            
kubevirt-csi-controller-54d7884b4b-r8rtj              0m           142Mi           
network-node-identity-8495fd79d9-zw2r5                0m           160Mi           
network-node-identity-8495fd79d9-sxnw9                0m           98Mi            
network-node-identity-8495fd79d9-sqkbl                0m           102Mi           
                                                      ________     ________        
                                                      56867m       38218Mi

Nothing changed: the two replacement kube-apiserver pods are again at 15 to 18 cores.

@DanielFroehlich
Author

DanielFroehlich commented Jun 10, 2024 via email

@rbo
Member

rbo commented Jun 10, 2024

@DanielFroehlich Agree!

Looks like an upgrade is running:

image

@rbo
Member

rbo commented Jun 10, 2024

Red Hat SSO config is broken on that cluster. Let's fix this first.

@rbo
Member

rbo commented Jun 10, 2024

The update is stuck because one of the two nodes is stuck joining:
image

@rbo
Member

rbo commented Jun 10, 2024

Trying to fetch the ignition config from the pod where the VM is running:

sh-5.1$ curl https://ignition-server-rbohne-hcp-sendling.apps.isar.coe.muc.redhat.com/ignition
curl: (6) Could not resolve host: ignition-server-rbohne-hcp-sendling.apps.isar.coe.muc.redhat.com
sh-5.1$ 

@rbo rbo closed this as completed Jun 10, 2024
@rbo rbo reopened this Jun 10, 2024
@rbo
Member

rbo commented Jun 10, 2024

$ oc rsh virt-launcher-sendling-ff7bf3fd-pn5vs-q8w7v
sh-5.1$ curl -kvvv https://ignition-server-rbohne-hcp-sendling.apps.isar.coe.muc.redhat.com/ignition
* Could not resolve host: ignition-server-rbohne-hcp-sendling.apps.isar.coe.muc.redhat.com
* Closing connection 0
curl: (6) Could not resolve host: ignition-server-rbohne-hcp-sendling.apps.isar.coe.muc.redhat.com
sh-5.1$ 

The other VM pod cannot curl it either.
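
When repeating this probe from several pods, it helps to separate "DNS is broken" from "the server answered". The sketch below maps curl's documented exit codes to a coarse failure class; `classify_curl_exit` and `probe_url` are hypothetical helper names, and the 10-second timeout is an arbitrary assumption. An HTTP-level answer such as `Unauthorized` still counts as `ok` here, because curl exits with 0 unless `-f` is given.

```shell
# Map a curl exit code to a coarse failure class.
classify_curl_exit() {
  case "$1" in
    0)  echo "ok" ;;        # a response was received (any HTTP status)
    6)  echo "dns" ;;       # could not resolve host
    7)  echo "connect" ;;   # failed to connect to host
    28) echo "timeout" ;;   # operation timed out
    60) echo "tls" ;;       # certificate verification failed
    *)  echo "other" ;;
  esac
}

# Probe a URL, discard the body, and print the failure class.
probe_url() {
  curl -ksS -o /dev/null --max-time 10 "$1"
  classify_curl_exit "$?"
}

# Usage from inside a pod shell:
#   probe_url https://ignition-server-rbohne-hcp-sendling.apps.isar.coe.muc.redhat.com/ignition
```

For the two virt-launcher pods in this thread the result would be `dns`, since curl reports exit code 6.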

@rbo
Member

rbo commented Jun 10, 2024

Sorry, no time left and sendling is not important. Deleted. Problem solved :-/

@rbo rbo closed this as completed Jun 10, 2024
@rbo
Member

rbo commented Jun 10, 2024

Documented in the wrong issue:

Trying the curl from another pod on the same node:

$ oc project rbohne-hcp-sendling
Now using project "rbohne-hcp-sendling" on server "https://api.isar.coe.muc.redhat.com:6443".
$ oc get pods -o wide | grep sendling-10d195e8-j8czd
virt-launcher-sendling-10d195e8-j8czd-sjshq           1/1     Running     0              11d     10.128.10.152   inf44   <none>           1/1
$ oc rsh virt-launcher-sendling-10d195e8-j8czd-sjshq
sh-5.1$ curl -k https://ignition-server-rbohne-hcp-sendling.apps.isar.coe.muc.redhat.com/ignition
curl: (6) Could not resolve host: ignition-server-rbohne-hcp-sendling.apps.isar.coe.muc.redhat.com
sh-5.1$ exit
exit
command terminated with exit code 6
$ oc get pods -o wide | grep inf44
cluster-policy-controller-74964cb9d6-6pqmj            1/1     Running     6 (9d ago)     14d     10.128.10.55    inf44   <none>           <none>
etcd-1                                                3/3     Running     4 (13d ago)    13d     10.128.10.17    inf44   <none>           <none>
ignition-server-5b4567866-b7ksx                       1/1     Running     0              14d     10.128.10.56    inf44   <none>           <none>
ignition-server-proxy-57c4f77c97-6qn9r                1/1     Running     0              14d     10.128.10.57    inf44   <none>           <none>
konnectivity-agent-8599fd5d6b-f8nkz                   1/1     Running     0              14d     10.128.10.58    inf44   <none>           <none>
kube-apiserver-86548cbbbf-bhsfj                       4/4     Running     0              14d     10.128.10.29    inf44   <none>           <none>
kube-controller-manager-5fc9d96bdf-5x4lp              1/1     Running     6 (9d ago)     14d     10.128.10.59    inf44   <none>           <none>
kube-scheduler-5b5c9478f4-98vnz                       1/1     Running     1 (9d ago)     14d     10.128.10.51    inf44   <none>           <none>
network-node-identity-8495fd79d9-sqkbl                3/3     Running     16 (13d ago)   14d     10.128.10.30    inf44   <none>           <none>
oauth-openshift-6b8fc486c9-xb6s8                      2/2     Running     0              10m     10.128.10.252   inf44   <none>           <none>
openshift-apiserver-778f9db75c-6pjb5                  3/3     Running     6 (9d ago)     14d     10.128.10.32    inf44   <none>           <none>
openshift-controller-manager-5578f894bb-6rkvb         1/1     Running     0              14d     10.128.10.61    inf44   <none>           <none>
openshift-oauth-apiserver-584f69b7f9-9qd99            2/2     Running     6 (9d ago)     14d     10.128.10.33    inf44   <none>           <none>
openshift-route-controller-manager-869c7c988b-kp2kh   1/1     Running     10 (9d ago)    14d     10.128.10.60    inf44   <none>           <none>
ovnkube-control-plane-75bffb695c-bxpzh                3/3     Running     17 (13d ago)   14d     10.128.10.34    inf44   <none>           <none>
packageserver-7f976d7855-bp87j                        2/2     Running     0              14d     10.128.10.35    inf44   <none>           <none>
virt-launcher-sendling-10d195e8-j8czd-sjshq           1/1     Running     0              11d     10.128.10.152   inf44   <none>           1/1
$ oc rsh packageserver-7f976d7855-bp87j
Defaulted container "packageserver" out of: packageserver, socks5-proxy, availability-prober (init)
sh-4.4$ curl -k https://ignition-server-rbohne-hcp-sendling.apps.isar.coe.muc.redhat.com/ignition
Unauthorized
sh-4.4$ cat /etc/resolv.conf 
search rbohne-hcp-sendling.svc.cluster.local svc.cluster.local cluster.local isar.coe.muc.redhat.com coe.muc.redhat.com
nameserver 172.30.0.10
options ndots:5
sh-4.4$ 

=> Works from the packageserver pod; Unauthorized is expected.

The resolv.conf is the same, and so is the DNS policy of both pods:

$ oc get pods -o yaml packageserver-7f976d7855-bp87j | grep -i dns
          "dns": {}
  dnsPolicy: ClusterFirst
$ oc get pods -o yaml virt-launcher-sendling-10d195e8-j8czd-sjshq | grep -i dns
          "dns": {}
  dnsPolicy: ClusterFirst
$ 
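
Since the pod specs show the same `dnsPolicy`, a direct comparison of `/etc/resolv.conf` inside both pods is one way to confirm that the resolver configuration really matches. This is only a sketch: `pod_resolv_conf` and `compare_resolv_conf` are hypothetical helper names, the `OC` variable is an assumption added so the command can be swapped out, and it relies on `cat` being available in the pod's default container.

```shell
# Print /etc/resolv.conf from the default container of a pod.
# OC may be set to another command; it defaults to `oc`.
pod_resolv_conf() {
  "${OC:-oc}" exec "$1" -- cat /etc/resolv.conf
}

# Compare the resolver configuration of two pods.
# Exit status: 0 if identical, 1 if different, 2 if a pod could not be read.
compare_resolv_conf() {
  a=$(pod_resolv_conf "$1") || return 2
  b=$(pod_resolv_conf "$2") || return 2
  if [ "$a" = "$b" ]; then
    echo "identical"
  else
    echo "different"
    return 1
  fi
}

# Usage (pod names taken from this issue):
#   compare_resolv_conf packageserver-7f976d7855-bp87j \
#     virt-launcher-sendling-10d195e8-j8czd-sjshq
```

If the files are identical while only one pod resolves the name, the difference has to lie somewhere other than the resolver configuration, for example in the network path to the nameserver.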
