
Found two or more with same ip #7897

Closed
bvboca opened this issue Jul 24, 2021 · 17 comments
bvboca commented Jul 24, 2021

  1. Upgraded my cluster from 1.18 to 1.19
  2. Run jx admin to install JenkinsX
  3. Wait until all-set
  4. JX Dashboard can be accessed successfully, but some of the pods keep recreating.
  5. I log most of the failure pods. It seems something wrong with the health check mechanism.

[Screenshot 2021-07-24, 11:25:06 PM]

[Screenshot 2021-07-24, 11:25:58 PM]

Logs from the deployment check pod in the kuberhealthy namespace:

time="2021-07-24T15:01:05Z" level=info msg="Successfully hit service endpoint."
time="2021-07-24T15:01:05Z" level=info msg="Rolling update option is enabled. Performing roll."
time="2021-07-24T15:01:05Z" level=info msg="Creating deployment resource with 4 replica(s) in kuberhealthy namespace using image [nginxinc/nginx-unprivileged:1.17.9] with environment variables: map[]"
time="2021-07-24T15:01:05Z" level=info msg="Creating container using image [nginxinc/nginx-unprivileged:1.17.9] with environment variables: map[]"
time="2021-07-24T15:01:05Z" level=info msg="Created rolling-update deployment resource."
time="2021-07-24T15:01:05Z" level=info msg="Performing rolling-update on deployment deployment-deployment to [nginxinc/nginx-unprivileged:1.17.9]"
time="2021-07-24T15:01:26Z" level=info msg="Rolled deployment in kuberhealthy namespace: deployment-deployment"
time="2021-07-24T15:01:26Z" level=info msg="Looking for a response from the endpoint."
time="2021-07-24T15:01:26Z" level=info msg="Beginning backoff loop for HTTP GET request."
time="2021-07-24T15:01:26Z" level=info msg="Successfully made an HTTP request on attempt: 1"
time="2021-07-24T15:01:26Z" level=info msg="Got a 200 with a GET to http://10.108.6.162"
time="2021-07-24T15:01:26Z" level=info msg="Got a result from GET request backoff: 200 OK"
time="2021-07-24T15:01:26Z" level=info msg="Successfully hit service endpoint after rolling-update."
time="2021-07-24T15:01:26Z" level=info msg="Cleaning up deployment and service."
time="2021-07-24T15:01:26Z" level=info msg="Attempting to delete service deployment-svc in kuberhealthy namespace."
time="2021-07-24T15:01:31Z" level=info msg="Attempting to delete deployment in kuberhealthy namespace."
time="2021-07-24T15:01:36Z" level=info msg="Attempting to delete deployment in kuberhealthy namespace."
time="2021-07-24T15:01:41Z" level=info msg="Finished clean up process."
time="2021-07-24T15:01:41Z" level=info msg="Reporting success to Kuberhealthy."
time="2021-07-24T15:02:42Z" level=fatal msg="error reporting to kuberhealthy: bad status code from kuberhealthy status reporting url: [400] 400 Bad Request"
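The fatal line is the checker's final step: after the check itself passes, the pod POSTs a JSON status report back to Kuberhealthy, and it is that POST which gets the 400. A minimal sketch of the report body, assuming the documented external-check contract (an OK flag plus an Errors list, sent to the URL provided in the KH_REPORTING_URL environment variable):

```python
import json

def build_status_report(ok: bool, errors: list) -> bytes:
    """Build the JSON body an external check POSTs back to Kuberhealthy.

    The real check client reads the target URL from the KH_REPORTING_URL
    environment variable; in this cluster it resolves to the kuberhealthy
    service's /externalCheckStatus endpoint.
    """
    return json.dumps({"OK": ok, "Errors": errors}).encode("utf-8")

# The deployment check above finished cleanly, so its report would carry
# an empty error list:
report = build_status_report(True, [])
```

Kuberhealthy then matches the incoming report to a running check by the caller's pod IP, which is relevant to the warnings that surface later in this thread.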


bvboca commented Jul 28, 2021

I found some relevant logs from kuberhealthy:
time="2021-07-28T13:34:11Z" level=info msg="047fe940-68df-4da6-b01f-d8ed6d21968e jx/jx-webhook-events: [Last report time was: 2021-07-28 13:30:08.885051131 +0000 UTC vs 2021-07-28 13:30:08.885051131 +0000 UTC]"
time="2021-07-28T13:34:11Z" level=info msg="047fe940-68df-4da6-b01f-d8ed6d21968e jx/jx-webhook-events: [have not yet seen pod update since 2021-07-28 13:30:08.885051131 +0000 UTC]"
time="2021-07-28T13:34:11Z" level=info msg="499d2931-7418-493d-ac76-b095cea1183c kuberhealthy/jx-pod-status: [waiting for external checker pod to report in...]"
time="2021-07-28T13:34:11Z" level=info msg="499d2931-7418-493d-ac76-b095cea1183c kuberhealthy/jx-pod-status: [Last report time was: 2021-07-28 13:29:21.391488049 +0000 UTC vs 2021-07-28 13:29:21.391488049 +0000 UTC]"
time="2021-07-28T13:34:11Z" level=info msg="499d2931-7418-493d-ac76-b095cea1183c kuberhealthy/jx-pod-status: [have not yet seen pod update since 2021-07-28 13:29:21.391488049 +0000 UTC]"
time="2021-07-28T13:34:11Z" level=warning msg="was unable to find calling pod with remote IP 192.168.31.123 while watching for duration. Error: failed to fetch pod with remote ip 192.168.31.123 - found two or more with same ip"
time="2021-07-28T13:34:12Z" level=warning msg="was unable to find calling pod with remote IP 192.168.31.123 while watching for duration. Error: failed to fetch pod with remote ip 192.168.31.123 - found two or more with same ip"
time="2021-07-28T13:34:12Z" level=warning msg="was unable to find calling pod with remote IP 192.168.31.123 while watching for duration. Error: failed to fetch pod with remote ip 192.168.31.123 - found two or more with same ip"
time="2021-07-28T13:34:13Z" level=warning msg="was unable to find calling pod with remote IP 192.168.31.123 while watching for duration. Error: failed to fetch pod with remote ip 192.168.31.123 - found two or more with same ip"

It seems to be related to this issue:
https://github.com/kuberhealthy/kuberhealthy/issues/870
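The warnings suggest Kuberhealthy resolves the reporting pod from the request's source IP and gives up when that IP is ambiguous. A simplified sketch of that failure mode (illustrative only, not Kuberhealthy's actual code); two pods can appear to share an IP when, for example, a finished pod's recorded podIP has been reused by a running one:

```python
def find_pod_by_ip(pods, remote_ip):
    """Return the single pod whose recorded podIP matches remote_ip.

    Mirrors the warning in the log above: when two or more pods match,
    the lookup raises instead of guessing, so the caller's status report
    cannot be attributed to a check and is rejected.
    """
    matches = [p for p in pods if p.get("podIP") == remote_ip]
    if len(matches) > 1:
        raise LookupError(
            f"failed to fetch pod with remote ip {remote_ip} - "
            "found two or more with same ip")
    if not matches:
        raise LookupError(f"no pod found with remote ip {remote_ip}")
    return matches[0]
```

One way to check for this condition on the cluster is to run kubectl get pods --all-namespaces -o wide and look for multiple pods listing the IP from the warning.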

bvboca changed the title from "Error reporting to kuberhealthy" to "Found two or more with same ip" on Jul 28, 2021
jstrachan (Member) commented:

Did the last boot job succeed? (See the end of the most recent jx admin log output.)

jstrachan (Member) commented:

Which install instructions are you following? How did you install kuberhealthy?


bvboca commented Jul 28, 2021

Hi James,

I installed the cluster, including kuberhealthy, by following the steps at https://jenkins-x.io/v3/admin/setup/operator/ and running:
jx admin operator --url=...

It seems my boot job succeeded, though with some errors:
[Screenshot 2021-07-28, 11:15:29 PM]

jstrachan (Member) commented:

Could you make a dummy commit in your git repository (e.g. modify the README.md), run jx admin log -w, and post the last page or so, please?


bvboca commented Jul 29, 2021

James,

Here's my log:

[Screenshot 2021-07-29, 11:57:34 AM]

jstrachan (Member) commented:

Which git repository template did you start from? The on-premises one, right? https://github.com/jx3-gitops-repositories/jx3-kubernetes

What's the output of:

kubectl get pod --all-namespaces

I don't really understand why kuberhealthy is not working in your cluster. The daemonset and deployment checks (the first two health checks) are pure vanilla kuberhealthy + Kubernetes checks and have nothing to do with Jenkins X at all; they verify core Kubernetes behavior.

jstrachan (Member) commented:

Also try:

kubectl top node

Could it be that your cluster is out of capacity?


bvboca commented Jul 29, 2021

Yes, I'm using the on-premises one: https://github.com/bvboca/jx3-kubernetes
Here's my log:
[root@k8s-slave1 ~]# kubectl get pod --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default busybox 1/1 Running 165 17d
default eureka-deployment-9456f8d8b-jtcm9 1/1 Running 3 6d18h
default eureka-deployment-9456f8d8b-wmgx4 1/1 Running 3 6d18h
default example-memcached-ff665679d-5vrw9 1/1 Running 101 391d
default example-memcached-ff665679d-79wvl 1/1 Running 102 391d
default example-memcached-ff665679d-jdpv5 1/1 Running 99 390d
default k8sdemo-jenkins-deployment-779c8ddf5c-vlxhr 1/1 Running 62 218d
default myservice-deployment-75996bb45f-kfsq2 1/1 Running 3 6d18h
default myservice-deployment-75996bb45f-tc9h8 1/1 Running 3 6d18h
default volume-test 1/1 Running 35 157d
gitlab gitlab-b59cb885-29pwn 1/1 Running 87 207d
gitlab gitlab-postgresql-75b5f9cc76-hqlsc 1/1 Running 51 207d
gitlab gitlab-redis-5884fd96d4-nbn5w 1/1 Running 51 207d
hyperledger ca-org1-d9c6bf67-r6sjr 1/1 Running 73 228d
hyperledger ca-org2-7756d59595-z7ns4 1/1 Running 120 457d
hyperledger chaincode-marbles-org1-76fcc64b7c-8mjbz 1/1 Running 120 457d
hyperledger chaincode-marbles-org2-54b6f9594d-gwc25 1/1 Running 120 457d
hyperledger cli-org1-6989f44f9b-sswsp 1/1 Running 120 457d
hyperledger cli-org2-7d45c8cc6-6mvmd 1/1 Running 120 457d
hyperledger orderer0-66b898967b-976dh 1/1 Running 72 226d
hyperledger orderer1-8ffb5446f-7ktgf 1/1 Running 72 226d
hyperledger orderer2-dd7597968-r6pqq 1/1 Running 72 226d
hyperledger peer0-org1-fc87994cd-77vls 1/1 Running 120 457d
hyperledger peer0-org2-7446b69d75-tfz7t 1/1 Running 121 457d
ipfs ipfscluster-0 2/2 Running 90 193d
ipfs ipfscluster-1 2/2 Running 90 193d
istio-system cluster-local-gateway-5ccc64698c-pdlpn 1/1 Running 114 430d
istio-system istio-ingressgateway-8546c8686b-djbxv 1/1 Running 113 430d
istio-system istio-pilot-564d5b8f95-mjbld 0/1 Evicted 0 430d
istio-system istio-pilot-564d5b8f95-x7j6r 1/1 Running 74 228d
istio-system zipkin-6cfd88c459-grhjt 1/1 Running 114 428d
jx-git-operator jx-boot-0e28cc89-3ec8-4065-850c-f5341a42ac03-6pgnw 0/1 Error 0 31h
jx-git-operator jx-boot-0e28cc89-3ec8-4065-850c-f5341a42ac03-p2v5w 0/1 Completed 0 31h
jx-git-operator jx-boot-19e5f6a8-b0c2-47ec-ab6a-21ea5fdfcaec-f6qlw 0/1 Completed 0 5h57m
jx-git-operator jx-boot-35ed9c49-0e72-48e5-8e78-5f4106e75fae-dcmlp 0/1 Completed 0 31h
jx-git-operator jx-boot-96bc1637-8e43-4e61-89c6-4cbe07f2f7b2-wrdcd 0/1 Completed 0 5h31m
jx-git-operator jx-boot-edd3191f-543a-4ba5-92db-6ca406ca60df-5f4h9 0/1 Completed 0 5h24m
jx-git-operator jx-boot-f92d9ffe-c972-4cc6-a446-3926f9cf8ecb-9hq2c 0/1 Completed 0 5h48m
jx-git-operator jx-git-operator-7965dbcb55-ncdfb 1/1 Running 3 31h
jx bucketrepo-bucketrepo-58568b75cd-jpwxh 1/1 Running 1 31h
jx jx-bot-token-1627546365 0/1 Error 0 23m
jx jx-bot-token-1627546665 0/1 Error 0 18m
jx jx-bot-token-1627546965 0/1 Error 0 13m
jx jx-bot-token-1627547265 0/1 Error 0 8m39s
jx jx-bot-token-1627547565 0/1 Error 0 3m39s
jx jx-build-controller-647fbc5d57-v8fdz 1/1 Running 2 31h
jx jx-gcactivities-1627538400-k9zlp 0/1 Completed 0 156m
jx jx-gcactivities-1627540200-gg5w2 0/1 Completed 0 126m
jx jx-gcpods-1627538400-pc59x 0/1 Completed 0 156m
jx jx-gcpods-1627540200-lmzb5 0/1 Completed 0 126m
jx jx-gcpods-1627540200-qdlbv 0/1 Error 0 126m
jx jx-pipelines-visualizer-68d8795dcc-7svvx 1/1 Running 1 31h
jx jx-preview-gc-jobs-1627546200-fg7v9 0/1 Completed 0 26m
jx jx-preview-gc-jobs-1627546800-jh274 0/1 Completed 0 16m
jx jx-preview-gc-jobs-1627547400-wdn55 0/1 Completed 0 6m29s
jx jx-webhook-1627547106 0/1 Error 0 11m
jx jx-webhook-1627547227 0/1 Error 0 9m17s
jx jx-webhook-1627547348 0/1 Error 0 7m16s
jx jx-webhook-1627547469 0/1 Error 0 5m15s
jx jx-webhook-1627547590 0/1 Error 0 3m14s
jx jx-webhook-1627547711 0/1 Error 0 73s
jx jx-webhook-events-1627546365 0/1 Error 0 23m
jx jx-webhook-events-1627546665 0/1 Error 0 18m
jx jx-webhook-events-1627546965 0/1 Error 0 13m
jx jx-webhook-events-1627547265 0/1 Error 0 8m39s
jx jx-webhook-events-1627547565 0/1 Error 0 3m39s
jx lighthouse-foghorn-645fdfcbfb-fxhpx 1/1 Running 1 31h
jx lighthouse-gc-jobs-1627543800-cttds 0/1 Completed 0 66m
jx lighthouse-gc-jobs-1627545600-ggc5h 0/1 Completed 0 36m
jx lighthouse-gc-jobs-1627547400-m4pst 0/1 Completed 0 6m29s
jx lighthouse-keeper-5858cfc54f-2ckws 1/1 Running 1 31h
jx lighthouse-tekton-controller-7fd9c75449-lfr59 1/1 Running 1 31h
jx lighthouse-webhooks-8469cb7d6-kp8qb 1/1 Running 0 5h18m
knative-eventing broker-controller-f5bb6b9fd-md529 1/1 Running 114 430d
knative-eventing eventing-controller-7c6c74bccd-62mqr 1/1 Running 113 430d
knative-eventing eventing-webhook-778f4bd59b-2z4d7 1/1 Running 114 430d
knative-eventing imc-controller-5c65dbd444-xjk2r 1/1 Running 114 430d
knative-eventing imc-dispatcher-5f47b446bb-vlsmc 1/1 Running 114 430d
knative-monitoring elasticsearch-logging-0 1/1 Running 519 428d
knative-monitoring elasticsearch-logging-1 1/1 Running 518 428d
knative-monitoring grafana-55d85fcd55-dp9xf 1/1 Running 113 428d
knative-monitoring kibana-logging-5cccb587d-dcn2t 1/1 Running 72 226d
knative-monitoring kube-state-metrics-7bd4d8fcb8-w8pph 1/1 Running 115 428d
knative-monitoring node-exporter-4gjnv 2/2 Running 159 428d
knative-monitoring node-exporter-kqtlp 2/2 Running 232 428d
knative-monitoring prometheus-system-0 0/1 CrashLoopBackOff 1116 226d
knative-monitoring prometheus-system-1 0/1 Running 23279 226d
knative-serving activator-845b77cbb5-8xlq8 1/1 Running 248 430d
knative-serving autoscaler-7fc56894f5-nwkbp 1/1 Running 115 430d
knative-serving controller-7ffb84fd9c-r2w42 1/1 Running 114 430d
knative-serving default-domain-cpxqs 0/1 Completed 0 430d
knative-serving networking-istio-7fc7f66675-8g8gf 1/1 Running 113 430d
knative-serving webhook-8597865965-fwf5t 1/1 Running 114 430d
kube-system calico-kube-controllers-77c5fc8d7f-9rts2 1/1 Running 5 6d19h
kube-system calico-node-r8wh7 1/1 Running 133 457d
kube-system calico-node-xlwst 1/1 Running 74 472d
kube-system coredns-f9fd979d6-54brv 1/1 Running 2 6d17h
kube-system coredns-f9fd979d6-lb2g4 1/1 Running 2 6d18h
kube-system etcd-k8s-master 1/1 Running 3 6d18h
kube-system kube-apiserver-k8s-master 1/1 Running 5 6d18h
kube-system kube-controller-manager-k8s-master 1/1 Running 3 6d18h
kube-system kube-proxy-4hjd8 1/1 Running 2 6d17h
kube-system kube-proxy-7n2qp 1/1 Running 3 6d18h
kube-system kube-scheduler-k8s-master 1/1 Running 3 6d18h
kube-system tiller-deploy-7fbb5bc5d4-6jk2l 1/1 Running 107 413d
kuberhealthy daemonset-1627543965 0/1 Error 0 63m
kuberhealthy daemonset-1627544865 0/1 Error 0 48m
kuberhealthy daemonset-1627545765 0/1 Error 0 33m
kuberhealthy daemonset-1627546665 0/1 Error 0 18m
kuberhealthy daemonset-1627547565 0/1 Error 0 3m39s
kuberhealthy deployment-1627543683 0/1 Error 0 68m
kuberhealthy deployment-1627544584 0/1 Error 0 53m
kuberhealthy deployment-1627545486 0/1 Error 0 38m
kuberhealthy deployment-1627546387 0/1 Error 0 23m
kuberhealthy deployment-1627547288 0/1 Error 0 8m16s
kuberhealthy dns-status-internal-1627543987 0/1 Completed 0 63m
kuberhealthy dns-status-internal-1627544888 0/1 Completed 0 48m
kuberhealthy dns-status-internal-1627545789 0/1 Completed 0 33m
kuberhealthy dns-status-internal-1627546691 0/1 Completed 0 18m
kuberhealthy dns-status-internal-1627547592 0/1 Completed 0 3m12s
kuberhealthy jx-pod-status-1627543383 0/1 Error 0 73m
kuberhealthy jx-pod-status-1627544284 0/1 Error 0 58m
kuberhealthy jx-pod-status-1627545185 0/1 Error 0 43m
kuberhealthy jx-pod-status-1627546086 0/1 Error 0 28m
kuberhealthy jx-pod-status-1627546987 0/1 Error 0 13m
kuberhealthy jx-secrets-1627546377 0/1 Error 0 23m
kuberhealthy jx-secrets-1627546678 0/1 Error 0 18m
kuberhealthy jx-secrets-1627546979 0/1 Error 0 13m
kuberhealthy jx-secrets-1627547280 0/1 Error 0 8m24s
kuberhealthy jx-secrets-1627547581 0/1 Error 0 3m23s
kuberhealthy kuberhealthy-7667c57ff7-7mc9q 1/1 Running 1 31h
kuberhealthy kuberhealthy-7667c57ff7-x92j6 1/1 Running 1 31h
kuberhealthy network-connection-check-1627540365 0/1 Completed 0 123m
kuberhealthy network-connection-check-1627542165 0/1 Completed 0 93m
kuberhealthy network-connection-check-1627543965 0/1 Completed 0 63m
kuberhealthy network-connection-check-1627545765 0/1 Completed 0 33m
kuberhealthy network-connection-check-1627547565 0/1 Completed 0 3m38s
kubernetes-dashboard dashboard-metrics-scraper-6b4884c9d5-sg4s9 1/1 Running 5 6d19h
kubernetes-dashboard kubernetes-dashboard-7b544877d5-v8pjq 1/1 Running 7 6d19h
local-path-storage local-path-provisioner-5b577f66ff-sbhjn 1/1 Running 51 157d
metallb-system controller-57f648cb96-n5r9j 1/1 Running 113 430d
metallb-system speaker-92rjl 1/1 Running 86 228d
metallb-system speaker-w4llv 1/1 Running 71 430d
n1 default-broker-filter-7447bffc4f-kgkk8 1/1 Running 127 429d
n1 default-broker-ingress-8b8779497-ff4xw 1/1 Running 129 429d
nginx ingress-nginx-admission-create-845b7 0/1 Completed 0 31h
nginx ingress-nginx-admission-patch-9rwnj 0/1 Completed 1 31h
nginx ingress-nginx-controller-7c47c6b6dc-gdxnb 1/1 Running 1 31h
nginx ingress-nginx-controller-7c47c6b6dc-hc27f 1/1 Running 1 31h
nginx ingress-nginx-controller-7c47c6b6dc-kfjgt 1/1 Running 1 31h
olm catalog-operator-c8bc7f97c-p6t6q 1/1 Running 103 394d
olm olm-operator-84cfcdbdb8-8bx9g 0/1 Evicted 0 228d
olm olm-operator-84cfcdbdb8-kqgng 0/1 Evicted 0 394d
olm olm-operator-84cfcdbdb8-rmkm8 0/1 Evicted 0 228d
olm olm-operator-84cfcdbdb8-vq2gk 1/1 Running 74 228d
olm olm-operator-84cfcdbdb8-x6fl8 0/1 Evicted 0 228d
olm operatorhubio-catalog-jsskt 1/1 Running 11 6d19h
olm packageserver-7c9b7f4bc8-2cfvp 1/1 Running 0 5h34m
olm packageserver-7c9b7f4bc8-p9nmt 1/1 Running 0 5h32m
tekton-pipelines tekton-pipelines-controller-77578b9fb-q5bmh 1/1 Running 1 31h
tekton-pipelines tekton-pipelines-webhook-59fd68db75-v7xjw 1/1 Running 1 31h


bvboca commented Jul 29, 2021

It seems the resources are fine now.

Resource           Requests          Limits
cpu                5270m (65%)       17320m (216%)
memory             4737096Ki (19%)   15317280Ki (61%)
ephemeral-storage  0 (0%)            0 (0%)
hugepages-1Gi      0 (0%)            0 (0%)
hugepages-2Mi      0 (0%)            0 (0%)


bvboca commented Jul 29, 2021

James,

I checked the failed pods' logs. Most of them are about kuberhealthy status reporting:

Here's the log from jx-bot-token:
time="2021-07-29T08:27:56Z" level=info msg="Found instance namespace: jx"
time="2021-07-29T08:27:56Z" level=info msg="Kuberhealthy is located in the jx namespace."
starting jx-bot-token health checks
received 200
FATAL: failed to report success status bad status code from kuberhealthy status reporting url: [400] 400 Bad Request

Log from jx-webhooks:
time="2021-07-29T08:47:28Z" level=info msg="Found instance namespace: jx"
time="2021-07-29T08:47:28Z" level=info msg="Kuberhealthy is located in the jx namespace."
starting jx-webhooks health checks
FATAL: failed to report success status bad status code from kuberhealthy status reporting url: [400] 400 Bad Request

Log from jx-webhook-events:
time="2021-07-29T08:32:55Z" level=info msg="Found instance namespace: jx"
time="2021-07-29T08:32:55Z" level=info msg="Kuberhealthy is located in the jx namespace."
starting jx-webhook-events health checks
FATAL: failed to report failure status bad status code from kuberhealthy status reporting url: [400] 400 Bad Request

Log from jx-pod-status:
time="2021-07-29T08:23:14Z" level=info msg="Found instance namespace: kuberhealthy"
time="2021-07-29T08:23:14Z" level=info msg="Kuberhealthy is located in the kuberhealthy namespace."
starting jx-install health checks
skipping checks on pod because it is too young: jx/jx-preview-gc-jobs-1627546800-jh274
2021/07/29 08:23:14 checkClient: DEBUG: Reporting SUCCESS
2021/07/29 08:23:14 checkClient: DEBUG: Sending report with error length of:0
2021/07/29 08:23:14 checkClient: DEBUG: Sending report with ok state of:true
2021/07/29 08:23:14 checkClient: INFO: Using kuberhealthy reporting URL:http://kuberhealthy.kuberhealthy.svc.cluster.local/externalCheckStatus
2021/07/29 08:24:15 checkClient: ERROR: got a bad status code from kuberhealthy:400400 Bad Request
FATAL: failed to report success status bad status code from kuberhealthy status reporting url: [400] 400 Bad Request

jstrachan (Member) commented:

There is a kuberhealthy service running in the kuberhealthy namespace, right? Can you try curl http://kuberhealthy.kuberhealthy.svc.cluster.local from inside a pod in the cluster? I wonder if there's an issue with Service DNS in your cluster.

jstrachan (Member) commented:

E.g. run:

kubectl exec -it jx-build-controller-XXXX -- bash

then run:

curl -v http://kuberhealthy.kuberhealthy.svc.cluster.local

jstrachan (Member) commented:

You should get a 200 with JSON output.


bvboca commented Jul 30, 2021

$ curl http://kuberhealthy.kuberhealthy.svc.cluster.local
{
"OK": false,
"Errors": [
"Check execution error: kuberhealthy/deployment: timed out waiting for checker pod to report in",
"Check execution error: kuberhealthy/daemonset: timed out waiting for checker pod to report in",
"Check execution error: kuberhealthy/network-connection-check: timed out waiting for checker pod to report in",
"Check execution error: jx/jx-bot-token: timed out waiting for checker pod to report in",
"Check execution error: kuberhealthy/jx-pod-status: timed out waiting for checker pod to report in",
"Check execution error: jx/jx-webhook-events: timed out waiting for checker pod to report in",
"Check execution error: jx/jx-webhook: timed out waiting for checker pod to report in",
"Check execution error: kuberhealthy/jx-secrets: timed out waiting for checker pod to report in",
"Check execution error: kuberhealthy/dns-status-internal: timed out waiting for checker pod to report in"
],
"CheckDetails": {
"jx/jx-bot-token": {
"OK": false,
"Errors": [
"Check execution error: jx/jx-bot-token: timed out waiting for checker pod to report in"
],
"RunDuration": "",
"Namespace": "jx",
"LastRun": "2021-07-30T01:04:45.32803228Z",
"AuthoritativePod": "kuberhealthy-7667c57ff7-7mc9q",
"uuid": "034834f2-fc69-43fc-8e71-22193d500004"
},
"jx/jx-webhook": {
"OK": false,
"Errors": [
"Check execution error: jx/jx-webhook: timed out waiting for checker pod to report in"
],
"RunDuration": "",
"Namespace": "jx",
"LastRun": "2021-07-30T01:08:03.229217849Z",
"AuthoritativePod": "kuberhealthy-7667c57ff7-7mc9q",
"uuid": "0098a96f-a167-460c-b01d-65aa2f6df2d8"
},
"jx/jx-webhook-events": {
"OK": false,
"Errors": [
"Check execution error: jx/jx-webhook-events: timed out waiting for checker pod to report in"
],
"RunDuration": "",
"Namespace": "jx",
"LastRun": "2021-07-30T01:04:45.140197347Z",
"AuthoritativePod": "kuberhealthy-7667c57ff7-7mc9q",
"uuid": "23b4ce09-b608-46a4-bed5-970a321b59b6"
},
"kuberhealthy/daemonset": {
"OK": false,
"Errors": [
"Check execution error: kuberhealthy/daemonset: timed out waiting for checker pod to report in"
],
"RunDuration": "",
"Namespace": "kuberhealthy",
"LastRun": "2021-07-30T00:59:46.134773453Z",
"AuthoritativePod": "kuberhealthy-7667c57ff7-7mc9q",
"uuid": "99b273bb-b0b9-42a4-b00d-0a7fd01442a4"
},
"kuberhealthy/deployment": {
"OK": false,
"Errors": [
"Check execution error: kuberhealthy/deployment: timed out waiting for checker pod to report in"
],
"RunDuration": "",
"Namespace": "kuberhealthy",
"LastRun": "2021-07-30T00:59:19.52312946Z",
"AuthoritativePod": "kuberhealthy-7667c57ff7-7mc9q",
"uuid": "e7693f4f-6ca5-46a4-baec-55c4c982672a"
},
"kuberhealthy/dns-status-internal": {
"OK": false,
"Errors": [
"Check execution error: kuberhealthy/dns-status-internal: timed out waiting for checker pod to report in"
],
"RunDuration": "",
"Namespace": "kuberhealthy",
"LastRun": "2021-07-30T01:04:22.883599436Z",
"AuthoritativePod": "kuberhealthy-7667c57ff7-7mc9q",
"uuid": "c325572e-51bc-4992-a490-c357a58ff011"
},
"kuberhealthy/jx-pod-status": {
"OK": false,
"Errors": [
"Check execution error: kuberhealthy/jx-pod-status: timed out waiting for checker pod to report in"
],
"RunDuration": "",
"Namespace": "kuberhealthy",
"LastRun": "2021-07-30T01:09:19.389733671Z",
"AuthoritativePod": "kuberhealthy-7667c57ff7-7mc9q",
"uuid": "3fbd8795-6938-4c14-ae21-f9b657c2e0e0"
},
"kuberhealthy/jx-secrets": {
"OK": false,
"Errors": [
"Check execution error: kuberhealthy/jx-secrets: timed out waiting for checker pod to report in"
],
"RunDuration": "",
"Namespace": "kuberhealthy",
"LastRun": "2021-07-30T01:06:36.588579586Z",
"AuthoritativePod": "kuberhealthy-7667c57ff7-7mc9q",
"uuid": "3bd3190e-e1a6-4bcb-8753-08335e728a0e"
},
"kuberhealthy/network-connection-check": {
"OK": false,
"Errors": [
"Check execution error: kuberhealthy/network-connection-check: timed out waiting for checker pod to report in"
],
"RunDuration": "",
"Namespace": "kuberhealthy",
"LastRun": "2021-07-30T00:42:45.501532474Z",
"AuthoritativePod": "kuberhealthy-7667c57ff7-7mc9q",
"uuid": "3dabdfb2-c8cf-49e0-b345-708016631ecd"
}
},
"JobDetails": {},
"CurrentMaster": "kuberhealthy-7667c57ff7-7mc9q"
}


bvboca commented Jul 30, 2021

James,

I think the pod DNS is fine: the comment above shows the response from the kuberhealthy service, and I have other apps that depend on DNS and are running well.

Is it related to kuberhealthy/kuberhealthy#858?
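For reference, the status JSON from the earlier curl can be reduced to just the failing checks with a few lines. A small sketch (the function name is mine, not part of any tool), run here against a trimmed fragment of that payload:

```python
import json

def failing_checks(status_json: str) -> dict:
    """Map each non-OK check in a Kuberhealthy status payload to its errors."""
    status = json.loads(status_json)
    return {name: detail.get("Errors", [])
            for name, detail in status.get("CheckDetails", {}).items()
            if not detail.get("OK", False)}

# Trimmed fragment of the payload returned by the curl above.
sample = """{
  "OK": false,
  "CheckDetails": {
    "jx/jx-webhook": {
      "OK": false,
      "Errors": ["Check execution error: jx/jx-webhook: timed out waiting for checker pod to report in"]
    }
  }
}"""
```

Run over the full payload above, this would list all nine checks, each timing out waiting for its checker pod to report in, consistent with the reports being dropped rather than the checks themselves failing.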

msvticket (Member) commented:

We are now disabling kuberhealthy. It seems to cause more headaches than it solves.
