api logs not very precise in regards to cert issues on endpoints #122639

KlavsKlavsen · 2024-01-08T08:02:24Z

What happened?

Our kube-apiserver suddenly began failing to start. The log from containerd only says:
2023-12-16T07:35:01.426019865+01:00 stderr F }. Err: connection error: desc = "transport: authentication handshake failed: x509: certificate has expired or is not yet valid: current time 2023-12-16T06:35:01Z is after 2023-12-03T03:37:19Z"

We checked the apiserver certificates and related files, and they were all fine.
We finally checked the etcd endpoint, and it was serving an expired certificate.

/sig api-machinery

This is a repost covering only the logging issue, per @neolit123's suggestion in kubernetes/kubeadm#2989 (comment).

What did you expect to happen?

A log message that told me which endpoint had the bad certificate, along with details about that certificate.

How can we reproduce it (as minimally and precisely as possible)?

Renew the API server certificates but not the etcd certificates, then move the system clock forward until the etcd certificates have expired, and restart the kubelet.

Anything else we need to know?

No response

Kubernetes version

1.26.4

Cloud provider

hetzner.com - physical servers

OS version

ubuntu 22.04

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

leilajal · 2024-01-09T21:10:58Z

/help
/triage accepted

k8s-ci-robot · 2024-01-09T21:10:59Z

@leilajal:
This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

Why are we solving this issue?
To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
Does this issue have zero to low barrier of entry?
How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/help
/triage accepted

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Affan-7 · 2024-01-15T11:01:16Z

I would love to work on this.
/assign

Affan-7 · 2024-01-15T11:30:14Z

@leilajal

I think I need to update the log message to be more specific, for example "etcd certificate has expired or is not yet valid". Am I right?

KlavsKlavsen · 2024-01-15T11:55:25Z

I'd include details of which endpoint is being accessed: the DNS name and/or the IP. The IP specifically is important too, because it might be a DNS issue. For example, if a DNS name resolves to multiple IPs, only one of them may be serving the bad certificate.

Affan-7 · 2024-01-18T07:48:09Z

Hi @neolit123

I am unable to find the code that is logging this error message. Can you please help me with that?

neolit123 · 2024-01-18T07:55:42Z

Try asking in #sig-api-machinery on the Kubernetes Slack.

holgerson97 · 2024-02-02T15:13:08Z

/assign

KlavsKlavsen added the kind/bug Categorizes issue or PR as related to a bug. label Jan 8, 2024

k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 8, 2024

k8s-ci-robot assigned Affan-7 Jan 15, 2024

Affan-7 removed their assignment Jan 27, 2024

k8s-ci-robot assigned holgerson97 Feb 2, 2024

holgerson97 mentioned this issue Feb 2, 2024

add: information about endpoint connection when auth handshake fails #123094

Closed

holgerson97 mentioned this issue Apr 18, 2024

add: information about endpoint connection when auth handshake fails grpc/grpc-go#7150

Closed
