Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add etcd status condition #8724

Merged
merged 1 commit into from
Nov 13, 2023
Merged

Conversation

vitorsavian
Copy link
Member

@vitorsavian vitorsavian commented Oct 25, 2023

Proposed Changes

Types of Changes

  • New Feature

Verification

You can run a two control-planes with etcd and use this command

kubectl get node {NODE_NAME} -o=jsonpath='Node Name: {.metadata.name}{"\n"}Conditions:{"\n"}{range .status.conditions[*]}- Type: {.type}{"\n"}  Status: {.status}{"\n"}  LastHeartbeatTime: {.lastHeartbeatTime}{"\n"}  LastTransitionTime: {.lastTransitionTime}{"\n"}  Reason: {.reason}{"\n"}  Message: {.message}{"\n\n"}{end}'

Testing

Linked Issues

User-Facing Change

Now the user can see the etcd status from each node in a simple way

Further Comments

pkg/etcd/etcd.go Outdated Show resolved Hide resolved
pkg/etcd/etcd.go Outdated Show resolved Hide resolved
pkg/etcd/etcd.go Outdated Show resolved Hide resolved
pkg/etcd/etcd.go Outdated Show resolved Hide resolved
pkg/etcd/etcd.go Outdated Show resolved Hide resolved
pkg/etcd/etcd.go Outdated Show resolved Hide resolved
pkg/etcd/etcd.go Outdated Show resolved Hide resolved
@vitorsavian vitorsavian force-pushed the etcd-status branch 2 times, most recently from 1e052d1 to 28aad7e Compare October 31, 2023 09:40
@codecov
Copy link

codecov bot commented Oct 31, 2023

Codecov Report

Attention: 28 lines in your changes are missing coverage. Please review.

Comparison is base (b8dc955) 47.03% compared to head (7c16229) 47.42%.
Report is 13 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8724      +/-   ##
==========================================
+ Coverage   47.03%   47.42%   +0.38%     
==========================================
  Files         148      145       -3     
  Lines       15658    15672      +14     
==========================================
+ Hits         7365     7432      +67     
+ Misses       7089     6981     -108     
- Partials     1204     1259      +55     
Flag Coverage Δ
e2etests 47.15% <55.76%> (?)
inttests 42.48% <52.45%> (-2.03%) ⬇️
unittests ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
pkg/etcd/etcd.go 45.32% <54.09%> (+4.26%) ⬆️

... and 65 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

pkg/etcd/etcd.go Outdated Show resolved Hide resolved
pkg/etcd/etcd.go Outdated Show resolved Hide resolved
pkg/etcd/etcd.go Outdated Show resolved Hide resolved
pkg/etcd/etcd.go Outdated Show resolved Hide resolved
pkg/etcd/etcd.go Outdated Show resolved Hide resolved
pkg/etcd/etcd.go Outdated Show resolved Hide resolved
pkg/etcd/etcd.go Outdated Show resolved Hide resolved
pkg/etcd/etcd.go Outdated Show resolved Hide resolved
pkg/etcd/etcd.go Outdated Show resolved Hide resolved
pkg/etcd/etcd.go Outdated Show resolved Hide resolved
Signed-off-by: Vitor <vitor.savian@suse.com>
@martinhv
Copy link

martinhv commented Jan 6, 2024

@vitorsavian @brandond
The standard convention for node conditions is:

  • FALSE, if everything is "ok"
  • TRUE, if it is a problem.

The only exception is the READY condition of a node. If you take Memorypressure, diskpressure, pidpressue, networkunavailable as the standard conditions K8s supplies, you will see that each of them will report FALSE as the NON-FAULTY condition.

Please stick to this convention, it makes life down the line much easier.

If the "ok" / "not ok" classification doesn't apply for your etcd status condition (e.g. being a learner is per se neither bad nor good), then the NodeCondition is the wrong API field to use, but it should rather be reported via the etcd endpoint.

@brandond
Copy link
Contributor

brandond commented Jan 6, 2024

The standard convention for node conditions is:
FALSE, if everything is "ok"
TRUE, if it is a problem.

I don't agree. The convention is to use true=bad for error conditions, and true=good for readiness conditions.

This etcd status condition is intended to indicate the readiness of the etcd cluster member, so for us true=good. We are not planning on changing it at this point.

then the NodeCondition is the wrong API field to use, but it should rather be reported via the etcd endpoint.

The whole point of this is to remove the need to query etcd directly.

@martinhv
Copy link

martinhv commented Jan 9, 2024

Unfortunately K8s is itself inconsistent, thus we are at this situation now. They should have just called the condition Ready instead NotReady and then everything would be fine. Because the differentiation between an error condition and a readiness condition is a philosophical one. If NetworkUnavailable would be called NetworkAvailable, it would be a readiness condition (is the network ready) .

A etcd member to be a voting member is for me not a readiness condition of a node. The node is ready even if the etcd member status would be learner. The EtcdIsVoter falls under the same category as the pressure and the network condition. I am not getting the argument, why it should be classified as a readiness condition.

As an example, node problem detector I believe follows also the convention: true=bad

@brandond
Copy link
Contributor

brandond commented Jan 9, 2024

Absent a standard, its a matter of maintainer preference. We've chosen to do it this way, and are highly unlikely to invert the logic now that it's been working this way for several releases.

@vitorsavian vitorsavian deleted the etcd-status branch May 20, 2024 16:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants