Skip to content
This repository is currently being migrated. It's locked while the migration is in progress.

report if etcd peer is unreachable #20

Merged
merged 2 commits into from Jun 6, 2022
Merged

Conversation

Mojachieee
Copy link
Collaborator

Adds a new status field that indicates whether the etcd peer is healthy.

Also adds some logging + an event for if the peer is unhealthy

@Mojachieee Mojachieee requested a review from a team as a code owner May 31, 2022 11:25
@Mojachieee
Copy link
Collaborator Author

I'm going to change this so we also store last healthy time on the CR object

@Mojachieee Mojachieee marked this pull request as draft May 31, 2022 12:32
@Mojachieee Mojachieee marked this pull request as ready for review June 6, 2022 08:54
@Mojachieee
Copy link
Collaborator Author

I'm going to change this so we also store last healthy time on the CR object

I've done this in #22

func (r *EtcdPeerReconciler) Reconcile(ctx context.Context, req ctrl.Request) (_ ctrl.Result, reterr error) {
ctx, cancel := context.WithTimeout(ctx, 10*time.Second)
defer cancel()
log := r.Log.WithValues("peer", req.NamespacedName)
if r.lastHealthy == nil {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not gonna race ? The calls to Reconcile will always be serialised?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, fixed in #22 I'll merge it as one commit

szank
szank previously approved these changes Jun 6, 2022
healthy = false
r.Recorder.Event(&peer, "Warning", "MemberConnectionFailure", "Cannot connect to etcd member")
if r.lastHealthy[peer.Name].IsZero() {
log.V(2).Info("Peer has never been healthy")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the (2) level, don't think we use this

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very good point. #22 changes it so it's always logged

} else {
defer c.Close()
if version, err := c.GetVersion(ctx); err != nil {
log.Error(err, "Unable to get Etcd version", "error", err)
} else {
serverVersion = version.Server
}
healthy = true

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very very nitpick but in case the c.GetVersion takes a long time for some reason it would be better to have recorded the lastHealthy just before that call

Ricardo-Osorio
Ricardo-Osorio previously approved these changes Jun 6, 2022
@Mojachieee Mojachieee merged commit 8f7b8c9 into main Jun 6, 2022
@Mojachieee Mojachieee deleted the report-etcd-peer-health branch June 6, 2022 13:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants