@davissp14 davissp14 commented Jan 14, 2023

A few notable changes:

  1. Consolidates member state information under a single key.
{
  "members":[
    {"id":396610339,"hostname":"fdaa:0:2e26:a7b:bf88:6920:c8e0:2","region":"lax","primary":true},
    {"id":1949217720,"hostname":"fdaa:0:2e26:a7b:7d16:819f:c83e:2","region":"lax","primary":false},
    {"id":2119071075,"hostname":"fdaa:0:2e26:a7b:9adb:c779:c868:2","region":"ord","primary":false}
  ]
}
  2. Implemented compare-and-swap (CAS) for state updates, with automated retries, so that concurrent updates from multiple nodes don't clobber one another.

@davissp14 davissp14 changed the title from "WIP: Playing around with state consolidation" to "Playing around with state consolidation" Jan 15, 2023
Comment on lines +53 to +57
for _, members := range cluster.Members {
	if members.ID == id {
		return nil
	}
}
Contributor

An interesting state that I haven't fully thought about is the case where a member is registered but, for whatever reason, its hostname, region, or primary fields differ from what is stored. I don't think this should be possible, though?

Contributor Author

I think it's probably safe to assume they will be different. If they aren't, something is broken with Machines.
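If the mismatch case ever needed to be surfaced rather than assumed away, one option would be to compare the full record instead of only the ID. The sketch below is purely illustrative and was not part of this PR: `memberStatus`, its constants, and `checkMember` are hypothetical names, and the `Member` fields are inferred from the JSON in the description.

```go
package main

// Member mirrors one entry of the "members" array (field types are assumed).
type Member struct {
	ID       int
	Hostname string
	Region   string
	Primary  bool
}

// memberStatus describes how a candidate compares with the recorded state.
type memberStatus int

const (
	statusMissing  memberStatus = iota // no member with this ID is recorded
	statusMatch                        // recorded and every field agrees
	statusMismatch                     // recorded, but some field differs
)

// checkMember looks up want by ID and reports whether the stored
// record is identical to it.
func checkMember(members []Member, want Member) memberStatus {
	for _, m := range members {
		if m.ID != want.ID {
			continue
		}
		if m == want {
			return statusMatch
		}
		return statusMismatch
	}
	return statusMissing
}
```

A caller could log or reject `statusMismatch` instead of silently returning, which would make a broken upstream visible.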

Comment on lines +67 to +69
if errors.Is(err, ErrCAS) {
	c.RegisterMember(id, hostname, region, primary)
}
Contributor

Possible infinite loop if Consul misbehaves? Feels unlikely, though.

Contributor Author

Good catch. We don't want to be spamming Consul if it's under load. I'll see about addressing this.

Contributor Author

Actually, this should be fairly safe. We will only retry in the event of a CAS error, which should be super rare on its own.

Contributor

> We will only retry in the event of a CAS error, which should be super rare on its own.

Ah, cool. I'm not familiar with Consul's failure modes, so I wasn't sure whether a degraded cluster could cause CAS errors or something. Seems like no.

@davissp14 davissp14 merged commit 234cd08 into master Jan 15, 2023
@davissp14 davissp14 deleted the cluster-state branch February 25, 2023 01:57