This repository has been archived by the owner on Nov 22, 2023. It is now read-only.

More robust network backend #30

Open
mcpherrinm opened this issue Jun 7, 2017 · 8 comments

@mcpherrinm
Contributor

We'd like keysync to handle several server failure scenarios, so we need a more robust backend.

  • Retry support
    • During a sync, we should try to fetch the secret list multiple times if it fails
    • We should try to fetch each secret multiple times before moving on
  • Failover support
    • If there are too many consecutive failures talking to a server, we should try a second one
    • Probably an MX-record-like weighted priority list.
  • Backoff between failovers and retries
  • Any one sync should occur against the same server
    • Avoids issues with lagging mysql replication and inconsistent server view-of-the-world
  • Log at info level on individual retries
  • Log at warn level on failover
  • Log at error level if all servers fail
@mcpherrinm mcpherrinm self-assigned this Jun 7, 2017
@mcpherrinm
Contributor Author

A lot of the client code comes straight from keywhiz-fs and should also be refactored to make this easier to implement.

@mcpherrinm
Contributor Author

All the values here should be tweakable via configuration.

@mcpherrinm
Contributor Author

We probably want a global rate limit too, to avoid hammering the server, e.g. if we're asked to sync repeatedly.

@madtrax

madtrax commented Jun 14, 2017

Quick question: why do a full synchronisation every time? I feel it would be much more efficient to trigger a sync based on the /secrets response.

Request the /secrets endpoint at the configured poll interval and only sync the secrets that need to be synced. You could do that by going over all the secrets in the response payload and syncing only those whose creationDate or updateDate is after the last successful sync (which could be null), or by comparing their checksums.

Finally, do a 'cleaning' pass that simply removes files that are present on the filesystem but not in the server response.

@mcpherrinm
Contributor Author

What you describe is already implemented. I am using the term "sync" to refer to that behavior - fetch secrets as-needed based on the server response.

@mcpherrinm
Contributor Author

In particular, the secretState struct (https://github.com/square/keysync/blob/master/syncer.go#L41-L52) is where the data used to make that decision is stored.

@madtrax

madtrax commented Jun 14, 2017

Great. I noticed this behaviour in an old version I was playing with a few weeks or months ago. I'll check the latest version.

On another note, I believe we should be able to configure the pollIntervalFailureThresholdMultiplier, especially if your poll interval is very high.

Thanks!

@mcpherrinm
Contributor Author

Yeah. I think we're going to want a handful of tunables here. I'll probably implement this next week and figure out what exactly those are.


2 participants