Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature: change encryption keys #199

Merged
merged 65 commits into from Apr 18, 2014
Merged

feature: change encryption keys #199

merged 65 commits into from Apr 18, 2014

Conversation

@ryanuber
Copy link
Member

@ryanuber ryanuber commented Apr 11, 2014

For a week or so I've been chipping away at this feature little by little. It uses the new memberlist keyring feature to allow Serf to handle encryption key changes in a running cluster. I feel like it's feature-complete, and ready to start gathering some suggestions on the code itself.

There is more conversation around the feature in general in #194.

The bottom line on what it does can be seen with a few command examples. All of the operations are idempotent, so you can just keep running them without negative consequences if they fail.

Install a new key

$ serf key -install E4SoXyZSJkcg37n42U2i2Q==
==> Installing key on all members...
==> Successfully installed key!

Change the key used to encrypt messages:

$ serf key -use E4SoXyZSJkcg37n42U2i2Q==
==> Changing primary key on all members...
==> Successfully changed primary key!

List keys in use on the cluster:

$ serf key -list
==> Asking all members for installed keys...
==> Keys gathered, listing cluster keys...

E4SoXyZSJkcg37n42U2i2Q==
9kHC3w5eswavy1iU56YG8g==

Remove a key:

$ serf key -remove 9kHC3w5eswavy1iU56YG8g==
==> Removing key on all members...
==> Successfully removed key!

Error conditions during key operations:

$ serf key -remove E4SoXyZSJkcg37n42U2i2Q==
==> Removing key on all members...
failed:  db9    Removing the primary key is not allowed
failed:  web15  Removing the primary key is not allowed
failed:  db8    Removing the primary key is not allowed
failed:  web21  Removing the primary key is not allowed
failed:  lb31   Removing the primary key is not allowed
failed:  lb3    Removing the primary key is not allowed

Error removing key: 6/102 nodes reported failure

I know this pull request is pretty massive, so if splitting it into smaller chunks is preferred, I can try doing that. Just let me know!

@armon
Copy link
Member

@armon armon commented Apr 11, 2014

@ryanuber Wow! This is a huge PR! Awesome work! I will try to comb through this over the weekend.

With the -list mode, does it indicate if some nodes are missing certain keys? e.g. How does the operator verify all nodes have a given key before trying to -use that key?

@ryanuber
Copy link
Member Author

@ryanuber ryanuber commented Apr 11, 2014

@armon the -list command just gives you a list of all unique keys in use on the cluster. It doesn't tell you if they are in use on some but not others. You could get that confidence using the -install command though, since it will only return 0 if every node replies with success.

My thinking was that if the commands are idempotent, you would just do a -install until you succeeded. Similarly, you would want to do a -use until it succeeded before doing a -remove of the old key. So essentially, -install could also be looked at as a "verify" of sorts, "verify this key exists on all nodes".

@armon
Copy link
Member

@armon armon commented Apr 11, 2014

@ryanuber Got it. Would it be possible to add a very simple indicator to list, just something thats like [58/60] or something that indicates how many nodes have that key? This way an operator can quickly see if they need to run -install again.

@ryanuber
Copy link
Member Author

@ryanuber ryanuber commented Apr 11, 2014

@armon definitely! I'll take a stab at that over the weekend.

goto SEND
}

s.logger.Printf("[DEBUG] serf: Received install-key query")

This comment has been minimized.

@armon

armon Apr 14, 2014
Member

I'd make this INFO level

cmdFlags.Usage = func() { c.Ui.Output(c.Help()) }
cmdFlags.StringVar(&installKey, "install", "", "install a new key")
cmdFlags.StringVar(&useKey, "use", "", "change primary encryption key")
cmdFlags.StringVar(&removeKey, "remove", "", "remove a key")

This comment has been minimized.

@armon

armon Apr 14, 2014
Member

I'd add a check to make sure one and only one of -install -use or -remove is provided. Rather odd otherwise.

func (a *Agent) loadKeyringFile(keyringFile string) error {
// Avoid passing an encryption key and a keyring file at the same time
if len(a.agentConf.EncryptKey) > 0 {
return fmt.Errorf("Encryption key not allowed while using a keyring")

This comment has been minimized.

@armon

armon Apr 14, 2014
Member

Any reason to not just set the encrypt key as the primary key in the key ring? I imagine that generally -encrypt is provided with the primary key, and the -keyring-file is used only as a temp store while keys are being swapped

This comment has been minimized.

@ryanuber

ryanuber Apr 15, 2014
Author Member

We could definitely do this, but my thinking here was that providing a -encrypt might become brittle if the keys are updated at any point. As an example, if I have a script that starts serf with serf agent -keyring-file /tmp/ring.json -encrypt xxxxxxxxxx, then each time that Serf starts, the primary key will again be set to xxxxxxxxxx, even if a newer key had been set/broadcasted from a member at some other point and saved into the -keyring-file.

The -keyring-file in its current form is intended to work almost exactly like the -tags-file.

goto SEND
}

s.logger.Printf("[DEBUG] serf: Received use-key query")

This comment has been minimized.

@armon

armon Apr 14, 2014
Member

Also INFO level

goto SEND
}

s.logger.Printf("[DEBUG] serf: Received remove-key query")

This comment has been minimized.

@armon

armon Apr 14, 2014
Member

Lets make this INFO as well. Anything that is a modifying operation is good to have logged.

return nil, err
}

qParam := &QueryParam{}

This comment has been minimized.

@armon

armon Apr 14, 2014
Member

Lets use DefaultQueryParams() to guard against future changes.

// RemoveKey handles broadcasting a key to the cluster for removal. Each member
// will receive this event, and if they have the key in their keyring, remove
// it. If any errors are encountered, RemoveKey will collect and relay them.
func (k *keyManager) RemoveKey(key string) (*ModifyKeyResponse, error) {

This comment has been minimized.

@armon

armon Apr 14, 2014
Member

We need to abstract away the commonality in this file. There is a lot of code repetition, since all these methods are effectively doing the same thing.

@@ -1537,3 +1556,30 @@ func (s *Serf) Stats() map[string]string {
}
return stats
}

// WriteKeyringFile will serialize the current keyring and save it to a file.
func (s *Serf) WriteKeyringFile(keyring *memberlist.Keyring) error {

This comment has been minimized.

@armon

armon Apr 14, 2014
Member

No reason to make this public. Also we should be able to get the keyring from our state instead of an argument.

// is enabled, which can be possible in one of 2 cases:
// - Single encryption key passed at agent start (no persistence)
// - Keyring file provided at agent start
func (s *Serf) EncryptionEnabled() bool {

This comment has been minimized.

@armon

armon Apr 14, 2014
Member

Don't think we need this public either

This comment has been minimized.

@ryanuber

ryanuber Apr 15, 2014
Author Member

This is currently public because it is called at agent start in command/agent/command.go in order to display Encrypted: <true/false>.

@armon
Copy link
Member

@armon armon commented Apr 14, 2014

@ryanuber I've made a few notes, but overall this looks super solid! Very excited to have this in for the next release. Thanks for the hard work!

@ryanuber
Copy link
Member Author

@ryanuber ryanuber commented Apr 15, 2014

@armon Thanks for the review! I made some adjustments based on your recommendations and added a few response comments. Let me know what you think and we can adjust further from here.

keyring file helps persist changes to the encryption keyring, allowing the
agent to start and rejoin the cluster successfully later on, even if key
rotations had been initiated by other members in the cluster. NOTE: this
option is not compatible with the `-encrypt` option.

This comment has been minimized.

@armon

armon Apr 16, 2014
Member

We should probably add a section below that documents the format of this file. Just so people know how to setup the keys, that the first key is the primary, etc.

@@ -18,6 +18,10 @@ const (
messageQueryType
messageQueryResponseType
messageConflictResponseType
messageInstallKeyType

This comment has been minimized.

@armon

armon Apr 16, 2014
Member

These 3 don't seem to be used as far as I can tell?

This comment has been minimized.

@ryanuber

ryanuber Apr 16, 2014
Author Member

Ah, missed removing those... fixed in d756172.

}

qParam := k.serf.DefaultQueryParams()
queryResp, err := k.serf.Query(qName, rawKey, qParam)

This comment has been minimized.

@armon

armon Apr 16, 2014
Member

I hesitate to use the rawKey directly as the payload. We should instead use an encoded struct. The reason being if we need to add new fields later it will be much easier if there is already a struct that is using msgpack since it is flexible to new fields. Changing from a raw key to a struct will be much harder.

// them into a KeyResponse. It will update a KeyResponse *in place* and
// therefore has nothing to return.
func (k *keyManager) streamKeyResp(resp *KeyResponse, ch <-chan NodeResponse) {
for r := range ch {

This comment has been minimized.

@armon

armon Apr 16, 2014
Member

I think we also have a change to fast-path an early exit. The default query timeout is VERY conservative. We should pass in the number of nodes, and if we get that many responses, just exit. Otherwise we will wait like 10x longer than needed for the 95% case.

This comment has been minimized.

@ryanuber

ryanuber Apr 16, 2014
Author Member

Awesome idea! Added in db060d5 and f28567f.

}
defer client.Close()

if listKeys {

This comment has been minimized.

@armon

armon Apr 16, 2014
Member

We should probably do a check that only one of -list -install -use or -remove is specified. Really strange if they do something like "-list -install XXX"

This comment has been minimized.

@ryanuber

ryanuber Apr 16, 2014
Author Member

Agreed, added in 17e8fb1

@armon
Copy link
Member

@armon armon commented Apr 16, 2014

@ryanuber This is looking really good. I think it's very close. I think we are also missing some tests for the RPC layer, but I can always add those after a merge if you want.

ryanuber added 25 commits Apr 11, 2014
@ryanuber
Copy link
Member Author

@ryanuber ryanuber commented Apr 18, 2014

@armon I took a swing at the RPC tests and added some command tests. If there is anything you think we are missing just let me know!

@armon
Copy link
Member

@armon armon commented Apr 18, 2014

@ryanuber Looks awesome! Thanks so much for all the hard work!

armon added a commit that referenced this pull request Apr 18, 2014
feature: change encryption keys
@armon armon merged commit 069de11 into hashicorp:master Apr 18, 2014
1 check passed
1 check passed
continuous-integration/travis-ci The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants
You can’t perform that action at this time.