Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nomad agent reload TLS configuration on SIGHUP #3479

Merged
merged 24 commits into from Nov 15, 2017

Conversation

chelseakomlo
Copy link
Contributor

@chelseakomlo chelseakomlo commented Nov 1, 2017

This creates a KeyLoader object as part of an agent's configuration, to cache previously-loaded certificates and provide on-the-fly reloading.

This requires that a server already have been previously configured with TLS enabled.

@slackpad
Copy link
Contributor

slackpad commented Nov 1, 2017

Ooooh - Consul would ❤️ a port of this one once it lands!

"github.com/hashicorp/nomad/helper/flag-helpers"
"github.com/hashicorp/nomad/helper/gated-writer"
flaghelper "github.com/hashicorp/nomad/helper/flag-helpers"
gatedwriter "github.com/hashicorp/nomad/helper/gated-writer"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this named import necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are automatically added from https://godoc.org/golang.org/x/tools/cmd/goimports

@chelseakomlo chelseakomlo changed the title Nomad agent reload on TLS sighup Nomad agent reload TLS configuration on SIGHIP Nov 1, 2017
@chelseakomlo chelseakomlo changed the title Nomad agent reload TLS configuration on SIGHIP Nomad agent reload TLS configuration on SIGHUP Nov 1, 2017
func (k *KeyLoader) LoadKeyPair(certFile, keyFile string) (*tls.Certificate, error) {
cert, err := tls.LoadX509KeyPair(certFile, keyFile)
if err != nil {
return nil, fmt.Errorf("Failed to load cert/key pair: %v", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't k.Certificate be set to nil and the load error captured in a variable so that GetOutGoingCertificate starts to return nil after that? The error case I am thinking of is when you load a cert successfully, use it for a while, then need to reload because it expired and the new cert fails to load. In that case, if I understand this code correctly, its going to return the old cert in GetOutgoingCertificate

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point. This also made me realize that we don't handle downgrading with this either (if that is something we want to handle), it is always assumed that the reload is either upgrading to TLS or changing certificates.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh ya, we need to handle downgrades too.

return err
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this handle the reverse case too? Something like

if a.config.TLSConfig != nil && newConfig.TLSConfig == nil {
      //unload certs and disable tls
}

@chelseakomlo chelseakomlo changed the title Nomad agent reload TLS configuration on SIGHUP WIP: Nomad agent reload TLS configuration on SIGHUP Nov 1, 2017
@dadgar
Copy link
Contributor

dadgar commented Nov 2, 2017

--- FAIL: TestHTTP_VerifyHTTPSClient_AfterConfigReload (0.92s)
	assertions.go:239: 
                          
	Error Trace:	http_test.go:597
		
	Error:      	Not equal: 
		
	            	expected: ""
		
	            	actual: "Get https://127.0.0.1:12169/v1/agent/self: http: server gave HTTP response to HTTPS client"
	assertions.go:239: 
                          
	Error Trace:	http_test.go:598
		
	Error:      	Expected nil, but got: &url.Error{Op:"Get", URL:"https://127.0.0.1:12169/v1/agent/self", Err:(*errors.errorString)(0xc420a882f0)}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x10893f7]

goroutine 769 [running]:
testing.tRunner.func1(0xc420138a50)
	/home/travis/.gimme/versions/go1.9.linux.amd64/src/testing/testing.go:711 +0x2d2
panic(0x126e8a0, 0x1de06b0)
	/home/travis/.gimme/versions/go1.9.linux.amd64/src/runtime/panic.go:491 +0x283
github.com/hashicorp/nomad/command/agent.TestHTTP_VerifyHTTPSClient_AfterConfigReload(0xc420138a50)
	/home/travis/gopath/src/github.com/hashicorp/nomad/command/agent/http_test.go:600 +0x857
testing.tRunner(0xc420138a50, 0x14b08c8)
	/home/travis/.gimme/versions/go1.9.linux.amd64/src/testing/testing.go:746 +0xd0
created by testing.(*T).Run
	/home/travis/.gimme/versions/go1.9.linux.amd64/src/testing/testing.go:789 +0x2de

@@ -719,6 +719,16 @@ func (a *Agent) Stats() map[string]map[string]string {
return stats
}

func (a *Agent) Reload(newConfig *Config) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put a comment on this explaining how it is different than handleReload

@@ -656,6 +656,11 @@ func (c *Command) handleReload(config *Config) *Config {
}
}

err := c.agent.Reload(newConf)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment that you are reloading agent object that can affect both client and server. It is probably worth doing this before the Server reload in case these may affect the new config on the server

func (a *Agent) Reload(newConfig *Config) error {
if a.config != nil && newConfig.TLSConfig != nil {
if err := a.config.SetTLSConfig(newConfig.TLSConfig); err != nil {
return err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if there is an error. Shouldn't this never back to the existing TLS config

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to confirm- I think we decided that if there is an error in loading a new TLS configuration, the existing (old) TLS config will continue to be used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree with that desired behavior but I am not sure that is what is happening. You have reset the key loader by the time there is an error: https://github.com/hashicorp/nomad/pull/3479/files#diff-fdee252bfeea98d87971d63faa2fd863R355

}

k.Certificate = &cert
return k.Certificate, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this have to be locked?

@chelseakomlo chelseakomlo changed the title WIP: Nomad agent reload TLS configuration on SIGHUP Nomad agent reload TLS configuration on SIGHUP Nov 2, 2017
return err

// If the agent is already running with TLS enabled, we need to only reload
// its certificates. In a later PR, we will introduce the ability to reload
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change the in a later PR to:

// TODO(chelsea). so you can find it. Comments should be long lived

func (c *Config) SetTLSConfig(newConfig *config.TLSConfig) error {
c.TLSConfig = newConfig
c.TLSConfig.KeyLoader = &config.KeyLoader{}
if c.TLSConfig == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe SetTLSConfig isn't the right name since I would expect this to be fine

nomad/config.go Outdated
KeyLoader: &config.KeyLoader{},
},
ReplicationBackoff: 30 * time.Second,
SentinelGCInterval: 30 * time.Second,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accidentally deleted these?

func (t *TLSConfig) GetKeyLoader() *KeyLoader {
// If the keyloader has not yet been initialized, do it here
if t.KeyLoader == nil {
t.KeyLoader = &KeyLoader{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cleans up so much! Probably should lock this too

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah this would have to be outside the if statement. It is ensuring that if you have two concurrent calls where the keyloader isn't made they both return the same object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, of course. Will fix that up.

Copy link
Contributor

@dadgar dadgar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lint failure:

command/agent/http_test.go:563:2:warning: ineffectual assignment to req (ineffassign)

Are tests failing because of changes in this PR?

Copy link
Member

@schmichael schmichael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just some comment style nitpicks and locking fun.

// Reload handles configuration changes for the agent. Provides a method that
// is easier to unit test, as this action is invoked via SIGHUP.
func (a *Agent) Reload(newConfig *Config) error {
if a.config != nil && newConfig.TLSConfig != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a.config won't be nil here.


keyLoader := c.TLSConfig.GetKeyLoader()
_, err := keyLoader.LoadKeyPair(newConfig.CertFile, newConfig.KeyFile)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra space

@@ -60,6 +62,9 @@ type Config struct {
// KeyFile is used to provide a TLS key that is used for serving TLS connections.
// Must be provided to serve TLS connections.
KeyFile string

// Utility to dynamically reload TLS configuration.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go's style is to put the name of the var/func as the first word in the description so that searching for terms jumps to their comment first (and some tools even take advantage of this idiom!):

// KeyLoader dynamically reloads TLS configuration

or similar.

lru "github.com/hashicorp/golang-lru"
"github.com/hashicorp/nomad/command/agent/consul"
"github.com/hashicorp/nomad/helper/tlsutil"
"github.com/hashicorp/nomad/nomad/deploymentwatcher"
"github.com/hashicorp/nomad/nomad/state"
"github.com/hashicorp/nomad/nomad/structs"
"github.com/hashicorp/raft"
"github.com/hashicorp/raft-boltdb"
raftboltdb "github.com/hashicorp/raft-boltdb"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh, I wonder why your goimports disagrees with our goimports? Are you using this? https://godoc.org/golang.org/x/tools/cmd/goimports

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using vim-go! Standardizing on an import format would be great.

return k.Certificate, nil
}

func (k *KeyLoader) GetOutgoingCertificate(*tls.ClientHelloInfo) (*tls.Certificate, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment to exported funcs

new.VerifyServerHostname = t.VerifyServerHostname
new.CAFile = t.CAFile
new.CertFile = t.CertFile
new.KeyLoader = t.KeyLoader
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to guard access to KeyLoader with configLock?

// TLSConfig provides TLS related configuration
type TLSConfig struct {
configLock sync.Mutex
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this only guards KeyLoader, so perhaps rename it keyLoaderLock and put the field below keyloader (we try to put lock field declarations near what they lock).

@@ -38,9 +48,66 @@ type TLSConfig struct {
VerifyHTTPSClient bool `mapstructure:"verify_https_client"`
}

type KeyLoader struct {
cacheLock sync.Mutex
Certificate *tls.Certificate
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not export this if it can only be accessed using a private lock.

func (k *KeyLoader) LoadKeyPair(certFile, keyFile string) (*tls.Certificate, error) {
// Allow downgrading
if certFile == "" && keyFile == "" {
k.Certificate = nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs locking

}

func (k *KeyLoader) GetOutgoingCertificate(*tls.ClientHelloInfo) (*tls.Certificate, error) {
return k.Certificate, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lock around certificate access

// current configuration and return an error.
// This only allows reloading the certificate and keyfile- other TLSConfig
// fields are ignored.
func (c *Config) UpdateTLSConfig(newConfig *config.TLSConfig) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this method need locking? If readers are accessing c.TLSConfig.CertFile concurrently with Reload()s then there's a race.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, added.

func (t *TLSConfig) Copy() *TLSConfig {

new := &TLSConfig{}
new.EnableHTTP = t.EnableHTTP
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*new = *t is the pattern for getting all non-pointer values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will also copy lock values, which is a warning in govet. If there is another way that is more idiomatic that doesn't copy lock values, would definitely prefer that.

new.CertFile = t.CertFile

t.keyloaderLock.Lock()
new.KeyLoader = t.KeyLoader
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really want to do this? Or you want a copy of the object?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I'm not sure it matters that much, as it seems Merge is called for two configurations before Keyloader would be initialized, but doing a copy would probably be safer. I'll make that change.

@chelseakomlo chelseakomlo force-pushed the f-client-reload-tls-sighup branch 4 times, most recently from 121e169 to 1712d6a Compare November 14, 2017 16:46
@dadgar dadgar merged commit fa9fd44 into master Nov 15, 2017
@dadgar dadgar deleted the f-client-reload-tls-sighup branch November 15, 2017 01:53
hanshasselberg pushed a commit to hashicorp/consul that referenced this pull request Feb 15, 2019
Implementation is based heavily on similar functionality in Nomad.

hashicorp/nomad#3479
hanshasselberg pushed a commit to hashicorp/consul that referenced this pull request Feb 18, 2019
Implementation is based heavily on similar functionality in Nomad.

hashicorp/nomad#3479
@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 17, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants