Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TLS renew #807

Merged
merged 10 commits into from
Jul 2, 2019
Merged

TLS renew #807

merged 10 commits into from
Jul 2, 2019

Conversation

Ulexus
Copy link
Contributor

@Ulexus Ulexus commented Jul 1, 2019

Implement TLS certificate renewal for OSD.

This is definitely a WIP and needs architectural review. As discussed, this is a temporary option until we can move to an ephemeral-by-default system.

Ticker in cert generation poll loop was never stopped.

Signed-off-by: Seán C McCord <ulexus@gmail.com>
- Implements part of #67
- Create option functions for gRPC TLS config
- Create CertificateProvider interface
- Create (legacy) userdata CertificateProvider
- Create cert renewing, file-caching CertificateProvider
- Create tls.NewConfigWithOpts() to specify custom gRPC TLS options
- Adapt existing tls.NewConfig to use options
- Update osd to use dynamic CertificateProvider
- Add a constant for location of Talos node certificate

Signed-off-by: Seán C McCord <ulexus@gmail.com>
Using TODO context for now.  Signal-bound root context?

Signed-off-by: Seán C McCord <ulexus@gmail.com>
Signed-off-by: Seán C McCord <ulexus@gmail.com>
@codecov
Copy link

codecov bot commented Jul 1, 2019

Codecov Report

Merging #807 into master will decrease coverage by 0.67%.
The diff coverage is 0%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #807      +/-   ##
==========================================
- Coverage   31.46%   30.79%   -0.68%     
==========================================
  Files          84       85       +1     
  Lines        4846     4946     +100     
==========================================
- Hits         1525     1523       -2     
- Misses       3182     3285     +103     
+ Partials      139      138       -1
Impacted Files Coverage Δ
internal/pkg/grpc/tls/cert.go 0% <0%> (ø)
internal/pkg/grpc/tls/tls.go 0% <0%> (ø) ⬆️
...pp/init/pkg/system/runner/containerd/containerd.go 78.21% <0%> (-1.99%) ⬇️
internal/pkg/chunker/file/file.go 65.21% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bfe3f24...8af82c9. Read the comment docs.

@andrewrynhard
Copy link
Member

Closes #67.

@Ulexus
Copy link
Contributor Author

Ulexus commented Jul 1, 2019

Well, the cert rotation definitely works. It's now rotating as fast as it possibly can, so every web request is seeing an updated cert. :(

The current TLS certs do not store the pre-parsed form in cert.Leaf;
thus, we have to parse the certificate first, to get the expiry.  This
also limits the maximum TLS certificate renewal period to a global
constant, presently set to one hour.

Signed-off-by: Seán C McCord <ulexus@gmail.com>
@Ulexus Ulexus marked this pull request as ready for review July 2, 2019 02:54
Copy link
Member

@andrewrynhard andrewrynhard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still digesting it a bit (baby in arms), but overall looking good.

"github.com/talos-systems/talos/pkg/userdata"
)

/*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these commented out intentionally?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... it was a draft thought, and I didn't want to get rid of the code entirely. I should probably just delete it.

return nil
}

/*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing; old code that I wasn't quite ready to throw away.

if ok := certPool.AppendCertsFromPEM(data.CA.Crt); !ok {
return nil, fmt.Errorf("failed to append client certs")
// ConfigOptionFunc describes a configuration option function for the TLS config
type ConfigOptionFunc func(*tls.Config) error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually a little silly, right now, since it returns a standard public object which can then be directly modified. However, I suspect that may change in the future, or that we may want to add more project-specific options.

Signed-off-by: Seán C McCord <ulexus@gmail.com>
Copy link
Member

@andrewrynhard andrewrynhard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Gives a great start to iterate on over the v0.2 cycle.

NodeCertFile = "/var/talos-node.crt"

// NodeCertRenewalInterval is the default interval at which Talos Node Certifications should be renewed
NodeCertRenewalInterval = time.Minute
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is defaulting to a minute intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. That's a bit aggressive!

if len(c.Certificate) > 0 {
cert, err := x509.ParseCertificate(c.Certificate[0])
if err == nil {
nextRenewal = time.Until(cert.NotAfter) / 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh I think I understand. We default to a minute, but then transition to half the expiry date.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but even then, 1 minute is too aggressive. That was just for testing.

@andrewrynhard
Copy link
Member

andrewrynhard commented Jul 2, 2019

@Ulexus It looks like we have a couple place we need to set the NotAfter: https://github.com/talos-systems/talos/search?q=NotAfter&unscoped_q=NotAfter

(not osctl though)

@Ulexus
Copy link
Contributor Author

Ulexus commented Jul 2, 2019

Yeah, I wasn't planning to get into the certificate production (and hence expiration) on this run; all I'm doing is checking when it expires, and setting the renewal to be before then.

@andrewrynhard
Copy link
Member

Yeah, I wasn't planning to get into the certificate production (and hence expiration) on this run; all I'm doing is checking when it expires, and setting the renewal to be before then.

Works for me.

@andrewrynhard andrewrynhard merged commit 91d5e7e into siderolabs:master Jul 2, 2019
@Ulexus Ulexus deleted the tls-renew branch July 17, 2019 10:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants