-
-
Notifications
You must be signed in to change notification settings - Fork 4.5k
Pluggable TLS Storage #913
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Note, all comments between my initial description at the top and this comment were based on an initial review of the first commit. |
Nice, I'll look at this in more scrutiny later! Thanks for the PR. Really looking good from what I saw earlier.
Right. That's what Regarding InMemoryStorage:
Ah, gotcha. Yeah, let's remove it if it's not used; feel free to keep what you need for tests in the test file. :) Is this level of abstraction sufficient for your needs? Do you foresee maybe even a general-purpose plugin that we could put on the download page for others to use, thanks to this change? 😄 |
Once you peek at the code you'll see where I just made the in memory store as a tool for other testers in the Also, I think this level of abstraction is perfect. I think once I (or someone else) develops a good shared storage plugin (i.e. one that has its own directives and what not) then it definitely would be worth including in the download. If only Go had better shared lib support it wouldn't have to be bundled. |
|
||
// InMemoryStorageCreator is a caddytls.Storage.StorageCreator to create | ||
// InMemoryStorage instances for testing. | ||
func InMemoryStorageCreator(caURL *url.URL) (caddytls.Storage, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This type is only used for testing; can we move it into a test file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I figured this was used for testing like those items in iotest
and httptest
are only used for testing. This can be used by other packages for their testing (same with the test harness), ref this SO answer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, okay. I get you now.
@mholt Before I dig too deep in the code, can you help me understand cache invalidation and auto renewal? Specifically, it seems like there is a Specifically, take the following scenario. Say I am using a multi-server setup (i.e. server A and server B) w/ shared storage to some DB backend or something. When server A and server B both feel the need to renew, what kind of race conditions persist here I wonder. Are overwrites and multiple challenges acceptable to the CA? Should I add some logic in Now I'm thinking that I may need two more abstractions in the storage interface: |
Correct; that doesn't synchronize, just loads from storage into memory.
Each Caddy operates independently, renewing the certificates it is managing separately from other Caddy instances. Caddy would have to be programmed to coordinate with a fleet in order to synchronize renewals and other changes. (I'm working on a tool to do this, independently of Caddy, which will one day plug into Caddy. I do not believe this kind of logic belongs in the caddytls package, as it is highly diverse and complicated on its own...) Edit: If what you need in caddytls is really as simple as adding some sort of mutex/locking to the interface, then that shouldn't be too burdensome and might be more reasonable indeed. |
So, my whole idea of using shared storage to solve HA-multi-server HTTP challenges fails under race condition here (technically keys and certs stored at the same time could get out of order)? Would you mind if I added those two functions? I don't want Caddy to do the coordination either (but I do want my plugin to), but I do need to solve storage race conditions. Otherwise, do you foresee an issue of two servers issuing renewals at the same time? I am open to any solution here, but my whole purpose for pluggable storage was to allow multiple servers to share certs (and the renewal of them). I suppose one solution in my storage provider is to do leader election w/ consul and if it's not the leader, just refuse to store the new cert (without error). Seems hacky. Edit: another option is I could do the consul semaphone on Store* and only unlock it on the last of the 3 calls in Edit: another option is to have a channel to invalidate the in memory cert cache. Still not enough to prevent race conditions, but at least it can update if the storage provider wants it to (e.g. a consul watch) |
Yep, right. Across multiple machines, each Caddy would do their renewals independently which would run into LE rate limits faster. So yeah, you'll need a way to sync their access (including checking their expiration date when scanning for renewals - making sure to update the in-memory cached cert inside the lock).
Go for it.
Good clarification. That's fine. |
Welp, I may have been a bit too eager with this PR...I'll have some changes incoming at some point. |
I probably should have mentioned it, I just hadn't thought much of Caddy sharing its certificates with other processes on different machines. Anyway, looking forward to the revisions whenever they show up. :) |
* Change storage interface to persist all site or user data in one call * Add lock/unlock calls for renewal and cert obtaining
Ok, I did my best to try to alleviate my problems. I reduced site/user loading/storing to one call each. I also added a locking mechanism at strategic parts of the code to prevent duplicate renewals. I hope this hasn't introduced too much complexity. If it has, we can take this PR off the table. I fear the locking parts I added to the somewhat-confusing code paths already taken for obtain/renew cert only make the code look a bit more confusing. I took care to remain compatible with how it works today. In my impl, I'll also have to watch for changes and call Review appreciated and if the benefit of including this layer doesn't outweigh the cost of complexity added, we can kill this PR and I will find other approaches. |
// emailUsername returns the username portion of an email address (part before | ||
// '@') or the original input if it can't find the "@" symbol. | ||
func emailUsername(email string) string { | ||
at := strings.Index(email, "@") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will be cleaner using strings.SplitN(email, "@", 2)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While that might be true, this is taken verbatim from https://github.com/mholt/caddy/blob/fdb6d64f9d2afdeb6f843862da7dd243abb2399b/caddytls/storage.go#L119 and I wasn't wanting to change any of that logic during this refactoring. I think to clean up that should be a separate effort.
FYI, take my comments as suggestions. This is looking like it's going in the right direction. 😄 |
Gladly. Thank you for looking! :-) |
I want this to go in beta 3. @cretz Is this a change you are willing to maintain (at least in the foreseeable future) if problems or questions arise? I'll make you a collaborator so you have push rights when I merge this PR. |
Sure, I can maintain for the near future, though I can't promise I'll be able to for a really long time (though I hope it's clear enough for anyone to understand/contribute). I also plan on providing a consul plugin right away. I may request help w/ integration testing the multi-server-HTTP-challenge setup w/ the CA. |
Thanks @cretz for all your work! Looking forward to what this makes possible. |
// Note, the renewal inside here may not actually occur and no error will be returned | ||
// due to renewal lock (i.e. because a renewal is already happening). This lack of | ||
// error is by intention to force cache invalidation as though it has renewed. | ||
err := cert.Config.RenewCert(allowPrompts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cretz Just a thought: what if another renewal process updates the certificate between the beginning of this for loop (above on line 67) and actually calling RenewCert? Even though we have a read lock on the certCache, something else could have renewed it by now, and finished, which would cause this certificate to be renewed twice.
Could you double-check my thinking here and see if that's a plausible 'race' condition? I wonder if the lock needs to be put over this whole function. (It's still early here; I could be wrong...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this. It appears RenewCert re-obtains the site data from storage but yeah, it doesn't check if it still needs to be renewed. So yes, in theory it could double renew here. With how the RenewCertificate function is set up, I am not sure of the best place to place the lock. Open to suggestions. If I place the lock in the for loop (remember it is a domain specific lock), you still suffer the problem where it might need to reload from cert cache. Instead, maybe we just do a "continue" on renewal lock? Or just make sure the re-fetched cert is the one we are expecting to renew inside of RenewCert?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just looking at some options... Can you move the logic that checks the expiration date into that method, where the expiration check happens inside the lock, maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what I'm thinking. I'll send across another PR at some point.
return err | ||
} else if !lockObtained { | ||
log.Printf("[INFO] Certificate for %v is already being renewed elsewhere", name) | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cretz I've been looking at this for a new class of Caddy plugins for TLS storage. In this branch, some other instance is already renewing the certificate, but here we return nil instead of waiting until the renew finishes and then loading it below and continuing along. When does the renewed certificate get loaded? What if the lock was blocking instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mholt - I simply added this for good measure (like if a plugin used it or something) but best I can see, this method is never called anyways. See Config::obtainCertName
and Config::renewCertName
for where renewal actually appears to happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, thanks @cretz. Was there a specific reason that LockRegister() was not put into ACMEClient.Obtain()
but was put into ACMEClient.Renew()
? (Note: I'm not criticizing, just understanding since I think we're going to make some changes.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mholt - Tbh, I don't remember exactly why. If I had to guess, it is because I wanted to lock the storage more than the renewal/registration process. Obtain
only uses a single (atomic) StoreSite
whereas Renew
does a LoadSite
and then a StoreSite
later and I need to make that combination atomic.
This addresses issue #906. I may have gone a bit overboard w/ the test harness :-). Quick notes:
storagetest
) in the same vein asiotest
orhttptest
since I knew I would end up creating one anyways when I get around to writing my own implstoragetest
package for use by others during testing if desiredhttpserver.GetConfig(c).TLS.StorageCreator
(or course w/TLS
field null check) in a similar way to how middleware is added. This allows plugins to register themselves as normal, create directives as normal, etc.As a reminder the whole point of wanting to do this, at least for me, is so I can finally have multiple load balanced end-to-end encrypted web servers that respond to HTTP challenges. Basically we'll be able to update the auto-https faq at https://github.com/caddyserver/caddyserver.com/blob/f27a911a9276cdb6d15ce762504b22db8d082721/public/docs/automatic-https.html#L71 to include the fact that if you use an alternative shared storage plugin, your stuff will work.
Another thing I want to ask: I didn't dig too deeply, but I trust that Caddy obviously caches the cert loaded from storage. So remote storage wouldn't constantly be hit, right?