Document process for creating a new CT log shard #589
To rotate a log, the following needs to occur:
This is a simpler process than for Rekor, since we don't maintain a virtual index in front of all shards. Our tooling does not access the logs directly; it only verifies SCTs during signing and verification. As long as the log signing key does not change, SCTs will continue to verify without issue for all shards.
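To illustrate why SCT verification is unaffected by sharding while the signing key stays the same, here is a minimal Python sketch. It relies on the RFC 6962 definition of a log ID as the SHA-256 hash of the log's DER-encoded public key. The key bytes, shard URLs, and the `trusted_keys` lookup are hypothetical placeholders, not Sigstore's actual verifier.

```python
import hashlib


def log_id(public_key_der: bytes) -> bytes:
    """RFC 6962 log ID: SHA-256 over the log's DER-encoded public key."""
    return hashlib.sha256(public_key_der).digest()


# Placeholder bytes standing in for a real DER-encoded public key (assumption).
SHARED_KEY_DER = b"placeholder-der-bytes-for-the-ct-log-key"

# Hypothetical shard URLs; both shards sign with the same key.
shards = {
    "https://ctlog.example/2022": SHARED_KEY_DER,
    "https://ctlog.example/2023": SHARED_KEY_DER,
}

# A verifier only needs a log ID -> key mapping; it never looks at the shard URL.
trusted_keys = {log_id(key): key for key in shards.values()}


def key_for_sct(sct_log_id: bytes):
    """Return the key to verify an SCT with, or None if the log is unknown."""
    return trusted_keys.get(sct_log_id)
```

Because both shards share one key, they collapse to a single entry in `trusted_keys`, so a client that trusted the old shard needs no update to verify SCTs from the new one.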
Also should add a prober pinging |
Chatted with @k4leung4 about the process for sharding a CT log. To summarize, we need to add support for creating an arbitrary number of CT log instances, where each will have its own Trillian tree and configmap. One option is to create separate GCP instances for the database that backs Trillian. I opened up an issue to discuss separating Rekor and the CT log's infrastructure first. I'd be fine with having all of the CT logs' trees in a single DB, but I would prefer it be isolated from Rekor. Assuming any Terraform changes are done outside of the scope of this work, we will focus on updating the Helm configurations. We will need to:
For freezing the log, looks like we've already set up infrastructure to do this, which is documented in the sharding playbook and uses the |
I'd be happy to help with this effort if help is needed :) Since this issue is under Fulcio, I'd like to clarify the discussion about having multiple CT Log instances and 'ingress routing'. I'm not clear if we're talking about adding support for a single Fulcio to be able to write to multiple CTLogs based on some criteria (hence the question about ingress routing). Bear with me while I get my understanding of what's left to do :) Today the CTLog endpoint is a flag like:
Question: Are we expecting (as part of this effort or in the future) to be able to write to multiple CTLogs? Just trying to make sure I understand whether this requires changes to Fulcio or not. For the 'ingress routing', is that different from the flag above? As in, does any Fulcio instance have a 1:1 relationship to a CTLog, or are there some changes required to Fulcio? But, from the comment above: #589 (comment) I think we are saying that "we" as in Sigstore need to be able to handle operating / writing to multiple CT Log instances. If we have multiple Fulcio instances running at the same time, each of them would still be writing 1:1 to a CT Log instance. Is that correct?
@vaikas That would be very appreciated if you would like to help! My knowledge of Helm is lacking :) Happy to sync with you to chat more about this and review any PRs.
No, this is not in scope. Fulcio only needs to write to one CT log. Maybe we'd consider writing to an external one at a later point, but that should be a simple change, to just make
This is correct. The purpose of this work is to be able to rotate in fresh shards so we don't indefinitely grow a single CT instance (which will have performance degradations over time). One instance of Fulcio writes to a single CT log instance at a point in time.
When I talked about routing, I was referring to the public URL for accessing the CT log,
One other detail - In the same vein as https://github.com/sigstore/public-good-instance/issues/343, we should ideally use a separate database for each CT log instance so we don't have to indefinitely grow the same database. |
That all sounds great, thanks! That's how I roughly understood things, but got confused by some comment in some other bug, so just wanted to double-check :) The one other thing (that's probably discussed elsewhere) is the "reverse" of this. When the Fulcio cert rotates, the new cert must be added to the trusted certs on the CT Log side. I looked quickly, but didn't see an issue for this. Is there one for it somewhere? Also, is read-only mode for the logs the same as a 'freeze' of Trillian, or is there a knob for that in CTLog as well?
Re: separate database, if we do that, then we'll basically have 1:1 of So all four get operated as a "single entity"? |
That is a good question. Right now, the root is automatically fetched when createctconfig is run. https://github.com/haydentherapper/scaffolding/blob/079be7cd54dd47bb0df9ac1af3193f765986f3bc/cmd/ctlog/createctconfig/main.go#L106
It should be the same configuration since CT is backed by Trillian.
Yes, that would be the plan. Right now, the same Trillian (and mysql) instance operates Rekor and CT. There's an open conversation right now if we will take on the work of separating the two before GA, but I think we can if we set up the CT log sharding to use separate Trillian/mysql instances. |
Yeah, I remember that code :) That's part of the reason I was asking. In particular, if we have the 1:1 stack that gets operated as a single entity, then we'll have a case where we might need to rotate a Fulcio key. Would that trigger a new stack creation (new ctlog, trillian, etc.), or do we merely upgrade the cert for Fulcio and roll it out? If we do the latter, then we need to add the new Fulcio cert to the CTLog roots PEM. The current code works great but assumes there's only one. I think if we rotate the key and launch new instances, then we have to add the new one so ctlog will accept from both old and new, and then eventually we'll need to remove the old one once the rollout completes.
I would say no. I'd separate the two - the Fulcio cert would be rotated due to expiration, for example, which might happen mid-year, or in an emergency due to compromise. The log sharding will happen yearly to keep size down (or in the event of a compromise of the CT log key). I think we need to do the work you specified in the ticket. Maybe allowing for you to manually specify the root certificates in addition to fetching the certificate from Fulcio? Something like:
- Create new Fulcio root
- Manual job to append Fulcio root to trusted CT log roots
- Change configuration for Fulcio, redeploy with new root
- Manual job to re-sync Fulcio root to CT log (removing the old root)
Is that doable? I'm not familiar with scaffolding so there might be a better way.
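The rotation idea discussed here (trust the new Fulcio root alongside the old one, then drop the old one after the rollout) could be sketched roughly as follows. This is a Python illustration using only the standard library, not the scaffolding code; the helper names `append_root` and `remove_root` and the placeholder certificates are hypothetical.

```python
import re

# Matches one PEM certificate block, including its BEGIN/END markers.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def split_bundle(bundle: str) -> list[str]:
    """Split a PEM bundle into its individual certificate blocks."""
    return [m.group(0).strip() for m in PEM_BLOCK.finditer(bundle)]


def append_root(bundle: str, new_root: str) -> str:
    """Add new_root to the bundle unless it is already trusted."""
    certs = split_bundle(bundle)
    new_root = new_root.strip()
    if new_root not in certs:
        certs.append(new_root)
    return "\n".join(certs) + "\n"


def remove_root(bundle: str, old_root: str) -> str:
    """Drop old_root from the bundle once the rollout has completed."""
    old_root = old_root.strip()
    certs = [c for c in split_bundle(bundle) if c != old_root]
    return "\n".join(certs) + "\n"


# Placeholder PEM bodies (not real certificates), for illustration only.
OLD = "-----BEGIN CERTIFICATE-----\nb2xk\n-----END CERTIFICATE-----"
NEW = "-----BEGIN CERTIFICATE-----\nbmV3\n-----END CERTIFICATE-----"

during_rollout = append_root(OLD, NEW)            # CT log accepts both roots
after_rollout = remove_root(during_rollout, OLD)  # only the new root remains
```

Keeping both roots trusted during the rollout means the CT log accepts certificates from old and new Fulcio instances until the old ones are gone.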
One thing to mention: root rotation will be very infrequent. Fulcio is configured with an intermediate certificate. That might change if we change the signing key for Fulcio, but the intermediate doesn't have to be distributed to the log. We still need a mechanism in place, but it won't be used very often.
Haven't dug into this much to see if it's useful, but there are some configuration options for limiting when logs will accept entries: https://github.com/google/certificate-transparency-go/blob/master/trillian/docs/Operation.md#temporal-sharding
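As a rough illustration of the temporal sharding idea from the linked document, each shard accepts only certificates whose NotAfter falls within that shard's time window. The Python sketch below assumes a half-open window (start inclusive, limit exclusive), which is my reading of that document; the shard names and date ranges are made up, and this is not the CTFE implementation.

```python
from datetime import datetime, timezone
from typing import NamedTuple, Optional


class Shard(NamedTuple):
    name: str
    not_after_start: datetime  # inclusive lower bound on NotAfter
    not_after_limit: datetime  # exclusive upper bound on NotAfter


def utc(year: int) -> datetime:
    """Midnight UTC on January 1 of the given year."""
    return datetime(year, 1, 1, tzinfo=timezone.utc)


# Hypothetical yearly shards.
SHARDS = [
    Shard("2022", utc(2022), utc(2023)),
    Shard("2023", utc(2023), utc(2024)),
]


def shard_for(not_after: datetime) -> Optional[Shard]:
    """Return the shard whose window contains not_after, if any."""
    for shard in SHARDS:
        if shard.not_after_start <= not_after < shard.not_after_limit:
            return shard
    return None
```

A certificate whose NotAfter lies outside every window is rejected by all shards in this sketch, which is how a shard can be closed to new entries without touching its key.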
Yeah, I was looking at: Which had links to here: For some prior art as well. |
Sounds good to me, I'll tackle next week, getting late here🤣
On Fri, Aug 12, 2022, 17:19 Hayden B wrote:
So, I think the question really is: If we need to rotate fulcio, will that create a new stack or not?
I would say no. I separate the two - Fulcio cert would be rotated due to expiration for example, which might happen mid year, or in an emergency due to compromise. The log sharding will happen yearly to keep size down (or in the event of a compromise of the CT log key).
I think we need to do the work you specified in the ticket. Maybe allowing for you to manually specify the root certificates in addition to fetching the certificate from Fulcio? Something like:
- Create new Fulcio root
- Manual job to append Fulcio root to trusted CT log roots
- Change configuration for Fulcio, redeploy with new root
- Manual job to re-sync Fulcio root to CT log (removing the old root)
Is that doable?