Document process for creating a new CT log shard #589

haydentherapper · 2022-05-17T00:42:38Z

Description

haydentherapper · 2022-06-01T20:52:07Z

To rotate a log, the following needs to occur:

1. Spin up a new Trillian instance (log server and signer) and MySQL database
   - The log should use the same signing key unless there was a compromise. If there was a compromise, the log's verification key must first be distributed to clients via TUF
   - The log's prefix will be changed to the current year. Currently, the prefix is test. For the first sharding, it will be 2022
2. Verify the log's health
3. Update Fulcio's configuration to point to the new log
4. Roll out Fulcio. Fulcio will not dual write to both logs. One instance of Fulcio may write to a different log than another instance while it's rolling out, but this is not an issue.
5. Freeze the old log

This is a simpler process than Rekor's, since we don't maintain a virtual index in front of all shards. Our tooling does not access the logs directly; it simply verifies SCTs on signing and verification. As long as the log signing key does not change, SCTs will continue to be verified without issue for all shards.
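
To illustrate why a shared signing key keeps verification working across shards: in RFC 6962, the log ID carried in an SCT is the SHA-256 hash of the log's public key, so shards that share a key also share a log ID. A minimal sketch, assuming the key is available as DER-encoded bytes (the function names and the placeholder key bytes are hypothetical):

```python
import hashlib


def log_id(public_key_der: bytes) -> bytes:
    """RFC 6962 log ID: SHA-256 over the log's DER-encoded public key."""
    return hashlib.sha256(public_key_der).digest()


def find_key_for_sct(sct_log_id: bytes, trusted_keys: list) -> bytes:
    """Pick the trusted key whose log ID matches the one in the SCT."""
    for key in trusted_keys:
        if log_id(key) == sct_log_id:
            return key
    raise LookupError("no trusted log key matches this SCT's log ID")


# Placeholder bytes stand in for a real DER-encoded key (assumption).
shared_key = b"placeholder-der-public-key"
# An SCT from any shard signed with shared_key carries the same log ID,
# so a client that trusts shared_key resolves it the same way for every shard.
sct_from_old_shard = log_id(shared_key)
sct_from_new_shard = log_id(shared_key)
```

A client that already trusts the key needs no update when a new shard is rotated in; only a key change requires distributing a new verification key.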

haydentherapper · 2022-06-01T20:56:55Z

Also should add a prober pinging ct/v1/get-sth for each log shard
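
A minimal sketch of such a prober, assuming each shard is served under its own prefix on the public host as discussed in this thread. The function names and the shard list passed in are hypothetical; the response fields checked are the ones RFC 6962 defines for get-sth.

```python
import json
import urllib.request

BASE_URL = "https://ctfe.sigstore.dev"  # public host named in this thread; https is assumed


def sth_url(base_url: str, prefix: str) -> str:
    """Build the get-sth URL for one shard prefix."""
    return f"{base_url.rstrip('/')}/{prefix.strip('/')}/ct/v1/get-sth"


def probe_shard(base_url: str, prefix: str, timeout: float = 10.0) -> dict:
    """Fetch one shard's signed tree head; raises on HTTP or JSON errors."""
    with urllib.request.urlopen(sth_url(base_url, prefix), timeout=timeout) as resp:
        sth = json.load(resp)
    # Fields defined by RFC 6962 for a get-sth response.
    for field in ("tree_size", "timestamp", "sha256_root_hash", "tree_head_signature"):
        if field not in sth:
            raise ValueError(f"shard {prefix!r}: STH is missing {field!r}")
    return sth


def probe_all(base_url: str, prefixes: list) -> dict:
    """Map each shard prefix to True (healthy) or False (probe failed)."""
    results = {}
    for prefix in prefixes:
        try:
            probe_shard(base_url, prefix)
            results[prefix] = True
        except Exception:
            results[prefix] = False
    return results
```

Calling `probe_all(BASE_URL, ["test", "2022"])` would report one health flag per shard. Frozen shards should still answer get-sth, so they belong in the list too.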

haydentherapper · 2022-06-03T17:30:36Z

Chatted with @k4leung4 about the process for sharding a CT log. To summarize, we need to add support for creating an arbitrary number of CT log instances, where each will have its own Trillian tree and configmap.

One option is to create separate GCP instances for the database that backs Trillian. I opened up an issue to discuss separating Rekor and the CT log's infrastructure first. I'd be fine with having all of the CT logs' trees in a single DB, but I would prefer it be isolated from Rekor.

Assuming any Terraform changes are done outside of the scope of this work, we will focus on updating the Helm configurations. We will need to:

- Add support to run an arbitrary number of CT log instances that share the same Trillian backend (hopefully we can follow an example from the sharding work)
- Handle ingress routing: each log will have its own prefix, e.g. ctfe.sigstore.dev/2022, ctfe.sigstore.dev/2023

For freezing the log, looks like we've already set up infrastructure to do this, which is documented in the sharding playbook and uses the updatetree job.

vaikas · 2022-08-11T10:52:05Z

I'd be happy to help with this effort if help is needed :)

Since this issue is under Fulcio, I'd like to clarify the discussion about having multiple CT Log instances and 'ingress routing'. I'm not clear if we're talking about adding support for a single Fulcio to be able to write to multiple CTLogs based on some criteria (hence the question about ingress routing). Bear with me while I get my understanding of what's left to do :)

Today the CTLog endpoint is a flag like:

--ct-log-url=http://ctlog.ctlog-system.svc/sigstorescaffolding

Question: Are we expecting (as part of this effort or in the future) to be able to write to multiple CTLogs? Just trying to make sure I understand if this requires changes to Fulcio or not.

For the 'ingress routing', is that different from the flag above? As in, any Fulcio instance has 1:1 to a CTLog, or again are there some changes required to Fulcio?

But, from the comment above: #589 (comment)

I think we are saying that "we" as in Sigstore needs to be able to handle operating / writing to multiple CT Log instances. If we have multiple Fulcio instances running at the same time, each of them would still be writing 1:1 to a CT Log instance. Is that correct?

haydentherapper · 2022-08-11T16:00:16Z

@vaikas That would be very appreciated if you would like to help! My knowledge of Helm is lacking :) Happy to sync with you to chat more about this and review any PRs.

> Question: Are we expecting (as part of this effort or in the future) to be able to write to multiple CTLogs? Just trying to make sure I understand if this requires changes to Fulcio or not.

No, this is not in scope. Fulcio only needs to write to one CT log. Maybe we'd consider writing to an external one at a later point, but that should be a simple change, to just make ct-log-url repeated.

> I think we are saying that "we" as in Sigstore needs to be able to handle operating / writing to multiple CT Log instances. If we have multiple Fulcio instances running at the same time, each of them would still be writing 1:1 to a CT Log instance. Is that correct?

This is correct. The purpose of this work is to be able to rotate in fresh shards so we don't indefinitely grow a single CT instance (which will have performance degradations over time). One instance of Fulcio writes to a single CT log instance at a point in time.

> For the 'ingress routing', is that different from the flag above? As in, any Fulcio instance has 1:1 to a CTLog, or again are there some changes required to Fulcio?

When I talked about routing, I was referring to the public URL for accessing the CT log, ctfe.sigstore.dev/<id>, currently ctfe.sigstore.dev/test. Here's how I view it, lemme know if this sounds good:

- There is a CT log, publicly accessible on ctfe.sigstore.dev/<id>, accessible within the cluster at http://ctlog.ctlog-system.svc/<id>
- Fulcio is configured to make requests to http://ctlog.ctlog-system.svc/<id>
- Each year, we will need to create a new CT log instance, accessible at ctfe.sigstore.dev/<other ID> and http://ctlog.ctlog-system.svc/<other ID> (we'll use the current year for the ID)
- We will create the CT log instance manually, and it will be unused until we update the Fulcio configuration
- Once the new CT log instance is up, we will update the Fulcio configuration to point to the new CT log (<other ID>).
- We will not turn down old logs. This is critical: old logs must still be publicly accessible for monitors. (We'll decide the lifetime of the log later, probably 5 years.)
- Once the Fulcio configuration has rolled out, the old log should be put into a read-only mode.
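
The ordering above can be sketched as a small state model. This is only an illustration under stated assumptions: the `Shard` class and `rotate` function are hypothetical, the public URL is assumed to use https, and the flag format is taken from the `--ct-log-url` example earlier in the thread.

```python
from dataclasses import dataclass

PUBLIC_HOST = "ctfe.sigstore.dev"
INTERNAL_HOST = "ctlog.ctlog-system.svc"


@dataclass
class Shard:
    shard_id: str
    frozen: bool = False  # True once the log is read-only

    @property
    def public_url(self) -> str:
        return f"https://{PUBLIC_HOST}/{self.shard_id}"

    @property
    def internal_url(self) -> str:
        return f"http://{INTERNAL_HOST}/{self.shard_id}"


def rotate(shards: dict, active_id: str, new_id: str) -> str:
    """Model one rotation and return the Fulcio flag for the new shard."""
    if new_id in shards:
        raise ValueError(f"shard {new_id!r} already exists")
    # 1. Create the new shard; it sits unused until Fulcio is repointed.
    shards[new_id] = Shard(new_id)
    # 2. Repoint Fulcio at the new shard's in-cluster URL.
    fulcio_flag = f"--ct-log-url={shards[new_id].internal_url}"
    # 3. After the rollout, freeze the old shard. It is never deleted,
    #    so monitors can keep reading it.
    shards[active_id].frozen = True
    return fulcio_flag


shards = {"test": Shard("test")}
flag = rotate(shards, active_id="test", new_id="2022")
```

After the call, `shards` holds both `test` (frozen) and `2022` (writable), and `flag` is the value Fulcio would be redeployed with.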

One other detail - In the same vein as https://github.com/sigstore/public-good-instance/issues/343, we should ideally use a separate database for each CT log instance so we don't have to indefinitely grow the same database.

vaikas · 2022-08-11T17:39:06Z

That all sounds great, thanks! That's how I roughly understood things, but got confused by some comment in some other bug, so just wanted to double-check :)

The one other thing (that's probably discussed elsewhere) is the "reverse" of this. When Fulcio Cert rotates, the new cert must be added to the trusted certs on the CT Log side. I looked quickly, but didn't see an issue for this, is there one for it somewhere?

read-only mode for the logs == 'freeze' of the trillian, or is there a knob for that in CTLog also?

vaikas · 2022-08-11T17:41:44Z

Re: separate database, if we do that, then we'll basically have 1:1 of
Fulcio - CTLog - Trillian - mysql

So all four get operated as a "single entity"?
If so, kneejerk response is that I think it makes things easier to operate.

haydentherapper · 2022-08-11T19:29:48Z

> The one other thing (that's probably discussed elsewhere) is the "reverse" of this. When Fulcio Cert rotates, the new cert must be added to the trusted certs on the CT Log side. I looked quickly, but didn't see an issue for this, is there one for it somewhere?

That is a good question. Right now, the root is automatically fetched when createctconfig is run. https://github.com/haydentherapper/scaffolding/blob/079be7cd54dd47bb0df9ac1af3193f765986f3bc/cmd/ctlog/createctconfig/main.go#L106
Can it be rerun to fetch the latest root? If not, can you create an issue for this?

> read-only mode for the logs == 'freeze' of the trillian, or is there a knob for that in CTLog also?

It should be the same configuration since CT is backed by Trillian.

> So all four get operated as a "single entity"?

Yes, that would be the plan. Right now, the same Trillian (and mysql) instance operates Rekor and CT. There's an open conversation right now if we will take on the work of separating the two before GA, but I think we can if we set up the CT log sharding to use separate Trillian/mysql instances.

vaikas · 2022-08-12T03:45:47Z

Yeah, I remember that code :) That's part of the reason I was asking. In particular, if we have the 1:1 stack that gets operated as a single entity, then we'll have a case where we might need to rotate a Fulcio key. Would that trigger a new stack creation (new ctlog, trillian, etc.), or do we merely update the cert for Fulcio and roll it out? If we do the latter, then we need to add the new Fulcio cert to the CTLog roots PEM. The current code works great but assumes there's only one. I think if we rotate the key and launch new instances, then we have to add the new cert so ctlog will accept from both old and new, and then eventually we'll need to remove the old one once the rollout completes.
So, I think the question really is: If we need to rotate fulcio, will that create a new stack or not?
If not, we'll need to do some work in createctconfig as well as add a cleanup step after the fulcio rollout completes.

haydentherapper · 2022-08-12T14:19:04Z

> So, I think the question really is: If we need to rotate fulcio, will that create a new stack or not?

I would say no. I separate the two - Fulcio cert would be rotated due to expiration for example, which might happen mid year, or in an emergency due to compromise. The log sharding will happen yearly to keep size down (or in the event of a compromise of the CT log key).

I think we need to do the work you specified in the ticket. Maybe allowing for you to manually specify the root certificates in addition to fetching the certificate from Fulcio? Something like:

- Create new Fulcio root
- Manual job to append Fulcio root to trusted CT log roots
- Change configuration for Fulcio, redeploy with new root
- Manual job to re-sync Fulcio root to CT log (removing the old root)

Is that doable? I'm not familiar with scaffolding so there might be a better way.
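
As a rough sketch of what the two manual jobs would do to the CT log's trusted roots, assuming the roots are kept as a single PEM bundle: the function names below are hypothetical and are not part of scaffolding or createctconfig, and the certificate bodies are placeholders.

```python
import re

# Matches one PEM certificate block inside a bundle.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def split_bundle(bundle: str) -> list:
    """Return the individual PEM certificates found in a bundle."""
    return PEM_CERT.findall(bundle)


def append_root(bundle: str, new_root_pem: str) -> str:
    """Job 1: trust the new root alongside the existing ones (idempotent)."""
    certs = split_bundle(bundle)
    for cert in split_bundle(new_root_pem):
        if cert not in certs:
            certs.append(cert)
    return "\n".join(certs) + "\n"


def resync_roots(current_root_pem: str) -> str:
    """Job 2: trust only what Fulcio currently serves, dropping old roots."""
    return "\n".join(split_bundle(current_root_pem)) + "\n"


# Placeholder certificates (not real PEM payloads).
old_root = "-----BEGIN CERTIFICATE-----\nOLD\n-----END CERTIFICATE-----"
new_root = "-----BEGIN CERTIFICATE-----\nNEW\n-----END CERTIFICATE-----"

during_rollout = append_root(old_root + "\n", new_root)  # old and new trusted
after_rollout = resync_roots(new_root)                   # only new trusted
```

Keeping both roots trusted during the rollout means Fulcio instances on either root can still submit to the log.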

haydentherapper · 2022-08-12T17:02:45Z

Something to mention: root rotation will be very infrequent. Fulcio is configured with an intermediate certificate. That might change if we change the signing key for Fulcio, but the intermediate doesn't have to be distributed to the log. We still need a mechanism in place, but it won't be used very often.

haydentherapper · 2022-08-14T22:06:37Z

Haven't dug into this much to see if it's useful, but there are some configuration options for limiting when logs will accept entries: https://github.com/google/certificate-transparency-go/blob/master/trillian/docs/Operation.md#temporal-sharding
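
For context, temporal sharding in the CT personality restricts a log to certificates whose NotAfter falls in a configured window. A minimal sketch of that acceptance check, assuming the window is half-open (start inclusive, limit exclusive), which is my reading of the `not_after_start` and `not_after_limit` options; the helper names are hypothetical:

```python
from datetime import datetime, timezone


def year_window(year: int) -> tuple:
    """Return (start, limit) covering one calendar year in UTC."""
    start = datetime(year, 1, 1, tzinfo=timezone.utc)
    limit = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    return start, limit


def accepts(not_after: datetime, start=None, limit=None) -> bool:
    """True if a cert with this NotAfter falls inside the shard's window.

    A bound of None means that side of the window is unrestricted.
    """
    if start is not None and not_after < start:
        return False
    if limit is not None and not_after >= limit:
        return False
    return True


start_2022, limit_2022 = year_window(2022)
mid_2022 = datetime(2022, 6, 1, tzinfo=timezone.utc)
```

With yearly shards, `accepts(mid_2022, start_2022, limit_2022)` holds, while a certificate expiring exactly at the limit would belong to the next shard.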

vaikas · 2022-08-15T06:59:12Z

Yeah, I was looking at:
https://letsencrypt.org/2019/11/20/how-le-runs-ct-logs.html

Which had links to here:
https://www.venafi.com/blog/how-temporal-sharding-helps-ease-challenge-growing-log-scale
https://www.digicert.com/blog/scaling-certificate-transparency-logs-temporal-sharding

For some prior art as well.

vaikas · 2022-10-11T09:27:11Z

Sounds good to me, I'll tackle next week, getting late here🤣


haydentherapper added the enhancement New feature or request label May 17, 2022

haydentherapper mentioned this issue Jun 1, 2022

Log rotation #43

Closed

haydentherapper self-assigned this Jun 1, 2022

vaikas mentioned this issue Aug 12, 2022

Add ability to handle multiple Fulcio certs in the createctconfig. sigstore/scaffolding#292

Closed

haydentherapper assigned vaikas Aug 15, 2022

vaikas mentioned this issue Sep 2, 2022

Refactor the CTLog config code, add multi-fulcio support. sigstore/scaffolding#334

Merged

haydentherapper added the ga-blocker label Sep 2, 2022

haydentherapper mentioned this issue Sep 2, 2022

Fulcio 1.0 #766

Closed

2 tasks

haydentherapper closed this as completed Oct 24, 2022
