
Using CCADB as the source of truth #37

Closed
cpu opened this issue Aug 8, 2023 · 6 comments · Fixed by #41

Comments

@cpu (Member) commented Aug 8, 2023

I think it makes sense to transition the webpki-roots generation process to use CCADB as its source of truth. There are a few advantages in my mind:

  1. It removes a middleman: the mkcert.org service was called out as a potentially problematic dependency in #25 ("Question: what assurance do you offer the Root CA store is 'trustworthy'").
  2. It generalizes beyond Mozilla's specific requirements, since other root programs are also participating in CCADB.
  3. It offers richer metadata, e.g. distrust-after dates, applied name constraints, etc.

With this goal in mind I've been (slowly, on the side) creating a set of CCADB-related tools over in cpu/ccadb-utils:

  • The ccadb-csv crate is a very thin abstraction around the raw CSV reports available from CCADB. It does no parsing beyond destructuring CSV records and presents the content "as-is", in string form (see the sketch after this list).
  • The ccadb-csv-fetch crate is a small binary for fetching the CSV reports to disk. It bakes in exactly one trust anchor, to minimize the trust surface required for acquiring metadata from CCADB.
  • The ccadb-webpki-roots crate converts the CCADB Mozilla included roots report into a .rs file for use inside webpki-roots.
  • Lastly, and not of use to this repo, there's a ccadb-crl-fetch crate for downloading disclosed CRLs, mostly as a test corpus.
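To illustrate the "as-is" shape, something like the following (hypothetical names built on the csv crate, not the actual ccadb-csv API):

```rust
// Minimal sketch of the "as-is" idea: surface each CCADB CSV record as
// string key/value pairs, with no parsing beyond destructuring the CSV.
// Hypothetical names; not the actual ccadb-csv API.
use std::collections::HashMap;
use std::io::Read;

fn read_report(source: impl Read) -> csv::Result<Vec<HashMap<String, String>>> {
    let mut reader = csv::Reader::from_reader(source);
    let headers = reader.headers()?.clone();
    reader
        .records()
        .map(|record| {
            let record = record?;
            // Pair each column header with the raw string value for that row.
            Ok(headers
                .iter()
                .zip(record.iter())
                .map(|(key, value)| (key.to_owned(), value.to_owned()))
                .collect())
        })
        .collect()
}
```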

Some open questions I'd like to discuss:

  1. Do folks agree we should switch to CCADB as the data source for this crate?
  2. What's the best way to integrate my work if folks agree to the above?
    1. We could use ccadb-utils as part of a build process in this repo.
    2. We could move the ccadb-utils repo into the Rustls organization and use it as part of the build process in this repo.
    3. We could lift something smaller out of the ccadb-utils repo into this repo.
    4. Some other option.

I think I lightly prefer keeping the tooling in a separate repository (and have no strong feelings about whether it lives in the rustls org or not) and using it as a dev-dependency in this repo. If we go this route I would love reviews on the code in ccadb-utils - it was written solo and could probably use polish/love from more experienced rustaceans.

My thinking is that there's likely to be continued value in a richer way to work with CCADB, and that we can consolidate that work in a separate repo rather than maintaining the webpki-roots-specific parts here and the rest elsewhere. As one example of a future use-case, there's in-progress work in the TLS working group to use CCADB data for certificate compression (draft-jackson-tls-cert-abridge).

@cpu (Member, Author) commented Aug 8, 2023

> It offers richer metadata, e.g. distrust-after dates, applied name constraints, etc.

I think this will help with situations like #39, where we had stale, unused name constraint constants and a manually encoded constraint that was incorrect. It would be better to have code generate the name constraint DER from the CCADB report (though there is some work to do to verify the upstream format matches my expectations, particularly for how multiple permitted subtrees would be expressed).

Similarly, we won't need to maintain the excluded CA list by hand: the CCADB report includes the current trust status and distrust dates.
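For illustration, a sketch of the kind of filtering that enables (the column names here are my assumptions about the report headers and would need to be confirmed against the actual CSV):

```rust
// Sketch: decide whether a root from the CCADB report should be emitted,
// based on its trust bits and TLS distrust date. Column names are assumed,
// not confirmed against the real report headers.
use serde::Deserialize;

#[derive(Deserialize)]
struct ReportRow {
    #[serde(rename = "Trust Bits")]
    trust_bits: String,
    #[serde(rename = "Distrust for TLS After Date")]
    distrust_for_tls_after: String,
}

fn include_for_tls(row: &ReportRow) -> bool {
    // Only roots trusted for server authentication ("Websites").
    row.trust_bits.split(';').any(|bit| bit.trim() == "Websites")
        // A non-empty distrust date means the root is being phased out for TLS;
        // a real generator would compare it against the current date (or emit
        // it as metadata) rather than simply dropping the root.
        && row.distrust_for_tls_after.trim().is_empty()
}
```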

@ctz (Member) commented Aug 8, 2023

> Do folks agree we should switch to CCADB as the data source for this crate?

+1 from me

> What's the best way to integrate my work if folks agree to the above?

I think having ccadb-utils as a dev-dependency (similar to how we currently have reqwest etc.) works for me.

It makes sense to me that the -utils crate could live in the rustls org and be published on crates.io in the usual way?

@djc (Member) commented Aug 8, 2023

Switching to CCADB seems like a good idea.

Personally I'd prefer the more compact approach of having a fairly self-contained codegen.rs that contains all the code (building on well-known dependencies, whether that's reqwest or ureq), without abstraction layers that are unnecessary for the job that needs to be done here. With multiple layers of crates and error handling more elaborate than this integration test needs, it's harder to see which URL the data is fetched from and which columns of the CSV file are actually used as input, so I would prefer something that more closely mimics the current setup over relying on the crates you've written so far.
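Something along these lines, roughly (the struct, column names, and helper below are illustrative assumptions, not a concrete proposal):

```rust
// Rough sketch of the self-contained shape: fetch the report with ureq and
// deserialize only the columns the generator needs. The column names are
// assumptions for illustration, not the eventual implementation.
use serde::Deserialize;

#[derive(Deserialize)]
struct Root {
    #[serde(rename = "Common Name or Certificate Name")]
    name: String,
    #[serde(rename = "PEM Info")]
    pem: String,
    #[serde(rename = "Mozilla Applied Constraints")]
    constraints: String,
}

fn fetch_included_roots(url: &str) -> Result<Vec<Root>, Box<dyn std::error::Error>> {
    // One HTTP GET of the CSV report, then a single pass over the rows.
    let body = ureq::get(url).call()?.into_reader();
    let mut reader = csv::Reader::from_reader(body);
    let roots = reader
        .deserialize::<Root>()
        .collect::<Result<Vec<_>, _>>()?;
    Ok(roots)
}
```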

BTW, do the CCADB records somehow carry the imposed name constraints, or do we still have to bring those in manually?

@cpu (Member, Author) commented Aug 8, 2023

> Personally I'd prefer the more compact approach of having a fairly self-contained codegen.rs that contains all the code (building on well-known dependencies, whether that's reqwest or ureq), without abstraction layers that are unnecessary for the job that needs to be done here.

That's reasonable. I think it would be better from a code review perspective to build a purpose-built generator in this repo rather than to post-hoc review the more substantial amount of code spread across the ccadb-utils crates.

> BTW, do the CCADB records somehow carry the imposed name constraints, or do we still have to bring those in manually?

The IncludedCACertificateReportPEMCSV report that I propose we use as our data source for the webpki roots does include a "Mozilla Applied Constraints" column.

Presently the only record that has a value in that column is the Government of Turkey, Kamu Sertifikasyon Merkezi (Kamu SM) root from #39. It's listed with the "Mozilla Applied Constraints" value of "*.tr".

Mechanically, I've worked out how to convert that value to a DER-encoded name constraint with a permitted subtree base dNSName of ".tr", but I have two open questions that should be brought to the CCADB operators before we rely on that process:

  1. How would multiple permitted subtrees be expressed? In other report formats that need plural fields (e.g. for CRL shards), they're listed as a JSON-encoded array of strings inside the CSV column.
  2. Would excluded subtrees ever be expressed? Would that be a different column?

@cpu (Member, Author) commented Aug 8, 2023

> I have two open questions that should be brought to the CCADB operators before we rely on that process:

https://groups.google.com/a/ccadb.org/g/public/c/TlDivISPVT4/m/jbWGuM4YAgAJ

@cpu (Member, Author) commented Aug 9, 2023

> Personally I'd prefer the more compact approach of having a fairly self-contained codegen.rs that contains all the code (building on well-known dependencies, whether that's reqwest or ureq), without abstraction layers that are unnecessary for the job that needs to be done here.

> That's reasonable. I think it would be better from a code review perspective to build a purpose-built generator in this repo rather than to post-hoc review the more substantial amount of code spread across the ccadb-utils crates.

PTAL: #41

> I have two open questions that should be brought to the CCADB operators before we rely on that process:

I went ahead and added the code I've written, which uses yasna to build the DER encoding of the name constraints from the stringified Mozilla Imposed Name Constraints field.

I think it's safe to use in its current state: the *.tr constraint is the only value present, and I've confirmed the generator code produces the correct DER encoding we landed in #39.

If the upstream CSV changes to add or update Mozilla Imposed Name Constraints, the codegen integration test will begin to fail based on the produced diff, and we can investigate whether the DER generation code needs to be adjusted. We'll always have a human in the loop for updates to the authoritative data.
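For a rough idea of the shape of that generation (an illustrative sketch of building the NameConstraints DER with yasna, not the exact code in #41):

```rust
// Sketch of building a NameConstraints value (RFC 5280) with yasna, containing
// permitted subtrees whose bases are dNSNames. Illustration only; not the
// code that landed in #41.
fn name_constraints_der(permitted_dns_names: &[&str]) -> Vec<u8> {
    yasna::construct_der(|writer| {
        // NameConstraints ::= SEQUENCE { permittedSubtrees [0] GeneralSubtrees OPTIONAL, ... }
        writer.write_sequence(|writer| {
            writer
                .next()
                .write_tagged_implicit(yasna::Tag::context(0), |writer| {
                    // GeneralSubtrees ::= SEQUENCE SIZE (1..MAX) OF GeneralSubtree
                    writer.write_sequence(|writer| {
                        for name in permitted_dns_names {
                            // GeneralSubtree ::= SEQUENCE { base GeneralName, ... }
                            writer.next().write_sequence(|writer| {
                                // dNSName is GeneralName choice [2], an IA5String.
                                writer.next().write_tagged_implicit(
                                    yasna::Tag::context(2),
                                    |writer| writer.write_ia5_string(name),
                                );
                            });
                        }
                    });
                });
        });
    })
}

// e.g. name_constraints_der(&[".tr"]) for the Kamu SM "*.tr" constraint.
```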

cpu closed this as completed in #41 on Aug 10, 2023.