J.C. Jones edited this page Jan 9, 2020 · 7 revisions

See the Technology Overview for details on the tools.


How much does CRLite compress data?

CRLite promises substantial compression of the dataset. The binary form of all unexpired certificate serial numbers occupies about 16 GB of memory in Redis, and the hexadecimal form of all enrolled and unexpired certificate serial numbers occupies about 6.7 GB on disk, while the resulting binary Bloom filter cascade is approximately 1.3 MB.
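As a rough illustration of the scale involved, the sizes quoted above work out to the ratios below. This sketch assumes decimal units (GB = 10^9 bytes), and the hex-to-binary figure is only an upper bound derived from the hex encoding, not a measurement.

```python
# Back-of-the-envelope compression ratios for the sizes quoted above.
# Assumption: GB and MB are decimal units (10**9 and 10**6 bytes).
GB = 10**9
MB = 10**6

redis_binary_serials = 16 * GB   # binary serials of all unexpired certs (Redis)
hex_serials_on_disk = 6.7 * GB   # hex serials of enrolled, unexpired certs
filter_size = 1.3 * MB           # the resulting filter


def ratio(original: float, compressed: float) -> float:
    """How many times smaller the compressed form is."""
    return original / compressed


# Hex spends two characters per byte, so the raw binary serials behind the
# 6.7 GB figure occupy at most half of it. This is only an upper bound, not
# the measured binary size the TODO asks for.
binary_upper_bound = hex_serials_on_disk / 2

print(f"Redis binary vs. filter: ~{ratio(redis_binary_serials, filter_size):,.0f}x")
print(f"Hex on disk vs. filter:  ~{ratio(hex_serials_on_disk, filter_size):,.0f}x")
print(f"Binary form of the hex data: at most {binary_upper_bound / GB:.2f} GB")
```

This prints ratios of about 12,308x and 5,154x, and an upper bound of 3.35 GB.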

TODO: get a binary form of the 6.7 GB number

Why is CRLite able to compress so much data?

Bloom filters are probabilistic data structures whose error rate comes from hash collisions: they can report false positives, but never false negatives. However, if you know the whole set of data that might ever be tested against the filter, you can compute every false positive and build another layer that resolves them. You then repeat this until no false positives remain. In practice, this takes 7 to 10 layers, and the result is still substantially compressed.
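The layering idea can be sketched as a small filter cascade. This is a minimal illustration, not the crlite implementation: it derives bit positions from SHA-256 only to stay dependency-free (CRLite itself uses MurmurHash3), and it uses one fixed error rate for every layer, which is an arbitrary choice for the sketch.

```python
import hashlib
import math


class BloomFilter:
    """Plain Bloom filter. Bit positions are derived from SHA-256 purely to
    keep this sketch dependency-free; CRLite itself uses MurmurHash3."""

    def __init__(self, capacity: int, error_rate: float, layer: int):
        n = max(capacity, 1)
        # Textbook sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.size = max(8, math.ceil(-n * math.log(error_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.size / n * math.log(2)))
        self.layer = layer  # mixed into every hash so each layer is independent
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: bytes):
        for i in range(self.num_hashes):
            prefix = self.layer.to_bytes(2, "big") + i.to_bytes(2, "big")
            digest = hashlib.sha256(prefix + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def build_cascade(revoked, valid, error_rate=0.5):
    """Layer 0 stores the revoked items. Every later layer stores the items
    from the *other* set that the previous layer wrongly reported as present;
    construction stops once a layer produces no false positives."""
    layers = []
    include, exclude = set(revoked), set(valid)
    while include:
        bf = BloomFilter(len(include), error_rate, layer=len(layers))
        for item in include:
            bf.add(item)
        layers.append(bf)
        false_positives = {item for item in exclude if item in bf}
        include, exclude = false_positives, include
    return layers


def is_revoked(layers, item: bytes) -> bool:
    """Walk the layers until one says 'absent'. Only items drawn from
    revoked | valid are guaranteed a correct answer."""
    for depth, bf in enumerate(layers):
        if item not in bf:
            # Absent from an even (revoked) layer means valid;
            # absent from an odd (false-positive) layer means revoked.
            return depth % 2 == 1
    # Present in every layer: the last layer's set decides.
    return len(layers) % 2 == 1
```

Note that correctness only holds for items that were in `revoked` or `valid` at build time, which is why the universe of possible certificates has to be known. A production cascade would tune the error rate per layer, so its size and layer count would differ from this sketch.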

Bloom filters have a false-positive rate; how can CRLite be relied upon?

The key innovation for CRLite is that Certificate Transparency (CT) data can be used as a stand-in for "all the certificates in the Web PKI". It's reasonably easy to tell if a certificate is in Certificate Transparency: Was it delivered with a Signed Certificate Timestamp (SCT) from a CT log? Similarly, it's reasonably easy to tell that a certificate was known to a CT log at the time that the CRLite filter was constructed: Was the SCT at least one Maximum Merge Delay older than the CRLite filter?

The remaining question is whether the issuer is enrolled in the CRLite filter set, which is indicated by a flag in the Firefox Intermediate Preloading data.
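The two conditions above can be combined into a single eligibility check. This is a hypothetical helper, not Firefox's code: it assumes a 24-hour Maximum Merge Delay (a common value for production logs; real code would use each log's own MMD), and it assumes that a certificate qualifies if its oldest SCT is old enough.

```python
from datetime import datetime, timedelta
from typing import Sequence

# Assumption: a 24-hour Maximum Merge Delay. Real code would look up the
# MMD of the specific log that issued each SCT.
DEFAULT_MMD = timedelta(hours=24)


def crlite_applies(sct_timestamps: Sequence[datetime],
                   filter_time: datetime,
                   issuer_enrolled: bool,
                   mmd: timedelta = DEFAULT_MMD) -> bool:
    """Return True if a filter built at `filter_time` can be trusted to
    cover this certificate (hypothetical helper)."""
    if not issuer_enrolled:
        return False  # issuer not flagged in the Intermediate Preloading data
    if not sct_timestamps:
        return False  # no SCT, so no evidence the certificate is in CT
    # The certificate must have been known to a log for at least one MMD
    # before the filter was built. Requiring only the oldest SCT to qualify
    # is an assumption about certificates that carry several SCTs.
    return filter_time - min(sct_timestamps) >= mmd
```

When this returns False, CRLite cannot vouch for the certificate either way, and a different revocation mechanism has to be used.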

Why doesn't CRLite use SHA-2/SHA-3/etc.?

We’re using MurmurHash3 because it’s fast and there’s no currently known need for a cryptographically secure hash function. Although Murmur is not designed to be cryptographically secure, as a mitigating factor the input data for Murmur includes a SHA-256 hash, which feeds pseudo-random information into its calculation.
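To make the idea concrete, here is a pure-Python MurmurHash3 (x86, 32-bit variant) paired with a hypothetical key layout: the SHA-256 digest of some issuer identifier followed by the raw serial number, hashed with the layer number as the seed. The key layout and the layer-as-seed choice are assumptions for illustration; this page does not specify CRLite's exact hash inputs.

```python
import hashlib

MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3, x86 32-bit variant."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h1 = seed & MASK32
    body_end = len(data) & ~3
    for i in range(0, body_end, 4):  # 4-byte little-endian blocks
        k1 = int.from_bytes(data[i:i + 4], "little")
        k1 = (_rotl32((k1 * c1) & MASK32, 15) * c2) & MASK32
        h1 = _rotl32(h1 ^ k1, 13)
        h1 = (h1 * 5 + 0xE6546B64) & MASK32
    tail = data[body_end:]  # remaining 0-3 bytes
    k1 = 0
    if len(tail) >= 3:
        k1 ^= tail[2] << 16
    if len(tail) >= 2:
        k1 ^= tail[1] << 8
    if tail:
        k1 ^= tail[0]
        k1 = (_rotl32((k1 * c1) & MASK32, 15) * c2) & MASK32
        h1 ^= k1
    # Finalization mix so every input bit affects every output bit.
    h1 ^= len(data) & MASK32
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85EBCA6B) & MASK32
    h1 ^= h1 >> 13
    h1 = (h1 * 0xC2B2AE35) & MASK32
    h1 ^= h1 >> 16
    return h1


def layer_hash(issuer_id: bytes, serial: bytes, layer: int) -> int:
    """Hypothetical key: SHA-256(issuer_id) || serial, with the layer number
    as the Murmur seed so that each layer hashes differently."""
    key = hashlib.sha256(issuer_id).digest() + serial
    return murmur3_32(key, seed=layer)
```

The SHA-256 prefix is the mitigating factor described above: Murmur never sees raw, attacker-chosen bytes alone.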

Firefox clients need to compute only a few hashes to check CRLite (one per layer), so if we need to move to a more secure hash function in the future, most of the additional complexity will fall on the infrastructure side, which can scale up more easily.

The obvious threat model against the input data involves manipulating hashes by manipulating certificate serial numbers. However, the CABForum Baseline Requirements place certain requirements on serial numbers (for example, they must contain at least 64 bits of output from a CSPRNG), which makes them a difficult attack vector. Nevertheless, this is an area of active research.

How large are the delta updates for CRLite?

Currently we're using bsdiff4 to produce deltas, but the number of layers in the output filter varies from run to run, which makes the deltas unreasonably large. The data format will have to change somewhat to avoid deleting and re-creating layers unnecessarily; this is among the next major avenues of improvement.

How do you pick what CAs are included in CRLite?

While we initially thought we would hand-pick some issuing CAs, at this point every CA that has fresh Certificate Revocation Lists (CRLs) encoded into its issued certificates is included in CRLite. Fresh means that the CRL's signature is valid and that it is not past its NextUpdate time. Analysis of why issuers become unenrolled from CRLite is ongoing, but the usual culprit in the logs is that the next CRL simply can't be downloaded by the CRLite aggregate-crls tooling, which has limited retry and resume functionality. As that tooling becomes more robust, and as issuing CAs' CRL hosting infrastructure does too, we expect this churn to drop substantially.
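The enrollment rule can be sketched as a freshness filter over CRL fetch results. `CrlFetch` is a hypothetical record type, signature verification is assumed to have happened elsewhere, and requiring every CRL of an issuer to be fresh is an assumption chosen to mirror the "failed download leads to unenrollment" behavior described above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional, Set


@dataclass
class CrlFetch:
    """Outcome of fetching one CRL (hypothetical, not crlite's own type)."""
    downloaded: bool
    signature_valid: bool = False
    next_update: Optional[datetime] = None


def is_fresh(crl: CrlFetch, now: datetime) -> bool:
    """Fresh: downloaded, signature verified, and not past NextUpdate."""
    return (crl.downloaded
            and crl.signature_valid
            and crl.next_update is not None
            and now <= crl.next_update)


def enrolled_issuers(fetches: Dict[str, List[CrlFetch]], now: datetime) -> Set[str]:
    """Enroll an issuer only if every CRL its certificates point to is
    fresh; one failed download is enough to leave it out."""
    return {issuer for issuer, crls in fetches.items()
            if crls and all(is_fresh(c, now) for c in crls)}
```

Under this rule, a single transient download failure unenrolls an issuer for that run, which is why better retry and resume support should reduce churn.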

What happens if a certificate is too new?

Firefox will use OCSP (stapled or actively queried) if the certificate's Signed Certificate Timestamps are too new for the current filter.

What happens if an issuer is unknown?

CRLite won't be used. If the issuer is truly unknown, Firefox will show an unknown issuer warning as it always has; nothing changes there. If the issuer is not in the Mozilla Root Program, it won't be eligible for CRLite.

How can you know if a given issuer has its data in CRLite?

CRLite will only run on issuers that are annotated as enrolled in CRLite in Firefox's Intermediate Preloading data. The list can be examined directly using your favorite JSON tooling at this URL:
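Once the record listing is downloaded, picking out the enrolled issuers is a simple filter. This sketch assumes the listing wraps records in a top-level `data` array and that each record carries a boolean `crlite_enrolled` flag and a `subject` string; those field names are assumptions, so check them against the actual JSON.

```python
import json
from typing import List


def enrolled_issuers_from_records(records_json: str) -> List[str]:
    """Return the subject of every record flagged as CRLite-enrolled.

    Assumptions: a top-level "data" array of records, each with a boolean
    "crlite_enrolled" flag and a "subject" string.
    """
    payload = json.loads(records_json)
    return [record.get("subject", "")
            for record in payload.get("data", [])
            if record.get("crlite_enrolled") is True]
```

Records without the flag are treated as not enrolled, matching the rule that CRLite only runs on issuers explicitly annotated as enrolled.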

For details on downloading the attached data file, see the Kinto Attachment plugin for Kinto, used by Firefox Remote Settings.

What happens if CRLite says a certificate is revoked but OCSP says it's valid?

In the short term, we're interested in gathering telemetry on these cases, though no such telemetry is currently defined. That said, at Internet-scale, this is likely a common occurrence: Certificate Authorities generally have lag in updating revocation information, and there's no requirement that CRLs and OCSP update together.

If CRLite proves robust enough, in this scenario we would expect that the CRLite revocation would take precedence, and OCSP would never be checked.

Where can I get CRLite data that Firefox uses?

The CRLite filters are published manually at Firefox Remote Settings. You can examine the data using JSON tooling at this URL:

For details on downloading the attached data file, see the Kinto Attachment plugin for Kinto, used by Firefox Remote Settings. Alternatively, using jq and httpie, you can chain commands together to obtain the current filter:

base_url=$(http | jq -r '.capabilities.attachments.base_url')
path=$(http | jq -r  '.data[0].attachment.location')
http --download --output filter.mlbf "${base_url}${path}"
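The same three steps can be sketched in Python with only the standard library. `server_url` and `records_url` are placeholders for the two URLs the httpie commands query; this page does not reproduce them. Like the jq filter, the sketch takes the first record, `data[0]`.

```python
import json
import shutil
import urllib.request


def fetch_json(url: str) -> dict:
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def attachment_url(server_info: dict, records: dict, index: int = 0) -> str:
    """Mirror the two jq filters: capabilities.attachments.base_url plus
    data[index].attachment.location."""
    base_url = server_info["capabilities"]["attachments"]["base_url"]
    location = records["data"][index]["attachment"]["location"]
    return base_url + location


def download_filter(server_url: str, records_url: str,
                    out_path: str = "filter.mlbf") -> str:
    """Fetch both JSON documents, then stream the attachment to out_path.
    server_url and records_url are placeholders (not given on this page)."""
    url = attachment_url(fetch_json(server_url), fetch_json(records_url))
    with urllib.request.urlopen(url) as resp, open(out_path, "wb") as out:
        shutil.copyfileobj(resp, out)
    return url
```

Keeping the URL assembly in a pure function makes it easy to test without network access.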

Where can I get the CRLite data that is used to make filters?

The staging data is hosted in Google Cloud Storage in a bucket named crlite_filters. The web interface for the files is accessible publicly here, though it requires a Google login.

The Google gsutil tool is handy for downloading entire datasets (~7 GB each). These commands download all the files for one dataset run:

mkdir crlite-dataset/
gsutil -m cp -r gs://crlite_filters/20200101-0 crlite-dataset/

The known folder contains one JSON file per enrolled issuing CA, named after that issuer and listing all of its unexpired DER-encoded serial numbers. The revoked folder uses the same per-issuer file layout, but contains the DER-encoded serial numbers of revoked certificates. The serials in revoked are not guaranteed to be a subset of those in known, since many are likely expired, so set math is required to get the known revoked serials from the two directories.
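The set math amounts to a per-issuer intersection. This sketch assumes each file is a JSON array of serial-number strings and that an issuer's files share the same name in both folders; adjust `load_serials` if the real encoding differs.

```python
import json
from pathlib import Path
from typing import Dict, Set


def load_serials(path: Path) -> Set[str]:
    """Assumption: each file is a JSON array of serial-number strings."""
    with path.open() as f:
        return set(json.load(f))


def known_revoked(known_dir: str, revoked_dir: str) -> Dict[str, Set[str]]:
    """For each issuer file present in both folders, keep only the revoked
    serials that also appear in the known (unexpired) set."""
    result = {}
    for known_file in sorted(Path(known_dir).iterdir()):
        revoked_file = Path(revoked_dir) / known_file.name
        if known_file.is_file() and revoked_file.is_file():
            result[known_file.name] = load_serials(known_file) & load_serials(revoked_file)
    return result
```

For example, `known_revoked("crlite-dataset/20200101-0/known", "crlite-dataset/20200101-0/revoked")` would run it against the dataset downloaded above.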

The mlbf folder contains the filter and its metadata as-generated.

The log folder contains all the logs for the runs. As of this writing, many errors and warnings are still emitted that require bugfixing in one fashion or another. There are also many pointers to potential CRL problems with CAs, though few are compliance issues, and at least some are known to be innocent problems.

How can I produce my own CRLite filter?

You'll need a local copy of the crlite repository, with the Python packages from its requirements.txt installed.

With a full dataset at hand from the above gsutil command:

python3 ~/git/crlite/create_filter_cascade/ -knownPath ./20200101-0/known/ -revokedPath ./20200101-0/revoked/ my_filter_identifier

With sufficient memory, you'll get the output filter; it should be deterministic.

How can I query my CRLite filter?

To be written, but the project is what's used by Firefox.

How can I run the CRLite backend infrastructure myself?

See the main

Why don't you also scrape OCSP?

It's extremely inefficient because of the sheer number of OCSP queries required. While the original paper's implementation did it, as did casebenton/certificate-revocation-analysis (our initial proof-of-principle), downloading CRLs scales much better. If CRLite gains traction, OCSP bandwidth savings and speedups may give CAs reasons to issue CRLs.
