This collection of tools is designed to assemble a cascading bloom filter containing all TLS certificate revocations, as described in the CRLite paper.
These tools were built from scratch, using the original CRLite research code as a design reference and closely following the documentation in the paper. However, this is a separate implementation and should still be considered a work in progress, particularly the details of filter generation.
For details about CRLite, Mozilla Security Engineering has a blog post series, and this repository has a FAQ.
- Python 3
- Kubernetes / Docker
At this point, CRLite is intended to be run as a series of Docker containers, executing different kinds of jobs:
- `containers/crlite-fetch`, a constantly-running task that downloads from Certificate Transparency logs into Redis and Google Firestore
- `containers/crlite-generate`, a periodic (cron) job that produces a CRLite filter from the data in Redis and uploads the artifacts into Google Cloud Storage
- `containers/crlite-rebuild`, an as-needed job that reads out all data in Google Firestore and writes the necessary metadata into Redis for the generate task. This is intended for use when Redis has to be reinitialized (e.g., after a resize).
Each of these jobs has a `pod.yaml` intended for use in Kubernetes.
There are scripts in `containers/` to build Docker images, both using Google Cloud's builder and locally with Docker; see `build-local.sh`. They make assumptions about the `PROJECT_ID` which will need to change, but PRs are welcome.
Storage consists of four parts:
- Google Firestore, for bulk certificate PEM data, bucketed by expiration date for easy deletion
- Redis, e.g. Google Cloud Memorystore, for certificate metadata (CRL DPs, serial numbers, expirations, issuers), used in filter generation.
- Google Cloud Storage, for storage of the artifacts when a job is completed.
- A local persistent disk, for persistent storage of downloaded CRLs. This is defined in
This tooling monitors Certificate Transparency logs and, upon scheduled execution, `crlite-generate` produces a new filter and uploads it to Cloud Storage.
The process for producing a CRLite filter is run by `system/crlite-fullrun`, which is described in block form in this diagram:
For complete details of the filter construction see Section III.B of the CRLite paper.
The keys used in the CRLite data structure consist of the SHA256 digest of the issuer's Subject Public Key Information (SPKI) field in DER-encoded form, followed by the certificate's serial number, unmodified, in DER-encoded form.
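As a sketch, the key construction described above might look like the following Python, where `spki_der` and `serial_bytes` are assumed to already hold the issuer's DER-encoded SPKI and the certificate's serial number bytes (the helper name `crlite_key` and the dummy inputs are hypothetical, not part of this codebase):

```python
import hashlib

def crlite_key(spki_der: bytes, serial_bytes: bytes) -> bytes:
    """Sketch of the key layout: SHA256(issuer SPKI, DER) || serial bytes."""
    return hashlib.sha256(spki_der).digest() + serial_bytes

# Example with dummy inputs (not a real SPKI or serial):
key = crlite_key(b"\x30\x82\x01\x22", b"\x02\x03\x01\x00\x01")
assert len(key) == 32 + 5  # a 32-byte digest followed by the serial bytes
```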
It's possible to run the tools locally, though you will need local instances of Redis and Firestore. First, install the tools and their dependencies:
```
go get -u github.com/jcjones/ct-mapreduce/cmd/ct-fetch
go get -u github.com/jcjones/ct-mapreduce/cmd/reprocess-known-certs
go get -u github.com/mozilla/crlite/go/cmd/aggregate-crls
go get -u github.com/mozilla/crlite/go/cmd/aggregate-known
pipenv install
```
You can configure via a config file, or use environment variables.
To use a configuration file, `~/.ct-fetch.ini` (or any file selected on the CLI using `-config`), construct it as so:
```
certPath = /ct
numThreads = 16
cacheSize = 128
```
You'll want to set a collection of configuration parameters:
`cacheSize` [number of cache entries. An individual entry contains an issuer-day's worth of serial numbers, which could be as much as 64 MB of RAM, but is generally closer to 1 MB.]
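To get a feel for the memory implications, here is a rough back-of-the-envelope calculation; the typical (1 MB) and worst-case (64 MB) per-entry figures come from the description above, and the cache size of 128 matches the sample config:

```python
cache_entries = 128   # cacheSize from the sample config above
typical_mb = 1        # typical per-entry size, MB
worst_case_mb = 64    # worst-case per-entry size, MB

print(cache_entries * typical_mb)     # typical RAM use, MB
print(cache_entries * worst_case_mb)  # worst-case RAM use, MB
```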
The log list is all the logs you want to sync, comma separated, as URLs:
To get all current ones from certificate-transparency.org:
```
echo "logList = $(setup/list_all_active_ct_logs)" >> ~/.ct-fetch.ini
```
If running forever, set the delay on polling for new updates, per log. This will have some jitter added:
If not running forever, you can give limits or slice up CT log data:
Then choose either local storage or Firestore cloud storage by setting either
`firestoreProjectId` [project ID string]
If you set `firestoreProjectId`, then choose a Firestore instance type:
`GOOGLE_APPLICATION_CREDENTIALS` [base64-encoded string of the service credentials JSON]
If you need to proxy the connection, perhaps via SSH, set `HTTPS_PROXY` to something like `socks5://localhost:32547/` as well.
system/crlite-fullrun executes a complete "run", syncing with CT and producing a filter. It's configured using a series of environment variables. Generally, this is run from a Docker container.
That script ultimately runs the scripts in `workflow/`, in order. They can be run independently for fine control.
Starting the Local Dependencies
To run with Firestore locally, you'll need the `gcloud` Google Cloud utility's Firestore emulator. For Docker, be sure to bind to an accessible address, not just localhost. Port 8403 is just a suggestion:

```
gcloud beta emulators firestore start --host-port="my_ip_address:8403"
```
Redis can be provided in a variety of ways; the easiest is probably the Redis Docker distribution. For whatever reason, I have the best luck remapping ports to make it run on 6379:

```
docker run -p 6379:7000 redis:4 --port 7000
```
Running from a Docker Container
To construct a container, see
```
docker run --rm -it \
  -e "FIRESTORE_EMULATOR_HOST=my_ip_address:8403" \
  -e "outputRefreshMs=1000" \
  crlite:0.1
```
To use local disk, set `certPath` to `/ctdata` and mount that volume in Docker. You should also mount the volume `/processing` to get the output files:
```
docker run --rm -it \
  -e "certPath=/ctdata" \
  -e "outputRefreshMs=1000" \
  --mount type=bind,src=/tmp/ctlite_data,dst=/ctdata \
  --mount type=bind,src=/tmp/crlite_results,dst=/processing \
  crlite:0.1
```
To run in a remote container, such as a Kubernetes pod, you'll need to make sure to set all the environment variables properly; otherwise the container should work. See `containers/crlite-config.properties.example` for an example of the Kubernetes environment, which can be imported using `kubectl create configmap`; see the `containers/` README.md for details.
Downloads all CT entries' certificates to a Firestore instance and collects their metadata.
Reprocesses the `.pem` files to update the `.pem.known` files. Needed if there's suspected corruption from crashes.
Obtains all CRLs defined in all CT entries' certificates, verifies them, and collates their results into *issuer SKI base64*.revoked files.
Collates all CT entries' unexpired certificates into *issuer SKI base64*.known files.
If the certificate cohort is 500M, and Firestore costs $0.60 per 1M reads, then `reprocess-known-certs` costs roughly $300 to run.
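That estimate works out as follows, using the cohort size and per-read pricing assumed above:

```python
cohort = 500_000_000      # certificates read during a full reprocess
price_per_million = 0.60  # USD per 1M Firestore document reads

cost = cohort / 1_000_000 * price_per_million
print(f"${cost:.2f}")  # → $300.00
```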