WebPKI-level Certificate Revocation via Multi-Level Bloom Filter Cascade

This collection of tools is designed to assemble a cascading bloom filter containing all TLS certificate revocations, as described in the CRLite paper.

These tools were built from scratch, using the original CRLite research code as a design reference and closely following the documentation in the paper. However, this is a separate implementation and should still be considered a work in progress, particularly the details of filter generation in create_filter_cascade.

For details about CRLite, Mozilla Security Engineering has a blog post series, and this repository has a FAQ.

Dependencies

  1. ct-fetch from ct-mapreduce
  2. Python 3
  3. Kubernetes / Docker
  4. Patience

General Structure

At this point, CRLite is intended to be run in a series of Docker containers, run as differing kinds of jobs:

  1. containers/crlite-fetch, a constantly-running task that downloads from Certificate Transparency logs into Redis and Google Firestore
  2. containers/crlite-generate, a periodic (cron) job that produces a CRLite filter from the data in Redis and uploads the artifacts into Google Cloud Storage
  3. containers/crlite-rebuild, an as-needed job that reads out all data in Google Firestore and writes the necessary metadata into Redis, for the generate task. This is intended for use when Redis has to be reinitialized (e.g., after a resize).

Each of these jobs has a pod.yaml intended for use in Kubernetes.

There are scripts in containers/ to build Docker images, either with Google Cloud's builder or locally with Docker: see build-gcp.sh and build-local.sh. They make assumptions about the PROJECT_ID which will need to change, but PRs are welcome.

Storage

Storage consists of four parts:

  1. Google Firestore, for bulk certificate PEM data, bucketed by expiration date for easy deletion
  2. Redis, e.g. Google Cloud Memorystore, for certificate metadata (CRL DPs, serial numbers, expirations, issuers), used in filter generation.
  3. Google Cloud Storage, for storage of the artifacts when a job is completed.
  4. A local persistent disk, for persistent storage of downloaded CRLs. This is defined in containers/crl-storage-claim.yaml.

Information Flow

This tooling monitors Certificate Transparency logs. On each scheduled execution, crlite-generate produces a new filter and uploads it to Cloud Storage.

[Figure: Information flow]

The process for producing a CRLite filter is run by system/crlite-fullrun, which is described in block form in this diagram:

[Figure: Process for building a CRLite Bloom filter]

The output Bloom filter cascade is built by the Python mozilla/filter-cascade tool and then read in Firefox by the Rust mozilla/rust-cascade package.

For complete details of the filter construction see Section III.B of the CRLite paper.

[Figure: Structure of the CRLite Bloom filter cascade]
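
As a rough illustration of the cascade idea, here is a toy sketch in Python. It is not the mozilla/filter-cascade implementation: the `Bloom` class, the sizing (8 bits per element), the three SHA-256-derived hash positions, and the function names are all assumptions chosen for readability. Level 0 holds every revoked key, level 1 holds the valid keys that level 0 wrongly matches, level 2 holds the revoked keys that level 1 wrongly matches, and so on until a level has no false positives.

```python
import hashlib


class Bloom:
    """Minimal Bloom filter (illustrative sizing and hashing only)."""

    def __init__(self, n_bits: int, n_hashes: int, salt: int):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.salt = salt.to_bytes(4, "big")  # differs per level
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, key: bytes):
        for i in range(self.n_hashes):
            digest = hashlib.sha256(self.salt + bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key: bytes) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def build_cascade(revoked: set, valid: set) -> list:
    """Build levels until a level produces no false positives."""
    levels = []
    include, exclude = set(revoked), set(valid)
    while include:
        bloom = Bloom(n_bits=max(8, 8 * len(include)), n_hashes=3, salt=len(levels))
        for key in include:
            bloom.add(key)
        levels.append(bloom)
        # Keys from the other set that this level wrongly matches
        # become the contents of the next level.
        false_positives = {key for key in exclude if key in bloom}
        include, exclude = false_positives, include
    return levels


def is_revoked(levels: list, key: bytes) -> bool:
    """Only meaningful for keys that were in `revoked` or `valid` at build time."""
    for depth, bloom in enumerate(levels):
        if key not in bloom:
            # A miss at an even depth means "valid"; at an odd depth, "revoked".
            return depth % 2 == 1
    # Matched every level: the parity of the depth decides.
    return len(levels) % 2 == 1
```

Note that a cascade built this way only answers correctly for keys that were known when it was built, which is why the process needs the full set of unexpired certificates as well as the revoked ones.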

The keys used in the CRLite data structure consist of the SHA256 digest of the issuer's Subject Public Key Information field in DER-encoded form, followed by the certificate's serial number, unmodified, in DER-encoded form.

[Figure: Structure of Certificate Identifiers]
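
The key layout described above can be sketched in Python as follows. This is a minimal illustration, not code from this repository: the function names are hypothetical, and it assumes that "DER-encoded form" for the serial number means the content octets of the DER INTEGER (without the tag and length bytes).

```python
import hashlib


def der_integer_content(serial: int) -> bytes:
    """Content octets of a non-negative DER INTEGER (no tag, no length).

    DER integers are big-endian two's complement, so a leading 0x00 is
    needed whenever the top bit of the first byte would otherwise be set.
    """
    if serial < 0:
        raise ValueError("certificate serial numbers are non-negative")
    length = serial.bit_length() // 8 + 1
    return serial.to_bytes(length, "big")


def crlite_key(issuer_spki_der: bytes, serial: int) -> bytes:
    """SHA256(issuer SPKI DER) followed by the serial number bytes."""
    return hashlib.sha256(issuer_spki_der).digest() + der_integer_content(serial)


# Placeholder bytes stand in for a real DER-encoded SPKI structure.
key = crlite_key(b"placeholder issuer SPKI bytes", 0x80)
print(len(key))  # 32-byte digest plus a 2-byte serial (00 80)
```

The first 32 bytes identify the issuer and the remainder identifies the certificate under that issuer, so keys for the same issuer share a common prefix.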

Local Installation

It's possible to run the tools locally, though you will need local instances of Redis and Firestore. First, install the tools and their dependencies:

go install github.com/jcjones/ct-mapreduce/cmd/ct-fetch@latest
go install github.com/jcjones/ct-mapreduce/cmd/reprocess-known-certs@latest
go install github.com/mozilla/crlite/go/cmd/aggregate-crls@latest
go install github.com/mozilla/crlite/go/cmd/aggregate-known@latest

pipenv install

Configuration

You can configure via a config file, or use environment variables.

To use a configuration file, ~/.ct-fetch.ini (or any file selected on the CLI using -config), construct it like so:

certPath = /ct
numThreads = 16
cacheSize = 128
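
If you want to inspect or validate such a file from Python tooling, note that it has no section header, so Python's configparser will not read it directly. The sketch below works around that by prepending a dummy section. It is not how ct-fetch itself parses its configuration: the `load_settings` name and the dummy section name are made up for illustration, and letting environment variables win over the file is an assumption.

```python
import configparser
import os


def load_settings(text: str, environ=os.environ) -> dict:
    """Parse sectionless key = value text, then apply environment overrides."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep camelCase keys such as certPath
    parser.read_string("[ct-fetch]\n" + text)  # dummy section header
    settings = dict(parser["ct-fetch"])
    for key in settings:
        # Assumed precedence: an environment variable of the same name wins.
        if key in environ:
            settings[key] = environ[key]
    return settings


example = "certPath = /ct\nnumThreads = 16\ncacheSize = 128\n"
print(load_settings(example, environ={"numThreads": "4"}))
```

All values come back as strings, so numeric parameters such as numThreads still need converting before use.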

Parameters

You'll want to set a collection of configuration parameters:

  • runForever [true/false]
  • logExpiredEntries [true/false]
  • numThreads [number of threads, e.g. 16]
  • cacheSize [number of cache entries. An individual entry contains an issuer-day's worth of serial numbers, which could be as much as 64 MB of RAM, but is generally closer to 1 MB.]
  • outputRefreshMs [milliseconds]

The logList parameter lists all the logs you want to sync, as comma-separated URLs.

To get all current ones from certificate-transparency.org:

echo "logList = $(setup/list_all_active_ct_logs)" >> ~/.ct-fetch.ini

If running forever, set the delay on polling for new updates, per log. This will have some jitter added:

  • pollingDelay [minutes]

If not running forever, you can give limits or slice up CT log data:

  • limit [uint]
  • offset [uint]

Then choose either local storage or Firestore cloud storage by setting one of:

  • firestoreProjectId [project ID string]
  • certPath [path string]

If you set firestoreProjectId, then choose a Firestore instance type by setting one of these environment variables:

  • GOOGLE_APPLICATION_CREDENTIALS [base64-encoded string of the service credentials JSON]
  • FIRESTORE_EMULATOR_HOST [host]:[port]

If you need to proxy the connection, perhaps via SSH, set HTTPS_PROXY to something like socks5://localhost:32547/ as well.

General Operation

system/crlite-fullrun executes a complete "run", syncing with CT and producing a filter. It's configured using a series of environment variables. Generally, this is run from a Docker container.

That script ultimately runs the scripts in workflow/, in order. They can be run independently for fine control.

Starting the Local Dependencies

To run with Firestore locally, you'll need the Firestore emulator from the gcloud Google Cloud utility. For Docker, be sure to bind to an accessible address, not just localhost. Port 8403 is just a suggestion:

gcloud beta emulators firestore start --host-port="my_ip_address:8403"

Redis can be provided in a variety of ways; the easiest is probably the Redis Docker distribution. For whatever reason, I have the best luck remapping ports to make it run on 6379:

docker run -p 6379:7000 redis:4 --port 7000

Running from a Docker Container

To construct a container, see containers/README.md.

docker run --rm -it \
  -e "FIRESTORE_EMULATOR_HOST=my_ip_address:8403" \
  -e "outputRefreshMs=1000" \
  crlite:0.1

To use local disk, set the certPath to /ctdata and mount that volume in Docker. You should also mount the volume /processing to get the output files:

docker run --rm -it \
  -e "certPath=/ctdata" \
  -e "outputRefreshMs=1000" \
  --mount type=bind,src=/tmp/ctlite_data,dst=/ctdata \
  --mount type=bind,src=/tmp/crlite_results,dst=/processing \
  crlite:0.1

To run in a remote container, such as a Kubernetes pod, you'll need to make sure to set all the environment variables properly, and the container should otherwise work. See containers/crlite-config.properties.example for an example of the Kubernetes environment, which can be imported using kubectl create configmap; the containers README.md has details.

Tools

ct-fetch: Downloads all CT entries' certificates to a Firestore instance and collects their metadata.

reprocess-known-certs: Reprocesses all .pem files to update the .pem.meta and .pem.known files. Needed if there's suspected corruption from crashes of ct-fetch.

aggregate-crls: Obtains all CRLs defined in all CT entries' certificates, verifies them, and collates their results into *issuer SKI base64*.revoked files.

aggregate-known: Collates all CT entries' unexpired certificates into *issuer SKI base64*.known files.

Planning

If the certificate cohort is 500M, and Firestore costs $0.60 / 1M reads, then reprocess-known-certs is $300 to run.
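
The same arithmetic as a small Python sketch, which makes it easy to rerun with other cohort sizes or prices. The function name is hypothetical, and the calculation assumes one Firestore read per certificate, which is what the estimate above implies.

```python
def firestore_read_cost(num_reads: int, usd_per_million_reads: float) -> float:
    """Cost in USD for a number of Firestore document reads."""
    return num_reads / 1_000_000 * usd_per_million_reads


# 500M certificates, one read each, at $0.60 per 1M reads.
cost = firestore_read_cost(500_000_000, 0.60)
print(f"${cost:,.2f}")  # $300.00
```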

Credits
