Telemetry Ingestion on Google Cloud Platform


A monorepo for documentation and implementation of the Mozilla telemetry ingestion system deployed to Google Cloud Platform (GCP).

There are currently four components:

  • ingestion-edge: a simple Python service for accepting HTTP messages and delivering them to Google Cloud Pub/Sub
  • ingestion-beam: a Java module defining Apache Beam jobs for streaming and batch transformations of ingested messages
  • ingestion-sink: a Java application that runs in Kubernetes, reading input from Google Cloud Pub/Sub and emitting records to outputs like Google Cloud Storage (GCS) or BigQuery
  • ingestion-core: a Java module for code shared between ingestion-beam and ingestion-sink

For more information, see the documentation.

Java 11 support is a work in progress for the Beam Java SDK, so this project requires Java 8. Maven has been configured to compile for Java 8 even when run with a newer JDK, but support is only guaranteed for JDK 8. To manage multiple local JDKs, consider jenv; its jenv enable-plugin maven command makes Maven use the JDK that jenv has selected.
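
As an illustration of how a Maven build can target Java 8 while running on a newer JDK, here is a minimal sketch of a pom.xml fragment. It is not copied from this repository's build files: the profile id java-8-on-newer-jdk is a hypothetical name, and the actual configuration in this project may differ. The sketch relies on the maven-compiler-plugin user properties maven.compiler.source, maven.compiler.target, and maven.compiler.release, and on Maven's JDK-based profile activation.

```xml
<!-- Sketch only: place inside the <project> element of a pom.xml. -->
<properties>
  <!-- Baseline used when building with JDK 8 itself. -->
  <maven.compiler.source>1.8</maven.compiler.source>
  <maven.compiler.target>1.8</maven.compiler.target>
</properties>

<profiles>
  <profile>
    <!-- Hypothetical profile id, chosen for illustration. -->
    <id>java-8-on-newer-jdk</id>
    <activation>
      <!-- Activates automatically on JDK 9 and later. -->
      <jdk>[9,)</jdk>
    </activation>
    <properties>
      <!-- Passes "release 8" to javac so code is compiled against the Java 8 API. -->
      <maven.compiler.release>8</maven.compiler.release>
    </properties>
  </profile>
</profiles>
```

The release setting is only understood by JDK 9 and later, which is why this sketch confines it to a profile rather than setting it unconditionally.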