Reconstructs bug versions from bugzilla history and stores them in ElasticSearch

Bugzilla ETL

Notice: This ETL is no longer used - active development has moved to

A set of Pentaho Data Integration (PDI) jobs that extract bug versions from a Bugzilla database and store them in an Elasticsearch index. This ETL drives BMO (bugzilla.mozilla.org) dashboards for various teams at Mozilla Corporation.


Requirements

  • an Elasticsearch cluster where you can create, read, update, and delete the index bugs
  • a working PDI (a.k.a. Kettle) installation (the free community edition should work fine). Tested with PDI CE 4.3.

Minimal instructions

  • Clone this project into a local directory
  • Configure the Elasticsearch index (replace localhost with the address of a cluster node):

    • Optionally: clean out previous indexes:

      curl -XDELETE 'http://localhost:9200/bugs'

    • Initialize the elasticsearch mappings:

      curl -XPOST 'http://localhost:9200/bugs' --data @configuration/es/bug_version.json

  • Configure Pentaho DI:

    • add a directory .kettle in your $KETTLE_HOME
    • there, create a file
    • in that file, add settings for bugs_db_host, bugs_db_port, bugs_db_user, bugs_db_pass and bugs_db_name for your bugzilla-database connection.
    • add settings for ES_NODES, ES_CLUSTER, ES_INDEX
  • If necessary, modify bin/, then run it to import the full data set.
  • Later on, use bin/ to read incremental modifications from the MySQL database
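
The Elasticsearch and Kettle configuration steps above can be sketched as a small shell script. This is a minimal sketch and not part of the project: the function names, the ES_URL variable, and every value in the settings block are placeholder assumptions, and the properties file path is passed as an argument so that it can match your own Kettle setup. The value format the jobs expect for ES_NODES and ES_CLUSTER is not documented here, so check it against the job definitions.

```shell
#!/bin/sh
# Hypothetical helper functions; names and default values are assumptions.

# Base URL of one Elasticsearch node (replace localhost with a cluster node).
ES_URL="${ES_URL:-http://localhost:9200}"

# Print the URL of the given index on the configured node.
index_url() {
    printf '%s/%s\n' "$ES_URL" "$1"
}

# Delete the previous index (optional step), then create it again
# with the mappings shipped in this project.
reset_bugs_index() {
    curl -XDELETE "$(index_url bugs)"
    curl -XPOST "$(index_url bugs)" --data @configuration/es/bug_version.json
}

# Append the settings named in the instructions to the file given as $1.
# Every value below is a placeholder to be replaced with your own.
write_settings() {
    cat >> "$1" <<'EOF'
bugs_db_host=localhost
bugs_db_port=3306
bugs_db_user=bugs
bugs_db_pass=changeme
bugs_db_name=bugs
ES_NODES=localhost
ES_CLUSTER=elasticsearch
ES_INDEX=bugs
EOF
}
```

The script only defines functions. After sourcing it, reset_bugs_index runs the two curl commands from the project root, and write_settings followed by a file path appends the eight settings to that file.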

Known issues

  • When a user's Bugzilla ID changes mid-history for a bug, some cases can't be handled automatically; those aliases should be added to configuration/kettle/bugzilla_aliases.txt. Several alias-related scripts and transformations help detect these changes. See bin/, bin/, transformations/find_aliases.ktr, and transformations/detect_new_aliases.ktr.
  • Mozilla Bug 804946 causes some trouble with the ETL. See Bug 804961 for details.