
Processes

Periodic and event-driven processes move data through components and persistent data stores.

Batch sets are run approximately once per week.

  1. Run the batch enqueue with the script openaddr-enqueue-sources. This will require a current Github access token and a connection to the machine database:

    openaddr-enqueue-sources -t <Github Token> -d <Database URL>
    
  2. The complete list of sources is read from Github’s API, using the current master branch of the OpenAddresses repository.

  3. A new empty set is created in the sets table, and becomes visible at results.openaddresses.io/sets.

  4. New runs are slowly drip-fed into the tasks queue. New items are only enqueued when the queue length is zero, to prevent Worker auto-scale costs from ballooning.

  5. Worker processes runs from the queue, storing results in S3 and passing completed runs to the done queue.

  6. Completed run information is handled by Dequeuer.

  7. When all runs are finished, new coverage maps are rendered and openaddr-enqueue-sources exits successfully.
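The drip-feed rule in step 4 can be sketched as a small helper. This is an illustrative sketch only, not the actual machine implementation: the function name, the batch size, and the use of plain Python containers in place of the real task queue are all assumptions.

```python
from collections import deque

def maybe_enqueue(pending, task_queue, batch_size=5):
    """Move up to batch_size runs into the task queue, but only when the
    queue is empty, so Worker auto-scaling never sees a long backlog."""
    if task_queue:
        # Workers are still busy; enqueue nothing this round.
        return 0
    moved = 0
    while pending and moved < batch_size:
        task_queue.append(pending.popleft())
        moved += 1
    return moved
```

Calling this on a timer reproduces the behavior described above: the queue length stays near zero, and new runs trickle in only as Workers drain the previous batch.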

Continuous integration jobs are used each time an OpenAddresses contributor modifies the main repository with a pull request.

  1. A contributor issues a pull request.

  2. Github posts a blob of JSON data describing the edits to Webhook’s /hook endpoint.

  3. Webhook immediately attempts to create a new empty job in the jobs table and enqueues any new source runs found in the edits.

    If this step fails, an error status is posted back to the Github status API, and no job or run is created.

    If this step succeeds, a pending status is posted back to the Github status API, and the job becomes visible at results.openaddresses.io/jobs.

  4. Worker processes runs from the queue, storing results in S3 and passing completed runs to the done queue.

  5. Completed run information is handled by Dequeuer.

  6. When all runs are finished, a final success or failure status is posted back to the Github status API.
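The handler flow in steps 2 and 3 amounts to "create a job, enqueue its runs, and report back." A hedged sketch, assuming hypothetical helpers (`post_status` stands in for a real POST to the Github status API; `create_job` and `enqueue_runs` stand in for the jobs-table insert and queue writes):

```python
def post_status(state, description):
    """Placeholder for a POST to Github's commit status API."""
    print(f'status: {state} - {description}')

def handle_hook(payload, create_job, enqueue_runs):
    """Create an empty job for the pushed edits and enqueue its source runs.

    On any failure, report an error status and create nothing; on success,
    report a pending status and return the new job id.
    """
    try:
        job_id = create_job(payload)
        enqueue_runs(job_id, payload.get('files', []))
    except Exception as err:
        post_status('error', f'Could not create job: {err}')
        return None
    post_status('pending', f'Job {job_id} enqueued')
    return job_id
```

The two branches mirror the error and pending outcomes listed above; the final success or failure status in step 6 is posted later by a separate component, after all runs complete.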

Collection

New Zip collections are generated every other night.

  1. Run the collection with the script openaddr-collect-extracts. This will require a connection to the machine database and S3 access credentials in environment variables:

    openaddr-collect-extracts -d <Database URL>
    
  2. Current data is read from the sets and runs tables, using the most recent successful run for each source listed in the most recent set. This can include older successful runs for sources that have since failed.

  3. New Zip archives are created for geographic regions of the world.

  4. Zip archives are uploaded to S3 in predictable locations, overwriting previous archives, and are immediately available from results.openaddresses.io.
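The selection rule in step 2 can be illustrated with plain dicts standing in for rows of the runs table. This is a sketch of the rule only, with assumed field names (`source`, `state`, `datetime`), not the actual SQL the collector uses:

```python
def latest_successful_runs(runs):
    """Map each source to its most recent run with state == 'success'.

    Sorting by datetime and overwriting means the last success wins,
    so a source whose newest runs failed still contributes its older
    successful output, as described in step 2.
    """
    latest = {}
    for run in sorted(runs, key=lambda r: r['datetime']):
        if run['state'] == 'success':
            latest[run['source']] = run
    return latest
```

For example, a source with a success followed by a failure keeps the earlier success, while a source with several successes keeps only the newest one.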