
Steps

  1. Crawler.rb queries the GitHub API and writes the events to a file locally on the VM
    1. One file is written per hour
  2. At 5 minutes past every hour, a cron job compresses the previous hour's file and uploads it to cloud storage (see the crontab sketch after this list)
  3. At 8 minutes past every hour, ruby upload.rb is run to load the compressed file into BigQuery
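A minimal crontab sketch of the two hourly jobs, assuming the event files live somewhere like /data/events, the crawler checkout is at /home/crawler, and the bucket is reachable with gsutil; the paths, filenames, and bucket name here are illustrative, not taken from the real deployment:

    # Hedged sketch only: real paths, filenames, and bucket are assumptions
    # :05 -- compress the previous hour's event file and copy it to cloud storage
    5 * * * *  gzip -k /data/events/previous-hour.json && gsutil cp /data/events/previous-hour.json.gz gs://example-events-bucket/
    # :08 -- load the compressed file into BigQuery via the upload script
    8 * * * *  cd /home/crawler && ruby upload.rb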

Yearly Tables

The yearly tables are created by a cron job at 5am on January 1st.
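As a sketch, that could be a single extra crontab entry; the script name create_yearly_table.rb is an assumption, since the page does not name the script that does this:

    # Assumed entry: at 05:00 on January 1st, create the table for the new year
    0 5 1 1 *  cd /home/crawler && ruby create_yearly_table.rb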

Month Tables

How are these created?
