

Loading open datasets into Google BigQuery using Terraform.

BigQuery is Google's serverless, scalable data warehouse that enables training custom Machine Learning models using SQL.

Terraform (by HashiCorp) is a widely used tool for provisioning cloud resources. It lets infrastructure such as datasets and tables be defined as code.


In England, an increasing number of children are not getting a place at their first-choice high school. One proposed explanation is a baby boom towards the end of the 2000s. We will test this theory by comparing live birth data with school application data. Note that in the UK, children start high school in the September after they turn 11, so we will compare school admission data with live births from 11 years earlier.

Data Sources


  • Get the data
  • Create BigQuery dataset and tables
  • Load the data into BigQuery
  • Run some queries on the data
  • Create some nice visualisations using Google Data Studio

Get the data

Save the source files into the data directory. Rename the live births file to 2017_Live_Births.csv.

Clean the data

cd data
./ 2019_Apps_Offers_UD_time_series.csv
./ 2017_Live_Births.csv

Load the data into BigQuery

First of all we need to create the BigQuery dataset and tables with an appropriate schema. Once they are created we will load the data into the tables from our local CSV files.

We'll do all this using Terraform, so that the dataset and tables are defined as code.
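As a rough illustration of what such a definition can look like, here is a minimal sketch using the `google_bigquery_dataset` and `google_bigquery_table` resources from the Google provider. The dataset ID, table ID, location, `project_id` variable and column names are all assumptions made for this example; they are not taken from the repository, and the real schema should follow the headers of the cleaned CSV files.

```hcl
# Hypothetical project variable; supply your own GCP project ID.
variable "project_id" {
  type = string
}

provider "google" {
  project = var.project_id
}

# Dataset that will hold the tables (name and location are placeholders).
resource "google_bigquery_dataset" "schools" {
  dataset_id = "school_places"
  location   = "EU"
}

# One table per CSV file; only the live births table is sketched here.
resource "google_bigquery_table" "live_births" {
  dataset_id = google_bigquery_dataset.schools.dataset_id
  table_id   = "live_births"

  # Lets `terraform destroy` remove the table while experimenting.
  deletion_protection = false

  # Assumed columns, not the real CSV headers.
  schema = jsonencode([
    { name = "year", type = "INTEGER", mode = "REQUIRED" },
    { name = "live_births", type = "INTEGER", mode = "NULLABLE" },
  ])
}
```

Referencing `google_bigquery_dataset.schools.dataset_id` from the table, rather than repeating the string, lets Terraform work out that the dataset must be created first.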

Install Terraform, initialise remote state, then run:

cd terraform
terraform init
terraform plan
terraform apply
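Remote state is configured through a backend block that `terraform init` reads. The sketch below assumes a Google Cloud Storage backend; the bucket name and prefix are placeholders rather than values from this repository, and the bucket has to exist before `terraform init` is run.

```hcl
terraform {
  # Placeholder bucket and prefix; replace with your own state bucket.
  backend "gcs" {
    bucket = "my-terraform-state-bucket"
    prefix = "bigquery-open-data"
  }
}
```

Keeping state in a bucket rather than on a local disk means the same state can be shared by anyone who runs `terraform plan` or `terraform apply` against the project.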

Query the Data
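The comparison described earlier, admissions in a given year against live births 11 years before, could be expressed as a BigQuery view. The following is only a sketch: the dataset, table and column names are hypothetical, and it assumes both tables hold one row per year.

```hcl
# Hypothetical view; all identifiers below are illustrative assumptions.
resource "google_bigquery_table" "births_vs_applications" {
  dataset_id          = "school_places"
  table_id            = "births_vs_applications"
  deletion_protection = false

  view {
    use_legacy_sql = false

    # Pair each admission year with the births from 11 years earlier.
    query = <<-SQL
      SELECT
        a.year AS admission_year,
        a.applications,
        a.first_preference_offers,
        b.live_births
      FROM `school_places.applications` AS a
      JOIN `school_places.live_births` AS b
        ON b.year = a.year - 11
    SQL
  }
}
```

Defining the view in Terraform keeps the query under version control alongside the table definitions.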

Create a visualisation in Data Studio

Using the view as a Data Source, create a report like this one!

Data Studio Report
