BigQuery is Google's serverless, scalable data warehouse that enables training custom Machine Learning models using SQL.
Terraform (by HashiCorp) is a widely used tool for provisioning cloud resources. It enables 'everything' to be defined as code.
In England, an increasing number of children are not getting a place at their first-choice high school due to a baby boom towards the end of the 2000s. We will test this theory by comparing live birth data with school application data. Note that in the UK, children start high school in the September after they turn 11, so we will compare school admission data with live births from 11 years earlier.
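The 11-year offset described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in this repo: the function names and the placeholder counts are assumptions, and calendar-year births only approximate a school-year cohort.

```python
from __future__ import annotations

# Offset taken from the text above: pupils start high school in the September
# after they turn 11, so an intake is compared with births 11 years earlier.
AGE_AT_ENTRY = 11


def birth_year_for_entry(entry_year: int) -> int:
    """Return the calendar year of live births to compare with an intake year."""
    return entry_year - AGE_AT_ENTRY


def align(applications: dict[int, int], births: dict[int, int]) -> list[tuple[int, int, int]]:
    """Pair each intake year with the births from 11 years earlier.

    Returns (entry_year, applications, births) tuples. Intake years with no
    matching birth year are skipped.
    """
    rows = []
    for entry_year in sorted(applications):
        birth_year = birth_year_for_entry(entry_year)
        if birth_year in births:
            rows.append((entry_year, applications[entry_year], births[birth_year]))
    return rows


if __name__ == "__main__":
    # Placeholder counts for illustration only, not real published figures.
    applications = {2018: 120, 2019: 130}
    births = {2007: 100, 2008: 110}
    for row in align(applications, births):
        print(row)
```

Keeping the offset in one named constant makes it easy to see how sensitive the comparison is to that assumption.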
- Get the data
- Create BigQuery dataset and tables
- Load the data into BigQuery
- Run some queries on the data
- Create some nice visualisations using Google Data Studio
Get the data
Save the following files into the
https://www.ons.gov.uk/generator?uri=/peoplepopulationandcommunity/birthsdeathsandmarriages/livebirths/bulletins/birthsummarytablesenglandandwales/2017/a819f426&format=csv rename as
Clean the data
cd data
./clean_apps_offers.sh 2019_Apps_Offers_UD_time_series.csv
./clean_live_births.sh 2017_Live_Births.csv
Load the data into BigQuery
We'll do all this using Terraform, which lets us define the cloud resources as code.
cd terraform
terraform init
terraform plan
terraform apply
Query the Data
- Open up BigQuery and run the SQL in data/national_secondary_admissions_and_live_births.sql
- Save the results of the query as a view
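To show the shape of such a query without a BigQuery project, here is a rough sketch that uses Python's built-in sqlite3 module as a stand-in. It is not the contents of the SQL file above: every table, column, and view name is hypothetical, the rows are placeholders, and BigQuery's SQL dialect differs from SQLite's in places.

```python
import sqlite3

# All table, column, and view names below are hypothetical; the real schema
# is whatever the Terraform configuration and the SQL file in the repo define.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE live_births (year INTEGER, births INTEGER);
    CREATE TABLE admissions (
        entry_year INTEGER,
        applications INTEGER,
        first_choice_offers INTEGER
    );

    -- Placeholder rows for illustration only, not real figures.
    INSERT INTO live_births VALUES (2007, 100), (2008, 110);
    INSERT INTO admissions VALUES (2018, 120, 100), (2019, 130, 104);

    -- Join each intake year to births from 11 years earlier, kept as a view.
    CREATE VIEW admissions_vs_births AS
    SELECT a.entry_year,
           b.births,
           a.applications,
           ROUND(100.0 * a.first_choice_offers / a.applications, 1) AS first_choice_pct
    FROM admissions AS a
    JOIN live_births AS b ON b.year = a.entry_year - 11;
""")

# The view can now be queried like a table, much as a report would use it.
rows = conn.execute(
    "SELECT * FROM admissions_vs_births ORDER BY entry_year"
).fetchall()
for row in rows:
    print(row)
```

Saving the join as a view means the report always reads the same aligned columns, so the 11-year offset lives in one place.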
Create a visualisation in Data Studio
Using the view as a Data Source, create a report like this one!