ETL data pipeline for SixFifty modelling & analytics.
Dataset | Date | Format | Source | Licence | Download URL | Repo Path |
---|---|---|---|---|---|---|
UK Parliament general election results | 6th May 2010 | XLS | Electoral Commission | Open Government Licence v2.0 | GE2010-results-flatfile-website.xls | /data/general_election/electoral_commission/results/ |
UK Parliament general election results | 7th May 2015 | CSV - Zip file | Electoral Commission | Open Government Licence v2.0 | 2015-UK-general-election-data-results-WEB.zip | /data/general_election/electoral_commission/results/ |
EU Referendum results | 23rd June 2016 | CSV | Electoral Commission | Open Government Licence v2.0 | EU-referendum-result-data.csv | /data/eu_referendum/electoral_commission/results/ |
We aim to provide our processed datasets in both CSV and Feather formats.
Dataset | Description | Download URL | Repo Path |
---|---|---|---|
ge_2010_results | Cleaner version of 2010 GE data | CSV, Feather | data/general_election/electoral_commission/results/clean/ge_2010_results.csv |
ge_2015_results | Cleaner version of 2015 GE data | CSV, Feather | data/general_election/electoral_commission/results/clean/ge_2015_results.csv |
model_2015 | Clean version of 2015 GE data along with counties and EU Referendum results at a regional level | CSV, Feather | data/model/clean/model_2015.csv |
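
As a minimal sketch of how one of the processed CSV files might be read, the helper below uses only the Python standard library. The function name `read_processed_csv` is hypothetical and not part of this repo, and no column names are assumed: each row comes back as a dictionary keyed by whatever header the file contains. The Feather versions would need a library such as pandas (`pandas.read_feather`, which relies on pyarrow) instead.

```python
import csv
from pathlib import Path


def read_processed_csv(path):
    """Read a processed CSV into a list of dicts keyed by the header row.

    Hypothetical helper for illustration; it is not part of this repo.
    All values are returned as strings, exactly as stored in the file.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Once the pipeline has generated the data, something like `read_processed_csv("data/model/clean/model_2015.csv")` run from the repo root should return one dictionary per row.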
A manually curated set of poll results can be downloaded in a variety of formats. See data/polls/ for more information including a data dictionary.
The BBC debate speaker dataset, created by SixFifty, includes timestamps of when each person was speaking and for how long. See data/bbcdebate/speakers/ for more information including a data dictionary.
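
To illustrate the kind of analysis speaker timestamps allow, here is a rough sketch that totals speaking time per person. The `(speaker, start_seconds, end_seconds)` layout and the placeholder names are assumptions made for this example only; the real column names are defined in the data dictionary under data/bbcdebate/speakers/.

```python
from collections import defaultdict


def total_speaking_time(segments):
    """Sum the duration of every segment for each speaker.

    `segments` is an iterable of (speaker, start_seconds, end_seconds)
    tuples. This layout is hypothetical; check the data dictionary for
    the actual fields before adapting it.
    """
    totals = defaultdict(float)
    for speaker, start, end in segments:
        if end < start:
            raise ValueError(f"Segment for {speaker!r} ends before it starts")
        totals[speaker] += end - start
    return dict(totals)


# Placeholder segments, not taken from the real dataset.
example = [
    ("Speaker A", 0.0, 30.0),
    ("Speaker B", 30.0, 45.0),
    ("Speaker A", 45.0, 60.0),
]
print(total_speaking_time(example))  # {'Speaker A': 45.0, 'Speaker B': 15.0}
```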
- Check you're running Python 3.
- Ensure you have the Python requirements installed with:
pip install -r requirements.txt
- Then cd into the repo root (where this README is located) and run the following to download the data, populate this repo with it and auto-clean it ready for modelling:
python data/generate_data.py
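
The first step above (checking for Python 3) can also be done from within Python. This is only a sketch: `require_python3` is a hypothetical helper, not something `data/generate_data.py` is known to contain.

```python
import sys


def require_python3(version_info=sys.version_info):
    """Raise a clear error if the interpreter is older than Python 3.

    Hypothetical guard for illustration; `version_info` is a parameter
    only so the check can be exercised with other version tuples.
    """
    if version_info[0] < 3:
        found = ".".join(str(part) for part in version_info[:3])
        raise RuntimeError(f"Python 3 is required, found {found}")
    return True


# Passes silently when run under any Python 3 interpreter.
require_python3()
```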
Please see these instructions on installing Anaconda + dependencies + configuring S3 tokens.
Name | Description | Attribution Statement |
---|---|---|
Open Parliament Licence | Free to copy, publish, distribute, transmit, adapt and exploit commercially or non-commercially. See URL for full details. | Contains Parliamentary information licensed under the Open Parliament Licence v3.0. |
Open Government Licence | Free to copy, publish, distribute, transmit, adapt and exploit commercially and non-commercially. See URL for full details. | Contains public sector information licensed under the Open Government Licence v2.0. |