The main idea of this project is to implement a Data Lake and a Data Warehouse using data provided by the football-data.org API.
You can check the progress of the project through the issues and commit history in the development
branch.
The main tools that I used to build this project are provided by Google.
The project is intended to stay within the free tier quota of each tool.
- Python - This programming language will be the core of the project, mainly to make API requests.
- Google Cloud Storage (GCS) - Where all our data will be stored, as either .csv or .json files.
- BigQuery - This engine will give us a data warehouse built on top of our data lake.
- Apache Spark - Data processing engine. This tool will be responsible for processing our data and moving it to more refined layers.
- Looker - Data visualization tool. It is important for presenting the results, building dashboards, and supporting decision making.
I'm going to use data from the football-data API.
This API returns all data in .json format and covers a wide range of football data from around the world (leagues, matches, teams, players, etc.).
For my use case, I will design the project using only data from the Brazilian league (BSA).
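
As an illustration of the first step (requesting BSA data and keeping the raw .json response), here is a minimal sketch using only the Python standard library. It is not the project's actual code: the `v4` base URL, the `X-Auth-Token` header, and the `matches` resource are my reading of the football-data.org API and should be checked against its documentation, and the function names are hypothetical.

```python
import json
import urllib.request

# Assumptions: base URL and API version; verify against the football-data.org docs.
BASE_URL = "https://api.football-data.org/v4"
COMPETITION = "BSA"  # Brazilian league code used in this project


def build_request(resource: str, token: str) -> urllib.request.Request:
    """Build an authenticated GET request for one resource of the competition."""
    url = f"{BASE_URL}/competitions/{COMPETITION}/{resource}"
    # Assumption: the API token is sent in the X-Auth-Token header.
    return urllib.request.Request(url, headers={"X-Auth-Token": token})


def fetch_json(resource: str, token: str) -> dict:
    """Call the API and decode the JSON body of the response."""
    with urllib.request.urlopen(build_request(resource, token), timeout=30) as resp:
        return json.load(resp)


def save_raw(payload: dict, path: str) -> None:
    """Write the payload unchanged to a local .json file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(payload, fh, ensure_ascii=False)


# Example usage (needs a valid token and network access):
# save_raw(fetch_json("matches", "<your-token>"), "bsa_matches.json")
```

The saved file could then be uploaded to a GCS bucket as-is, so the data lake keeps an untouched copy of what the API returned before Spark refines it.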