soutothales/gcp-datalake

gcp-datalake

Summary

  1. About
  2. Tools
  3. Data

About

This project implements a Data Lake and a Data Warehouse using data from the football-data.org API.

You can follow the project's progress through the issues and the commit history on the development branch.

Tools

The main tools I used for this project are provided by Google.

The project is intended to stay within the free tier quota of each tool.

  1. Python - This programming language will be the core of the project, mainly for making API requests.
  2. Google Cloud Storage (GCS) - Where all our data will be stored, as either .csv or .json files.
  3. BigQuery - This engine will give us a data warehouse attached to our data lake.
  4. Apache Spark - Data processing engine. This tool will be responsible for processing our data and moving it to more refined layers.
  5. Looker - Data visualization tool. It is important for presenting the results, building dashboards, and supporting decision making.
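
As a rough illustration of how .json data kept in GCS might be prepared for BigQuery, the sketch below turns an API-style payload into newline-delimited JSON, which is the JSON layout BigQuery load jobs expect. The `to_ndjson` helper, the `matches` key, and the sample payload are assumptions made for this example and are not taken from the repository.

```python
import json


def to_ndjson(payload, records_key="matches"):
    """Return one JSON object per line for the records found in `payload`."""
    # `records_key` is an assumed name for the list of records in the payload.
    records = payload.get(records_key, [])
    return "\n".join(
        json.dumps(record, ensure_ascii=False, sort_keys=True) for record in records
    )


# Made-up sample payload, only to show the shape of the output.
sample = {
    "matches": [
        {"id": 1, "status": "FINISHED"},
        {"id": 2, "status": "SCHEDULED"},
    ]
}
ndjson = to_ndjson(sample)
print(ndjson)
```

Each line of the result is a standalone JSON object, so the file can be uploaded to a bucket as-is and loaded without further reshaping.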

Data

I'm going to use data from the football-data API.

This API provides all data in .json format and covers a wide range of football data from around the world (leagues, matches, teams, players, etc.).

For my use case, I will design the project using only data from the Brazilian league (BSA).
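
A minimal sketch of how a request for BSA data could be built in Python follows. It assumes version 4 of the football-data.org API, its `X-Auth-Token` header, and a matches endpoint under `/competitions/BSA`. The `build_matches_request` helper and the placeholder token are hypothetical, and the request is only constructed, not sent.

```python
import urllib.request

# Assumed base URL for the API; adjust if a different version is used.
BASE_URL = "https://api.football-data.org/v4"


def build_matches_request(api_token, competition="BSA"):
    """Build (without sending) a GET request for a competition's matches."""
    url = f"{BASE_URL}/competitions/{competition}/matches"
    # The token is passed in the X-Auth-Token header.
    return urllib.request.Request(url, headers={"X-Auth-Token": api_token})


# "YOUR_API_TOKEN" is a placeholder, not a working credential.
request = build_matches_request("YOUR_API_TOKEN")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` with a valid token would return the .json body, which could then be written to GCS.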
