gcp-datalake

Summary

About
Tools
Data

About

The main idea of this project is to implement a Data Lake and a Data Warehouse using data provided by the football-data.org API.

You can check the progress of the project through the issues and commit history in the development branch.

Tools

Most of the main tools I use in this project are provided by Google.

The project is intended to stay within the free tier quota of each tool.

Python - This programming language will be the core of the project, mainly to make API requests.
Google Cloud Storage (GCS) - Where all our data will be stored, as either .csv or .json files.
BigQuery - This engine will give us a data warehouse connected to our data lake.
Apache Spark - Data processing engine. This tool will be responsible for processing our data and moving it to more refined layers.
Looker - Data visualization tool. It is important for presenting the results, and it makes it possible to generate dashboards and support decision making.
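
To make the idea of layers in GCS a bit more concrete, here is a minimal sketch of how object names could be built for each layer. The layer names (`raw`, `refined`), the path layout and the bucket name are assumptions for illustration only; the project does not define them yet.

```python
from datetime import date

# Hypothetical layer names: the README only mentions "more refined layers".
LAYERS = ("raw", "refined")


def object_name(layer, competition, resource, day, ext="json"):
    """Build an object name such as raw/BSA/matches/2024-05-01.json."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    return f"{layer}/{competition}/{resource}/{day.isoformat()}.{ext}"


def gcs_uri(bucket, name):
    """Return the gs:// URI for an object in the given bucket."""
    return f"gs://{bucket}/{name}"


# "my-datalake-bucket" is a placeholder, not a real bucket of this project.
raw_name = object_name("raw", "BSA", "matches", date(2024, 5, 1))
raw_uri = gcs_uri("my-datalake-bucket", raw_name)
```

Keeping the layer as the first path segment would let Spark read a whole layer with a single prefix, but any other layout works as long as it is consistent.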

Data

I'm going to use data from the football-data.org API.

This API provides all data in .json format and covers a wide range of football data from around the world (leagues, matches, teams, players, etc.).

For my use case, I will design the project using only data from the Brazilian league (BSA).
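
As a rough illustration of the API requests mentioned above, this is a minimal sketch of how a request for BSA data could be prepared in Python. The base URL (`/v4`), the `competitions/{code}/{resource}` path and the `X-Auth-Token` header are my assumptions about the football-data.org API and should be checked against its documentation; `build_request` and the token value are placeholders.

```python
from urllib.request import Request

# Assumed base URL; confirm the current API version in the official docs.
BASE_URL = "https://api.football-data.org/v4"


def build_request(competition, resource, token):
    """Prepare (but do not send) a GET request for one competition resource."""
    url = f"{BASE_URL}/competitions/{competition}/{resource}"
    # Assumption: the API key is sent in the X-Auth-Token header.
    return Request(url, headers={"X-Auth-Token": token})


# "YOUR_API_TOKEN" is a placeholder; nothing is sent over the network here.
request = build_request("BSA", "matches", "YOUR_API_TOKEN")
```

To actually fetch the data, the prepared request can be passed to `urllib.request.urlopen` and the response body parsed with `json.load`, before saving the .json file to GCS.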
