Data Warehouse on AWS with Redshift

The purpose of this project is to use Python to load data from an S3 bucket and wrangle it into a star schema data model (see the ERD).

Prerequisites

Install Python 3.x.
This project is built with conda instead of pip. Install Anaconda, or modify the scripts to use pip.
You also need an AWS Redshift cluster up and running (4 to 8 nodes suggested).
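
Once the cluster is running, the scripts need its connection details. The sketch below shows one possible way to keep them in an INI-style config and turn them into a connection string. The section name `CLUSTER`, the key names, and the idea of a config file at all are assumptions for illustration, not something this project is known to use. The resulting string is in the format accepted by PostgreSQL-compatible drivers such as psycopg2, which is also an assumption about the driver.

```python
import configparser


def build_dsn(config_text):
    """Build a libpq-style connection string from INI-formatted text.

    The [CLUSTER] section and its key names are hypothetical.
    """
    config = configparser.ConfigParser()
    config.read_string(config_text)
    cluster = config["CLUSTER"]
    return "host={} dbname={} user={} password={} port={}".format(
        cluster["HOST"],
        cluster["DB_NAME"],
        cluster["DB_USER"],
        cluster["DB_PASSWORD"],
        cluster["DB_PORT"],
    )


if __name__ == "__main__":
    # Placeholder values only; 5439 is the default Redshift port.
    example = """
[CLUSTER]
HOST = example-cluster.abc123.us-west-2.redshift.amazonaws.com
DB_NAME = dev
DB_USER = awsuser
DB_PASSWORD = change-me
DB_PORT = 5439
"""
    print(build_dsn(example))
```

Keeping credentials in a separate file that is listed in .gitignore avoids committing them by accident.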

Main Goal

The company Sparkify needs to analyze its data to better understand how its users (free/paid) use its services. With this data model we will be able to ask questions such as When? Who? Where? and What? about the data. The task is to build an ETL pipeline that extracts data from S3 and stages it in Redshift, then transforms it into a star schema (dimension and fact tables) so the analytics team can find insights easily.
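
The staging step described above could look something like the following sketch. It only builds the text of a Redshift COPY command and does not connect to anything. The table name, bucket path, IAM role ARN, region, and the choice of JSON as the file format are all assumptions for illustration and are not taken from this project.

```python
def build_copy_statement(table, s3_path, iam_role_arn,
                         region="us-west-2", json_option="auto"):
    """Return a Redshift COPY command that loads JSON files from S3
    into a staging table. All argument values are supplied by the caller."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"REGION '{region}'\n"
        f"FORMAT AS JSON '{json_option}';"
    )


if __name__ == "__main__":
    # Hypothetical names: replace with your own bucket and role.
    print(build_copy_statement(
        table="staging_events",
        s3_path="s3://example-bucket/log_data",
        iam_role_arn="arn:aws:iam::123456789012:role/exampleRedshiftRole",
    ))
```

COPY loads the files in parallel across the cluster nodes, which is why staging the raw data first and transforming it with SQL afterwards is usually preferred over inserting rows one at a time from Python.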

Data Model

This data model is called a star schema. At its center is a fact table, songplays, that contains facts about each song play, such as user agent, location, session and the user's level, along with foreign key (FK) columns that reference 4 dimension tables:

Songs table with data about songs
Artists table
Users table
Time table

This model keeps the number of SQL JOINs needed for a search to a minimum and enables fast read queries.
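
To make the shape of the star schema concrete, here is a minimal sketch that builds the five tables in an in-memory SQLite database and answers a "who played what, and when" question with one JOIN per dimension. SQLite is only a local stand-in for Redshift, the column names are assumptions rather than the project's actual DDL (see the ERD for the real columns), and the single row of data is made up.

```python
import sqlite3

# Illustrative columns only; the real tables are defined in the project.
SCHEMA = """
CREATE TABLE users   (user_id INTEGER PRIMARY KEY, first_name TEXT, level TEXT);
CREATE TABLE artists (artist_id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE songs   (song_id TEXT PRIMARY KEY, title TEXT, artist_id TEXT);
CREATE TABLE time    (start_time TEXT PRIMARY KEY, hour INTEGER);
CREATE TABLE songplays (
    songplay_id INTEGER PRIMARY KEY,
    start_time  TEXT    REFERENCES time(start_time),
    user_id     INTEGER REFERENCES users(user_id),
    song_id     TEXT    REFERENCES songs(song_id),
    artist_id   TEXT    REFERENCES artists(artist_id),
    level       TEXT,
    session_id  INTEGER,
    location    TEXT,
    user_agent  TEXT
);
"""

# Each dimension is exactly one JOIN away from the fact table.
WHO_PLAYED_WHAT = """
SELECT u.first_name, s.title, a.name, t.hour
FROM songplays sp
JOIN users   u ON sp.user_id    = u.user_id
JOIN songs   s ON sp.song_id    = s.song_id
JOIN artists a ON sp.artist_id  = a.artist_id
JOIN time    t ON sp.start_time = t.start_time
"""


def build_demo_db():
    """Create the schema in memory and insert one made-up song play."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO users VALUES (1, 'Ada', 'paid')")
    conn.execute("INSERT INTO artists VALUES ('AR1', 'Demo Artist')")
    conn.execute("INSERT INTO songs VALUES ('SO1', 'Demo Song', 'AR1')")
    conn.execute("INSERT INTO time VALUES ('2018-11-01 21:00:00', 21)")
    conn.execute(
        "INSERT INTO songplays VALUES "
        "(1, '2018-11-01 21:00:00', 1, 'SO1', 'AR1', 'paid', 42, "
        "'Paris', 'Mozilla/5.0')"
    )
    conn.commit()
    return conn


def who_played_what(conn):
    """Return (user, song, artist, hour) tuples for every song play."""
    return conn.execute(WHO_PLAYED_WHAT).fetchall()


if __name__ == "__main__":
    print(who_played_what(build_demo_db()))
```

Because every dimension hangs directly off songplays, no query has to chain through intermediate tables to reach users, songs, artists or time.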

Run it

It takes two steps:

Launch create_tables.py to prepare the database
Run etl.py to wrangle the data
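
As a rough idea of what the table preparation step involves, the sketch below drops and recreates tables by running a list of SQL statements through a DB-API connection. It is not the contents of create_tables.py: the function names, the query lists, and the single example table are hypothetical, and an in-memory SQLite database stands in for the Redshift connection so the example runs locally.

```python
import sqlite3

# Hypothetical query lists; the project's real statements live in its own files.
DROP_QUERIES = ["DROP TABLE IF EXISTS users"]
CREATE_QUERIES = ["CREATE TABLE IF NOT EXISTS users (user_id INTEGER, level TEXT)"]


def run_queries(conn, queries):
    """Execute each SQL statement in order, then commit once at the end."""
    cur = conn.cursor()
    for query in queries:
        cur.execute(query)
    conn.commit()
    cur.close()


def prepare_database(conn):
    """Drop existing tables first so the script can be re-run safely."""
    run_queries(conn, DROP_QUERIES)
    run_queries(conn, CREATE_QUERIES)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")  # stand-in for Redshift
    prepare_database(connection)
    prepare_database(connection)  # running twice does not raise
    connection.close()
```

Dropping before creating makes the preparation step repeatable, which is useful while the schema is still changing.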
