Skip to content

npk7/Postgres_DataMart_Pipeline

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Data Modeling with PostgreSQL

Summary

Using a relational database can be very beneficial espescially in context of summarizing data, aggregating data and able to store it in normalized schema can enable that.

This project is developing a dimensional data model (star schema) for our Sparkify data to achieve fast querying to obtain summarized, aggregated data.

I am using PostgreSQL to acheive this.

Database Schema

I am using a dimensional data model which otherwise called star schema to store the data. Dimensional data models are different from the normalized relational data models. Standard relational data models are primarily used for transaction processing whereas dimensional data models serve analytic processing purposes.

We are using star schema that benefits measuring items and sorting, summarizing using various featuers/attribures. The features of the Sparkify or attributes will be stored in dimensional tables whereas the measures will be stored in fact table.

ETL Pipeline

I have a ETL piepline that processes data, to fill these tables, by extracting necessary information from the source documents, applying various transformations and loading it to the relational tables.

How to run the scripts

  1. To create python tables please execute the create_tables.py by > python create_tables.py
  2. To load the tables (by extracting and transforming) please execute etl.py by > python etl.py

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published