Skip to content

ntalib/Data-Modeling-with-PostgreSQL

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

32 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Data Modeling with PostgreSQL

Overview

This is a Data Modeling project using PostgreSQL. This project builds an ETL pipeline using Python using generated song data. The datas are in json and this project analyzes the songs to discern what songs the users are listening to.

The Dataset

This project uses songs generated from Million Song Dataset

Log Dataset

The log dataset is generated by Event Simulator

Project Schema

Fact Table

songplays : records song playes in the log data, records with NextSong

songplay_id, start_time, user_id, level, song_id, artist_id, session_id, location, user_agent

Dimension Tables

users : users in the app

user_id, first_name, last_name, gender, level

songs : songs in music database

song_id, title, artist_id, year, duration

artists : artists in music database

artist_id, name, location, latitude, longitude

time : timestamps of records in songplays broken down into specific units

start_time, hour, day, week, month, year, weekday

Project Documents

sql_queries.py : contains sql queries for dropping and creating fact and dimension tables. Also, contains insertion query template.

create_tables.py : contains code for setting up database. Running this file creates sparkifydb and also creates the fact and dimension tables.

etl.ipynb : a jupyter notebook to analyse dataset before loading.

etl.py : read and process song_data and log_data

test.ipynb : a notebook to connect to postgres db and validate the data loaded.

Environments used

Python 3.6 or above

PostgresSQL 9.5 or above

psycopg2 - PostgreSQL database adapter for Python

Run

Run the drive program main.py as below.

python main.py

The create_tables.py and etl.py file can also be run independently as below:

python create_tables.py 
python etl.py 

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published