Song Play Data Analysis

The ETL process reads every file in the data folder, processes its contents, and transforms them for insertion into the database. The songs and artists tables are populated from the song_data files. The songplays table is populated from the log_data files together with two attributes, song_id and artist_id, taken from the songs and artists tables. These IDs are found by joining songs and artists on artist_id and keeping the rows where the song title, artist name, and song duration equal the corresponding log_data attributes; in this dataset that match happens only once. The process also converts the timestamp column of the log data, which is stored in milliseconds, to datetime so that it can be inserted into the time table.
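The lookup and the timestamp conversion described above can be sketched as follows. This is a minimal illustration, not the code in etl.py: SQLite stands in for Postgres so the sketch runs without a server, the column names title, name and duration and the log fields ts, song, artist and length are assumptions, and the sample rows are invented. Returning None for events with no match is also an assumption.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_datetime(ts_ms):
    """Convert an epoch timestamp in milliseconds to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=ts_ms)


# SQLite is used here only as a stand-in for the Postgres database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artists (artist_id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE songs (song_id TEXT PRIMARY KEY, title TEXT,
                        artist_id TEXT, duration REAL);
    -- Invented sample rows for illustration only.
    INSERT INTO artists VALUES ('AR1', 'Example Artist');
    INSERT INTO songs VALUES ('SO1', 'Example Song', 'AR1', 200.5);
""")

# Join songs and artists on artist_id, then filter on the log attributes.
SONG_SELECT = """
    SELECT s.song_id, a.artist_id
    FROM songs AS s
    JOIN artists AS a ON s.artist_id = a.artist_id
    WHERE s.title = ? AND a.name = ? AND s.duration = ?
"""


def lookup_ids(song, artist, length):
    """Return (song_id, artist_id) for a log event, or (None, None) if unmatched."""
    row = conn.execute(SONG_SELECT, (song, artist, length)).fetchone()
    return row if row else (None, None)


# Hypothetical log event; field names are assumed, not taken from log_data.
log_event = {"ts": 1541903636796, "song": "Example Song",
             "artist": "Example Artist", "length": 200.5}

start_time = ms_to_datetime(log_event["ts"])
song_id, artist_id = lookup_ids(log_event["song"], log_event["artist"],
                                log_event["length"])
```

Because the filter requires title, artist name and duration to agree at once, most log events would fall into the unmatched branch, which is consistent with the single match noted above.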

Project Files

This project consists of the following files:

data - All the data collected on songs and user activity in Sparkify's new music streaming app.
sql_queries.py - This file contains the Postgres SQL queries in string format.
create_tables.py - This script uses the sql_queries.py file to create new tables or drop old tables in the database.
etl.py - This script runs the ETL process: it reads every file in the data folder, processes its data, and transforms it for insertion into the database using the query variables in sql_queries.py.
etl.ipynb - This notebook walks through each step of etl.py so that every step can be tried out before running the whole script.
test.ipynb - This notebook is used to run tests on the database.
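
As an illustration of how a script such as etl.py might collect every file in the data folder, here is a minimal sketch. The helper name get_files and the assumption that the data files end in .json are hypothetical; the actual script may discover files differently.

```python
import os


def get_files(root, extension=".json"):
    """Walk root recursively and return the absolute paths of matching files."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extension):
                matches.append(os.path.abspath(os.path.join(dirpath, name)))
    # Sort so that files are processed in a reproducible order.
    return sorted(matches)
```

Each returned path could then be handed to a per-file processing function, one for song files and one for log files.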

How To Run

First, run create_tables.py in the terminal to create the tables, dropping them first if they already exist. To confirm that everything was created, use the test.ipynb notebook. Second, run etl.py to execute the ETL process. Finally, to confirm that everything works, use the test.ipynb notebook to run tests on the database.
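
The run order above can also be scripted. The following is a small sketch, assuming both scripts sit in the current working directory and that a Postgres server is already reachable; the helper names build_commands and run_all are hypothetical and not part of the project.

```python
import subprocess
import sys

# Order matters: the tables must exist before the ETL script inserts rows.
SCRIPTS = ["create_tables.py", "etl.py"]


def build_commands(scripts=SCRIPTS):
    """Return one command per script, using the current Python interpreter."""
    return [[sys.executable, script] for script in scripts]


def run_all(scripts=SCRIPTS):
    """Run each script in order and stop at the first failure."""
    for command in build_commands(scripts):
        subprocess.run(command, check=True)
```

Calling run_all() from the project root would run create_tables.py and then etl.py; check=True raises an error if either script exits with a non-zero status, so etl.py never runs against missing tables.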

