Skip to content

Applied Data Science with Python - Coursera Specialization from University of Michigan

Notifications You must be signed in to change notification settings

kellwyn/kellwyn.github.io

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

66 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Purpose of the Database:

Sparkify, a music streaming service wants to analyze and identify ways to best monetize and improve their platform. THey are interested in utilizing data points collected from users activity on the app, and build a database that allows running queries for reporting and analytical purposes.

The metadata source is a subset of the Million Song Dataset. There are 2 main directories available to build from, the Songs metadata which forms the detailed information on relevant song attributes such as artist name, location, year of release etc. The second source is a Logs dataset logging user information attributes and their activity.

Star schema is used for dimensional modeling, as we can easily query the relevant information using simplified queries and faster aggregation.It also allows for denormalized tables.

Fact table: songplays Dimensions tables: songs, artist, users, time.

In addition to the data files, the project includes six files:

test.ipynb displays the first few rows of each table to let me check my database. create_tables.py drops and creates tables. I run this file to reset my tables before each time I run the ETL scripts. etl.ipynb reads and processes a single file from song_data and log_data and loads the data into the tables. This notebook contains detailed instructions on the ETL process for each of the tables. etl.py reads and processes files from song_data and log_data and loads them into the tables. It's based on my work in the ETL notebook. sql_queries.py contains all my sql queries, and is imported into the last three files above. README.md then provides an introduction to this project.

Examples %load_ext sql %sql postgresql://student:student@127.0.0.1/sparkifydb %%sql SELECT COUNT() from songplays; postgresql://student:**@127.0.0.1/sparkifydb

1 rows affected.

Out[1]: count 6820

About

Applied Data Science with Python - Coursera Specialization from University of Michigan

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages