GitHub - kellwyn/kellwyn.github.io: Applied Data Science with Python - Coursera Specialization from University of Michigan

Purpose of the Database:

Sparkify, a music streaming service, wants to analyze its data and identify the best ways to monetize and improve its platform. The company is interested in using data points collected from user activity on the app to build a database that supports queries for reporting and analytical purposes.

The metadata source is a subset of the Million Song Dataset. There are two main directories to build from. The first is the song metadata, which holds detailed song attributes such as artist name, location, and year of release. The second is a log dataset that records user attributes and user activity.

A star schema is used for dimensional modeling because it lets us retrieve the relevant information with simple queries and fast aggregation. It also allows for denormalized tables.

Fact table: songplays

Dimension tables: songs, artist, users, time
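To illustrate why a star schema keeps analytical queries simple, here is a minimal sketch that joins the fact table to one dimension table and aggregates the result. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for PostgreSQL so that it runs anywhere. The column names (songplay_id, user_id, song_id, first_name, level) and the sample rows are assumptions made for illustration, not the project's actual schema.

```python
import sqlite3

# In-memory SQLite stands in for the PostgreSQL database (assumption for portability).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical, simplified columns; the real tables may carry more attributes.
cur.execute(
    "CREATE TABLE users (user_id INTEGER PRIMARY KEY, first_name TEXT, level TEXT)"
)
cur.execute("""
    CREATE TABLE songplays (
        songplay_id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users (user_id),
        song_id TEXT
    )
""")

# Made-up sample rows, only to give the query something to aggregate.
cur.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "Ava", "paid"), (2, "Ben", "free")],
)
cur.executemany(
    "INSERT INTO songplays VALUES (?, ?, ?)",
    [(1, 1, "S1"), (2, 1, "S2"), (3, 2, "S1")],
)

# A single join from the fact table to a dimension answers
# "how many plays per subscription level?".
cur.execute("""
    SELECT u.level, COUNT(*) AS plays
    FROM songplays AS sp
    JOIN users AS u ON u.user_id = sp.user_id
    GROUP BY u.level
    ORDER BY u.level
""")
plays_by_level = cur.fetchall()
print(plays_by_level)
conn.close()
```

Each further dimension (songs, artist, time) would be reached the same way, with one join from songplays on its key.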

In addition to the data files, the project includes six files:

test.ipynb displays the first few rows of each table to let me check my database.

create_tables.py drops and creates tables. I run this file to reset my tables each time before I run the ETL scripts.

etl.ipynb reads and processes a single file from song_data and log_data and loads the data into the tables. This notebook contains detailed instructions on the ETL process for each of the tables.

etl.py reads and processes files from song_data and log_data and loads them into the tables. It's based on my work in the ETL notebook.

sql_queries.py contains all my SQL queries, and is imported into the last three files above.

README.md provides an introduction to this project.
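The reset step performed by create_tables.py can be sketched roughly as follows. This is not the project's actual code: the query lists, the reset_tables helper, and the table columns are hypothetical, and sqlite3 is used in place of a PostgreSQL connection so the example is self-contained. The pattern is what matters: run every DROP statement, then every CREATE statement, so each ETL run starts from empty tables.

```python
import sqlite3

# Hypothetical query lists; in the project these would be defined in sql_queries.py.
drop_table_queries = [
    "DROP TABLE IF EXISTS songplays",
    "DROP TABLE IF EXISTS users",
]
create_table_queries = [
    "CREATE TABLE IF NOT EXISTS users (user_id INTEGER PRIMARY KEY, first_name TEXT)",
    "CREATE TABLE IF NOT EXISTS songplays (songplay_id INTEGER PRIMARY KEY, user_id INTEGER)",
]


def reset_tables(conn):
    """Drop every table, then recreate it empty (hypothetical helper)."""
    cur = conn.cursor()
    for query in drop_table_queries + create_table_queries:
        cur.execute(query)
    conn.commit()


# In-memory SQLite stands in for the PostgreSQL database (assumption).
conn = sqlite3.connect(":memory:")
reset_tables(conn)
conn.execute("INSERT INTO users VALUES (1, 'Ava')")

# Running the reset again wipes the row inserted above.
reset_tables(conn)
remaining = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(remaining)
```

Keeping the SQL strings in one module and importing them elsewhere means the schema is defined in a single place, which matches the role the text gives to sql_queries.py.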

Examples

%load_ext sql
%sql postgresql://student:student@127.0.0.1/sparkifydb

%%sql
SELECT COUNT(*) FROM songplays;

postgresql://student:**@127.0.0.1/sparkifydb
1 rows affected.

Out[1]:
count
6820

Repository contents (66 commits):

Applied ML with Python
CricketIPLProj
Data Science with Python
AntarcticSeawatersurface.csv
Assignment3 Building a Custom Visualization.ipynb
Assignment4 DataScience Project NZ Avg temperature Vs Ocean Surface temperatures.ipynb
Assignment4Visualizations_ProjectSubmission.docx
Assignment4Visualizations_ProjectSubmissionpdf.pdf
AvgTemperature.csv
Correlating NZTemp with Ocean surface waters.ipynb
Coursera Certification Course 1.pdf
Coursera QCZU692SW363 Certificate Course 2.pdf
Data Engineering project - Data modeling with Postgresql.zip
Pandas Assignment 4.pdf
Pandas Assignment Week 3, Reattempt.pdf
PlottingWeatherPatternsAssignment2.ipynb
Project 1 - Data Modeling with Postgres - Final Submission.zip
PythonCourse2PractiseNotebook.ipynb
README.md
TasmanSeawatersurface.csv
UnderstandingDistributionsThroughSampling.ipynb
Week 1.ipynb
Week 2.ipynb
Week 3.ipynb
Week 4.ipynb
_config.yml
create_tables.py
etl.ipynb
etl.py
sql_queries.py
test.ipynb
watersurface.csv
