# In Class Activity - Spotify Database - SQL Practice

Instructor: Melissa Laurino<br>
Spring 2025<br>

Name:
<br>
Date:
<br>
<br>

In [19]:
# Load necessary packages:
from sqlalchemy import create_engine, inspect, text # Database navigation
import sqlite3 # A second option for working with databases
import pandas as pd # Python data manipulation

In [20]:
# Create an engine that connects to the SQLite database file
db_file = "spotify_data.db"
engine = create_engine(f"sqlite:///{db_file}")

In [21]:
# Inspect the database to list the fields
inspector = inspect(engine)
columns = inspector.get_columns("spotify_history")

# Print column names
print("Columns in spotify_history table:")
for col in columns:
    print(col["name"], "-", col["type"])

Columns in spotify_history table:
ts - TEXT
platform - TEXT
ms_played - BIGINT
conn_country - TEXT
ip_addr - TEXT
master_metadata_track_name - TEXT
master_metadata_album_artist_name - TEXT
master_metadata_album_album_name - TEXT
spotify_track_uri - TEXT
episode_name - FLOAT
episode_show_name - FLOAT
spotify_episode_uri - FLOAT
audiobook_title - FLOAT
audiobook_uri - FLOAT
audiobook_chapter_uri - FLOAT
audiobook_chapter_title - FLOAT
reason_start - TEXT
reason_end - TEXT
shuffle - BOOLEAN
skipped - BOOLEAN
offline - BOOLEAN
offline_timestamp - FLOAT
incognito_mode - BOOLEAN
year - BIGINT


For practice, this database contains only one table, named spotify_history.
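A quick way to confirm which tables a SQLite file contains is to query its built-in `sqlite_master` catalog. The sketch below uses the `sqlite3` module and an in-memory stand-in database (an assumption for illustration, not the real `spotify_data.db`), so it runs anywhere:

```python
import sqlite3

# Stand-in for spotify_data.db: an in-memory database with a single table.
# The two columns here are a shortened, illustrative subset of the real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spotify_history (ts TEXT, ms_played INTEGER)")

# sqlite_master is SQLite's catalog of schema objects (tables, indexes, views).
rows = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()
table_names = [name for (name,) in rows]
print(table_names)

conn.close()
```

Pointing `sqlite3.connect` at the real database file instead of `":memory:"` should list its tables in the same way.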

Metadata for this database can be found here: https://github.com/MelissaLaurino/SpotifyStreamingHistory

We can use COUNT( * ) to summarize and count occurrences in SQL. Use COUNT( * ) below:
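As a minimal sketch of the pattern, the example below counts rows per group on a tiny in-memory table. The table `plays`, its columns, and the artist names are hypothetical and exist only to show how `COUNT(*)` combines with `GROUP BY` and `ORDER BY`:

```python
import sqlite3

# Hypothetical toy table: one row per play.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plays (artist TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO plays VALUES (?, ?)",
    [("Artist A", 2024), ("Artist B", 2024), ("Artist A", 2024), ("Artist A", 2023)],
)

# COUNT(*) counts the rows in each group formed by GROUP BY.
counts = conn.execute(
    """
    SELECT artist, COUNT(*) AS play_count
    FROM plays
    WHERE year = 2024
    GROUP BY artist
    ORDER BY play_count DESC
    """
).fetchall()
print(counts)

conn.close()
```

Giving the count an alias such as `play_count` keeps the `ORDER BY` clause and the resulting column name easier to read.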

Query 1: <br>
Find the top 30 artists listened to in the year 2024.

In [22]:
with engine.connect() as connection:
    query = text("""
        SELECT master_metadata_album_artist_name, COUNT(*)
        FROM spotify_history
        WHERE year = 2024
        GROUP BY master_metadata_album_artist_name
        ORDER BY COUNT(*) DESC
        LIMIT 30;
    """)
    top_30_artists_in_2024 = pd.read_sql(query, connection)

top_30_artists_in_2024

| | master_metadata_album_artist_name | COUNT(*) |
|---|---|---|
| 0 | Miley Cyrus | 680 |
| 1 | Ariana Grande | 583 |
| 2 | Hozier | 512 |
| 3 | Eminem | 454 |
| 4 | Sabrina Carpenter | 444 |
| 5 | Lady Gaga | 326 |
| 6 | Billie Eilish | 284 |
| 7 | Noah Cyrus | 282 |
| 8 | Amy Winehouse | 266 |
| 9 | Teddy Swims | 257 |


Query 2: <br>
Find the top 30 songs listened to in the year 2017.

In [23]:
with engine.connect() as connection:
    query = text("""
        SELECT master_metadata_track_name, COUNT(*)
        FROM spotify_history
        WHERE year = 2017
        GROUP BY master_metadata_track_name
        ORDER BY COUNT(*) DESC
        LIMIT 30;
    """)
    top_30_songs_in_2017 = pd.read_sql(query, connection)

top_30_songs_in_2017

| | master_metadata_track_name | COUNT(*) |
|---|---|---|
| 0 | Malibu | 223 |
| 1 | Close | 150 |
| 2 | Shape of You | 147 |
| 3 | It Ain’t Me (with Selena Gomez) | 117 |
| 4 | Take Me Down | 95 |
| 5 | Rainbow | 95 |
| 6 | Havana (feat. Young Thug) | 95 |
| 7 | Week Without You | 88 |
| 8 | Anyway | 86 |
| 9 | Love Someone | 82 |


Query 3: <br>
Ask a question and create your own! 

In [24]:
#Disconnect from the database. Always remember to disconnect :) 
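In this notebook, each `with engine.connect()` block already closes its connection when the block exits, and calling `engine.dispose()` releases any connections the SQLAlchemy engine still holds in its pool. The same idea with the `sqlite3` module, sketched on an in-memory database (the `demo` table is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (x INTEGER)")

# Release the connection once the work is done.
conn.close()

# A closed connection can no longer run queries.
try:
    conn.execute("SELECT 1")
    closed = False
except sqlite3.ProgrammingError:
    closed = True
print(closed)
```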


# Assignment #5 - Data Gathering and Warehousing - DSSA-5102

<b>Only Murders in the...Database?</b><br>
An introduction to navigating SQL databases using R and Jupyter Notebook. <br>
<br>
Congrats! You have solved the murder from Assignment #4, let's practice more queries in SQL City! <br>
<br>
For <b>Assignment #5</b>, you are a data scientist who was hired by SQL City. Your objectives are as follows:<br>
- Objective 1: The town is willing to fund more training for officers in SQL City based on the type of crime that is committed most often. The training would help them identify the clues that indicate these two types upon arrival on a crime scene. As a data scientist, what crime type would you advise needs more training for officers in SQL City? What crime was committed the most in SQL City within the database date range? Save your query as a dataframe and quickly add a ggplot2 bar graph visual to support your recommendation.<br>
- Objective 2: The town has also received more funding to encourage DOUBLE overtime for officers in SQL City during the month with the highest crime rate throughout the date range of the database. What month would you advise the town to encourage officer overtime? <br>
- Objective 3: To thank the officers for their extra training and overtime, the town will pay for their monthly gym membership. They want to give the officers the membership that the fewest civilians have, so that the officers avoid being recognized daily as the town heroes. The membership can be used in any town. What membership does the town give them?<br><br>
<b>--</b>Add detailed comments to explain EVERY query or SQL command you use while we are still learning and practicing. I have my steps outlined, but please add more cells in between for additional queries! There is no limit on the number of queries you can use. <br>
<b>--</b>For each query include comments such as "SELECT all records FROM table WHERE column name = X"<br>
<b>--</b>Answer the prompts in markdown cells. Justify your response. A simple yes/no answer will receive no credit.<br>

Recommended Readings: Chapters 4-7 in Getting Started with SQL by Thomas Nield<br><br>

SQL Dictionary: https://www.w3schools.com/sql/sql_ref_join.asp

This fabulous database was created by @NUKnightLab on GitHub and can be found here: https://github.com/NUKnightLab/sql-mysteries


In [25]:
# Load necessary libraries


In [26]:
# Connect to our .db file


In [27]:
# For a quick reference for tables and columns, refer to schema on Blackboard, or list the tables and fields below:


We can use the COUNT command to explore our queries further. COUNT will count the number of records that meet the specified criteria.
Additional examples using COUNT: https://www.w3schools.com/sql/sql_count.asp
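To illustrate how the criteria change what COUNT returns, here is a minimal sketch written in Python with `sqlite3` rather than R. The table `reports`, its columns, and its rows are hypothetical toy data, not the contents of the assignment database:

```python
import sqlite3

# Hypothetical toy data; one row has no city recorded (NULL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (type TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO reports VALUES (?, ?)",
    [
        ("theft", "Springfield"),
        ("fraud", "Springfield"),
        ("theft", "Shelbyville"),
        ("theft", None),
    ],
)

# COUNT(*) counts every row in the table.
total = conn.execute("SELECT COUNT(*) FROM reports").fetchone()[0]

# Adding WHERE counts only the rows that meet the criterion.
in_city = conn.execute(
    "SELECT COUNT(*) FROM reports WHERE city = ?", ("Springfield",)
).fetchone()[0]

# COUNT(column) skips rows where that column is NULL.
with_city = conn.execute("SELECT COUNT(city) FROM reports").fetchone()[0]

print(total, in_city, with_city)
conn.close()
```

The same SQL strings can be passed to an R database call unchanged; only the surrounding host-language code differs.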

In [28]:
# Practice analyzing COUNTS in data tables with SQL queries:
# Brainstorming for Objective 1
# We can find the top ten cities within our Murder Mystery database that had the most crimes by using the COUNT SQL command.
# SELECT the cities that have the most counts within the table crime_scene_report and
# GROUP the results BY city and ORDER BY a DESC count, only show a limit of 10 records.
# dbGetQuery() sends the query, fetches the rows, and clears the result in one step.
# `db` is the connection object created in the "Connect to our .db file" cell above.
practice_cities <- dbGetQuery(db, "SELECT city, COUNT(*) AS count
                            FROM crime_scene_report
                            GROUP BY city
                            ORDER BY count DESC
                            LIMIT 10")
practice_cities

# Looks like SQL City and Murfreesboro have the highest number of crimes!

# Use COUNT(*) AS count to determine the answers to the objectives below.


<b>Objective 1:</b> The town is willing to fund more training for officers in SQL City based on the type of crime that is committed most often. The training would help them identify the clues that indicate these two types upon arrival on a crime scene. As a data scientist, what crime type would you advise needs more training for officers in SQL City? What crime was committed the most in SQL City within the database date range? Save your query as a dataframe and quickly add a bar graph visual to support your recommendation.

In [None]:
# Graph your results:




Answer:

<b>Objective 2:</b> The town has also received more funding to encourage DOUBLE overtime for officers in SQL City during the month with the highest crime rate throughout the date range of the database. What month would you advise the town to encourage officer overtime?

Answer:

<b>Objective 3:</b> To thank the officers for their extra training and overtime, the town will pay for their monthly gym membership. They want to give the officers the membership that the fewest civilians have, so that the officers avoid being recognized daily as the town heroes. The membership can be used in any town. What membership does the town give them?

Answer:

In [None]:
#Disconnect from the database. Always remember to disconnect :) 

