## MSDS 696 Notebook6 Spotify Playlist Demo Using SQLite Database 

## Project Title:
Create and build a data engineering pipeline to collect, process, and store Spotify data. This is intended to be a fun project that looks at who the most popular artists are, what their most popular tracks are, and some characteristics of the songs.

## NoteBook Description:
This notebook creates new features on the Spotify track data. Once the features are created and applied, the data is checked for NaNs. After cleaning, a Euclidean distance is calculated on one of the engineered features, "custom_score", to determine song similarity. The resulting song list is then sent to my personal Spotify account for listening.

### Mary J Hollon
### Due 8-22-2024

### Data Dictionary

- id: Unique identifier for each track.
- name: Name of the track.
- artist_id: Identifier for the artist.
- year: Year the song was released.
- popularity: Popularity score of the artist. The higher the score, the more popular the artist.
- release_date: The song's actual release date.
- energy: A float representing the intensity and activity of the track or how "loud" and "noisy" a track is. Values close to 0 are low energy and values close to 1 are high energy.
- danceability: A float indicating how suitable a track is for dancing based on a combination of musical elements. A value of 0 is the least danceable, a value of 1 is the most danceable.
- instrumentalness: A float predicting whether a track has no vocals. Values close to 0 indicate more vocal content; values close to 1 indicate little or no vocal content.
- loudness: A float representing the overall loudness of the track in decibels. Values closer to 0 are louder; values farther from 0 (larger in absolute value) are quieter.
- tempo: A float indicating the tempo of the track in beats per minute, that is, the "speed" or "pace" of the music.
- valence: A float describing the musical positiveness conveyed by a track. Values closer to 0 are more sad or angry, while values closer to 1 are more happy or cheerful. 


Source: https://developer.spotify.com/documentation/web-api/reference/get-several-audio-features

Let's inspect the table information in the database

In [1]:
import sqlite3
import pandas as pd

# Connect to the SQLite database
conn = sqlite3.connect('spotify_music.db')

# Query to get column names from Tracks table
query_tracks = "PRAGMA table_info(Tracks)"
tracks_columns_info = pd.read_sql_query(query_tracks, conn)

# Query to get column names from Artists table
query_artists = "PRAGMA table_info(Artists)"
artists_columns_info = pd.read_sql_query(query_artists, conn)

# Display the column information
print("Tracks Table Columns:")
print(tracks_columns_info)

print("\nArtists Table Columns:")
print(artists_columns_info)


Tracks Table Columns:
    cid                              name     type  notnull dflt_value  pk
0     0                          track_id     TEXT        0       None   1
1     1                        track_name     TEXT        0       None   0
2     2                         artist_id     TEXT        0       None   0
3     3                              year  INTEGER        0       None   0
4     4                            energy     REAL        0       None   0
5     5                      danceability     REAL        0       None   0
6     6                  instrumentalness     REAL        0       None   0
7     7                          loudness     REAL        0       None   0
8     8                             tempo     REAL        0       None   0
9     9                           valence     REAL        0       None   0
10   10                   energy_category     TEXT        0       None   0
11   11                    tempo_category     TEXT        0       None   0
12 
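
For readers without the project database at hand, the same kind of inspection can be tried on a throwaway in-memory database. This is a minimal sketch: the three-column `Tracks` table below is a stand-in invented for illustration, not the project's schema.

```python
import sqlite3

# In-memory stand-in database; this toy schema is an assumption for illustration
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Tracks (track_id TEXT PRIMARY KEY, track_name TEXT, energy REAL)"
)

# List the user tables recorded in the schema catalog
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = conn.execute("PRAGMA table_info(Tracks)").fetchall()
column_names = [col[1] for col in columns]
primary_keys = [col[1] for col in columns if col[5] > 0]

print(tables)
print(column_names)
print(primary_keys)
conn.close()
```

The `pk` field is nonzero only for columns that are part of the primary key, which matches the `pk` column in the printed output above.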

In [2]:

# Connect to the SQLite database
conn = sqlite3.connect('spotify_music.db')

# Define the song name you're looking for
song_name = "High Hopes"

# Query to find the song by name
query = """
SELECT 
    t.track_id AS id, 
    t.track_name, 
    t.custom_score, 
    t.loudness_category,
    t.danceability_category,
    t.tempo_category,
    t.valence_category,
    a.artist_id, 
    a.artist_name,
    a.simplified_genre AS genre
FROM 
    Tracks t 
JOIN 
    Artists a 
ON 
    t.artist_id = a.artist_id
WHERE 
    t.track_name = ?
"""

# Execute the query with the song name as a parameter
df = pd.read_sql_query(query, conn, params=(song_name,))

# Display the result
df.head()


| | id | track_name | custom_score | loudness_category | danceability_category | tempo_category | valence_category | artist_id | artist_name | genre |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1rqqCSm0Qe4I9rUvWncaom | High Hopes | 0.765519 | Very High | Medium | Slow | Happy | 20JZFwl6HVl6yg8a4H3ZqK | Panic! At The Disco | Pop |
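
The lookup above binds the song name through a `?` placeholder instead of formatting it into the SQL string. The sketch below reproduces that pattern on a small in-memory database so it can be run without `spotify_music.db`. The table contents are made up, and the schema is trimmed to the columns needed for the join.

```python
import sqlite3

# Toy data for illustration only; not taken from the project database
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Artists (artist_id TEXT PRIMARY KEY, artist_name TEXT);
CREATE TABLE Tracks (track_id TEXT PRIMARY KEY, track_name TEXT,
                     artist_id TEXT, custom_score REAL);
INSERT INTO Artists VALUES ('a1', 'Artist One'), ('a2', 'Artist Two');
INSERT INTO Tracks VALUES
    ('t1', 'First Song', 'a1', 0.61),
    ('t2', 'Second Song', 'a2', 0.74);
""")

query = """
SELECT t.track_id, t.track_name, t.custom_score, a.artist_name
FROM Tracks t
JOIN Artists a ON t.artist_id = a.artist_id
WHERE t.track_name = ?
"""

# The value is bound to the ? placeholder by the driver, never pasted into the SQL text
rows = conn.execute(query, ("Second Song",)).fetchall()
print(rows)

# With the default collation, = is case-sensitive, so a differently cased name finds nothing
no_rows = conn.execute(query, ("second song",)).fetchall()
print(no_rows)
conn.close()
```

The second query illustrates why an exact-match lookup can come back empty when the capitalization of the stored name differs from the search string.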


In [3]:
import sqlite3
import pandas as pd
from scipy.spatial.distance import euclidean

# Connect to the SQLite database
conn = sqlite3.connect('spotify_music.db')

def find_similar_tracks(track_name, top_n=25):
    # Query the database to get the tracks and artist data using table aliases
    query = """
    SELECT 
        t.track_id AS id, 
        t.track_name, 
        t.custom_score, 
        t.loudness_category,
        t.danceability_category,
        t.tempo_category,
        t.valence_category,
        a.artist_id, 
        a.artist_name,
        a.simplified_genre AS genre
    FROM 
        Tracks t 
    JOIN 
        Artists a 
    ON 
        t.artist_id = a.artist_id
    """

    df = pd.read_sql_query(query, conn)

    # Check if the track name exists in the dataframe
    reference_track = df[df['track_name'].str.strip().str.lower() == track_name.strip().lower()]

    if reference_track.empty:
        raise ValueError(f"Track name '{track_name}' not found in the dataset.")

    # Get the custom score of the reference track
    reference_score = reference_track['custom_score'].values[0]
    
    # Calculate the Euclidean distance (similarity) between the reference score and all other scores
    df['similarity'] = df['custom_score'].apply(lambda x: euclidean([x], [reference_score]))
    
    # Sort by similarity (ascending) and exclude the reference track itself
    similar_tracks = df[df['track_name'].str.strip().str.lower() != track_name.strip().lower()].sort_values(by='similarity', ascending=True).head(top_n)
    
    # Adjust display options for better readability
    pd.set_option('display.max_rows', None)
    pd.set_option('display.max_columns', None)
    pd.set_option('display.width', 1000)
    pd.set_option('display.colheader_justify', 'center')  # Center-align headers
    pd.set_option('display.float_format', '{:.4f}'.format)  # Format float values
    
           
    # Select relevant columns and rename them for clarity, including track_id
    similar_tracks = similar_tracks[['id', 'track_name', 'artist_name', 'custom_score', 'similarity']]
    similar_tracks.columns = ['Track ID', 'Track Name', 'Artist', 'Custom Score', 'Similarity']

    # Return the DataFrame with the most similar tracks
    return similar_tracks


# Example usage: enter the track name for track_name
similar_tracks_df = find_similar_tracks(track_name="High Hopes", top_n=25)

# Output the DataFrame to the console
print(similar_tracks_df)



             Track ID                   Track Name                         Artist                Custom Score  Similarity
2481  4gJfPoMIcSDGY0tMGxm6Fu                        OKI DOKI                            KAROL G     0.7655       0.0001  
949   2RiBogNRfulkNf7fVbPOrJ                    Saturday Sun                          Vance Joy     0.7656       0.0001  
982   3Yt2ph8Ko0JBANpdawzSF2         Major (feat. Key Glock)                        Young Dolph     0.7657       0.0001  
935   5cF0dROlMOK5uNZtivgu50                       Attention                       Charlie Puth     0.7654       0.0002  
2287  7mXuWTczZNxG5EDcjFEuJR                       LADY GAGA                         Peso Pluma     0.7653       0.0002  
2367  29vIEdxgMT9H1fnBMecWjA                  Lo Tienes Todo  Julión Álvarez y su Norteño Banda     0.7658       0.0002  
2292  6CvTEtGagmzQvkUzzyKR9k                    HARLEY QUINN                      Fuerza Regida     0.7652       0.0003  
816   7yNf9YjeO5JXUE3JEB
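
Because `custom_score` is a single number per track, the Euclidean distance used above reduces to the absolute difference between two scores. The following sketch checks that equivalence and ranks a few made-up scores against a reference. The names and values are invented for illustration, and it uses the standard library rather than SciPy.

```python
import math

# Made-up (track, custom_score) pairs used only for illustration
scores = {"ref": 0.7655, "a": 0.7656, "b": 0.7400, "c": 0.7652, "d": 0.9000}
reference = scores["ref"]

# With a single feature, Euclidean distance reduces to an absolute difference
for value in scores.values():
    assert math.isclose(math.dist([value], [reference]), abs(value - reference))

# Rank every other track by distance to the reference, closest first
ranked = sorted(
    (name for name in scores if name != "ref"),
    key=lambda name: abs(scores[name] - reference),
)
print(ranked)
```

In pandas the same column could likely be computed without `apply`, for example as `(df['custom_score'] - reference_score).abs()`, which should give the same values for this one-dimensional case.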

### NOTE: You MUST have Spotify Developer credentials for this code to run!
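
The next cell imports `SPOTIFY_CLIENT_ID` and `SPOTIFY_CLIENT_SECRET` from a local `spotify_credentials` module that is not shown in this notebook. One possible layout for such a file is sketched below. It assumes the secrets are stored in environment variables of the same names, which the notebook does not specify. The helper `load_credential` is hypothetical.

```python
import os


def load_credential(name, env=os.environ):
    """Return the named credential or raise a clear error when it is missing."""
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing credential: set the {name} environment variable")
    return value


# Demonstration with a fake environment mapping; real values would come from os.environ
fake_env = {"SPOTIFY_CLIENT_ID": "demo-id", "SPOTIFY_CLIENT_SECRET": "demo-secret"}
SPOTIFY_CLIENT_ID = load_credential("SPOTIFY_CLIENT_ID", env=fake_env)
SPOTIFY_CLIENT_SECRET = load_credential("SPOTIFY_CLIENT_SECRET", env=fake_env)
print(SPOTIFY_CLIENT_ID)
```

Keeping the secrets out of the notebook itself means the notebook can be shared without exposing them.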

In [16]:
import spotipy
from spotipy.oauth2 import SpotifyOAuth
import pandas as pd
from spotify_credentials import SPOTIFY_CLIENT_ID, SPOTIFY_CLIENT_SECRET

# Authenticate with Spotify using OAuth
sp = spotipy.Spotify(auth_manager=SpotifyOAuth(client_id=SPOTIFY_CLIENT_ID,
                                               client_secret=SPOTIFY_CLIENT_SECRET,
                                               redirect_uri="your_redirect_uri",
                                               scope="playlist-modify-public"))

def write_playlist_to_spotify(similar_tracks_df, playlist_name="Similar Songs Playlist"):
    user_id = sp.current_user()["id"]

    # Create a new playlist
    playlist = sp.user_playlist_create(user=user_id, name=playlist_name, public=True, description="Playlist based on custom_score similarity")

    # Get track IDs
    track_ids = similar_tracks_df['Track ID'].tolist()  # Use the correct case for 'Track ID'

    # Add tracks to the playlist
    sp.playlist_add_items(playlist_id=playlist['id'], items=track_ids)  # Use 'id' for playlist ID, not 'TRACK ID'
    print(f"Playlist '{playlist_name}' created successfully with {len(track_ids)} tracks!")

# Example Usage
similar_tracks = find_similar_tracks(track_name="High Hopes", top_n=25)

# Write the similar_tracks DataFrame to Spotify
write_playlist_to_spotify(similar_tracks, playlist_name="Similar Tracks to 'High Hopes'")


Playlist 'Similar Tracks to 'High Hopes'' created successfully with 25 tracks!
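
With `top_n=25`, a single `playlist_add_items` call is enough. Spotify's Web API accepts at most 100 items per add request, so a longer list would need to be sent in batches. A minimal sketch of that batching follows. The helper name `chunked` is hypothetical, the track IDs are placeholders, and the Spotify call is left as a comment so the block runs without credentials.

```python
def chunked(items, size=100):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Placeholder IDs for illustration only; real values come from the 'Track ID' column
track_ids = [f"track_{i}" for i in range(230)]

batch_sizes = []
for batch in chunked(track_ids, size=100):
    # With an authenticated client this is where the request would go:
    # sp.playlist_add_items(playlist_id=playlist["id"], items=batch)
    batch_sizes.append(len(batch))

print(batch_sizes)
```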


### I will select the song "I Had Some Help (Feat. Morgan Wallen)" by Post Malone and generate similar tracks based on my similarity score

In [15]:
# Connect to the SQLite database
conn = sqlite3.connect('spotify_music.db')

# Define the song name you're looking for
song_name = "I Had Some Help (Feat. Morgan Wallen)"

# Query to find the song by name
query = """
SELECT 
    t.track_id AS id, 
    t.track_name, 
    t.custom_score, 
    t.loudness_category,
    t.danceability_category,
    t.tempo_category,
    t.valence_category,
    a.artist_id, 
    a.artist_name,
    a.simplified_genre AS genre
FROM 
    Tracks t 
JOIN 
    Artists a 
ON 
    t.artist_id = a.artist_id
WHERE 
    t.track_name LIKE ?
"""

# Execute the query with the song name as a parameter
df = pd.read_sql_query(query, conn, params=(song_name,))

# Display the result
df.head()

| | id | track_name | custom_score | loudness_category | danceability_category | tempo_category | valence_category | artist_id | artist_name | genre |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7221xIgOnuakPdLqT0F3nP | I Had Some Help (Feat. Morgan Wallen) | 0.7381 | Very High | Medium | Medium | Happy | 246dkjvS1zLTtiykXe5h60 | Post Malone | Rap |
| 1 | 0RKMthG1dz8hGpnCoDw6ae | I Had Some Help (Feat. Morgan Wallen) | 0.7554 | Very High | Medium | Medium | Happy | 246dkjvS1zLTtiykXe5h60 | Post Malone | Rap |
| 2 | 5IZXB5IKAD2qlvTPJYDCFB | I Had Some Help (Feat. Morgan Wallen) | 0.7522 | Very High | Medium | Medium | Happy | 246dkjvS1zLTtiykXe5h60 | Post Malone | Rap |
| 3 | 3ITY5rry5FRd6CVAmtskcI | I Had Some Help (Feat. Morgan Wallen) | 0.7516 | Very High | Medium | Medium | Happy | 246dkjvS1zLTtiykXe5h60 | Post Malone | Rap |
| 4 | 2PoE4L4YsgxtCgqBogKc5c | I Had Some Help (Feat. Morgan Wallen) | 0.7561 | Very High | Medium | Medium | Happy | 246dkjvS1zLTtiykXe5h60 | Post Malone | Rap |
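
The five rows above share one track name but have different track IDs and slightly different custom scores, so a name alone does not identify a single reference track. The sketch below uses a small in-memory table with made-up rows to show how `LIKE` behaves with and without a wildcard, and how selecting by `track_id` removes the ambiguity. The helper `ids_matching` is hypothetical.

```python
import sqlite3

# Toy rows for illustration only; not taken from the project database
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Tracks (track_id TEXT PRIMARY KEY, track_name TEXT, custom_score REAL)"
)
conn.executemany(
    "INSERT INTO Tracks VALUES (?, ?, ?)",
    [
        ("t1", "Some Song (Feat. Guest)", 0.7381),
        ("t2", "Some Song (Feat. Guest)", 0.7554),
        ("t3", "Another Song", 0.5100),
    ],
)


def ids_matching(pattern):
    """Return track IDs whose name matches a LIKE pattern, in ID order."""
    sql = "SELECT track_id FROM Tracks WHERE track_name LIKE ? ORDER BY track_id"
    return [row[0] for row in conn.execute(sql, (pattern,))]


# No wildcard: LIKE behaves as a case-insensitive (ASCII) whole-string match
exact = ids_matching("some song (feat. guest)")
# % wildcard: matches any name that starts with the given text
prefix = ids_matching("Some Song%")
# Selecting by track_id removes the ambiguity between same-named rows
score = conn.execute(
    "SELECT custom_score FROM Tracks WHERE track_id = ?", ("t2",)
).fetchone()[0]

print(exact, prefix, score)
conn.close()
```

When several rows match, which one serves as the reference changes the reference score, and with it the resulting list of similar tracks.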


In [19]:
import sqlite3
import pandas as pd
from scipy.spatial.distance import euclidean

# Connect to the SQLite database
conn = sqlite3.connect('spotify_music.db')

def find_similar_tracks(track_name, top_n=25):
    # Query the database to get the tracks and artist data using table aliases
    query = """
    SELECT 
        t.track_id AS id, 
        t.track_name, 
        t.custom_score, 
        t.loudness_category,
        t.danceability_category,
        t.tempo_category,
        t.valence_category,
        a.artist_id, 
        a.artist_name,
        a.simplified_genre AS genre
    FROM 
        Tracks t 
    JOIN 
        Artists a 
    ON 
        t.artist_id = a.artist_id
    """

    df = pd.read_sql_query(query, conn)

    # Check if the track name exists in the dataframe
    reference_track = df[df['track_name'].str.strip().str.lower() == track_name.strip().lower()]

    if reference_track.empty:
        raise ValueError(f"Track name '{track_name}' not found in the dataset.")

    # Get the custom score of the reference track
    reference_score = reference_track['custom_score'].values[0]
    
    # Calculate the Euclidean distance (similarity) between the reference score and all other scores
    df['similarity'] = df['custom_score'].apply(lambda x: euclidean([x], [reference_score]))
    
    # Sort by similarity (ascending) and exclude the reference track itself
    similar_tracks = df[df['track_name'].str.strip().str.lower() != track_name.strip().lower()].sort_values(by='similarity', ascending=True).head(top_n)
    
    # Adjust display options for better readability
    pd.set_option('display.max_rows', None)
    pd.set_option('display.max_columns', None)
    pd.set_option('display.width', 1000)
    pd.set_option('display.colheader_justify', 'center')  # Center-align headers
    pd.set_option('display.float_format', '{:.4f}'.format)  # Format float values
    
           
    # Select relevant columns and rename them for clarity, including track_id
    similar_tracks = similar_tracks[['id', 'track_name', 'artist_name', 'custom_score', 'similarity']]
    similar_tracks.columns = ['Track ID', 'Track Name', 'Artist', 'Custom Score', 'Similarity']

    # Return the DataFrame with the most similar tracks
    return similar_tracks


# Example usage: enter the track name for track_name
similar_tracks_df = find_similar_tracks(track_name="I Had Some Help (Feat. Morgan Wallen)", top_n=25)

# Output the DataFrame to the console
print(similar_tracks_df)



             Track ID                            Track Name                            Artist         Custom Score  Similarity
2593  7HUhMOrlvwBPfBq3c0ajh0                                            JOYRIDE                Kesha     0.7381       0.0001  
1513  5vDNippoMr52KpXBO9b9KQ                                             Coffin           Lil Yachty     0.7382       0.0001  
2170  1oWnDC5OoMPPosVY2cdXgT                                      Lavender Girl                Caamp     0.7382       0.0001  
2314  2gyxAWHebV7xPYVxqoi86f                                      get him back!       Olivia Rodrigo     0.7378       0.0003  
772   53yTYusPQJ1AApL1hi0Dnc                           Sunrise, Sunburn, Sunset           Luke Bryan     0.7383       0.0003  
2364  3wlIAXSDtlD9iU8ysld06Z                                            BIPOLAR           Peso Pluma     0.7385       0.0004  
1140  03eJ2DclFWXYU8GWgANdmZ  bad vibes forever (feat. PnB Rock & Trippie Redd)         XXXTENTACION     0.7386

In [21]:
 
import spotipy
from spotipy.oauth2 import SpotifyOAuth
import pandas as pd
from spotify_credentials import SPOTIFY_CLIENT_ID, SPOTIFY_CLIENT_SECRET

# Authenticate with Spotify using OAuth
sp = spotipy.Spotify(auth_manager=SpotifyOAuth(client_id=SPOTIFY_CLIENT_ID,
                                               client_secret=SPOTIFY_CLIENT_SECRET,
                                               redirect_uri="your_redirect_uri",
                                               scope="playlist-modify-public"))

def write_playlist_to_spotify(similar_tracks_df, playlist_name="Similar Songs Playlist"):
    user_id = sp.current_user()["id"]

    # Create a new playlist
    playlist = sp.user_playlist_create(user=user_id, name=playlist_name, public=True, description="Playlist based on custom_score similarity")

    # Get track IDs
    track_ids = similar_tracks_df['Track ID'].tolist()  # Use the correct case for 'Track ID'

    # Add tracks to the playlist
    sp.playlist_add_items(playlist_id=playlist['id'], items=track_ids)  # Use 'id' for playlist ID, not 'TRACK ID'
    print(f"Playlist '{playlist_name}' created successfully with {len(track_ids)} tracks!")

# Example Usage
similar_tracks = find_similar_tracks(track_name="I Had Some Help (Feat. Morgan Wallen)", top_n=25)

# Write the similar_tracks DataFrame to Spotify
write_playlist_to_spotify(similar_tracks, playlist_name="Similar Tracks to 'I Had Some Help'")



Playlist 'Similar Tracks to 'I Had Some Help'' created successfully with 25 tracks!


### Summary

This notebook demonstrates how to generate a playlist from a similarity measure built on a feature called "custom_score". The "custom_score" is a weighted sum of four features, with danceability_weight=0.30, energy_weight=0.25, valence_weight=0.25, and loudness_weight=0.20. It is calculated in File3 during feature engineering:

#### Calculate the custom score
    df['custom_score'] = (df['danceability'] * danceability_weight +
                          df['energy'] * energy_weight +
                          df['valence'] * valence_weight +
                          df['loudness_scaled'] * loudness_weight)
                          
                          
The "custom_score" is a composite metric that combines several audio features (danceability, energy, valence, and loudness) into a single value. This score is designed to capture the overall "vibe" or "mood" of a track based on these specific attributes.

#### Interpretation of the Custom Score
The custom score is a single numeric value that synthesizes the key musical characteristics of a track into a unified metric.

- Higher custom scores: These represent tracks that are generally more danceable, energetic, positive, and loud. Such tracks are likely to be upbeat and lively, suited to settings where high energy and a positive mood are desired, such as parties or workout sessions.

- Lower custom scores: These tend to correspond to tracks that are more subdued, less energetic, or less positive. Such tracks may be better suited to relaxation or introspective moments.
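
The weighted sum described in this summary can be sketched as a small standalone function. The weights are the ones stated above. The feature values for the two tracks are made up, and each feature, including `loudness_scaled`, is assumed to already lie between 0 and 1, since the scaling step itself is not shown in this notebook.

```python
import math

# Weights as stated in the summary
WEIGHTS = {
    "danceability": 0.30,
    "energy": 0.25,
    "valence": 0.25,
    "loudness_scaled": 0.20,
}


def custom_score(features, weights=WEIGHTS):
    """Weighted sum of the four features; each is assumed to lie in [0, 1]."""
    return sum(features[name] * weight for name, weight in weights.items())


# Made-up feature values for two hypothetical tracks
upbeat = {"danceability": 0.80, "energy": 0.90, "valence": 0.70, "loudness_scaled": 0.85}
mellow = {"danceability": 0.40, "energy": 0.20, "valence": 0.30, "loudness_scaled": 0.50}

upbeat_score = custom_score(upbeat)
mellow_score = custom_score(mellow)
print(round(upbeat_score, 3), round(mellow_score, 3))

# The weights sum to 1, so inputs in [0, 1] give a score in [0, 1]
assert math.isclose(sum(WEIGHTS.values()), 1.0)
```

Under these assumptions the upbeat track scores well above the mellow one, in line with the interpretation of higher and lower scores given above.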

#### Use in Playlist Generation
In the context of playlist generation, this custom score can be used to filter or rank tracks based on how well they fit a desired mood or style. For example, if you want a playlist of energetic and danceable songs, you would select tracks with the highest custom scores.

A playlist generated this way groups tracks whose weighted sums of danceability, energy, valence, and loudness are nearly identical. Two tracks can reach the same score through different mixes of those features, so a close score suggests a similar feel but does not guarantee one.
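
The two uses mentioned here, ranking by score and filtering to a score range, can be sketched with plain Python. The track names and scores are made up, and the helpers `top_by_score` and `within_band` are hypothetical names introduced for this example.

```python
# Made-up (track name, custom_score) pairs for illustration
tracks = [
    ("Track A", 0.81),
    ("Track B", 0.35),
    ("Track C", 0.66),
    ("Track D", 0.92),
    ("Track E", 0.48),
]


def top_by_score(rows, n):
    """Return the n rows with the highest custom_score, best first."""
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]


def within_band(rows, low, high):
    """Return rows whose custom_score falls inside [low, high]."""
    return [row for row in rows if low <= row[1] <= high]


party = [name for name, _ in top_by_score(tracks, 2)]
relaxed = [name for name, _ in within_band(tracks, 0.0, 0.5)]
print(party, relaxed)
```

Ranking picks the extremes of the score range, while a band filter keeps everything inside a chosen mood range regardless of how many tracks that is.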

#### Limitations and Considerations
The custom score is highly dependent on the chosen weights. Changing these weights will alter the significance of each feature in the final score. For example, if you increase the weight of valence, the custom score will place more emphasis on the positive or negative emotion conveyed by the track.

Additionally, the score is only as good as the input data. If the features (e.g., danceability, energy) do not accurately reflect a track's qualities, the custom score may not represent the desired characteristics well.
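
The dependence on the weights can be made concrete with two made-up tracks whose order flips when valence is weighted more heavily. The first weight set is the one used in this project. The second is a hypothetical alternative chosen only to illustrate the effect.

```python
def weighted_score(features, weights):
    """Weighted sum of feature values using the given weight mapping."""
    return sum(features[name] * weight for name, weight in weights.items())


def rank(tracks, weights):
    """Return track names ordered from highest to lowest weighted score."""
    return sorted(tracks, key=lambda name: weighted_score(tracks[name], weights), reverse=True)


# Weights used in the project
project_weights = {"danceability": 0.30, "energy": 0.25, "valence": 0.25, "loudness_scaled": 0.20}
# Hypothetical alternative that shifts weight from danceability to valence
valence_heavy = {"danceability": 0.20, "energy": 0.25, "valence": 0.35, "loudness_scaled": 0.20}

# Made-up tracks: X is more danceable, Y is more positive
tracks = {
    "X": {"danceability": 0.90, "energy": 0.50, "valence": 0.30, "loudness_scaled": 0.50},
    "Y": {"danceability": 0.50, "energy": 0.50, "valence": 0.70, "loudness_scaled": 0.50},
}

project_order = rank(tracks, project_weights)
valence_order = rank(tracks, valence_heavy)
print(project_order, valence_order)
```

With the project weights X scores 0.57 against 0.55 for Y. With the valence-heavy weights X drops to 0.51 while Y rises to 0.57, so the ranking reverses even though the tracks themselves have not changed.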


                          

## END of Notebook