<a href="https://colab.research.google.com/github/jarodchristiansen/Machine-Learning-Deep-Learning/blob/master/Spotify_Recommendation_Algo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Setup Spotify API

In [2]:
!pip install spotipy

Collecting spotipy
  Downloading spotipy-2.24.0-py3-none-any.whl.metadata (4.9 kB)
Collecting redis>=3.5.3 (from spotipy)
  Downloading redis-5.0.8-py3-none-any.whl.metadata (9.2 kB)
Downloading spotipy-2.24.0-py3-none-any.whl (30 kB)
Downloading redis-5.0.8-py3-none-any.whl (255 kB)
Installing collected packages: redis, spotipy
Successfully installed redis-5.0.8 spotipy-2.24.0


In [34]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

from google.colab import userdata


# Set up Spotify API credentials
client_id = userdata.get('spotify_id')
client_secret = userdata.get('spotify_secret')

# Authenticate using Client Credentials Flow
auth_manager = SpotifyClientCredentials(client_id=client_id, client_secret=client_secret)
sp = spotipy.Spotify(auth_manager=auth_manager)


## Methods to gather initial tracks for seed dataset

### External methods to enhance specificity

In [14]:
def search_tracks_by_genre(genre, limit=50):
    """
    Fetches track IDs by searching for a specific genre.

    Args:
    - genre (str): Genre keyword to search for.
    - limit (int): Maximum number of tracks to fetch.

    Returns:
    - track_ids (list): List of track IDs.
    """
    track_ids = []

    # Search for tracks by genre
    results = sp.search(q=f'genre:{genre}', type='track', limit=limit)
    tracks = results['tracks']['items']

    # Collect track IDs
    for track in tracks:
        track_ids.append(track['id'])

    return track_ids

# Example usage
genre = 'pop'  # You can replace this with any genre you prefer
pop_tracks = search_tracks_by_genre(genre, limit=50)
print(pop_tracks)


['0WbMK4wrZ1wFSty9F7FCgu', '6dOtVTDdiauQNBQEDOtlAB', '2plbrEY59IikOBgBGLjaoe', '5G2f63n7IPVPPjfNIGih7Q', '5N3hjp1WNayUPZrA8kJmJP', '2qSkIjg1o9h3YT9RAgYN75', '4xdBrk0nFZaP54vvZj0yx7', '1UHS8Rf6h5Ar3CDWRd3wjF', '1k2pQc5i348DCHwbn5KTdc', '7221xIgOnuakPdLqT0F3nP', '7FOgcfdz9Nx5V9lCNXdBYv', '102YUQbYmwdBXS7jwamI90', '0mflMxspEfB0VbI1kyLiAv', '3WOhcATHxK2SLNeP5W3v1v', '2FQrifJ1N335Ljm3TjTVVf', '7tI8dRuH2Yc6RuoTjxo4dU', '21B4gaTWnTkuSh77iWEXdS', '19RybK6XDbAVpcdxSbZL1o', '0UYnhUfnUj5adChuAXvLUB', '3WSOUb3U7tqURbBSgZTrZX', '3QaPy1KgI7nu9FJEQUgn6h', '629DixmZGHc7ILtEntuiWE', '2QjOHCTQ1Jl3zawyYOpxh6', '5fZJQrFKWQLb7FpJXZ1g7K', '51eSHglvG1RJXtL3qI5trr', '3iPIDAFybaoyqX7hvAfWkl', '5oIVNm56t6OIf9ZjdEG3ud', '3Vr3zh0r7ALn8VLqCiRR10', '3xkHsmpQCBMytMJNiDf3Ii', '1BxfuPKGuaTgP7aM0Bbdwr', '5IZXB5IKAD2qlvTPJYDCFB', '4w2GLmK2wnioVnb5CPQeex', '53IRnAWx13PYmoVYtemUBS', '3qhlB30KknSejmIvZZLjOD', '0XkZmBCCcdMY0EPY8ij6Gb', '1bjeWoagtHmUKputLVyDxQ', '0AjmK0Eai4zGrLaJwPvrDp', '7BRD7x5pt8Lqa1eGYC4dzj', '7iQMm50NNw

In [8]:
def get_playlist_tracks(playlist_id, limit=100):
    """
    Fetches track IDs from a specific playlist.

    Args:
    - playlist_id (str): The Spotify playlist ID.
    - limit (int): Number of tracks to fetch (max 100 per request).

    Returns:
    - track_ids (list): List of track IDs from the playlist.
    """
    track_ids = []
    results = sp.playlist_tracks(playlist_id, limit=limit)

    # Collect track IDs from the playlist
    for item in results['items']:
        track = item['track']
        if track and track.get('id'):  # skip removed or local entries, which carry no ID
            track_ids.append(track['id'])

    return track_ids

# Example usage
playlist_id = '37i9dQZEVXbMDoHDwVN2tF'  # Spotify Top 50 Global playlist
top_50_tracks = get_playlist_tracks(playlist_id, limit=50)
print(top_50_tracks)


['2plbrEY59IikOBgBGLjaoe', '6dOtVTDdiauQNBQEDOtlAB', '5G2f63n7IPVPPjfNIGih7Q', '7tI8dRuH2Yc6RuoTjxo4dU', '2qSkIjg1o9h3YT9RAgYN75', '0WbMK4wrZ1wFSty9F7FCgu', '6WatFBLVB0x077xWeoVc2k', '5N3hjp1WNayUPZrA8kJmJP', '2PnlsTsOTLE5jnBnNe2K0A', '3xkHsmpQCBMytMJNiDf3Ii', '1UHS8Rf6h5Ar3CDWRd3wjF', '5fZJQrFKWQLb7FpJXZ1g7K', '17phhZDn6oGtzMe56NuWvj', '2cZOYofOX4d6g0OXxkaIjA', '3hRV0jL3vUpRrcy398teAU', '5Z0UnEtpLDQyYlWwgi8m9C', '7CyPwkp0oE8Ro9Dd5CUDjW', '2esZG2XFtuoWWA9AfDvSxy', '7z7kvUQGwlC6iOl7vMuAr9', '3WOhcATHxK2SLNeP5W3v1v', '0OA00aPt3BV10qeMIs3meW', '2QjOHCTQ1Jl3zawyYOpxh6', '5XeFesFbtLpXzIVDNQP22n', '6AI3ezQ4o3HUoP6Dhudph3', '4xdBrk0nFZaP54vvZj0yx7', '5AJ9hqTS2wcFQCELCFRO7A', '5IZXB5IKAD2qlvTPJYDCFB', '51ZQ1vr10ffzbwIjDCwqm4', '2nLtzopw4rPReszdYBJU6h', '42VsgItocQwOQC3XWZ8JNA', '62bOmKYxYg7dhrC6gH9vFn', '51rfRCiUSvxXlCSCfIztBy', '7ov3TDp5D00Rnu5R1viX4w', '0UYnhUfnUj5adChuAXvLUB', '3QaPy1KgI7nu9FJEQUgn6h', '3qhlB30KknSejmIvZZLjOD', '3AJwUDP919kvQ9QcozQPxg', '2aYZaN5SmkRDLsrrV8GkBQ', '1BxfuPKGua

In [9]:
def get_user_saved_tracks(limit=50):
    """
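    Fetches the current user's saved track IDs.

    Note: this endpoint needs a user-authorized client (user-library-read scope).
    With the Client Credentials flow used above it returns 403 Forbidden.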

    Args:
    - limit (int): Number of saved tracks to fetch (max 50 per request).

    Returns:
    - track_ids (list): List of track IDs from the user's saved tracks.
    """
    track_ids = []

    # Get current user's saved tracks
    results = sp.current_user_saved_tracks(limit=limit)

    # Collect track IDs
    for item in results['items']:
        track = item['track']
        track_ids.append(track['id'])

    return track_ids

# Example usage
user_saved_tracks = get_user_saved_tracks(limit=50)
print(user_saved_tracks)


ERROR:spotipy.client:HTTP Error for GET to https://api.spotify.com/v1/me/tracks with Params: {'limit': 50, 'offset': 0, 'market': None} returned 403 due to Forbidden.


SpotifyException: http status: 403, code:-1 - https://api.spotify.com/v1/me/tracks?limit=50&offset=0:
 Forbidden., reason: None
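
The 403 above is what the Client Credentials flow gives for `/v1/me/tracks`: that flow authenticates the app only, so there is no current user whose library could be read. Reading saved tracks needs a user-authorized client, for example one built with `spotipy.oauth2.SpotifyOAuth` and the `user-library-read` scope. The sketch below assumes such a client already exists and pages through the library 50 tracks at a time. `collect_saved_track_ids` and `user_sp` are hypothetical names, not part of this notebook.

```python
def collect_saved_track_ids(client, max_tracks=200, page_size=50):
    """Page through a user's saved tracks and return up to max_tracks IDs.

    `client` is assumed to be a user-authorized spotipy.Spotify instance, or
    any object exposing current_user_saved_tracks(limit=..., offset=...).
    """
    track_ids = []
    offset = 0
    while len(track_ids) < max_tracks:
        limit = min(page_size, max_tracks - len(track_ids))
        page = client.current_user_saved_tracks(limit=limit, offset=offset)
        items = page.get('items', [])
        if not items:
            break  # no more saved tracks
        for item in items:
            track = item.get('track')
            if track and track.get('id'):  # skip entries without a usable ID
                track_ids.append(track['id'])
        offset += len(items)
    return track_ids

# Assumed setup (not run here): a user-authorized client instead of `sp`.
# SpotifyOAuth also needs client_id, client_secret and redirect_uri, passed as
# arguments or via the SPOTIPY_* environment variables.
# from spotipy.oauth2 import SpotifyOAuth
# user_sp = spotipy.Spotify(auth_manager=SpotifyOAuth(scope='user-library-read'))
# saved_ids = collect_saved_track_ids(user_sp, max_tracks=200)
```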

In [10]:
def get_tracks_from_artist(artist_name, limit=50):
    """
    Fetches track IDs from albums of a specific artist.

    Args:
    - artist_name (str): The name of the artist.
    - limit (int): Number of albums to scan; up to 50 tracks are read from each album.

    Returns:
    - track_ids (list): List of track IDs.
    """
    track_ids = []

    # Search for the artist by name
    results = sp.search(q=f'artist:{artist_name}', type='artist', limit=1)
    items = results['artists']['items']
    if not items:
        return track_ids  # no matching artist found
    artist_id = items[0]['id']

    # Get the artist's albums
    albums = sp.artist_albums(artist_id, limit=limit)

    # Collect track IDs from each album
    for album in albums['items']:
        album_tracks = sp.album_tracks(album['id'], limit=50)
        for track in album_tracks['items']:
            track_ids.append(track['id'])

    return track_ids

# Example usage
artist_tracks = get_tracks_from_artist('Taylor Swift', limit=50)
print(artist_tracks)


['6dODwocEuGzHAavXqTbwHv', '4PdLaGZubp4lghChqp8erB', '7uGYWMwRy24dm7RUDDhUlD', '1kbEbBdEgQdQeLXCJh28pJ', '7wAkQFShJ27V8362MqevQr', '4QMgEffJQuKtjCNvqfRZ0m', '7IWcDWOfiooH5hRs9XOVYz', '5ExOm0dh4NyRyAdSAO9hyM', '799KrpEbhZp0MHeiA8YK9P', '2d8UxVNhJinc8uat9PoM9y', '5chnRTB9qMK3W1M41SnU9s', '3YkNIrAvbKNrrwwEd7NVLl', '2fPvQfGQEZOKtJ9qXeL4x8', '1xtw1krCR6Dw2KwkXw5z63', '1tuNqJOtRQVHvONR8Lg3MZ', '4d9PtIEVij9jW5OaLinH66', '62E2nR0od0M5HYxuYLaDz7', '1kcwpPDQnqEqmezzXdJTCP', '4EF6IyONolQy0bIQXm2EmX', '1rmEsOezwf2lmIZTMAO5Ag', '5Bedn0svl0ZD7RGmJkmKKw', '7Mts0OfPorF4iwOomvfqn1', '3hlGuz3loYoLfI3bpwieWq', '7ogK4lJDVDMU6A6vYR5rvD', '1Zai5UJ2di3qEuR2HeT2s8', '18WFFUIsewmA8g31KAeo3e', '0g4fMVo4JjwnIpTfFfLdxS', '3zMDGj4D8ogaYgAIZPeU7S', '2913xXOVAIDAqxzV2g4VcU', '2CnjDMdpRjlWv04Xk3s6MW', '1DTRUYVd8rYpla9hhVVwjo', '2OzhQlSqBEmt7hmkYxfT6m', '3NMrVbIVWT3fPXBj0rNDKG', '2XXwLdtuAcE0HSCu61ijAb', '2F3N9tdombb64aW6VtZOdo', '3Vevii7qKqrmW8CcyzBHDl', '5og4Qzt92jJzVDkOtSEilb', '3fO566xJgwxIa3qGCGBvIC', '3ZVFcD8Wlw

### Bulk dataset gathering before getting recommendations/features

In [23]:
def get_available_genres():
    """
    Fetches a list of available genre seeds from Spotify API.

    Returns:
    - genres (list): List of available genres.
    """
    genres = sp.recommendation_genre_seeds()['genres']
    return genres

# Example usage
available_genres = get_available_genres()
print(available_genres)


['acoustic', 'afrobeat', 'alt-rock', 'alternative', 'ambient', 'anime', 'black-metal', 'bluegrass', 'blues', 'bossanova', 'brazil', 'breakbeat', 'british', 'cantopop', 'chicago-house', 'children', 'chill', 'classical', 'club', 'comedy', 'country', 'dance', 'dancehall', 'death-metal', 'deep-house', 'detroit-techno', 'disco', 'disney', 'drum-and-bass', 'dub', 'dubstep', 'edm', 'electro', 'electronic', 'emo', 'folk', 'forro', 'french', 'funk', 'garage', 'german', 'gospel', 'goth', 'grindcore', 'groove', 'grunge', 'guitar', 'happy', 'hard-rock', 'hardcore', 'hardstyle', 'heavy-metal', 'hip-hop', 'holidays', 'honky-tonk', 'house', 'idm', 'indian', 'indie', 'indie-pop', 'industrial', 'iranian', 'j-dance', 'j-idol', 'j-pop', 'j-rock', 'jazz', 'k-pop', 'kids', 'latin', 'latino', 'malay', 'mandopop', 'metal', 'metal-misc', 'metalcore', 'minimal-techno', 'movies', 'mpb', 'new-age', 'new-release', 'opera', 'pagode', 'party', 'philippines-opm', 'piano', 'pop', 'pop-film', 'post-dubstep', 'power-po

In [24]:
import pandas as pd

def build_large_track_dataset(genres, playlists, num_tracks_per_source=50):
    """
    Builds a dataset of tracks by combining tracks from multiple genres and playlists.

    Args:
    - genres (list): List of genres to search.
    - playlists (list): List of playlist IDs to pull tracks from.
    - num_tracks_per_source (int): Number of tracks to fetch per genre/playlist.

    Returns:
    - tracks_df (pd.DataFrame): DataFrame containing track information.
    """
    track_data = []

    # Fetch tracks by genre
    for genre in genres:
        results = sp.search(q=f'genre:{genre}', type='track', limit=num_tracks_per_source)
        tracks = results['tracks']['items']

        # Collect relevant track information
        for track in tracks:
            track_info = {
                'track_id': track['id'],
                'track_name': track['name'],
                'artist_name': track['artists'][0]['name'],  # Take the first artist listed
                'album_name': track['album']['name'],
                'release_date': track['album']['release_date'],
                'popularity': track['popularity'],
                'genre_source': genre  # Save which genre the track came from
            }
            track_data.append(track_info)

    # Fetch tracks from playlists
    for playlist_id in playlists:
        results = sp.playlist_tracks(playlist_id, limit=num_tracks_per_source)
        tracks = results['items']

        for item in tracks:
            track = item['track']
            if not track:  # removed or unavailable playlist entries have no track object
                continue
            track_info = {
                'track_id': track['id'],
                'track_name': track['name'],
                'artist_name': track['artists'][0]['name'],
                'album_name': track['album']['name'],
                'release_date': track['album']['release_date'],
                'popularity': track['popularity'],
                'playlist_source': playlist_id  # Save which playlist the track came from
            }
            track_data.append(track_info)

    # Convert list of track data to a DataFrame
    tracks_df = pd.DataFrame(track_data)

    return tracks_df

# Example usage
# genres = ['pop', 'rock', 'hip-hop']
playlists = ['37i9dQZEVXbMDoHDwVN2tF', '37i9dQZF1DWXRqgorJj26U']  # Top 50 Global and a second playlist (its rows in the output below are classic-rock tracks)

# Fetch the dataset
tracks_df = build_large_track_dataset(available_genres, playlists, num_tracks_per_source=50)

# Display the first few rows of the dataframe
tracks_df


Unnamed: 0,track_id,track_name,artist_name,album_name,release_date,popularity,genre_source,playlist_source
0,1HMQmOWrkieKYWlFsjUP3D,Bloom - Bonus Track,The Paper Kites,Woodland,2013-03-05,77,acoustic,
1,6uHvbKL0Yi37AuvNRmUfMw,Paint,The Paper Kites,Young North,2013-03-05,73,acoustic,
2,7jIAttgQTpLDoNtykIQXjH,Blister In The Sun,Violent Femmes,Violent Femmes,1983-01-01,72,acoustic,
3,4E6cwWJWZw2zWf7VFbH7wf,Love Song,Sara Bareilles,Little Voice,2007-07-03,74,acoustic,
4,1jyddn36UN4tVsJGtaJfem,You Are the Best Thing,Ray LaMontagne,Gossip In The Grain,2008-10-13,69,acoustic,
...,...,...,...,...,...,...,...,...
5745,7GonnnalI2s19OCQO1J7Tf,Kickstart My Heart,Mötley Crüe,Dr. Feelgood,1989,2,,37i9dQZF1DWXRqgorJj26U
5746,5LNiqEqpDc8TuqPy79kDBu,Edge of Seventeen - 2016 Remaster,Stevie Nicks,Bella Donna (Deluxe Edition),2016-11-04,56,,37i9dQZF1DWXRqgorJj26U
5747,6NxsCnLeLd8Ai1TrgGxzIx,Bad Moon Rising,Creedence Clearwater Revival,Green River (40th Anniversary Edition),1969-08-03,0,,37i9dQZF1DWXRqgorJj26U
5748,5eYwDBLucWfWI5KsV7oYX2,Mary Jane's Last Dance,Tom Petty and the Heartbreakers,Anthology: Through The Years,2000-01-01,0,,37i9dQZF1DWXRqgorJj26U
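
A track can come back from a genre search and also sit in one of the playlists, so the combined table may hold the same `track_id` more than once. Below is a minimal, dependency-free sketch of collapsing such rows while keeping both source labels. `merge_duplicate_tracks` is a hypothetical helper and the rows are made-up. With pandas, `tracks_df.drop_duplicates(subset='track_id')` would be the shorter route when the source labels of the dropped rows do not matter.

```python
def merge_duplicate_tracks(records):
    """Collapse rows sharing a track_id, keeping the first non-empty value per field."""
    merged = {}
    for row in records:
        # First occurrence is copied; later occurrences only fill empty fields
        existing = merged.setdefault(row['track_id'], dict(row))
        for key, value in row.items():
            if existing.get(key) in (None, '') and value not in (None, ''):
                existing[key] = value
    return list(merged.values())

# Made-up rows for illustration
rows = [
    {'track_id': 'a1', 'track_name': 'Song A', 'genre_source': 'pop', 'playlist_source': None},
    {'track_id': 'b2', 'track_name': 'Song B', 'genre_source': 'rock', 'playlist_source': None},
    {'track_id': 'a1', 'track_name': 'Song A', 'genre_source': None, 'playlist_source': 'playlist-1'},
]
unique_rows = merge_duplicate_tracks(rows)
print(len(unique_rows))  # 2
print(unique_rows[0])
```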


In [25]:
tracks_df.to_csv('tracks_df_og.csv', index=False)

## Extract track features and get Spotify recommendations to seed the dataset

In [49]:
# def add_audio_features_to_df(df):
#     """
#     Adds audio features to a DataFrame containing track information.

#     Args:
#     - df (pd.DataFrame): DataFrame with track information including 'track_id'.

#     Returns:
#     - df (pd.DataFrame): DataFrame with additional columns for audio features.
#     """
#     audio_features_list = []

#     # Fetch audio features for each track
#     for track_id in df['track_id']:
#         features = sp.audio_features(track_id)[0]  # Fetch audio features for the track
#         if features:
#             audio_features_list.append(features)
#         else:
#             audio_features_list.append({})

#     # Convert audio features list to DataFrame and merge with the original DataFrame
#     audio_features_df = pd.DataFrame(audio_features_list)
#     df_with_features = pd.concat([df.reset_index(drop=True), audio_features_df.reset_index(drop=True)], axis=1)

#     return df_with_features

# tracks_with_features_df = add_audio_features_to_df(tracks_df)
# tracks_with_features_df


ERROR:spotipy.client:Max Retries reached


SpotifyException: http status: 429, code:-1 - /v1/audio-features/?ids=1HMQmOWrkieKYWlFsjUP3D:
 Max Retries, reason: too many 429 error responses

In [37]:
import time
import logging

# Set up basic logging
logging.basicConfig(level=logging.WARNING)

def add_audio_features_to_df(df, max_retries=3, wait_time=2):
    """
    Adds audio features to a DataFrame containing track information with error handling.

    Args:
    - df (pd.DataFrame): DataFrame with track information including 'track_id'.
    - max_retries (int): Maximum number of retries before skipping a track.
    - wait_time (int): Time to wait in seconds before retrying a request (in case of rate limit).

    Returns:
    - df (pd.DataFrame): DataFrame with additional columns for audio features.
    """
    audio_features_list = []

    for index, track_id in enumerate(df['track_id']):
        retries = 0
        while retries < max_retries:
            try:
                # Fetch audio features for the track
                features = sp.audio_features(track_id)[0]
                if features:
                    audio_features_list.append(features)
                else:
                    audio_features_list.append({})
                break  # Exit the retry loop if successful

            except spotipy.exceptions.SpotifyException as e:
                # Handle rate limit (429 error)
                if e.http_status == 429:
                    retries += 1
                    # Use the Retry-After header when present; e.headers can be None
                    retry_after = int((e.headers or {}).get('Retry-After', wait_time))
                    logging.warning(f"Rate limited on track {track_id}, retrying in {retry_after} seconds...")
                    time.sleep(retry_after)  # Sleep for the time specified by Spotify

                else:
                    logging.error(f"Error fetching audio features for track {track_id}: {str(e)}")
                    audio_features_list.append({})
                    break  # Exit the retry loop if it's not a rate limit error

        if retries == max_retries:
            logging.warning(f"Max retries reached for track {track_id}, skipping...")
            audio_features_list.append({})  # Add empty dict if we skip the track

    # Convert the list of audio features to a DataFrame
    audio_features_df = pd.DataFrame(audio_features_list)

    # Concatenate the original DataFrame with the audio features DataFrame
    df_with_features = pd.concat([df.reset_index(drop=True), audio_features_df.reset_index(drop=True)], axis=1)

    return df_with_features

# Example usage
tracks_with_features_df = add_audio_features_to_df(tracks_df)
tracks_with_features_df


ERROR:spotipy.client:Max Retries reached
ERROR:spotipy.client:Max Retries reached
ERROR:spotipy.client:Max Retries reached


KeyboardInterrupt: 
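
The function above makes one request per track, which for a table of several thousand rows is a likely cause of the repeated 429 responses. The audio-features endpoint accepts up to 100 IDs per request (the limit also used by the batch version of `get_track_features` in this notebook), so grouping IDs cuts the request count by roughly two orders of magnitude. The following is a sketch under that assumption. `chunked`, `fetch_features_in_batches`, and the stand-in `fake_audio_features` are hypothetical names used so the example runs offline.

```python
def chunked(items, size=100):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def fetch_features_in_batches(fetch_batch, track_ids, batch_size=100):
    """Call fetch_batch once per chunk and map each track ID to its features.

    fetch_batch is assumed to behave like sp.audio_features: it takes a list
    of IDs and returns one dict (or None) per ID, in the same order.
    """
    features_by_id = {}
    for batch in chunked(track_ids, batch_size):
        for track_id, features in zip(batch, fetch_batch(batch)):
            features_by_id[track_id] = features or {}
    return features_by_id

# Stand-in for sp.audio_features so the sketch runs without network access
calls = []
def fake_audio_features(ids):
    calls.append(len(ids))
    return [{'id': i, 'tempo': 120.0} for i in ids]

ids = [f'track{i}' for i in range(250)]
features = fetch_features_in_batches(fake_audio_features, ids)
print(calls)  # [100, 100, 50]
```

With the real client, `sp.audio_features` would be passed as `fetch_batch`; a table of roughly 5,700 tracks would then need about 58 requests rather than one per row.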

In [29]:
tracks_with_features_df

Unnamed: 0,track_id,track_name,artist_name,album_name,release_date,popularity,genre_source,playlist_source,danceability,energy,...,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,0WbMK4wrZ1wFSty9F7FCgu,"Good Luck, Babe!",Chappell Roan,"Good Luck, Babe!",2024-04-05,97,pop,,0.700,0.582,...,0.0881,0.785,116.712,audio_features,0WbMK4wrZ1wFSty9F7FCgu,spotify:track:0WbMK4wrZ1wFSty9F7FCgu,https://api.spotify.com/v1/tracks/0WbMK4wrZ1wF...,https://api.spotify.com/v1/audio-analysis/0WbM...,218424,4
1,6dOtVTDdiauQNBQEDOtlAB,BIRDS OF A FEATHER,Billie Eilish,HIT ME HARD AND SOFT,2024-05-17,100,pop,,0.747,0.507,...,0.1170,0.438,104.978,audio_features,6dOtVTDdiauQNBQEDOtlAB,spotify:track:6dOtVTDdiauQNBQEDOtlAB,https://api.spotify.com/v1/tracks/6dOtVTDdiauQ...,https://api.spotify.com/v1/audio-analysis/6dOt...,210373,4
2,2plbrEY59IikOBgBGLjaoe,Die With A Smile,Lady Gaga,Die With A Smile,2024-08-16,99,pop,,0.521,0.592,...,0.1220,0.535,157.969,audio_features,2plbrEY59IikOBgBGLjaoe,spotify:track:2plbrEY59IikOBgBGLjaoe,https://api.spotify.com/v1/tracks/2plbrEY59Iik...,https://api.spotify.com/v1/audio-analysis/2plb...,251668,3
3,5G2f63n7IPVPPjfNIGih7Q,Taste,Sabrina Carpenter,Short n' Sweet,2024-08-23,95,pop,,0.674,0.907,...,0.2970,0.721,112.964,audio_features,5G2f63n7IPVPPjfNIGih7Q,spotify:track:5G2f63n7IPVPPjfNIGih7Q,https://api.spotify.com/v1/tracks/5G2f63n7IPVP...,https://api.spotify.com/v1/audio-analysis/5G2f...,157280,4
4,5N3hjp1WNayUPZrA8kJmJP,Please Please Please,Sabrina Carpenter,Please Please Please,2024-06-06,96,pop,,0.669,0.586,...,0.1040,0.579,107.071,audio_features,5N3hjp1WNayUPZrA8kJmJP,spotify:track:5N3hjp1WNayUPZrA8kJmJP,https://api.spotify.com/v1/tracks/5N3hjp1WNayU...,https://api.spotify.com/v1/audio-analysis/5N3h...,186365,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
245,7GonnnalI2s19OCQO1J7Tf,Kickstart My Heart,Mötley Crüe,Dr. Feelgood,1989,2,,37i9dQZF1DWXRqgorJj26U,0.359,0.981,...,0.2840,0.271,178.980,audio_features,7GonnnalI2s19OCQO1J7Tf,spotify:track:7GonnnalI2s19OCQO1J7Tf,https://api.spotify.com/v1/tracks/7GonnnalI2s1...,https://api.spotify.com/v1/audio-analysis/7Gon...,282653,4
246,5LNiqEqpDc8TuqPy79kDBu,Edge of Seventeen - 2016 Remaster,Stevie Nicks,Bella Donna (Deluxe Edition),2016-11-04,56,,37i9dQZF1DWXRqgorJj26U,0.591,0.804,...,0.0818,0.658,111.457,audio_features,5LNiqEqpDc8TuqPy79kDBu,spotify:track:5LNiqEqpDc8TuqPy79kDBu,https://api.spotify.com/v1/tracks/5LNiqEqpDc8T...,https://api.spotify.com/v1/audio-analysis/5LNi...,329413,4
247,6NxsCnLeLd8Ai1TrgGxzIx,Bad Moon Rising,Creedence Clearwater Revival,Green River (40th Anniversary Edition),1969-08-03,0,,37i9dQZF1DWXRqgorJj26U,0.647,0.762,...,0.0705,0.930,89.837,audio_features,6NxsCnLeLd8Ai1TrgGxzIx,spotify:track:6NxsCnLeLd8Ai1TrgGxzIx,https://api.spotify.com/v1/tracks/6NxsCnLeLd8A...,https://api.spotify.com/v1/audio-analysis/6Nxs...,141600,4
248,5eYwDBLucWfWI5KsV7oYX2,Mary Jane's Last Dance,Tom Petty and the Heartbreakers,Anthology: Through The Years,2000-01-01,0,,37i9dQZF1DWXRqgorJj26U,0.402,0.814,...,0.2660,0.516,170.020,audio_features,5eYwDBLucWfWI5KsV7oYX2,spotify:track:5eYwDBLucWfWI5KsV7oYX2,https://api.spotify.com/v1/tracks/5eYwDBLucWfW...,https://api.spotify.com/v1/audio-analysis/5eYw...,272267,4


In [18]:
def get_track_features(track_ids):
    """
    Fetches audio features for a list of track IDs.

    Args:
    - track_ids (list): List of Spotify track IDs.

    Returns:
    - features (list): A list of dictionaries containing audio features for each track.
    """
    features = []

    # Fetch audio features in batches
    for i in range(0, len(track_ids), 100):  # 100 is the maximum batch size per request
        audio_features = sp.audio_features(track_ids[i:i+100])
        features.extend(audio_features)

    return features

# # Example usage
# # track_ids = ['track_id1', 'track_id2', 'track_id3']  # Replace with actual track IDs
# track_features = get_track_features(large_track_id_array)
# track_features

[{'danceability': 0.7,
  'energy': 0.582,
  'key': 11,
  'loudness': -5.96,
  'mode': 0,
  'speechiness': 0.0356,
  'acousticness': 0.0502,
  'instrumentalness': 0,
  'liveness': 0.0881,
  'valence': 0.785,
  'tempo': 116.712,
  'type': 'audio_features',
  'id': '0WbMK4wrZ1wFSty9F7FCgu',
  'uri': 'spotify:track:0WbMK4wrZ1wFSty9F7FCgu',
  'track_href': 'https://api.spotify.com/v1/tracks/0WbMK4wrZ1wFSty9F7FCgu',
  'analysis_url': 'https://api.spotify.com/v1/audio-analysis/0WbMK4wrZ1wFSty9F7FCgu',
  'duration_ms': 218424,
  'time_signature': 4},
 {'danceability': 0.747,
  'energy': 0.507,
  'key': 2,
  'loudness': -10.171,
  'mode': 1,
  'speechiness': 0.0358,
  'acousticness': 0.2,
  'instrumentalness': 0.0608,
  'liveness': 0.117,
  'valence': 0.438,
  'tempo': 104.978,
  'type': 'audio_features',
  'id': '6dOtVTDdiauQNBQEDOtlAB',
  'uri': 'spotify:track:6dOtVTDdiauQNBQEDOtlAB',
  'track_href': 'https://api.spotify.com/v1/tracks/6dOtVTDdiauQNBQEDOtlAB',
  'analysis_url': 'https://api.sp

In [32]:
def get_recommendations(seed_tracks, limit=10):
    """
    Fetches track recommendations based on seed tracks.

    Args:
    - seed_tracks (list): List of seed track IDs.
    - limit (int): Number of recommendations to fetch.

    Returns:
    - recommendations (list): List of recommended track objects.
    """
    recommendations = sp.recommendations(seed_tracks=seed_tracks, limit=limit)
    return recommendations['tracks']

# Example usage
seed_tracks_list = tracks_with_features_df['track_id'].tolist()  # Convert to list
spotify_recommendations = get_recommendations(seed_tracks_list[:5])  # Fetch recommendations using the first 5 track IDs as seeds
print(spotify_recommendations)


[{'album': {'album_type': 'ALBUM', 'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/1QAJqy2dA3ihHBFIHRphZj'}, 'href': 'https://api.spotify.com/v1/artists/1QAJqy2dA3ihHBFIHRphZj', 'id': '1QAJqy2dA3ihHBFIHRphZj', 'name': 'Cigarettes After Sex', 'type': 'artist', 'uri': 'spotify:artist:1QAJqy2dA3ihHBFIHRphZj'}], 'available_markets': ['AR', 'AT', 'BE', 'BO', 'BR', 'BG', 'CA', 'CL', 'CO', 'CR', 'CY', 'CZ', 'DK', 'DO', 'DE', 'EC', 'EE', 'SV', 'FI', 'FR', 'GR', 'GT', 'HN', 'HK', 'HU', 'IS', 'IE', 'IT', 'LV', 'LT', 'LU', 'MY', 'MT', 'MX', 'NL', 'NI', 'NO', 'PA', 'PY', 'PE', 'PH', 'PL', 'PT', 'SG', 'SK', 'ES', 'SE', 'CH', 'TW', 'TR', 'UY', 'US', 'GB', 'AD', 'LI', 'MC', 'ID', 'JP', 'TH', 'VN', 'RO', 'IL', 'ZA', 'SA', 'AE', 'BH', 'QA', 'OM', 'KW', 'EG', 'MA', 'DZ', 'TN', 'LB', 'JO', 'PS', 'IN', 'BY', 'KZ', 'MD', 'UA', 'AL', 'BA', 'HR', 'ME', 'MK', 'RS', 'SI', 'KR', 'BD', 'PK', 'LK', 'GH', 'KE', 'NG', 'TZ', 'UG', 'AG', 'AM', 'BS', 'BB', 'BZ', 'BT', 'BW', 'BF', 'CV', 'CW',

In [35]:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Step 1: Normalize audio features
scaler = MinMaxScaler()
tracks_with_normalized_features = tracks_with_features_df.copy()
numerical_columns = ['danceability', 'energy', 'tempo', 'acousticness', 'valence']
tracks_with_normalized_features[numerical_columns] = scaler.fit_transform(
    tracks_with_normalized_features[numerical_columns])

# Step 2: Fetch recommendations and build the target dataframe
recommendations = []
for track_id in tracks_with_normalized_features['track_id']:
    recs = get_recommendations([track_id], limit=5)  # Get 5 recommendations
    rec_track_ids = [track['id'] for track in recs]
    recommendations.append({'track_id': track_id, 'recommended_tracks': rec_track_ids})

recommendations_df = pd.DataFrame(recommendations)
recommendations_df

KeyboardInterrupt: 
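
The loop above issues one recommendations request per track and was interrupted before it finished. One way to make such long loops more tolerant of rate limiting is to wrap each call in a retry helper with exponential backoff. This is a sketch only: `call_with_backoff` and the `RateLimited` exception are hypothetical. In this notebook the exception to catch would be `spotipy.exceptions.SpotifyException` with `http_status == 429`.

```python
import time

class RateLimited(Exception):
    """Hypothetical stand-in for a 429 response from the API."""

def call_with_backoff(func, *args, max_retries=4, base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call func, retrying on RateLimited with delays of base_delay * 2**attempt."""
    for attempt in range(max_retries + 1):
        try:
            return func(*args, **kwargs)
        except RateLimited:
            if attempt == max_retries:
                raise  # give up after the final attempt
            sleep(base_delay * 2 ** attempt)

# Offline demonstration: fail twice, then succeed
attempts = {'count': 0}
def flaky_recommendations(track_id):
    attempts['count'] += 1
    if attempts['count'] < 3:
        raise RateLimited()
    return [f'rec-for-{track_id}']

delays = []  # record the requested delays instead of actually sleeping
result = call_with_backoff(flaky_recommendations, 'abc', sleep=delays.append)
print(result, delays)  # ['rec-for-abc'] [1.0, 2.0]
```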

## Build seed dataset

In [None]:
import pandas as pd

def build_dataset(seed_tracks, num_recommendations=20):
    """
    Builds a dataset of tracks and their features, including recommendations.

    Args:
    - seed_tracks (list): List of seed track IDs.
    - num_recommendations (int): Number of recommendations to fetch.

    Returns:
    - df (pd.DataFrame): A DataFrame containing track features and recommendations.
    """
    # Get initial track features for seed tracks
    seed_track_features = get_track_features(seed_tracks)

    # Get recommended tracks
    recommended_tracks = get_recommendations(seed_tracks, limit=num_recommendations)

    # Extract the IDs of the recommended tracks
    recommended_track_ids = [track['id'] for track in recommended_tracks]

    # Get audio features for recommended tracks
    recommended_track_features = get_track_features(recommended_track_ids)

    # Combine seed track features and recommended track features
    all_features = seed_track_features + recommended_track_features

    # Convert to DataFrame
    df = pd.DataFrame(all_features)

    return df

# Example usage
seed_tracks = ['track_id1', 'track_id2', 'track_id3']  # Replace with actual seed track IDs
df = build_dataset(seed_tracks)
print(df.head())


## Content-Based Filtering
Content-based filtering recommends songs from the features of the songs themselves (here, Spotify's audio features) rather than from other users' listening behavior. It suits a general-purpose dataset like this one because items can be compared with each other directly, without any user history.

Steps:

1. Collect features such as tempo, energy, danceability, acousticness, and valence using Spotify's API.
2. Use these features to compute song-to-song similarity (e.g., cosine similarity).
3. Recommend the songs whose feature vectors are closest to those of the chosen song.
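
For two feature vectors a and b, cosine similarity is the dot product divided by the product of their lengths: cos(a, b) = (a · b) / (‖a‖ ‖b‖). The steps above can be sketched in plain Python as follows. The helper names and the three example vectors are made up for illustration and are not taken from the dataset.

```python
import math

def cosine_similarity_pair(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)

def most_similar(query_id, features_by_id, top_n=3):
    """Rank every other track by cosine similarity to query_id, highest first."""
    query = features_by_id[query_id]
    scores = [
        (other_id, cosine_similarity_pair(query, vec))
        for other_id, vec in features_by_id.items()
        if other_id != query_id
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]

# Made-up vectors: (danceability, energy, valence)
features_by_id = {
    'song_a': [0.70, 0.58, 0.78],
    'song_b': [0.72, 0.60, 0.75],
    'song_c': [0.20, 0.95, 0.10],
}
print(most_similar('song_a', features_by_id, top_n=2))
```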

In [43]:
list(tracks_with_features_df)

['track_id',
 'track_name',
 'artist_name',
 'album_name',
 'release_date',
 'popularity',
 'genre_source',
 'playlist_source',
 'danceability',
 'energy',
 'key',
 'loudness',
 'mode',
 'speechiness',
 'acousticness',
 'instrumentalness',
 'liveness',
 'valence',
 'tempo',
 'type',
 'id',
 'uri',
 'track_href',
 'analysis_url',
 'duration_ms',
 'time_signature']
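
The columns listed above sit on very different scales: `danceability`, `energy`, and `valence` lie between 0 and 1, while `tempo` is measured in beats per minute and `loudness` in negative decibels. Cosine similarity on raw values is therefore likely to be driven mostly by the large-magnitude columns. Min-max scaling each column first, as the earlier `MinMaxScaler` cell does for a subset of features, is one way to reduce that effect. The sketch below is dependency-free. `min_max_scale_columns` is a hypothetical helper and the rows are made-up.

```python
def min_max_scale_columns(rows):
    """Rescale each column of a list of equal-length rows to the range [0, 1]."""
    columns = list(zip(*rows))
    scaled_columns = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant column carries no information; map it to 0.0
        scaled_columns.append([(v - lo) / span if span else 0.0 for v in col])
    return [list(row) for row in zip(*scaled_columns)]

# Made-up rows: (danceability, energy, tempo)
raw = [
    [0.70, 0.58, 116.7],
    [0.25, 0.95, 118.0],
    [0.72, 0.60, 170.0],
]
scaled = min_max_scale_columns(raw)
print(scaled[0])
```

The scaled rows would then replace the raw feature columns as the input to the similarity computation.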

In [56]:
from sklearn.metrics.pairwise import cosine_similarity

# Example of fetching and comparing Spotify song features
song_features = tracks_with_features_df[['danceability',
 'energy',
 'key',
 'loudness',
 'mode',
 'speechiness',
 'acousticness',
 'instrumentalness',
 'liveness',
 'valence',
 'tempo']]
similarity_matrix = cosine_similarity(song_features)

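# Recommend similar songs
def recommend_songs(song_id, similarity_matrix, df, top_n=10):
    """
    Returns the top_n rows of df most similar to the given song.

    Assumes df has a default 0..n-1 index so that row labels match the
    positions in similarity_matrix, and skips the first hit (the song itself).
    """
    song_index = df[df['id'] == song_id].index[0]
    similar_indices = similarity_matrix[song_index].argsort()[::-1][1:top_n+1]
    return df.iloc[similar_indices]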

recommendations = recommend_songs('1k2pQc5i348DCHwbn5KTdc', similarity_matrix, tracks_with_features_df)
recommendations

Unnamed: 0,track_id,track_name,artist_name,album_name,release_date,popularity,genre_source,playlist_source,danceability,energy,...,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
46,2OzhQlSqBEmt7hmkYxfT6m,Fortnight (feat. Post Malone),Taylor Swift,THE TORTURED POETS DEPARTMENT,2024-04-18,88,pop,,0.504,0.386,...,0.0961,0.281,192.004,audio_features,2OzhQlSqBEmt7hmkYxfT6m,spotify:track:2OzhQlSqBEmt7hmkYxfT6m,https://api.spotify.com/v1/tracks/2OzhQlSqBEmt...,https://api.spotify.com/v1/audio-analysis/2Ozh...,228965,4
169,3WOhcATHxK2SLNeP5W3v1v,Guess featuring Billie Eilish,Charli xcx,Guess featuring Billie Eilish,2024-08-01,92,,37i9dQZEVXbMDoHDwVN2tF,0.776,0.667,...,0.0761,0.618,130.019,audio_features,3WOhcATHxK2SLNeP5W3v1v,spotify:track:3WOhcATHxK2SLNeP5W3v1v,https://api.spotify.com/v1/tracks/3WOhcATHxK2S...,https://api.spotify.com/v1/audio-analysis/3WOh...,145219,4
13,3WOhcATHxK2SLNeP5W3v1v,Guess featuring Billie Eilish,Charli xcx,Guess featuring Billie Eilish,2024-08-01,92,pop,,0.776,0.667,...,0.0761,0.618,130.019,audio_features,3WOhcATHxK2SLNeP5W3v1v,spotify:track:3WOhcATHxK2SLNeP5W3v1v,https://api.spotify.com/v1/tracks/3WOhcATHxK2S...,https://api.spotify.com/v1/audio-analysis/3WOh...,145219,4
198,6X6m4xmdFcz31p1h7Qg68c,EL LOKERON,Tito Double P,INCÓMODO,2024-08-22,84,,37i9dQZEVXbMDoHDwVN2tF,0.724,0.689,...,0.0922,0.936,120.208,audio_features,6X6m4xmdFcz31p1h7Qg68c,spotify:track:6X6m4xmdFcz31p1h7Qg68c,https://api.spotify.com/v1/tracks/6X6m4xmdFcz3...,https://api.spotify.com/v1/audio-analysis/6X6m...,146141,3
239,4JNi40t7xR5bO3PWxRkiPN,Free Bird,Lynyrd Skynyrd,(Pronounced 'Leh-'Nérd 'Skin-'Nérd) [Expanded ...,1973,0,,37i9dQZF1DWXRqgorJj26U,0.279,0.852,...,0.0624,0.438,117.418,audio_features,4JNi40t7xR5bO3PWxRkiPN,spotify:track:4JNi40t7xR5bO3PWxRkiPN,https://api.spotify.com/v1/tracks/4JNi40t7xR5b...,https://api.spotify.com/v1/audio-analysis/4JNi...,558933,4
235,5UwbnHhjnbinJH8TefuQfN,Long Cool Woman (In a Black Dress) - 1999 Rema...,The Hollies,Distant Light (1999 Remaster),1971-10-08,38,,37i9dQZF1DWXRqgorJj26U,0.757,0.868,...,0.355,0.815,138.923,audio_features,5UwbnHhjnbinJH8TefuQfN,spotify:track:5UwbnHhjnbinJH8TefuQfN,https://api.spotify.com/v1/tracks/5UwbnHhjnbin...,https://api.spotify.com/v1/audio-analysis/5Uwb...,199200,4
105,3xKsf9qdS1CyvXSMEid6g8,Pink + White,Frank Ocean,Blonde,2016-08-20,88,hip-hop,,0.545,0.545,...,0.417,0.549,159.94,audio_features,3xKsf9qdS1CyvXSMEid6g8,spotify:track:3xKsf9qdS1CyvXSMEid6g8,https://api.spotify.com/v1/tracks/3xKsf9qdS1Cy...,https://api.spotify.com/v1/audio-analysis/3xKs...,184516,3
136,3Rfre3qkrhwdZZ7dyznwbN,Lonely Road (with Jelly Roll),mgk,Lonely Road (with Jelly Roll),2024-07-26,81,hip-hop,,0.549,0.679,...,0.365,0.235,100.214,audio_features,3Rfre3qkrhwdZZ7dyznwbN,spotify:track:3Rfre3qkrhwdZZ7dyznwbN,https://api.spotify.com/v1/tracks/3Rfre3qkrhwd...,https://api.spotify.com/v1/audio-analysis/3Rfr...,189357,4
187,2aYZaN5SmkRDLsrrV8GkBQ,LA PATRULLA,Peso Pluma,ÉXODO,2024-06-20,87,,37i9dQZEVXbMDoHDwVN2tF,0.722,0.74,...,0.176,0.893,132.731,audio_features,2aYZaN5SmkRDLsrrV8GkBQ,spotify:track:2aYZaN5SmkRDLsrrV8GkBQ,https://api.spotify.com/v1/tracks/2aYZaN5SmkRD...,https://api.spotify.com/v1/audio-analysis/2aYZ...,130615,3
226,57JVGBtBLCfHw2muk5416J,Another One Bites The Dust - Remastered 2011,Queen,The Game (Deluxe Remastered Version),1980-06-27,73,,37i9dQZF1DWXRqgorJj26U,0.932,0.528,...,0.163,0.757,109.975,audio_features,57JVGBtBLCfHw2muk5416J,spotify:track:57JVGBtBLCfHw2muk5416J,https://api.spotify.com/v1/tracks/57JVGBtBLCfH...,https://api.spotify.com/v1/audio-analysis/57JV...,214653,4


In [46]:
recommendations

Unnamed: 0,track_id,track_name,artist_name,album_name,release_date,popularity,genre_source,playlist_source,danceability,energy,...,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
188,1BxfuPKGuaTgP7aM0Bbdwr,Cruel Summer,Taylor Swift,Lover,2019-08-23,92,,37i9dQZEVXbMDoHDwVN2tF,0.552,0.702,...,0.105,0.564,169.994,audio_features,1BxfuPKGuaTgP7aM0Bbdwr,spotify:track:1BxfuPKGuaTgP7aM0Bbdwr,https://api.spotify.com/v1/tracks/1BxfuPKGuaTg...,https://api.spotify.com/v1/audio-analysis/1Bxf...,178427,4
240,2X6gdRlGOQgfaXU9ALUQFQ,The Chain,Fleetwood Mac,Rumours,1977-02-04,0,,37i9dQZF1DWXRqgorJj26U,0.546,0.529,...,0.0383,0.574,151.727,audio_features,2X6gdRlGOQgfaXU9ALUQFQ,spotify:track:2X6gdRlGOQgfaXU9ALUQFQ,https://api.spotify.com/v1/tracks/2X6gdRlGOQgf...,https://api.spotify.com/v1/audio-analysis/2X6g...,271000,4
73,0gmbgwZ8iqyMPmXefof8Yf,How You Remind Me,Nickelback,Silver Side Up,2001-09-11,85,rock,,0.446,0.764,...,0.099,0.543,172.094,audio_features,0gmbgwZ8iqyMPmXefof8Yf,spotify:track:0gmbgwZ8iqyMPmXefof8Yf,https://api.spotify.com/v1/tracks/0gmbgwZ8iqyM...,https://api.spotify.com/v1/audio-analysis/0gmb...,223840,4
201,63T7DJ1AFDD6Bn8VzG6JE8,"Paint It, Black",The Rolling Stones,Aftermath,1966-04-15,83,,37i9dQZF1DWXRqgorJj26U,0.464,0.795,...,0.399,0.612,158.691,audio_features,63T7DJ1AFDD6Bn8VzG6JE8,spotify:track:63T7DJ1AFDD6Bn8VzG6JE8,https://api.spotify.com/v1/tracks/63T7DJ1AFDD6...,https://api.spotify.com/v1/audio-analysis/63T7...,202267,4
95,6dGnYIeXmHdcikdzNNDMm2,Here Comes The Sun - Remastered 2009,The Beatles,Abbey Road (Remastered),1969-09-26,85,rock,,0.557,0.54,...,0.179,0.394,129.177,audio_features,6dGnYIeXmHdcikdzNNDMm2,spotify:track:6dGnYIeXmHdcikdzNNDMm2,https://api.spotify.com/v1/tracks/6dGnYIeXmHdc...,https://api.spotify.com/v1/audio-analysis/6dGn...,185733,4
248,5eYwDBLucWfWI5KsV7oYX2,Mary Jane's Last Dance,Tom Petty and the Heartbreakers,Anthology: Through The Years,2000-01-01,0,,37i9dQZF1DWXRqgorJj26U,0.402,0.814,...,0.266,0.516,170.02,audio_features,5eYwDBLucWfWI5KsV7oYX2,spotify:track:5eYwDBLucWfWI5KsV7oYX2,https://api.spotify.com/v1/tracks/5eYwDBLucWfW...,https://api.spotify.com/v1/audio-analysis/5eYw...,272267,4
2,2plbrEY59IikOBgBGLjaoe,Die With A Smile,Lady Gaga,Die With A Smile,2024-08-16,99,pop,,0.521,0.592,...,0.122,0.535,157.969,audio_features,2plbrEY59IikOBgBGLjaoe,spotify:track:2plbrEY59IikOBgBGLjaoe,https://api.spotify.com/v1/tracks/2plbrEY59Iik...,https://api.spotify.com/v1/audio-analysis/2plb...,251668,3
150,2plbrEY59IikOBgBGLjaoe,Die With A Smile,Lady Gaga,Die With A Smile,2024-08-16,99,,37i9dQZEVXbMDoHDwVN2tF,0.521,0.592,...,0.122,0.535,157.969,audio_features,2plbrEY59IikOBgBGLjaoe,spotify:track:2plbrEY59IikOBgBGLjaoe,https://api.spotify.com/v1/tracks/2plbrEY59Iik...,https://api.spotify.com/v1/audio-analysis/2plb...,251668,3
230,61Q9oJNd9hJQFhSDh6Qlap,House Of The Rising Sun,The Animals,The Best Of The Animals,1966-02,57,,37i9dQZF1DWXRqgorJj26U,0.315,0.534,...,0.101,0.283,116.891,audio_features,61Q9oJNd9hJQFhSDh6Qlap,spotify:track:61Q9oJNd9hJQFhSDh6Qlap,https://api.spotify.com/v1/tracks/61Q9oJNd9hJQ...,https://api.spotify.com/v1/audio-analysis/61Q9...,269720,3
24,51eSHglvG1RJXtL3qI5trr,Slow It Down,Benson Boone,Fireworks & Rollerblades,2024-04-05,89,pop,,0.432,0.583,...,0.0933,0.544,181.489,audio_features,51eSHglvG1RJXtL3qI5trr,spotify:track:51eSHglvG1RJXtL3qI5trr,https://api.spotify.com/v1/tracks/51eSHglvG1RJ...,https://api.spotify.com/v1/audio-analysis/51eS...,161831,4
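
The cosine-similarity cell above compares raw feature values, so a column like `tempo` (values above 100) can outweigh `danceability` or `energy` (values between 0 and 1). The sketch below is a dependency-free illustration of standardizing each column before computing cosine similarity. The three toy tracks and the helper names `standardize` and `cosine` are made up for this example and are not part of the notebook's dataset; in the notebook itself, something like scikit-learn's `StandardScaler` applied to `song_features` would likely play the same role.

```python
import math

def standardize(rows):
    """Z-score each column of a list of equal-length numeric rows."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # Population standard deviation; a constant column gets 1.0 to avoid dividing by zero
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0 for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Made-up (danceability, energy, tempo) rows, for illustration only
toy_tracks = {
    "calm_fast":  [0.20, 0.15, 170.0],
    "dance_fast": [0.90, 0.85, 172.0],
    "calm_slow":  [0.22, 0.18, 80.0],
}
names = list(toy_tracks)
raw = [toy_tracks[name] for name in names]
scaled = standardize(raw)

# Compare the same pair of tracks before and after standardizing
raw_sim = cosine(raw[0], raw[1])
scaled_sim = cosine(scaled[0], scaled[1])
print(f"calm_fast vs dance_fast  raw: {raw_sim:.4f}  scaled: {scaled_sim:.4f}")
```

On raw values the two fast tracks look almost identical because tempo swamps everything else; after standardizing, their opposite danceability and energy pull the similarity down.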


## Collaborative Filtering
Collaborative filtering leverages user preferences across a large set of users, analyzing patterns of co-listened songs. It's one of the most popular techniques for recommendations and can be useful when you move toward incorporating user data.

Types:

- User-based collaborative filtering: recommends songs that users with similar listening histories have enjoyed.
- Item-based collaborative filtering: recommends songs that tend to be played by the same listeners as songs the user already likes.

Example (using matrix factorization with the `surprise` library):

In [None]:
from surprise import Dataset, Reader, SVD
from surprise.model_selection import train_test_split
from surprise import accuracy

# Assuming you have user-song ratings data (e.g., from Spotify listens)
reader = Reader(rating_scale=(1, 5))  # Scale depends on how you collect feedback
data = Dataset.load_from_df(df[['user_id', 'song_id', 'rating']], reader)

trainset, testset = train_test_split(data, test_size=0.2)

# Use SVD for matrix factorization
algo = SVD()
algo.fit(trainset)

predictions = algo.test(testset)
accuracy.rmse(predictions)
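
The item-based variant described above can also be sketched without any library. The example below is a minimal illustration, assuming a hypothetical `plays` dictionary of per-user play counts (this notebook does not collect such data): each song is represented by the users who played it, songs are compared with cosine similarity, and unheard songs are scored against the user's own history. The function names are placeholders chosen for this sketch.

```python
import math
from collections import defaultdict

# Hypothetical play counts: user -> {song: plays}. Not Spotify data.
plays = {
    "ana":  {"song_a": 5, "song_b": 3},
    "ben":  {"song_a": 4, "song_b": 2, "song_c": 1},
    "cleo": {"song_c": 5, "song_d": 4},
    "dev":  {"song_a": 2, "song_d": 1},
}

def item_vectors(plays):
    """Invert user -> song counts into song -> {user: plays}."""
    vectors = defaultdict(dict)
    for user, songs in plays.items():
        for song, count in songs.items():
            vectors[song][user] = count
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def recommend_item_based(user, plays, top_n=2):
    """Rank unheard songs by similarity to the user's songs, weighted by play count."""
    vectors = item_vectors(plays)
    heard = plays[user]
    scores = {}
    for candidate in vectors:
        if candidate in heard:
            continue  # only recommend songs the user has not played
        scores[candidate] = sum(
            count * cosine(vectors[candidate], vectors[song])
            for song, count in heard.items()
        )
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend_item_based("ana", plays))
```

Here `song_c` ranks ahead of `song_d` for `ana` because `ben`, who shares both of her songs, also played `song_c`.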


## Matrix Factorization (Latent Factor Models)
Matrix factorization looks for hidden (latent) factors that explain how users interact with songs: each user and each song is represented by a short vector, and their dot product estimates the user's affinity for the song. It works well for implicit data (e.g., listen counts instead of explicit ratings).

Approach:

- Use techniques like SVD (Singular Value Decomposition) or ALS (Alternating Least Squares).
- Build a user-song matrix from implicit feedback (e.g., play count) and decompose it to find relationships between users and songs.

In [None]:
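from implicit.als import AlternatingLeastSquares
# Import csr_matrix directly: `import scipy.sparse as sp` would shadow the Spotify client `sp`
from scipy.sparse import csr_matrix

# Assumes a hypothetical `df` with columns user_id, song_id, play_count.
# Map raw IDs to contiguous integer codes so they can be used as matrix indices.
user_codes = df['user_id'].astype('category').cat.codes
song_codes = df['song_id'].astype('category').cat.codes

# User-song interaction matrix in CSR format (rows = users, columns = songs); COO does not support row slicing
user_song_matrix = csr_matrix((df['play_count'].astype('float32'), (user_codes, song_codes)))

model = AlternatingLeastSquares(factors=50, regularization=0.1, iterations=30)
model.fit(user_song_matrix)  # assumes implicit >= 0.5, where fit() takes a user-item matrix

# Recommend songs to a user, addressed by row position rather than raw user ID
user_idx = 0
song_indices, scores = model.recommend(user_idx, user_song_matrix[user_idx], N=10)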
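
To make the latent-factor idea concrete, here is a minimal sketch that learns user and song vectors with plain stochastic gradient descent on a handful of made-up play counts. It is not ALS and not what the `implicit` library does internally: it fits only the observed entries and treats counts as values to reproduce, whereas implicit-feedback ALS also weighs unobserved pairs. The function name `factorize`, its hyperparameters, and the `observed` triples are all assumptions for illustration.

```python
import random

def factorize(ratings, n_users, n_items, factors=2, lr=0.01, reg=0.02, epochs=500, seed=0):
    """Learn factor vectors whose dot product approximates each observed value.

    ratings: list of (user_index, item_index, value) triples.
    Returns (P, Q, history), where history holds the mean squared error after each epoch.
    """
    rng = random.Random(seed)
    # Small positive random starting values for user factors P and item factors Q
    P = [[rng.random() * 0.1 for _ in range(factors)] for _ in range(n_users)]
    Q = [[rng.random() * 0.1 for _ in range(factors)] for _ in range(n_items)]
    history = []
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(p * q for p, q in zip(P[u], Q[i]))
            for f in range(factors):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
        mse = sum(
            (r - sum(p * q for p, q in zip(P[u], Q[i]))) ** 2 for u, i, r in ratings
        ) / len(ratings)
        history.append(mse)
    return P, Q, history

# Hypothetical (user_index, song_index, play_count) triples
observed = [
    (0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 1, 5),
    (2, 2, 5), (2, 3, 4), (3, 2, 4), (3, 3, 5),
    (0, 2, 1), (2, 0, 1),
]
P, Q, history = factorize(observed, n_users=4, n_items=4)

def predict(u, i):
    """Estimated affinity of user u for song i."""
    return sum(p * q for p, q in zip(P[u], Q[i]))

print(f"MSE after first epoch: {history[0]:.3f}, after last epoch: {history[-1]:.3f}")
print(f"observed (0, 0) = 5, reconstructed = {predict(0, 0):.2f}")
```

Calling `predict` on a pair that is not in `observed` gives the model's guess for an unseen user-song combination, which is the quantity a recommender would rank by.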
