# Music Recommendation System Using Python

To get started with building a Music Recommendation System, we first need to have an access token. The access token serves as a temporary authorization credential, allowing the code to make authenticated requests to the Spotify API on behalf of the application. Below is how we can get it:

In [19]:
import requests
import base64

In [20]:
# Replace with your own Client ID and LCient Secret
CLIENT_ID = "ed618130280643578827c3c3dbee9e0d"
CLIENT_SECRET = "7e5c64ddb0e94bdca5ec3140ed2b3b51"

In [21]:
#Base64 encode the client ID and client secret
client_credentials = f"{CLIENT_ID}:{CLIENT_SECRET}"
client_credentials_base64 = base64.b64encode(client_credentials.encode())

In [23]:
# Request the acces token
token_url = 'https://accounts.spotify.com/api/token'
headers = {
    'Authorization': f'Basic {client_credentials_base64.decode()}'
}
data = {
    'grant_type': 'client_credentials'
}
response = requests.post(token_url, data=data, headers=headers)
if response.status_code == 200:
    access_token = response.json()['access_token']
    print("Access token obtained successfully.")
else:
    print("Error obtaining access token.")
    exit()

Access token obtained successfully.


In the above code, The CLIENT_ID and CLIENT_SECRET variables hold my credentials (you need to add your credentials in these variables) that uniquely identify the application making requests to the Spotify API. These credentials are obtained when a developer registers their application with Spotify’s developer dashboard. The Client ID identifies the application, while the Client Secret is a confidential key used for authentication.

With the access token, the application can now make authorized requests to retrieve music data, such as tracks, albums, artists, and user information, which is fundamental for building a music recommendation system using the Spotify API and Python.

Now, I’ll write a function to get music data from any playlist on Spotify. For this task, you need to install the Spotipy library, which is a Python library providing access to Spotify’s web API. 
- Here’s how to install it on your system by writing the command mentioned below in your command prompt or terminal:

In [9]:
pip install spotipy

Collecting spotipy
  Downloading spotipy-2.23.0-py3-none-any.whl (29 kB)
Collecting redis>=3.5.3 (from spotipy)
  Obtaining dependency information for redis>=3.5.3 from https://files.pythonhosted.org/packages/df/b2/dfdc17f701f7b587f6c89c2b9b6b5978c87a8a785555efc810b064c875de/redis-5.0.0-py3-none-any.whl.metadata
  Downloading redis-5.0.0-py3-none-any.whl.metadata (8.8 kB)
Downloading redis-5.0.0-py3-none-any.whl (250 kB)
   ---------------------------------------- 0.0/250.1 kB ? eta -:--:--
   ---------------------------------------- 250.1/250.1 kB 5.1 MB/s eta 0:00:00
Installing collected packages: redis, spotipy
Successfully installed redis-5.0.0 spotipy-2.23.0
Note: you may need to restart the kernel to use updated packages.


### now i am defining a function responsible for collectiing music data from any playlist on spotify using spotify library

In [24]:
import pandas as pd
import spotipy
from spotipy.oauth2 import SpotifyOAuth

In [50]:
def get_trending_playlist_data(playlist_id,access_token):
    # setup spotify with the access token
    sp = spotipy.Spotify(auth=access_token)
    
    #Get the track from the playlist
    playlist_tracks = sp.playlist_tracks(playlist_id, fields='items(track(id, name, artists, album(id, name)))')
    
    #Extract relevent information and store in a list of dictionaries
    music_data = []
    for track_info in playlist_tracks['items']:
        track = track_info['track']
        track_name = track['name']
        artists = ', '.join([artist['name'] for artist in track['artists']])
        album_name = track['album']['name']
        album_id = track['album']['id']
        track_id = track['id']
        
        # Get audio features for the track
        audio_features = sp.audio_features(track_id)[0] if track_id != 'Not available' else None
        
        # Get release date of the album
        try:
            album_info = sp.album(album_id) if album_id != 'Not available' else None
            release_date = album_info['release_date'] if album_info else None
        except:
            release_date = None
            
        # Get popularity of the track
        try:
            track_info = sp.track(tack_id) if track_id != 'Not available' else None
            popularity = track_info['popularity'] if track_info else None
        except:
            popularity = None
            
        # Add additional track information to the track data
        track_data = {
            'Track Name' : track_name,
            'Artists' : artists,
            'Album Name': album_name,
            'Album Id': album_id,
            'Track ID':track_id,
            'Popularity':popularity,
            'Release Date':release_date,
            'Duration (ms)': audio_features['duration_ms'] if audio_features else None,
            'Explicit URLs': track_info.get('external_urls',{}).get('splotify',None),
            'Danceability': audio_features['danceability'] if audio_features else None,
            'Energy': audio_features['energy'] if audio_features else None,
            'Key': audio_features['key'] if audio_features else None,
            'Loudness': audio_features['loudness'] if audio_features else None,
            'Mode': audio_features['mode'] if audio_features else None,
            'Speechiness': audio_features['speechiness'] if audio_features else None,
            'Acousticness': audio_features['acousticness'] if audio_features else None,
            'Instrumentalness': audio_features['instrumentalness'] if audio_features else None,
            'Liveness': audio_features['liveness'] if audio_features else None,
            'Valence': audio_features['valence'] if audio_features else None,
            'Tempo': audio_features['tempo'] if audio_features else None,
        }
        
        music_data.append(track_data)
        
    # Create a pandas DataFrame from the list of dictionaries
    df = pd.DataFrame(music_data)
    
    return df

In [51]:
playlist_id = '79g6ROrUpEvLXTFXqrkxVZ'

# call the function to get music data from the playlist and store it in dataframe
music_df = get_trending_playlist_data(playlist_id,access_token)
print(music_df)

                                           Track Name  \
0                                             FRIENDS   
1                                        Dance Monkey   
2                                            Señorita   
3   Love Me Like You Do - From "Fifty Shades Of Grey"   
4                                             7 rings   
..                                                ...   
95                                   Somebody To Love   
96                                      Material Girl   
97                     Whatta Man / Seven Nation Army   
98                                     Let's Get Loud   
99                            My Oh My (feat. DaBaby)   

                                              Artists  \
0                              Marshmello, Anne-Marie   
1                                         Tones And I   
2                        Shawn Mendes, Camila Cabello   
3                                      Ellie Goulding   
4                             

In [52]:
print(music_df.isnull().sum())

Track Name            0
Artists               0
Album Name            0
Album Id              0
Track ID              0
Popularity          100
Release Date          0
Duration (ms)         0
Explicit URLs       100
Danceability          0
Energy                0
Key                   0
Loudness              0
Mode                  0
Speechiness           0
Acousticness          0
Instrumentalness      0
Liveness              0
Valence               0
Tempo                 0
dtype: int64


In [53]:
music_df[:5]

Unnamed: 0,Track Name,Artists,Album Name,Album Id,Track ID,Popularity,Release Date,Duration (ms),Explicit URLs,Danceability,Energy,Key,Loudness,Mode,Speechiness,Acousticness,Instrumentalness,Liveness,Valence,Tempo
0,FRIENDS,"Marshmello, Anne-Marie",FRIENDS,1BmxOYHjQv1dKZRr13YRZM,08bNPGLD8AhKpnnERrAc6G,,2018-02-09,202621,,0.626,0.88,9,-2.384,0,0.0504,0.205,0.0,0.128,0.534,95.079
1,Dance Monkey,Tones And I,Dance Monkey (Stripped Back) / Dance Monkey,0UywfDKYlyiu1b38DRrzYD,2XU0oxnq2qxCpomAAuJY8K,,2019-10-17,209438,,0.824,0.588,6,-6.4,0,0.0924,0.692,0.000104,0.149,0.513,98.027
2,Señorita,"Shawn Mendes, Camila Cabello",Señorita,2ZaX1FdZCwchXl1QZiD4O4,0TK2YIli7K1leLovkQiNik,,2019-06-21,190960,,0.759,0.54,9,-6.039,0,0.0287,0.037,0.0,0.0945,0.75,116.947
3,"Love Me Like You Do - From ""Fifty Shades Of Grey""",Ellie Goulding,Delirium (Deluxe),20Ol6zZ0nLlc5EGTH1zA0j,3zHq9ouUJQFQRf3cm1rRLu,,2015-11-06,252534,,0.262,0.606,8,-6.646,1,0.0484,0.247,0.0,0.125,0.275,189.857
4,7 rings,Ariana Grande,"thank u, next",2fYhqwDWXjbpjaIJPEfKFw,6ocbgoVGwYJhOv1GgI9NsF,,2019-02-08,178627,,0.778,0.317,1,-10.732,0,0.334,0.592,0.0,0.0881,0.327,140.048


## Now lets move further to building a music recommendation system

In [36]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from datetime import datetime
from sklearn.metrics.pairwise import cosine_similarity

data = music_df

While providing music recommendations to users, it is important to recommend the latest releases. For this, we need to give more weight to the latest releases in the recommendations.

In [54]:
# fucntion to calculate weighted popularity scores based on release date

def calculate_weighted_popularity(release_date):
    # convert the release date to datetime object
    release_date = datetime.strptime(release_date, "%Y-%m-%d")
    
    # calculate the time span between release date and todays date
    time_span = datetime.now() - release_date
    
    # calculate the weighted popularity score based on tume span (e.g., more recent releases have higher weight)
    weight = 1/(time_span.days + 1)
    return weight

The idea behind this formula is that the weight decreases as the time span between the release date and today increases. More recent releases will have a higher weight, while older releases will have a lower weight. As a result, when combining this weighted popularity score with other factors in a recommendation system, recent tracks will have a more significant impact on the final recommendations, reflecting users’ potential interest in newer music.

In [55]:
scaler = MinMaxScaler()
music_features = music_df[['Danceability','Energy','Key','Loudness','Mode','Speechiness','Acousticness',
                          'Instrumentalness','Liveness','Valence','Tempo']].values
music_features_scaled = scaler.fit_transform(music_features)

We will create a hybrid recommendation system for music recommendations. The first approach will be based on **recommending music based on music audio features**, and the second approach will be based on **recommending music based on weighted popularity.

### Here's how to generate music recommendations based on the music audio features

In [56]:
def content_based_recommendations(input_song_name, num_recommendations=5):
    if input_song_name not in music_df['Track Name'].values:
        print(f"'{input_song_name}' not found in the dataset. Please enter a valid song name.")
        return

    # Get the index of the input song in the music DataFrame
    input_song_index = music_df[music_df['Track Name'] == input_song_name].index[0]

    # Calculate the similarity scores based on music features (cosine similarity)
    similarity_scores = cosine_similarity([music_features_scaled[input_song_index]], music_features_scaled)

    # Get the indices of the most similar songs
    similar_song_indices = similarity_scores.argsort()[0][::-1][1:num_recommendations + 1]

    # Get the names of the most similar songs based on content-based filtering
    content_based_recommendations = music_df.iloc[similar_song_indices][['Track Name', 'Artists', 'Album Name', 'Release Date', 'Popularity']]

    return content_based_recommendations

In [59]:
input_song_name = "FRIENDS"
recommendations = content_based_recommendations(input_song_name, num_recommendations=5)
print(f"Content based recommended songs for '{input_song_name}':")
# print(recommendations)
recommendations

Content based recommended songs for 'FRIENDS':


Unnamed: 0,Track Name,Artists,Album Name,Release Date,Popularity
23,Lush Life,Zara Larsson,So Good,2017-03-17,
42,Believer,Imagine Dragons,Evolve,2017-06-23,
25,Levitating,Dua Lipa,Future Nostalgia,2020-03-27,
43,Not Your Barbie Girl,Ava Max,Not Your Barbie Girl,2018-08-13,
19,Cheap Thrills,Sia,This Is Acting,2016-01-29,


The function calculates the similarity scores between the audio features of the input song and all other songs in the dataset. It uses cosine similarity, a common measure used in content-based filtering. The cosine_similarity function from scikit-learn is employed to compute these similarity scores.

The function identifies the num_recommendations most similar songs to the input song based on their audio features. It does this by sorting the similarity scores in descending order and selecting the top num_recommendations songs. The input song itself is excluded from the recommendations (hence the [1:num_recommendations + 1] slicing). The function then extracts the details (such as track name, artists, album name, release date, and popularity) of the most similar songs from the music_df DataFrame using the indices of the most similar songs.

# END