# **Lyric-Based Music Recommendation System: Analyzing Decades of Top Billboard Songs**

# **K-NN CONTENT BASED FILTERING MODEL**

Steps of the k-NN Content-Based Filtering Model

    Loading the Dataset: Load the dataset into a Pandas DataFrame.
    Handling Missing Values: Clean the data by filling or dropping missing values.
    Text Normalization: Standardize text data to lowercase and remove special characters.
    Extract URL from Media Column: Extract URLs from the JSON-like Media column for linking to song media.
    Feature Extraction: Combine important text features into a single column for vectorization.
    Vectorization: Convert text data into numerical values using TF-IDF to prepare it for machine learning algorithms.
    Model Training: Train a k-NN model to find similar songs based on text features.
    Recommendation Function: Provide recommendations by finding the nearest neighbors (similar songs) to a given song.

In [1]:
# Importing  the required libraries

import pandas as pd
import re
import joblib
import ast
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors
from sklearn.metrics.pairwise import cosine_similarity

In [2]:
# Importing the dataset  from local drive

data = pd.read_csv('/content/drive/MyDrive/MY PROJECTS/TOP 100 SONGS RECOMMENDATION /all_songs_data.csv')
data.head()

Unnamed: 0,Album,Album URL,Artist,Featured Artists,Lyrics,Media,Rank,Release Date,Song Title,Song URL,Writers,Year
0,Battle of New Orleans,https://genius.com/albums/Johnny-horton/Battle...,Johnny Horton,[],[Verse 1] In 1814 we took a little trip Along ...,[{'native_uri': 'spotify:track:0dwpdcQkeZqpuoA...,1,1959-04-01,The Battle Of New Orleans,https://genius.com/Johnny-horton-the-battle-of...,"[{'api_path': '/artists/561913', 'header_image...",1959.0
1,That’s All,https://genius.com/albums/Bobby-darin/That-s-all,Bobby Darin,[],"Oh the shark, babe Has such teeth, dear And he...",[{'native_uri': 'spotify:track:3E5ndyOfO6vFDEI...,2,,Mack The Knife,https://genius.com/Bobby-darin-mack-the-knife-...,"[{'api_path': '/artists/218851', 'header_image...",1959.0
2,“Mr Personality’s” 15 Big Hits,https://genius.com/albums/Lloyd-price/Mr-perso...,Lloyd Price,[],Over and over I tried to prove my love to you ...,"[{'provider': 'youtube', 'start': 0, 'type': '...",3,,Personality,https://genius.com/Lloyd-price-personality-lyrics,"[{'api_path': '/artists/355804', 'header_image...",1959.0
3,The Greatest Hits Of Frankie Avalon,https://genius.com/albums/Frankie-avalon/The-g...,Frankie Avalon,[],"Hey, Venus! Oh, Venus! Venus, if you will Ple...",[],4,,Venus,https://genius.com/Frankie-avalon-venus-lyrics,"[{'api_path': '/artists/1113175', 'header_imag...",1959.0
4,Paul Anka Sings His Big 15,https://genius.com/albums/Paul-anka/Paul-anka-...,Paul Anka,[],I'm just a lonely boy Lonely and blue I'm all ...,[],5,,Lonely Boy,https://genius.com/Paul-anka-lonely-boy-lyrics,[],1959.0


In [3]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6500 entries, 0 to 6499
Data columns (total 12 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Album             6036 non-null   object 
 1   Album URL         6036 non-null   object 
 2   Artist            6500 non-null   object 
 3   Featured Artists  6384 non-null   object 
 4   Lyrics            6384 non-null   object 
 5   Media             6384 non-null   object 
 6   Rank              6500 non-null   int64  
 7   Release Date      4563 non-null   object 
 8   Song Title        6500 non-null   object 
 9   Song URL          6384 non-null   object 
 10  Writers           6384 non-null   object 
 11  Year              6500 non-null   float64
dtypes: float64(1), int64(1), object(10)
memory usage: 609.5+ KB


In [4]:
data.describe()

Unnamed: 0,Rank,Year
count,6500.0,6500.0
mean,50.5,1991.0
std,28.868291,18.763106
min,1.0,1959.0
25%,25.75,1975.0
50%,50.5,1991.0
75%,75.25,2007.0
max,100.0,2023.0


In [5]:
# Checking whether null values present or not

data.isnull().sum()              # Null values present

Album                464
Album URL            464
Artist                 0
Featured Artists     116
Lyrics               116
Media                116
Rank                   0
Release Date        1937
Song Title             0
Song URL             116
Writers              116
Year                   0
dtype: int64

In [6]:
# Filling Missing values

data['Album'] = data['Album'].fillna('Unknown Album')
data['Featured Artists'] = data['Featured Artists'].fillna('None')
data['Release Date'] = data['Release Date'].fillna('Unknown')
data['Writers'] = data['Writers'].fillna('Unknown Writers')
data['Year'] = data['Year'].fillna(data['Year'].median())

In [7]:
data = data.dropna(subset = ['Album URL', 'Artist', 'Lyrics', 'Media', 'Song URL'])

In [8]:
data.isnull().sum()

Album               0
Album URL           0
Artist              0
Featured Artists    0
Lyrics              0
Media               0
Rank                0
Release Date        0
Song Title          0
Song URL            0
Writers             0
Year                0
dtype: int64

In [9]:
data.shape

(6036, 12)

In [10]:
# Checking the duplicte values present or not

data.duplicated().sum()     # No Duplicates in the dataset

0

In [11]:
# Normalize the text

def normalize_text(text):
  text = text.lower()
  text = re.sub(r'[^a-zA-Z0-9\s]', ' ', text)
  return text

In [12]:
data['Album']  = data['Album'].apply(normalize_text)
data['Artist'] = data['Artist'].apply(normalize_text)
data['Featured Artists'] = data['Featured Artists'].apply(normalize_text)
data['Lyrics'] = data['Lyrics'].apply(normalize_text)
data['Song Title'] = data['Song Title'].apply(normalize_text)
data['Writers'] = data['Writers'].apply(normalize_text)

In [13]:
# Extract url from Media columns

def extract_url(text):
  media_list = ast.literal_eval(text)
  if media_list and isinstance(media_list,list):
    return media_list[0].get('url', 'unknown')
  return 'unknown'

In [14]:
data['Media'] = data['Media'].apply(extract_url)

In [15]:
data.head()

Unnamed: 0,Album,Album URL,Artist,Featured Artists,Lyrics,Media,Rank,Release Date,Song Title,Song URL,Writers,Year
0,battle of new orleans,https://genius.com/albums/Johnny-horton/Battle...,johnny horton,,verse 1 in 1814 we took a little trip along ...,https://open.spotify.com/track/0dwpdcQkeZqpuoA...,1,1959-04-01,the battle of new orleans,https://genius.com/Johnny-horton-the-battle-of...,api path artists 561913 header image...,1959.0
1,that s all,https://genius.com/albums/Bobby-darin/That-s-all,bobby darin,,oh the shark babe has such teeth dear and he...,https://open.spotify.com/track/3E5ndyOfO6vFDEI...,2,Unknown,mack the knife,https://genius.com/Bobby-darin-mack-the-knife-...,api path artists 218851 header image...,1959.0
2,mr personality s 15 big hits,https://genius.com/albums/Lloyd-price/Mr-perso...,lloyd price,,over and over i tried to prove my love to you ...,http://www.youtube.com/watch?v=6UvvpiKShBI,3,Unknown,personality,https://genius.com/Lloyd-price-personality-lyrics,api path artists 355804 header image...,1959.0
3,the greatest hits of frankie avalon,https://genius.com/albums/Frankie-avalon/The-g...,frankie avalon,,hey venus oh venus venus if you will ple...,unknown,4,Unknown,venus,https://genius.com/Frankie-avalon-venus-lyrics,api path artists 1113175 header imag...,1959.0
4,paul anka sings his big 15,https://genius.com/albums/Paul-anka/Paul-anka-...,paul anka,,i m just a lonely boy lonely and blue i m all ...,unknown,5,Unknown,lonely boy,https://genius.com/Paul-anka-lonely-boy-lyrics,,1959.0


In [16]:
# FEATURE EXTRACTION

data['combined_features'] = data['Song Title'] + ' ' + data['Album'] + ' ' + data['Artist'] + ' ' + data['Featured Artists'] + ' ' + data['Writers']+ ' ' + data['Lyrics']

In [17]:
data.head()

Unnamed: 0,Album,Album URL,Artist,Featured Artists,Lyrics,Media,Rank,Release Date,Song Title,Song URL,Writers,Year,combined_features
0,battle of new orleans,https://genius.com/albums/Johnny-horton/Battle...,johnny horton,,verse 1 in 1814 we took a little trip along ...,https://open.spotify.com/track/0dwpdcQkeZqpuoA...,1,1959-04-01,the battle of new orleans,https://genius.com/Johnny-horton-the-battle-of...,api path artists 561913 header image...,1959.0,the battle of new orleans battle of new orlean...
1,that s all,https://genius.com/albums/Bobby-darin/That-s-all,bobby darin,,oh the shark babe has such teeth dear and he...,https://open.spotify.com/track/3E5ndyOfO6vFDEI...,2,Unknown,mack the knife,https://genius.com/Bobby-darin-mack-the-knife-...,api path artists 218851 header image...,1959.0,mack the knife that s all bobby darin ap...
2,mr personality s 15 big hits,https://genius.com/albums/Lloyd-price/Mr-perso...,lloyd price,,over and over i tried to prove my love to you ...,http://www.youtube.com/watch?v=6UvvpiKShBI,3,Unknown,personality,https://genius.com/Lloyd-price-personality-lyrics,api path artists 355804 header image...,1959.0,personality mr personality s 15 big hits llo...
3,the greatest hits of frankie avalon,https://genius.com/albums/Frankie-avalon/The-g...,frankie avalon,,hey venus oh venus venus if you will ple...,unknown,4,Unknown,venus,https://genius.com/Frankie-avalon-venus-lyrics,api path artists 1113175 header imag...,1959.0,venus the greatest hits of frankie avalon fran...
4,paul anka sings his big 15,https://genius.com/albums/Paul-anka/Paul-anka-...,paul anka,,i m just a lonely boy lonely and blue i m all ...,unknown,5,Unknown,lonely boy,https://genius.com/Paul-anka-lonely-boy-lyrics,,1959.0,lonely boy paul anka sings his big 15 paul ank...


In [18]:
# Vectorization
vectorizer = TfidfVectorizer(stop_words = 'english', max_features = 5000)

x = vectorizer.fit_transform(data['combined_features'])

In [19]:
# Save the vectorizer for further process

joblib.dump(vectorizer, '/content/drive/MyDrive/MY PROJECTS/TOP 100 SONGS RECOMMENDATION /vectorizer_knn.pkl')

['/content/drive/MyDrive/MY PROJECTS/TOP 100 SONGS RECOMMENDATION /vectorizer_knn.pkl']

In [20]:
# Model Training

model = NearestNeighbors(n_neighbors = 10, metric = 'cosine')

In [21]:
# Fit the model

model = model.fit(x)
model

In [22]:
# Save the model

joblib.dump(model,'/content/drive/MyDrive/MY PROJECTS/TOP 100 SONGS RECOMMENDATION /knn_model.pkl')

['/content/drive/MyDrive/MY PROJECTS/TOP 100 SONGS RECOMMENDATION /knn_model.pkl']

In [23]:
# Recommendation System


def recommendation_system(song_name, n_recommendations=10):

    # Load the model and vectorizer
    model = joblib.load('/content/drive/MyDrive/MY PROJECTS/TOP 100 SONGS RECOMMENDATION /knn_model.pkl')
    vectorizer = joblib.load('/content/drive/MyDrive/MY PROJECTS/TOP 100 SONGS RECOMMENDATION /vectorizer_knn.pkl')

    # Normalize the input song name
    normalized_song = normalize_text(song_name)

    # Check if the song is in the dataset
    if normalized_song not in data['Song Title'].values:
        return pd.DataFrame({'Error': [f"Song '{song_name}' not found in the dataset."]})

    # Find the index of the song in the dataset
    song_index = data[data['Song Title'] == normalized_song].index[0]

    # Transform the song's combined features using the vectorizer
    song_vector = vectorizer.transform([data.iloc[song_index]['combined_features']])

    # Find the nearest neighbors
    distances, indices = model.kneighbors(song_vector, n_neighbors=n_recommendations)

    # Get the recommendations as a DataFrame
    recommendations = data.iloc[indices[0]].copy()
    recommendations['Similarity'] = 1 - distances[0]

    # Convert URLs to clickable links
    recommendations['Media'] = recommendations['Media'].apply(lambda x: f'<a href="{x}" target="_blank">Media Link</a>')
    recommendations['Song URL'] = recommendations['Song URL'].apply(lambda x: f'<a href="{x}" target="_blank">Song Link</a>')
    recommendations['Album URL'] = recommendations['Album URL'].apply(lambda x: f'<a href="{x}" target="_blank">Album Link</a>')

    return recommendations



In [24]:
# Example of songs recommendation system


song_name = 'sea of love'
recommendations_df = recommendation_system(song_name)


In [26]:
print(recommendations_df)

                                          Album  \
2914                      tear down these walls   
664                                   mendocino   
5592  5 seconds of summer  bonus track version    
5858                  hopeless fountain kingdom   
310                                   hey  baby   
2001                                  bad girls   
4331                                rock steady   
2812                              crowded house   
4794                                       king   
5497                          how country feels   

                                              Album URL  \
2914  <a href="https://genius.com/albums/Billy-ocean...   
664   <a href="https://genius.com/albums/The-sir-dou...   
5592  <a href="https://genius.com/albums/5-seconds-o...   
5858  <a href="https://genius.com/albums/Halsey/Hope...   
310   <a href="https://genius.com/albums/Bruce-chann...   
2001  <a href="https://genius.com/albums/Donna-summe...   
4331  <a href="https://ge

In [27]:
song_name = 'lipstick on your collar'
recommendations_df = recommendation_system(song_name)
print(recommendations_df)

                                  Album  \
30                       red river rock   
3291                     the soul cages   
131   the greatest songs of the fifties   
2365                       private eyes   
45                              ulysses   
1201                     finnegans wake   
631                      finnegans wake   
271                      finnegans wake   
1687                     finnegans wake   
3714                     finnegans wake   

                                              Album URL  \
30    <a href="https://genius.com/albums/Johnny-and-...   
3291  <a href="https://genius.com/albums/Sting/The-s...   
131   <a href="https://genius.com/albums/Barry-manil...   
2365  <a href="https://genius.com/albums/Hall-and-oa...   
45    <a href="https://genius.com/albums/James-joyce...   
1201  <a href="https://genius.com/albums/James-joyce...   
631   <a href="https://genius.com/albums/James-joyce...   
271   <a href="https://genius.com/albums/James-joyce..