# Song Recommendation System
This notebook builds a content-based song recommender: lyrics are lowercased, tokenized, and stemmed, converted to TF-IDF vectors, and compared with cosine similarity.

# Load Dataset
Load the Spotify dataset and display basic information about the data.
# Preprocess Text
Clean and preprocess the text data for analysis.
# Tokenization
Convert text into tokens and apply stemming to reduce words to their root forms.
# TF-IDF Vectorization
Transform the text data into numerical features using TF-IDF.
# Similarity Calculation
Calculate the pairwise cosine similarity between songs from their TF-IDF vectors.
# Recommendation Function
Define a function that returns the 20 songs most similar to a given song title.
# Save Data
Save the similarity matrix and processed data for future use.

In [30]:
import pandas as pd

# Load Dataset
Load the Spotify dataset and display basic information about the data.

In [31]:
# Load the dataset
df = pd.read_csv('data/spotify_millsongdata.csv')
print("Total number of songs:", len(df))
df.head()

Total number of songs: 57650


Unnamed: 0,artist,song,link,text
0,ABBA,Ahe's My Kind Of Girl,/a/abba/ahes+my+kind+of+girl_20598417.html,"Look at her face, it's a wonderful face \r\nA..."
1,ABBA,"Andante, Andante",/a/abba/andante+andante_20002708.html,"Take it easy with me, please \r\nTouch me gen..."
2,ABBA,As Good As New,/a/abba/as+good+as+new_20003033.html,I'll never know why I had to go \r\nWhy I had...
3,ABBA,Bang,/a/abba/bang_20598415.html,Making somebody happy is a question of give an...
4,ABBA,Bang-A-Boomerang,/a/abba/bang+a+boomerang_20002668.html,Making somebody happy is a question of give an...


In [32]:
df.tail(10)

Unnamed: 0,artist,song,link,text
57640,Zebrahead,The Setup,/z/zebrahead/the+setup_10198494.html,Lie to me \r\nTell me that everything will be...
57641,Ziggy Marley,Freedom Road,/z/ziggy+marley/freedom+road_20531174.html,"That's why I'm marching, yes, I'm marching, \..."
57642,Ziggy Marley,Friend,/z/ziggy+marley/friend_20673508.html,[Chorus] \r\nI wanna thank you for the things...
57643,Ziggy Marley,G7,/z/ziggy+marley/g7_20531173.html,Seven richest countries in the world \r\nThem...
57644,Ziggy Marley,Generation,/z/ziggy+marley/generation_20531171.html,Many generation have passed away \r\nFighting...
57645,Ziggy Marley,Good Old Days,/z/ziggy+marley/good+old+days_10198588.html,Irie days come on play \r\nLet the angels fly...
57646,Ziggy Marley,Hand To Mouth,/z/ziggy+marley/hand+to+mouth_20531167.html,Power to the workers \r\nMore power \r\nPowe...
57647,Zwan,Come With Me,/z/zwan/come+with+me_20148981.html,all you need \r\nis something i'll believe \...
57648,Zwan,Desire,/z/zwan/desire_20148986.html,northern star \r\nam i frightened \r\nwhere ...
57649,Zwan,Heartsong,/z/zwan/heartsong_20148991.html,come in \r\nmake yourself at home \r\ni'm a ...


In [33]:
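# Keep a random sample of 10,000 songs and drop the unused 'link' column.
# No random_state is set, so the sample differs between runs.
df = df.sample(10000).drop('link', axis=1).reset_index(drop=True)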
df.head(10)

Unnamed: 0,artist,song,text
0,Faith Hill,Little Drummer Boy,"Come, they told me \r\nPa, rum, pa, pum, pum ..."
1,Demi Lovato,Let It Go,"Let it go, let it go \r\nCan't hold you back ..."
2,James Taylor,How's The World Treating You,I've had nothing but sorrow \r\nSince you sai...
3,Quicksand,Unfulfilled,"To stand the test of time, \r\nTo stand alone..."
4,Ice Cube,Who's The Mack?,Straight gangsta mack \r\nStraight gangsta ma...
5,Tom Jones,Love Me Tonight,I know that it's late and I really must leave ...
6,Kenny Rogers,One More Day,Just one more day I ask \r\nOne chance to lov...
7,Dream Theater,Peruvian Skies,"There, there it is \r\nI swear he's gonna mur..."
8,Leo Sayer,Who Will The Next Fool Be,"Falling, oh, oh, I'm falling \r\nFalling so d..."
9,Chaka Khan,Nothing's Gonna Take You Away,"You, you carry your dreams in your eyes \r\nA..."


In [34]:
df.shape

(10000, 3)

In [35]:
df.isnull().sum()

artist    0
song      0
text      0
dtype: int64

In [36]:
df.duplicated().sum()

np.int64(0)

In [37]:
df['text'][0]

"Come, they told me  \r\nPa, rum, pa, pum, pum  \r\nA newborn king to see  \r\nPa, rum, pa, pum, pum  \r\nOur finest gifts we bring  \r\nPa, rum, pa, pum, pum  \r\nTo lay before the king  \r\nPa, rum, pa, pum, pum  \r\nRum, a pum pum  \r\nRum, pa, pum, pum  \r\nSo to honor him  \r\nPa, rum, pa, pum, pum  \r\nWhen we come  \r\n  \r\nLittle baby  \r\nPa, rum, pa, pum, pum  \r\nI am a poor boy, too  \r\nPa, rum, pa, pum, pum  \r\nI have no gift to bring  \r\nPa, rum, pa, pum, pum  \r\nThat's fit  \r\nTo give a king  \r\nPa, rum, pa, pum, pum  \r\nRum, pa, pum, pum  \r\nRum, pa, pum, pum  \r\nShall I play for you  \r\nPa, rum, pa, pum, pum  \r\nOn my drum  \r\n  \r\nYoi da adash  \r\nPa, rum, pa, pum, pum  \r\nThe ass  \r\nAnd lamb kept time  \r\nPa, rum, pa, pum, pum  \r\nI played my drum for him  \r\nPa, rum, pa, pum, pum  \r\nI played my best for him  \r\nPa, rum, pa, pum, pum  \r\nRum, pa, pum, pum  \r\nRum, pa, pum, pum  \r\nThen he smiled at me  \r\nPa, rum, pa, pum, pum  \r\nMe and 

# Preprocess Text
Clean and preprocess the text data for analysis.

In [38]:
# Lowercase the lyrics and replace newlines with spaces;
# carriage returns (\r) and punctuation are left in place.
df['text'] = df['text'].str.lower().replace(r'\n', ' ', regex=True)
df.tail(10)

Unnamed: 0,artist,song,text
9990,Frankie Laine,On The Sunny Side Of The Street,walked with no one and talked with no one \r ...
9991,Frank Zappa,Cucamonga,"frank zappa (lead guitar, vocals) \r captain ..."
9992,Kate Bush,Hounds Of Love (Live) [Act One],"[intro: bertie mcintosh] \r ""it's in the tree..."
9993,Nazareth,Just To Get Into It,i know that she don't like me \r she knows th...
9994,Kinks,Mr. Churchill Says,"well mr. churchill says, mr. churchill says \..."
9995,Bon Jovi,Mrs. Robinson,"and here's to you, mrs. robinson \r jesus lov..."
9996,Mazzy Star,I've Been Let Down,i've been let down \r and i still comin' roun...
9997,Lionel Richie,Jesus Is Love (Long Version),"father, help your children \r and don't let t..."
9998,Grateful Dead,Big Railroad Blues,"well my mama told me, my papa told me too \r ..."
9999,Imagine Dragons,Look How Far We've Come,[intro] \r \r [verse 1] \r take me on a wh...
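
The rows displayed above still contain carriage returns (`\r`) and punctuation after this step. If a stricter cleanup is wanted, one possible sketch on plain strings, using only the standard `re` module, is shown below. The helper name `clean_lyrics` is hypothetical and not part of the notebook. Note that stripping apostrophes changes how contractions are tokenized later (for example, "it's" becomes "it s"), so this is a trade-off rather than a clear improvement.

```python
import re

def clean_lyrics(text):
    """Lowercase, drop punctuation, and collapse all whitespace runs."""
    text = text.lower()
    # Replace anything that is not a word character or whitespace with a space
    text = re.sub(r"[^\w\s]", " ", text)
    # Collapse runs of spaces, tabs, \r and \n into single spaces
    return re.sub(r"\s+", " ", text).strip()

sample = "Look at her face, it's a wonderful face  \r\nA"
print(clean_lyrics(sample))
```

On the DataFrame this could be applied with `df['text'].apply(clean_lyrics)`.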


In [39]:
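import nltk
from nltk.stem.porter import PorterStemmer

# word_tokenize needs NLTK's Punkt tokenizer data; if it is missing, download it once
# with nltk.download('punkt') (newer NLTK releases may ask for 'punkt_tab' instead).
stemmer = PorterStemmer()

def tokenization(txt):
    """Tokenize the text, stem each token, and rejoin the stems with spaces."""
    tokens = nltk.word_tokenize(txt)
    stemmed = [stemmer.stem(w) for w in tokens]
    return " ".join(stemmed)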

In [40]:
df['text'] = df['text'].apply(tokenization)

In [41]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [42]:
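# Build TF-IDF features from the stemmed lyrics, ignoring English stop words
tfidvector = TfidfVectorizer(analyzer='word', stop_words='english')
matrix = tfidvector.fit_transform(df['text'])
# Pairwise cosine similarity between all songs (a 10000 x 10000 array)
similarity = cosine_similarity(matrix)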

In [43]:
similarity[0]

array([1.00000000e+00, 0.00000000e+00, 0.00000000e+00, ...,
       8.20680872e-04, 3.45143036e-03, 3.90581610e-03])
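
To make the numbers in a similarity row easier to interpret, here is a small pure-Python sketch of TF-IDF weighting followed by cosine similarity on a toy corpus. It follows the smoothed-IDF weighting and L2 normalization that `TfidfVectorizer` is documented to use by default, but it splits on whitespace only and does no stop-word removal, so it is an illustration of the idea rather than a reproduction of the vectorizer. The function names and the toy documents are made up for this example.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return L2-normalized TF-IDF vectors (as dicts) for whitespace-tokenized docs."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Raw term count times smoothed IDF: ln((1 + n) / (1 + df)) + 1
        weights = {t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
                   for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors

def cosine(u, v):
    """For unit-length sparse vectors, the dot product is the cosine similarity."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())

docs = ["love me tonight", "love me love me", "drum drum king"]
vecs = tfidf_vectors(docs)
print(round(cosine(vecs[0], vecs[0]), 3))  # a document compared with itself
print(round(cosine(vecs[0], vecs[1]), 3))  # shared vocabulary
print(cosine(vecs[0], vecs[2]))            # no shared terms
```

A song compared with itself scores 1, songs with no terms in common score 0, and everything else falls in between, which matches the pattern visible in `similarity[0]`.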

In [44]:
df[df['song'] == "Was It Just Another Love Affair?"]

Unnamed: 0,artist,song,text
5673,Eurythmics,Was It Just Another Love Affair?,you do n't call me anymor but do n't you think...


In [45]:
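def recommendation(song_title, top_n=20):
    """Return the titles of the top_n songs most similar to song_title."""
    matches = df[df['song'] == song_title]
    if matches.empty:
        raise ValueError(f"Song not found: {song_title!r}")
    idx = matches.index[0]
    # Rank all songs by cosine similarity to the query, highest first
    distances = sorted(enumerate(similarity[idx]), key=lambda x: x[1], reverse=True)
    # Skip the first entry, which is normally the query song itself
    return [df.iloc[i].song for i, _ in distances[1:top_n + 1]]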

In [48]:
recommendation('Hounds Of Love (Live) [Act One]')

['If You Love Me',
 'Our Love',
 'I Can Help',
 'Our Love Was',
 'Is It Love',
 'I Love You',
 'Find A Way',
 'I Will Always Need Your Love',
 "I Guess I'll Always Love You",
 'I Love How To Love Me',
 'No More Looking For Love',
 'My Love Life',
 'Need You Next To Me',
 'Who Do You Love?',
 'Passing Through Air',
 'One Good Love',
 'Blue',
 'I Love All Of Me',
 "You've Been In Love",
 'Hawkmoon 269']
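
Ranking by sorting the full row and skipping the first entry assumes the query song always sorts first. That may not hold if another song has identical lyrics and therefore also scores 1.0. A possible alternative, sketched here on a plain list with the hypothetical helper `top_similar`, excludes the query index explicitly and uses `heapq.nlargest` so the whole row does not need to be sorted.

```python
import heapq

def top_similar(sim_row, query_idx, k=20):
    """Return indices of the k highest scores in sim_row, excluding query_idx."""
    candidates = ((score, i) for i, score in enumerate(sim_row) if i != query_idx)
    # nlargest keeps only the k best (score, index) pairs
    return [i for score, i in heapq.nlargest(k, candidates)]

# Toy similarity row: index 3 has the same score as the query at index 0
row = [1.0, 0.2, 0.9, 1.0, 0.5]
print(top_similar(row, query_idx=0, k=3))
```

In the notebook, the returned indices could be mapped to titles with `df.iloc[i].song`, as the recommendation function already does.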

In [49]:
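import pickle

# Persist the similarity matrix and the processed DataFrame for later reuse
with open('similarity.pkl', 'wb') as f:
    pickle.dump(similarity, f)
with open('df.pkl', 'wb') as f:
    pickle.dump(df, f)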
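
To reuse the saved files in another script or app, they would be read back with `pickle.load`. The sketch below shows the save and load round trip on a small stand-in object written to a temporary directory. The helper names `save_artifact` and `load_artifact` are hypothetical. Keep in mind that a dense 10,000 x 10,000 float64 matrix is roughly 800 MB, and that pickle files should only be loaded from sources you trust.

```python
import os
import pickle
import tempfile

def save_artifact(obj, path):
    """Write obj to path with pickle, closing the file afterwards."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)

def load_artifact(path):
    """Read a pickled object back from path."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Stand-in for the real similarity matrix
similarity_demo = [[1.0, 0.3], [0.3, 1.0]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "similarity_demo.pkl")
    save_artifact(similarity_demo, path)
    restored = load_artifact(path)

print(restored == similarity_demo)
```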