# Recommendation System
This recommendation system takes an input song and returns a list of similar songs. An example appears at the end.

In [1]:
import numpy as np
import pandas as pd 
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Flatten
from tensorflow.keras.layers import TextVectorization
from sklearn.preprocessing import LabelEncoder, StandardScaler

Import the dataset and clean the data by dropping rows with NaN values, converting all strings to lowercase, and removing spaces.

In [None]:
df = pd.read_csv('tcc_ceds_music.csv')

In [26]:
df.dropna(inplace=True)

In [27]:
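def clean_data(x):
    # Lowercase and remove spaces from a string, or from every string in a list.
    # Any other type (for example a number) is replaced by an empty string.
    if isinstance(x, list):
        return [i.replace(" ", "").lower() for i in x]
    if isinstance(x, str):
        return x.replace(" ", "").lower()
    return ''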
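
As a quick sanity check, the sketch below repeats the `clean_data` definition so that it runs on its own, then applies it to a string, a list, and a non-string value. The sample inputs are made up for illustration and are not rows from the dataset.

```python
def clean_data(x):
    if isinstance(x, list):
        return [str.lower(i.replace(" ", "")) for i in x]
    if isinstance(x, str):
        return str.lower(x.replace(" ", ""))
    return ''

# Hypothetical sample values, not taken from tcc_ceds_music.csv.
cleaned_str = clean_data("Frankie Laine")
cleaned_list = clean_data(["Hip Hop", "World Music"])
cleaned_other = clean_data(3.14)

print(cleaned_str)          # frankielaine
print(cleaned_list)         # ['hiphop', 'worldmusic']
print(repr(cleaned_other))  # ''
```

Removing spaces turns a multi-word value such as an artist name into a single token, so "Frankie Laine" and "Frankie Valli" do not match on the shared first name.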

Clean the columns used as features by the recommender, then concatenate them into a single feature string for each song.

In [28]:
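# Named feature_cols so the list does not clash with the features() function defined below.
# Lyrics are left out of the cleaning step: the output below shows them with spaces intact,
# and removing the spaces would collapse each song's lyrics into a single token.
feature_cols = ['artist_name', 'genre', 'topic']

for feature in feature_cols:
    df[feature] = df[feature].apply(clean_data)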

In [30]:
def features(x):
    return ''.join(x['artist_name'])+''.join(x['genre'])+''.join(x['topic'])+''.join(x['lyrics'])

In [31]:
df['features'] = df.apply(features, axis=1)

In [32]:
df['features']

0        mukeshpopsadnesshold time feel break feel untr...
1        frankielainepopworld/lifebelieve drop rain fal...
2        johnnieraypopmusicsweetheart send letter goodb...
3        pérezpradopopromantickiss lips want stroll cha...
4        giorgospapadopoulospopromantictill darling til...
                               ...                        
28367    mack10hiphopobscenecause fuck leave scar tick ...
28368    m.o.p.hiphopobsceneminks things chain ring bra...
28369    ninehiphopobsceneget ban get ban stick crack r...
28370    willsmithhiphopobscenecheck check yeah yeah he...
28371    jeezyhiphopobsceneremix killer alive remix thr...
Name: features, Length: 28372, dtype: object
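
The output above shows that the artist, genre, topic, and first lyric word run together (for example `mukeshpopsadnesshold`), because the fields are concatenated without a separator. The sketch below illustrates how that affects tokenization, using the regular expression that `CountVectorizer` uses as its default `token_pattern`. The names `tokenize`, `fused`, and `spaced` and the shortened sample row are assumptions for illustration, and joining with spaces is only one possible alternative, not what the notebook does.

```python
import re

# Default token_pattern of scikit-learn's CountVectorizer: words of two or more characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")

def tokenize(text):
    # Lowercase the text and split it into tokens with the pattern above.
    return TOKEN_PATTERN.findall(text.lower())

# Shortened sample row modelled on the first row of the output above.
row = {"artist_name": "mukesh", "genre": "pop", "topic": "sadness",
       "lyrics": "hold time feel break"}
fields = ["artist_name", "genre", "topic", "lyrics"]

fused = "".join(row[f] for f in fields)    # concatenation without a separator
spaced = " ".join(row[f] for f in fields)  # possible alternative: one space between fields

print(tokenize(fused))   # ['mukeshpopsadnesshold', 'time', 'feel', 'break']
print(tokenize(spaced))  # ['mukesh', 'pop', 'sadness', 'hold', 'time', 'feel', 'break']
```

With the fused string, the artist, genre, and topic only contribute through one combined token, so two songs match on it only if all three fields and the first lyric word agree. With a separator, each field becomes its own token.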

Vectorize the feature strings into a matrix of token counts, ignoring English stop words.

In [33]:
from sklearn.feature_extraction.text import CountVectorizer
count = CountVectorizer(stop_words='english')
count_matrix = count.fit_transform(df['features'])
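
To make the vectorization step concrete, here is a plain-Python sketch of the bag-of-words counts that a count vectorizer produces: one row per document and one column per vocabulary word. It is a simplified illustration (whitespace tokenization, no stop-word removal), not scikit-learn's implementation, and the two sample documents are made up.

```python
from collections import Counter

# Two hypothetical "feature strings".
docs = ["hold time feel break", "time time feel rain"]

# Vocabulary: every distinct word, in alphabetical order (one column per word).
vocab = sorted({word for doc in docs for word in doc.split()})

# Count matrix: for each document, how often each vocabulary word occurs.
matrix = []
for doc in docs:
    counts = Counter(doc.split())
    matrix.append([counts[word] for word in vocab])

print(vocab)   # ['break', 'feel', 'hold', 'rain', 'time']
print(matrix)  # [[1, 1, 1, 0, 1], [0, 1, 0, 1, 2]]
```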

Generate the cosine similarity matrix, which compares every song with every other song in the dataset. Then define a function that looks up the input song's row in that matrix and returns the six highest-scoring songs: the input song itself followed by its five most similar songs.

In [60]:
from sklearn.metrics.pairwise import cosine_similarity
cosine_sim = cosine_similarity(count_matrix, count_matrix)

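# Map each track name to its row position in count_matrix. Positions are used instead of
# index labels because dropna() can leave gaps in df.index.
indices = pd.Series(np.arange(len(df)), index=df['track_name'])
# Several songs can share a title; keep the first one so a lookup always returns one position.
indices = indices[~indices.index.duplicated(keep='first')]

def get_recommendations(title, cosine_sim=cosine_sim):
    idx = indices[title]
    # Pair every song position with its similarity to the input song.
    sim_scores = list(enumerate(cosine_sim[idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    # Keep the six highest scores: normally the input song itself plus five others.
    sim_scores = sim_scores[0:6]
    song_indices = [i[0] for i in sim_scores]
    return df[['track_name', 'artist_name']].iloc[song_indices]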
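
The ranking step can be worked through by hand on a tiny count matrix. The plain-Python sketch below computes cosine similarity (the dot product divided by the product of the two vector norms) between a query row and every row, then sorts by score. The matrix and the names `cosine` and `rank_by_similarity` are assumptions for illustration; the notebook itself relies on scikit-learn's `cosine_similarity`.

```python
import math

def cosine(a, b):
    # Dot product divided by the product of the Euclidean norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_by_similarity(matrix, query_idx, top_n):
    # Score every row against the query row and keep the top_n highest scores.
    scores = [(i, cosine(matrix[query_idx], row)) for i, row in enumerate(matrix)]
    scores.sort(key=lambda s: s[1], reverse=True)
    return scores[:top_n]

# Hypothetical count vectors for four songs over a five-word vocabulary.
counts = [
    [1, 1, 1, 0, 1],
    [0, 1, 0, 1, 2],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 3, 0],
]

ranked = rank_by_similarity(counts, query_idx=0, top_n=3)
print([i for i, _ in ranked])  # [0, 2, 1]
```

The query row ranks first with a score of 1, which is why slicing the sorted scores from position 0 returns the input song along with its nearest neighbours.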

Songs similar to "it's the most wonderful time of the year" according to the recommendation system. The first row is the input song itself. The remaining rows include a song with the same title (most likely a cover), songs that share a word with the title, and a holiday song, which suggests that the recommendations are plausible.

In [66]:
get_recommendations("it's the most wonderful time of the year")

                                     track_name        artist_name
39     it's the most wonderful time of the year       andywilliams
18773  it's the most wonderful time of the year       johnnymathis
13185                                 high time          pauljones
24435                   wonderful christmastime      paulmccartney
8966                             take your time  jeffersonstarship
5534                                     nantes             beirut
