# BUILDING A RECOMMENDATION SYSTEM FOR SPOTIFY SONGS

In this repository, I built a recommendation system for 42,305 Spotify songs based on their genre, mode, and duration. <br>
I used `NMF` from `sklearn.decomposition` to do that, <br>
and I normalized the resulting features with the `normalize` function of `sklearn.preprocessing`. <br>
I also replaced NA values with `SimpleImputer` from `sklearn.impute`. <br>
The source of the data is: https://www.kaggle.com/mrmorj/dataset-of-songs-in-spotify

#### IMPORTING NECESSARY LIBRARIES

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf

#### IMPORTING OUR DATASET

In [None]:
spoti = pd.read_csv("/kaggle/input/dataset-of-songs-in-spotify/genres_v2.csv", encoding='utf-8', quotechar='"')
spoti.head(3)

In [None]:
spoti.shape

#### CREATING song_name 

In [None]:
song_name = spoti["song_name"]

In [None]:
print(song_name.shape)
print(song_name.isnull().values.any())

In [None]:
song_name = song_name.values.reshape(-1,1)

In [None]:
song_name.shape

#### REPLACING NA VALUES

In [None]:
from sklearn.impute import SimpleImputer
# Fill missing song names with the most frequent name in the column
imr = SimpleImputer(missing_values=np.nan, strategy='most_frequent')
imputed_data = imr.fit_transform(song_name)
# Wrap the result in a DataFrame with a single "Song-Names" column
song_name = pd.DataFrame(imputed_data, columns=["Song-Names"])
song_name.head(3)
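
To see what the `most_frequent` strategy does, here is a minimal sketch on a made-up column (the names `toy_names` and `toy_imputer` and the values are hypothetical, not taken from the dataset). Note that every missing entry receives the name of a real, frequently occurring song.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical toy column with one missing name (not from the dataset).
# SimpleImputer expects 2-D input, hence the reshape to a single column.
toy_names = np.array(["Song A", np.nan, "Song B", "Song A"], dtype=object).reshape(-1, 1)

toy_imputer = SimpleImputer(missing_values=np.nan, strategy="most_frequent")
toy_filled = toy_imputer.fit_transform(toy_names)

# The missing entry is replaced by the most frequent value in the column.
print(toy_filled.ravel())
```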

#### CREATING CORE
Let's create a DataFrame that contains the genre, mode, and duration_ms of each song. <br>
Let's call it core.

In [None]:
core = spoti[["genre","mode","duration_ms"]]
print(core.dtypes)
print(core.head())

In [None]:
core.dtypes

Hmm, it looks like the dtype of genre is object, and it contains string values. Let's encode them to make them convenient for the ML algorithm.

In [None]:
core["genre"].value_counts()

In [None]:
core = core.replace({"genre":{"Underground Rap":0, "Dark Trap":1, "Hiphop":2, "trance":3, "trap":4, "techhouse":5, "dnb":6, "psytrance": 7, "techno":8, "hardstyle":9, "RnB":10, "Trap Metal":11, "Rap":12, "Emo":13, "Pop":14}})
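
The mapping above is written by hand. As a possible alternative, the codes could be generated with `pd.factorize`; the sketch below uses a made-up series (`toy_genres` is hypothetical). Either way, the integers impose an arbitrary order on the genres, which is worth keeping in mind when reading the results.

```python
import pandas as pd

# Hypothetical toy genre column (not from the dataset).
toy_genres = pd.Series(["trap", "Rap", "trap", "Pop", "Rap"])

# factorize assigns integer codes in order of first appearance.
codes, uniques = pd.factorize(toy_genres)

print(codes)          # one integer code per row
print(list(uniques))  # the label behind each code
```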

#### APPLYING NMF
Non-negative matrix factorization (NMF), also called non-negative matrix approximation, is a group of algorithms in multivariate analysis and linear algebra where a matrix V is factorized into (usually) two matrices W and H, with the property that all three matrices have no negative elements. This non-negativity makes the resulting matrices easier to inspect.

In [None]:
from sklearn.decomposition import NMF
# Factorize core (3 non-negative columns) into 6 latent features per song
nmf = NMF(n_components = 6)
nmf_features = nmf.fit_transform(core)
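
To make the roles of V, W and H concrete, here is a minimal sketch on a small made-up matrix. The values, the names `V`, `W`, `H` and `toy_nmf`, and the settings `init="random"`, `random_state=0` and `max_iter=1000` are illustrative assumptions, not the settings used above.

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical non-negative toy matrix: 4 samples, 3 features.
V = np.array([[1.0, 0.5, 2.0],
              [2.0, 1.0, 4.0],
              [0.5, 3.0, 1.0],
              [1.0, 6.0, 2.0]])

toy_nmf = NMF(n_components=2, init="random", random_state=0, max_iter=1000)
W = toy_nmf.fit_transform(V)  # one row of latent features per sample
H = toy_nmf.components_       # one row per component, one column per feature

print(W.shape, H.shape)
# Both factors contain no negative entries.
print((W >= 0).all() and (H >= 0).all())
# W @ H approximates V; the largest absolute deviation shows how closely.
print(np.abs(V - W @ H).max())
```

In the notebook, `nmf_features` plays the role of W.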

#### PREPROCESSING

`normalize()` scales samples individually to unit norm. Each sample (i.e. each row of the data matrix) with at least one non-zero component is rescaled independently of other samples so that its norm (l1, l2 or inf) equals one; l2 is the default. It works with both dense numpy arrays and scipy.sparse matrices.

In [None]:
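from sklearn.preprocessing import normalize
norm_features = normalize(nmf_features)
# Use the song at row 23 as the query
current_music = norm_features[23,:]
# With unit-norm rows, the dot product equals the cosine similarity
similarities = norm_features.dot(current_music)
similarities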
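
The reason for normalizing first: once every row has unit l2 norm, a plain dot product between two rows is their cosine similarity. A minimal sketch on made-up vectors (the array `toy` and the other names are hypothetical):

```python
import numpy as np
from sklearn.preprocessing import normalize

# Hypothetical toy feature rows (not from the dataset).
toy = np.array([[3.0, 4.0],
                [6.0, 8.0],
                [4.0, 0.0]])

toy_unit = normalize(toy)  # default norm="l2": each row now has length 1
query = toy_unit[0]
sims = toy_unit.dot(query)

# Cosine similarity computed by hand on the unnormalized rows, for comparison.
manual = toy @ toy[0] / (np.linalg.norm(toy, axis=1) * np.linalg.norm(toy[0]))

print(sims)
print(np.allclose(sims, manual))
```

Rows 0 and 1 point in the same direction, so their similarity is 1 even though their lengths differ.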

#### LAST TOUCHES AND CREATING current_music() function

In [None]:
df = pd.DataFrame(norm_features)
x = df.join(song_name)
# Index the features by song name; rows sharing a name are averaged (pivot_table default)
df = pd.pivot_table(x, values=[0,1,2,3,4,5], index="Song-Names")
def current_music(value):
    print("Top 5 recommendations for given music are:")
    value = df.loc[value]
    similarities = df.dot(value)
    # The query song is part of df, so it can appear in its own results
    print(similarities.nlargest(5))

In [None]:
current_music("Missed Calls - Remix")
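
Because the query song is itself a row of `df`, it can show up in its own recommendations. One possible variation, sketched on a made-up table, drops the query before ranking. The function `recommend`, the frame `toy_df` and its values are hypothetical and not part of the notebook above.

```python
import pandas as pd

# Hypothetical feature table indexed by song name (not from the dataset).
toy_df = pd.DataFrame([[1.0, 0.0],
                       [0.8, 0.6],
                       [0.0, 1.0]],
                      index=["A", "B", "C"])

def recommend(features, name, n=2):
    """Return the n most similar songs to `name`, excluding `name` itself."""
    query = features.loc[name]
    sims = features.dot(query)
    return sims.drop(name).nlargest(n)

top = recommend(toy_df, "A")
print(top)
```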