# Music Recommendation Model
## Author: Laukit Mandal

**Music recommender systems can suggest songs to users based on their listening patterns.**<br>
**Dataset: https://www.kaggle.com/c/kkbox-music-recommendation-challenge/data**


In [2]:
import numpy as np 
import pandas as pd
import warnings
warnings.filterwarnings('ignore')
import matplotlib.pyplot as plt
%matplotlib inline
import gc
import seaborn as sns
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
import datetime
import math

**Data preparation <br>
A 1% random sample of the training data is used in this kernel.**

In [3]:
# Load data
df = pd.read_csv('train.csv')

# 1% sample of items
df = df.sample(frac=0.01)

# Load and join songs data
songs = pd.read_csv('songs.csv')
df = pd.merge(df, songs, on='song_id', how='left')
del songs

# Load and join members data
members = pd.read_csv('members.csv')
df = pd.merge(df, members, on='msno', how='left')
del members

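# Replace NA: text columns get the label 'unknown', numeric columns get 0
# (assign the filled column back instead of using chained indexing,
# which may silently modify a copy)
for col in df.select_dtypes(include=['object']).columns:
    df[col] = df[col].fillna('unknown')
df = df.fillna(value=0)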
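
The join-then-fill pattern in this cell can be checked on a tiny example. The following is a minimal sketch, assuming two invented frames (`train_toy` and `songs_toy` are hypothetical stand-ins, not the real KKBox files): a left join keeps every training row, and rows without song metadata receive the same placeholders as above.

```python
import pandas as pd

# Hypothetical miniature versions of train.csv and songs.csv
train_toy = pd.DataFrame({"song_id": ["s1", "s2", "s3"], "target": [1, 0, 1]})
songs_toy = pd.DataFrame({
    "song_id": ["s1", "s3"],
    "artist_name": pd.Series(["A", "C"], dtype=object),
    "song_length": [200000.0, 180000.0],
})

# Left join: every training row is kept; s2 has no metadata, so it gets NaN
toy = pd.merge(train_toy, songs_toy, on="song_id", how="left")

# Text columns get a placeholder label, numeric columns get 0
for col in toy.select_dtypes(include=["object"]).columns:
    toy[col] = toy[col].fillna("unknown")
toy = toy.fillna(value=0)

print(toy)
```

Note that filling numeric gaps with 0 makes a missing `song_length` indistinguishable from a real zero; whether that matters depends on the model.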

**Create Dates**

In [4]:
# registration_init_time
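# Parse YYYYMMDD values; a parse failure raises here instead of being
# silently ignored (the .dt accessor below needs a real datetime column)
df['registration_init_time'] = pd.to_datetime(df['registration_init_time'], format='%Y%m%d')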
df['registration_init_time_year'] = df['registration_init_time'].dt.year
df['registration_init_time_month'] = df['registration_init_time'].dt.month
df['registration_init_time_day'] = df['registration_init_time'].dt.day

# expiration_date
# Same parsing for expiration_date; a parse failure raises here
df['expiration_date'] = pd.to_datetime(df['expiration_date'], format='%Y%m%d')
df['expiration_date_year'] = df['expiration_date'].dt.year
df['expiration_date_month'] = df['expiration_date'].dt.month
df['expiration_date_day'] = df['expiration_date'].dt.day
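
As an illustration of the date handling in this cell, here is a small sketch that assumes the dates are stored as `YYYYMMDD` integers (the three values in `raw` are made up for the example). It parses them and extracts the year, month, and day parts.

```python
import pandas as pd

# Hypothetical YYYYMMDD integers, invented for this example
raw = pd.Series([20110820, 20150628, 20170101])

# Convert to strings first so the format is matched digit by digit
dates = pd.to_datetime(raw.astype(str), format="%Y%m%d")

# Split each date into separate numeric features
parts = pd.DataFrame({
    "year": dates.dt.year,
    "month": dates.dt.month,
    "day": dates.dt.day,
})
print(parts)
```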

**Organising Our Data**

In [5]:
# Select columns
df = df[['msno', 'song_id', 'source_screen_name', 'source_type', 'target',
       'song_length', 'artist_name', 'composer', 'bd',
       'registration_init_time', 'registration_init_time_month',
       'registration_init_time_day', 'expiration_date_day']]

# Dates to category
df['registration_init_time'] = df['registration_init_time'].astype('category')

# Object data to category
for col in df.select_dtypes(include=['object']).columns:
    df[col] = df[col].astype('category')
    
# Encoding categorical features
for col in df.select_dtypes(include=['category']).columns:
    df[col] = df[col].cat.codes

df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 73774 entries, 0 to 73773
Data columns (total 13 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   msno                          73774 non-null  int16  
 1   song_id                       73774 non-null  int16  
 2   source_screen_name            73774 non-null  int8   
 3   source_type                   73774 non-null  int8   
 4   target                        73774 non-null  int64  
 5   song_length                   73774 non-null  float64
 6   artist_name                   73774 non-null  int16  
 7   composer                      73774 non-null  int16  
 8   bd                            73774 non-null  int64  
 9   registration_init_time        73774 non-null  int16  
 10  registration_init_time_month  73774 non-null  int64  
 11  registration_init_time_day    73774 non-null  int64  
 12  expiration_date_day           73774 non-null  int64  
dtypes
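
To make the encoding step concrete, the sketch below applies the same `astype('category')` and `.cat.codes` calls to a toy column (the values are invented for illustration). pandas sorts the categories before numbering them, so the integer codes carry no meaning beyond identity, although a tree model will still treat them as ordered numbers.

```python
import pandas as pd

# Toy column with invented values in the style of source_type
toy = pd.DataFrame(
    {"source_type": ["online-playlist", "radio", "online-playlist", "unknown"]}
)

# Convert to a categorical; categories are stored in sorted order
toy["source_type"] = toy["source_type"].astype("category")

# Integer code of each row, and the code -> label lookup table
codes = toy["source_type"].cat.codes
mapping = dict(enumerate(toy["source_type"].cat.categories))

print(codes.tolist())
print(mapping)
```

Keeping `mapping` around is one way to translate codes back to labels when inspecting predictions later.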

**Split Data Into Train And Validation Sets**

In [6]:
X = df.drop('target', axis = 1)
y = df.target
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size = 0.25, random_state = 0)
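
The split above holds out 25% of the rows with a fixed seed. The sketch below repeats it on made-up arrays to show the resulting sizes. The `stratify` argument is an addition that is not used in the cell above; it keeps the class ratio roughly equal in both parts and is shown here only as an option.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 20 rows, 2 features, perfectly balanced binary target
X_toy = np.arange(40).reshape(20, 2)
y_toy = np.array([0, 1] * 10)

# 25% validation split; stratify is optional and not part of the original cell
X_tr, X_va, y_tr, y_va = train_test_split(
    X_toy, y_toy, test_size=0.25, random_state=0, stratify=y_toy
)

print(X_tr.shape, X_va.shape)
```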

**Model**

In [7]:
from sklearn.ensemble import RandomForestClassifier
# Create and fit a random forest with 1000 trees
model = RandomForestClassifier(n_estimators=1000)
model.fit(X_train, y_train)
# Predict hard class labels (0/1) for the validation set
val_pred = model.predict(X_val)

In [8]:
from sklearn.metrics import roc_auc_score, accuracy_score, classification_report
print(classification_report(y_val, val_pred))

              precision    recall  f1-score   support

           0       0.64      0.61      0.62      9151
           1       0.63      0.66      0.65      9293

    accuracy                           0.64     18444
   macro avg       0.64      0.64      0.64     18444
weighted avg       0.64      0.64      0.64     18444



In [9]:
print("Accuracy :", accuracy_score(y_val, val_pred))
print("ROC :", roc_auc_score(y_val, val_pred))

Accuracy : 0.6354369984818912
ROC : 0.6352275357444592
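
The ROC value above is computed from hard 0/1 predictions. With hard labels the ROC curve has a single operating point, so the score reduces to the mean of the two per-class recalls, which is why it sits so close to the accuracy here. ROC AUC is usually computed from predicted probabilities instead. The sketch below shows both variants on synthetic data; `make_classification`, the small forest, and all variable names are assumptions for illustration, not the notebook's data or results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary problem standing in for the real features (assumption)
X_syn, y_syn = make_classification(n_samples=500, n_features=8, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(
    X_syn, y_syn, test_size=0.25, random_state=0
)

# Small forest so the example runs quickly
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_tr, y_tr)

# Hard labels versus probability of the positive class
hard_labels = clf.predict(X_va)
pos_scores = clf.predict_proba(X_va)[:, 1]

auc_from_labels = roc_auc_score(y_va, hard_labels)
auc_from_scores = roc_auc_score(y_va, pos_scores)

print("ROC AUC from labels:", auc_from_labels)
print("ROC AUC from probabilities:", auc_from_scores)
```

In the notebook itself, the equivalent change would be to pass `model.predict_proba(X_val)[:, 1]` to `roc_auc_score`; the reported number would then differ from the one printed above.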
