# Day 19 of 60 days of Machine learning from the basics

# Project: Custom music recommendation system
## Steps:
1. Import the data
2. Clean the data
3. Split the data into Training/Test Sets
4. Create a model
5. Train the model
6. Make predictions
7. Evaluate and Improve

# 1. Importing data from our custom CSV file

In [1]:
import pandas as pd
# read the CSV file into a DataFrame
music_data = pd.read_csv('music.csv')

music_data

,age,gender,genre
0,22,1,Jazz
1,36,1,Pop
2,40,1,Hip-Hop
3,26,0,Jazz
4,30,0,Pop
...,...,...,...
95,53,1,Pop
96,19,1,Rock
97,41,1,Hip-Hop
98,29,0,Jazz


# 2. Cleaning or preparing the data

Let's split the dataset into two separate sets:
1. Input dataset (age and gender)
2. Output dataset (genre)

In [2]:
# use the DataFrame drop method to remove the genre column
# what remains is the input set
X = music_data.drop(columns = ['genre'])
X

,age,gender
0,22,1
1,36,1
2,40,1
3,26,0
4,30,0
...,...,...
95,53,1
96,19,1
97,41,1
98,29,0


In [3]:
# create the output set (the genre column)
Y = music_data['genre']
Y

0        Jazz
1         Pop
2     Hip-Hop
3        Jazz
4         Pop
       ...   
95        Pop
96       Rock
97    Hip-Hop
98       Jazz
99    Hip-Hop
Name: genre, Length: 100, dtype: object

# 3. Let's build a model using the Decision Tree algorithm from the scikit-learn library

In [4]:
# import DecisionTreeClassifier from scikit-learn
from sklearn.tree import DecisionTreeClassifier

In [5]:
# create an instance of the DecisionTreeClassifier class
model = DecisionTreeClassifier()

# train it on the full dataset
model.fit(X, Y)
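
By default, `DecisionTreeClassifier` picks its splits using Gini impurity. To get a feel for what that means, here is a minimal sketch of the measure in plain Python. The rows and the helper names `gini` and `weighted_gini` are made up for illustration; they are not taken from `music.csv` or from scikit-learn's internals.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def weighted_gini(left, right):
    """Impurity of a split, weighted by the number of rows on each side."""
    total = len(left) + len(right)
    return (len(left) * gini(left) + len(right) * gini(right)) / total


# Hypothetical (age, genre) rows, not taken from music.csv
rows = [(20, "HipHop"), (23, "HipHop"), (27, "Jazz"), (29, "Jazz")]
labels = [genre for _, genre in rows]

# Candidate split "age <= 25" separates the two genres completely
left = [genre for age, genre in rows if age <= 25]
right = [genre for age, genre in rows if age > 25]

impurity_before = gini(labels)               # two genres, evenly mixed
impurity_after = weighted_gini(left, right)  # each side holds one genre
print(impurity_before, impurity_after)
```

A split that lowers the weighted impurity is a good one; the tree keeps splitting until the leaves are pure or a stopping rule applies.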


# 4. Making predictions

In [7]:
# now let's make predictions using the predict method, which takes a 2D array of [age, gender] rows
predictions = model.predict([ [19,1],[62,0] ])



In [8]:
print("----- Music genre predictions for a 19-year-old male and a 62-year-old female -----")
predictions

----- Music genre predictions for a 19-year-old male and a 62-year-old female -----


array(['Rock', 'Classical'], dtype=object)

# 5. Let's calculate the accuracy by splitting the data into training and test sets

In [23]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)

# fit the model on the training set only
model.fit(X_train, Y_train)
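
To see what a train/test split does, here is a simplified sketch in plain Python: shuffle the rows, then hold out a fraction for testing. The function `simple_train_test_split` and its `seed` parameter are hypothetical names for this illustration; this is not how scikit-learn implements `train_test_split`.

```python
import random


def simple_train_test_split(rows, test_size=0.2, seed=42):
    """Shuffle a copy of rows; the first test_size fraction becomes the test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


# 100 stand-in row ids, matching the length of the dataset above
row_ids = list(range(100))
train_ids, test_ids = simple_train_test_split(row_ids, test_size=0.2)
print(len(train_ids), len(test_ids))
```

Without a fixed seed, every run produces a different split. scikit-learn's `train_test_split` accepts a `random_state` argument for the same purpose.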

In [24]:
# predict genres for the test inputs
predictions = model.predict(X_test)

# the accuracy score ranges from 0 to 1
score = accuracy_score(Y_test, predictions)
score

0.45
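
`accuracy_score` returns the fraction of predictions that match the true labels. With `test_size = 0.2` on 100 rows the test set has 20 rows, so a score of 0.45 means 9 correct predictions. Because the split is random, the score changes from run to run. A minimal sketch of the same calculation, using made-up labels rather than the notebook's output:

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels, not output from the notebook
y_true = ["Jazz", "Pop", "Rock", "Jazz"]
y_pred = ["Jazz", "Pop", "Jazz", "Rock"]

demo_score = accuracy(y_true, y_pred)  # 2 of the 4 predictions match
print(demo_score)
```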

# Persisting models

In [11]:
# persisting the trained model saves time: retraining on larger datasets can take a long time
import joblib

joblib.dump(model, 'music-recommeder.joblib')

['music-recommeder.joblib']
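
A saved model can be loaded back later with `joblib.load` and used for predictions without retraining. The sketch below shows the full round trip on a tiny made-up dataset and a temporary file, so the demo rows and the file name `demo-model.joblib` are assumptions for illustration only.

```python
import os
import tempfile

import joblib
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training rows in the notebook's [age, gender] layout
X_demo = [[20, 1], [23, 1], [27, 0], [29, 0]]
y_demo = ["HipHop", "HipHop", "Jazz", "Jazz"]

trained = DecisionTreeClassifier(random_state=0).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo-model.joblib")  # hypothetical file name
    joblib.dump(trained, path)     # write the fitted model to disk
    restored = joblib.load(path)   # read it back, no retraining needed

restored_predictions = list(restored.predict([[21, 1], [28, 0]]))
print(restored_predictions)
```

In the notebook itself, the same idea would be `joblib.load('music-recommeder.joblib')` followed by `predict`.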

# Now let's visualize our Decision Tree

In [27]:
from sklearn import tree

# write the fitted tree to a Graphviz .dot file
tree.export_graphviz(model, out_file='music-recommendor.dot',
                     feature_names=['age', 'gender'],
                     class_names=sorted(Y.unique()),
                     label='all',
                     rounded=True,
                     filled=True)
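
The `.dot` file needs a Graphviz viewer to be rendered. For a quick look without extra tools, scikit-learn also provides `export_text`, which prints the tree's rules as plain text. A minimal sketch on a tiny made-up dataset (the demo rows are assumptions, not rows from `music.csv`):

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical training rows in the notebook's [age, gender] layout
X_demo = [[20, 1], [23, 1], [27, 0], [29, 0]]
y_demo = ["HipHop", "HipHop", "Jazz", "Jazz"]

demo_model = DecisionTreeClassifier(random_state=0).fit(X_demo, y_demo)

# One line per decision rule or leaf, indented by depth
tree_text = export_text(demo_model, feature_names=["age", "gender"])
print(tree_text)
```

Each indented line is either a condition on a feature or a leaf with the predicted class, which mirrors the boxes in the Graphviz picture.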