# Online Music Streaming

Reference: [Python Machine Learning Tutorial by Programming with Mosh](https://www.youtube.com/watch?v=7eh4d6sabA0&t=1982s)

Suggests a music genre to a user based on their profile (age and gender).

Steps
1. Import the data
2. Clean the data
3. Split data into Training and Test Sets
4. Create a model
5. Train a model
6. Make predictions
7. Evaluate and improve

***

## 1. Import the Data

In [11]:
import pandas as pd 

from sklearn.tree import DecisionTreeClassifier # Decision Tree algorithm
from sklearn.model_selection import train_test_split # Easily split dataset to training and test data

from sklearn.metrics import accuracy_score # Calculate accuracy of model

import joblib # Save and load trained model

from sklearn import tree # Visualise Decision Tree

In [2]:
df = pd.read_csv("/Users/katiehuang/Documents/Data Science/Projects/Online Music Streaming/music.csv")
df.head()

|   | age | gender | genre |
|---|-----|--------|-------|
| 0 | 20 | 1 | HipHop |
| 1 | 23 | 1 | HipHop |
| 2 | 25 | 1 | HipHop |
| 3 | 26 | 1 | Jazz |
| 4 | 29 | 1 | Jazz |
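Step 2 (cleaning) has no cell of its own here, presumably because this dataset needs none. A minimal sketch of how that could be checked, using only the five rows shown above rather than the full file; the variable names `missing_per_column` and `duplicate_rows` are illustrative.

```python
import pandas as pd

# The five rows printed by df.head() above (not the full music.csv)
df = pd.DataFrame(
    {
        "age": [20, 23, 25, 26, 29],
        "gender": [1, 1, 1, 1, 1],
        "genre": ["HipHop", "HipHop", "HipHop", "Jazz", "Jazz"],
    }
)

# Count missing values in each column
missing_per_column = df.isna().sum()

# Count rows that are exact duplicates of an earlier row
duplicate_rows = int(df.duplicated().sum())

print(missing_per_column)
print(duplicate_rows)
```

If either check reports a non-zero count, `df.dropna()` or `df.drop_duplicates()` would be the usual next step.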


***

## 3. Split Data into Training and Test Data

In [3]:
# Input features (X): every column except the target column "genre"
X = df.drop(columns=["genre"])
X

|   | age | gender |
|---|-----|--------|
| 0 | 20 | 1 |
| 1 | 23 | 1 |
| 2 | 25 | 1 |
| 3 | 26 | 1 |
| 4 | 29 | 1 |
| 5 | 30 | 1 |
| 6 | 31 | 1 |
| 7 | 33 | 1 |
| 8 | 37 | 1 |
| 9 | 20 | 0 |


In [4]:
# Output / target labels (y): the column we want the model to predict
y = df["genre"]
y

0        HipHop
1        HipHop
2        HipHop
3          Jazz
4          Jazz
5          Jazz
6     Classical
7     Classical
8     Classical
9         Dance
10        Dance
11        Dance
12     Acoustic
13     Acoustic
14     Acoustic
15    Classical
16    Classical
17    Classical
Name: genre, dtype: object

In [5]:
# train_test_split shuffles the rows and returns four objects:
# training inputs, test inputs, training labels and test labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
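The split above is random, so every run produces different training and test rows. A minimal sketch of how the split sizes work out and how `random_state` makes the split repeatable; the feature values are hypothetical stand-ins with the same row count (18) as the dataset, and the seed 42 is arbitrary.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data with 18 rows
X = pd.DataFrame({"age": list(range(20, 38)), "gender": [1] * 9 + [0] * 9})
y = pd.Series(["HipHop"] * 6 + ["Jazz"] * 6 + ["Classical"] * 6, name="genre")

# test_size=0.2 holds out ceil(18 * 0.2) = 4 rows; the other 14 are for training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(len(X_train), len(X_test))

# Repeating the call with the same random_state selects the same rows
X_train2, X_test2, y_train2, y_test2 = train_test_split(
    X, y, test_size=0.2, random_state=42
)
same_split = list(X_test.index) == list(X_test2.index)
print(same_split)
```

Fixing `random_state` is useful when comparing models, because all of them are then scored on the same test rows.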

***

## 4-6. Create and Train a Model and Make Predictions

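A decision tree is a supervised machine learning algorithm. It predicts a label by asking a sequence of questions about the input features, here age and gender.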

In [6]:
# Instantiate Decision Tree algorithm into a model
model = DecisionTreeClassifier()

# Fit the model to our training data set
model.fit(X_train, y_train)

# Use the model to predict the test data
predictions = model.predict(X_test)
predictions

array(['Classical', 'Classical', 'Dance', 'Classical'], dtype=object)

A common rule of thumb in machine learning is to allocate 70-80% of the data to training and the remaining 20-30% to testing.

We can then compare the model's predictions for the test set with the true labels in y_test to calculate the accuracy of the model.
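Accuracy here means the fraction of test rows whose predicted genre equals the true genre. A minimal sketch of that calculation in plain Python; the predictions are copied from the output above, while the true labels in `y_true` are hypothetical and contain one deliberate mismatch for illustration.

```python
# Predictions copied from the output above
predictions = ["Classical", "Classical", "Dance", "Classical"]

# Hypothetical true labels with one deliberate mismatch (second row)
y_true = ["Classical", "Jazz", "Dance", "Classical"]

# Count matching pairs and divide by the number of test rows
correct = sum(p == t for p, t in zip(predictions, y_true))
accuracy = correct / len(y_true)
print(accuracy)  # 0.75
```

This is the same quantity that accuracy_score returns by default.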



***

## 7. Evaluate and Improve

In [7]:
# Compare the model's predictions with the true labels in y_test
score = accuracy_score(y_test, predictions)
score

1.0
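A score of 1.0 on four test rows is a weak estimate, and it can change from run to run because the split is random. One way to get a steadier picture is to repeat the split with several seeds and look at the spread of scores. The sketch below assumes hypothetical feature values (only the label column follows the output shown earlier), and the ten seeds are arbitrary.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical feature values; labels follow the genre column shown earlier
X = pd.DataFrame({"age": list(range(20, 38)), "gender": [1] * 9 + [0] * 9})
y = pd.Series(
    ["HipHop"] * 3 + ["Jazz"] * 3 + ["Classical"] * 3
    + ["Dance"] * 3 + ["Acoustic"] * 3 + ["Classical"] * 3,
    name="genre",
)

scores = []
for seed in range(10):
    # A different seed gives a different train/test split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = DecisionTreeClassifier(random_state=0)
    model.fit(X_train, y_train)
    scores.append(accuracy_score(y_test, model.predict(X_test)))

mean_score = sum(scores) / len(scores)
print(min(scores), max(scores), mean_score)
```

A wide gap between the minimum and maximum suggests that a single score says little about the model.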

***

## Save Trained Model / Model Persistence

Once the model is trained, we save it to a file so that we do not need to retrain it every time we want to make predictions on new data.

In [8]:
# Save trained model
joblib.dump(model, 'music-recommender.joblib') # (model object, output file name)

# The file is saved in the current working directory, normally the folder containing the notebook

['music-recommender.joblib']

***

## Load the Trained Model

In [9]:
# Load trained model from file
model = joblib.load('music-recommender.joblib') # name of model
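A quick way to confirm that saving and loading preserve the model is a round trip: dump it, load it back, and compare predictions. The sketch below uses a tiny hypothetical training set and a temporary directory so that it does not overwrite the real file.

```python
import os
import tempfile

import joblib
from sklearn.tree import DecisionTreeClassifier

# Tiny hypothetical training set: [age, gender] -> genre
X = [[20, 1], [25, 1], [30, 0], [35, 0]]
y = ["HipHop", "HipHop", "Dance", "Dance"]
model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Save to a temporary directory, then load the model back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "music-recommender.joblib")
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model should give the same predictions as the original
same_predictions = list(model.predict(X)) == list(restored.predict(X))
print(same_predictions)
```

Note that joblib files are pickle-based, so they should only be loaded from sources you trust.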

In [10]:
# Making predictions using our trained model
predictions = model.predict([[21, 1]])
predictions

array(['HipHop'], dtype=object)
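Because the model was fitted on a DataFrame, recent scikit-learn versions may warn about missing feature names when predict receives a plain nested list. Passing a DataFrame with the same column names avoids that. A minimal sketch with a small hypothetical training set; `new_user` is an illustrative name.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Small hypothetical training set with named columns
X = pd.DataFrame({"age": [20, 23, 25, 30, 33, 37], "gender": [1, 1, 1, 0, 0, 0]})
y = ["HipHop", "HipHop", "HipHop", "Dance", "Dance", "Dance"]
model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Build the new sample with the same column names and order used for fitting
new_user = pd.DataFrame([[21, 1]], columns=["age", "gender"])
prediction = model.predict(new_user)
print(prediction[0])  # HipHop
```

Named columns also make it harder to pass the features in the wrong order by accident.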

***

## Visualize Decision Tree

In [12]:
# Export the fitted decision tree to a Graphviz DOT file
tree.export_graphviz(model, 
                     out_file='music-recommender.dot', # Name of output file
                     feature_names=['age', 'gender'], # Columns of features
                     class_names=sorted(y.unique()), # Unique values of y data set
                     label='all', # Each node is labelled
                     rounded=True, # Rounded edges of node
                     filled=True) # Each node is coloured

Open the 'music-recommender.dot' file in Visual Studio Code. Install a Graphviz (DOT) preview extension, then open the preview to see the visualisation in a side panel.
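If Graphviz tooling is not available, scikit-learn can also print the tree as indented text rules with export_text. This is an alternative to the DOT export above, not part of the original workflow, and the training data below is a small hypothetical set.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Small hypothetical training set: [age, gender] -> genre
X = [[20, 1], [23, 1], [25, 1], [30, 0], [33, 0], [37, 0]]
y = ["HipHop", "HipHop", "HipHop", "Dance", "Dance", "Dance"]
model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Render the learned splits as plain text, one line per node
rules = export_text(model, feature_names=["age", "gender"])
print(rules)
```

Each indented line is a threshold test on one feature, and each leaf line shows the predicted class.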