## Real Machine Learning Project
In the reference video, the author presents a theoretical online music store that collects data (age, gender, and preferred genre of music) from its customers and wants to predict the preferred genre for new users. Below are the steps outlined in the video.


## Data Prepping
Here we import the provided data and split it into the input features (age and gender) and the output to predict (genre).

In [3]:
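import pandas as pd
import warnings

# Silence library warnings to keep the notebook output readable.
warnings.filterwarnings('ignore')

# Load the data set, then separate the input features (X) from the label to predict (y).
music_data = pd.read_csv('../data/music.csv')
X = music_data.drop(columns=['genre'])
y = music_data['genre']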

In [4]:
# Inspect X
X

   age  gender
0   20       1
1   23       1
2   25       1
3   26       1
4   29       1
5   30       1
6   31       1
7   33       1
8   37       1
9   20       0


In [5]:
# Inspect y
y.head()

0    HipHop
1    HipHop
2    HipHop
3      Jazz
4      Jazz
Name: genre, dtype: object

## Learning and Predicting
Here we import the decision tree classifier from the sklearn.tree module, fit it to the data, and use it to predict the genre for two new users.

In [6]:
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier()
model.fit(X, y)
# Predict the genre for two new users, each given as [age, gender].
model.predict([[21, 1], [22, 0]])

array(['HipHop', 'Dance'], dtype=object)
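
The call above passes plain lists to a model that was fitted on a DataFrame, which recent scikit-learn versions flag with a feature-name warning (hidden here by the blanket warnings filter). As a minimal sketch, and not the method shown in the video, the new samples can instead be wrapped in a DataFrame with the same column names. The rows below are a small hypothetical stand-in for ../data/music.csv so that the example runs on its own.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in rows; the real notebook reads ../data/music.csv.
music_data = pd.DataFrame({
    "age":    [20, 23, 25, 26, 29, 30, 21, 22, 25, 26, 27, 30],
    "gender": [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "genre":  ["HipHop", "HipHop", "HipHop", "Jazz", "Jazz", "Jazz",
               "Dance", "Dance", "Dance", "Acoustic", "Acoustic", "Acoustic"],
})
X = music_data.drop(columns=["genre"])
y = music_data["genre"]

# A fixed random_state makes the fitted tree reproducible between runs.
model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)

# Build the new samples with the same column names used during fitting.
new_users = pd.DataFrame({"age": [21, 22], "gender": [1, 0]})
predictions = model.predict(new_users)
print(predictions)
```

Keeping the column names attached also guards against passing the features in the wrong order.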

## Determining the Accuracy of the Model
In this section the video shows how to calculate the accuracy of a decision tree model. It also discusses the train_test_split function and how the percentage of the data set held out for testing affects the measured accuracy.

In [7]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

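# Hold out 20% of the rows for testing. The split is random, so the score can change between runs.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# Fraction of test rows whose predicted genre matches the actual genre.
score = accuracy_score(y_test, predictions)
score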

1.0
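
Because train_test_split shuffles the rows before splitting, the score printed above can differ from run to run, and a perfect 1.0 on a handful of test rows says little by itself. The following sketch, an illustration rather than something shown in the video, repeats the split with several seeds and test sizes to show the spread. The data rows and the helper name accuracy_for are hypothetical.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in rows; the real notebook reads ../data/music.csv.
music_data = pd.DataFrame({
    "age":    [20, 23, 25, 26, 29, 30, 21, 22, 25, 26, 27, 30],
    "gender": [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "genre":  ["HipHop", "HipHop", "HipHop", "Jazz", "Jazz", "Jazz",
               "Dance", "Dance", "Dance", "Acoustic", "Acoustic", "Acoustic"],
})
X = music_data.drop(columns=["genre"])
y = music_data["genre"]


def accuracy_for(test_size, seed):
    """Train on one random split and return the accuracy on its test part."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    model = DecisionTreeClassifier(random_state=seed)
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


# Collect 20 scores per test size to see how much the result varies.
results = {}
for test_size in (0.2, 0.5, 0.8):
    results[test_size] = [accuracy_for(test_size, seed) for seed in range(20)]
    scores = results[test_size]
    print(f"test_size={test_size}: min={min(scores):.2f}, "
          f"mean={sum(scores) / len(scores):.2f}, max={max(scores):.2f}")
```

Passing random_state to train_test_split is what makes an individual split repeatable; without it, each run draws a new split.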

## Persisting Models
Here the video discusses how to save the trained model to disk and load it later, so that it does not have to be re-trained each time it is used.

In [8]:
import joblib

# Save the trained model to disk (the ../models directory must already exist).
joblib.dump(model, '../models/music-genre-predictor.joblib')

['../models/music-genre-predictor.joblib']

In [9]:
# Load the saved model and predict without re-training.
model = joblib.load('../models/music-genre-predictor.joblib')
predictions = model.predict([[21, 1]])
predictions

array(['HipHop'], dtype=object)
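
A quick way to gain confidence in the saved file is to check that the reloaded model gives the same predictions as the original. The sketch below does this round trip in a temporary directory so that it does not touch ../models. The training rows are hypothetical, and the file name simply mirrors the one used above.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical [age, gender] rows and genres, used only to have a fitted model.
X = [[20, 1], [23, 1], [26, 1], [29, 1], [21, 0], [22, 0], [26, 0], [27, 0]]
y = ["HipHop", "HipHop", "Jazz", "Jazz", "Dance", "Dance", "Acoustic", "Acoustic"]

model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Save and reload inside a temporary directory that is removed afterwards.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "music-genre-predictor.joblib"
    joblib.dump(model, str(model_path))
    restored = joblib.load(str(model_path))

# The restored model should reproduce the original predictions exactly.
same_predictions = np.array_equal(model.predict(X), restored.predict(X))
print(same_predictions)
```

The scikit-learn documentation advises loading a persisted model with the same library version that saved it, so it can help to record that version next to the file.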

## Visualizing a Decision Tree
In this section we export a visualization of the model created above.

In [10]:
from sklearn import tree
tree.export_graphviz(model, 
                     out_file='music-genre-predictor.dot', 
                     feature_names=['age', 'gender'], 
                     class_names=sorted(y.unique()), 
                     label='all', 
                     rounded=True, 
                     filled=True)
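
When no Graphviz viewer is at hand, the tree can also be inspected as plain text. The sketch below is an alternative that the video does not cover: export_text from sklearn.tree prints the decision rules, and export_graphviz with out_file=None returns the DOT source as a string instead of writing a file. The training rows are hypothetical.

```python
from sklearn.tree import DecisionTreeClassifier, export_graphviz, export_text

# Hypothetical [age, gender] rows and genres, used only to have a fitted model.
X = [[20, 1], [23, 1], [26, 1], [29, 1], [21, 0], [22, 0], [26, 0], [27, 0]]
y = ["HipHop", "HipHop", "Jazz", "Jazz", "Dance", "Dance", "Acoustic", "Acoustic"]

model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Plain-text rendering of the decision rules, one line per split or leaf.
rules = export_text(model, feature_names=["age", "gender"])
print(rules)

# With out_file=None, the DOT source is returned as a string.
dot_source = export_graphviz(
    model,
    out_file=None,
    feature_names=["age", "gender"],
    class_names=list(model.classes_),
    rounded=True,
    filled=True,
)
print(dot_source.splitlines()[0])
```

Here class_names is taken from model.classes_, which is already sorted and so matches the sorted(y.unique()) order used above.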

## VS Code Graph Preview Extension
Since the release of the video, the suggested extension no longer supports previewing .dot files, so a different extension had to be installed (see below).

![graph_preview_extension.png](./images/graph_preview_extension.png "Graph Preview Extension")

To use the extension, open the .dot file in VS Code and search the Command Palette for ">graphviz Interactive: preview graphviz / Dot (beside)".

![graph_preview_use.png](./images/graph_preview_use.png "Using Graph Preview")

The above will result in the graph being displayed as follows:

![graph_preview_result.png](./images/graph_preview_result.png "Graph Preview")

A close-up of the resulting graph preview:

![music-genre-predictor.png](./images/music-genre-predictor.png "Close Up of Graph Preview")