This notebook is an introduction to supervised learning: it solves a classification problem using **decision trees**.
It follows [this tutorial](https://youtu.be/7eh4d6sabA0). 

# **Classification Problem**
We will follow these steps to solve a machine learning problem:


1. Import the Data
2. Clean the Data
3. Split the Data into Training/Test Sets
4. Create a Model
5. Train the Model
6. Make Predictions
7. Evaluate and improve


In [7]:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import joblib
from sklearn import tree

1. Import the Data.

In [8]:
drug_data = pd.read_csv('cleanedfile.csv')
drug_data

Unnamed: 0.1,Unnamed: 0,Age,Sex,Cholesterol,Na_to_K,Drug
0,0,23,0,HIGH,25.355,5000.0
1,1,47,1,HIGH,13.093,3000.0
2,2,47,1,HIGH,10.114,3000.0
3,3,28,0,HIGH,7.798,4000.0
4,4,61,0,HIGH,18.043,5000.0
...,...,...,...,...,...,...
193,193,56,0,HIGH,11.567,3000.0
194,194,16,1,HIGH,12.006,3000.0
195,195,52,1,HIGH,9.894,4000.0
196,196,23,1,NORMAL,14.020,4000.0
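The `Unnamed: 0.1` and `Unnamed: 0` columns are most likely leftover row indices (this is an assumption about how `cleanedfile.csv` was produced): `DataFrame.to_csv` writes the index as an unnamed first column by default, and `read_csv` turns that blank header into an `Unnamed: …` column. The sketch below reproduces this on a tiny hypothetical frame and shows two ways to avoid it: saving with `index=False`, or dropping the `Unnamed` columns after loading.

```python
import io

import pandas as pd

# A tiny hypothetical frame standing in for the original drug data.
original = pd.DataFrame({"Age": [23, 47], "Cholesterol": ["HIGH", "HIGH"]})

# Saving with the default index=True writes the row index as an unnamed column...
buffer = io.StringIO()
original.to_csv(buffer)
buffer.seek(0)
reread = pd.read_csv(buffer)
print(list(reread.columns))  # ['Unnamed: 0', 'Age', 'Cholesterol']

# ...whereas index=False writes only the real columns.
clean_buffer = io.StringIO()
original.to_csv(clean_buffer, index=False)
clean_buffer.seek(0)
clean = pd.read_csv(clean_buffer)
print(list(clean.columns))  # ['Age', 'Cholesterol']

# Columns already in a file can be dropped by their name prefix instead.
tidied = reread.loc[:, ~reread.columns.str.startswith("Unnamed")]
print(list(tidied.columns))  # ['Age', 'Cholesterol']
```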


2. Clean and Prepare Data

In [13]:
# Run this section to inspect X (after this drop, only the two leftover index columns remain as features)
X = drug_data.drop(columns = ['Age','Sex','Cholesterol','Na_to_K','Drug'])
X

Unnamed: 0.1,Unnamed: 0
0,0
1,1
2,2
3,3
4,4
...,...
193,193
194,194
195,195
196,196
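As written, the `drop` call above removes every real column and leaves only the two leftover index columns (`Unnamed: 0.1` and `Unnamed: 0`) as features, so the tree learns from row positions rather than from the patients' attributes. A more typical split drops the target and the index columns and keeps everything else. The sketch below rebuilds a few of the rows shown earlier to illustrate; using `Cholesterol` as the target follows this notebook, while keeping `Age`, `Sex`, `Na_to_K`, and `Drug` as features is an assumption.

```python
import pandas as pd

# Rows 0-3 and 196 from the output above, with the same column names as cleanedfile.csv.
drug_data = pd.DataFrame({
    "Unnamed: 0.1": [0, 1, 2, 3, 196],
    "Unnamed: 0": [0, 1, 2, 3, 196],
    "Age": [23, 47, 47, 28, 23],
    "Sex": [0, 1, 1, 0, 1],
    "Cholesterol": ["HIGH", "HIGH", "HIGH", "HIGH", "NORMAL"],
    "Na_to_K": [25.355, 13.093, 10.114, 7.798, 14.020],
    "Drug": [5000.0, 3000.0, 3000.0, 4000.0, 4000.0],
})

target = "Cholesterol"
index_columns = ["Unnamed: 0.1", "Unnamed: 0"]

# Features: everything except the target and the leftover index columns.
X = drug_data.drop(columns=index_columns + [target])
# Labels: the column we want to predict.
y = drug_data[target]

print(list(X.columns))  # ['Age', 'Sex', 'Na_to_K', 'Drug']
```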


In [14]:
# Run this section to inspect y
y = drug_data['Cholesterol']
y

0        HIGH
1        HIGH
2        HIGH
3        HIGH
4        HIGH
        ...  
193      HIGH
194      HIGH
195      HIGH
196    NORMAL
197    NORMAL
Name: Cholesterol, Length: 198, dtype: object
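Before interpreting any accuracy figure, it helps to know how often each class occurs in `y`. `value_counts()` shows the class balance, and scikit-learn's `DummyClassifier` with `strategy="most_frequent"` gives the accuracy of always guessing the majority class. The labels below are a hypothetical stand-in, not the real `Cholesterol` column.

```python
import pandas as pd
from sklearn.dummy import DummyClassifier

# Hypothetical labels standing in for drug_data['Cholesterol'].
y = pd.Series(["HIGH", "HIGH", "HIGH", "NORMAL", "HIGH", "NORMAL"], name="Cholesterol")

# Class balance: how many rows carry each label.
counts = y.value_counts()
print(counts)

# A baseline that ignores the features and always predicts the most common label.
X_dummy = [[0]] * len(y)  # placeholder features; this strategy never looks at them
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_dummy, y)
baseline_accuracy = baseline.score(X_dummy, y)
print(baseline_accuracy)  # 4 of 6 labels are HIGH
```

A trained model whose test accuracy does not clearly beat this baseline has learned little from its features.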

3. Learning and predicting with a decision tree

In [17]:
# Define and fit the model
model = DecisionTreeClassifier()
model.fit(X, y)
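# Predict the class of one sample (here, the first row of X);
# the input must have the same columns as the data the model was fit on
predictions = model.predict(X.iloc[[0]])

# Display predictions
predictions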



array(['HIGH'], dtype=object)
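`predict` expects a 2-D input with one row per sample and the same number of columns the model was fit on; a mismatched width raises a `ValueError`. When the model was fit on a DataFrame, passing new samples as a DataFrame with identical column names also avoids scikit-learn's feature-name warning. `predict_proba` returns, per sample, the share of training samples of each class in the leaf it lands in, with columns ordered as `model.classes_`. The training data below is hypothetical.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data with two numeric features.
X = pd.DataFrame({"Age": [23, 47, 28, 61, 52, 16],
                  "Na_to_K": [25.4, 13.1, 7.8, 18.0, 9.9, 12.0]})
y = pd.Series(["HIGH", "HIGH", "NORMAL", "HIGH", "NORMAL", "NORMAL"])

model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)

# New samples: one row each, same columns as X.
new_samples = pd.DataFrame([[30, 7.4], [50, 20.0]], columns=X.columns)
predictions = model.predict(new_samples)
probabilities = model.predict_proba(new_samples)

print(predictions)     # one label per row
print(model.classes_)  # column order of predict_proba
print(probabilities)
```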

4. Calculating accuracy

In [18]:
# Hold out 20% of the data for testing and train on the remaining 80%
X_train, X_test, y_train, y_test = train_test_split(
                                    X, y, test_size=0.2)

model = DecisionTreeClassifier()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# Compute model accuracy
score = accuracy_score(y_test, predictions)
score

0.6
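Because `train_test_split` shuffles at random, the 0.6 above will change from run to run. Fixing `random_state` makes a single split reproducible, `stratify=y` keeps the class proportions similar in both parts, and `cross_val_score` averages accuracy over several splits for a steadier estimate. The data here is synthetic (`make_classification`), used only so the sketch runs on its own; the seeds and the fold count are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for (X, y): 200 samples, 4 numeric features, 2 classes.
X, y = make_classification(n_samples=200, n_features=4, n_informative=2, random_state=0)

# Reproducible, class-balanced 80/20 split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)
single_split_accuracy = accuracy_score(y_test, model.predict(X_test))

# 5-fold cross-validation: train and evaluate five times, then look at the spread.
cv_scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)

print(f"single split: {single_split_accuracy:.3f}")
print(f"5-fold CV: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
```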

5. Persisting Models

In [19]:

model = DecisionTreeClassifier()
model.fit(X, y)

# Save the model to file
joblib.dump(model, 'MODELNAME.joblib')


['MODELNAME.joblib']

5.b. Import the model and make predictions

In [20]:
# Load saved model. Make sure that you have run the previous
# section at least once, and that the file exists.

model = joblib.load('MODELNAME.joblib')
predictions = model.predict(X)
predictions

array(['HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH',
       'HIGH', 'NORMAL', 'HIGH', 'NORMAL', 'HIGH', 'HIGH', 'HIGH',
       'NORMAL', 'NORMAL', 'HIGH', 'HIGH', 'NORMAL', 'NORMAL', 'HIGH',
       'NORMAL', 'HIGH', 'HIGH', 'NORMAL', 'HIGH', 'NORMAL', 'NORMAL',
       'HIGH', 'NORMAL', 'HIGH', 'NORMAL', 'NORMAL', 'HIGH', 'NORMAL',
       'NORMAL', 'NORMAL', 'NORMAL', 'HIGH', 'HIGH', 'NORMAL', 'NORMAL',
       'HIGH', 'NORMAL', 'NORMAL', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH',
       'NORMAL', 'NORMAL', 'NORMAL', 'NORMAL', 'HIGH', 'NORMAL', 'HIGH',
       'NORMAL', 'HIGH', 'NORMAL', 'NORMAL', 'NORMAL', 'NORMAL', 'HIGH',
       'NORMAL', 'HIGH', 'NORMAL', 'HIGH', 'NORMAL', 'HIGH', 'HIGH',
       'HIGH', 'NORMAL', 'NORMAL', 'NORMAL', 'HIGH', 'NORMAL', 'HIGH',
       'NORMAL', 'HIGH', 'HIGH', 'HIGH', 'NORMAL', 'HIGH', 'HIGH', 'HIGH',
       'HIGH', 'NORMAL', 'NORMAL', 'HIGH', 'NORMAL', 'HIGH', 'NORMAL',
       'HIGH', 'NORMAL', 'HIGH', 'HIGH', 'NORMAL', 'NORMAL', 'NORMAL'
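A quick way to confirm that a saved model is usable is to reload it and check that it reproduces the original model's predictions. The sketch writes to a temporary directory instead of `MODELNAME.joblib` and uses a tiny hypothetical dataset. `joblib.load` relies on pickle, so only load model files from sources you trust.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Tiny hypothetical dataset.
X = np.array([[23, 25.4], [47, 13.1], [28, 7.8], [61, 18.0]])
y = np.array(["HIGH", "HIGH", "NORMAL", "NORMAL"])

model = DecisionTreeClassifier(random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.joblib")
    joblib.dump(model, path)      # serialize the fitted model
    restored = joblib.load(path)  # read it back into memory

# The reloaded model should make exactly the same predictions.
same = np.array_equal(model.predict(X), restored.predict(X))
print(same)  # True
```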

6. (Optional) Visualizing decision trees

In [25]:
model = DecisionTreeClassifier()
model.fit(X, y)

tree.export_graphviz(model, out_file = 'MODELNAME1.dot',
    feature_names = list(X.columns),  # one name per feature column the model was fit on
    class_names = sorted(y.unique()),
    label = 'all',
    rounded = True,
    filled = True)

# Download the file MODELNAME1.dot and open it in VS Code (a Graphviz extension is needed to render it).
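If Graphviz is not available, `sklearn.tree.export_text` prints the same splits as plain text (and `sklearn.tree.plot_tree` draws them with matplotlib). The sketch uses `export_text` on a hypothetical two-feature dataset; as with `export_graphviz`, `feature_names` must list one name per column the model was fit on.

```python
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data: two numeric features, two classes.
X = [[23, 25.4], [47, 13.1], [28, 7.8], [61, 18.0], [52, 9.9], [16, 12.0]]
y = ["HIGH", "HIGH", "NORMAL", "HIGH", "NORMAL", "NORMAL"]

model = DecisionTreeClassifier(random_state=0).fit(X, y)

# One line per split condition and one "class:" line per leaf.
report = tree.export_text(model, feature_names=["Age", "Na_to_K"])
print(report)
```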
