<div href="pre-processing">
    <h2>Drug Classification using Decision Trees</h2>
</div>


We will use the Decision Tree classification algorithm to build a model from historical data about patients and their responses to different medications. We will then use the trained decision tree to predict the class of an unknown patient, that is, to find a suitable drug for a new patient.

In [None]:
# Suppress warnings by replacing warnings.warn with a no-op function.
import warnings

def warn(*args, **kwargs):
    pass

warnings.warn = warn

In [None]:
import sys
import numpy as np 
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
import sklearn.tree as tree

<div id="about_dataset">
    <h2>About the dataset</h2>
    The aim is to build a model that suggests which drug might be appropriate for a future patient with the same illness. The features of this dataset are the Age, Sex, Blood Pressure (BP), Cholesterol, and sodium-to-potassium ratio (Na_to_K) of the patients, and the target is the drug that each patient responded to.
    <br>
</div>


<div id="downloading_data"> 
    <h2>Downloading the Data</h2>
    To download the data, we will use the pandas library to read it directly into a DataFrame from IBM Object Storage.
</div>


In [None]:
my_data = pd.read_csv('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-ML0101EN-SkillsNetwork/labs/Module%203/data/drug200.csv', delimiter=",")
my_data.head()

<div href="pre-processing">
    <h2>Pre-processing</h2>
</div>

In [None]:
# Feature matrix as a NumPy array; Sex, BP, and Cholesterol are still strings here.
X = my_data[['Age', 'Sex', 'BP', 'Cholesterol', 'Na_to_K']].values
X[0:5]

In [None]:
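from sklearn import preprocessing

# scikit-learn decision trees need numeric inputs, so encode the categorical columns.
# LabelEncoder assigns codes in sorted label order, not in the order passed to fit().
X[:,1] = preprocessing.LabelEncoder().fit(['F', 'M']).transform(X[:,1])                  # F=0, M=1
X[:,2] = preprocessing.LabelEncoder().fit(['LOW', 'NORMAL', 'HIGH']).transform(X[:,2])   # HIGH=0, LOW=1, NORMAL=2
X[:,3] = preprocessing.LabelEncoder().fit(['NORMAL', 'HIGH']).transform(X[:,3])          # HIGH=0, NORMAL=1
X[0:5]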
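
`LabelEncoder` assigns integer codes in sorted order of the class labels, so the order of the list passed to `fit` does not decide the codes. The sketch below reproduces that mapping in plain Python. `label_codes` is a hypothetical helper written only for illustration; it is not part of scikit-learn.

```python
def label_codes(labels):
    """Map each distinct label to its position in sorted order.

    Hypothetical helper: mirrors the sorted-label coding that LabelEncoder
    applies, without depending on scikit-learn.
    """
    return {label: code for code, label in enumerate(sorted(set(labels)))}


# Same label lists as in the cell above.
sex_codes = label_codes(['F', 'M'])
bp_codes = label_codes(['LOW', 'NORMAL', 'HIGH'])
chol_codes = label_codes(['NORMAL', 'HIGH'])

print(sex_codes)   # {'F': 0, 'M': 1}
print(bp_codes)    # {'HIGH': 0, 'LOW': 1, 'NORMAL': 2}
print(chol_codes)  # {'HIGH': 0, 'NORMAL': 1}
```

Note that the BP codes do not follow the natural ordering LOW < NORMAL < HIGH. A tree can still isolate each value with threshold splits, but an explicit mapping may be worth considering if that ordering matters for interpretation.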

In [None]:
y = my_data["Drug"]
y[0:5]

<hr>

<div id="setting_up_tree">
    <h2>Setting up the Decision Tree</h2>
</div>

In [None]:
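from sklearn.model_selection import train_test_split

# Hold out 30% of the rows for testing; random_state makes the split reproducible.
X_trainset, X_testset, y_trainset, y_testset = train_test_split(X, y, test_size=0.3, random_state=3)

# Split on information gain (entropy) and limit the tree to a depth of 4.
drugTree = DecisionTreeClassifier(criterion="entropy", max_depth=4)
drugTree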
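
With `criterion="entropy"`, the tree scores candidate splits by how much they reduce the entropy of the class labels (the information gain). The following is a minimal sketch of that calculation in plain Python. The helper names `entropy` and `information_gain` and the toy labels are assumptions made for illustration; scikit-learn's internal implementation is different.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Toy labels (made up): two patients per drug, then a split that separates them.
parent = ["drugA", "drugA", "drugB", "drugB"]
split = [["drugA", "drugA"], ["drugB", "drugB"]]

print(entropy(parent))                  # 1.0
print(information_gain(parent, split))  # 1.0
```

A split that leaves the classes as mixed as before has a gain of zero, so the tree prefers splits like the one above that produce purer child nodes.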

In [None]:
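# Train on the training set, then predict drugs for the held-out test set.
drugTree.fit(X_trainset, y_trainset)
predTree = drugTree.predict(X_testset)

# Compare the first few predictions with the actual labels.
print(predTree[0:5])
print(y_testset[0:5])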
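
Conceptually, `predict` walks each sample from the root of the fitted tree to a leaf, comparing one feature against a threshold at every internal node. The sketch below shows that traversal on a hand-written toy tree. The tree structure, thresholds, leaf labels, and the `predict_one` helper are all invented for illustration and are not taken from the fitted `drugTree`.

```python
# Hypothetical toy tree: thresholds and leaf labels are made up for illustration.
toy_tree = {
    "feature": "Na_to_K", "threshold": 15.0,
    "left": {
        "feature": "BP", "threshold": 0.5,
        "left": {"leaf": "drugA"},
        "right": {"leaf": "drugX"},
    },
    "right": {"leaf": "drugY"},
}


def predict_one(node, sample):
    """Follow threshold tests from the root until a leaf is reached."""
    while "leaf" not in node:
        # Values less than or equal to the threshold go to the left child.
        branch = "left" if sample[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]


# A made-up, already-encoded patient record.
patient = {"Age": 40, "Sex": 1, "BP": 0, "Cholesterol": 1, "Na_to_K": 12.3}
print(predict_one(toy_tree, patient))  # drugA
```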

<hr>

<div id="evaluation">
    <h2>Evaluation</h2>
</div>


In [None]:
from sklearn import metrics
import matplotlib.pyplot as plt
print("Decision Tree's Accuracy:", metrics.accuracy_score(y_testset, predTree))
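
Accuracy is the fraction of test samples whose predicted label matches the true label. As a rough sketch of what `metrics.accuracy_score` reports, the plain-Python version below computes it by hand and also counts each (true, predicted) pair, which shows where the errors fall. The helpers `accuracy` and `confusion_counts` and the toy labels are assumptions for illustration, not the lab's actual results.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


def confusion_counts(y_true, y_pred):
    """Count how often each (true, predicted) label pair occurs."""
    return Counter(zip(y_true, y_pred))


# Made-up labels, not output from the model above.
y_true = ["drugY", "drugX", "drugX", "drugA"]
y_pred = ["drugY", "drugX", "drugA", "drugA"]

print(accuracy(y_true, y_pred))  # 0.75
for (true_label, pred_label), n in sorted(confusion_counts(y_true, y_pred).items()):
    print(true_label, "->", pred_label, ":", n)
```

A single accuracy figure can hide errors concentrated in one class, so looking at the per-pair counts alongside it may be useful when the drug classes are imbalanced.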

<hr>
<div id="visualization">
    <h2>Visualization</h2>
</div>

In [None]:
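from sklearn.tree import export_graphviz

# Write the fitted tree to a Graphviz .dot file, with nodes colored by majority class.
export_graphviz(drugTree, out_file='tree.dot', filled=True, feature_names=['Age', 'Sex', 'BP', 'Cholesterol', 'Na_to_K'])

# Render the .dot file to a PNG; this needs the Graphviz `dot` command to be installed.
!dot -Tpng tree.dot -o tree.png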