# Building a Clinical Decision Support Tool with Decision Trees

Aim: Use a classification algorithm to build a model from historical patient data and the patients' responses to different medications. The trained decision tree is then used to predict the class of an unknown patient, that is, to find a suitable drug for a new patient.

About Dataset:

Imagine that you are a medical researcher compiling data for a study. You have collected data about a set of patients, all of whom suffered from the same illness. During their course of treatment, each patient responded to one of five medications: Drug A, Drug B, Drug C, Drug X, or Drug Y.

Part of your job is to build a model that finds which drug might be appropriate for a future patient with the same illness.

The features of this dataset are the Age, Sex, Blood Pressure (BP), Cholesterol, and sodium-to-potassium ratio (Na_to_K) of each patient, and the target is the drug that each patient responded to.


In [1]:
import numpy as np 
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

In [3]:
df = pd.read_csv(r"C:\Users\USER-PC\OneDrive\Data Analytics\Data Science_Machine Learning\Machine Learning (1)\drug200.csv")
df.head()

Unnamed: 0.1,Unnamed: 0,Age,Sex,BP,Cholesterol,Na_to_K,Drug
0,0,23,F,HIGH,HIGH,25.355,drugY
1,1,47,M,LOW,HIGH,13.093,drugC
2,2,47,M,LOW,HIGH,10.114,drugC
3,3,28,F,NORMAL,HIGH,7.798,drugX
4,4,61,F,LOW,HIGH,18.043,drugY
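The `Unnamed: 0` style columns in the preview are typically index columns that were written into the CSV when it was saved. They carry no information about the patient, so they are left out of the feature matrix later on. A minimal sketch of dropping them at load time, using a small in-memory CSV (two rows copied from the preview, not the real file) so that it runs on its own:

```python
import io

import pandas as pd

# In-memory stand-in for a CSV that was saved with its index column.
csv_text = """,Age,Sex,BP,Cholesterol,Na_to_K,Drug
0,23,F,HIGH,HIGH,25.355,drugY
1,47,M,LOW,HIGH,13.093,drugC
"""

raw = pd.read_csv(io.StringIO(csv_text))
# pandas names the header-less first column "Unnamed: 0".
print(raw.columns.tolist())

# Keep only the columns whose name does not start with "Unnamed".
clean = raw.loc[:, ~raw.columns.str.startswith("Unnamed")]
print(clean.columns.tolist())
```

Selecting the feature columns by name, as the next cells do, has the same effect for modeling purposes.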


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 7 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Unnamed: 0   200 non-null    int64  
 1   Age          200 non-null    int64  
 2   Sex          200 non-null    object 
 3   BP           200 non-null    object 
 4   Cholesterol  200 non-null    object 
 5   Na_to_K      200 non-null    float64
 6   Drug         200 non-null    object 
dtypes: float64(1), int64(2), object(4)
memory usage: 11.1+ KB


In [5]:
X = df[['Age', 'Sex', 'BP', 'Cholesterol', 'Na_to_K']].values
X[0:5]

array([[23, 'F', 'HIGH', 'HIGH', 25.355],
       [47, 'M', 'LOW', 'HIGH', 13.093],
       [47, 'M', 'LOW', 'HIGH', 10.114],
       [28, 'F', 'NORMAL', 'HIGH', 7.797999999999999],
       [61, 'F', 'LOW', 'HIGH', 18.043]], dtype=object)

In [7]:
X = np.asarray(df[['Age', 'Sex', 'BP', 'Cholesterol', 'Na_to_K']])
X[0:5]

array([[23, 'F', 'HIGH', 'HIGH', 25.355],
       [47, 'M', 'LOW', 'HIGH', 13.093],
       [47, 'M', 'LOW', 'HIGH', 10.114],
       [28, 'F', 'NORMAL', 'HIGH', 7.797999999999999],
       [61, 'F', 'LOW', 'HIGH', 18.043]], dtype=object)

In [8]:
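# Convert the categorical features (Sex, BP, Cholesterol) to numeric codes.
# Note: LabelEncoder assigns codes in sorted order of the class names,
# not in the order they are passed to fit().

from sklearn import preprocessing

# Sex: column 1
le_sex = preprocessing.LabelEncoder()
le_sex.fit(['F', 'M'])
X[:, 1] = le_sex.transform(X[:, 1])

# Blood pressure: column 2
le_BP = preprocessing.LabelEncoder()
le_BP.fit(['LOW', 'NORMAL', 'HIGH'])
X[:, 2] = le_BP.transform(X[:, 2])

# Cholesterol: column 3
le_Chol = preprocessing.LabelEncoder()
le_Chol.fit(['NORMAL', 'HIGH'])
X[:, 3] = le_Chol.transform(X[:, 3])

X[0:5]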


array([[23, 0, 0, 0, 25.355],
       [47, 1, 1, 0, 13.093],
       [47, 1, 1, 0, 10.114],
       [28, 0, 2, 0, 7.797999999999999],
       [61, 0, 1, 0, 18.043]], dtype=object)
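In the output above, BP is coded HIGH=0, LOW=1, NORMAL=2, because `LabelEncoder` sorts the class names rather than keeping the order given to `fit`. A decision tree can still split on such codes, but the numeric order does not match the clinical order. The sketch below contrasts this with `OrdinalEncoder`, which accepts an explicit category order. It is an alternative, not what the notebook does, and the five BP values are the ones from the first five rows.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

bp = np.array(["HIGH", "LOW", "LOW", "NORMAL", "LOW"])

# LabelEncoder: codes follow the sorted class names.
le = LabelEncoder().fit(["LOW", "NORMAL", "HIGH"])
print(le.classes_.tolist())    # ['HIGH', 'LOW', 'NORMAL']
print(le.transform(bp))        # [0 1 1 2 1]

# OrdinalEncoder: codes follow the order we state (LOW < NORMAL < HIGH).
oe = OrdinalEncoder(categories=[["LOW", "NORMAL", "HIGH"]])
bp_ordered = oe.fit_transform(bp.reshape(-1, 1)).ravel()
print(bp_ordered)              # [2. 0. 0. 1. 0.]
```

`OrdinalEncoder` expects a 2-D input, hence the `reshape(-1, 1)`.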

In [9]:
y = df['Drug']

In [16]:
# Split the data into training (70%) and test (30%) sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=3)


In [17]:

print('shape of X_trainset:', X_train.shape)
print('shape of y_trainset:', y_train.shape)
print('shape of X_testset:', X_test.shape)
print('shape of y_testset:', y_test.shape)

shape of X_trainset: (140, 5)
shape of y_trainset: (140,)
shape of X_testset: (60, 5)
shape of y_testset: (60,)
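With only 200 rows and five drug classes, a purely random split can leave a rare class under-represented in the test set. One option, not used in this notebook, is the `stratify` argument of `train_test_split`, which keeps the class proportions the same in both parts. A sketch on made-up data (two classes in an 80/20 ratio, purely for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 200 rows, two classes in an 80/20 ratio.
X_demo = np.arange(400).reshape(200, 2)
y_demo = np.array(["drugX"] * 160 + ["drugY"] * 40)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=3, stratify=y_demo
)

# The 60-row test set keeps the 80/20 class ratio.
print(X_tr.shape, X_te.shape)
print(np.unique(y_te, return_counts=True))
```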


Modeling

In [14]:
decision_tree = DecisionTreeClassifier(criterion="entropy", max_depth=4)
decision_tree

DecisionTreeClassifier(criterion='entropy', max_depth=4)
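With `criterion="entropy"`, each split is chosen to maximize the reduction in entropy of the class labels (the information gain). The sketch below works through that calculation by hand on a made-up set of eight labels. The helper names `entropy` and `information_gain` are illustrative and not part of scikit-learn.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


# Made-up node: four drugX and four drugY patients.
parent = ["drugX"] * 4 + ["drugY"] * 4

# A split that separates the two classes perfectly.
perfect = information_gain(parent, ["drugX"] * 4, ["drugY"] * 4)

# A split that leaves both children as mixed as the parent.
useless = information_gain(
    parent, ["drugX", "drugX", "drugY", "drugY"], ["drugX", "drugX", "drugY", "drugY"]
)

print(entropy(parent))  # 1.0
print(perfect)          # 1.0
print(useless)          # 0.0
```

`max_depth=4` caps how many such splits can be chained from the root to a leaf.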

In [22]:
# Fit the model on the training data
decision_tree.fit(X_train, y_train)

# Predict the drug for each patient in the test set
y_pred = decision_tree.predict(X_test)

# Compare the first five predictions with the true labels
print(y_pred[0:5])
print(y_test[0:5])

['drugY' 'drugX' 'drugX' 'drugX' 'drugX']
40     drugY
51     drugX
139    drugX
197    drugX
170    drugX
Name: Drug, dtype: object


Evaluation using y_test and y_pred

In [24]:
# Evaluate the predictions with accuracy
from sklearn.metrics import accuracy_score

# Fraction of test patients whose predicted drug matches the true drug
accuracy = accuracy_score(y_test, y_pred)

print("Accuracy: ", accuracy)


Accuracy:  0.9833333333333333
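This accuracy comes from a single 60-row test split, so it can shift noticeably with a different `random_state`. Cross-validation averages over several splits and gives a steadier estimate. The sketch below uses synthetic stand-in data from `make_classification` (an assumption for the sake of a self-contained example, not the drug dataset), so its scores say nothing about the model above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: 200 rows, 5 features, 3 classes.
X_demo, y_demo = make_classification(
    n_samples=200,
    n_features=5,
    n_informative=3,
    n_redundant=0,
    n_classes=3,
    random_state=0,
)

clf = DecisionTreeClassifier(criterion="entropy", max_depth=4, random_state=0)

# Five-fold cross-validation: one accuracy score per fold.
scores = cross_val_score(clf, X_demo, y_demo, cv=5)
print("mean accuracy:", scores.mean(), "std:", scores.std())
```

On the real data, the same call would take the encoded `X` and `y` in place of `X_demo` and `y_demo`.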


In [25]:

from sklearn.metrics import confusion_matrix


# Calculate confusion matrix
cnf_matrix = confusion_matrix(y_test, y_pred)

print(cnf_matrix)


[[ 7  0  0  0  0]
 [ 0  5  0  0  0]
 [ 0  0  5  0  0]
 [ 0  0  0 20  1]
 [ 0  0  0  0 22]]
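In scikit-learn's confusion matrix, rows are true classes and columns are predicted classes, so the single off-diagonal 1 is one patient from the fourth class predicted as the fifth. The sketch below recomputes accuracy, recall, and precision directly from the printed matrix. The class names attached to the rows are an assumption (sorted label order, drugA to drugY), since the matrix itself carries no labels.

```python
import numpy as np

# Confusion matrix copied from the output above (rows: true, columns: predicted).
cm = np.array([
    [7, 0, 0, 0, 0],
    [0, 5, 0, 0, 0],
    [0, 0, 5, 0, 0],
    [0, 0, 0, 20, 1],
    [0, 0, 0, 0, 22],
])

# Assumed label order; the matrix itself does not state it.
labels = ["drugA", "drugB", "drugC", "drugX", "drugY"]

# Accuracy: correct predictions (diagonal) over all predictions, 59 / 60.
accuracy = np.trace(cm) / cm.sum()

# Recall per class: diagonal over row totals (true counts).
recall = np.diag(cm) / cm.sum(axis=1)

# Precision per class: diagonal over column totals (predicted counts).
precision = np.diag(cm) / cm.sum(axis=0)

print("accuracy:", accuracy)
for name, r, p in zip(labels, recall, precision):
    print(f"{name}: recall={r:.3f} precision={p:.3f}")
```

The accuracy of 59/60 matches the 0.9833 reported by `accuracy_score`.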


In [None]:
AttributeError: 'numpy.ndarray' object has no attribute 'columns'
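An error like this is raised whenever `.columns` is accessed on a NumPy array. `X` was converted to an array earlier, so it no longer carries column names. Assuming the goal was to inspect the fitted tree with readable feature names, one way around it is to keep the names in a plain list and pass that list to `export_text`. The sketch below fits a tiny tree on the five encoded rows shown earlier, so the printed rules are illustrative only.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Feature names kept separately, because a NumPy array has no .columns.
feature_names = ["Age", "Sex", "BP", "Cholesterol", "Na_to_K"]

# The five encoded rows and labels shown earlier in the notebook.
X_demo = np.array([
    [23, 0, 0, 0, 25.355],
    [47, 1, 1, 0, 13.093],
    [47, 1, 1, 0, 10.114],
    [28, 0, 2, 0, 7.798],
    [61, 0, 1, 0, 18.043],
])
y_demo = np.array(["drugY", "drugC", "drugC", "drugX", "drugY"])

tree = DecisionTreeClassifier(criterion="entropy", max_depth=4, random_state=0)
tree.fit(X_demo, y_demo)

# Text rendering of the learned rules, labeled with the feature names.
rules = export_text(tree, feature_names=feature_names)
print(rules)
```

With the real model, `decision_tree` would take the place of `tree` and the same `feature_names` list applies.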