# Drug Classification using Decision Tree Algorithm
This project predicts the appropriate medication for a patient from their medical attributes. [The dataset](https://www.kaggle.com/code/chikonzeroselemani/decisiontree-randomforest#notebook-container) covers patients who all suffered from the same illness and each responded to one of five drugs: Drug A, Drug B, Drug C, Drug X, or Drug Y.

The goal is to build a classification model using a *Decision Tree*, which can then predict the most suitable drug for a new patient with similar medical characteristics.

In [1]:
# Import Libraries
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    classification_report,
)

import matplotlib.pyplot as plt
from sklearn import tree

In [2]:
df = pd.read_csv("data/drug200.csv")
df.head()

   Age Sex      BP Cholesterol  Na_to_K   Drug
0   23   F    HIGH        HIGH   25.355  drugY
1   47   M     LOW        HIGH   13.093  drugC
2   47   M     LOW        HIGH   10.114  drugC
3   28   F  NORMAL        HIGH    7.798  drugX
4   61   F     LOW        HIGH   18.043  drugY


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 6 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Age          200 non-null    int64  
 1   Sex          200 non-null    object 
 2   BP           200 non-null    object 
 3   Cholesterol  200 non-null    object 
 4   Na_to_K      200 non-null    float64
 5   Drug         200 non-null    object 
dtypes: float64(1), int64(1), object(4)
memory usage: 9.5+ KB


In [4]:
df['Drug'].unique()

array(['drugY', 'drugC', 'drugX', 'drugA', 'drugB'], dtype=object)
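Beyond listing the labels, it can help to see how many rows fall into each class before training. The sketch below uses a toy frame holding only the five `Drug` values shown by `df.head()`; the name `sample` is illustrative, and on the real data the same calls would be made on `df['Drug']`.

```python
import pandas as pd

# Toy frame built from the five rows shown by df.head(); the real df has 200 rows.
sample = pd.DataFrame({"Drug": ["drugY", "drugC", "drugC", "drugX", "drugY"]})

# Count the rows per class, then express the counts as a share of all rows.
counts = sample["Drug"].value_counts()
shares = sample["Drug"].value_counts(normalize=True)

print(counts)
print(shares.round(2))
```

A strongly skewed distribution would suggest looking at per-class metrics rather than accuracy alone.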

In [5]:
# Initialize label encoder
encoder = LabelEncoder()

In [6]:
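# LabelEncoder assigns integer codes in sorted order (e.g. HIGH=0, LOW=1, NORMAL=2 for BP).
df['BP'] = encoder.fit_transform(df['BP'])
df['Sex'] = encoder.fit_transform(df['Sex'])
df['Cholesterol'] = encoder.fit_transform(df['Cholesterol'])
# Na_to_K is already numeric; encoding it replaces each ratio with its position
# among the sorted unique values, so the model trains on these codes, not the raw ratios.
df['Na_to_K'] = encoder.fit_transform(df['Na_to_K'])
# The same encoder is refit for every column, so it ends up holding only the Drug classes.
df['Drug'] = encoder.fit_transform(df['Drug'])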

df.head()

   Age  Sex  BP  Cholesterol  Na_to_K  Drug
0   23    0   0            0      167     4
1   47    1   1            0       89     2
2   47    1   1            0       43     2
3   28    0   2            0       10     3
4   61    0   1            0      133     4
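An alternative to refitting one shared encoder is to keep a separate fitted encoder per text column, so each mapping stays available at inference time, and to leave the numeric columns untouched. This is a minimal sketch, not the notebook's method: the `toy` frame copies the five rows shown earlier, and the `encoders` dict is a hypothetical name. Because the toy frame contains only three of the five drugs, its `Drug` codes differ from those of the full dataset.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame copied from the five rows shown by df.head() above.
toy = pd.DataFrame({
    "Age": [23, 47, 47, 28, 61],
    "Sex": ["F", "M", "M", "F", "F"],
    "BP": ["HIGH", "LOW", "LOW", "NORMAL", "LOW"],
    "Cholesterol": ["HIGH", "HIGH", "HIGH", "HIGH", "HIGH"],
    "Na_to_K": [25.355, 13.093, 10.114, 7.798, 18.043],
    "Drug": ["drugY", "drugC", "drugC", "drugX", "drugY"],
})

# One fitted encoder per text column, kept in a dict for later reuse.
encoders = {}
for col in ["Sex", "BP", "Cholesterol", "Drug"]:
    encoders[col] = LabelEncoder()
    toy[col] = encoders[col].fit_transform(toy[col])

# Age and Na_to_K are already numeric and are left as they are.
print(toy)

# Each encoder can still translate between labels and codes.
print(encoders["BP"].classes_)
print(encoders["BP"].transform(["NORMAL"]))
```

With this layout a new patient's raw values (for example `"NORMAL"` blood pressure) can be encoded with the same mapping that was used for training.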


In [7]:
# Separate features and target
x = df.drop('Drug', axis=1)
y = df['Drug']

In [8]:
# Split data: hold out 20% of the rows as a test set
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
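When classes are unevenly represented, a plain random split can leave a rare class under-represented in the test set. `train_test_split` accepts a `stratify` argument that preserves class proportions. The sketch below uses synthetic stand-in data with made-up class sizes (not the real drug counts); all names ending in `_demo`, `_tr`, and `_te` are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 200 rows, 5 features, and made-up imbalanced labels.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = np.repeat([0, 1, 2, 3, 4], [20, 20, 20, 50, 90])

# stratify=y_demo keeps each class's share equal in the train and test sets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)

# Per-class row counts in the 40-row test set.
print(np.bincount(y_te))
```

In the notebook the equivalent change would be passing `stratify=y` to the existing call.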

In [9]:
# Train Model
model = DecisionTreeClassifier(random_state=0)
model.fit(x_train, y_train)

DecisionTreeClassifier(random_state=0)
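Once a tree is fitted, its learned rules can be printed as text with `sklearn.tree.export_text`. The sketch below fits a small tree on the bundled iris data purely so that it runs on its own; `demo_model` and the `max_depth=2` limit are illustrative choices, not part of the notebook. For the model above, the analogous call would pass `model` and `list(x.columns)`.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Stand-in data: the bundled iris dataset, used only to make the snippet runnable.
iris = load_iris()
demo_model = DecisionTreeClassifier(random_state=0, max_depth=2)
demo_model.fit(iris.data, iris.target)

# export_text renders the fitted tree as indented threshold rules.
rules = export_text(demo_model, feature_names=list(iris.feature_names))
print(rules)
```

Reading the rules is a quick way to check which features the tree actually splits on.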


In [10]:
# Evaluate Model Accuracy
y_pred = model.predict(x_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

Accuracy: 1.00
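A single accuracy figure on a 40-row test set says little about which classes are confused or how much the score depends on the particular split. The imported `confusion_matrix` and `classification_report` give a per-class view, and `cross_val_score` averages over several splits. This is a sketch on the bundled iris data as a stand-in; the names `clf`, `pred`, `cm`, and `scores` are illustrative, and no results for the drug dataset are implied.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (iris) so the snippet is self-contained.
X_demo, y_demo = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

clf = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_te, pred)
print(cm)

# Precision, recall, and F1 for each class.
print(classification_report(y_te, pred))

# 5-fold cross-validation: one accuracy score per fold.
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X_demo, y_demo, cv=5)
print(scores.mean())
```

For the notebook's model, the first two calls would take `y_test` and `y_pred`, and the cross-validation would take `x` and `y`.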


## Creating a New Sample for Inference

In [11]:
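# values = [[Age, Sex, BP, Cholesterol, Na_to_K]], using the encoded values from In [6]:
# Sex 1 = M, BP 2 = NORMAL, Cholesterol 0 = HIGH, Na_to_K 135 = an encoded code, not a raw ratio
values = [[33, 1, 2, 0, 135]]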

columns = x.columns
new_df = pd.DataFrame(values, columns=columns)

In [12]:
result = model.predict(new_df)

# Map the encoded class back to a drug name (codes follow the sorted label order).
drug_names = {0: "DrugA", 1: "DrugB", 2: "DrugC", 3: "DrugX", 4: "DrugY"}
print(drug_names[int(result[0])])

DrugY
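The integer prediction can also be decoded with the encoder itself rather than a hand-written mapping, since `LabelEncoder.inverse_transform` converts codes back to the original labels. The sketch below fits a fresh encoder (hypothetical name `drug_encoder`) on the five labels listed earlier and uses a hard-coded array as a stand-in for `model.predict(new_df)`.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Fit an encoder on the five drug labels returned by df['Drug'].unique().
drug_encoder = LabelEncoder()
drug_encoder.fit(["drugY", "drugC", "drugX", "drugA", "drugB"])

# Stand-in for model.predict(new_df), which returned class 4 in the cell above.
predicted = np.array([4])

# inverse_transform maps integer codes back to the original label strings.
label = drug_encoder.inverse_transform(predicted)[0]
print(label)
```

This keeps the code-to-name mapping in one place, so it cannot drift out of sync with the encoding used for training.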
