**Build an ANN model for drug classification**

This project analyzes the relationship between patients' medical parameters and the drug type recorded for them. The dataset consists of patient information:
age, sex, blood pressure level (BP), cholesterol level, and sodium-to-potassium ratio (Na_to_K), together with the drug type, which serves as the label. The goal is to
develop a model that accurately predicts the drug class for a given patient from these features.


Task 1 - Read the dataset and do data pre-processing  

In [2]:
#reading the dataset
import pandas as pd

df = pd.read_csv('drug200.csv')
df

     Age  Sex      BP  Cholesterol  Na_to_K   Drug
0     23    F    HIGH         HIGH   25.355  DrugY
1     47    M     LOW         HIGH   13.093  drugC
2     47    M     LOW         HIGH   10.114  drugC
3     28    F  NORMAL         HIGH    7.798  drugX
4     61    F     LOW         HIGH   18.043  DrugY
..   ...  ...     ...          ...      ...    ...
195   56    F     LOW         HIGH   11.567  drugC
196   16    M     LOW         HIGH   12.006  drugC
197   52    M  NORMAL         HIGH    9.894  drugX
198   23    M  NORMAL       NORMAL   14.020  drugX


In [71]:
# preprocessing the data
from tensorflow import keras
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import LabelEncoder

# separate the features from the target column
x = df.drop(columns=['Drug'])
y = df['Drug']

# one-hot encode the categorical features and the target;
# dtype=float keeps every column numeric so Keras can consume the frames directly
x = pd.get_dummies(x, columns=['Sex', 'BP', 'Cholesterol'], dtype=float)
y = pd.get_dummies(y, dtype=float)
print(x)
print(y)


# splitting the train and test data (60% train, 40% test)
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.4, random_state=5)



     Age  Na_to_K  Sex_0  Sex_1  BP_0  BP_1  BP_2  Cholesterol_0  \
0     23   25.355      1      0     1     0     0              1   
1     47   13.093      0      1     0     1     0              1   
2     47   10.114      0      1     0     1     0              1   
3     28    7.798      1      0     0     0     1              1   
4     61   18.043      1      0     0     1     0              1   
..   ...      ...    ...    ...   ...   ...   ...            ...   
195   56   11.567      1      0     0     1     0              1   
196   16   12.006      0      1     0     1     0              1   
197   52    9.894      0      1     0     0     1              1   
198   23   14.020      0      1     0     0     1              0   
199   40   11.349      1      0     0     1     0              0   

     Cholesterol_1  
0                0  
1                0  
2                0  
3                0  
4                0  
..             ...  
195              0  
196            
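
The preprocessing cell imports StandardScaler but never applies it, so Age and Na_to_K reach the network on their raw scales. Below is a minimal sketch of how scaling could be added, assuming the scaler is fitted on the training rows only. The five (Age, Na_to_K) pairs are the rows shown in the dataset preview, and the 4/1 split is purely illustrative, not the notebook's actual split.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# (Age, Na_to_K) for the first five patients shown in the dataset preview
numeric = np.array([
    [23, 25.355],
    [47, 13.093],
    [47, 10.114],
    [28, 7.798],
    [61, 18.043],
])

# illustrative split: first four rows as "train", last row as "test"
train_num, test_num = numeric[:4], numeric[4:]

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train_num)  # learn mean/std from the train rows only
test_scaled = scaler.transform(test_num)        # reuse the train statistics on the test row

print(train_scaled.mean(axis=0))  # approximately 0 for each column
print(train_scaled.std(axis=0))   # approximately 1 for each column
```

In the notebook, the same two calls would be applied to the Age and Na_to_K columns of xtrain and xtest; the one-hot columns can stay as they are.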

Task 2 - Build the ANN model

In [89]:
# creating ANN

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
# first hidden layer (input_dim=9 matches the 9 feature columns)
model.add(Dense(48,input_dim=9,activation='relu'))
# 3 more hidden layers
model.add(Dense(36,activation='relu'))
model.add(Dense(24,activation='relu'))
model.add(Dense(12,activation='relu'))
# output layer
model.add(Dense(5,activation='softmax'))

model.summary()

model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'])
model.fit(xtrain,ytrain,epochs=50,batch_size=6,validation_data=(xtest,ytest))

Model: "sequential_36"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_175 (Dense)           (None, 48)                480       
                                                                 
 dense_176 (Dense)           (None, 36)                1764      
                                                                 
 dense_177 (Dense)           (None, 24)                888       
                                                                 
 dense_178 (Dense)           (None, 12)                300       
                                                                 
 dense_179 (Dense)           (None, 5)                 65        
                                                                 
Total params: 3,497
Trainable params: 3,497
Non-trainable params: 0
_________________________________________________________________
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5

<keras.callbacks.History at 0x7fa46c7e4a00>
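
The Param # column in the summary can be checked by hand: a Dense layer with n_in inputs and n_out units holds n_in * n_out weights plus n_out biases, that is (n_in + 1) * n_out parameters. A short check using the layer sizes from the model above:

```python
# layer widths: 9 input features, then the five Dense layers from the model
sizes = [9, 48, 36, 24, 12, 5]

# each Dense layer holds n_in * n_out weights plus n_out biases
params = [(n_in + 1) * n_out for n_in, n_out in zip(sizes, sizes[1:])]

print(params)       # [480, 1764, 888, 300, 65]
print(sum(params))  # 3497
```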

Task 3 - Test the model with random data
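
One reading of this task is to score a single new patient. The sketch below only covers the encoding step, since a new row must end up with the same columns, in the same order, as the training features. It assumes the dummy columns are named after the category values (Sex_F, BP_LOW, and so on), which is what pd.get_dummies produces from the raw strings; the printed output earlier uses numeric suffixes instead, so the names would need adjusting to match. The patient values are hypothetical.

```python
import pandas as pd

# assumed training column order (hypothetical names, see the note above)
train_columns = [
    'Age', 'Na_to_K', 'Sex_F', 'Sex_M',
    'BP_HIGH', 'BP_LOW', 'BP_NORMAL',
    'Cholesterol_HIGH', 'Cholesterol_NORMAL',
]

# a made-up patient, not taken from the dataset
new_patient = pd.DataFrame([{
    'Age': 35, 'Sex': 'F', 'BP': 'LOW', 'Cholesterol': 'NORMAL', 'Na_to_K': 12.5,
}])

# a single row only yields dummies for the categories it contains ...
encoded = pd.get_dummies(new_patient, columns=['Sex', 'BP', 'Cholesterol'], dtype=float)

# ... so align it to the training layout, filling the absent categories with 0
encoded = encoded.reindex(columns=train_columns, fill_value=0.0).astype('float32')

print(encoded.shape)  # (1, 9)
```

The resulting frame could then be passed to model.predict in the same way as xtest.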

In [90]:
import numpy as np
from sklearn.metrics import accuracy_score

# xtest is the randomly selected hold-out split from train_test_split
ypred = model.predict(xtest)

# round each softmax probability to 0 or 1 to get a one-hot style prediction;
# rows where no class exceeds 0.5 become all zeros and are counted as wrong
y_pred_ann = np.round(ypred)

accuracy_score(ytest, y_pred_ann)



0.825
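
Rounding the softmax outputs scores a row as correct only when the winning class has a probability above 0.5. A common alternative, sketched here and not used in the notebook, is to take the most probable class with argmax. The probabilities and labels below are made-up numbers chosen to show the difference between the two approaches.

```python
import numpy as np

# hypothetical softmax outputs for three samples over five classes
probs = np.array([
    [0.40, 0.30, 0.10, 0.10, 0.10],  # no class reaches 0.5
    [0.05, 0.80, 0.05, 0.05, 0.05],
    [0.10, 0.10, 0.10, 0.10, 0.60],
])

# hypothetical one-hot ground truth for the same samples
true_onehot = np.array([
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
])

# rounding approach: a row counts only if every rounded entry matches
rounded = np.round(probs)
rounding_acc = np.mean(np.all(rounded == true_onehot, axis=1))

# argmax approach: compare the index of the most probable class
pred_idx = probs.argmax(axis=1)
true_idx = true_onehot.argmax(axis=1)
argmax_acc = np.mean(pred_idx == true_idx)

print(rounding_acc)  # 1 of 3 rows correct
print(argmax_acc)    # 2 of 3 rows correct
```

The first sample is predicted correctly under argmax but becomes an all-zero row under rounding, so the two accuracy figures can differ on the same predictions.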