# Image Classification with CNN for Malaria Data

In this project, we build a classification model from malaria cell images labeled as infected or uninfected, obtained from Kaggle. We use the Keras deep learning library. Given a new cell image, the trained model predicts whether the cell is infected or uninfected.

<img src='https://eu.biogents.com/wp-content/uploads/malaria-transmission-cycle-without-headline-72dpi.jpg' >

<a href='https://www.kaggle.com/datasets/iarunava/cell-images-for-detecting-malaria/data'>Click to reach the malaria data</a>

### Import Dataset

In [1]:
import cv2
import pandas as pd
import os

import matplotlib.pyplot as plt

import numpy as np

In [2]:
labels=['Infected','Uninfected']
img_path='archive/'

In [3]:
img_list=[]
label_list=[]
for label in labels:
    for img_file in os.listdir(img_path+label):
        img_list.append(img_path+label+'/'+img_file)
        label_list.append(label) 
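
The loop above assumes that every entry in each class folder is an image. As a hedged alternative, the sketch below collects paths with `pathlib` and keeps only files with common image extensions, so stray non-image files do not end up in the DataFrame. The helper name `collect_image_paths` and the extension set are assumptions for illustration, not part of the original notebook.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match the files on disk.
IMAGE_EXTENSIONS = {'.png', '.jpg', '.jpeg'}

def collect_image_paths(root, labels):
    """Hypothetical helper: return parallel lists of image paths and labels under root/<label>/."""
    img_list, label_list = [], []
    for label in labels:
        # sorted() makes the order reproducible across runs and platforms
        for path in sorted((Path(root) / label).iterdir()):
            if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
                img_list.append(str(path))
                label_list.append(label)
    return img_list, label_list
```

With the notebook's layout, this could be called as `img_list, label_list = collect_image_paths('archive', ['Infected', 'Uninfected'])`.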

In [4]:
df=pd.DataFrame({'img':img_list,'label':label_list})

In [5]:
df.head()

| | img | label |
|---|---|---|
| 0 | archive/Infected/C100P61ThinF_IMG_20150918_144... | Infected |
| 1 | archive/Infected/C100P61ThinF_IMG_20150918_144... | Infected |
| 2 | archive/Infected/C100P61ThinF_IMG_20150918_144... | Infected |
| 3 | archive/Infected/C100P61ThinF_IMG_20150918_144... | Infected |
| 4 | archive/Infected/C100P61ThinF_IMG_20150918_144... | Infected |


In [6]:
df.tail()

| | img | label |
|---|---|---|
| 27553 | archive/Uninfected/C99P60ThinF_IMG_20150918_14... | Uninfected |
| 27554 | archive/Uninfected/C99P60ThinF_IMG_20150918_14... | Uninfected |
| 27555 | archive/Uninfected/C99P60ThinF_IMG_20150918_14... | Uninfected |
| 27556 | archive/Uninfected/C99P60ThinF_IMG_20150918_14... | Uninfected |
| 27557 | archive/Uninfected/C99P60ThinF_IMG_20150918_14... | Uninfected |


In [7]:
d={'Infected':1,'Uninfected':0}

In [8]:
df['encode_label']=df['label'].map(d)
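
`Series.map` with a dictionary returns NaN for any label that is missing from the dictionary, which would silently corrupt the targets. A minimal sketch of a sanity check follows, using a small toy DataFrame rather than the real data (the toy rows are invented for illustration):

```python
import pandas as pd

# Toy stand-in for df; the notebook builds the real df from the image folders.
toy = pd.DataFrame({'label': ['Infected', 'Uninfected', 'Infected']})
d = {'Infected': 1, 'Uninfected': 0}

toy['encode_label'] = toy['label'].map(d)

# map() yields NaN for labels missing from d, so fail early if any appear
assert toy['encode_label'].notna().all(), 'found labels missing from the mapping'

# Class balance: number of rows per encoded label
counts = toy['encode_label'].value_counts().to_dict()
print(counts)
```

Running the same two checks on `df` would confirm that every row was encoded and show how many images each class has.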

In [9]:
df.sample(5)

| | img | label | encode_label |
|---|---|---|---|
| 25038 | archive/Uninfected/C65P26N_ThinF_IMG_20150818_... | Uninfected | 0 |
| 464 | archive/Infected/C105P66ThinF_IMG_20150924_095... | Infected | 1 |
| 9916 | archive/Infected/C59P20thinF_IMG_20150803_1134... | Infected | 1 |
| 4146 | archive/Infected/C140P101ThinF_IMG_20151005_21... | Infected | 1 |
| 22748 | archive/Uninfected/C238NThinF_IMG_20151207_115... | Uninfected | 0 |


### Deep Learning

We will define the input data x (the images) and the target y (the encoded labels).

In [10]:
x=[]
for img in df['img']:
    img=cv2.imread(img)
    img=cv2.resize(img,(32,32)) #We resized the image to 32x32 pixels.
    img=img/255.0 #normalize the data
    x.append(img)

In [11]:
x=np.array(x)
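
Dividing by 255.0 maps 8-bit pixel values from [0, 255] into [0, 1], but it also promotes each image to float64. For 27,558 images of shape 32×32×3, that is roughly 677 MB, and float32 would halve it. The sketch below illustrates this with a synthetic image; the helper `normalize` and the choice of float32 are assumptions for illustration, not what the notebook does.

```python
import numpy as np

def normalize(img, dtype=np.float32):
    """Hypothetical helper: scale an 8-bit image to [0, 1] in the given float dtype."""
    return img.astype(dtype) / dtype(255.0)

# Synthetic 32x32 three-channel image standing in for the output of cv2.resize
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

as_float64 = img / 255.0       # what the notebook does
as_float32 = normalize(img)    # same values up to rounding, half the memory

n_images = 27558               # number of rows in df
bytes_float64 = n_images * as_float64.nbytes
bytes_float32 = n_images * as_float32.nbytes
print(bytes_float64 / 1e6, bytes_float32 / 1e6)  # size of x in MB for each dtype
```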

In [12]:
y=df['encode_label']

Let's import train_test_split and hold out 20% of the data as test data.

In [13]:
from sklearn.model_selection import train_test_split

In [14]:
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.2,random_state=42)

CNN - Convolutional Neural Networks

In [15]:
from keras.models import Sequential
from keras.layers import Conv2D, Dense, Flatten, Input, MaxPooling2D,Dropout,BatchNormalization,Reshape

In [16]:
model = Sequential()
model.add(Input(shape=(32,32,3)))
model.add(Conv2D(32,(3,3),activation='relu'))
model.add(MaxPooling2D((2,2)))
model.add(Conv2D(64,(3,3),activation='relu'))
model.add(MaxPooling2D((2,2)))
model.add(Flatten())
model.add(Dense(128))
model.add(Dense(2,activation='softmax'))
model.compile(optimizer='adam',loss='sparse_categorical_crossentropy',metrics=['accuracy'])
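
To see how the layers above shrink a 32×32×3 input, the following sketch reproduces the shape and parameter arithmetic in plain Python. It assumes the Keras defaults that apply to the cell above (`padding='valid'` and stride 1 for `Conv2D`, and a pooling stride equal to the pool size); the helper names are made up for illustration.

```python
def conv_out(size, kernel):
    """Output side length of a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool):
    """Output side length of max pooling with stride equal to the pool size."""
    return size // pool

def conv_params(kernel, in_ch, out_ch):
    """Weights plus biases of a Conv2D layer."""
    return kernel * kernel * in_ch * out_ch + out_ch

def dense_params(in_units, out_units):
    """Weights plus biases of a Dense layer."""
    return in_units * out_units + out_units

side = 32
side = pool_out(conv_out(side, 3), 2)   # Conv2D(32) -> 30, MaxPooling2D -> 15
side = pool_out(conv_out(side, 3), 2)   # Conv2D(64) -> 13, MaxPooling2D -> 6
flat = side * side * 64                 # Flatten: 6 * 6 * 64 = 2304

total = (conv_params(3, 3, 32)          # 896
         + conv_params(3, 32, 64)       # 18,496
         + dense_params(flat, 128)      # 295,040
         + dense_params(128, 2))        # 258
print(flat, total)                      # 2304 314690
```

Under those assumptions, the same figures should appear in the output of `model.summary()`.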

In [17]:
history = model.fit(x_train,y_train,epochs=10,validation_data=(x_test,y_test),verbose=1)

Epoch 1/10
689/689 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - accuracy: 0.6142 - loss: 0.6523 - val_accuracy: 0.8500 - val_loss: 0.3247
Epoch 2/10
689/689 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.9142 - loss: 0.2306 - val_accuracy: 0.9321 - val_loss: 0.1803
Epoch 3/10
689/689 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.9446 - loss: 0.1708 - val_accuracy: 0.9390 - val_loss: 0.1686
Epoch 4/10
689/689 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.9483 - loss: 0.1562 - val_accuracy: 0.9463 - val_loss: 0.1590
Epoch 5/10
689/689 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.9522 - loss: 0.1409 - val_accuracy: 0.9447 - val_loss: 0.1659
Epoch 6/10
689/689 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.9537 - loss: 0.1375 - val_accuracy: 0.9136 - val_loss: 0.2453
Epoch 7/10
689/689

In [18]:
model.save('my__model.h5')



We trained our model and achieved a 96% accuracy score. We also saved the model to a file, so it can be reloaded and reused in other applications without retraining.
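
The `689/689` counter in the training log is the number of batches per epoch. A quick check of where it comes from, assuming `model.fit` used its default `batch_size` of 32 (the notebook does not set it) and that `train_test_split` rounds the test share up:

```python
import math

n_samples = 27558          # rows in df (the index runs from 0 to 27557)
test_size = 0.2
batch_size = 32            # Keras default when batch_size is not passed to fit()

n_test = math.ceil(test_size * n_samples)   # 5512 images held out
n_train = n_samples - n_test                # 22046 images used for training
steps_per_epoch = math.ceil(n_train / batch_size)
print(n_train, n_test, steps_per_epoch)     # 22046 5512 689
```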

### Transfer Learning

Above, we trained a CNN from scratch. We can also reach our goal by taking a pre-trained model and adapting it to our own data. ResNet50 and VGG16 are two widely used pre-trained models for image processing tasks. Here, we will use the VGG16 model.

In [19]:
from keras.models import Sequential
from keras.layers import Conv2D, Dense, Flatten, Input, MaxPooling2D,Dropout,BatchNormalization,Reshape
from tensorflow.keras.applications import VGG16,ResNet50
from tensorflow.keras.preprocessing.image import ImageDataGenerator

We will load our data again, this time with ImageDataGenerator, which reads the images from the directory in batches and splits them into training and validation subsets.

In [20]:
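data_dir='archive'
img_width,img_height=32,32

# One generator serves both subsets: it rescales pixels to [0, 1] and reserves 20% for validation.
train_datagen=ImageDataGenerator(rescale=1/255, validation_split=.20)

# target_size is given as (height, width).
# Class indices follow alphanumeric folder order, so here Infected=0 and Uninfected=1
# (the reverse of the mapping d used for the first model).
train_datagenerator=train_datagen.flow_from_directory(directory=data_dir,target_size=(img_height,img_width),
                                class_mode='binary', subset='training')

# The validation subset must come from the generator that defines validation_split.
test_datagenerator=train_datagen.flow_from_directory(directory=data_dir,target_size=(img_height,img_width),
                                class_mode='binary', subset='validation')

# VGG16 convolutional base with ImageNet weights, without its original classifier head
base_model=VGG16(weights='imagenet', input_shape=(img_height,img_width,3),include_top=False)
model=Sequential()

model.add(base_model)
# Freeze the pre-trained layers so that only the new head is trained
for layer in base_model.layers:
    layer.trainable=False

model.add(Flatten())
model.add(Dense(1024,activation='relu'))
model.add(Dense(1,activation='sigmoid'))

model.compile(optimizer='adam',loss='binary_crossentropy',metrics=['accuracy'])

model.fit(train_datagenerator,epochs=10,validation_data=test_datagenerator)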

Found 22048 images belonging to 2 classes.
Found 5510 images belonging to 2 classes.
Epoch 1/10
689/689 ━━━━━━━━━━━━━━━━━━━━ 44s 63ms/step - accuracy: 0.8225 - loss: 0.3873 - val_accuracy: 0.8641 - val_loss: 0.3129
Epoch 2/10
689/689 ━━━━━━━━━━━━━━━━━━━━ 49s 72ms/step - accuracy: 0.8699 - loss: 0.3034 - val_accuracy: 0.8432 - val_loss: 0.3571
Epoch 3/10
689/689 ━━━━━━━━━━━━━━━━━━━━ 49s 72ms/step - accuracy: 0.8782 - loss: 0.2871 - val_accuracy: 0.8817 - val_loss: 0.2900
Epoch 4/10
689/689 ━━━━━━━━━━━━━━━━━━━━ 49s 72ms/step - accuracy: 0.8743 - loss: 0.2881 - val_accuracy: 0.8599 - val_loss: 0.3232
Epoch 5/10
689/689 ━━━━━━━━━━━━━━━━━━━━ 50s 72ms/step - accuracy: 0.8816 - loss: 0.2743 - val_accuracy: 0.8813 - val_loss: 0.2895
Epoch 6/10
689/689 ━━━━━━━━━━━━━━━━━━━━ 49s 72ms/step - accuracy: 0.8862 - loss: 0.2672 - val_accuracy: 0.8786 - val_loss: 0.2814
Epoch 7/10
689/689

<keras.src.callbacks.history.History at 0x1e32997eab0>
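
The two models in this notebook use different output layers: the first ends in `Dense(2, activation='softmax')` with `sparse_categorical_crossentropy`, the second in `Dense(1, activation='sigmoid')` with `binary_crossentropy`. For two classes these are equivalent parameterizations, because the softmax probability of class 1 equals the sigmoid of the difference between the two logits. A small numeric check with made-up logits:

```python
import math

def softmax2(z0, z1):
    """Softmax over two logits, returned as (p0, p1)."""
    m = max(z0, z1)                      # subtract the max for numerical stability
    e0, e1 = math.exp(z0 - m), math.exp(z1 - m)
    return e0 / (e0 + e1), e1 / (e0 + e1)

def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

z0, z1 = 0.3, 1.7                        # arbitrary example logits
p0, p1 = softmax2(z0, z1)
print(p1, sigmoid(z1 - z0))              # both give the probability of class 1
```

The choice between the two heads is therefore mostly a matter of convention; what must match is the pairing of output layer, loss, and label format.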

### Conclusion

We collected the file paths of the malaria cell images obtained from Kaggle into a DataFrame with the help of the pandas and os libraries. Then, we read all the images one by one with the OpenCV library, resized them to 32x32 pixels, and normalized them for easier processing. Afterward, we trained a CNN with Keras and achieved a 96% accuracy score. As a second step, we took the pre-trained VGG16 model, froze its layers, and trained a new classification head on our data to make it suitable for our purpose. Both methods can be used for image classification, but since training a model from scratch is a time-consuming process, a common approach is to start from a pre-trained model and apply transfer learning.