First, we import the dependencies: ***numpy***, ***tensorflow***, ***matplotlib***, and the Keras layers, optimizer, and loss used later.

---



In [None]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
import tensorflow_datasets as tfds
from keras.applications.vgg16 import preprocess_input
from keras.regularizers import L2
from keras.layers import Dense, Input, Conv2D, MaxPool2D, Flatten, BatchNormalization, Dropout
from keras.optimizers import Adam
from keras.losses import BinaryCrossentropy

Then, using tensorflow_datasets, we can load the malaria dataset directly from its servers.

In [None]:
# split=['train'] returns a one-element list, so the data itself is dataset[0]
dataset, dataset_info = tfds.load('malaria', with_info=True, as_supervised=True, split=['train'])

Next, we divide the dataset into training, validation, and test sets for later use (80%, 10%, and 10% here). The split ratios can be adjusted inside the function.

In [None]:
def divide_dataset(dataset):
  train_split = 0.8
  val_split = 0.1
  test_split = 0.1
  size = len(dataset)
  train_dataset = dataset.take(int(train_split * size))
  temp = dataset.skip(int(train_split * size))
  validation_dataset = temp.take(int(val_split * size))
  test_dataset = temp.skip(int(val_split * size))
  return train_dataset, validation_dataset, test_dataset

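def preprocessing(image, label):
  # Cast to float first: preprocess_input subtracts channel means, which does not work on uint8 pixels
  image = tf.cast(image, tf.float32)
  image = preprocess_input(image)
  image = tf.image.resize(image, (224,224))
  return image, label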
train_dataset, val_dataset, test_dataset = divide_dataset(dataset[0])
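
As a quick sanity check on the split arithmetic, the sketch below reproduces the `take`/`skip` index calculation in plain Python. The helper name `split_sizes` is hypothetical, and the example size of 27,558 is an assumption about the TFDS malaria `train` split; substitute `len(dataset[0])` for the real value.

```python
def split_sizes(size, train_split=0.8, val_split=0.1):
    """Return (train, val, test) example counts for a take/skip split."""
    n_train = int(train_split * size)  # examples kept by the first take()
    n_val = int(val_split * size)      # examples kept by the second take()
    n_test = size - n_train - n_val    # whatever the final skip() leaves over
    return n_train, n_val, n_test

# Assumed dataset size, for illustration only.
print(split_sizes(27558))
```

Because `int()` truncates, the test set absorbs the rounding remainder, so the three subsets always add up to the full dataset.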

In [None]:
train_dataset = train_dataset.map(preprocessing)
val_dataset = val_dataset.map(preprocessing)
test_dataset = test_dataset.map(preprocessing)
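
To make the effect of `preprocess_input` concrete, here is a NumPy sketch of what the VGG16 preprocessing does in its 'caffe' mode, as we understand it: it flips the channels from RGB to BGR and subtracts the ImageNet per-channel means, without any rescaling. The function name `vgg16_style_preprocess` is hypothetical, and the code is an illustration rather than the Keras implementation.

```python
import numpy as np

# ImageNet channel means in BGR order, as used by 'caffe'-style preprocessing.
IMAGENET_BGR_MEANS = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def vgg16_style_preprocess(image_rgb):
    """Illustrative re-implementation: RGB -> BGR, then subtract channel means."""
    image = np.asarray(image_rgb, dtype=np.float32)  # work in float, not uint8
    image_bgr = image[..., ::-1]                     # reverse the channel axis
    return image_bgr - IMAGENET_BGR_MEANS

# A pure red pixel, for illustration only.
red = np.array([[[255, 0, 0]]], dtype=np.uint8)
print(vgg16_style_preprocess(red)[0, 0])
```

The output contains negative values, which is why the input has to be a float tensor and why preprocessed images are not suitable for plotting directly.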

For better learning, we reshuffle the training dataset at the start of every epoch, so the model does not see the examples in the same order each time. We set the batch size to 12.

In [None]:
train_Dataset = train_dataset.shuffle(buffer_size = 100, reshuffle_each_iteration= True).batch(12).prefetch(tf.data.AUTOTUNE)
val_Dataset = val_dataset.shuffle(buffer_size = 100, reshuffle_each_iteration= True).batch(12).prefetch(tf.data.AUTOTUNE)
test_Dataset = test_dataset.batch(1)
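
The `buffer_size` argument limits how far an example can move from its original position. The pure-Python sketch below simulates a shuffle buffer in a simplified way to show this; it is not the `tf.data` implementation, and `buffered_shuffle` is a hypothetical name.

```python
import random

def buffered_shuffle(items, buffer_size, seed=0):
    """Yield items in a locally shuffled order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)          # fill the buffer first
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]                # emit a random buffered element
        buffer[idx] = item               # and put the new element in its slot
    rng.shuffle(buffer)                  # drain what is left at the end
    yield from buffer

order = list(buffered_shuffle(range(1000), buffer_size=100))
print(order[:10])
```

In this simulation, an element can never be emitted more than `buffer_size - 1` positions earlier than where it started, so a buffer of 100 only mixes examples locally. A larger buffer mixes more thoroughly at the cost of memory.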

To visualize some of the training examples, we can use matplotlib to draw a grid of subplots.

In [None]:
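plt.figure(figsize = (10,5))
# Plot the raw images (the first 16 are also the first 16 training examples);
# the preprocessed values are no longer in a range that imshow can display
for i, (image, label) in enumerate(dataset[0].take(16)):
  plt.subplot(4,4,i+1)
  plt.imshow(image)
  plt.title(label.numpy())
  plt.axis('off')
plt.show()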

In our model, we use a single *Convolutional* layer with 32 filters and a kernel of shape (5,5), followed by max pooling, two *Dense* layers with L2 *Regularization* (lambda_ = 0.01), and two *Dropout* layers to reduce overfitting. The output layer uses a sigmoid activation because this is a **Binary Classification** problem.

In [None]:
model = tf.keras.Sequential([
    Input(shape = (224,224,3)),
    Conv2D(32, (5,5), activation = 'relu'),
    MaxPool2D(2,2),
    Flatten(),
    Dense(1028, activation = 'relu', kernel_regularizer = L2(0.01)),
    Dropout(0.2),
    Dense(1028, activation = 'relu', kernel_regularizer = L2(0.01)),
    Dropout(0.2),
    Dense(1, activation = 'sigmoid')
])
model.compile(loss = BinaryCrossentropy(), optimizer = Adam(), metrics = ['accuracy'])
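
To get a feel for the size of this network, the sketch below works out the layer output shapes and parameter counts by hand, assuming the Keras defaults of 'valid' padding and stride 1 for `Conv2D` and a stride equal to the pool size for `MaxPool2D`. The helper `conv_output_size` is a hypothetical name; `model.summary()` remains the authoritative source.

```python
def conv_output_size(size, kernel, stride=1):
    """Spatial output size of a 'valid' (no padding) convolution or pooling."""
    return (size - kernel) // stride + 1

channels, filters, units = 3, 32, 1028

conv_side = conv_output_size(224, 5)                   # after Conv2D(32, (5,5))
pool_side = conv_output_size(conv_side, 2, stride=2)   # after MaxPool2D(2,2)
flat = pool_side * pool_side * filters                 # after Flatten()

params = {
    "conv2d": 5 * 5 * channels * filters + filters,    # weights + biases
    "dense_1": flat * units + units,
    "dense_2": units * units + units,
    "output": units * 1 + 1,
}

print(conv_side, pool_side, flat)
for name, count in params.items():
    print(f"{name}: {count:,}")
print(f"total: {sum(params.values()):,}")
```

Nearly all of the parameters sit in the first *Dense* layer, because it is connected to every value of the flattened feature map.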

After training for 10 epochs, both the training and validation accuracy were above 89%, which suggests the model is neither overfitting nor underfitting noticeably. The model was converging.

In [None]:
model.fit(train_Dataset, validation_data = val_Dataset, epochs = 10)

In [None]:
model.evaluate(test_Dataset)
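
`model.evaluate` returns the loss followed by the compiled metrics, here the accuracy. As a rough illustration of what those two numbers measure, the NumPy sketch below computes binary cross-entropy and thresholded accuracy on a few made-up values. The function names and the data are hypothetical, and the loss reported by Keras may additionally include the L2 regularization penalties.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_prob = np.clip(np.asarray(y_prob, dtype=np.float64), eps, 1.0 - eps)
    losses = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(losses.mean())

def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    y_pred = (np.asarray(y_prob) > threshold).astype(int)
    return float((y_pred == np.asarray(y_true)).mean())

# Made-up labels and sigmoid outputs, for illustration only.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.1]
print(binary_crossentropy(y_true, y_prob), binary_accuracy(y_true, y_prob))
```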

Then we can make predictions on the test data, which the model has never seen before.

In [None]:
predictions = model.predict(test_Dataset)
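
Since the output layer is a sigmoid, each prediction is a probability for class 1 rather than a class label. A minimal sketch of turning such probabilities into labels, using made-up values in place of the real `predictions` array, is shown below. The helper `to_labels` is hypothetical, and the class order in `CLASS_NAMES` is an assumption that should be checked against `dataset_info.features['label'].names`.

```python
import numpy as np

# Made-up sigmoid outputs with the same (N, 1) shape that predict() returns.
probabilities = np.array([[0.03], [0.97], [0.48], [0.51]])

# Assumed label order; verify with dataset_info.features['label'].names.
CLASS_NAMES = ["parasitized", "uninfected"]

def to_labels(probs, threshold=0.5):
    """Flatten (N, 1) probabilities and threshold them into 0/1 labels."""
    return (np.asarray(probs).ravel() > threshold).astype(int)

labels = to_labels(probabilities)
names = [CLASS_NAMES[i] for i in labels]
print(labels, names)
```

The 0.5 threshold is the usual default for balanced classes; a different threshold may be preferable if missing an infected cell is costlier than a false alarm.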