# Tuberculosis Diagnosis using Transfer Learning
This notebook was written and executed by **Dr Raheel Siddiqi** on *05-10-2019*. It presents an experiment that classifies chest X-ray images as either 'NORMAL' or 'containing manifestations of Tuberculosis (TB)', i.e. a binary classification problem. Transfer learning is used to exploit the feature extractor (the convolutional base) of the pre-trained *VGG16* model. The dataset is the [**China Set - The Shenzhen set - Chest X-ray Database**](https://www.kaggle.com/kmader/pulmonary-chest-xray-abnormalities). The model is evaluated with 5-fold cross-validation.

In [1]:
import tensorflow as tf
from tensorflow.python import keras

print('Tensorflow Version: ', tf.__version__)
print('Keras Version: ', keras.__version__)

Tensorflow Version:  1.13.1
Keras Version:  2.2.4-tf


In [2]:
from tensorflow.python.keras.applications import VGG16
import os
import numpy as np
from tensorflow.python.keras.preprocessing.image import ImageDataGenerator
from tensorflow.python.keras import models
from tensorflow.python.keras import layers
from tensorflow.python.keras import optimizers

def get_model():
    # Pre-trained VGG16 convolutional base (ImageNet weights) without its classifier head
    conv_base = VGG16(weights='imagenet', include_top=False, input_shape=(100, 100, 3))
    model = models.Sequential()
    model.add(conv_base)
    # New classifier head: one hidden layer and a sigmoid output for NORMAL (0) vs TB (1)
    model.add(layers.Flatten())
    model.add(layers.Dense(512, activation='relu'))
    model.add(layers.Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer=optimizers.Adam(lr=1e-4), metrics=['accuracy'])
    return model   
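
`get_model()` leaves the VGG16 weights trainable, so training updates the whole network rather than only the new dense layers. If the intent is to use VGG16 strictly as a fixed feature extractor, a common variant freezes the convolutional base before compiling. The following is a minimal sketch under that assumption: `get_frozen_model` and its `weights` parameter are hypothetical names that do not appear in the original notebook, and the sketch uses the public `tensorflow.keras` import path.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16


def get_frozen_model(weights='imagenet'):
    """Hypothetical variant of get_model() with a frozen VGG16 base (uncompiled)."""
    conv_base = VGG16(weights=weights, include_top=False, input_shape=(100, 100, 3))
    # Freeze every VGG16 layer so that only the new dense head is updated in training.
    conv_base.trainable = False

    model = models.Sequential()
    model.add(conv_base)
    model.add(layers.Flatten())
    model.add(layers.Dense(512, activation='relu'))
    model.add(layers.Dense(1, activation='sigmoid'))
    return model


# weights=None skips the ImageNet download; it is used here only to inspect the structure.
frozen = get_frozen_model(weights=None)
# Only the two Dense layers (kernel + bias each) should remain trainable.
print(len(frozen.trainable_weights))
```

The freeze has to happen before `model.compile(...)` is called; the compile call itself can stay the same as in `get_model()`.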

In [3]:
image_height = 100
image_width = 100
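# Note: the cross-validation loop below passes batch_size=4 and epochs=10 directly
# to model.fit, so the two values defined here are not used there.
batch_size = 2
no_of_epochs  = 15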

In [4]:
import numpy as np
import matplotlib.pyplot as plt
import os
import cv2
from tqdm import tqdm
%matplotlib inline

DATADIR = "D:\\TensorFlow Programs\\TB Diagnosis\\ChinaSet_AllFiles\\ChinaSet_AllFiles\\ChinaSetImages"

CATEGORIES = ["NORMAL", "TB"]

data=[]

for category in CATEGORIES:  
    path = os.path.join(DATADIR,category)
    class_num=CATEGORIES.index(category)
    for img in tqdm(os.listdir(path)):  
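        try:
            img_array = cv2.imread(os.path.join(path,img))
            # cv2.resize expects the target size as (width, height)
            img_array = cv2.resize(img_array, (image_width, image_height))
            img_array = cv2.cvtColor(img_array, cv2.COLOR_BGR2RGB)
            # Scale pixel values to the range [0, 1]
            img_array = img_array.astype(np.float32)/255.
            data.append([img_array, class_num])
        except Exception as e:
            # Skip unreadable files, but report them instead of failing silently
            print('Skipped', img, ':', e)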

print(len(data))

100%|████████████████████████████████████████████████████████████████████████████████| 326/326 [00:35<00:00,  8.73it/s]
100%|████████████████████████████████████████████████████████████████████████████████| 336/336 [00:34<00:00,  7.56it/s]


662
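
The progress bars above report 326 files in `NORMAL` and 336 in `TB`, and 662 samples were loaded, so no file was skipped in this run. Because the loading loop tolerates unreadable files, it can still be useful to confirm the per-class counts explicitly. The helper below is a small sketch of such a check; `class_counts` is a hypothetical name, and the toy list merely stands in for the notebook's `data` list of `[image, label]` pairs.

```python
from collections import Counter


def class_counts(samples, categories):
    """Count how many samples carry each label, keyed by category name."""
    counts = Counter(label for _, label in samples)
    return {name: counts.get(idx, 0) for idx, name in enumerate(categories)}


# Toy stand-in for the notebook's `data` list (image arrays replaced by None).
toy = [(None, 0), (None, 1), (None, 1)]
print(class_counts(toy, ["NORMAL", "TB"]))  # {'NORMAL': 1, 'TB': 2}
```

In the notebook itself the call would be `class_counts(data, CATEGORIES)`.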


In [5]:
import random

random.shuffle(data)
for sample in data[:10]:
    print(sample[1])

0
1
1
0
0
1
1
0
1
1
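
`random.shuffle(data)` draws from the global random state, so the sample order, and therefore the contents of each fold, changes on every run. A minimal sketch of a reproducible shuffle follows; `shuffled_copy` and the seed value 42 are illustrative choices, not part of the original notebook.

```python
import random


def shuffled_copy(samples, seed=42):
    """Return a shuffled copy of samples; the same seed gives the same order."""
    rng = random.Random(seed)  # private generator, leaves the global state untouched
    shuffled = list(samples)
    rng.shuffle(shuffled)
    return shuffled


labels = [0, 1] * 5  # stand-in values, not the notebook's data
first = shuffled_copy(labels)
second = shuffled_copy(labels)
print(first == second)  # True
```

In the notebook this would replace the in-place shuffle with `data = shuffled_copy(data)`.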


In [6]:
X = []
y = []

for features,label in data:
    X.append(features)
    y.append(label)

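# Array layout is (samples, height, width, channels)
X = np.array(X).reshape(-1, image_height, image_width, 3)
# Keep the labels as a NumPy array so they can be sliced alongside X
y = np.array(y)
print(X.shape)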

(662, 100, 100, 3)


In [7]:
print(len(X))

662


In [8]:
print(X.shape)

(662, 100, 100, 3)


In [9]:
k=5
# Integer division: any remainder samples (len(X) % k) never land in a validation fold
num_validation_samples=len(X)//k
validation_scores=[]
for fold in range(k):
    # Hold out the fold-th contiguous block for validation
    validation_data=X[num_validation_samples*fold:num_validation_samples*(fold+1)]
    validation_labels=y[num_validation_samples*fold:num_validation_samples*(fold+1)]
    # Train on everything outside the held-out block
    if fold==0:
        training_data=X[num_validation_samples*(fold+1):]
        training_labels=y[num_validation_samples*(fold+1):]
    else:
        training_data=np.append(X[:num_validation_samples*fold], X[num_validation_samples*(fold+1):],axis=0)
        training_labels=np.append(y[:num_validation_samples*fold], y[num_validation_samples*(fold+1):],axis=0)
    # Build a fresh model for every fold so that no weights carry over between folds
    model=get_model()
    model.fit(training_data,training_labels,batch_size=4,epochs=10) # 10 epochs per model
    # evaluate returns [loss, accuracy]; keep the accuracy
    validation_score=model.evaluate(validation_data,validation_labels)
    validation_scores.append(validation_score[1])
print('Average Validation Score: ', np.average(validation_scores))

Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
Use tf.cast instead.
[Training log: Epoch 1/10 through Epoch 10/10, repeated for each of the 5 folds; per-epoch metrics are not preserved in this output]


Average Validation Score:  0.8333333
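
With 662 samples and `k=5`, `len(X)//k` is 132, so the five validation blocks cover indices 0 to 659 and the last two samples are never used for validation. One way to give every sample exactly one turn in a validation fold is to let NumPy split the index range, as in the sketch below. `kfold_indices` is a hypothetical helper, not code from the original notebook.

```python
import numpy as np


def kfold_indices(num_samples, k):
    """Yield (train_idx, val_idx) pairs; every sample is validated exactly once."""
    # array_split spreads the remainder over the first folds instead of dropping it.
    folds = np.array_split(np.arange(num_samples), k)
    for i, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, val_idx


sizes = [len(val_idx) for _, val_idx in kfold_indices(662, 5)]
print(sizes)  # [133, 133, 132, 132, 132]
```

Inside the loop, `X[train_idx]` and `X[val_idx]` would select the data, which requires `X` and `y` to be NumPy arrays rather than Python lists.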


In [10]:
print('Average Validation Score: ', np.average(validation_scores))

Average Validation Score:  0.8333333
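
The average alone hides how much the accuracy varies from fold to fold. A short sketch of reporting the spread alongside the mean is given below; `summarize_scores` is a hypothetical helper, and the example values are placeholders rather than results from this notebook.

```python
import numpy as np


def summarize_scores(scores):
    """Return the mean and the sample standard deviation of per-fold scores."""
    scores = np.asarray(scores, dtype=float)
    return scores.mean(), scores.std(ddof=1)


example_scores = [0.80, 0.85, 0.90]  # placeholder values, NOT results from this notebook
mean, std = summarize_scores(example_scores)
print('Mean: %.3f, Std: %.3f' % (mean, std))
```

In the notebook the call would be `summarize_scores(validation_scores)`.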
