# Recognition of abnormality from Breast Thermograms Using ConvXGB

## Table of Contents
- [1 - Packages](#1)
- [2 - Pre - Processing ( Anisotropic Diffusion Filtering)](#2)
- [3 - Load the Dataset](#3)
- [4 - Test and Train Data](#4)
- [5 - CNN Baseline Model](#5)
- [6 - Json File ( For Storing Weights)](#6)
- [7 - Loading CNN Model](#7)
- [8 - XGBoost](#8)
- [9 - Evaluation Metric](#9)

<a name='1'></a>
# 1 - Packages

In [1]:
import numpy as np
import os
import pickle
import keras
from scipy import stats
import tensorflow as tf
import xgboost as xgb
from datetime import datetime
from sklearn.metrics import accuracy_score
from keras.models import Sequential
from keras.utils import np_utils
from keras.preprocessing.image import ImageDataGenerator
from keras.layers import Dense, Activation, Flatten, Dropout, BatchNormalization
from keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras import optimizers
from keras import regularizers
from keras.callbacks import LearningRateScheduler
import cv2
import warnings
from keras.models import model_from_json

<a name='2'></a>
# 2- Pre - Processing ( Anisotropic Diffusion Filtering)

In [4]:
def anisodiff(img,niter=4,kappa=50,gamma=0.1,step=(1.,1.),option=1,ploton=False):
   
    # ...you could always diffuse each color channel independently if you
    # really want
    img = img[150:440, 50:550]
    if img.ndim == 3:
        warnings.warn("Only grayscale images allowed, converting to 2D matrix")
        img = img.mean(2)

    # initialize output array
    img = img.astype('float32')
    imgout = img.copy()

    # initialize some internal variables
    deltaS = np.zeros_like(imgout)
    deltaE = deltaS.copy()
    NS = deltaS.copy()
    EW = deltaS.copy()
    gS = np.ones_like(imgout)
    gE = gS.copy()

    # create the plot figure, if requested
    if ploton:
        import pylab as pl
        from time import sleep

        fig = pl.figure(figsize=(20,5.5),num="Anisotropic diffusion")
        ax1,ax2 = fig.add_subplot(1,2,1),fig.add_subplot(1,2,2)

        ax1.imshow(img,interpolation='nearest')
        ih = ax2.imshow(imgout,interpolation='nearest',animated=True)
        ax1.set_title("Original image")
        ax2.set_title("Iteration 0")

        fig.canvas.draw()

    for ii in range(niter):

        # calculate the diffs
        deltaS[:-1,: ] = np.diff(imgout,axis=0)
        deltaE[: ,:-1] = np.diff(imgout,axis=1)

        # conduction gradients (only need to compute one per dim!)
        if option == 1:
            gS = np.exp(-(deltaS/kappa)**2.)/step[0]
            gE = np.exp(-(deltaE/kappa)**2.)/step[1]
        elif option == 2:
            gS = 1./(1.+(deltaS/kappa)**2.)/step[0]
            gE = 1./(1.+(deltaE/kappa)**2.)/step[1]

        # update matrices
        E = gE*deltaE
        S = gS*deltaS

        # subtract a copy that has been shifted 'North/West' by one
        # pixel. don't as questions. just do it. trust me.
        NS[:] = S
        EW[:] = E
        NS[1:,:] -= S[:-1,:]
        EW[:,1:] -= E[:,:-1]

        # update the image
        imgout += gamma*(NS+EW)

        if ploton:
            iterstring = "Iteration %i" %(ii+1)
            ih.set_data(imgout)
            ax2.set_title(iterstring)
            fig.canvas.draw()
            # sleep(0.01)

    return imgout

<a name='3'></a>
# 3 - Load the Dataset

The Data Set Consists of total 117 Breast Thermal Images From visual DMI - IR Data Set(https://visual.ic.uff.br/dmi/).
- 86 Images (Without Cancer)
- 32 Images (With Cancer)
- Data Set Zip Folder (https://drive.google.com/file/d/1ILEfL8uose4R7BQGQGZevCw7SCiel62o/view?usp=sharing)

In [5]:
path_test = "C:/Users/rsvmu/Downloads/Data_2"
CATEGORIES = ["Withcancer","Withoutcancer"]
IMG_SIZE_X = 480
IMG_SIZE_Y = 640

**Note:** Here, The createTrainingData Fucntion besides loading/creating the data for model the two main steps are taking place before giving it to the model:
 - First step - Is converting the RGB image **(480, 640, 3)** to Gray_scale **(480, 640)**
 - Second step - Using the above function **anisdiff** filter and apply that on Gray_scale Images.
 - The Input Image Shape is **(290,500)**

In [6]:
training = []
def createTrainingData():
    for category in CATEGORIES:
        path = os.path.join(path_test, category)
        class_num = CATEGORIES.index(category)
        for img in os.listdir(path):
            img_array = cv2.imread(os.path.join(path,img),0) #Changing RGB to Gray_Scale 
            aniso = anisodiff(img_array,) # Applying anistropic Diffusion Filter to Gray_scale Images
            new_array = cv2.resize(aniso, (IMG_SIZE_X, IMG_SIZE_Y))
            training.append([new_array, class_num])
createTrainingData()

In [7]:
X =[]
y =[]
for features, label in training:
    X.append(features)
    y.append(label)
X = np.array(X).reshape(-1, IMG_SIZE_X, IMG_SIZE_Y)

In [8]:
X = X.astype('float32')
X /= 255
from keras.utils import np_utils
Y = np_utils.to_categorical(y, 2)
print(Y[5])
print(Y.shape)

[1. 0.]
(215, 2)


<a name='4'></a>
# 4 - Test and Train Data
- Test Data - 20 Percent of Whole data 102 Images

In [9]:
from sklearn.model_selection import train_test_split

In [10]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 4)

In [11]:
batch_size = 2
nb_classes = 2
nb_epochs = 20
img_rows, img_columns = 480, 640
img_channel = 3
nb_filters = 32
nb_pool = 5
nb_conv = 5

<a name='5'></a>
# 5 - CNN Baseline Model
- I have first Used two Convolutional Layers with no padding and for hidden layers took RELU Activation Function and for Output layer we took sigmoid Function.

In [12]:
# model = tf.keras.Sequential([
#     tf.keras.layers.Conv2D(64, (3,3), padding='same', activation=tf.nn.relu,
#                            input_shape=(480, 640,1)),
#     tf.keras.layers.MaxPooling2D((2, 2), strides=1),
#     tf.keras.layers.Conv2D(64, (3,3), padding='same', activation=tf.nn.relu),
#     tf.keras.layers.MaxPooling2D((2, 2), strides=1),
#     tf.keras.layers.Conv2D(64, (3,3), padding='same', activation=tf.nn.relu),
#     tf.keras.layers.MaxPooling2D((2, 2), strides=1),
#     tf.keras.layers.Conv2D(64, (3,3), padding='same', activation=tf.nn.relu),
#     tf.keras.layers.MaxPooling2D((2, 2), strides=1),
#     tf.keras.layers.Conv2D(64, (3,3), padding='same', activation=tf.nn.relu),
#     tf.keras.layers.MaxPooling2D((2, 2), strides=1),
#     tf.keras.layers.Flatten(),
#     tf.keras.layers.Dense(128, activation=tf.nn.relu),
#     tf.keras.layers.Dense(1,  activation=tf.nn.sigmoid)
    
# ])

In [13]:
model = Sequential()
# Input Layer
# filters = Similar to number of Neurons, kernel_size = (3,3), strides = (1,1), padding = 'same'(zero padding), activation = 'relu'
model.add(Conv2D(filters = 64, kernel_size = (3,3), strides = (1,1), padding = 'same', activation = 'LeakyReLU',
                    input_shape = (480, 640,1)))
# model.add(BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"))
#     # MaxPooling
model.add(MaxPooling2D())
model.add(Dropout(0.7))
    # Conv2D - II
model.add(Conv2D(filters = 64, kernel_size = (3,3), strides = (1,1), padding = 'same', activation = 'relu'))
# model.add(BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"))
    # MaxPooling
model.add(MaxPooling2D())
# model.add(Dropout(0.5))   
    # Conv2D - III
model.add(Conv2D(filters = 64, kernel_size = (3,3), strides = (1,1), padding = 'same', activation = 'LeakyReLU'))
# model.add(BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"))
    # MaxPooling
model.add(MaxPooling2D())

# model.add(Dropout(0.5))    
    # Flatten Layer
model.add(Flatten())
    
    # Fully Connected Layer
model.add(Dense(units = 128, activation = 'relu'))
model.add(Dense(units = 1, activation = 'sigmoid'))

In [14]:
y_train = np.array(y_train)
y_test = np.array(y_test)

In [15]:
model.compile(optimizer='adam',loss='binary_crossentropy',metrics=['accuracy'])
# model.fit(X_train, y_train, batch_size = batch_size, epochs = nb_epochs, verbose = 1, validation_data = (X_test, y_test))

In [16]:
from tensorflow.keras import callbacks
filepath = "path/Mask_BestModel.hdf5"
checkpoint = callbacks.ModelCheckpoint(filepath, monitor='val_loss', save_best_model = True, mode = 'min', verbose = 1)
checkpoint

<keras.callbacks.ModelCheckpoint at 0x26bfd2e7550>

In [17]:
history = model.fit(X_train, y_train, batch_size = batch_size, epochs = nb_epochs, validation_data = (X_test, y_test), callbacks = [checkpoint], verbose = 1)

Epoch 1/20
Epoch 1: saving model to path\Mask_BestModel.hdf5
Epoch 2/20
Epoch 2: saving model to path\Mask_BestModel.hdf5
Epoch 3/20
Epoch 3: saving model to path\Mask_BestModel.hdf5
Epoch 4/20
Epoch 4: saving model to path\Mask_BestModel.hdf5
Epoch 5/20
Epoch 5: saving model to path\Mask_BestModel.hdf5
Epoch 6/20
Epoch 6: saving model to path\Mask_BestModel.hdf5
Epoch 7/20
Epoch 7: saving model to path\Mask_BestModel.hdf5
Epoch 8/20
Epoch 8: saving model to path\Mask_BestModel.hdf5
Epoch 9/20
Epoch 9: saving model to path\Mask_BestModel.hdf5
Epoch 10/20
Epoch 10: saving model to path\Mask_BestModel.hdf5
Epoch 11/20
Epoch 11: saving model to path\Mask_BestModel.hdf5
Epoch 12/20
Epoch 12: saving model to path\Mask_BestModel.hdf5
Epoch 13/20
Epoch 13: saving model to path\Mask_BestModel.hdf5
Epoch 14/20
Epoch 14: saving model to path\Mask_BestModel.hdf5
Epoch 15/20
Epoch 15: saving model to path\Mask_BestModel.hdf5
Epoch 16/20
Epoch 16: saving model to path\Mask_BestModel.hdf5
Epoch 17/2

<a name='6'></a>
# 6 - Json File ( For Storing Weights)
- After Completeing the CNN we are storing the weights into json file and read the file to give this weights to a XgBoost.

In [18]:
json_model = model.to_json()

In [19]:
with open("model.json", "w") as json_file:
    json_file.write(json_model)
# serialize weights to HDF5
model.save_weights("model.h5")
print("Saved model to disk")

Saved model to disk


In [20]:
mean = np.mean(X_train,axis=(0,1,2))
std = np.std(X_train,axis=(0,1,2))
X_train = (X_train-mean)/(std+1e-7)
X_test = (X_test-mean)/(std+1e-7)

<a name='7'></a>
# 7 - Loading CNN Model

In [21]:
def load_cnn_model(X_test, y_test):
	model.load_weights("path/Mask_BestModel.hdf5")
	opt_rms = optimizers.Adam(learning_rate=0.001,decay=1e-6)
	model.compile(
		loss='binary_crossentropy',
		optimizer=opt_rms,
		metrics=['accuracy'])
	'''
	y_test_ = np_utils.to_categorical(y_test, 10)
	scores = loaded_model.evaluate(X_test, y_test_, batch_size=128, verbose=1)
	print('\nTest result: %.3f loss: %.3f\n' % (scores[1]*100,scores[0]))
	'''
	return model

In [22]:
cnn_model = load_cnn_model(X_test, y_test)
print("Loaded CNN model from disk")

Loaded CNN model from disk


In [23]:
def get_feature_layer(model, data):
	
	total_layers = len(model.layers)
	fl_index = total_layers - 2
	feature_layer_model = keras.Model(
		inputs=model.input,
		outputs=model.get_layer(index=fl_index).output)
	
	feature_layer_output = feature_layer_model.predict(data)
	
	return feature_layer_output

In [24]:
X_train_cnn =  get_feature_layer(cnn_model, X_train)
print("Features extracted of training data")
X_test_cnn = get_feature_layer(cnn_model, X_test)
print("Features extracted of test data\n")

Features extracted of training data
Features extracted of test data



<a name='8'></a>
# 8 - XGBoost

In [25]:
def xgb_model(X_train, y_train, X_test, y_test):

	dtrain = xgb.DMatrix(
		X_train,
		label=y_train
	)

	dtest = xgb.DMatrix(
		X_test,
		label=y_test
	)

	results = {}

	params = {
		'max_depth':5,
		'eta':0.05,
		'objective':'multi:softprob',
        'min_child_weight':2,
        'learning_rate':0.5,
		'num_class':2,
		'eval_metric':'merror'
	}

	watchlist = [(dtrain, 'train'),(dtest, 'eval')]
	n_round = 400

	model = xgb.train(
		params,
		dtrain,
		n_round,
		watchlist,
		evals_result=results)

	pickle.dump(model, open("cnn_xgboost_final.pickle.dat", "wb"))

	return model

<a name='9'></a>
# 9 - Evalution Metric

- merror - Multiclass classification error rate. It is calculated as **#(wrong cases)/#(all cases)**

In [26]:
print("Build and save of CNN-XGBoost Model.")
model = xgb_model(X_train_cnn, y_train, X_test_cnn, y_test)

Build and save of CNN-XGBoost Model.
[0]	train-merror:0.03106	eval-merror:0.22222
[1]	train-merror:0.01863	eval-merror:0.20370
[2]	train-merror:0.02484	eval-merror:0.22222
[3]	train-merror:0.01242	eval-merror:0.20370
[4]	train-merror:0.01242	eval-merror:0.20370
[5]	train-merror:0.01242	eval-merror:0.20370
[6]	train-merror:0.01242	eval-merror:0.20370
[7]	train-merror:0.01242	eval-merror:0.20370
[8]	train-merror:0.01242	eval-merror:0.20370
[9]	train-merror:0.00621	eval-merror:0.22222
[10]	train-merror:0.00621	eval-merror:0.22222
[11]	train-merror:0.00621	eval-merror:0.22222
[12]	train-merror:0.00621	eval-merror:0.22222
[13]	train-merror:0.00000	eval-merror:0.20370
[14]	train-merror:0.00000	eval-merror:0.22222
[15]	train-merror:0.00000	eval-merror:0.22222
[16]	train-merror:0.00000	eval-merror:0.20370
[17]	train-merror:0.00000	eval-merror:0.20370
[18]	train-merror:0.00000	eval-merror:0.20370
[19]	train-merror:0.00000	eval-merror:0.20370
[20]	train-merror:0.00000	eval-merror:0.20370
[21]	tr



[64]	train-merror:0.00000	eval-merror:0.24074
[65]	train-merror:0.00000	eval-merror:0.24074
[66]	train-merror:0.00000	eval-merror:0.24074
[67]	train-merror:0.00000	eval-merror:0.24074
[68]	train-merror:0.00000	eval-merror:0.24074
[69]	train-merror:0.00000	eval-merror:0.24074
[70]	train-merror:0.00000	eval-merror:0.24074
[71]	train-merror:0.00000	eval-merror:0.24074
[72]	train-merror:0.00000	eval-merror:0.24074
[73]	train-merror:0.00000	eval-merror:0.24074
[74]	train-merror:0.00000	eval-merror:0.24074
[75]	train-merror:0.00000	eval-merror:0.24074
[76]	train-merror:0.00000	eval-merror:0.24074
[77]	train-merror:0.00000	eval-merror:0.24074
[78]	train-merror:0.00000	eval-merror:0.24074
[79]	train-merror:0.00000	eval-merror:0.24074
[80]	train-merror:0.00000	eval-merror:0.24074
[81]	train-merror:0.00000	eval-merror:0.24074
[82]	train-merror:0.00000	eval-merror:0.24074
[83]	train-merror:0.00000	eval-merror:0.24074
[84]	train-merror:0.00000	eval-merror:0.24074
[85]	train-merror:0.00000	eval-mer

[240]	train-merror:0.00000	eval-merror:0.25926
[241]	train-merror:0.00000	eval-merror:0.25926
[242]	train-merror:0.00000	eval-merror:0.25926
[243]	train-merror:0.00000	eval-merror:0.25926
[244]	train-merror:0.00000	eval-merror:0.25926
[245]	train-merror:0.00000	eval-merror:0.25926
[246]	train-merror:0.00000	eval-merror:0.25926
[247]	train-merror:0.00000	eval-merror:0.25926
[248]	train-merror:0.00000	eval-merror:0.25926
[249]	train-merror:0.00000	eval-merror:0.25926
[250]	train-merror:0.00000	eval-merror:0.25926
[251]	train-merror:0.00000	eval-merror:0.25926
[252]	train-merror:0.00000	eval-merror:0.25926
[253]	train-merror:0.00000	eval-merror:0.25926
[254]	train-merror:0.00000	eval-merror:0.25926
[255]	train-merror:0.00000	eval-merror:0.25926
[256]	train-merror:0.00000	eval-merror:0.25926
[257]	train-merror:0.00000	eval-merror:0.25926
[258]	train-merror:0.00000	eval-merror:0.25926
[259]	train-merror:0.00000	eval-merror:0.25926
[260]	train-merror:0.00000	eval-merror:0.25926
[261]	train-m

In [84]:
print(1-0.19876)

0.81481


In [159]:
print(1-0.16667)

0.83333
