# Tumor Image classification

## Data Preparation

Data Augmentation is performed on the given dataset. ImageDataGenerator function in Keras can used for preprocessing the data.

The images in the given dataset didnt require any augmentation as the dataset had object of interest in focus. The data was imbalanced even after combining classes 0 , 1 into class 0 and classes 2 , 3 into class 1. Classes 1 was about half the size of class 0. Hence Class 1 was upsampled by using duplicatioon.

In [2]:
from keras.preprocessing.image import ImageDataGenerator


train_datagen = ImageDataGenerator(rescale = 1./255, 
shear_range = 0.2,
zoom_range = 0.2,
horizontal_flip = True)

test_datagen = ImageDataGenerator(rescale = 1./255)
training_set = train_datagen.flow_from_directory('E:\\NUS\\Interview dataset\\orig\\Train',
target_size = (100, 100),
batch_size = 32,
classes = ['0','1'],                                                 
class_mode = 'binary')
test_set = test_datagen.flow_from_directory('E:\\NUS\\Interview dataset\\orig\\Test',
target_size = (100, 100),
batch_size = 32,
class_mode = 'binary')


Found 135 images belonging to 2 classes.
Found 125 images belonging to 2 classes.


In [3]:
print(test_set.image_shape)
Y_test = test_set.classes

(100, 100, 3)


In [4]:
Y_train = training_set.classes
training_set.image_shape

(100, 100, 3)

## Baseline Model

Simple CNN model with 2 layer convolutions with a filter size 3 * 3. The model runs for 25 epochs.

In [15]:
#Importing the Keras libraries and packages
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense
# Initialising the CNN
classifier = Sequential()
# Step 1 - Convolution
classifier.add(Conv2D(32, (3, 3), input_shape = (100, 100, 3), activation = 'relu')) 
# Step 2 - Pooling
classifier.add(MaxPooling2D(pool_size = (2, 2)))
# Adding a second convolutional layer
classifier.add(Conv2D(32, (3, 3), activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))
# Step 3 - Flattening
classifier.add(Flatten())
# Step 4 - Full connection
classifier.add(Dense(units = 128, activation = 'relu'))
classifier.add(Dense(units = 1, activation = 'sigmoid'))
# Compiling the CNN
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

print(classifier.summary())

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_7 (Conv2D)            (None, 98, 98, 32)        896       
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 49, 49, 32)        0         
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 47, 47, 32)        9248      
_________________________________________________________________
max_pooling2d_8 (MaxPooling2 (None, 23, 23, 32)        0         
_________________________________________________________________
flatten_4 (Flatten)          (None, 16928)             0         
_________________________________________________________________
dense_7 (Dense)              (None, 128)               2166912   
_________________________________________________________________
dense_8 (Dense)              (None, 1)                 129       
Total para

In [33]:
classifier.fit_generator(training_set,
steps_per_epoch = 10,
epochs = 25,
validation_data = test_set,
validation_steps = 10)



Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25


<keras.callbacks.History at 0x17201232f60>

In [5]:
y = training_set.classes


### Confusion Matrix
The model doesn't classify label 3 ie class 2 properly because the number of images is half of first class. Hence in the following model we have created a dataset with 2 times the existing images of label 3 (upsampling). 


In [35]:
#Confution Matrix and Classification Report
#https://gist.github.com/RyanAkilos/3808c17f79e77c4117de35aa68447045

import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

Y_pred = classifier.predict_generator(test_set)
y_pred = np.round(Y_pred)
print('Confusion Matrix')
print(confusion_matrix(test_set.classes, y_pred))
print('Classification Report')
target_names = ['0', '1']
print(classification_report(test_set.classes, y_pred, target_names=target_names))




Confusion Matrix
[[55 25]
 [33 12]]
Classification Report
             precision    recall  f1-score   support

          0       0.62      0.69      0.65        80
          3       0.32      0.27      0.29        45

avg / total       0.52      0.54      0.52       125




## Fine tuning parameters

In [14]:
# Fitting the CNN to the upsampled images for label 3

from keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(rescale = 1./255,
shear_range = 0.2,
zoom_range = 0.2,
horizontal_flip = True)
test_datagen = ImageDataGenerator(rescale = 1./255)
training_set = train_datagen.flow_from_directory('E:\\NUS\\Interview dataset\\orig\\Train',
target_size = (64, 64),
batch_size = 32,
class_mode = 'binary')
test_set = test_datagen.flow_from_directory('E:\\NUS\\Interview dataset\\orig\\Test',
target_size = (64, 64),
batch_size = 32,
class_mode = 'binary')
classifier.fit_generator(training_set,
steps_per_epoch = 50,
epochs = 25,
validation_data = test_set,
validation_steps = 50)


Found 500 images belonging to 2 classes.
Found 120 images belonging to 2 classes.
Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25


<keras.callbacks.History at 0x202316ac5c0>

In [15]:
#Confution Matrix and Classification Report
#https://gist.github.com/RyanAkilos/3808c17f79e77c4117de35aa68447045

import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

Y_pred = classifier.predict_generator(test_set, 120/ 33)
y_pred = np.argmax(Y_pred, axis=1)
print('Confusion Matrix')
print(confusion_matrix(test_set.classes, y_pred))
print('Classification Report')
target_names = ['0', '1']
print(classification_report(test_set.classes, y_pred, target_names=target_names))




Confusion Matrix
[[60  0]
 [60  0]]
Classification Report
             precision    recall  f1-score   support

          0       0.50      1.00      0.67        60
          3       0.00      0.00      0.00        60

avg / total       0.25      0.50      0.33       120



  'precision', 'predicted', average, warn_for)


In [17]:
test_set.classes


array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

## 2. Deeper CNN 

A deeper model compared to previous simple CNN is used.

In [37]:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(100, 100, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(256, (5, 5)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

# the model so far outputs 3D feature maps (height, width, features)
model.add(Flatten())  # this converts our 3D feature maps to 1D feature vectors
model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])


model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])




In [41]:
from keras.utils.vis_utils import model_to_dot
from keras.utils import plot_model
plot_model(model, to_file='model_shapes.png', show_shapes=True)

In [38]:
model.fit_generator(training_set,
steps_per_epoch = 10,
epochs = 25,
validation_data = test_set,
validation_steps = 10)


Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25


<keras.callbacks.History at 0x17201253898>

## Confusion Matrix

Similar results to Model 1 were obtained.

In [40]:
#Confution Matrix and Classification Report
#https://gist.github.com/RyanAkilos/3808c17f79e77c4117de35aa68447045

import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

Y_pred = model.predict_generator(test_set)
y_pred = np.round(Y_pred)
print('Confusion Matrix')
print(confusion_matrix(test_set.classes, y_pred))
print('Classification Report')
target_names = ['0', '3']
print(classification_report(test_set.classes, y_pred, target_names=target_names))




Confusion Matrix
[[57 23]
 [33 12]]
Classification Report
             precision    recall  f1-score   support

          0       0.63      0.71      0.67        80
          3       0.34      0.27      0.30        45

avg / total       0.53      0.55      0.54       125

