# Introduction
For my final capstone, I will create a model that processes a piece of an xray of breast tissue and classifies whether or not image contains cancerous tissue. Breast cancer is the most common form of cancer in women, contributing just over a quarter of diagnoses, so having the ability to improve the timeline and accuracy of a diagnosis would greatly improve the disease.

My solution will be valuable because it will capture this information more quickly and accurately than a person could. It would be a model used by hospitals and other healthcare organizations to identify the presence of cancerous tissue. This model would be able to see abnormalities at a much earlier stage which would allow treatment to commence earlier and have a higher success rate. Ideally, the model could be adjusted for other types of tumor or disease classifications.

I will use the [Breast Histopathology Images dataset](https://www.kaggle.com/paultimothymooney/breast-histopathology-images) from the National Library of Medicine. There are 277,524 pieces of 162 images scanned which should be plenty of data to learn from. The data is imbalanced - there is a 34/66 split between the classes. To handle this, I will likely choose a random subset of the images marked IDC negative to ensure I have a balanced dataset.

Because the data is images, I don't believe it doesn't to be cleaned and prepped - it will simply be imported into Python and converted into a dataframe in which each column represents a pixel. I will then use dimensionality reduction and unsupervised learning to obtain in-depth knowledge about the information the data provides and visualize any patterns or clusters the information may fall into. I will need to do some research on how to use supervised learning on image processing/classification as it is not yet clear to me how to do so. Finally, I will use neural networks, and CNN in particular, to classify the scans and create a model that can take an un-labeled scan and report whether or not cancerous tissue is present.

I anticipate importing and processing the images as well as tuning the parameters in the neural networks to be the biggest challenges I'll face. I plan on spending the majority of my time on these two items. For the most part, I should be able to recycle code from previous projects which should expedite my prepping and modeling. I'll do additional research on how best to use neural networks for image classification in an effort to obtain the best model.

In [None]:
def main():
    outPath = '/Users/sophiaperides/Desktop/Thinkful/chest_xrays_folder'
    path = '/Users/sophiaperides/Desktop/Thinkful/chest_xrays_folder'

    # iterate through the names of contents of the folder
    for image_path in os.listdir(path):

        # create the full input path and read the file
        input_path = os.path.join(path, image_path)
        image_to_save = ndimage.imread(input_path)

        # rotate the image
        rotated = ndimage.rotate(image_to_rotate, 45)

        # create full output path, 'example.jpg' 
        # becomes 'rotate_example.jpg', save the file to disk
        fullpath = os.path.join(outPath, 'rotated_'+image_path)
        misc.imsave(fullpath, rotated)

if __name__ == '__main__':
    main()

In [1]:
import os
os.chdir('/Users/sophiaperides/Desktop/Thinkful')
import pandas as pd
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt
%matplotlib inline
from datetime import datetime, timedelta
import re


# Importing the files
from zipfile import ZipFile
import cv2
from glob import glob
from random import sample

# Preprocessing/Cleaning Libraries
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
from sklearn import preprocessing
import scipy.stats as stats

# Import various componenets for model building
import tensorflow as tf
import keras
from keras import regularizers
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
from keras.layers import LSTM, Input, TimeDistributed
from keras.models import Model
from keras.optimizers import RMSprop

# Dimensionality Reduction & Unsupervised Learning
from sklearn.decomposition import PCA
from sklearn.manifold import LocallyLinearEmbedding, TSNE
import umap
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN
from sklearn.mixture import GaussianMixture
# import hdbscan
from sklearn import datasets, metrics
from sklearn.neural_network import BernoulliRBM
from scipy.cluster.hierarchy import dendrogram
from scipy.cluster.hierarchy import linkage as lnkg
from sklearn.neural_network import MLPClassifier


# Import the backend
from keras import backend as K
from keras.layers import Activation, Dropout, Flatten, Dense, BatchNormalization
from keras.preprocessing.image import load_img, ImageDataGenerator, img_to_array
# Data Augmentation
import argparse

Using TensorFlow backend.


# Importing and Prepping the Data
I gravely underestimated how difficult this task would be. I had an initial folder for the data which contained folders for each patient. Each of these folders contained two folders labeled 0 and 1 which contained images with or without cancerous tissue, depending on the folder. At first, I tried to iterate through each patient folder to grab the 0 and 1 folders within, and iterate through each of those folder to grab each image to import it, but was unable to do so. In the end I mapped to the initial folder and iterated through each patient folder to create arrays containing the pixel data for each image. Using the length of these arrays, I was able to create an array with the target values, with 0 for a scan without cancerous tissue and 1 for a scan with.

# Importing the Images by Folder

In [81]:
images_0_8863 = []
for root, dirs, files in os.walk('/Users/sophiaperides/Desktop/Thinkful/breast-histopathology-images/8863/0/'):
    for filename in files:
        image_data = cv2.imread('/Users/sophiaperides/Desktop/Thinkful/breast-histopathology-images/8863/0/'+filename, 0)
#         print(filename)
#         print(image_data.shape)
        if image_data.shape==(50,50):
#             image_data = image_data.reshape(1, 50, 50)
            images_0_8863.append(image_data)
            
images_0_8863 = np.asarray(images_0_8863)    
print('There are {} images'.format(len(images_0_8863)))
print('The type of the data is', type(images_0_8863))
print('The shape of the data is', images_0_8863.shape)
images_0_8863_class = np.zeros((len(images_0_8863),), dtype=int)

There are 768 images
The type of the data is <class 'numpy.ndarray'>
The shape of the data is (768, 50, 50)


In [82]:
# Creating a list of image names
images_1_8863 = []
for root, dirs, files in os.walk('/Users/sophiaperides/Desktop/Thinkful/breast-histopathology-images/8863/1/'):
    for filename in files:
        image_data = cv2.imread('/Users/sophiaperides/Desktop/Thinkful/breast-histopathology-images/8863/1/'+filename, 0)
#         print(filename)
#         print(image_data.shape)
        if image_data.shape==(50,50):
#             image_data = image_data.reshape(2500,)
            images_1_8863.append(image_data)
            
images_1_8863 = np.asarray(images_1_8863)
print('There are {} images'.format(len(images_1_8863)))
print('The type of the data is', type(images_1_8863))
print('The shape of the data is', images_1_8863.shape)
images_1_8863_class = np.ones((len(images_1_8863),), dtype=int)

There are 207 images
The type of the data is <class 'numpy.ndarray'>
The shape of the data is (207, 50, 50)


In [None]:
images_0_8864 = []
for root, dirs, files in os.walk('/Users/sophiaperides/Desktop/Thinkful/breast-histopathology-images/8864/0/'):
    for filename in files:
        image_data = cv2.imread('/Users/sophiaperides/Desktop/Thinkful/breast-histopathology-images/8864/0/'+filename, 0)
#         print(filename)
#         print(image_data.shape)
        if image_data.shape==(50,50):
#             image_data = image_data.reshape(2500,)
            images_0_8864.append(image_data)
            
images_0_8864 = np.asarray(images_0_8864)    
print('There are {} images'.format(len(images_0_8864)))
print('The type of the data is', type(images_0_8864))
print('The shape of the data is', images_0_8864.shape)
images_0_8864_class = np.zeros((len(images_0_8864)), dtype=int)

In [None]:
# Creating a list of image names
images_1_8864 = []
for root, dirs, files in os.walk('/Users/sophiaperides/Desktop/Thinkful/breast-histopathology-images/8864/1/'):
    for filename in files:
        image_data = cv2.imread('/Users/sophiaperides/Desktop/Thinkful/breast-histopathology-images/8864/1/'+filename, 0)
#         print(filename)
#         print(image_data.shape)
        if image_data.shape==(50,50):
#             image_data = image_data.reshape(2500,)
            images_1_8864.append(image_data)
            
images_1_8864 = np.asarray(images_1_8864)
print('There are {} images'.format(len(images_1_8864)))
print('The type of the data is', type(images_1_8864))
print('The shape of the data is', images_1_8864.shape)
images_1_8864_class = np.ones((len(images_1_8864),), dtype=int)

In [None]:
images_0_8865 = []
for root, dirs, files in os.walk('/Users/sophiaperides/Desktop/Thinkful/breast-histopathology-images/8865/0/'):
    for filename in files:
        image_data = cv2.imread('/Users/sophiaperides/Desktop/Thinkful/breast-histopathology-images/8865/0/'+filename, 0)
#         print(filename)
#         print(image_data.shape)
        if image_data.shape==(50,50):
#             image_data = image_data.reshape(2500,)
            images_0_8865.append(image_data)
            
images_0_8865 = np.asarray(images_0_8865)    
print('There are {} images'.format(len(images_0_8865)))
print('The type of the data is', type(images_0_8865))
print('The shape of the data is', images_0_8865.shape)
images_0_8865_class = np.zeros((len(images_0_8865)), dtype=int)

In [None]:
# Creating a list of image names
images_1_8865 = []
for root, dirs, files in os.walk('/Users/sophiaperides/Desktop/Thinkful/breast-histopathology-images/8865/1/'):
    for filename in files:
        image_data = cv2.imread('/Users/sophiaperides/Desktop/Thinkful/breast-histopathology-images/8865/1/'+filename, 0)
#         print(filename)
#         print(image_data.shape)
        if image_data.shape==(50,50):
#             image_data = image_data.reshape(2500,)
            images_1_8865.append(image_data)
            
images_1_8865 = np.asarray(images_1_8865)
print('There are {} images'.format(len(images_1_8865)))
print('The type of the data is', type(images_1_8865))
print('The shape of the data is', images_1_8865.shape)
images_1_8865_class = np.ones((len(images_1_8865),), dtype=int)

# Defining/Cleaning/Normalizing the Data/Images/Pixels

In [None]:
# Create x_train and y_train
print(images_0_8863.shape)
print(images_1_8863.shape)
print(images_0_8863_class.shape)
print(images_1_8863_class.shape)


x_train = np.append(images_0_8863, images_1_8863, axis=0)
y_train = np.append(images_0_8863_class, images_1_8863_class, axis=0)
print('x_train shape:',x_train.shape)
print('y_train shape:',y_train.shape)

# Create x_test and y_test
x_test = np.append(images_0_8865, images_1_8865, axis=0)
y_test = np.append(images_0_8865_class, images_1_8865_class, axis=0)

print('x_test shape:', x_test.shape)
print('y_test shape: ', y_test.shape)

In [5]:
X_1 = [] # training data labeled 1
for root, dirs, files in os.walk('//Users//sophiaperides//Desktop//Thinkful//1'):
    for filename in files:
        image_data = cv2.imread('/Users/sophiaperides/Desktop/Thinkful/1/'+filename, 0)
#         print(filename)
#         print(type(image_data))
#         print(image_data.shape)
        if image_data.shape==(50,50):
#             image_data = image_data.reshape(1, 50, 50)
            X_1.append(image_data)
            
X_1 = np.asarray(X_1)    
print('There are {} images'.format(len(X_1)))
print('The type of the data is', type(X_1))
print('The shape of the data is', X_1.shape)
X_1_target = np.ones((len(X_1),), dtype=int)
X_0_target = np.ones((len(X_1),), dtype=int)

There are 1725 images
The type of the data is <class 'numpy.ndarray'>
The shape of the data is (1725, 50, 50)


In [8]:
X_0 = [] # training data labeled 0
for root, dirs, files in os.walk('//Users//sophiaperides//Desktop//Thinkful//0//'):
    for filename in files:
        image_data = cv2.imread('//Users//sophiaperides//Desktop//Thinkful//0//'+filename, 0)
#         print(filename)
#         print(image_data.shape)
        if image_data.shape==(50,50):
#             image_data = image_data.reshape(1, 50, 50)
            X_0.append(image_data)
            
X_0 = np.asarray(X_0)
print('There are {} images'.format(len(X_0)))
print('The type of the data is', type(X_0))
print('The shape of the data is', X_0.shape)
X_0 = X_0[:(len(X_1)),:,:]
print('There are {} images once sampled'.format(len(X_0)))

There are 7886 images
The type of the data is <class 'numpy.ndarray'>
The shape of the data is (7886, 50, 50)
There are 1725 images once sampled


In [10]:
X_1_test = [] # testing data labeled 1
for root, dirs, files in os.walk('//Users//sophiaperides//Desktop//Thinkful//1_test//'):
    for filename in files:
        image_data = cv2.imread('//Users//sophiaperides//Desktop//Thinkful//1_test//'+filename, 0)
#         print(filename)
#         print(image_data.shape)
        if image_data.shape==(50,50):
#             image_data = image_data.reshape(1, 50, 50)
            X_1_test.append(image_data)
            
X_1_test = np.asarray(X_1_test)    
print('There are {} images'.format(len(X_1_test)))
print('The type of the data is', type(X_1_test))
print('The shape of the data is', X_1_test.shape)
X_1_test_target = np.ones((len(X_1_test),), dtype=int)
X_0_test_target = np.zeros((len(X_1_test),), dtype=int)

There are 987 images
The type of the data is <class 'numpy.ndarray'>
The shape of the data is (987, 50, 50)


In [13]:
X_0_test = [] # testing data labeled 0
for root, dirs, files in os.walk('//Users//sophiaperides//Desktop//Thinkful//0_test//'):
    for filename in files:
        image_data = cv2.imread('//Users//sophiaperides//Desktop//Thinkful//0_test//'+filename, 0)
#         print(filename)
#         print(image_data.shape)
        if image_data.shape==(50,50):
#             image_data = image_data.reshape(1, 50, 50)
            X_0_test.append(image_data)
            
X_0_test = np.asarray(X_0_test)    
print('There are {} images'.format(len(X_0_test)))
print('The type of the data is', type(X_0_test))
print('The shape of the data is', X_0_test.shape)
X_0_test = X_0_test[:(len(X_1_test)),:,:]
print('There are {} images once sampled'.format(len(X_0_test)))

There are 2411 images
The type of the data is <class 'numpy.ndarray'>
The shape of the data is (2411, 50, 50)
There are 987 images once sampled


# Defining/Cleaning/Normalizing the Data/Images/Pixels

In [14]:
# Create x_train and y_train
print(X_0.shape)
print(X_1.shape)
print(X_0_target.shape)
print(X_1_target.shape)


x_train = np.append(X_1, X_0, axis=0)
y_train = np.append(X_1_target, X_0_target, axis=0)
print('x_train shape:',x_train.shape)
print('y_train shape:',y_train.shape)

# Create x_test and y_test
x_test = np.append(X_1_test, X_0_test, axis=0)
y_test = np.append(X_1_test_target, X_0_test_target, axis=0)

print('x_test shape:', x_test.shape)
print('y_test shape: ', y_test.shape)

(1725, 50, 50)
(1725, 50, 50)
(1725,)
(1725,)
x_train shape: (3450, 50, 50)
y_train shape: (3450,)
x_test shape: (1974, 50, 50)
y_test shape:  (1974,)


In [15]:
# Convert to float32 for type consistency
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

# Normalize values to 1 from 0 to 255 (256 values of pixels)
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)
# Convert class vectors to binary class matrices
# So instead of one column with 10 values, create 10 binary columns
y_train = keras.utils.to_categorical(y_train, 2)
y_test = keras.utils.to_categorical(y_test, 2)

x_train shape: (3450, 50, 50)
x_test shape: (1974, 50, 50)


In [16]:
# Per Yunus: first of all keep your images as (1,50,50) not as (1,2500).
img_rows = 50
img_cols = 50
if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

# CNN with TanH Activation

In [70]:
# Building the Model
num_classes = 2
model = Sequential()
# First convolutional layer, note the specification of shape
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='tanh',
                 input_shape=input_shape,  kernel_regularizer=regularizers.l2(0.5)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (4, 4), activation='tanh',  kernel_regularizer=regularizers.l2(0.5)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.3))
model.add(Flatten())
model.add(Dense(50, activation='tanh',  kernel_regularizer=regularizers.l2(0.5)))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax',  kernel_regularizer=regularizers.l2(0.5)))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])

model.fit(x_train, y_train,
          batch_size=64,
          epochs=4,
          verbose=1,
          validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Train on 3450 samples, validate on 1974 samples
Epoch 1/4

KeyboardInterrupt: 

# Convolutional Neural Network with Relu Activation

In [74]:
# Building the Model
num_classes = 2
model_cnn_relu = Sequential()
# First convolutional layer, note the specification of shape
model_cnn_relu.add(Conv2D(32, kernel_size=(4, 4),
                 activation='relu',
                 input_shape=input_shape))
model_cnn_relu.add(MaxPooling2D(pool_size=(2, 2)))
model_cnn_relu.add(Conv2D(32, (4, 4), activation='relu', kernel_regularizer=regularizers.l1(0.5)))
model_cnn_relu.add(MaxPooling2D(pool_size=(2, 2)))
model_cnn_relu.add(Dropout(0.5))
model_cnn_relu.add(Flatten())
# model_cnn_relu.add(Dense(128, activation='relu', kernel_regularizer=regularizers.l1(0.75)))
# model_cnn_relu.add(Dropout(0.5))
model_cnn_relu.add(Dense(num_classes, activation='softmax'))

model_cnn_relu.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])

model_cnn_relu.fit(x_train, y_train,
          batch_size=128,
          epochs=10,
          verbose=1,
          validation_data=(x_test, y_test))
score = model_cnn_relu.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Train on 3450 samples, validate on 1974 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Test loss: 13.224491134844532
Test accuracy: 0.5


# Model 7 with Data Augmentation

In [45]:
train_datagen = ImageDataGenerator(rescale=1./255,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   rotation_range=45,
                                   horizontal_flip=True,
                                   vertical_flip=True,
                                   validation_split = .2)
test_datagen = ImageDataGenerator(rescale=1./255,
                                  validation_split = .2)

In [47]:
training_set = train_datagen.flow_from_directory('/Users/sophiaperides/Desktop/Thinkful/breast_tissue_train',
                                       target_size=(50, 50),
                                                batch_size=32,
                                     class_mode='categorical',
                                            subset='training',
                                            color_mode='grayscale')

validation_set = test_datagen.flow_from_directory('/Users/sophiaperides/Desktop/Thinkful/breast_tissue_test',
                                        target_size=(50, 50),
                                                 batch_size=32,
                                      class_mode='categorical',
                                               shuffle = False,
                                           subset='validation',
                                             color_mode='grayscale')

Found 7818 images belonging to 2 classes.
Found 679 images belonging to 2 classes.


In [48]:
# Building the Model
num_classes = 2
model_7 = Sequential()
# First convolutional layer, note the specification of shape
model_7.add(Conv2D(32, kernel_size=(4, 4),
                 activation='relu',
                 input_shape=input_shape))
model_7.add(Conv2D(64, (4, 4), activation='relu', ))
model_7.add(MaxPooling2D(pool_size=(2, 2)))
model_7.add(Dropout(0.25))
model_7.add(Flatten())
model_7.add(Dense(128, activation='relu'))
model_7.add(Dropout(0.5))
model_7.add(Dense(num_classes, activation='softmax'))

model_7.compile(loss=ker`as.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])

model_7.fit_generator(training_set,
                    steps_per_epoch=len(training_set),
                    epochs=10,
                    validation_data=validation_set,
                    validation_steps=len(validation_set))

# Evaluation.
scores = model_7.evaluate(x_test, y_test, verbose=0)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Test loss: 0.5953276348814535
Test accuracy: 0.7264437675476074


In [None]:
- keep same activation 
- google collab

In [50]:
# Building the Model
num_classes = 2
model_8 = Sequential()
# First convolutional layer, note the specification of shape
model_8.add(Conv2D(32, kernel_size=(3, 3),
                 activation='tanh',
                 input_shape=input_shape))
model_8.add(MaxPooling2D(pool_size=(2, 2)))
model_8.add(Conv2D(64, (4, 4), activation='tanh', ))
model_8.add(MaxPooling2D(pool_size=(2, 2)))
model_8.add(Dropout(0.25))
model_8.add(Flatten())
model_8.add(Dense(128, activation='tanh'))
model_8.add(Dropout(0.5))
model_8.add(Dense(num_classes, activation='softmax'))


model_8.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])

model_8.fit_generator(training_set,
                    steps_per_epoch=len(training_set),
                    epochs=10,
                    validation_data=validation_set,
                    validation_steps=len(validation_set))

# Evaluation.
scores = model_8.evaluate(x_test, y_test, verbose=0)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Test loss: 0.5077049956556028
Test accuracy: 0.7674772143363953


In [55]:
predictions = model_8.predict(x_test)
y_pred = (predictions > 0.5)
matrix = metrics.confusion_matrix(y_test.argmax(axis=1), y_pred.argmax(axis=1))
matrix

array([[908,  79],
       [380, 607]])

|| Class 1 Predicted | Class 2 Predicted   |
|------|------|------|
Actual Class 1| 908 | 79 |
Actual Class 2 | 380 | 607 |

|| Class 1 Predicted | Class 2 Predicted   |
|------|------|------|
Actual Class 1| 70.5% | 11.5% |
Actual Class 2 | 29.5% | 88.5% |

In [63]:
# Building the Model
num_classes = 2
model_8 = Sequential()
# First convolutional layer, note the specification of shape
model_8.add(Conv2D(64, kernel_size=(3, 3),
                 activation='tanh',
                 input_shape=input_shape))
model_8.add(Conv2D(64, (4, 4), activation='tah', padding='same'))
model_8.add(MaxPooling2D(pool_size=(2, 2)))
model_8.add(Dropout(0.35))
model_8.add(Flatten())
model_8.add(Dense(128, activation='tanh', kernel_regularizer=regularizers.l2(1)))
model_8.add(Dropout(0.5))
model_8.add(Dense(num_classes, activation='softmax'))

model_8.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])

model_8.fit_generator(training_set,
                    steps_per_epoch=len(training_set),
                    epochs=10,
                    validation_data=validation_set,
                    validation_steps=len(validation_set))

# Evaluation.
scores = model_8.evaluate(x_test, y_test, verbose=0)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Test loss: 0.7974979342117136
Test accuracy: 0.5


In [77]:
classifier = Sequential()

# Step 1 - Convolution
classifier.add(Conv2D(32, (3, 3), padding='same', input_shape=input_shape, activation = 'relu'))
classifier.add(Conv2D(32, (3, 3), activation='relu'))
classifier.add(MaxPooling2D(pool_size=(2, 2)))
classifier.add(Dropout(0.5)) # antes era 0.25

 

# Adding a second convolutional layer
classifier.add(Conv2D(64, (3, 3), padding='same', activation = 'relu'))
classifier.add(Conv2D(64, (3, 3), activation='relu'))
classifier.add(MaxPooling2D(pool_size=(2, 2)))
classifier.add(Dropout(0.25)) # antes era 0.25

 

# Adding a third convolutional layer
classifier.add(Conv2D(64, (3, 3), padding='same', activation = 'relu'))
classifier.add(Conv2D(64, (3, 3), activation='relu'))
classifier.add(MaxPooling2D(pool_size=(2, 2)))
classifier.add(Dropout(0.5)) # antes era 0.25

 

# Step 3 - Flattening
classifier.add(Flatten())

 

# Step 4 - Full connection
classifier.add(Dense(units = 512, activation = 'relu'))
classifier.add(Dropout(0.5))
classifier.add(Dense(units = 2, activation = 'softmax'))

classifier.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])

classifier.fit_generator(training_set,
                    steps_per_epoch=len(training_set),
                    epochs=10,
                    validation_data=validation_set,
                    validation_steps=len(validation_set))

# Evaluation.
scores = classifier.evaluate(x_test, y_test, verbose=0)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Test loss: 0.5438061677817757
Test accuracy: 0.7502533197402954


# Demos 

In [84]:
features = np.append(images_0_8863, images_1_8863, axis=0)
target = np.append(images_0_8863_class, images_1_8863_class, axis=0)

x_train, x_test, y_train, y_test = train_test_split(features, target,
                test_size = .3, random_state = 465)

In [85]:
# Building the Model
num_classes = 2
model_8 = Sequential()
# First convolutional layer, note the specification of shape
model_8.add(Conv2D(32, kernel_size=(3, 3),
                 activation='tanh',
                 input_shape=input_shape))
model_8.add(MaxPooling2D(pool_size=(2, 2)))
model_8.add(Conv2D(64, (4, 4), activation='tanh', ))
model_8.add(MaxPooling2D(pool_size=(2, 2)))
model_8.add(Dropout(0.25))
model_8.add(Flatten())
model_8.add(Dense(128, activation='tanh'))
model_8.add(Dropout(0.5))
model_8.add(Dense(num_classes, activation='softmax'))


model_8.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])

model_8.fit_generator(training_set,
                    steps_per_epoch=len(training_set),
                    epochs=10,
                    validation_data=validation_set,
                    validation_steps=len(validation_set))

# Evaluation.
scores = model_8.evaluate(x_test, y_test, verbose=0)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


ValueError: Error when checking input: expected conv2d_90_input to have 4 dimensions, but got array with shape (293, 50, 50)