
# Working with Custom Images

So far everything we've worked with has been nicely formatted for us already by Keras.

Let's explore what it's like to work with a more realistic data set.

## The Data



ORIGINAL DATA SOURCE:

The dataset contains 2 folders, Infected and Uninfected, with a total of 27,558 images.

Acknowledgements:
This dataset is taken from the official NIH website: https://ceb.nlm.nih.gov/repositories/malaria-datasets/

**Note: We will be dealing with real image files, NOT NumPy arrays, so a large part of this process is learning how to work with large groups of image files. This is too much data to fit in memory as a NumPy array, so we'll need to feed it into our model in batches.**
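
To make the idea of batching concrete, here is a minimal sketch of a generator that walks a list of file paths in fixed-size chunks. The function name `iter_batches` and the sample file names are hypothetical and exist only for illustration; later in the notebook Keras does this work for us.

```python
from typing import Iterator, List, Sequence


def iter_batches(paths: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield successive chunks of `paths`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(paths), batch_size):
        yield list(paths[start:start + batch_size])


# Hypothetical file names, used only for illustration
example_paths = [f"cell_{i}.png" for i in range(10)]
batches = list(iter_batches(example_paths, batch_size=4))

# The last batch is smaller when the total is not a multiple of batch_size
print([len(batch) for batch in batches])
```

Only one batch needs to be in memory at a time, which is what lets us train on more images than would fit in a single array.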

### Visualizing the Data


-------
Let's take a closer look at the data.

In [None]:
import os
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.image import imread
# Technically not necessary in newest versions of jupyter
%matplotlib inline

In [None]:
my_data_dir = 'cell_images'

In [None]:
# CONFIRM THAT THIS REPORTS BACK 'test', and 'train'
os.listdir(my_data_dir) 

In [None]:
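# os.path.join uses the correct path separator for the current operating system
test_path = os.path.join(my_data_dir, 'test')
train_path = os.path.join(my_data_dir, 'train')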

In [None]:
os.listdir(test_path)

In [None]:
os.listdir(train_path)

In [None]:
os.listdir(os.path.join(train_path, 'parasitized'))[0]

In [None]:
para_cell = os.path.join(train_path, 'parasitized', 'C100P61ThinF_IMG_20150918_144104_cell_162.png')

In [None]:
para_img = imread(para_cell)

In [None]:
plt.imshow(para_img)
plt.show()

In [None]:
para_img.shape

In [None]:
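# Load and display the first uninfected training image
uninfected_dir = os.path.join(train_path, 'uninfected')
unifected_cell_path = os.path.join(uninfected_dir, os.listdir(uninfected_dir)[0])
unifected_cell = imread(unifected_cell_path)
plt.imshow(unifected_cell)
plt.show()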

**Let's check how many images there are.**

In [None]:
len(os.listdir(os.path.join(train_path, 'parasitized')))

In [None]:
len(os.listdir(os.path.join(train_path, 'uninfected')))

**Let's find out the average dimensions of these images.**

In [None]:
unifected_cell.shape

In [None]:
para_img.shape

In [None]:
# Other options: https://stackoverflow.com/questions/1507084/how-to-check-dimensions-of-all-images-in-a-directory-using-python
# Collect the height (dim1) and width (dim2) of every uninfected test image
dim1 = []
dim2 = []
uninfected_test_dir = os.path.join(test_path, 'uninfected')
for image_filename in os.listdir(uninfected_test_dir):
    img = imread(os.path.join(uninfected_test_dir, image_filename))
    d1, d2, colors = img.shape
    dim1.append(d1)
    dim2.append(d2)

In [None]:
# jointplot needs both variables: heights on x, widths on y
sns.jointplot(x=dim1, y=dim2)
plt.show()

In [None]:
np.mean(dim1)

In [None]:
np.mean(dim2)

In [None]:
image_shape = (130,130,3)
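
The loop above decodes every image just to read its shape. As one alternative, a PNG file stores its width and height in the first 24 bytes, so the dimensions can be read without decoding any pixels. The sketch below is a minimal illustration; the helper name `png_size` is hypothetical, and it builds a synthetic header instead of touching the dataset so that it runs anywhere.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(header: bytes) -> tuple:
    """Return (width, height) from the first 24 bytes of a PNG file."""
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a valid PNG header")
    # Width and height are big-endian unsigned 32-bit integers at bytes 16-23
    width, height = struct.unpack(">II", header[16:24])
    return width, height


# Synthetic header for a 142 x 148 image (made-up numbers, no file needed)
fake_header = (
    PNG_SIGNATURE
    + struct.pack(">I", 13)          # length of the IHDR chunk data
    + b"IHDR"
    + struct.pack(">II", 142, 148)   # width, height
)
print(png_size(fake_header))
```

With a real file you would pass the result of reading the first 24 bytes opened in binary mode. Note the ordering: this returns (width, height), while `img.shape` reports (height, width, channels).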

## Preparing the Data for the model

There is too much data for us to read into memory all at once. We can use some built-in functions in Keras to automatically process the data, generate a flow of batches from a directory, and also manipulate the images.

### Image Manipulation

It's usually a good idea to manipulate the images with rotation, resizing, and scaling so the model becomes more robust to images that differ from those in our data set. We can use the **ImageDataGenerator** to do this automatically for us. Check out the documentation for a full list of the parameters you can use.
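
To build some intuition for two of these manipulations, the sketch below applies a rescale and a horizontal flip by hand with NumPy on a tiny made-up array. It is only an illustration of the idea, not how Keras implements its transforms, and the array `tiny_img` is an assumption invented for the example.

```python
import numpy as np

# A tiny fake 2x3 RGB "image" with 8-bit pixel values (0 to 170)
tiny_img = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3) * 10

# Multiplying by 1/255 maps pixel values from [0, 255] into [0, 1]
rescaled = tiny_img / 255

# A horizontal flip reverses the column (width) axis
flipped = tiny_img[:, ::-1, :]

print(rescaled.min(), rescaled.max())
# The first column of the flipped image is the last column of the original
print(np.array_equal(flipped[:, 0, :], tiny_img[:, -1, :]))
```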

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

In [None]:
help(ImageDataGenerator)

In [None]:
image_gen = ImageDataGenerator(rotation_range=20, # Randomly rotate the image by up to 20 degrees
                               width_shift_range=0.10, # Shift the pic width by a max of 10%
                               height_shift_range=0.10, # Shift the pic height by a max of 10%
                               rescale=1/255, # Rescale the image by normalizing pixel values to [0, 1]
                               shear_range=0.1, # Shear slants the image; the value is the shear intensity (angle in degrees)
                               zoom_range=0.1, # Zoom in or out by 10% max
                               horizontal_flip=True, # Allow horizontal flipping
                               fill_mode='nearest' # Fill in missing pixels with the nearest filled value
                              )

In [None]:
plt.imshow(para_img)
plt.show()

In [None]:
plt.imshow(image_gen.random_transform(para_img))
plt.show()

In [None]:
plt.imshow(image_gen.random_transform(para_img))
plt.show()

### Generating many manipulated images from a directory


To use .flow_from_directory, you must organize the images in sub-directories; the method won't work otherwise. Each sub-directory should contain images of only one class, so use one folder per class.

Structure Needed:

* Image Data Folder
    * Class 1
        * 0.jpg
        * 1.jpg
        * ...
    * Class 2
        * 0.jpg
        * 1.jpg
        * ...
    * ...
    * Class n
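
The Keras documentation describes the class order inferred from sub-directory names as alphanumeric. The sketch below mimics that mapping with the standard library on a throwaway folder tree. The helper `infer_class_indices` is hypothetical and is not the Keras implementation; it only illustrates why the folder names determine the labels.

```python
import os
import tempfile


def infer_class_indices(directory: str) -> dict:
    """Map each sub-directory name to an integer label, in sorted order."""
    class_names = sorted(
        entry for entry in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, entry))
    )
    return {name: index for index, name in enumerate(class_names)}


# Build a temporary folder tree that mimics the layout above
with tempfile.TemporaryDirectory() as root:
    for class_name in ("uninfected", "parasitized"):
        os.makedirs(os.path.join(root, class_name))
    class_indices = infer_class_indices(root)

print(class_indices)
```

Always confirm the real mapping with the generator's `class_indices` attribute rather than relying on this sketch.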

In [None]:
image_gen.flow_from_directory(train_path)

In [None]:
image_gen.flow_from_directory(test_path)

# Creating the Model

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Dropout, Flatten, Dense, Conv2D, MaxPooling2D

In [None]:
#https://stats.stackexchange.com/questions/148139/rules-for-selecting-convolutional-neural-network-hyperparameters
model = Sequential()

# Only the first layer needs input_shape; later layers infer their input size
model.add(Conv2D(filters=32, kernel_size=(3,3), input_shape=image_shape, activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(filters=64, kernel_size=(3,3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(filters=64, kernel_size=(3,3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))


model.add(Flatten())


model.add(Dense(128))
model.add(Activation('relu'))

# Dropouts help reduce overfitting by randomly turning neurons off during training.
# Here we say randomly turn off 50% of neurons.
model.add(Dropout(0.5))

# Last layer, remember it's binary so we use sigmoid
model.add(Dense(1))
model.add(Activation('sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

In [None]:
model.summary()
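
The numbers in the summary can be checked by hand. The sketch below redoes the arithmetic in plain Python, assuming the Keras defaults used above: unpadded stride-1 convolutions and non-overlapping 2x2 pooling. The helper names are hypothetical and exist only for this illustration.

```python
def conv_out(size: int, kernel: int = 3) -> int:
    """Output side length of an unpadded, stride-1 convolution."""
    return size - kernel + 1


def pool_out(size: int, pool: int = 2) -> int:
    """Output side length of non-overlapping max pooling (floor division)."""
    return size // pool


def conv_params(kernel: int, in_channels: int, filters: int) -> int:
    """Kernel weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters


side, channels, total_params = 130, 3, 0
for filters in (32, 64, 64):
    total_params += conv_params(3, channels, filters)
    side = pool_out(conv_out(side))   # 130 -> 64 -> 31 -> 14
    channels = filters

flat_features = side * side * channels
total_params += (flat_features + 1) * 128   # Dense(128): weights + biases
total_params += (128 + 1) * 1               # Dense(1): weights + bias

print(side, flat_features, total_params)
```

Most of the parameters sit in the first dense layer, because flattening a 14x14x64 volume produces 12,544 inputs for each of its 128 units.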

## Early Stopping

In [None]:
from tensorflow.keras.callbacks import EarlyStopping

In [None]:
early_stop = EarlyStopping(monitor='val_loss',patience=2)
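
To see what `patience=2` means, here is a simplified sketch of the stopping rule in plain Python. It ignores options such as `min_delta` and is not the Keras source; the function name `early_stop_epoch` and the loss values are made up for illustration.

```python
from typing import Optional, Sequence


def early_stop_epoch(val_losses: Sequence[float], patience: int) -> Optional[int]:
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # Improvement: remember it and reset the counter
            best = loss
            wait = 0
        else:
            # No improvement: stop once this has happened `patience` times in a row
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up validation losses, for illustration only
print(early_stop_epoch([1.0, 0.8, 0.9, 0.85, 0.7], patience=2))
print(early_stop_epoch([1.0, 0.8], patience=2))
```

In the first run, training stops at epoch index 3 because epochs 2 and 3 both failed to beat 0.8, so the later 0.7 is never reached. The second run ends before the patience is exhausted, which is also what happens whenever the number of training epochs is small.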

## Training the Model

In [None]:
help(image_gen.flow_from_directory)

In [None]:
batch_size = 16

In [None]:
train_image_gen = image_gen.flow_from_directory(train_path,
                                                target_size=image_shape[:2],
                                                color_mode='rgb',
                                                batch_size=batch_size,
                                                class_mode='binary')

In [None]:
# shuffle=False keeps the prediction order aligned with test_image_gen.classes
test_image_gen = image_gen.flow_from_directory(test_path,
                                               target_size=image_shape[:2],
                                               color_mode='rgb',
                                               batch_size=batch_size,
                                               class_mode='binary',
                                               shuffle=False)

In [None]:
train_image_gen.class_indices

In [None]:
import warnings
warnings.filterwarnings('ignore')

In [None]:
results = model.fit(train_image_gen,
                    epochs=1,
                    validation_data=test_image_gen,
                    callbacks=[early_stop])

In [None]:
from tensorflow.keras.models import load_model
# Save the trained model; reload it later with load_model('malaria_detector.h5')
model.save('malaria_detector.h5')

# Evaluating the Model

In [None]:
# model.fit returned a History object; its .history dict holds the per-epoch metrics
losses = pd.DataFrame(results.history)

In [None]:
losses[['loss','val_loss']].plot()
plt.show()

In [None]:
model.metrics_names

In [None]:
model.evaluate(test_image_gen)

In [None]:
from tensorflow.keras.preprocessing import image

In [None]:
# https://datascience.stackexchange.com/questions/13894/how-to-get-predictions-with-predict-generator-on-streaming-test-data-in-keras
pred_probabilities = model.predict(test_image_gen)

In [None]:
pred_probabilities

In [None]:
test_image_gen.classes

In [None]:
# Threshold the sigmoid outputs at 0.5 to get boolean class predictions
predictions = pred_probabilities > 0.5

In [None]:
# Numpy can treat this as True/False for us
predictions

In [None]:
from sklearn.metrics import classification_report,confusion_matrix

In [None]:
print(classification_report(test_image_gen.classes,predictions))

In [None]:
confusion_matrix(test_image_gen.classes,predictions)
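
The figures in the classification report come straight from the confusion matrix counts. The sketch below works through that arithmetic in plain Python on made-up labels and probabilities, treating class 1 as the positive class. The helper `confusion_counts` is hypothetical; scikit-learn lays the same counts out as `[[tn, fp], [fn, tp]]`.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels, treating 1 as positive."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth and pred:
            tp += 1
        elif truth and not pred:
            fn += 1
        elif pred:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


# Made-up labels and sigmoid outputs, for illustration only
y_true = [0, 0, 1, 1, 1, 0]
probabilities = [0.1, 0.6, 0.8, 0.4, 0.9, 0.2]
y_pred = [p > 0.5 for p in probabilities]

tn, fp, fn, tp = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)   # of everything predicted positive, how much was right
recall = tp / (tp + fn)      # of everything actually positive, how much was found

print((tn, fp, fn, tp), round(precision, 3), round(recall, 3))
```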

# Predicting on an Image

In [None]:
# Your file path will be different!
para_cell

In [None]:
# target_size expects (height, width), so drop the channel dimension
my_image = image.load_img(para_cell,target_size=image_shape[:2])


In [None]:
my_image

In [None]:
type(my_image)

In [None]:
my_image = image.img_to_array(my_image)

In [None]:
type(my_image)

In [None]:
my_image.shape

In [None]:
my_image

In [None]:
my_image = np.expand_dims(my_image, axis=0)

In [None]:
my_image.shape
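
The model expects a batch of images, even when that batch holds a single image. The sketch below shows the shape change on a stand-in array of zeros, so it runs without any image file; the array names are assumptions made for the example.

```python
import numpy as np

# Stand-in for a single image with the same shape as image_shape
single_image = np.zeros((130, 130, 3), dtype=np.float32)

# axis=0 inserts a new leading dimension of size 1: the batch axis
batch_of_one = np.expand_dims(single_image, axis=0)

print(single_image.shape, batch_of_one.shape)
```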

In [None]:
my_image

In [None]:
# The generators rescaled pixels by 1/255 during training, so apply the same scaling here
model.predict(my_image / 255)

In [None]:
train_image_gen.class_indices

In [None]:
test_image_gen.class_indices
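
The sigmoid output is the predicted probability of the class with index 1, so the class indices tell us how to read it. The sketch below turns a probability into a label. The helper `label_from_probability` is hypothetical, and the mapping `assumed_indices` is an assumption based on alphabetical folder order; check it against the `class_indices` output above.

```python
def label_from_probability(prob: float, class_indices: dict, threshold: float = 0.5) -> str:
    """Return the class name for a sigmoid output, given a name-to-index mapping."""
    index_to_name = {index: name for name, index in class_indices.items()}
    predicted_index = int(prob > threshold)   # 1 if above the threshold, else 0
    return index_to_name[predicted_index]


# Assumed mapping; confirm with train_image_gen.class_indices
assumed_indices = {"parasitized": 0, "uninfected": 1}

print(label_from_probability(0.03, assumed_indices))
print(label_from_probability(0.97, assumed_indices))
```

Under this assumed mapping, a value near 0 means the model considers the cell parasitized, and a value near 1 means uninfected.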

# Great Job!