## APTOS 2019 Blindness Detection Challenge
---------------------------------------
We follow these 13 basic steps:
1. [Import Libraries](#1)
1. [Loading Data ](#2)
1. [Data Visualization](#3)
1. [Train and Test dataset](#4)
1. [Data Pre-Processing](#6)
1. [Image Data Generator](#7)
1. [Model Architecture Design](#8)
1. [Keras Callback Functions](#9)
1. [Transfer Learning](#10)
1. [Validation Accuracy & Loss](#11)
1. [Validation Accuracy](#12)
1. [Test-Time Augmentation](#13)
1. [Visualization of Test Results](#14)
------------------------------------
- Design a CNN from scratch
- Use a pre-trained model for blindness detection

Diabetic retinopathy is diagnosed in these five stages:
- No DR
- Mild
- Moderate
- Severe
- Proliferative DR

<a id="1"></a> 
# Import Libraries

**psutil** is mainly used for system monitoring (e.g., checking available RAM).

**gc** is used for garbage collection.

**tqdm** is used for showing progress bars.

**math** is used for mathematical operations.


In [None]:
import numpy as np
import pandas as pd
import os
import cv2
import PIL
import gc
import psutil
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from tensorflow import set_random_seed  # TensorFlow 1.x API
from tqdm import tqdm
from math import ceil
import math
import sys
print('all data processing libraries imported')

import keras
from keras.preprocessing.image import ImageDataGenerator, load_img, array_to_img, img_to_array
from keras.models import Sequential, Model
from keras.applications.resnet50 import ResNet50, preprocess_input
from keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D, Input
from keras.layers import Dropout, Flatten, Dense
from keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau
from keras.activations import softmax, relu, elu
from keras.optimizers import Adam, rmsprop, RMSprop, SGD
from keras.layers import BatchNormalization, LeakyReLU
gc.enable()
print('all Deep Learning processing libraries imported')
print(os.listdir("../input/"))

<a id="2"></a>
#### Exploratory Data Analysis
- Load the data
- Plot the data
- Analyze the data

Set up the global configuration: random seeds, paths, image size, batch size, and class labels.


In [None]:
SEED = 7
np.random.seed(SEED)
set_random_seed(SEED)
dir_path = "../input/aptos2019-blindness-detection/"
IMG_DIM = 250  # 224 399 #
BATCH_SIZE = 36
CHANNEL_SIZE = 3
NUM_EPOCHS = 1
TRAIN_DIR = 'train_images'
TEST_DIR = 'test_images'
FREEZE_LAYERS = 2  # number of initial layers to freeze during training
CLASS = {0: "No DR", 1: "Mild", 2: "Moderate", 3: "Severe", 4: "Proliferative DR"}

<a id="2"></a>
## Loading Data 

In [None]:
df_train = pd.read_csv(os.path.join(dir_path, "train.csv"))
df_test = pd.read_csv(os.path.join(dir_path, "test.csv"))
NUM_CLASSES = df_train['diagnosis'].nunique()
print(np.shape(df_train))
print(np.shape(df_test))
# shape[1] is the number of columns, not classes, so use NUM_CLASSES for the class count
print("Training set has {} samples and {} classes.".format(df_train.shape[0], NUM_CLASSES))
print("Testing set has {} samples and {} columns.".format(df_test.shape[0], df_test.shape[1]))

<a id="3"></a>
# Data Visualization and Exploratory Data Analysis
> Data distribution per class



In [None]:
sample_data = df_train.diagnosis.value_counts()
print(sample_data)
sample_data.plot(kind='bar');
plt.title('Number of samples per class');
plt.show()
#plt.pie(sample_data,shadow=0, labels=["No DR", "Mild", "Moderate", "Severe", "Proliferative DR"])
#plt.title('sample in pie chart');
#plt.show()

The bar chart clearly shows that the training data is imbalanced: class ‘No DR’ has more than 1,750 records, while class ‘Severe’ has fewer than 250. Because the dataset is so imbalanced, we augment the data.

There are many ways to augment image data, for example flips, rotations, zooms, and shears (the transformations used in the Image Data Generator section).
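Augmentation applied uniformly to every class adds variety but does not by itself change the class ratios. A common complement is to weight the loss by inverse class frequency. The sketch below computes "balanced" weights with the same formula scikit-learn uses, `n_samples / (n_classes * count)`. The helper name `balanced_class_weights` is our own, and passing its result as `fit_generator(..., class_weight=...)` is an assumption about how you might wire it in; the keys would need to be the integer indices in `train_generator.class_indices`.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight} with weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    # Rare classes get large weights, frequent classes get small ones
    return {cls: n_samples / (n_classes * count) for cls, count in sorted(counts.items())}
```

For labels `[0, 0, 0, 1]` this gives class 0 a weight of 2/3 and class 1 a weight of 2.0, so each minority sample counts three times as much in the loss.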


<a id="4"></a>
### Train and Test dataset 


In [None]:
# Train & Test samples ratio
# Plot Data
labels = 'Train', 'Test'
sizes = df_train.shape[0], df_test.shape[0]
colors = 'red', 'blue'
plt.pie(sizes, labels=labels, colors=colors, autopct='%1.1f%%', shadow=False)
plt.show()

<a id="6"></a>
#### Split the dataset using `train_test_split`

In [None]:
x_train, x_test, y_train, y_test = train_test_split(df_train.id_code, df_train.diagnosis, test_size=0.3,
                                                    random_state=SEED, stratify=df_train.diagnosis)
# Print the first rows of each split to get a better idea of the data
print(x_train.head())
print(y_train.head())
print(x_test.head())
print(y_test.head())
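Passing `stratify=df_train.diagnosis` makes `train_test_split` keep each class's share the same in both splits. The pure-Python sketch below illustrates the idea (shuffle within each class, then move `test_size` of it to the test side). It is a simplified illustration, not scikit-learn's exact allocation algorithm, and `stratified_split` is a hypothetical helper.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.3, seed=7):
    """Return (train_idx, test_idx) with each class split in the same proportion."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)  # group sample indices by class
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)  # random order within the class
        n_test = round(len(idx) * test_size)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)
```

With 10 samples of class 0 and 20 of class 1 and `test_size=0.3`, the test side receives 3 and 6 samples respectively, preserving the 1:2 ratio.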


Data visualization gives better insight into the data before modeling.

The `plot_img` function below takes three parameters: `img` (a DataFrame with up to 12 rows to plot), `target_dir` (the image folder), and `class_label` (the plot title).

In [None]:
def plot_img(img, target_dir, class_label='0'):
    """Plot up to 12 images from the DataFrame `img` in a 2 x 6 grid."""
    fig, axis = plt.subplots(2, 6, figsize=(15, 6))
    # enumerate gives the running position (0..11) used to place each image in the grid
    for pos, (_, record) in enumerate(img.iterrows()):
        img_path = os.path.join(dir_path, target_dir, f"{record['id_code']}.png")
        image = cv2.imread(img_path)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; matplotlib expects RGB
        axis[pos // 6, pos % 6].imshow(image)
    plt.suptitle(class_label)
    plt.show()

In [None]:
# Plot 12 training images for every class, including Proliferative DR
for CLASS_ID in sorted(CLASS):
    plot_img(df_train[df_train.diagnosis == CLASS_ID].head(12), 'train_images', CLASS[CLASS_ID])

Observations from the sample images:
- The images do not share a standard shape, so we need to resize them.
- Some images are very small and others are very large.
- Some have large black areas; for example, image [1,2] in the Proliferative DR grid has a lot of black area. This area is not relevant to the problem, so we may need to crop the images.


In [None]:
CLASS_ID = 'Test DataSet'
plot_img(df_test.sample(12, random_state=SEED), 'test_images', CLASS_ID)

- In the test data, some images are larger and some have black areas, so the test images also need pre-processing.
- We may need to create our own image generator.

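`sys.maxsize` is the largest value a Python index (`Py_ssize_t`) can take, so it is an upper bound on the number of elements an array (here, an image) can hold.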

<a id="7"></a>
### Grayscale Images
We convert the retina images to grayscale so that the region of interest (ROI) is easier to see.

- `Series.unique()` returns the unique values in order of appearance (unlike `np.unique`, it does not sort them).
- `.loc` accesses a group of rows and columns by label(s) or a boolean array.
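`cv2.COLOR_BGR2GRAY` computes a weighted sum of the color channels; OpenCV documents the weights as Y = 0.299 R + 0.587 G + 0.114 B. A NumPy sketch of the same formula, assuming a BGR image as returned by `cv2.imread` (`bgr_to_gray` is our own name, and OpenCV additionally rounds the result back to `uint8`):

```python
import numpy as np


def bgr_to_gray(img_bgr):
    """Weighted channel sum used for BGR -> grayscale conversion (float result)."""
    # cv2.imread stores channels in B, G, R order
    b, g, r = img_bgr[..., 0], img_bgr[..., 1], img_bgr[..., 2]
    # Green contributes most because the eye is most sensitive to it
    return 0.299 * r + 0.587 * g + 0.114 * b
```

A pure red pixel maps to about 76, a pure green pixel to about 150, which is why green-rich retinal detail stays visible after conversion.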

In [None]:
# Display five random grayscale images per class
figure = plt.figure(figsize=(20, 16))
for target_class in y_train.unique():
    for i, (idx, row) in enumerate(
            df_train.loc[df_train.diagnosis == target_class].sample(5, random_state=SEED).iterrows()):
        axis = figure.add_subplot(5, 5, target_class * 5 + i + 1)
        image = f"../input/aptos2019-blindness-detection/train_images/{row['id_code']}.png"
        img = cv2.imread(image)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        img = cv2.resize(img, (IMG_DIM, IMG_DIM))
        # Subtract the local average (Gaussian-blurred) intensity to highlight fine detail; applied once, after resizing
        img = cv2.addWeighted(img, 4, cv2.GaussianBlur(img, (0, 0), IMG_DIM / 10), -4, 128)
        plt.imshow(img, cmap='gray')
        axis.set_title(CLASS[target_class])


The plot clearly shows that image [0,1] has a large black region around the eyeball. This region is just noise that adds no value for the model, so we need to remove it. In the next iteration, I will work on cropping the black area from the images.

## Image Cropping
Some images have large blank (black) spaces. They only consume computation and add noise to the model, so it is better to crop the blank space from the images.
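One common way to crop the blank border is to keep only the bounding box of pixels brighter than a small tolerance. The sketch below is a hypothetical helper (`crop_black_border`; the tolerance `tol=7` is chosen for illustration, not tuned on this dataset). It uses the channel mean as a stand-in for a true grayscale conversion so that it only needs NumPy.

```python
import numpy as np


def crop_black_border(img, tol=7):
    """Crop to the bounding box of pixels whose (mean) intensity exceeds `tol`."""
    # Grayscale images are used as-is; color images use the channel mean as an intensity proxy
    intensity = img if img.ndim == 2 else img.mean(axis=2)
    mask = intensity > tol
    if not mask.any():
        return img  # completely dark image: nothing to crop to, return it unchanged
    rows = np.flatnonzero(mask.any(axis=1))  # rows containing at least one bright pixel
    cols = np.flatnonzero(mask.any(axis=0))  # columns containing at least one bright pixel
    return img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
```

It could be applied right after `cv2.imread` and before `cv2.resize`, so that the resized image is filled by the retina rather than by the background.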


In [None]:
# Display five random images per class in grayscale (color channels removed) with local-contrast enhancement.
# Each title shows the class name, the DataFrame index, and the image id.
figure = plt.figure(figsize=(20, 16))
for target_class in y_train.unique():
    for i, (idx, row) in enumerate(
            df_train.loc[df_train.diagnosis == target_class].sample(5, random_state=SEED).iterrows()):
        ax = figure.add_subplot(5, 5, target_class * 5 + i + 1)
        imagefile = f"../input/aptos2019-blindness-detection/train_images/{row['id_code']}.png"
        img = cv2.imread(imagefile)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        img = cv2.resize(img, (IMG_DIM, IMG_DIM))
        img = cv2.addWeighted(img, 4, cv2.GaussianBlur(img, (0, 0), IMG_DIM / 10), -4, 128)
        plt.imshow(img, cmap='gray')
        ax.set_title('%s-%d-%s' % (CLASS[target_class], idx, row['id_code']))
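The line `cv2.addWeighted(img, 4, cv2.GaussianBlur(img, (0, 0), IMG_DIM / 10), -4, 128)` computes `4*img - 4*blurred + 128` and saturates the result to [0, 255]. Subtracting the blurred (local average) image removes slow lighting variations and leaves local detail centered on gray level 128; this preprocessing is commonly attributed to Ben Graham's solution to the 2015 Kaggle Diabetic Retinopathy competition. A NumPy sketch of the arithmetic, assuming the blurred image is already computed (`add_weighted_u8` is our own name, and rounding may differ from OpenCV in edge cases):

```python
import numpy as np


def add_weighted_u8(src1, alpha, src2, beta, gamma):
    """saturate(src1*alpha + src2*beta + gamma) for uint8 images, like cv2.addWeighted."""
    out = src1.astype(np.float64) * alpha + src2.astype(np.float64) * beta + gamma
    # Round to the nearest integer, then clip to the valid uint8 range
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

For example, a pixel of 100 over a local average of 90 becomes 4*100 - 4*90 + 128 = 168, while a pixel equal to its local average maps to 128 (flat gray).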


In [None]:
# print("available RAM:", psutil.virtual_memory())
gc.collect()
# print("available RAM:", psutil.virtual_memory())

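# flow_from_dataframe expects full file names, so append the ".png" extension
df_train.id_code = df_train.id_code.apply(lambda x: x + ".png")
df_test.id_code = df_test.id_code.apply(lambda x: x + ".png")
# class_mode="categorical" requires string labels
df_train['diagnosis'] = df_train['diagnosis'].astype('str')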


<a id="7"></a>
# Image Data Generator
In this section we use the Keras `ImageDataGenerator` class to feed data to the model. Instead of enlarging the dataset on disk, it "augments" images on the fly through a number of random transformations, so the model (almost) never sees exactly the same picture twice.

Training a deep learning model usually works better with more data, and augmentation creates variations of the existing data that can improve the model's ability to generalize to new images.
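To make the on-the-fly idea concrete, here is a minimal NumPy generator that reshuffles every epoch and randomly flips each image horizontally and vertically, similar in spirit to `horizontal_flip=True, vertical_flip=True` below. It is an illustration only (the name `augment_batches` and the 50% flip probability are our assumptions), not how `ImageDataGenerator` is implemented.

```python
import numpy as np


def augment_batches(images, labels, batch_size, seed=0):
    """Yield (batch_x, batch_y) forever; each image is randomly flipped on the fly.

    images: array of shape (N, H, W, C); labels: array with N entries.
    """
    rng = np.random.default_rng(seed)
    n = len(images)
    while True:
        order = rng.permutation(n)  # new random order every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            out = []
            for img in images[idx]:
                if rng.random() < 0.5:
                    img = img[:, ::-1]  # horizontal flip (reverse the width axis)
                if rng.random() < 0.5:
                    img = img[::-1, :]  # vertical flip (reverse the height axis)
                out.append(img)
            # np.stack copies the (possibly flipped) views into a fresh batch array
            yield np.stack(out), labels[idx]
```

The stored images are never modified; every batch is a freshly transformed copy, which is why the model rarely sees the exact same input twice.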



In [None]:
# Create an ImageDataGenerator instance for previewing augmentations
# (no rescaling, so the output can be displayed directly as uint8 images)
datagenerator = ImageDataGenerator(horizontal_flip=True,
                                   vertical_flip=True,
                                   rotation_range=40,
                                   zoom_range=0.2,
                                   shear_range=0.1,
                                   fill_mode='nearest')

In [None]:
imgPath = "../input/aptos2019-blindness-detection/train_images/cd54d022e37d.png"
# Load one image and add a batch dimension: (H, W, C) -> (1, H, W, C)
img = load_img(imgPath)
data = img_to_array(img)
samples = np.expand_dims(data, 0)
it = datagenerator.flow(samples, batch_size=1)
# Show five randomly augmented versions of the same image
for i in range(5):
    plt.subplot(230 + 1 + i)
    batch = next(it)
    image = batch[0].astype('uint8')
    plt.imshow(image)
plt.show()

In [None]:
train_datagen = ImageDataGenerator(rescale=1. / 255, 
                                         validation_split=0.15, 
                                         horizontal_flip=True,
                                         vertical_flip=True, 
                                         rotation_range=40, 
                                         zoom_range=0.2, 
                                         shear_range=0.1,
                                        fill_mode='nearest')
# valid_datagen=image.ImageDataGenerator(rescale=1./255)

In [None]:
# Training subset: 85% of df_train, because train_datagen uses validation_split=0.15
train_generator = train_datagen.flow_from_dataframe(dataframe=df_train,
                                                    directory="../input/aptos2019-blindness-detection/train_images/",
                                                    x_col="id_code",
                                                    y_col="diagnosis",
                                                    batch_size=BATCH_SIZE,
                                                    class_mode="categorical",
                                                    target_size=(IMG_DIM, IMG_DIM),
                                                    subset='training',
                                                    shuffle=True,
                                                    seed=SEED)
# Validation subset: the remaining 15%; it comes from train_datagen, so it is augmented as well
valid_generator = train_datagen.flow_from_dataframe(dataframe=df_train,
                                                    directory="../input/aptos2019-blindness-detection/train_images/",
                                                    x_col="id_code",
                                                    y_col="diagnosis",
                                                    batch_size=BATCH_SIZE,
                                                    class_mode="categorical",
                                                    target_size=(IMG_DIM, IMG_DIM),
                                                    subset='validation',
                                                    shuffle=True,
                                                    seed=SEED)
del x_train
# # del x_test
del y_train
# del y_test
gc.collect()
#  color_mode= "grayscale",


<a id="8"></a>
# Model Architecture Design

In [None]:
def design_model():
    model = Sequential()
    model.add(Conv2D(filters=16, kernel_size=(2, 2), input_shape=[IMG_DIM, IMG_DIM, CHANNEL_SIZE], activation=relu))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(filters=32, kernel_size=(2, 2), activation=relu))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(filters=64, kernel_size=(2, 2), activation=relu))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(GlobalAveragePooling2D())
    model.add(Dense(units=1000, activation=relu))
    model.add(Dropout(rate=0.2))
    model.add(Dense(units=1000, activation=relu))
    model.add(Dropout(rate=0.2))
    model.add(Dense(NUM_CLASSES, activation='softmax'))  # one output per DR stage
    return model


model = design_model()
# model.summary()
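As a sanity check on `design_model()`, the feature-map sizes and parameter count can be worked out by hand. Keras `Conv2D` and `MaxPooling2D` default to `'valid'` padding, so a 2x2 convolution shrinks each side by 1 and 2x2 pooling halves it (rounding down). The sketch below traces this for `IMG_DIM = 250`, 3 channels, and 5 classes; the helper names are our own, and `model.summary()` should report the same numbers, but treat this as a back-of-the-envelope check.

```python
def conv_out(size, kernel):
    """Output side length of a 'valid', stride-1 convolution (Keras Conv2D defaults)."""
    return size - kernel + 1


def pool_out(size, pool):
    """Output side length of 'valid' max pooling with stride = pool size (Keras defaults)."""
    return (size - pool) // pool + 1


def trace_design_model(img_dim=250, channels=3, n_classes=5):
    """Return ([(h, w, c) after each conv+pool block], total parameter count)."""
    size, in_ch, params = img_dim, channels, 0
    shapes = []
    for filters in (16, 32, 64):
        size = conv_out(size, 2)
        params += 2 * 2 * in_ch * filters + filters  # 2x2 kernel weights + biases
        size = pool_out(size, 2)
        in_ch = filters
        shapes.append((size, size, filters))
    features = in_ch  # GlobalAveragePooling2D keeps one value per channel
    for units in (1000, 1000, n_classes):  # Dropout layers add no parameters
        params += features * units + units
        features = units
    return shapes, params
```

This gives 124x124x16, 61x61x32 and 30x30x64 feature maps and 1,081,549 parameters in total, of which 1,066,000 sit in the two 1000-unit `Dense` layers.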


### Compile model

In [None]:
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd,metrics=["accuracy"])

<a id="9"></a>
# Keras Callback Functions
- Callback functions: early stopping and learning-rate reduction

In [None]:
# Stop training if val_loss has not improved by at least min_delta for 3 epochs
eraly_stop = EarlyStopping(monitor='val_loss', min_delta=0.0001, patience=3, verbose=1, mode='auto')
# Reduce the learning rate by 10x if val_loss has not improved for 2 epochs
reduce_lr = ReduceLROnPlateau(monitor='val_loss', min_delta=0.0004, patience=2, factor=0.1, min_lr=1e-6, mode='auto',
                              verbose=1)
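To see how the two callbacks interact, the pure-Python sketch below replays a list of validation losses through simplified versions of their rules: an epoch counts as an improvement only if `val_loss < best - min_delta`; `ReduceLROnPlateau` multiplies the learning rate by `factor` (never going below `min_lr`) after `patience` epochs without improvement, and `EarlyStopping` stops after its own `patience`. It ignores options such as `cooldown`, the starting rate `lr=1e-3` mirrors the ResNet cell, and `simulate_callbacks` is a hypothetical helper, not part of Keras.

```python
def simulate_callbacks(val_losses, lr=1e-3,
                       lr_patience=2, lr_factor=0.1, lr_min_delta=4e-4, min_lr=1e-6,
                       stop_patience=3, stop_min_delta=1e-4):
    """Replay val_loss values through simplified ReduceLROnPlateau + EarlyStopping rules.

    Returns (learning rate used in each epoch, index of the epoch after which
    training stops, or None if it runs to the end). Cooldown is not modeled.
    """
    best_lr, wait_lr = float("inf"), 0      # ReduceLROnPlateau state
    best_stop, wait_stop = float("inf"), 0  # EarlyStopping state
    lrs = []
    for epoch, loss in enumerate(val_losses):
        lrs.append(lr)  # learning rate in effect during this epoch
        # ReduceLROnPlateau: cut the LR after `lr_patience` epochs without improvement
        if loss < best_lr - lr_min_delta:
            best_lr, wait_lr = loss, 0
        else:
            wait_lr += 1
            if wait_lr >= lr_patience and lr > min_lr:
                lr = max(lr * lr_factor, min_lr)
                wait_lr = 0
        # EarlyStopping: stop after `stop_patience` epochs without improvement
        if loss < best_stop - stop_min_delta:
            best_stop, wait_stop = loss, 0
        else:
            wait_stop += 1
            if wait_stop >= stop_patience:
                return lrs, epoch
    return lrs, None
```

With the notebook's settings, a loss that plateaus after epoch 1 (`[1.0, 0.9, 0.9, 0.9, 0.9]`, counting epochs from 0) has its learning rate cut to 1e-4 after epoch 3, and training stops after epoch 4.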

In [None]:
# Number of full batches per epoch (floor division drops the last partial batch)
NUB_TRAIN_STEPS = train_generator.n // train_generator.batch_size
NUB_VALID_STEPS = valid_generator.n // valid_generator.batch_size

NUB_TRAIN_STEPS, NUB_VALID_STEPS

In [None]:
# Optional: train the from-scratch CNN (disabled; the ResNet50 model below is trained instead)
# model.fit_generator(generator=train_generator,
#                     validation_data=valid_generator,
#                     steps_per_epoch=NUB_TRAIN_STEPS,
#                     validation_steps=NUB_VALID_STEPS,
#                     verbose=1,
#                     callbacks=[eraly_stop, reduce_lr],
#                     use_multiprocessing=True,
#                     workers=3,
#                     shuffle=True,
#                     max_queue_size=16,
#                     epochs=NUM_EPOCHS)


<a id="10"></a>
# Transfer Learning 

In [None]:
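def create_resnet(img_dim, CHANNEL, n_class):
    input_tensor = Input(shape=(img_dim, img_dim, CHANNEL))

    # ResNet50 backbone without its ImageNet classifier head; pre-trained weights come from a Kaggle input dataset.
    # Note: the data generators rescale pixels by 1/255 instead of applying resnet50's preprocess_input.
    base_model = ResNet50(weights=None, include_top=False, input_tensor=input_tensor)
    base_model.load_weights('../input/resnet50weightsfile/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5')
#     base_model.load_weights('../input/restnet101/resnet101_weights_tf.h5')

    # New classification head: pooled features -> Dense/Dropout/BatchNorm blocks -> softmax over n_class stages
    x = GlobalAveragePooling2D()(base_model.output)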
    x = Dropout(0.4)(x)
    x = Dense(2048, activation=elu)(x)
    x = Dropout(0.4)(x)
    x = BatchNormalization()(x)
    x = Dense(1024, activation=elu)(x)
    x = Dropout(0.4)(x)
    x = BatchNormalization()(x)
    x = Dense(512, activation=elu)(x)
    x = Dropout(0.4)(x)
    x = BatchNormalization()(x)
    output_layer = Dense(n_class, activation='softmax', name="Output_Layer")(x)
    model_resnet = Model(input_tensor, output_layer)

    return model_resnet

model_resnet = create_resnet(IMG_DIM, CHANNEL_SIZE, NUM_CLASSES)

In [None]:
# # Layers
# for i, lay in enumerate(model_resnet.layers):
#     print(i, lay.name)

# Train all layers (backbone and new head)
for layer in model_resnet.layers:
    layer.trainable = True


In [None]:
lr = 1e-3
optimizer = SGD(lr=lr, decay=1e-6, momentum=0.9, nesterov=True) # Adam(lr=lr, decay=0.01) 
model_resnet.compile(optimizer=optimizer, loss=keras.losses.categorical_crossentropy, metrics=['accuracy'])
# model_resnet.summary()
gc.collect()


In [None]:
history = model_resnet.fit_generator(generator=train_generator,
                                     steps_per_epoch=NUB_TRAIN_STEPS,
                                     validation_data=valid_generator,
                                     validation_steps=NUB_VALID_STEPS,
                                     epochs=NUM_EPOCHS,
                                     #                            shuffle=True,  
                                     callbacks=[eraly_stop, reduce_lr],
                                     verbose=2)
gc.collect()


<a id="11"></a>
# Display Validation Accuracy & Loss


In [None]:
history.history.keys()

In [None]:
# Older Keras versions log accuracy as 'acc'; newer ones use 'accuracy'
acc_key = 'acc' if 'acc' in history.history else 'accuracy'
accu = history.history[acc_key]
val_acc = history.history['val_' + acc_key]

plt.plot(accu, label="acc")
plt.plot(val_acc, label="val_acc")
plt.plot(np.argmax(val_acc), np.max(val_acc), marker="x", color="r", label="best model")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.legend()
plt.show()



In [None]:
plt.figure(figsize=(8, 8))
plt.title("Learning curve")
plt.plot(history.history["loss"], label="loss")
plt.plot(history.history["val_loss"], label="val_loss")
plt.plot(np.argmin(history.history["val_loss"]), np.min(history.history["val_loss"]), marker="x", color="r",
         label="best model")
plt.xlabel("Epochs")
plt.ylabel("log_loss")
plt.legend();


<a id="12"></a>
## Validation Accuracy

In [None]:
# evaluate_generator returns the loss followed by the compiled metrics (here, accuracy)
eval_loss, eval_accuracy = model_resnet.evaluate_generator(generator=valid_generator, steps=NUB_VALID_STEPS,
                                                           use_multiprocessing=False)
print("[INFO] accuracy: {:.2f}%".format(eval_accuracy * 100))
print("[INFO] Loss: {}".format(eval_loss))


In [None]:
# Random horizontal flips give each test-time-augmentation pass a slightly different view of every image
test_datagen = ImageDataGenerator(rescale=1. / 255, horizontal_flip=True)

test_generator = test_datagen.flow_from_dataframe(dataframe=df_test,
                                                  directory="../input/aptos2019-blindness-detection/test_images/",
                                                  x_col="id_code",
                                                  target_size=(IMG_DIM, IMG_DIM),
                                                  batch_size=1,
                                                  shuffle=False,
                                                  class_mode=None,
                                                  seed=SEED)
# del df_test
print(df_test.shape[0])
# del train_datagen
# del train_generator
gc.collect()



<a id="13"></a>
# Test-Time Augmentation
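In this section we apply test-time augmentation (TTA) to try to improve prediction accuracy: the test generator randomly flips each image horizontally, the model predicts on several such passes over the test set, and the predictions are averaged.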
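The averaging step can be sketched with NumPy alone. Here `predict_fn` stands in for `model_resnet.predict` (an assumption for illustration), only random horizontal flips are applied, matching `test_datagen`, and `tta_predict` is a hypothetical helper.

```python
import numpy as np


def tta_predict(predict_fn, images, n_steps=5, seed=0):
    """Average predict_fn outputs over n_steps randomly flipped copies of `images` (N, H, W, C)."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_steps):
        flip = rng.random(len(images)) < 0.5  # independent coin flip per image
        # Reverse the width axis (axis 2) only for the images selected by `flip`
        batch = np.where(flip[:, None, None, None], images[:, :, ::-1], images)
        preds.append(predict_fn(batch))  # (N, n_classes) class probabilities
    return np.mean(preds, axis=0)  # average over passes, like np.mean(preds_tta, axis=0)
```

Taking `np.argmax(..., axis=1)` of the averaged probabilities then gives one predicted stage per image.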

In [None]:
tta_steps = 5
preds_tta = []
for i in tqdm(range(tta_steps)):
    test_generator.reset()  # restart from the first test image on every pass
    # One step per batch; with batch_size=1 this equals the number of test images
    preds = model_resnet.predict_generator(generator=test_generator,
                                           steps=ceil(df_test.shape[0] / test_generator.batch_size))
    preds_tta.append(preds)


In [None]:
final_pred = np.mean(preds_tta, axis=0)
predicted_class_indices = np.argmax(final_pred, axis=1)
len(predicted_class_indices)


In [None]:
results = pd.DataFrame({"id_code": test_generator.filenames, "diagnosis": predicted_class_indices})
results.id_code = results.id_code.apply(lambda x: x[:-4])  # strip the ".png" extension
results.to_csv("submission.csv", index=False)


<a id="14"></a>
# Visualization of Test Results
- This section visualizes the distribution of predicted classes on the test data.

In [None]:
results['diagnosis'].value_counts().plot(kind='bar')
plt.title('Test Samples Per Class')

