## About this kernel

In this kernel, we will explore the complete workflow for the APTOS 2019 competition. We will go through:

1. Loading & Exploration: A quick overview of the dataset
2. Resize Images: We will resize both the training and test images to 224x224, matching the input size used for ImageNet pre-training.
3. Mixup & Data Generator: We show how to create a data generator that applies random transformations to our dataset (vertical/horizontal flips, zoom, brightness changes). This helps the model generalize, since the training set is fairly small (3,662 images).
4. Quadratic Weighted Kappa: A thorough overview of the metric used for this competition, with an intuitive example. Check it out!
5. Model: We will use a DenseNet-121 pre-trained on ImageNet. We will fine-tune it using Adam for 20 epochs, and evaluate it on a held-out validation set.
6. Training & Evaluation: We take a look at the change in loss and QWK score through the epochs.

### Unused Methods

In versions V15-V18 of this kernel, I ablated a few of the methods presented here. The highest LB score was achieved after I removed:
* Mixup
* Optimized Threshold

I kept them in the kernel in case they are useful to you.

### Citations & Resources

* I got the idea of using mixup from [KeepLearning's ResNet50 baseline](https://www.kaggle.com/mathormad/aptos-resnet50-baseline). Since that implementation was in PyTorch, I used an [open-source Keras implementation](https://github.com/yu4u/mixup-generator) instead.
* The transfer learning procedure is mostly inspired by my [previous kernel for iWildCam](https://www.kaggle.com/xhlulu/densenet-transfer-learning-iwildcam-2019). The workflow has, however, been heavily modified since then.
* I used a [method similar to Abhishek's](https://www.kaggle.com/abhishek/optimizer-for-quadratic-weighted-kappa) to find the optimal threshold.
* [Lex's kernel](https://www.kaggle.com/lextoumbourou/blindness-detection-resnet34-ordinal-targets) prompted me to try multilabel instead of multiclass classification, which slightly improved the kappa score.

In [None]:
import json
import math
import os

import cv2
from PIL import Image
import numpy as np
from keras import layers
from keras.applications import DenseNet121
from keras.callbacks import Callback, ModelCheckpoint
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.optimizers import Adam
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import cohen_kappa_score, accuracy_score
import scipy.optimize  # makes scipy.optimize.minimize available for the threshold search
import tensorflow as tf
from tqdm import tqdm,tqdm_notebook

resize_size = 224
%matplotlib inline

Set random seed for reproducibility.

In [None]:
np.random.seed(2019)
tf.set_random_seed(2019)

# Loading & Exploration

In [None]:
train_df = pd.read_csv('../input/aptos2019-blindness-detection/train.csv')
test_df = pd.read_csv('../input/aptos2019-blindness-detection/test.csv')
print(train_df.shape)
print(test_df.shape)
train_df.head()

In [None]:
train_df['diagnosis'].hist()
train_df['diagnosis'].value_counts()

### Displaying some Sample Images

In [None]:
def display_samples(df, columns=3, rows=1, diag=1,start=0.5):
    fig=plt.figure(figsize=(5*columns, 4*rows))

    df = df[df['diagnosis']==diag]
    df=df.reset_index()
    
    l=int(len(df)*start)
    
    for i in range(columns*rows):
        #id of each pic
        image_path = df.loc[l+i,'id_code']
        #severity of each pic
        image_id = df.loc[l+i,'diagnosis']
        img = cv2.imread(f'../input/aptos2019-blindness-detection/train_images/{image_path}.png')
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        
        fig.add_subplot(rows, columns, i+1)
        plt.title(image_id)
        plt.imshow(img)
    
    plt.tight_layout()
    
    
for i in range(5):
    display_samples(train_df, diag=i, start=0.5)

# Resize Images

We will resize the images to 224x224, then create a single numpy array to hold the data.

In [None]:
def get_pad_width(im, new_shape, is_rgb=True):
    pad_diff = new_shape - im.shape[0], new_shape - im.shape[1]
    t, b = math.floor(pad_diff[0]/2), math.ceil(pad_diff[0]/2)
    l, r = math.floor(pad_diff[1]/2), math.ceil(pad_diff[1]/2)
    if is_rgb:
        pad_width = ((t,b), (l,r), (0, 0))
    else:
        pad_width = ((t,b), (l,r))
    return pad_width

def preprocess_image(image_path, desired_size=224):
    im = Image.open(image_path)
    im = im.resize((desired_size, )*2, resample=Image.LANCZOS)
    
    return im

In [None]:
#resize the train pics to 224*224*3

N = train_df.shape[0]
x_train_rsz = np.empty((N, resize_size, resize_size, 3), dtype=np.uint8)

for i, image_id in enumerate(tqdm_notebook(train_df['id_code'])):
    x_train_rsz[i, :, :, :] = preprocess_image(
        f'../input/aptos2019-blindness-detection/train_images/{image_id}.png',desired_size = resize_size
    )

In [None]:
#resize the test pics to 224*224*3

N = test_df.shape[0]
x_test_rsz = np.empty((N, resize_size, resize_size, 3), dtype=np.uint8)

for i, image_id in enumerate(tqdm_notebook(test_df['id_code'])):
    x_test_rsz[i, :, :, :] = preprocess_image(
        f'../input/aptos2019-blindness-detection/test_images/{image_id}.png',desired_size = resize_size
    )

# Subtract the local average color
I blur the resized image with a Gaussian filter, and then subtract the blurred image from the original one.
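
The arithmetic behind this step can be sketched in plain NumPy. This is a minimal illustration rather than the kernel's own code: `subtract_local_average` is a hypothetical helper, and the blurred image is passed in instead of being computed, so the sketch does not depend on OpenCV. The gain of 4 and the offset of 128 mirror the `cv2.addWeighted(a, 4, b, -4, 128)` call used in this section.

```python
import numpy as np

def subtract_local_average(image, blurred, gain=4.0, offset=128.0):
    """Hypothetical helper: gain * (image - blurred) + offset, clipped to uint8.

    `blurred` is assumed to be a smoothed copy of `image` (the local average).
    """
    out = gain * image.astype(np.float32) - gain * blurred.astype(np.float32) + offset
    # Round and clip so the result fits back into an 8-bit image.
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)

# A pixel equal to its local average maps to mid-gray (128).
flat = np.full((2, 2, 3), 100, dtype=np.uint8)
flat_out = subtract_local_average(flat, flat)

# A pixel 10 levels brighter than its surroundings is pushed 40 levels above mid-gray.
brighter = np.full((1, 1, 3), 110, dtype=np.uint8)
local_avg = np.full((1, 1, 3), 100, dtype=np.uint8)
bright_out = subtract_local_average(brighter, local_avg)
```

Regions that match their surroundings collapse to gray, while local deviations from the average are amplified, which is why fine structures stand out in the processed images.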

In [None]:
# cv2.addWeighted blends two images:
# cv2.addWeighted(src1, alpha, src2, beta, gamma) = src1*alpha + src2*beta + gamma
# src1: first image
# alpha: weight of the first image
# src2: second image
# beta: weight of the second image
# gamma: scalar added to each sum

scale=300
plt.figure(figsize=(20,10))

a=x_train_rsz[0]
b=cv2.GaussianBlur(a,(0,0),scale/30)
c=cv2.addWeighted(a,4,b,-4,128)
plt.subplot(2,3,1)
plt.title('original image')
plt.imshow(a)
plt.subplot(2,3,2)
plt.title('blur image')
plt.imshow(b)
plt.subplot(2,3,3)
plt.title('original-blur')
plt.imshow(c)

In [None]:
# Subtract the local average color from each training image
scale=300

N = train_df.shape[0]
x_train = np.empty((N, resize_size, resize_size, 3), dtype=np.uint8)

for i in tqdm_notebook(range(len(x_train))):
    a=x_train_rsz[i]
    x_train[i]=cv2.addWeighted(a, 4,
                               cv2.GaussianBlur(a,(0,0), scale/30), -4,128)

In [None]:
# Subtract the local average color from each test image
N = test_df.shape[0]
x_test = np.empty((N, resize_size, resize_size, 3), dtype=np.uint8)


for i in tqdm_notebook(range(len(x_test))):
    a=x_test_rsz[i]
    x_test[i]=cv2.addWeighted(a, 4,cv2.GaussianBlur(a,(0,0), scale/30), -4,128)

In [None]:
#1-hot representation
y_train = pd.get_dummies(train_df['diagnosis']).values

#x_train.shape=(3662,224,224,3)
print(x_train.shape)
#y_train.shape=(3662,5)
print(y_train.shape)
#x_test.shape=(1928,224,224,3)
print(x_test.shape)

In [None]:
def show_sample(pics,label,row=3,columns=3):
    fig=plt.figure(figsize=(5*columns, 4*row))
    
    for i in range(row*columns):
        fig.add_subplot(row, columns, i+1)
        plt.title(label[i])
        plt.imshow(pics[i])
show_sample(x_train, y_train)

## Oversampling class 4 (diag=4)

In [None]:
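# Oversample class 4: append two extra copies of every class-4 image.
# Note: this runs before the train/validation split, so copies of the
# same image can end up in both sets.
is_class4 = (y_train == np.array([0, 0, 0, 0, 1])).all(axis=1)
x_train = np.concatenate([x_train, x_train[is_class4], x_train[is_class4]])
y_train = np.concatenate([y_train, y_train[is_class4], y_train[is_class4]])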

print(len(x_train))
print(len(y_train))

## Creating multilabels

Instead of predicting a single label, we recast the target as a multilabel problem: if the target is a certain class, it also encompasses all the classes before it. For example, a class 4 retinopathy would usually be one-hot encoded as `[0, 0, 0, 0, 1]`, but in our case the target becomes `[1, 1, 1, 1, 1]`. For more details, please check out [Lex's kernel](https://www.kaggle.com/lextoumbourou/blindness-detection-resnet34-ordinal-targets).
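
As a small illustration of this encoding, the sketch below converts a class index into the cumulative target and back. The helper names `to_ordinal` and `from_ordinal` are hypothetical and not part of the kernel; the decoding rule (count the outputs above a threshold, then subtract 1) is the same one the kernel applies to model predictions later on.

```python
def to_ordinal(label, num_classes=5):
    """Hypothetical helper: class k becomes k+1 ones followed by zeros."""
    return [1 if k <= label else 0 for k in range(num_classes)]

def from_ordinal(outputs, threshold=0.5):
    """Hypothetical helper: count the outputs above the threshold, minus 1."""
    return sum(o > threshold for o in outputs) - 1

encoded = [to_ordinal(label) for label in range(5)]
decoded = [from_ordinal(row) for row in encoded]

# The same decoding works on sigmoid outputs from a model.
soft_prediction = from_ordinal([0.98, 0.91, 0.62, 0.20, 0.03])
```

Here `encoded[2]` is `[1, 1, 1, 0, 0]`, decoding recovers the original labels, and the soft prediction decodes to class 2.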

In [None]:
print(y_train.shape)

y_train_multi = np.empty(y_train.shape, dtype=y_train.dtype)
y_train_multi[:, 4] = y_train[:, 4]

for i in range(3, -1, -1):
    y_train_multi[:, i] = np.logical_or(y_train[:, i], y_train_multi[:, i+1])

print("Original y_train:", y_train.sum(axis=0))
print("Multilabel version:", y_train_multi.sum(axis=0))

Now we can split it into a training and validation set.

In [None]:
print(x_train.shape)
print(y_train_multi.shape)

x_train, x_val, y_train, y_val = train_test_split(
    x_train, y_train_multi,  
#     x_train, y_train, 
    test_size=0.15, 
    random_state=2019
)

# Mixup & Data Generator

Please Note: Although I show how to construct Mixup, **it is currently unused**. Please see notice at the top of the kernel.
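
For readers unfamiliar with mixup, the blending step at the heart of the generator can be sketched in a few lines. This is a simplified, standard-library illustration under stated assumptions: `mixup_pair` is a hypothetical helper operating on flat lists, whereas the generator in this section works on whole NumPy batches and draws one mixing weight per sample.

```python
import random

def mixup_pair(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Hypothetical helper: blend two samples with a weight drawn from Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam

rng = random.Random(2019)
x_mix, y_mix, lam = mixup_pair(
    [0.0, 0.0], [1, 0, 0, 0, 0],   # a class-0 sample (toy two-pixel "image")
    [1.0, 1.0], [1, 1, 1, 0, 0],   # a class-2 sample
    rng=rng,
)
```

Both the inputs and the targets are interpolated with the same weight, so the blended target takes fractional values wherever the two original targets disagree.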

In [None]:
class MixupGenerator():
    def __init__(self, X_train, y_train, batch_size=32, alpha=0.2, shuffle=True, datagen=None):
        self.X_train = X_train
        self.y_train = y_train
        self.batch_size = batch_size
        self.alpha = alpha
        self.shuffle = shuffle
        self.sample_num = len(X_train)
        self.datagen = datagen

    def __call__(self):
        while True:
            indexes = self.__get_exploration_order()
            itr_num = int(len(indexes) // (self.batch_size * 2))

            for i in range(itr_num):
                batch_ids = indexes[i * self.batch_size * 2:(i + 1) * self.batch_size * 2]
                X, y = self.__data_generation(batch_ids)

                yield X, y

    def __get_exploration_order(self):
        indexes = np.arange(self.sample_num)

        if self.shuffle:
            np.random.shuffle(indexes)

        return indexes

    def __data_generation(self, batch_ids):
        _, h, w, c = self.X_train.shape
        l = np.random.beta(self.alpha, self.alpha, self.batch_size)
        X_l = l.reshape(self.batch_size, 1, 1, 1)
        y_l = l.reshape(self.batch_size, 1)

        X1 = self.X_train[batch_ids[:self.batch_size]]
        X2 = self.X_train[batch_ids[self.batch_size:]]
        X = X1 * X_l + X2 * (1 - X_l)

        if self.datagen:
            for i in range(self.batch_size):
                X[i] = self.datagen.random_transform(X[i])
                X[i] = self.datagen.standardize(X[i])

        if isinstance(self.y_train, list):
            y = []

            for y_train_ in self.y_train:
                y1 = y_train_[batch_ids[:self.batch_size]]
                y2 = y_train_[batch_ids[self.batch_size:]]
                y.append(y1 * y_l + y2 * (1 - y_l))
        else:
            y1 = self.y_train[batch_ids[:self.batch_size]]
            y2 = self.y_train[batch_ids[self.batch_size:]]
            y = y1 * y_l + y2 * (1 - y_l)

        return X, y

In [None]:
BATCH_SIZE = 32

def create_datagen():
    return ImageDataGenerator(
        # randomly zoom by a factor in [1 - zoom_range, 1 + zoom_range]
        zoom_range=0.15,  # set range for random zoom
        # set mode for filling points outside the input boundaries
        # fill the margin with the constant value given by cval
        fill_mode='constant',
        cval=0.,  # value used for fill_mode = "constant"
        horizontal_flip=True,  # randomly flip images horizontally
        vertical_flip=True,  # randomly flip images vertically
        brightness_range=[0.3, 1.0]  # added by Daisuke
    )

# Using original generator
data_generator = create_datagen().flow(x_train, y_train, batch_size=BATCH_SIZE, seed=2019)
# Using Mixup
mixup_generator = MixupGenerator(x_train, y_train, batch_size=BATCH_SIZE, alpha=0.2, datagen=create_datagen())()

print(y_train[0:5])

### What is the weighted kappa?

The Wikipedia page offers a very concise explanation: 
> The weighted kappa allows disagreements to be weighted differently and is especially useful when **codes are ordered**. Three matrices are involved, the matrix of observed scores, the matrix of expected scores based on chance agreement, and the weight matrix. Weight matrix cells located on the diagonal (upper-left to bottom-right) represent agreement and thus contain zeros. Off-diagonal cells contain weights indicating the seriousness of that disagreement.

Simply put, if two scores disagree, the penalty depends on how far apart they are. Our score will be higher if the real value is 4 and the model predicts a 3 than if the model instead predicts a 0. This metric makes sense for this competition, since the labels 0-4 indicate how severe the illness is. Intuitively, a model that predicts a severe retinopathy (3) when it is in reality a proliferative retinopathy (4) is probably better than a model that predicts a mild retinopathy (1).
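
To make the intuition concrete, here is a minimal pure-Python sketch of the quadratic weighted kappa, written from the definition quoted above (observed matrix, chance-expected matrix, and a weight matrix that grows with the squared distance between labels). The function name and the toy labels are illustrative assumptions; the kernel itself relies on `cohen_kappa_score(..., weights='quadratic')` from scikit-learn.

```python
def quadratic_weighted_kappa(y_true, y_pred, num_classes=5):
    """Sketch of QWK: 1 - (weighted observed disagreement / weighted expected disagreement)."""
    n = len(y_true)
    # Observed matrix: counts of (true label, predicted label) pairs.
    observed = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(num_classes))
                 for j in range(num_classes)]
    numerator = 0.0
    denominator = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            weight = (i - j) ** 2 / (num_classes - 1) ** 2  # zero on the diagonal
            expected = hist_true[i] * hist_pred[j] / n      # chance agreement
            numerator += weight * observed[i][j]
            denominator += weight * expected
    # Undefined (division by zero) if both raters use a single identical class.
    return 1.0 - numerator / denominator

y_true = [0, 1, 2, 3, 4, 4]
near_miss = [0, 1, 2, 3, 4, 3]  # last sample predicted as 3 instead of 4
far_miss = [0, 1, 2, 3, 4, 0]   # last sample predicted as 0 instead of 4

kappa_perfect = quadratic_weighted_kappa(y_true, y_true)
kappa_near = quadratic_weighted_kappa(y_true, near_miss)
kappa_far = quadratic_weighted_kappa(y_true, far_miss)
```

With these toy labels a single near miss barely lowers the score, while a single far miss costs much more, even though both predictions contain exactly one error.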

### Creating keras callback for QWK

In [None]:
class Metrics(Callback):
    def on_train_begin(self, logs={}):
        self.val_kappas = []

    def on_epoch_end(self, epoch, logs={}):
        X_val, y_val = self.validation_data[:2]
        y_val = y_val.sum(axis=1) - 1
        
        y_pred = self.model.predict(X_val) > 0.5
        y_pred = y_pred.astype(int).sum(axis=1) - 1
        
        _val_kappa = cohen_kappa_score(
            y_val,
            y_pred, 
            weights='quadratic'
        )

        self.val_kappas.append(_val_kappa)

        print(f"val_kappa: {_val_kappa:.4f}")
        
        if _val_kappa == max(self.val_kappas):
            print("Validation Kappa has improved. Saving model.")
            self.model.save('model.h5')

        return

# Model: DenseNet-121

In [None]:
densenet = DenseNet121(
    weights='../input/densenet-keras/DenseNet-BC-121-32-no-top.h5',
    include_top=False,
#     input_shape=(224,224,3)
    input_shape=(resize_size,resize_size,3)
)

In [None]:
def build_model():
    model = Sequential()
    model.add(densenet)
    model.add(layers.GlobalAveragePooling2D())
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(5, activation='sigmoid'))
#     model.add(layers.Dense(5, activation='softmax'))
    
    model.compile(
        loss='binary_crossentropy',
#         loss='categorical_crossentropy',
        optimizer=Adam(lr=0.00005),
        metrics=['accuracy']
    )
    
    return model

In [None]:
model = build_model()
# model.summary()

# Training & Evaluation

In [None]:
kappa_metrics = Metrics()

history = model.fit_generator(
    data_generator,
    steps_per_epoch=math.ceil(x_train.shape[0] / BATCH_SIZE),
    epochs=20,
    validation_data=(x_val, y_val),
    callbacks=[kappa_metrics]
)

In [None]:
with open('history.json', 'w') as f:
    json.dump(history.history, f)

history_df = pd.DataFrame(history.history)
history_df[['loss', 'val_loss']].plot()
history_df[['acc', 'val_acc']].plot()

print(history_df)

In [None]:
plt.plot(kappa_metrics.val_kappas)

print(kappa_metrics.val_kappas)

## Further Analysis

In [None]:
# model.predict returns one probability per ordinal output
y_val_pro = model.predict(x_val)
print(y_val_pro[32])
y_val_pre = y_val_pro > 0.5
print(y_val_pre[32])
# Decode ordinal outputs to a class index: number of active outputs minus 1
y_val_pre = y_val_pre.astype(int).sum(axis=1) - 1
y_val_act = y_val.astype(int).sum(axis=1) - 1
diff = y_val_act - y_val_pre
diff_col = np.where(diff != 0)             # misclassified samples
diff_col_ov2 = np.where(np.abs(diff) > 1)  # off by two or more grades
same_col = np.where(diff == 0)             # correctly classified samples

# Quadratic weighted Cohen's kappa
QWK = cohen_kappa_score(y_val_act, y_val_pre, weights='quadratic')

print('Accuracy=%.2f%% (%d/%d)' % (accuracy_score(y_val_act, y_val_pre) * 100, len(same_col[0]), len(y_val_act)))
print('|diff| >= 2: %d' % len(diff_col_ov2[0]))
print('Cohen kappa=%.4f' % QWK)

In [None]:
from sklearn.metrics import confusion_matrix
y_val_act-y_val_pre
# fig, ax = plt.subplots(figsize=(6, 6))
# h, xedges, yedges, img = ax.hist2d(y_val_act, y_val_pre, bins=5)

# # Draw the bins.
# ax.set_xticks(xedges)
# ax.set_yticks(yedges)
# ax.grid()

# plt.show()

In [None]:
#Wrong Classification
L = 3
M = 3
plt.figure(figsize=(12,24))
for i in range(L*M):
    plt.subplot(M,L,i+1)
    plt.title('model:'+str(y_val_pre[diff_col[0][i]])+
              ',act:'+str(y_val_act[diff_col[0][i]])+'\n'+str((y_val_pro[diff_col[0][i]]).round(3)))
    plt.imshow(x_val[diff_col[0][i]])
plt.tight_layout()  

# #Same Classification
# plt.figure(figsize=(20,10))
# for i in range(L*L):
#     plt.subplot(L,L,i+1)
#     plt.title('model:'+str(y_val_pre[same_col[0][i]])+',act:'+str(y_val_act[same_col[0][i]]))
#     plt.imshow(x_val[same_col[0][i]])
# plt.tight_layout()

## Random Forest

In [None]:
from sklearn.ensemble import RandomForestClassifier as RFC

y_train_pro = model.predict(x_train)
print("the output of CNN is the probability of diagnosis")
print(y_train_pro[32])

In [None]:
clf=RFC(n_estimators=20,max_depth=10)
y_train_act=y_train.sum(axis=1)-1
print(y_train_act)
clf.fit(y_train_pro, y_train_act)

In [None]:
y_val_pre2=clf.predict(y_val_pro)
diff = y_val_act - y_val_pre2
same_col = np.where(diff == 0)
# Recompute the large errors for the random forest predictions
diff_ov2_rf = np.where(np.abs(diff) > 1)
print('Accuracy=%.2f%% (%d/%d)' % (accuracy_score(y_val_act, y_val_pre2) * 100, len(same_col[0]), len(y_val_act)))
print('|diff| >= 2: %d' % len(diff_ov2_rf[0]))
QWK = cohen_kappa_score(y_val_act, y_val_pre2, weights='quadratic')
print('Cohen kappa=%.4f' % QWK)

In [None]:
print(y_val_pre2)
print(y_val_act)
print(cohen_kappa_score(y_val_act,y_val_pre2,weights='quadratic'))
y_val_pre2[3] = 3
print(y_val_pre2)
print(y_val_act)
print(cohen_kappa_score(y_val_act,y_val_pre2,weights='quadratic'))

In [None]:
print(y_val_act)
print(y_val_pre2)

## LightGBM

In [None]:
import lightgbm as lgb
train_data = lgb.Dataset(y_train_pro, label=y_train_act)

params = {
    'task': 'train',
    'boosting_type': 'gbdt',
    'objective': 'multiclass',
    'num_class': 5,
    'verbose': 2,
}

gbm = lgb.train(
    params,
    train_data,
#     valid_sets=eval_data,
    num_boost_round=100,
    verbose_eval=5,
)

In [None]:
preds = gbm.predict(y_val_pro)
y_gbm_pre = []
for i in range(len(preds)):
    y_gbm_pre.append(np.argmax(preds[i]))

## Find best threshold

Please Note: Although I show how to construct a threshold optimizer, **it is currently unused**. Please see notice at the top of the kernel.
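
One property worth keeping in mind: the decoded labels only change when the threshold crosses one of the predicted probabilities, so the score is a piecewise-constant function of the threshold, and a simplex search may stay close to its starting point. A plain grid search is a simple alternative. The sketch below is an illustration under stated assumptions, not the kernel's method: the helper names and toy probabilities are hypothetical, and plain accuracy stands in for the kappa score so that the example needs no external libraries.

```python
def decode(probs, threshold):
    """Count the outputs above the threshold, minus 1 (same rule as in the kernel)."""
    return sum(p > threshold for p in probs) - 1

def grid_search_threshold(val_probs, val_labels, score_fn, candidates):
    """Hypothetical helper: return the candidate threshold with the highest score."""
    best_t, best_score = None, float("-inf")
    for t in candidates:
        preds = [decode(p, t) for p in val_probs]
        score = score_fn(val_labels, preds)
        if score > best_score:  # ties keep the earlier candidate
            best_t, best_score = t, score
    return best_t, best_score

def accuracy(y_true, y_pred):
    """Stand-in scorer; any function of (y_true, y_pred) can be passed instead."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy validation outputs (made up for illustration).
val_probs = [
    [0.90, 0.35, 0.10, 0.00, 0.00],  # true class 0
    [0.90, 0.45, 0.10, 0.00, 0.00],  # true class 1
    [0.95, 0.80, 0.42, 0.10, 0.00],  # true class 2
]
val_labels = [0, 1, 2]

best_t, best_score = grid_search_threshold(
    val_probs, val_labels, accuracy, candidates=[0.3, 0.4, 0.5, 0.6]
)
```

In the kernel, the scorer would be the quadratic weighted kappa computed on `y_val_pred` and `y_val`.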

In [None]:
model.load_weights('model.h5')
y_val_pred = model.predict(x_val)

def compute_score_inv(threshold):
    y1 = y_val_pred > threshold
    y1 = y1.astype(int).sum(axis=1) - 1
    y2 = y_val.sum(axis=1) - 1
    score = cohen_kappa_score(y1, y2, weights='quadratic')
    
    return 1 - score

simplex = scipy.optimize.minimize(
    compute_score_inv, 0.5, method='nelder-mead'
)

best_threshold = simplex['x'][0]

## Submit

In [None]:
#the output of the model is probability
y_test = model.predict(x_test)
# y_test = clf.predict(y_test)
y_test = gbm.predict(y_test)
y_gbm_pre = []
for i in range(len(y_test)):
    y_gbm_pre.append(np.argmax(y_test[i]))
test_df['diagnosis'] = y_gbm_pre
test_df.to_csv('submission.csv',index=False)

In [None]:
test_df