<a href="https://colab.research.google.com/github/singhayushh/EC881--Assignment/blob/linux/DiabeticRetinopathy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Overview

The goal is to build a highly accurate diabetic retinopathy detection model by comparing several CNN architectures (MobileNet, EfficientNet, Inception V3 and ResNet), and to deploy the best-performing model for use on real patient images in medical institutions.

The model could be improved further with:

- higher-resolution images
- better data sampling
- no leakage between the training and validation sets
- better normalization of the target variable (DR severity level)
- pretrained models
- attention or related techniques that focus on the relevant regions of the image
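
The leakage point can be illustrated with a patient-level split. This is a minimal sketch using only the standard library, and it assumes image names follow the `<patient>_<eye>` pattern (for example `10_left`), so that both eyes of one patient stay on the same side of the split. `split_by_patient` and its parameters are hypothetical names, not part of this notebook's pipeline.

```python
import random
from collections import defaultdict

def split_by_patient(image_names, val_fraction=0.2, seed=42):
    """Split image names so all images of a patient land in the same set.

    Assumes names look like '<patient_id>_<eye>', e.g. '10_left'.
    """
    by_patient = defaultdict(list)
    for name in image_names:
        patient_id = name.split('_')[0]
        by_patient[patient_id].append(name)

    # shuffle patients (not images) with a fixed seed for reproducibility
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    n_val = max(1, int(len(patients) * val_fraction))
    val_patients = set(patients[:n_val])

    train = [n for p in patients if p not in val_patients for n in by_patient[p]]
    val = [n for p in patients if p in val_patients for n in by_patient[p]]
    return train, val

# made-up example names
names = ['10_left', '10_right', '13_left', '13_right', '15_left', '15_right']
train_names, val_names = split_by_patient(names, val_fraction=0.34)
```

Because the patient ID is taken from the part before the first underscore, derived files such as `10_left_90` or `10_left_mir` would fall into the same set as their source image.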

************

### Authors

- [Ayush Singh](https://github.com/singhayushh)
- [Agni Sain](https://linkedin.com/in/)
- [Mayukh Sen](https://linkedin.com/in/)
- [Aryan Shaw](https://linkedin.com/in/)

************

### The Model

The model we create will be run through four different CNN architectures:
- MobileNet v3
- Inception v3
- ResNet
- EfficientNet

All four will be trained and tested on the same data; based on the results, the most accurate model will be used in the live production environment.
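
The selection step can be sketched as follows, assuming each candidate has already produced predictions on the same held-out labels. The function names, model names and toy values are invented for illustration and are not results from this study.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def pick_best(y_true, predictions_by_model):
    """Return (best_model_name, scores) for a dict of model name -> predictions."""
    scores = {name: accuracy(y_true, preds)
              for name, preds in predictions_by_model.items()}
    best = max(scores, key=scores.get)
    return best, scores

# toy labels and predictions, invented purely for illustration
y_true = [0, 0, 1, 2, 0, 4]
predictions = {
    'model_a': [0, 0, 1, 0, 0, 0],
    'model_b': [0, 0, 1, 2, 0, 0],
    'model_c': [0, 1, 1, 2, 0, 0],
}
best_name, scores = pick_best(y_true, predictions)
```

Plain accuracy is used here only because the text names it as the criterion; on an imbalanced label distribution, a different metric may rank the models differently.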


## 1. Download Data

#### 1.1. Download kaggle.json

We start by uploading kaggle.json so that we can use datasets from Kaggle. This JSON file can be downloaded from the Account > API section of your Kaggle profile.

In [None]:
# install kaggle cli
!pip install kaggle

# upload kaggle.json through the file picker prompt
from google.colab import files
files.upload()

#### 1.2. Create ~/.kaggle dir

Create a new ~/.kaggle directory in the home directory and move kaggle.json into it. The Kaggle CLI reads its credentials from this location.

In [None]:
# create the .kaggle directory in the home directory to hold kaggle.json
!mkdir -p ~/.kaggle

# move the uploaded kaggle.json to the new directory
!mv ./kaggle.json ~/.kaggle/

# restrict the file to owner read/write so the API key is not readable by other users
!chmod 600 ~/.kaggle/kaggle.json
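
The same setup can be done from Python, which may be convenient outside a notebook shell. This is a sketch under assumptions: `install_kaggle_credentials` is a hypothetical helper, and unlike the `mv` command above it copies the file rather than moving it.

```python
import os
import shutil
import stat

def install_kaggle_credentials(src, home=None):
    """Copy kaggle.json into <home>/.kaggle and restrict it to the owner."""
    home = home or os.path.expanduser('~')
    target_dir = os.path.join(home, '.kaggle')
    os.makedirs(target_dir, exist_ok=True)   # no error if it already exists
    target = os.path.join(target_dir, 'kaggle.json')
    shutil.copyfile(src, target)
    # owner read/write only, the same mode as `chmod 600`
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)
    return target
```

Calling `install_kaggle_credentials('./kaggle.json')` after the upload step would place the file under the current user's home directory.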

#### 1.3. Download dataset

In this step, we download the `diabetic-retinopathy-detection` competition dataset from Kaggle and extract its zip contents into train and test directories.

In [None]:
# download dataset from kaggle cli
!kaggle competitions download -c 'diabetic-retinopathy-detection'

Downloading diabetic-retinopathy-detection.zip to /content
100% 82.2G/82.2G [06:38<00:00, 228MB/s]
100% 82.2G/82.2G [06:38<00:00, 222MB/s]


In [None]:
# Extract the dataset
!7za x 'diabetic-retinopathy-detection.zip'

# Create train and test directories
!mkdir -p train test

# Move the split ZIP parts (train.zip.001, ...) to their directories
!mv train.* train
!mv test.* test

# Extract data; the ZIP parts now live inside train/ and test/
!7za x train/train.zip.001
!7za x test/test.zip.001

## 2. Image Preprocessing

#### 2.1. Crop and Resize

All images were cropped around the center and scaled down to 256 by 256 pixels. Although training takes longer, images of this size retain much more detail than images at 128 by 128.

In [None]:
# package imports
import os
from PIL import ImageFile
from skimage import io
from skimage.transform import resize
from skimage.util import img_as_ubyte

# let PIL load truncated image files instead of raising an error
ImageFile.LOAD_TRUNCATED_IMAGES = True

# utility function to create directory with given name if absent
def create_directory(directory):
    os.makedirs(directory, exist_ok=True)

# center-crop every JPEG in `path` to cropx by cropy, resize it to
# img_size by img_size, and save it under the same name in `new_path`
def crop_and_resize_images(path, new_path, cropx, cropy, img_size=256):
    create_directory(new_path)
    # only pick up image files; this skips .DS_Store and any archive parts
    files = [f for f in os.listdir(path) if f.lower().endswith('.jpeg')]
    total = 0
    for item in files:
        img = io.imread(os.path.join(path, item))
        y, x = img.shape[:2]
        startx = x//2 - (cropx//2)
        starty = y//2 - (cropy//2)
        img = img[starty:starty+cropy, startx:startx+cropx]
        # resize returns floats in [0, 1]; convert back to 8-bit before saving
        img = img_as_ubyte(resize(img, (img_size, img_size)))
        io.imsave(os.path.join(new_path, item), img)
        total += 1
        print("Saving: ", item, total)


if __name__ == '__main__':
    crop_and_resize_images(path='/content/train/', new_path='/content/train-resized-256/', cropx=1800, cropy=1800, img_size=256)
    crop_and_resize_images(path='/content/test/', new_path='/content/test-resized-256/', cropx=1800, cropy=1800, img_size=256)
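
The crop-window arithmetic can be checked on a synthetic array. The sketch below isolates the centering step and, as an added assumption not present in the cell above, clamps the window to the image size: should an image be smaller than the requested crop, the unclamped start index would be negative and NumPy slicing would silently return an off-center strip. `center_crop` is a hypothetical helper name.

```python
import numpy as np

def center_crop(img, cropx, cropy):
    """Return the centered cropx-by-cropy window of an (H, W[, C]) array.

    If the image is smaller than the requested window along an axis,
    the whole axis is kept instead of producing a negative start index.
    """
    y, x = img.shape[:2]
    cropx, cropy = min(cropx, x), min(cropy, y)
    startx = x // 2 - cropx // 2
    starty = y // 2 - cropy // 2
    return img[starty:starty + cropy, startx:startx + cropx]

# synthetic RGB image, 3000 pixels wide and 2000 pixels tall
img = np.zeros((2000, 3000, 3), dtype=np.uint8)
cropped = center_crop(img, cropx=1800, cropy=1800)
print(cropped.shape)  # (1800, 1800, 3)
```

For this 3000 by 2000 input the window starts at column 600 and row 100, leaving equal margins on both sides of each axis.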

#### 2.2. Training data pruning

Scikit-Image raised multiple warnings while resizing images that were completely black, since such images carry no color information. These images were therefore removed from the training data.

In [None]:
# package imports
import time
import numpy as np
import pandas as pd
from PIL import Image

# return a list with 1 for each image that is completely black, else 0
def find_black_images(file_path, df):
    lst_imgs = [l for l in df['image']]
    return [1 if np.mean(np.array(Image.open(file_path + img))) == 0 else 0 for img in lst_imgs]

if __name__ == '__main__':
    start_time = time.time()
    trainLabels = pd.read_csv('/content/labels/trainLabels.csv')

    # the labels list bare image names; add the file extension
    trainLabels['image'] = [i + '.jpeg' for i in trainLabels['image']]

    # flag black images, then keep only the non-black ones
    trainLabels['black'] = find_black_images('/content/train-resized-256/', trainLabels)
    trainLabels = trainLabels.loc[trainLabels['black'] == 0]
    # save next to the original labels, where the rotation step reads it from
    trainLabels.to_csv('/content/labels/trainLabels_master.csv', index=False, header=True)

    print("Completed")
    print("--- %s seconds ---" % (time.time() - start_time))
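
The black-image test can be tried on synthetic arrays. For non-negative pixel values, a mean of exactly zero is the same as every pixel being zero, so an image that is black apart from a few faint pixels is not caught. The sketch below shows both the exact test and a thresholded variant; `is_nearly_black` and its `threshold` value are hypothetical and would need tuning on real data.

```python
import numpy as np

def is_black(pixels):
    """True when every pixel value in the array is zero."""
    return not np.any(pixels)

def is_nearly_black(pixels, threshold=1.0):
    """True when the mean intensity is below a (hypothetical) threshold."""
    return float(np.mean(pixels)) < threshold

black = np.zeros((256, 256, 3), dtype=np.uint8)
almost_black = black.copy()
almost_black[0, 0, 0] = 1   # a single faint pixel value

print(is_black(black), is_black(almost_black))   # True False
print(is_nearly_black(almost_black))             # True
```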

#### 2.3. Image Rotation

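To augment the training data and reduce the imbalance between classes, images with no DR were mirrored once, while images with any level of DR were rotated by 90, 120, 180 and 270 degrees and also mirrored.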

In [None]:
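# packages import
import pandas as pd
import numpy as np
from skimage import io
from skimage.transform import rotate
from skimage.util import img_as_ubyte
import cv2
import os
import time

# rotate each listed image and save it as '<name>_<degrees>.jpeg'
def rotate_images(file_path, degrees_of_rotation, lst_imgs):
    for l in lst_imgs:
        img = io.imread(file_path + str(l) + '.jpeg')
        # rotate returns floats in [0, 1]; convert back to 8-bit before saving
        img = img_as_ubyte(rotate(img, degrees_of_rotation))
        io.imsave(file_path + str(l) + '_' + str(degrees_of_rotation) + '.jpeg', img)


# flip each listed image and save it as '<name>_mir.jpeg'
# mirror_direction follows cv2.flip: 1 = horizontal flip, 0 = vertical flip
def mirror_images(file_path, mirror_direction, lst_imgs):
    for l in lst_imgs:
        img = cv2.imread(file_path + str(l) + '.jpeg')
        img = cv2.flip(img, mirror_direction)
        cv2.imwrite(file_path + str(l) + '_mir' + '.jpeg', img)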

if __name__ == '__main__':
    start_time = time.time()
    trainLabels = pd.read_csv("/content/labels/trainLabels_master.csv")

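    # remove the '.jpeg' suffix; rstrip would strip a set of characters, not a suffix
    trainLabels['image'] = trainLabels['image'].str.replace(r'\.jpeg$', '', regex=True)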
    trainLabels_no_DR = trainLabels[trainLabels['level'] == 0]
    trainLabels_DR = trainLabels[trainLabels['level'] >= 1]

    lst_imgs_no_DR = [i for i in trainLabels_no_DR['image']]
    lst_imgs_DR = [i for i in trainLabels_DR['image']]

    # Mirror Images with no DR one time
    print("Mirroring Non-DR Images")
    mirror_images('/content/train-resized-256/', 1, lst_imgs_no_DR)


    # Rotate all images that have any level of DR
    print("Rotating 90 Degrees")
    rotate_images('/content/train-resized-256/', 90, lst_imgs_DR)

    print("Rotating 120 Degrees")
    rotate_images('/content/train-resized-256/', 120, lst_imgs_DR)

    print("Rotating 180 Degrees")
    rotate_images('/content/train-resized-256/', 180, lst_imgs_DR)

    print("Rotating 270 Degrees")
    rotate_images('/content/train-resized-256/', 270, lst_imgs_DR)

    print("Mirroring DR Images")
    mirror_images('/content/train-resized-256/', 0, lst_imgs_DR)

    print("Completed")
    print("--- %s seconds ---" % (time.time() - start_time))
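
The cell above writes new image files but does not add them to the labels file. A minimal sketch of expanding the label rows to match is given below, assuming the `_<degrees>` and `_mir` suffixes used above. `augmented_labels` is a hypothetical helper, and the example rows are made up rather than taken from the dataset.

```python
ROTATIONS = (90, 120, 180, 270)  # degrees applied to DR images in the cell above

def augmented_labels(rows):
    """Expand (image, level) pairs to cover the files written during augmentation.

    Every image gets a '_mir' copy; images with level >= 1 also get one
    copy per rotation, named '<image>_<degrees>'.
    """
    out = []
    for image, level in rows:
        out.append((image, level))               # original image
        out.append((image + '_mir', level))      # mirrored copy
        if level >= 1:                           # rotations only for DR images
            out.extend((f'{image}_{deg}', level) for deg in ROTATIONS)
    return out

rows = [('10_left', 0), ('15_right', 2)]   # made-up example rows
labels = augmented_labels(rows)
print(len(labels))  # 8
```

Each non-DR image therefore contributes two files and each DR image six, which is how the augmentation narrows the gap between the two groups.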