## Image Preprocessing

Functions that re-size images to 512x512 pixels and save them, in batches, to the correct data folder.

### Imports

In [28]:
# imports
from keras.preprocessing.image import load_img
from keras.preprocessing.image import save_img
from keras.preprocessing.image import img_to_array
from keras.preprocessing.image import smart_resize

import os

### Re-sizing an Image

In [26]:
# This function resizes an image to 512x512 pixels while preserving the aspect ratio

# the function takes in 3 variables:
# filepath is where the original image is located 
# eye_condition is the name of the subfolder the cleaned image will be saved in
# filename is the name of the original image file

# learned about saving images after preprocessing here:
# https://machinelearningmastery.com/how-to-load-convert-and-save-images-with-the-keras-api/

# learned about smart_resize here:
# https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/smart_resize

def image_process(filepath, eye_condition, filename):
    
    # load image (os.path.join works whether or not filepath ends with a slash)
    img = load_img(os.path.join(filepath, filename))

    # convert image to a numpy array
    img_array = img_to_array(img)
    
    # use smart_resize to crop and resize the image to 512x512 without distorting the aspect ratio
    # for a square target, smart_resize keeps the shorter edge (height or width) and crops the longer edge to that length
    # by trimming an equal amount from each side (i.e., centered cropping)
    # this suits eye images well: the eye is roughly circular and centered, so centered cropping is unlikely to cut anything important
    img_array = smart_resize(img_array, size=(512,512))

    # cut the file extension off of the filename (works for extensions of any length, e.g. .jpeg)
    filename = os.path.splitext(filename)[0]
    
    # save the image with a new filename in the cleaned_eye_image directory
    save_img(f'../data/cleaned_eye_images/{eye_condition}/{filename}_cleaned.jpg', img_array)


In [27]:
# testing the image_process function

# (commented out because we aren't uploading the original image data to GitHub)

#image_process('./eye_images/eye_images_4_types/dataset/1_normal/','normal','NL_001.png')
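
As a rough illustration of the centered cropping described in the comments of `image_process`, the crop arithmetic can be sketched in plain Python. This is not the Keras implementation: `center_crop_box` is a hypothetical helper name, and the real `smart_resize` also resizes the cropped region to the target size afterwards.

```python
def center_crop_box(height, width, target_height=512, target_width=512):
    """Return (top, left, crop_height, crop_width) for the largest centered
    window that has the target aspect ratio and fits inside the image."""
    # Largest crop dimensions matching the target aspect ratio,
    # capped at the actual image size.
    crop_height = min(height, width * target_height // target_width)
    crop_width = min(width, height * target_width // target_height)

    # Center the window by trimming an equal amount from each side.
    top = (height - crop_height) // 2
    left = (width - crop_width) // 2
    return top, left, crop_height, crop_width


# A 600x800 (height x width) image loses 100 pixels from the left and right.
print(center_crop_box(600, 800))  # (0, 100, 600, 600)
```

For a square target such as 512x512, the window is always a square whose side equals the shorter edge of the image.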

### Batch Re-sizing

In [29]:
# function to cycle through all the images in a folder and process them with the image_process function

# takes in 2 variables:
# filepath is the filepath where the images are located
# eye_condition is for the subfolder name for the cleaned image files

# used the CNN lesson notebook for help with the os.listdir syntax

def batch_images(filepath, eye_condition):
    for file in os.listdir(filepath):
        image_process(filepath, eye_condition, file)
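
One thing to watch for: `os.listdir` returns every entry in a folder, including hidden files such as `.DS_Store` and any subfolders, and `load_img` would likely fail on those. Below is a minimal sketch of a filter, assuming the images use `.png`, `.jpg`, or `.jpeg` extensions. The helper name `list_image_files` and the extension set are illustrative and not part of the original notebook.

```python
import os

# Assumed set of image extensions; adjust to match the actual dataset.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def list_image_files(folder):
    """Return the sorted names of regular files in `folder` whose
    extension (case-insensitive) is in IMAGE_EXTENSIONS."""
    return sorted(
        name
        for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name))
        and os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS
    )
```

`batch_images` could then loop over `list_image_files(filepath)` instead of `os.listdir(filepath)`.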

#### Specific folders of images that have been processed:
NOTE: These folders are not included in our repository because the unzipped files are too large, but dataset information is available in the data folder.

In [30]:
# processing some of the 'normal' non-diseased eyes
batch_images('./eye_images/eye_images_4_types/dataset/1_normal/','normal')

In [31]:
# processing glaucoma eyes
batch_images('./eye_images/eye_images_4_types/dataset/2_glaucoma/','glaucoma')

# processing eyes with cataracts
batch_images('./eye_images/eye_images_4_types/dataset/2_cataract/','cataracts')

# processing eyes with retinopathy
batch_images('./eye_images/eye_images_4_types/dataset/3_retina_disease/','retinopathy')

In [32]:
# adding in more image files from additional sources

# processing eyes with cataracts
batch_images('./eye_images/Cataract/','cataracts')



# The three folders below turned out to be the same as the ones processed above (same number of files and same file names):

# processing non-diseased eyes
batch_images('./eye_images/Normal/','normal')

# processing glaucoma eyes
batch_images('./eye_images/Glaucoma/','glaucoma')

# processing eyes with retinopathy
batch_images('./eye_images/Retina_disease/','retinopathy')

In [33]:
# processing more eyes with retinopathy
batch_images('./eye_images/dr/','retinopathy')

In [34]:
# processing more glaucoma eyes
batch_images('./eye_images/Glaucoma_left/','glaucoma')
batch_images('./eye_images/Glaucoma_right/','glaucoma')

### Additional Ideas for Preprocessing

[This project on Kaggle](https://www.kaggle.com/mohammadasimbluemoon/diabeticretinopathy-messidor-eyepac-preprocessed) suggests subtracting the local average color from each image to improve processing time. We might consider this technique in future work on this project, especially since processing efficiency is very important for successful implementation.
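
A minimal sketch of what local-average subtraction can look like, using only NumPy. It assumes a formulation often seen in Kaggle diabetic retinopathy notebooks, where a heavily blurred copy of the image is subtracted and the result is rescaled around mid-gray; the linked project may differ in its details. The function names, the default `sigma`, and the scaling constants (4 and 128) are assumptions for illustration.

```python
import numpy as np


def gaussian_kernel_1d(sigma):
    """Normalized 1-D Gaussian kernel covering about 3 standard deviations."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return kernel / kernel.sum()


def gaussian_blur(img, sigma):
    """Separable Gaussian blur over the height and width axes of an
    H x W x C float array, using edge padding to keep the output size."""
    kernel = gaussian_kernel_1d(sigma)
    radius = len(kernel) // 2
    padded = np.pad(img, ((radius, radius), (radius, radius), (0, 0)), mode="edge")

    def blur_1d(values):
        return np.convolve(values, kernel, mode="valid")

    blurred = np.apply_along_axis(blur_1d, 0, padded)   # blur along height
    blurred = np.apply_along_axis(blur_1d, 1, blurred)  # blur along width
    return blurred


def subtract_local_average(img_array, sigma=10.0):
    """Subtract the blurred (local average) color from each pixel, then
    rescale around mid-gray and clip to the 0-255 range."""
    img = np.asarray(img_array, dtype=np.float64)
    local_avg = gaussian_blur(img, sigma)
    enhanced = 4.0 * (img - local_avg) + 128.0
    return np.clip(enhanced, 0, 255)
```

In this notebook's pipeline, the output of `subtract_local_average` could be passed to `save_img` in place of the resized array, though we have not tested its effect on our images.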
