Glaucoma Detection with Keras
=======

Retinal images
----------

<img src='glaucomapicture.jpg' width="300" height="300" align="left">

I need to keep the folder structure in mind from the start, since I will be porting this to SageMaker with a Dockerfile, and SageMaker expects a specific folder layout.

    sage
    ├── Dockerfile
    ├── local_test
    │  ├── predict.sh
    │  ├── train.sh
    │  ├── serve.sh
    │  └── test_dir => /opt/ml in container
    │      ├── input
    │      │   ├── config
    │      │   └── data
    │      │       └── training
    │      ├── model
    │      └── output
    └── program
        ├── (some scripts...)
        ├── train
        └── serve
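
As a rough illustration of why the layout matters, the sketch below resolves the container directories from a single prefix, so the same code can point at `/opt/ml` inside the container or at `local_test/test_dir` during local testing. The helper name `sagemaker_paths` and the dictionary keys are hypothetical; only the directory names come from the tree above.

```python
import os

def sagemaker_paths(prefix="/opt/ml"):
    """Return the directories from the tree above, rooted at `prefix` (hypothetical helper)."""
    return {
        "config": os.path.join(prefix, "input", "config"),
        "training": os.path.join(prefix, "input", "data", "training"),
        "model": os.path.join(prefix, "model"),
        "output": os.path.join(prefix, "output"),
    }

# inside the container the default prefix applies; locally, point at the test directory
local_paths = sagemaker_paths(prefix=os.path.join("local_test", "test_dir"))
print(local_paths["training"])
```

Keeping the prefix as the only moving part means the training and serving scripts do not need separate code paths for local runs.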

*Set Up Folder Tree*

In [24]:
import os
import random
import shutil

import cv2
import imageio
import imgaug
import imgaug as ia
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
from imgaug import augmenters as iaa

%matplotlib inline

ver=cv2.__version__
print('Matplotlib Version: {}'.format(matplotlib.__version__))
# compare (major, minor, patch) as integers; slicing the version string breaks on versions such as 3.4.10
ver_tuple=tuple(int(p) for p in ver.split('.')[:3] if p.isdigit())
if ver_tuple<(3,4,2):
    print('Please update OpenCV. This requires a minimum of OpenCV 3.4.2')
    print('Your current version is: ',ver)
else:
    print('OpenCV Version:',ver)

Matplotlib Version: 2.2.2
OpenCV Version: 4.0.0


In [26]:
# folders to create; parents are listed before children because os.mkdir does not create parents
DirList=['./local_test','./program',
         './local_test/test_dir',
         './local_test/test_dir/input',
         './local_test/test_dir/input/config',
         './local_test/test_dir/input/data',
         './local_test/test_dir/input/data/training',
         './local_test/test_dir/input/data/training/augmentation',
         './local_test/test_dir/model',
         './local_test/test_dir/output',
         './local_test/program',
         './local_test/program/train',
         './local_test/program/serve',
         './local_test/test_dir/input/data/images'] # more folders can be added here
for i in DirList:
    try:
        # Create target Directory
        os.mkdir(i)
        print("Directory " ,i,  " Created ") 
    except FileExistsError:
        # only an existing folder is reported here; other errors (e.g. permissions) still surface
        print("Directory " ,i,  " already exists")

Directory  ./local_test  Created 
Directory  ./program  Created 
Directory  ./local_test/test_dir  Created 
Directory  ./local_test/test_dir/input  Created 
Directory  ./local_test/test_dir/input/config  Created 
Directory  ./local_test/test_dir/input/data  Created 
Directory  ./local_test/test_dir/input/data/training  Created 
Directory  ./local_test/test_dir/input/data/training/augmentation  Created 
Directory  ./local_test/test_dir/model  Created 
Directory  ./local_test/test_dir/output  Created 
Directory  ./local_test/program  Created 
Directory  ./local_test/program/train  Created 
Directory  ./local_test/program/serve  Created 
Directory  ./local_test/test_dir/input/data/images  Created 
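
The same tree could also be built so that the cell is safe to rerun. The sketch below uses `os.makedirs(..., exist_ok=True)`, which creates missing parents and does nothing for folders that already exist. The helper name `make_tree` and the shortened folder list are assumptions for illustration, and the demo runs in a temporary directory rather than next to the notebook.

```python
import os
import tempfile

def make_tree(base, relative_dirs):
    """Create each directory under `base`; return the ones that were newly created."""
    created = []
    for rel in relative_dirs:
        target = os.path.join(base, rel)
        if not os.path.isdir(target):
            created.append(target)
        # exist_ok=True makes reruns harmless; missing parents are created as well
        os.makedirs(target, exist_ok=True)
    return created

# illustrative subset of the folders listed above
subset = ["local_test/test_dir/input/config",
          "local_test/test_dir/input/data/training/augmentation",
          "local_test/test_dir/model",
          "local_test/test_dir/output"]

# demo in a throwaway directory; pass "." as base to build the tree next to the notebook
demo_base = tempfile.mkdtemp()
first_run = make_tree(demo_base, subset)
second_run = make_tree(demo_base, subset)
print(len(first_run), "created on first run,", len(second_run), "on second run")
```

Because parents are created on demand, the list only needs the leaf folders.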


*Pull Data*

In [27]:
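#Takes a couple of hours to download
#The URL is quoted because it contains a "?"; -O saves the download as processed_data.zip, the name the extract step expects (assumes the download is that zip archive)
#!wget -O processed_data.zip "https://dataverse.harvard.edu/api/access/datafile/:persistentId?persistentId=doi:10.7910/DVN/1YRRAC/OGRSQO"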

*Extract Data*

In [28]:
!unzip processed_data.zip -d ./local_test/test_dir/input/data/images

Archive:  processed_data.zip
  inflating: ./local_test/test_dir/input/data/images/data_description.txt  
   creating: ./local_test/test_dir/input/data/images/advanced_glaucoma/
  inflating: ./local_test/test_dir/input/data/images/advanced_glaucoma/1.png  
  inflating: ./local_test/test_dir/input/data/images/advanced_glaucoma/10.png  
  inflating: ./local_test/test_dir/input/data/images/advanced_glaucoma/100.png  
  inflating: ./local_test/test_dir/input/data/images/advanced_glaucoma/101.png  
  inflating: ./local_test/test_dir/input/data/images/advanced_glaucoma/102.png  
  inflating: ./local_test/test_dir/input/data/images/advanced_glaucoma/103.png  
  inflating: ./local_test/test_dir/input/data/images/advanced_glaucoma/104.png  
  inflating: ./local_test/test_dir/input/data/images/advanced_glaucoma/105.png  
  inflating: ./local_test/test_dir/input/data/images/advanced_glaucoma/106.png  
  inflating: ./local_test/test_dir/input/data/images/advanced_glaucoma/107.png  
  ...
  inflating: ./local_test/test_dir/input/data/images/early_glaucoma/145.png  
  ...
  inflating: ./local_test/test_dir/input/data/images/normal_control/170.png  
  ...
  inflating: ./local_test/test_dir/input/data/images/normal_control/764.png  
  ...
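
If the long `unzip` listing is not needed, the archive can be extracted from Python and summarized per folder instead. This is a minimal sketch using the standard `zipfile` module; the helper name `extract_archive` is hypothetical, and the demo builds a tiny stand-in archive rather than touching `processed_data.zip`.

```python
import os
import tempfile
import zipfile

def extract_archive(zip_path, dest_dir):
    """Extract `zip_path` into `dest_dir` and return the number of files per top-level folder."""
    counts = {}
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            top = name.split("/")[0] if "/" in name else "(root)"
            counts[top] = counts.get(top, 0) + 1
    return counts

# self-contained demo on a stand-in archive; for the real data the call would be
# extract_archive("processed_data.zip", "./local_test/test_dir/input/data/images")
demo_dir = tempfile.mkdtemp()
demo_zip = os.path.join(demo_dir, "demo.zip")
with zipfile.ZipFile(demo_zip, "w") as archive:
    archive.writestr("data_description.txt", "stand-in description")
    archive.writestr("early_glaucoma/1.png", b"not a real image")
    archive.writestr("normal_control/1.png", b"not a real image")
    archive.writestr("normal_control/2.png", b"not a real image")
demo_counts = extract_archive(demo_zip, os.path.join(demo_dir, "images"))
print(demo_counts)
```

The per-folder counts give a quick check that every class folder arrived before any files are moved.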

*Set Up Training, Testing, and Validation Folders*

In [29]:
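# merge both glaucoma stages into a single "cases" class and rename the normal images to "controls"
# caution: both glaucoma folders use numeric filenames, so mv overwrites any file whose name already exists in early_glaucoma
!mv ./local_test/test_dir/input/data/images/advanced_glaucoma/* ./local_test/test_dir/input/data/images/early_glaucoma
!mv ./local_test/test_dir/input/data/images/early_glaucoma ./local_test/test_dir/input/data/images/cases
!mv ./local_test/test_dir/input/data/images/normal_control ./local_test/test_dir/input/data/images/controls
# advanced_glaucoma is empty after the move and can be removed
!rm -r ./local_test/test_dir/input/data/images/advanced_glaucoma/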
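
The archive listing shows plain numeric filenames in both glaucoma folders, so a plain `mv` may silently replace files that share a name. One way to keep every image is to prefix each file with its source folder while merging. The sketch below assumes such name clashes exist; the helper name `merge_folder` and the prefixes are hypothetical, and the demo uses stand-in files in a temporary directory. Using it on the real data would likely change the image totals reported later in this notebook.

```python
import os
import shutil
import tempfile

def merge_folder(src_dir, dst_dir, prefix):
    """Move every file from `src_dir` into `dst_dir`, prepending `prefix` and "_" to its name."""
    os.makedirs(dst_dir, exist_ok=True)
    moved = 0
    for name in sorted(os.listdir(src_dir)):
        target = os.path.join(dst_dir, prefix + "_" + name)
        if os.path.exists(target):
            raise FileExistsError(target)  # refuse to overwrite silently
        shutil.move(os.path.join(src_dir, name), target)
        moved += 1
    return moved

# demo with stand-in files: both stage folders contain a file called 1.png
root = tempfile.mkdtemp()
for stage in ("advanced_glaucoma", "early_glaucoma"):
    os.makedirs(os.path.join(root, stage))
    with open(os.path.join(root, stage, "1.png"), "w") as handle:
        handle.write(stage)

cases = os.path.join(root, "cases")
merge_folder(os.path.join(root, "advanced_glaucoma"), cases, "advanced")
merge_folder(os.path.join(root, "early_glaucoma"), cases, "early")
print(sorted(os.listdir(cases)))  # both images survive under distinct names
```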

Set Up Data Structure
--------

In [30]:
# Settings for the training/validation/testing split and for the share of training images to augment

# the path to the original images
input_data_folder = "local_test/test_dir/input/data/images"
# new directory that will contain our images after computing the training and testing split
data_folder_name = "local_test/test_dir/input/data/training"

# fraction of all images used for training (the remainder becomes the testing set)
training_percent = 0.8

# fraction of the *training* images held out for validation
validation_percent = 0.1

# fraction of the remaining training images that get an augmented copy
augment_percent = 0.2
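
To see what these fractions mean in image counts, the arithmetic can be worked through on its own. The sketch below mirrors the `int()` truncation used in this notebook's split: validation is carved out of the training share, and augmentation applies to what remains. The helper name `split_counts` is hypothetical; 1255 is the image total reported when the datasets are built.

```python
def split_counts(total, training_percent, validation_percent, augment_percent):
    """Return the number of images per split, truncating with int() at each step."""
    train_and_val = int(total * training_percent)
    testing = total - train_and_val
    validation = int(train_and_val * validation_percent)
    training = train_and_val - validation
    augmented = int(training * augment_percent)
    return {"training": training, "validation": validation,
            "testing": testing, "augmented": augmented}

counts = split_counts(1255, 0.8, 0.1, 0.2)
print(counts)  # {'training': 904, 'validation': 100, 'testing': 251, 'augmented': 180}
```

So the effective shares of the full dataset are roughly 72% training, 8% validation, and 20% testing.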

In [31]:
def do_assert(condition, message="Assertion failed."):
    """
    Function that behaves equally to an `assert` statement, but raises an
    Exception.
    This is added because `assert` statements are removed in optimized code.
    It replaces `assert` statements throughout the library that should be
    kept even in optimized code.
    Parameters
    ----------
    condition : bool
        If False, an exception is raised.
    message : str, optional
        Error message.
    """
    if not condition:
        raise AssertionError(str(message))

IMSHOW_BACKEND_DEFAULT = "matplotlib"
def imshow(image, backend=IMSHOW_BACKEND_DEFAULT):
    """
    Shows an image in a window.
    dtype support::
        * ``uint8``: yes; not tested
        * ``uint16``: ?
        * ``uint32``: ?
        * ``uint64``: ?
        * ``int8``: ?
        * ``int16``: ?
        * ``int32``: ?
        * ``int64``: ?
        * ``float16``: ?
        * ``float32``: ?
        * ``float64``: ?
        * ``float128``: ?
        * ``bool``: ?
    Parameters
    ----------
    image : (H,W,3) ndarray
        Image to show.
    backend : {'matplotlib', 'cv2'}, optional
        Library to use to show the image. May be either matplotlib or OpenCV ('cv2').
        OpenCV tends to be faster, but apparently causes more technical issues.
    """
    do_assert(backend in ["matplotlib", "cv2"], "Expected backend 'matplotlib' or 'cv2', got %s." % (backend,))

    if backend == "cv2":
        image_bgr = image
        if image.ndim == 3 and image.shape[2] in [3, 4]:
            image_bgr = image[..., 0:3][..., ::-1]

        win_name = "imgaug-default-window"
        cv2.namedWindow(win_name, cv2.WINDOW_NORMAL)
        cv2.imshow(win_name, image_bgr)
        cv2.waitKey(0)
        cv2.destroyWindow(win_name)
    else:
        # import only when necessary (faster startup; optional dependency; less fragile -- see issue #225)
        import matplotlib.pyplot as plt

        dpi = 96
        h, w = image.shape[0] / dpi, image.shape[1] / dpi
        w = max(w, 6)  # if the figure is too narrow, the footer may appear and make the fig suddenly wider (ugly)
        fig, ax = plt.subplots(figsize=(w, h), dpi=dpi)
        fig.canvas.set_window_title("imgaug.imshow(%s)" % (image.shape,))
        ax.imshow(image, cmap="gray")  # cmap is only active for grayscale images
        plt.show()

In [32]:
class preprocess_data(object):
    # static helpers, called as preprocess_data.method(...); no instance is created

    @staticmethod
    def create_train_test_val(data_folder_name):
        # setup the training, validation, and testing directories
        training_folder = os.path.sep.join([data_folder_name, "training"])
        validation_folder = os.path.sep.join([data_folder_name, "validation"])
        testing_folder  = os.path.sep.join([data_folder_name, "testing"])
        return training_folder,validation_folder,testing_folder

    @staticmethod
    def list_images(data_folder, contains=None):
        # file extensions that count as images
        image_types = (".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff")
        # return the set of files that are valid
        return preprocess_data.list_files(data_folder, valid_extensions=image_types, contains=contains)

    @staticmethod
    def list_files(data_folder, valid_extensions=None, contains=None):
        # loop over the directory structure
        for (root_directory, directory_names, filenames) in os.walk(data_folder):
            # loop over the filenames in the current directory
            for filename in filenames:
                # if the contains string is not none and the filename does not contain
                # the supplied string, then ignore the file
                if contains is not None and filename.find(contains) == -1:
                    continue

                # determine the file extension of the current file
                extension = filename[filename.rfind("."):].lower()

                # check to see if the file is an image and should be processed
                if valid_extensions is None or extension.endswith(valid_extensions):
                    # construct the path to the image and yield it
                    image_paths = os.path.join(root_directory, filename)
                    yield image_paths

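    @staticmethod
    def images_augment(augment_percent,training_paths):
        # augment the first augment_percent share of the (already shuffled) training images
        aug_num=int(len(training_paths) * augment_percent)
        path='./local_test/test_dir/input/data/training/augmentation'
        for image_iter in training_paths[:aug_num]:
            # keep the class label and source filename in the output name;
            # naming by the random number alone lets later images overwrite earlier ones
            filename,extension = os.path.splitext(os.path.basename(image_iter))
            label = os.path.basename(os.path.dirname(image_iter))
            imread_image = imageio.imread(image_iter)
            random_num=random.randint(0,360)
            # the actual angle is sampled from [-random_num, random_num]
            rotate = iaa.Affine(rotate=(-random_num, random_num))
            image_aug = rotate.augment_image(imread_image)
            #imshow(image_aug)
            imageio.imwrite(path+'/aug_'+label+'_'+filename+'_'+str(random_num)+'.png',image_aug)
        return aug_num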

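    @staticmethod
    def setup_folders(training_percent,validation_percent,augment_percent):
        # note: reads the module-level input_data_folder and data_folder_name defined earlier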
        training_folder,validation_folder,testing_folder=preprocess_data.create_train_test_val(data_folder_name)
        # grab the paths to all input images in the original input directory and shuffle them
        image_paths = list(preprocess_data.list_images(input_data_folder))
        random.shuffle(image_paths)

        # training and testing split
        comparison = int(len(image_paths) * training_percent)
        training_paths = image_paths[:comparison]
        testing_paths = image_paths[comparison:]

        # using part of the training data for validation
        comparison = int(len(training_paths) * validation_percent)
        validation_paths = training_paths[:comparison]
        training_paths = training_paths[comparison:]  

        print('Images selected for training folder: ',len(training_paths))
        print('Images selected for testing folder: ',len(testing_paths))
        print('Images selected for validation folder: ',len(validation_paths))

        # define the datasets that we'll be building
        datasets = [("training", training_paths, training_folder),
                    ("validation", validation_paths, validation_folder),
                    ("testing", testing_paths, testing_folder)]
        
        # loop over the datasets
        for (data_type, image_paths, output_folder) in datasets:
            # show which data split we are creating
            print("\nbuilding "+data_type+" collection . . .\n")

            # if the output base output directory does not exist, create it
            if not os.path.exists(output_folder):
                print("creating "+data_type+" directory . . .")
                os.makedirs(output_folder)

            # loop over the input image paths
            for path in image_paths:
                # extract the filename of the input image along with its corresponding class label
                filename = path.split(os.path.sep)[-1]
                label = path.split(os.path.sep)[-2]
                # build the path to the label directory
                label_paths = os.path.sep.join([output_folder, label])

                # if the label output directory does not exist, create it
                if not os.path.exists(label_paths):
                    print("creating "+data_type+" directory . . .")
                    os.makedirs(label_paths)

                # construct the path to the destination image and then copy the image itself
                p = os.path.sep.join([label_paths, filename])
                shutil.copy2(path, p) #Identical to copy() except that copy2() also attempts to preserve file metadata.
                
        aug_num=preprocess_data.images_augment(augment_percent,training_paths)
        # return the total number of image paths in training, validation, and testing directories
        training_total = len(list(preprocess_data.list_images(training_folder)))
        validation_total = len(list(preprocess_data.list_images(validation_folder)))
        testing_total = len(list(preprocess_data.list_images(testing_folder)))
        total_images=training_total+validation_total+testing_total
        print('\nTraining Images: ',training_total,'\nValidation Images: ',
              validation_total,'\nTesting Images: ',testing_total,
              '\nTotal Images Selected: ',total_images,'\nAugmented Images Added: ',aug_num)
        return training_paths
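
Because `random.shuffle` runs unseeded, each run of `setup_folders` produces a different split. If a repeatable split is wanted, one option is to shuffle with a dedicated, seeded generator. The sketch below is an assumption about how that could look, not part of the class above; `split_paths` and the default seed are hypothetical, and the paths are stand-ins.

```python
import random

def split_paths(paths, training_percent, validation_percent, seed=42):
    """Shuffle a copy of `paths` with a fixed seed and return (training, validation, testing)."""
    shuffled = sorted(paths)               # sort first so the input order does not matter
    random.Random(seed).shuffle(shuffled)  # local generator; the global random state is untouched
    cut = int(len(shuffled) * training_percent)
    train_and_val, testing = shuffled[:cut], shuffled[cut:]
    cut = int(len(train_and_val) * validation_percent)
    validation, training = train_and_val[:cut], train_and_val[cut:]
    return training, validation, testing

# stand-in paths; in the notebook these would come from list_images(input_data_folder)
fake_paths = ["images/cases/%d.png" % i for i in range(50)]
first = split_paths(fake_paths, 0.8, 0.1)
second = split_paths(list(reversed(fake_paths)), 0.8, 0.1)
print(first == second, [len(part) for part in first])
```

A fixed split makes it possible to compare models trained in different sessions on the same testing images.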

  


Build Training, Testing, and Validation Datasets
---------

In [33]:
training_path=preprocess_data.setup_folders(training_percent,validation_percent,augment_percent)

Images selected for training folder:  904
Images selected for testing folder:  251
Images selected for validation folder:  100

building training collection . . .

creating training directory . . .
creating training directory . . .
creating training directory . . .

building validation collection . . .

creating validation directory . . .
creating validation directory . . .
creating validation directory . . .

building testing collection . . .

creating testing directory . . .
creating testing directory . . .
creating testing directory . . .

Training Images:  904 
Validation Images:  100 
Testing Images:  251 
Total Images Selected:  1255 
Augmented Images Added:  180
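
The totals above do not show how the two classes are distributed across the splits. A quick way to check is to count the image files per split and label folder. This is a minimal sketch; the helper name `count_by_split_and_class` is hypothetical, and the demo builds a small stand-in tree instead of reading the real `local_test/test_dir/input/data/training` folder.

```python
import os
import tempfile
from collections import Counter

def count_by_split_and_class(root, extensions=(".png", ".jpg", ".jpeg")):
    """Count image files in root/SPLIT/LABEL folders; return a Counter keyed by (split, label)."""
    counts = Counter()
    for directory, _, filenames in os.walk(root):
        parts = os.path.relpath(directory, root).split(os.path.sep)
        if len(parts) != 2:
            continue  # only folders exactly two levels below root hold labelled images
        images = [f for f in filenames if f.lower().endswith(extensions)]
        if images:
            counts[tuple(parts)] += len(images)
    return counts

# stand-in tree with empty files named like images
demo_root = tempfile.mkdtemp()
for split, label, n in [("training", "cases", 3), ("training", "controls", 2), ("testing", "cases", 1)]:
    folder = os.path.join(demo_root, split, label)
    os.makedirs(folder)
    for i in range(n):
        open(os.path.join(folder, "%d.png" % i), "w").close()

demo_counts = count_by_split_and_class(demo_root)
for (split, label), n in sorted(demo_counts.items()):
    print(split, label, n)
```

Folders directly under the root, such as `augmentation`, are skipped because they carry no label level.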
