# 0 - Data Preparation

<p> 
Before creating this notebook and writing the code, I went through our current dataset and gathered all of the image folders (originally named after each person) into a single folder. Inside that folder, I renamed each image folder to the measured glucose value of the corresponding person. The result is one folder whose subfolders are named by glucose value, each containing the images captured at that value.
</p>
<p>
I also removed many "bad" images from the datasets, meaning images that were captured incorrectly. In addition, many of the images from the second image capture were renamed to random numbers so that the folders could be merged into the single folder with subdirectories described above.
</p>
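The layout described above (one subfolder per glucose value, each holding the matching images) can be checked before any training. The following is a minimal sketch, assuming the folder names are plain integers; `summarize_layout` is a hypothetical helper that is not part of the original notebook.

```python
from pathlib import Path


def summarize_layout(root):
    """Return ({glucose_value: image_count}, [folder names that are not integers])."""
    counts, bad_names = {}, []
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue
        try:
            value = int(folder.name)
        except ValueError:
            # A folder that was not renamed to a glucose value.
            bad_names.append(folder.name)
            continue
        counts[value] = sum(1 for f in folder.iterdir() if f.is_file())
    return counts, bad_names
```

Calling `summarize_layout` on the merged data folder would return the number of images per glucose value, plus any folder whose name still needs to be fixed.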

# 1 - Importing Prerequisites

In [2]:
#Importing Python Libraries
import os
import glob
import h5py
from PIL import Image
import numpy as np
import pandas as pd
from pathlib import Path
import tensorflow as tf
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

In [3]:
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

Num GPUs Available:  0


# 2 - Creating Dataset

In [4]:
#Initializing Print Settings for Dataframes
pd.set_option('display.max_columns', None)
pd.set_option('display.expand_frame_repr', False)
pd.set_option('display.max_colwidth', None)



In [5]:
#Building the Path to the Image Folder Next to this Notebook for Later Use
directory = os.path.join(os.getcwd(), 'data_second')
print(directory)

X:\Machine Learning\Glucose Estimation\data_second


In [6]:
#Creating Series for Image-Filepaths and Glucose Values

#Creating a list with all image filepaths (one subfolder per glucose value).
files = glob.glob(os.path.join(directory, '*', '*'))

#Normalizing the path separators and reading each label from its parent folder name.
#This avoids relying on a fixed character offset into the path.
files = [f.replace('\\', '/') for f in files]
values = [int(Path(f).parent.name) for f in files]

#Converting the lists into pandas Series for creating a Dataframe
files = pd.Series(files, name='Filepath')
values = pd.Series(values, name='Glucose')

In [7]:
#Combining the Series into a Dataframe
images = pd.concat([files, values], axis=1)
images

Unnamed: 0,Filepath,Glucose
0,X:/Machine Learning/Glucose Estimation/data_second/100/image0 (2).jpg,100
1,X:/Machine Learning/Glucose Estimation/data_second/100/image0 (3).jpg,100
2,X:/Machine Learning/Glucose Estimation/data_second/100/image0.jpg,100
3,X:/Machine Learning/Glucose Estimation/data_second/100/image1 (2).jpg,100
4,X:/Machine Learning/Glucose Estimation/data_second/100/image1 (3).jpg,100
...,...,...
1151,X:/Machine Learning/Glucose Estimation/data_second/99/image5.jpg,99
1152,X:/Machine Learning/Glucose Estimation/data_second/99/image6.jpg,99
1153,X:/Machine Learning/Glucose Estimation/data_second/99/image7.jpg,99
1154,X:/Machine Learning/Glucose Estimation/data_second/99/image8.jpg,99


# 3 - Data Processing

In [8]:
#Shuffling the Dataset

#Setting the Random State for Replication and Resetting Indices for Ordering
ds = images.sample(frac=1, random_state=7).reset_index(drop=True)
ds

Unnamed: 0,Filepath,Glucose
0,X:/Machine Learning/Glucose Estimation/data_second/84/image5.jpg,84
1,X:/Machine Learning/Glucose Estimation/data_second/101/524356.jpg,101
2,X:/Machine Learning/Glucose Estimation/data_second/95/image2.jpg,95
3,X:/Machine Learning/Glucose Estimation/data_second/85/image13 (2).jpg,85
4,X:/Machine Learning/Glucose Estimation/data_second/84/image13 (2).jpg,84
...,...,...
1151,X:/Machine Learning/Glucose Estimation/data_second/91/image13 (2).jpg,91
1152,X:/Machine Learning/Glucose Estimation/data_second/110/image12.jpg,110
1153,X:/Machine Learning/Glucose Estimation/data_second/140/image6.jpg,140
1154,X:/Machine Learning/Glucose Estimation/data_second/147/342.jpg,147


In [9]:
#Splitting the Dataset

#Holding out 25% of the data for testing because the dataset is small, then resetting the indices again.
train, test = train_test_split(ds, train_size=0.75, random_state = 7)
train = train.reset_index(drop=True)
test = test.reset_index(drop=True)
train

Unnamed: 0,Filepath,Glucose
0,X:/Machine Learning/Glucose Estimation/data_second/112/image9.jpg,112
1,X:/Machine Learning/Glucose Estimation/data_second/123/image10.jpg,123
2,X:/Machine Learning/Glucose Estimation/data_second/95/image7 (2).jpg,95
3,X:/Machine Learning/Glucose Estimation/data_second/105/image11.jpg,105
4,X:/Machine Learning/Glucose Estimation/data_second/83/2.jpg,83
...,...,...
862,X:/Machine Learning/Glucose Estimation/data_second/98/image6.jpg,98
863,X:/Machine Learning/Glucose Estimation/data_second/79/image1.jpg,79
864,X:/Machine Learning/Glucose Estimation/data_second/113/image9.jpg,113
865,X:/Machine Learning/Glucose Estimation/data_second/109/image7.jpg,109


In [10]:
#Creating Image Processors for Normalizing Image Data

#Dividing the RGB values of each pixel by 255 so that they fall in the range 0-1.
#This normalizes the inputs, much as numeric features would be scaled.
#Inputs on a small, consistent scale usually make training more stable.

#A validation set is created for evaluating the model during training.
train_generator = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255,
    validation_split=0.20
)

#A validation set is not needed for testing.
test_generator = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255
)
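To make the effect of `rescale=1./255` concrete, here is a small sketch that applies the same multiplication to a made-up array with NumPy. The 2x2 "image" is invented for illustration and is not taken from the dataset.

```python
import numpy as np

# A made-up 2x2 RGB "image" with 8-bit channel values (0-255).
image = np.array(
    [[[0, 128, 255], [64, 64, 64]],
     [[255, 255, 255], [10, 20, 30]]],
    dtype=np.uint8,
)

# Same arithmetic as rescale=1./255: multiply every channel value by 1/255.
scaled = image.astype(np.float32) * (1.0 / 255)

print(scaled.min(), scaled.max())  # smallest and largest value, now roughly 0 and 1
print(scaled.shape)                # the shape is unchanged by rescaling
```

Only the value range changes; the height, width, and channel dimensions stay the same.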

In [11]:
#Uses the previous image generators to convert the images into tensors.
#The tensors are numeric matrices containing the respective RGB values for each pixel.
#The tensors have 3 dimensions: height, width, and RGB colors.
#For our original images those are 480, 640, and 3; after resizing they are 120, 160, and 3.


#First the dataframe and its columns are selected for creating the training data.
#Setting target_size to (120, 160), i.e. height x width, rescales the images to a smaller size for speed/efficiency.
#Setting class_mode to raw makes the generator pass the labels through as numbers, so the model is regression, not classification.
#The batch size determines how many images are processed in a single iteration.
#Using 16 as the batch size keeps the memory needed per iteration low.
#We also shuffle the data again to make sure that the model gets a random sample of the data.
#We set the random seed to make the generation replicable.

#We first create the training subset for our model (the data used to train).
train_data = train_generator.flow_from_dataframe(
    dataframe=train,
    x_col='Filepath',
    y_col='Glucose',
    target_size=(120, 160),
    color_mode='rgb',
    class_mode='raw',
    batch_size=16,
    shuffle=True,
    seed=7,
    subset='training'
)

#Then we create the validation subset for our model (the data used to test performance during training).
val_data = train_generator.flow_from_dataframe(
    dataframe=train,
    x_col='Filepath',
    y_col='Glucose',
    target_size=(120, 160),
    color_mode='rgb',
    class_mode='raw',
    batch_size=16,
    shuffle=True,
    seed=7,
    subset='validation'
)

#Finally we create the testing subset for our model (the data used to test performance after training).
test_data = test_generator.flow_from_dataframe(
    dataframe=test,
    x_col='Filepath',
    y_col='Glucose',
    target_size=(120, 160),
    color_mode='rgb',
    class_mode='raw',
    batch_size=16,
    shuffle=False
)

train_data

Found 694 validated image filenames.
Found 173 validated image filenames.
Found 289 validated image filenames.


<keras.preprocessing.image.DataFrameIterator at 0x209b75b7130>
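The three "Found ... validated image filenames" counts above follow from the split settings. The sketch below reproduces them with plain arithmetic, assuming the training share is rounded down with the remainder going to the test set, and that the validation count is truncated to an integer. These assumptions match the reported numbers but are not a description of the libraries' internals.

```python
import math

n_images = 1156            # rows in the shuffled dataframe
train_size = 0.75          # train_test_split(train_size=0.75)
validation_split = 0.20    # ImageDataGenerator(validation_split=0.20)

n_train_full = math.floor(n_images * train_size)   # rows passed to the train/val generators
n_test = n_images - n_train_full                   # held-out test rows

n_val = int(n_train_full * validation_split)       # subset='validation'
n_train = n_train_full - n_val                     # subset='training'

print(n_train, n_val, n_test)  # 694 173 289
```

So roughly 60% of all images are used for fitting, 15% for validation, and 25% for the final test.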

# 4 - Model Creation

In [24]:
#Creating the model for training.


#The input layer fits the following layers to the dimensions of the tensors created by the generators.

#Convolutional layers slide a 3x3 window across the image to extract features such as shapes, corners, and edges.
#The window is 3x3 because our images are small; a small window suits the image size and picks up fine patterns.
#At each position the layer takes the dot product of the window's weights and the pixels beneath it, and writes the result to the output feature map.
#The sliding window can overlap with previous positions but it cannot go outside of the image.
#Different filters use different values (weights) in the windows to find different features: edges, shapes, and other patterns.
#The number of filters grows with depth (128, 128, 256, 512) so that later layers can combine simple patterns into more complex ones.
#Because the window is 3x3 and must stay inside the image, each convolution trims one pixel from every border.

#Max Pool layers downscale the image tensors by taking the maximum over each small area of the feature maps.
#This downscaling makes the tensors cheaper to process, which matters because more filters are used deeper in the network.

#The Flatten layer takes all of the features extracted from the image and lays them out as a single vector.

#Dense layers are fully connected layers that learn patterns within the extracted features.

#The Dropout layer randomly zeroes a percentage (here 20%) of the previous layer's output during training to reduce overfitting.

#Then the output layer combines the patterns from the Dense layer into a single value (Glucose).


inputs = tf.keras.Input(shape=(120, 160, 3))

conv_1 = tf.keras.layers.Conv2D(128, kernel_size=3, activation='relu') (inputs)
maxp_1 = tf.keras.layers.MaxPooling2D(pool_size=(3,3), strides=2) (conv_1)
conv_2 = tf.keras.layers.Conv2D(128, kernel_size=3, activation='relu') (maxp_1)
maxp_2 = tf.keras.layers.MaxPooling2D(pool_size=(3,3), strides=2) (conv_2)
conv_3 = tf.keras.layers.Conv2D(256, kernel_size=3, activation='relu') (maxp_2)
maxp_3 = tf.keras.layers.MaxPooling2D(pool_size=(3,3), strides=2) (conv_3)
conv_4 = tf.keras.layers.Conv2D(512, kernel_size=3, activation='relu') (maxp_3)
maxp_4 = tf.keras.layers.MaxPooling2D(pool_size=(3,3), strides=2) (conv_4)

flatten = tf.keras.layers.Flatten() (maxp_4)

dropout = tf.keras.layers.Dropout(0.2) (flatten)

dense = tf.keras.layers.Dense(32, activation='relu') (dropout)

outputs = tf.keras.layers.Dense(1, activation='relu') (dense)

model = tf.keras.Model(inputs=inputs, outputs=outputs)


#Compiles the model using a standard optimizer (Adam) and uses MAE as the loss.
#MAE is the Mean Absolute Error between the glucose values the model predicts and the actual glucose values.
#That is, the mean of the absolute deviations of the predicted values from the actual values.
model.compile(
    optimizer='adam',
    loss='mae'
)

#Summarizes the features of the models.
model.summary()

Model: "model_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_3 (InputLayer)        [(None, 120, 160, 3)]     0         
                                                                 
 conv2d_4 (Conv2D)           (None, 118, 158, 128)     3584      
                                                                 
 max_pooling2d_4 (MaxPooling  (None, 58, 78, 128)      0         
 2D)                                                             
                                                                 
 conv2d_5 (Conv2D)           (None, 56, 76, 128)       147584    
                                                                 
 max_pooling2d_5 (MaxPooling  (None, 27, 37, 128)      0         
 2D)                                                             
                                                                 
 conv2d_6 (Conv2D)           (None, 25, 35, 256)       2951
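The output shapes and parameter counts in the summary can be reproduced by hand. The sketch below assumes unpadded ("valid") 3x3 convolutions with stride 1 and unpadded 3x3 max pools with stride 2, as configured in the model cell; the helper names are hypothetical.

```python
def conv_out(size, kernel=3):
    """Output length of an unpadded convolution with stride 1."""
    return size - kernel + 1


def pool_out(size, pool=3, stride=2):
    """Output length of an unpadded max pool."""
    return (size - pool) // stride + 1


def conv_params(in_channels, filters, kernel=3):
    """Kernel weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters


height, width, channels = 120, 160, 3
shapes = []
for filters in (128, 128, 256, 512):
    params = conv_params(channels, filters)
    height, width, channels = conv_out(height), conv_out(width), filters
    shapes.append(("conv", height, width, channels, params))
    height, width = pool_out(height), pool_out(width)
    shapes.append(("pool", height, width, channels, 0))

for row in shapes:
    print(row)
```

The first four rows printed by this sketch reproduce the shapes and parameter counts of the first two convolution and pooling layers in the summary above, and the fifth row gives the same 25 x 35 x 256 shape as `conv2d_6`.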

# 5 - Model Training

In [25]:
#Fits the model to training and validation data.


#Uses up to 20 epochs (full passes over the training data).
#The EarlyStopping callback stops training once the validation loss has not improved for 5 consecutive epochs.
#With restore_best_weights=True, the weights from the best epoch are restored for the final model.

model.fit(
    train_data,
    validation_data=val_data,
    epochs=20,
    callbacks=[
        tf.keras.callbacks.EarlyStopping(
            monitor='val_loss',
            patience=5,
            restore_best_weights=True
        )
    ]
)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20


<keras.callbacks.History at 0x28c72e33a90>
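The patience rule used by the callback can be illustrated without Keras. This is a simplified sketch, assuming that any strictly lower validation loss counts as an improvement; `simulate_early_stopping` is a hypothetical helper and the loss values are invented for illustration, not taken from the run above.

```python
def simulate_early_stopping(val_losses, patience):
    """Return (epoch at which training stops, epoch with the best validation loss)."""
    best_loss = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # Improvement: remember this epoch and reset the counter.
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch


# Invented validation losses: the best value is at epoch 7.
losses = [20.1, 19.4, 18.8, 18.5, 18.3, 18.2, 18.0,
          18.1, 18.3, 18.2, 18.1, 18.4, 17.0, 16.5]
stopped_at, best_epoch = simulate_early_stopping(losses, patience=5)
print(stopped_at, best_epoch)  # 12 7
```

Under this simplified rule, a run that stops at epoch 12 with a patience of 5 last improved at epoch 7, and later epochs (even better ones, as in the invented list) are never reached.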

# 6 - Results

In [26]:
#Runs the model on the testing data.
#Squeezes the output array into a one-dimensional array.
predicted_glucose = np.squeeze(model.predict(test_data))
true_glucose = test_data.labels



In [27]:
#Showing the different values that our model predicted compared to their actual counterparts.
#The predictions cluster in a narrow band (mostly around 93-94 in this run), so the model is close to predicting a constant.
print(predicted_glucose)
print(true_glucose)

[93.61561  93.48403  93.693695 93.69176  93.93919  93.86616  93.65112
 95.66552  93.89751  93.70199  92.12168  93.69805  93.79745  93.7727
 93.64244  94.654106 93.56087  93.67007  93.55132  93.928185 93.791435
 93.74569  93.68096  93.796425 93.53017  91.38709  93.61092  93.76403
 93.40692  93.3911   96.65067  93.7335   93.701164 93.61939  93.72281
 93.843506 93.59104  93.77111  93.824486 93.70745  93.7502   93.69912
 93.79561  93.79061  93.86136  96.07127  95.62182  93.70385  93.842384
 93.81794  90.208595 93.70859  93.63934  93.56605  93.52999  93.72033
 97.26544  93.57     89.82853  93.75096  93.74264  89.40403  93.560265
 93.70133  93.80825  93.71322  91.87025  93.574135 93.73759  93.46586
 89.380875 93.67596  93.59968  93.54777  93.80938  93.52483  93.479515
 93.78763  95.119804 93.68341  93.65716  93.71064  93.77108  94.814705
 93.84886  93.452614 91.634346 93.59309  93.22383  93.60226  93.86582
 93.42141  93.49784  93.69982  93.727005 93.14289  93.62435  93.56608
 93.80085  93.66

In [28]:
#Evaluates the loss (MAE) on the testing data; this is the average absolute error in glucose units.
error = model.evaluate(test_data, verbose=0)
print("On Average We Are {:.2f} Off When Predicting Glucose".format(error))

On Average We Are 17.99 Off When Predicting Glucose
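An average error is easier to judge next to a trivial baseline that always predicts the mean label. The sketch below computes MAE and R² for a near-constant predictor and for that baseline. The labels and predictions are hypothetical values chosen for illustration, not the notebook's outputs, and in a real comparison the baseline mean would come from the training labels.

```python
from statistics import mean

# Hypothetical labels and predictions, NOT taken from the notebook's outputs.
y_true = [84, 101, 95, 140, 110, 99]
y_pred = [93.6, 93.7, 93.5, 93.8, 93.6, 93.7]


def mae(actual, predicted):
    """Mean absolute error."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def r2(actual, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    centre = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - centre) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


baseline = [mean(y_true)] * len(y_true)   # always predict the mean label

print("model MAE:   ", round(mae(y_true, y_pred), 2))
print("baseline MAE:", round(mae(y_true, baseline), 2))
print("model R^2:   ", round(r2(y_true, y_pred), 3))
```

If a model's MAE is not clearly below the baseline's, or its R² is at or below zero, it has learned little beyond the average glucose value.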


# 7 - Transfer Learning

In [13]:
#Create a Transfer Learning Model

#Transfer learning reuses a previously trained model, adapts its input, and adds new layers on top.
#Doing this saves computing power and time, and takes advantage of a well-trained model's capability.
#I used VGG16 (pretrained on ImageNet) and froze its layers so that they are not trained.
#This lets the model extract features from the images as it was trained to do.
#Then I routed the features into a few dense layers to produce the value we want.

transfer_model = tf.keras.applications.VGG16(weights="imagenet", include_top=False, input_shape=(120, 160, 3))

for layer in transfer_model.layers:
    layer.trainable = False

x = transfer_model.output
x = tf.keras.layers.Flatten(name="flatten")(x)
x = tf.keras.layers.Dense(128, activation="relu")(x)
x = tf.keras.layers.Dense(64, activation="relu")(x)
x = tf.keras.layers.Dense(32, activation="relu")(x)
x = tf.keras.layers.Dense(1, activation="relu")(x)

main_model = tf.keras.Model(inputs=transfer_model.input, outputs=x)

main_model.summary()

Model: "model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_2 (InputLayer)        [(None, 120, 160, 3)]     0         
                                                                 
 block1_conv1 (Conv2D)       (None, 120, 160, 64)      1792      
                                                                 
 block1_conv2 (Conv2D)       (None, 120, 160, 64)      36928     
                                                                 
 block1_pool (MaxPooling2D)  (None, 60, 80, 64)        0         
                                                                 
 block2_conv1 (Conv2D)       (None, 60, 80, 128)       73856     
                                                                 
 block2_conv2 (Conv2D)       (None, 60, 80, 128)       147584    
                                                                 
 block2_pool (MaxPooling2D)  (None, 30, 40, 128)       0   
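Because the VGG16 layers are frozen, only the dense head is trained. Its size can be estimated by hand, assuming the standard VGG16 layout (five unpadded 2x2 max pools with stride 2, padded convolutions that keep the spatial size, and 512 channels in the last block). The helper names below are hypothetical.

```python
def pool_out(size, pool=2, stride=2):
    """Output length of an unpadded max pool."""
    return (size - pool) // stride + 1


def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return inputs * units + units


height, width = 120, 160
for _ in range(5):                 # VGG16 has five pooling blocks
    height, width = pool_out(height), pool_out(width)

flat = height * width * 512        # length of the flattened feature vector

trainable = 0
inputs = flat
for units in (128, 64, 32, 1):     # the dense head added in the cell above
    trainable += dense_params(inputs, units)
    inputs = units

print((height, width), flat, trainable)  # (3, 5) 7680 993537
```

Under these assumptions almost all of the trainable parameters sit in the first dense layer, which connects the 7680 flattened features to 128 units.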

In [14]:
main_model.compile(
    optimizer='adam',
    loss='mae' 
)

main_model.fit(
    train_data,
    validation_data=val_data,
    epochs=30,
    callbacks=[
        tf.keras.callbacks.EarlyStopping(
            monitor='val_loss',
            patience=5,
            restore_best_weights=True
        )
    ]
)

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30


<keras.callbacks.History at 0x209b7a123a0>

In [15]:
predicted_glucose = np.squeeze(main_model.predict(test_data))
true_glucose = test_data.labels



In [16]:
print(predicted_glucose)
print(true_glucose)

[ 97.9571    98.527016  97.74586   97.29035   97.08218   98.6561
  97.76557  100.07217   96.96534   97.60577  100.23548   98.854195
  97.90268   97.048546  98.880196  97.87058   97.017876  98.08679
  97.53776   97.19485   97.23498   97.72741   99.928375  97.35047
  97.42158  100.70876   98.02144   98.17393   97.93478   97.59707
 103.647156  98.07494   97.361     98.176216  97.72613   97.028725
  97.599785  97.99871   97.47117  101.067604  97.77884   97.66796
  97.440605  97.33287   97.291756 102.75611  100.32215   97.93549
  97.49672   97.60983  100.155     97.48345   97.958435  97.644325
  97.36541   97.76757  105.054054  97.60444   99.81589   98.045235
  97.97561   99.777176  97.69066   98.38729   98.23009   97.68894
 100.6979    97.64396   97.999344  97.93667   99.790565  99.93709
  98.3733    96.961494  98.02252   97.381386  97.59455   97.79033
  98.9969    98.53669   98.31065   97.70982   97.565414  97.91305
  97.43413   97.24263  100.77675   97.6906    99.32351   97.861336
  97.7

In [17]:
error = main_model.evaluate(test_data, verbose=0)
print("On Average We Are {:.2f} Off When Predicting Glucose".format(error))

On Average We Are 16.76 Off When Predicting Glucose
