<h1>0 - Introduction</h1>

<p>
    For this version of the GlucoCheck glucose-estimation model, I used a more novel approach to calculating the patient's glucose values. Our current working prototype is based on a laser that shines through a patient's finger and an image diode that captures a picture of the transmitted light. When the laser is aimed at the finger, different wavelengths of light (colors) are absorbed by the patient's skin tissue, which shapes the image captured by the image diode. Since most blue and green light is absorbed by the tissue, the image is left with mostly red colors.
</p>

<p>
    With these images, data manipulation, and machine learning models, we can estimate the patient's current blood glucose. To do this estimation, we find the intensity values of the images for different colors. An intensity value is the number of pixels in an image that have a certain value of red, green, or blue. To get all of the intensity values for one color in an image, we count the pixels that have each possible value of that color (0-255), which gives 256 counts per image; stacking these counts for every image gives an array of shape (number of images x 256). We repeat this process for all three RGB colors: red, green, and blue. For this model, we experimented with all three colors.
</p>
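
<p>
    The intensity-value idea above can be sketched with NumPy. This is a minimal illustration on a synthetic 2x2 image, not the notebook's own code; the helper name <code>channel_histogram</code> and the pixel values are assumptions made for the example.
</p>

```python
import numpy as np

def channel_histogram(image, channel):
    """Count how many pixels take each intensity value (0-255) in one channel."""
    values = image[:, :, channel].ravel()
    return np.bincount(values, minlength=256)

# Synthetic 2x2 RGB image (hypothetical data): the red channel holds 255, 255, 10, 0.
image = np.array(
    [[[255, 0, 0], [255, 5, 0]],
     [[10, 0, 0], [0, 0, 0]]],
    dtype=np.uint8,
)

red_counts = channel_histogram(image, channel=0)
print(red_counts[255], red_counts[10], red_counts[0])  # 2 1 1

# Stacking one histogram per image gives the (number of images x 256) array.
images = [image, image[::-1]]
dataset = np.stack([channel_histogram(img, 0) for img in images])
print(dataset.shape)  # (2, 256)
```

<p>
    Each row of the stacked array sums to the number of pixels in the corresponding image, which is a quick sanity check on the counts.
</p>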

<p>
    Prior to creating the datasets and models, we went through our current dataset and compiled all of the folders of images (named according to the person) into one folder. Inside that folder, we renamed each image folder to the measured glucose value of the corresponding person. This process resulted in one parent folder whose subfolders are named after glucose values, with each subfolder containing the images captured at that glucose value.
</p>
<p>
    We also removed many "bad" images from the datasets; these were images that had been captured incorrectly. Furthermore, many of the images from the second image capture were renamed to random numbers so that the folders could be merged into the single folder with subdirectories described above.
</p>
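
<p>
    As a rough sketch of how this folder layout can be read back, the example below builds a small temporary directory tree and takes each image's glucose label from the name of its parent folder. The file names, the folder values, and the helper <code>collect_labeled_images</code> are hypothetical and only illustrate the layout.
</p>

```python
import tempfile
from pathlib import Path

def collect_labeled_images(root):
    """Return (filepath, glucose) pairs, reading the label from each parent folder name."""
    pairs = []
    for path in sorted(Path(root).glob('*/*')):
        if path.is_file():
            pairs.append((path.as_posix(), int(path.parent.name)))
    return pairs

# Building a throwaway tree: two subfolders named after glucose values.
with tempfile.TemporaryDirectory() as root:
    for glucose, names in {'100': ['a.jpg', 'b.jpg'], '83': ['c.jpg']}.items():
        folder = Path(root) / glucose
        folder.mkdir()
        for name in names:
            (folder / name).touch()
    pairs = collect_labeled_images(root)

labels = [glucose for _, glucose in pairs]
print(sorted(labels))  # [83, 100, 100]
```

<p>
    Reading the label from the folder name avoids depending on the length of the path that leads to the data directory.
</p>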

<h1>1 - Initial Setup</h1>

<h4>Import Python Libraries</h4>

In [290]:
import os
import glob
import time
import h5py
import skimage
import statistics
import seaborn as sns
from PIL import Image
import numpy as np
import pandas as pd
from skimage import io
from pathlib import Path
import tensorflow as tf
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_percentage_error

In [2]:
import warnings
warnings.filterwarnings("ignore")

<h1>2 - Creating Datasets</h1>

<h4>Finding the Data Directory</h4>

In [3]:
directory = os.path.join(os.getcwd(), 'data_second_cleaned')
print(directory)

x:\Machine Learning\Glucose Estimation\data_second_cleaned


<h4>Creating Series for Image Filepaths and Glucose Values</h4>

In [4]:
#Creating a list with all image filepaths and one for glucose values.
files = glob.glob(os.path.join(directory, '**', '*'))

#Correcting all filepaths and reading each glucose value from the name of the
#image's parent folder (this avoids a hard-coded character offset into the path).
files = [f.replace('\\','/') for f in files]
values = [int(Path(f).parent.name) for f in files]

#Converting lists into pandas Series for creating a DataFrame
files = pd.Series(files, name='Filepath')
values = pd.Series(values, name='Glucose')

<h4>Combining the Series into a Dataframe</h4>

In [4]:
images = pd.concat([files, values], axis=1)
images

Unnamed: 0,Filepath,Glucose
0,x:/Machine Learning/Glucose Estimation/data_se...,100
1,x:/Machine Learning/Glucose Estimation/data_se...,100
2,x:/Machine Learning/Glucose Estimation/data_se...,100
3,x:/Machine Learning/Glucose Estimation/data_se...,100
4,x:/Machine Learning/Glucose Estimation/data_se...,100
...,...,...
1123,x:/Machine Learning/Glucose Estimation/data_se...,99
1124,x:/Machine Learning/Glucose Estimation/data_se...,99
1125,x:/Machine Learning/Glucose Estimation/data_se...,99
1126,x:/Machine Learning/Glucose Estimation/data_se...,99


<h4>Shuffling the Dataset</h4>

In [5]:
#Setting Random State for Replication and Resetting Indices for Ordering 
images = images.sample(frac=1, random_state=7).reset_index(drop=True)
images

Unnamed: 0,Filepath,Glucose
0,x:/Machine Learning/Glucose Estimation/data_se...,101
1,x:/Machine Learning/Glucose Estimation/data_se...,83
2,x:/Machine Learning/Glucose Estimation/data_se...,83
3,x:/Machine Learning/Glucose Estimation/data_se...,131
4,x:/Machine Learning/Glucose Estimation/data_se...,113
...,...,...
1123,x:/Machine Learning/Glucose Estimation/data_se...,92
1124,x:/Machine Learning/Glucose Estimation/data_se...,111
1125,x:/Machine Learning/Glucose Estimation/data_se...,142
1126,x:/Machine Learning/Glucose Estimation/data_se...,147


<h4>Creating a Function for Declaring the Datasets</h4>

In [7]:
#The following function creates a DataFrame filled with zeros with appropriate columns:
#one column per intensity value (0-255) plus a 'Glucose' column.
#The user inputs the total number of images being used.

def initDS(num):
    cols = list(range(256)) + ['Glucose']
    return pd.DataFrame(0, index=range(num), columns=cols)

<h4>Creating a Function for Initializing the Datasets</h4>

In [8]:
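#This function inputs the intensity values for a specific color for a specific image.
#   0 - Red   1 - Green   2 - Blue 

def fillDS(dataset,row,color,i):
    image = io.imread(row['Filepath'])
    #Counting the pixels at each intensity value (0-255) of the chosen color
    #within the 480 x 640 pixel region of the image.
    counts = np.bincount(image[:480, :640, color].ravel(), minlength=256)
    #Writing the counts and the glucose value directly into row i of the dataset.
    dataset.iloc[i, :256] = counts
    dataset.iloc[i, 256] = row['Glucose']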

<h4>Using Functions to Create the Datasets</h4>

In [9]:
#Initializing Datasets
red_dataset = initDS(len(images))
green_dataset = initDS(len(images))
blue_dataset = initDS(len(images))

red_dataset.shape

(1128, 257)

In [10]:
#Filling Datasets

for i, row in images.iterrows():  
    fillDS(red_dataset,row,0,i)
    fillDS(green_dataset,row,1,i)
    fillDS(blue_dataset,row,2,i)

<h4>Exporting the Datasets as CSV Files</h4>

In [11]:
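#index=False keeps the row index out of the files, so it is not read back as an extra column.
red_dataset.to_csv('red_data.csv', index=False)
green_dataset.to_csv('green_data.csv', index=False)
blue_dataset.to_csv('blue_data.csv', index=False)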


<p>
    As the final step for this section of the procedure, we made an additional dataset that merges the values found in all three of the previously created datasets. This dataset contains 769 columns: the 768 intensity values of each image (256 per color) plus the glucose value. This process was done in Excel using the previously exported CSV files, and the result was exported as another CSV file.
</p>
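
<p>
    The merge itself was done in Excel, but the same result could plausibly be produced in pandas. The sketch below is an alternative, not the procedure actually used: it assumes the three tables list the images in the same row order, and the helpers <code>merge_color_datasets</code> and <code>toy_dataset</code> with their stand-in values are hypothetical. The r/g/b column prefixes mirror the column names of the merged dataset.
</p>

```python
import pandas as pd

def merge_color_datasets(red, green, blue):
    """Join three (n x 257) histogram tables into one (n x 769) table."""
    parts = []
    for prefix, frame in (('r', red), ('g', green), ('b', blue)):
        features = frame.drop(columns=['Glucose'])
        # Renaming 0..255 to r0..r255, g0..g255, b0..b255 so the names stay unique.
        features.columns = [prefix + str(c) for c in features.columns]
        parts.append(features)
    # The glucose column is identical in all three tables, so it is kept once.
    parts.append(red[['Glucose']])
    return pd.concat(parts, axis=1)

def toy_dataset(fill):
    """Tiny stand-in table (hypothetical values) with the same layout as the real ones."""
    frame = pd.DataFrame(fill, index=range(3), columns=list(range(256)))
    frame['Glucose'] = [101, 83, 131]
    return frame

rgb = merge_color_datasets(toy_dataset(1), toy_dataset(2), toy_dataset(3))
print(rgb.shape)  # (3, 769)
print(list(rgb.columns[:2]), list(rgb.columns[-2:]))
```

<p>
    Doing the merge in code keeps the step reproducible, at the cost of having to verify that the row order of the three tables really matches.
</p>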

<h1>3 - Data Processing / Model Preparation</h1>

<h4>Loading the Merged Dataset</h4>

In [25]:
red_dataset = pd.read_csv('red_data.csv')
green_dataset = pd.read_csv('green_data.csv')
blue_dataset = pd.read_csv('blue_data.csv')
rgb_dataset = pd.read_csv('rgb_data.csv')
rgb_dataset

Unnamed: 0,r0,r1,r2,r3,r4,r5,r6,r7,r8,r9,...,b247,b248,b249,b250,b251,b252,b253,b254,b255,Glucose
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,101
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,83
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,83
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,131
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,113
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1123,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,92
1124,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,111
1125,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,142
1126,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,147


<h4>Making Train/Test Function for Models</h4>

In [27]:
#User inputs the model object and the dataset name ('red', 'green', 'blue', or 'all').
#"Accuracy" here means 100 minus the mean absolute percentage error (MAPE), in percent.
#Note: scikit-learn's signature is mean_absolute_percentage_error(y_true, y_pred).
#The predictions are passed first below, so the error is relative to the predicted values.

def trainTest(model,name):
    #Looking up the train/test split and the printed label for the chosen dataset.
    splits = {
        'red': (rx_train, rx_test, ry_train, ry_test, 'Red'),
        'green': (gx_train, gx_test, gy_train, gy_test, 'Green'),
        'blue': (bx_train, bx_test, by_train, by_test, 'Blue'),
        'all': (ax_train, ax_test, ay_train, ay_test, 'All'),
    }
    if name not in splits:
        return
    x_train, x_test, y_train, y_test, label = splits[name]
    model.fit(x_train,y_train)
    preds = model.predict(x_test)
    print(label + ' Accuracy: ' + str(100 - (mean_absolute_percentage_error(preds, y_test)*100)) )
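
<p>
    The accuracy figure printed by <code>trainTest</code> is 100 minus the mean absolute percentage error. scikit-learn's <code>mean_absolute_percentage_error</code> takes <code>(y_true, y_pred)</code> and divides each error by its first argument, so the argument order affects the result. The plain-Python sketch below illustrates this with hypothetical readings; the helper names <code>mape</code> and <code>accuracy</code> are assumptions for the example and not part of the notebook.
</p>

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction, relative to y_true."""
    errors = [abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)]
    return sum(errors) / len(errors)

def accuracy(y_true, y_pred):
    """Accuracy as reported in this notebook: 100 minus MAPE in percent."""
    return 100 - mape(y_true, y_pred) * 100

measured = [100, 80]    # hypothetical glucose readings in mg/dl
predicted = [90, 100]   # hypothetical model outputs

print(accuracy(measured, predicted))   # error relative to the measured values, about 82.5
print(accuracy(predicted, measured))   # error relative to the predicted values, about 84.4
```

<p>
    The two calls differ by about two percentage points on this toy data, so accuracies computed with different argument orders are not directly comparable.
</p>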

<h4>Creating Training/Testing Splits</h4>

In [139]:
rx_train, rx_test, ry_train, ry_test = train_test_split( red_dataset.drop(columns=['Glucose']) , red_dataset[['Glucose']] ,test_size=0.25,random_state=7)
gx_train, gx_test, gy_train, gy_test = train_test_split( green_dataset.drop(columns=['Glucose']) , green_dataset[['Glucose']] ,test_size=0.25,random_state=7)
bx_train, bx_test, by_train, by_test = train_test_split( blue_dataset.drop(columns=['Glucose']) , blue_dataset[['Glucose']] ,test_size=0.25,random_state=7)
ax_train, ax_test, ay_train, ay_test = train_test_split( rgb_dataset.drop(columns=['Glucose']) , rgb_dataset[['Glucose']] ,test_size=0.25,random_state=7)

<h1>4 - Model Training and Testing</h1>

<h4>Random Forest - 87.9% Accuracy</h4>

In [28]:
from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor(n_estimators = 350, random_state = 7)
trainTest(model,'red')
trainTest(model,'green')
trainTest(model,'blue')
trainTest(model,'all')

Red Accuracy: 87.3087997910618
Green Accuracy: 87.03855755955992
Blue Accuracy: 86.6497289667381
All Accuracy: 87.90076354261745


<h4>Decision Tree - 83.4% Accuracy</h4>

In [140]:
from sklearn.tree import DecisionTreeRegressor

model = DecisionTreeRegressor()
trainTest(model,'red')
trainTest(model,'green')
trainTest(model,'blue')
trainTest(model,'all')

Red Accuracy: 83.42334259293924
Green Accuracy: 82.59144630694982
Blue Accuracy: 83.09587144083213
All Accuracy: 82.96156332874597


<h4>Elastic Net - 85.7% Accuracy</h4>

In [141]:
from sklearn.linear_model import ElasticNet

model = ElasticNet(alpha=150, l1_ratio=0.5,fit_intercept=False)
trainTest(model,'red')
trainTest(model,'green')
trainTest(model,'blue')
trainTest(model,'all')

Red Accuracy: 84.86713355824988
Green Accuracy: 84.32008002220915
Blue Accuracy: 85.08875139717097
All Accuracy: 85.69396594187346


<h4>KNeighbors - 90.1% Accuracy</h4>

In [142]:
from sklearn.neighbors import KNeighborsRegressor

model = KNeighborsRegressor(n_neighbors=1,p=1)
trainTest(model,'red')
trainTest(model,'green')
trainTest(model,'blue')
trainTest(model,'all')

Red Accuracy: 90.1482720590572
Green Accuracy: 83.88244210499052
Blue Accuracy: 86.14450943999462
All Accuracy: 89.31884257612406


<h4>Support Vector - 86.8% Accuracy</h4>

In [143]:
from sklearn.svm import SVR

model = SVR(kernel = 'rbf', C = 21250)
trainTest(model,'red')
trainTest(model,'green')
trainTest(model,'blue')
trainTest(model,'all')

Red Accuracy: 85.00452430491126
Green Accuracy: 84.51683585283295
Blue Accuracy: 85.6676571775809
All Accuracy: 86.81242941433484


<h4>Lasso Regression - 86.1% Accuracy</h4>

In [144]:
from sklearn.linear_model import Lasso

model = Lasso(alpha=37)
trainTest(model,'red')
trainTest(model,'green')
trainTest(model,'blue')
trainTest(model,'all')

Red Accuracy: 84.82514359981478
Green Accuracy: 84.50248224495067
Blue Accuracy: 85.25914061909751
All Accuracy: 86.14813243333506


<h4>Radius Neighbors - 86.2% Accuracy</h4>

In [145]:
from sklearn.neighbors import RadiusNeighborsRegressor

model = RadiusNeighborsRegressor(radius=7000,weights='distance',p=3)
trainTest(model,'red')
trainTest(model,'green')
trainTest(model,'blue')
trainTest(model,'all')

Red Accuracy: 85.83626109014672
Green Accuracy: 85.81387149939611
Blue Accuracy: 85.66219346414518
All Accuracy: 86.1477805493945


<h4>Bayesian Ridge - 86.2% Accuracy</h4>

In [146]:
from sklearn.linear_model import BayesianRidge

model = BayesianRidge(n_iter=10,fit_intercept=False)
trainTest(model,'red')
trainTest(model,'green')
trainTest(model,'blue')
trainTest(model,'all')

Red Accuracy: 84.93545532957893
Green Accuracy: 84.44726409631501
Blue Accuracy: 85.26009720429944
All Accuracy: 86.18759077898153


<h4>Tweedie Regressor - 86.2% Accuracy</h4>

In [147]:
from sklearn.linear_model import TweedieRegressor

model = TweedieRegressor(power=1,alpha=0,max_iter=95)
trainTest(model,'red')
trainTest(model,'green')
trainTest(model,'blue')
trainTest(model,'all')

Red Accuracy: 84.8410901618374
Green Accuracy: 84.31793843567331
Blue Accuracy: 84.18340656210901
All Accuracy: 86.20137821871194


<h4>Neural Network - 86.4% Accuracy</h4>

In [330]:
#Creating Neural Network
from keras.callbacks import EarlyStopping
from keras.optimizers import adam_v2
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout

model = Sequential()
model.add(Dense(1024, input_dim=256, activation='relu'))
model.add(Dense(2048, activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(loss='mae', optimizer='adam')

In [332]:
#Note: the same model object is fit on each dataset in turn, so the green and blue
#fits continue from the weights learned on the previous dataset.
model.fit(rx_train, ry_train, epochs=100, batch_size=200,validation_data=(rx_test, ry_test),callbacks=[EarlyStopping(monitor='val_loss', patience=50, restore_best_weights=True)])
print('Red Accuracy: ' + str(100 - (mean_absolute_percentage_error(model.predict(rx_test), ry_test)*100)) )
model.fit(gx_train, gy_train, epochs=100, batch_size=200,validation_data=(gx_test, gy_test),callbacks=[EarlyStopping(monitor='val_loss', patience=50, restore_best_weights=True)])
print('Green Accuracy: ' + str(100 - (mean_absolute_percentage_error(model.predict(gx_test), gy_test)*100)) )
model.fit(bx_train, by_train, epochs=100, batch_size=200,validation_data=(bx_test, by_test),callbacks=[EarlyStopping(monitor='val_loss', patience=50, restore_best_weights=True)])
print('Blue Accuracy: ' + str(100 - (mean_absolute_percentage_error(model.predict(bx_test), by_test)*100)) )

Epoch 1/100
...
Epoch 78

<p>Red: 81.8% Accuracy</p>
<p>Green: 85.9% Accuracy</p>
<p>Blue: 84.5% Accuracy</p>

In [333]:
model = Sequential()
model.add(Dense(1024, input_dim=768, activation='relu'))
model.add(Dense(2048, activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(loss='mae', optimizer='adam')

In [334]:
model.fit(ax_train, ay_train, epochs=100, batch_size=400,validation_data=(ax_test, ay_test),callbacks=[EarlyStopping(monitor='val_loss', patience=50, restore_best_weights=True)])
print('All Accuracy: ' + str(100 - (mean_absolute_percentage_error(model.predict(ax_test), ay_test)*100)) )

Epoch 1/100
...
Epoch 78

<p>RGB Accuracy: 86.4%</p>

<p>
    Since they had the highest accuracies throughout the model tests, I will be picking the red and RGB datasets to create new models with hyperparameters tuned for the two datasets.
</p>

<h1>5 - Model Tuning for Red Dataset</h1>

In [152]:
model = RandomForestRegressor(n_estimators = 390, random_state = 7)
trainTest(model,'red')

Red Accuracy: 87.35210415417792


In [153]:
model = ElasticNet(alpha=330, l1_ratio=0.45,fit_intercept=False)
trainTest(model,'red')

Red Accuracy: 84.9332554386511


In [154]:
model = KNeighborsRegressor(n_neighbors=1,p=1,weights='distance')
trainTest(model,'red')

Red Accuracy: 90.1482720590572


In [155]:
model = SVR(kernel = 'rbf', C = 5000)
trainTest(model,'red')

Red Accuracy: 85.50724033286265


In [156]:
model = Lasso(alpha=44)
trainTest(model,'red')

Red Accuracy: 84.85944582496748


In [157]:
model = RadiusNeighborsRegressor(radius=5400,weights='distance',p=2)
trainTest(model,'red')

Red Accuracy: 85.9893587975879


In [158]:
model = BayesianRidge(n_iter=5,fit_intercept=False)
trainTest(model,'red')

Red Accuracy: 85.04574389899685


In [159]:
model = TweedieRegressor(power=1,alpha=30,max_iter=100)
trainTest(model,'red')

Red Accuracy: 84.9451237675029


In [253]:
from xgboost import XGBRegressor

model = XGBRegressor(booster='gbtree',eta=0.09,gamma=0.92)
trainTest(model,'red')

Red Accuracy: 87.58378841711493


In [161]:
from keras.callbacks import EarlyStopping
from keras.optimizers import adam_v2
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout


model = Sequential()
model.add(Dense(1024, input_dim=256, activation='relu'))
model.add(Dense(2048, activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(loss='mae', optimizer='adam')


model.fit(rx_train, ry_train, epochs=50, batch_size=300,validation_data=(rx_test, ry_test),callbacks=[EarlyStopping(monitor='val_loss', patience=50, restore_best_weights=True)])
print('Red Accuracy: ' + str(100 - (mean_absolute_percentage_error(model.predict(rx_test), ry_test)*100)) )

Epoch 1/50
...
Epoch 50/50
Red Accuracy: 85.2847872783349


<h1>6 - Model Tuning for RGB Dataset</h1>

In [162]:
model = RandomForestRegressor(n_estimators = 93, random_state = 7)
trainTest(model,'all')

All Accuracy: 88.09739374824147


In [163]:
model = DecisionTreeRegressor()
trainTest(model,'all')

All Accuracy: 83.39535204347638


In [164]:
model = ElasticNet(alpha=305, l1_ratio=0.13,fit_intercept=True)
trainTest(model,'all')

All Accuracy: 86.15206903518344


In [165]:
model = KNeighborsRegressor(n_neighbors=1,p=1,weights='distance')
trainTest(model,'all')

All Accuracy: 89.31884257612406


In [166]:
model = SVR(kernel = 'rbf', C = 20350)
trainTest(model,'all')

All Accuracy: 86.81872695815929


In [167]:
model = Lasso(alpha=43)
trainTest(model,'all')

All Accuracy: 86.1536348628427


In [168]:
model = RadiusNeighborsRegressor(radius=4475,weights='distance',p=5)
trainTest(model,'all')

All Accuracy: 86.42703805982265


In [169]:
model = BayesianRidge(n_iter=8,fit_intercept=True)
trainTest(model,'all')

All Accuracy: 86.24339879239992


In [170]:
model = TweedieRegressor(power=0,alpha=53,max_iter=100)
trainTest(model,'all')

All Accuracy: 86.3676540801893


In [274]:
from xgboost import XGBRegressor

model = XGBRegressor(booster='gbtree',eta=0.092,gamma=0.919)
trainTest(model,'all')

All Accuracy: 88.23839832616821


In [172]:
from keras.callbacks import EarlyStopping
from keras.optimizers import adam_v2
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout


model = Sequential()
model.add(Dense(1024, input_dim=768, activation='relu'))
model.add(Dense(2048, activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(loss='mae', optimizer='adam')


model.fit(ax_train, ay_train, epochs=50, batch_size=300,validation_data=(ax_test, ay_test),callbacks=[EarlyStopping(monitor='val_loss', patience=50, restore_best_weights=True)])
print('All Accuracy: ' + str(100 - (mean_absolute_percentage_error(model.predict(ax_test), ay_test)*100)) )

Epoch 1/50
...
Epoch 50/50
All Accuracy: 86.71729592151668


<h1>7 - Model Summary</h1>

<p>
    <strong>Dataset with Highest Accuracies Across All Models:</strong> RGB Dataset 
    <br>
    <strong>Dataset with Highest Single Accuracy:</strong> Red Dataset - 90.15%
    <br>
    <br>
    <strong>Model with Highest Accuracies Overall:</strong> KNeighbors Regressor
    <br>
    <strong>Model with Highest Single Accuracy:</strong> KNeighbors Regressor - 90.15%
</p>

In [173]:
import pickle

model = KNeighborsRegressor(n_neighbors=1,p=1,weights='distance')
model.fit(rx_train,ry_train)

#Saving the fitted model; the with-block closes the file once the dump finishes.
with open('final_model.sav', 'wb') as f:
    pickle.dump(model, f)

In [174]:
from sklearn.metrics import mean_absolute_error

#Loading the saved model back from disk.
with open('final_model.sav', 'rb') as f:
    loaded_model = pickle.load(f)
preds = loaded_model.predict(rx_test)

print('Mean Absolute Error: ' + str( round(mean_absolute_error(preds,ry_test),2) ) + ' mg/dl' )
print('Accuracy: ' + str( round(100 - (mean_absolute_percentage_error(preds, ry_test)*100),2)) + ' %' )

Mean Absolute Error: 9.88 mg/dl
Accuracy: 90.15 %
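
<p>
    With <code>n_neighbors=1</code> and <code>p=1</code>, the final model predicts the glucose value of the single closest training image under the Manhattan (L1) distance between histograms; with only one neighbor, <code>weights='distance'</code> does not change the prediction. The NumPy sketch below illustrates that rule on hypothetical 4-bin histograms. It is not scikit-learn's implementation, and the function name <code>predict_1nn_manhattan</code> is an assumption for the example.
</p>

```python
import numpy as np

def predict_1nn_manhattan(x_train, y_train, x_test):
    """Predict each test row with the label of its closest training row (L1 distance)."""
    x_train = np.asarray(x_train, dtype=float)
    x_test = np.asarray(x_test, dtype=float)
    y_train = np.asarray(y_train)
    # Pairwise L1 distances, shape (number of test rows, number of training rows).
    distances = np.abs(x_test[:, None, :] - x_train[None, :, :]).sum(axis=2)
    return y_train[distances.argmin(axis=1)]

# Hypothetical 4-bin histograms and glucose labels.
x_train = [[10, 0, 0, 2], [0, 8, 4, 0], [1, 1, 9, 1]]
y_train = [101, 83, 131]
x_test = [[9, 1, 0, 2], [0, 2, 8, 2]]

print(predict_1nn_manhattan(x_train, y_train, x_test))  # [101 131]
```

<p>
    Because the prediction is always one of the training labels, such a model can only output glucose values that appear in the training set.
</p>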


<h1>8 - Ensemble Learning</h1>

<p>
    To increase the overall accuracy of this project, we considered creating an ensemble learner from the three highest-accuracy models on each of the two datasets (red and RGB) to see which combination would provide the highest overall accuracy. To achieve this, we picked the three highest-performing models for each dataset and created an ensemble learner that takes the mean of their predictions.
    <br>
    <p>
        <strong>Red Dataset Models:</strong> XG Boost, Random Forest, and KNeighbors.
        <br>
        <strong>RGB Dataset Models:</strong> XG Boost, Random Forest, and KNeighbors.
    </p>
</p>
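
<p>
    Taking the mean of several models' predictions can be sketched in a few lines of NumPy. The prediction values and the helper <code>average_predictions</code> below are hypothetical; unlike the notebook cells, this sketch does not truncate the summed prediction to an integer before dividing.
</p>

```python
import numpy as np

def average_predictions(*predictions):
    """Average several prediction arrays element-wise after flattening each one."""
    stacked = np.stack([np.ravel(p).astype(float) for p in predictions])
    return stacked.mean(axis=0)

# Hypothetical outputs of three models for the same three test samples.
forest_preds = np.array([100.0, 90.0, 120.0])
knn_preds = np.array([[103], [84], [126]])   # some regressors return a column vector
xgb_preds = np.array([97.0, 87.0, 123.0])

ensemble = average_predictions(forest_preds, knn_preds, xgb_preds)
print(ensemble.tolist())  # [100.0, 87.0, 123.0]
```

<p>
    Flattening each array first guards against accidental broadcasting when one model returns predictions of shape (n, 1) and another returns shape (n,).
</p>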

<h4>Training Red Dataset Ensemble Learner</h4>

In [275]:
model = RandomForestRegressor(n_estimators = 390, random_state = 7)
model2 = KNeighborsRegressor(n_neighbors=1,p=1,weights='distance')
model3 = XGBRegressor(booster='gbtree',eta=0.09,gamma=0.92)

model.fit(rx_train,ry_train)
model2.fit(rx_train,ry_train)
model3.fit(rx_train,ry_train)

In [301]:
preds = model.predict(rx_test)
preds2 = model2.predict(rx_test)
preds3 = model3.predict(rx_test)

#Averaging the three models' predictions for each test sample.
#The summed prediction is truncated to an integer before it is divided by 3.
preds_final = (preds + np.ravel(preds2) + preds3).astype(int) / 3

print('Accuracy: ' + str( round(100 - (mean_absolute_percentage_error(preds_final, ry_test)*100),2)) + ' %' )

Accuracy: 89.0 %


<h4>Training RGB Dataset Ensemble Learner</h4>

In [302]:
model = RandomForestRegressor(n_estimators = 93, random_state = 7)
model2 = KNeighborsRegressor(n_neighbors=1,p=1,weights='distance')
model3 = XGBRegressor(booster='gbtree',eta=0.092,gamma=0.919)

model.fit(ax_train,ay_train)
model2.fit(ax_train,ay_train)
model3.fit(ax_train,ay_train)

In [304]:
preds = model.predict(ax_test)
preds2 = model2.predict(ax_test)
preds3 = model3.predict(ax_test)

#Averaging the three models' predictions for each test sample.
#The summed prediction is truncated to an integer before it is divided by 3.
preds_final = (preds + np.ravel(preds2) + preds3).astype(int) / 3

print('Accuracy: ' + str( round(100 - (mean_absolute_percentage_error(preds_final, ay_test)*100),2)) + ' %' )

Accuracy: 89.34 %
