## Task: Predict patient glucose levels using a Keras-based neural network

### Original data: https://data.world/uci/pima-indians-diabetes

## Install TensorFlow

In [None]:
devtools::install_github("rstudio/tensorflow")

## Install Keras

In [None]:
devtools::install_github("rstudio/keras")

## Load TensorFlow library

In [1]:
library(tensorflow)
#install_tensorflow()

## Load Keras library and read the data

In [None]:
library(keras)
diabetes1<-read.csv("pima-indians-diabetes1.csv")

## Max-Min Normalization

In [2]:
normalize <- function(x) {
  return ((x - min(x)) / (max(x) - min(x)))
}

maxmindf <- as.data.frame(lapply(diabetes1, normalize))
attach(maxmindf)
maxmindf<-as.matrix(maxmindf)
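To make the effect of the scaling concrete, here is a minimal sketch that applies the same `normalize` function to a small vector. The `glucose` values are hypothetical and chosen only for illustration; they are not taken from the dataset.

```r
# Min-max scaling maps the smallest value to 0 and the largest to 1.
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Hypothetical glucose readings, used only for illustration.
glucose <- c(80, 100, 120, 200)
scaled <- normalize(glucose)

print(scaled)         # every value now lies between 0 and 1
print(range(scaled))  # 0 1
```

Note that a column whose values are all identical would produce `NaN` here, because the denominator `max(x) - min(x)` is zero.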

## Train-validation split

In [3]:
ind <- sample(2, nrow(maxmindf), replace=TRUE, prob = c(0.7,0.3))

## Build X_train, y_train, X_val, y_val
X_train <- maxmindf[ind==1, 1:8]
X_val <- maxmindf[ind==2, 1:8]
y_train <- maxmindf[ind==1, 9]
y_val <- maxmindf[ind==2, 9]
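Because `sample()` draws a label for each row independently, the split above is only approximately 70/30 and changes on every run. The sketch below shows one way to make it reproducible and, as an alternative, how to draw an exact 70% training index. The seed value and the row count `n` are assumptions made for illustration.

```r
set.seed(123)  # arbitrary seed so the split can be reproduced
n <- 768       # hypothetical number of rows

# Same approach as above: each row is labelled 1 (train) or 2 (validation).
ind <- sample(2, n, replace = TRUE, prob = c(0.7, 0.3))
split_sizes <- table(ind)
print(split_sizes)
print(prop.table(split_sizes))  # close to, but not exactly, 0.7 / 0.3

# Alternative: draw exactly 70% of the row indices for training.
train_idx <- sample(n, size = floor(0.7 * n))
print(length(train_idx))
```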

## Sequential model

In [4]:
model <- keras_model_sequential() 
model %>% 
  layer_dense(units = 12, activation = 'relu', kernel_initializer='RandomNormal', input_shape = c(8)) %>% 
  layer_dense(units = 8, activation = 'relu') %>%
  layer_dense(units = 1, activation = 'linear')

summary(model)

________________________________________________________________________________
Layer (type)                        Output Shape                    Param #     
dense (Dense)                       (None, 12)                      108         
________________________________________________________________________________
dense_1 (Dense)                     (None, 8)                       104         
________________________________________________________________________________
dense_2 (Dense)                     (None, 1)                       9           
Total params: 221
Trainable params: 221
Non-trainable params: 0
________________________________________________________________________________
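The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-output pair plus one bias per output unit. The helper `dense_params` below is a hypothetical name introduced only for this check.

```r
# Parameters of a dense layer: weights (n_in * n_out) plus biases (n_out).
dense_params <- function(n_in, n_out) n_in * n_out + n_out

# Layer widths of the model above: 8 inputs -> 12 -> 8 -> 1 output.
layer_sizes <- c(8, 12, 8, 1)
params <- mapply(dense_params, head(layer_sizes, -1), tail(layer_sizes, -1))

print(params)       # 108 104 9
print(sum(params))  # 221
```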


## Model compilation with mean squared error used as loss function

### Model trained over 150 epochs

In [5]:
model %>% compile(
  loss = 'mean_squared_error',
  optimizer = 'adam',
  metrics = c('mae')
)

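# validation_split sets aside the last 20% of X_train for per-epoch validation;
# X_val and y_val remain untouched for the final evaluation.
history <- model %>% fit(
  X_train, y_train, 
  epochs = 150, batch_size = 50, 
  validation_split = 0.2
)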

### Model evaluation

In [6]:
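# Loss (MSE) and MAE on the hold-out set, both on the normalized scale
model %>% evaluate(X_val, y_val)
model
pred <- data.frame(y = predict(model, as.matrix(X_val)))
# Undo the min-max scaling to return to the original glucose units
glucose_range <- diff(range(diabetes1$Glucose))
predicted <- pred$y * glucose_range + min(diabetes1$Glucose)
actual <- y_val * glucose_range + min(diabetes1$Glucose)
df <- data.frame(predicted, actual)
attach(df)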

Model
________________________________________________________________________________
Layer (type)                        Output Shape                    Param #     
dense (Dense)                       (None, 12)                      108         
________________________________________________________________________________
dense_1 (Dense)                     (None, 8)                       104         
________________________________________________________________________________
dense_2 (Dense)                     (None, 1)                       9           
Total params: 221
Trainable params: 221
Non-trainable params: 0
________________________________________________________________________________



The following objects are masked _by_ .GlobalEnv:

    actual, predicted
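The back-transformation in the cell above is the inverse of min-max scaling: multiply by the original range and add the original minimum. The following sketch checks that round trip on a toy vector. `denormalize` is a hypothetical helper name and the `glucose` values are made up for illustration.

```r
# Inverse of min-max scaling, using the range of the original column.
denormalize <- function(scaled, original) {
  scaled * (max(original) - min(original)) + min(original)
}

# Hypothetical readings, used only for illustration.
glucose <- c(80, 100, 120, 200)
scaled <- (glucose - min(glucose)) / (max(glucose) - min(glucose))

restored <- denormalize(scaled, glucose)
print(restored)  # matches the original readings
```

The round trip only works when `original` is the same column that supplied the minimum and maximum during normalization, which is why the cell above uses `diabetes1$Glucose` for both steps.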



## Glucose level predictions

In [7]:
predicted=as.matrix(predicted)
predicted

0
115.03139
155.63502
120.48860
103.43557
107.22563
98.38659
145.38961
117.22818
144.96535
113.74861


In [8]:
actual

## Mean percentage error: the average signed percentage difference between predicted and actual values

In [9]:
mpe=((predicted-actual)/actual)
mean(mpe)*100
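The calculation divides by `actual`, so any actual value of zero would yield `Inf` or `NaN` and distort the mean. Whether this data contains such rows is not stated here. The sketch below is one possible guard, assuming that dropping those rows is acceptable. The function name and the toy values are hypothetical.

```r
# Mean percentage error that skips rows where the actual value is zero.
mean_percentage_error <- function(predicted, actual) {
  keep <- actual != 0
  mean((predicted[keep] - actual[keep]) / actual[keep]) * 100
}

# Toy values: the second row has a zero reading and is excluded.
predicted_toy <- c(110, 95, 130)
actual_toy <- c(100, 0, 125)

print(mean_percentage_error(predicted_toy, actual_toy))  # 7
```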

## EXERCISE

### pima-indians-diabetes2.csv contains the predictor variables for the test set.
### pima-indians-diabetes3.csv contains the dependent variable (the glucose readings) for the test set.

### Your task is to use the existing model to generate predictions for this test set and calculate the mean percentage error of those predictions.

### Run the cells below for the solution.

In [10]:
diabetes2<-read.csv("pima-indians-diabetes2.csv")

The following objects are masked from maxmindf:

    Age, BloodPressure, BMI, DiabetesPedigreeFunction, Insulin,
    Outcome, Pregnancies, SkinThickness



## Max-Min Normalization (test set)

In [None]:
normalize <- function(x) {
  return ((x - min(x)) / (max(x) - min(x)))
}

maxmindf2 <- as.data.frame(lapply(diabetes2, normalize))
attach(maxmindf2)
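The cell above rescales the test set with its own minimum and maximum, whereas the model was trained on features scaled with the training data's minimum and maximum. A common alternative is to reuse the training statistics so both sets share one scale. The sketch below illustrates the difference on toy values. `normalize_with` and the age vectors are hypothetical.

```r
# Scale x using the min and max of a reference vector (e.g. the training column).
normalize_with <- function(x, ref) {
  (x - min(ref)) / (max(ref) - min(ref))
}

# Hypothetical training and test values for a single feature.
train_age <- c(21, 30, 45, 60)
test_age <- c(25, 50)

# Scaling the test values by their own range stretches them to exactly 0 and 1.
scaled_own <- (test_age - min(test_age)) / (max(test_age) - min(test_age))
# Scaling by the training range keeps them on the scale the model was trained on.
scaled_ref <- normalize_with(test_age, train_age)

print(scaled_own)
print(scaled_ref)
```

Assuming the column names of `diabetes2` also appear in `diabetes1`, something like `as.data.frame(Map(normalize_with, diabetes2, diabetes1[names(diabetes2)]))` might apply the same idea column by column. Test values outside the training range would then fall outside [0, 1].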

## Use predict() to generate predictions (pred_test) for the Glucose variable from maxmindf2

In [11]:
pred_test <- data.frame(y = predict(model, as.matrix(maxmindf2)))
predicted_test = pred_test$y * abs(diff(range(diabetes1$Glucose))) + min(diabetes1$Glucose)
predicted_test

## Load the actual glucose readings for the test set (unseen data)

In [12]:
diabetes3<-read.csv("pima-indians-diabetes3.csv")
diabetes3

Glucose
<int>
97
83
130
128
149
144
119
108
120
120


## Compare predicted values with actual values

In [13]:
actual_test = diabetes3$Glucose
df2<-data.frame(predicted_test,actual_test)
attach(df2)
df2

The following objects are masked _by_ .GlobalEnv:

    actual_test, predicted_test



| predicted_test (dbl) | actual_test (int) |
|---|---|
| 147.91135 | 97 |
| 118.40756 | 83 |
| 161.89970 | 130 |
| 113.08413 | 128 |
| 165.86854 | 149 |
| 154.13147 | 144 |
| 137.16663 | 119 |
| 96.08984 | 108 |
| 144.15751 | 120 |
| 98.36231 | 120 |


## Mean percentage error calculation

In [14]:
mpe2=((predicted_test-actual_test)/actual_test)
mean(mpe2)*100
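Because the mean percentage error is signed, over-predictions and under-predictions partly cancel. As a rough illustration, the sketch below recomputes it for the ten rows displayed above only, which may not be the full test set, and compares it with the mean absolute percentage error (MAPE). MAPE is not part of the original exercise and is shown here only for contrast.

```r
# The ten predicted/actual pairs displayed above.
predicted_rows <- c(147.91135, 118.40756, 161.89970, 113.08413, 165.86854,
                    154.13147, 137.16663, 96.08984, 144.15751, 98.36231)
actual_rows <- c(97, 83, 130, 128, 149, 144, 119, 108, 120, 120)

pct_err <- (predicted_rows - actual_rows) / actual_rows

mpe_rows <- mean(pct_err) * 100        # signed errors can cancel out
mape_rows <- mean(abs(pct_err)) * 100  # absolute errors cannot

print(round(c(MPE = mpe_rows, MAPE = mape_rows), 2))
```

On these rows the signed error comes out at roughly 13% and the absolute error at roughly 21%. A positive MPE indicates that the model over-predicts on average.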