# 0. Introduction

In this notebook I'll use the Medical Cost Personal Datasets from *Machine Learning with R* by Brett Lantz to follow Daniel Bourke's Neural Networks and TensorFlow course.

**Can you accurately predict insurance costs?** This dataset shows each patient's medical insurance charges together with their age, sex, BMI, number of children, whether they smoke, and their region.

This is a **regression problem**: the goal is to determine the strength and character of the relationship between one dependent variable (the *charges*) and a series of independent variables (the features).

# 1. Import Data and Packages

In [None]:
# Data Manipulation
import numpy as np
import pandas as pd

# Visualization
import matplotlib.pyplot as plt

# TensorFlow
import tensorflow as tf

print("Setup Complete")

In [None]:
# Import data
data_filepath = "/kaggle/input/insurance/insurance.csv" # Path of the file to read
data = pd.read_csv(data_filepath)

# Show the first five rows of the dataset
data.head()

In [None]:
print('The number of samples in the dataset is {}.'.format(data.shape[0]))

# 2. Data Manipulation

## Split the dataframe into features and labels

In [None]:
# Create features (X) and labels (y)
X = data.drop('charges', axis=1)
y = data.charges

## One-Hot Encode Categorical Variables
Neural networks only accept numerical inputs, so we need to transform the categorical variables (sex, smoker and region) into numbers. One-hot encoding creates one binary column per category.
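
As a small illustration on toy data (the values are hypothetical, not taken from the insurance dataset), `pd.get_dummies` turns a categorical column into one indicator column per category. The `OneHotEncoder` used further down does the same thing inside a scikit-learn transformer.

```python
import pandas as pd

# Toy data (hypothetical values) with a single categorical column
toy = pd.DataFrame({"smoker": ["yes", "no", "yes"]})

# One indicator column per category: smoker_no and smoker_yes
toy_one_hot = pd.get_dummies(toy, columns=["smoker"])
print(toy_one_hot)
```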

## Normalize the data (aka Scaling)
Min-max scaling converts the values of each numerical column to between 0 and 1 while preserving the shape of its original distribution.
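
Min-max scaling is the linear map `(x - min) / (max - min)`, so the smallest value becomes 0, the largest becomes 1, and the relative spacing between values is unchanged. Here is a quick sketch on some hypothetical ages, checking the hand-written formula against scikit-learn's `MinMaxScaler`:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical ages, shaped (n_samples, 1) as scikit-learn expects
ages = np.array([[18.0], [30.0], [45.0], [64.0]])

# Min-max scaling by hand: (x - min) / (max - min)
manual = (ages - ages.min()) / (ages.max() - ages.min())

# The same transformation with scikit-learn
scaled = MinMaxScaler().fit_transform(ages)

print(manual.ravel())
print(np.allclose(manual, scaled))
```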

## Splitting the dataset

* **Training set.** The model learns from this data, which makes up 80% of the total data available.
* **Test set.** The model is evaluated on this data to test what it has learned. This set makes up the remaining 20%.
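
To show what an 80/20 split does, here is a sketch on ten hypothetical samples. `train_test_split` shuffles the rows and holds out 20% of them, and `random_state` makes the shuffle reproducible:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten hypothetical samples with two features each, and one label per sample
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.arange(10)

# 80/20 split; random_state fixes the shuffle so the split is repeatable
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.2, random_state=42)

print(len(X_tr), len(X_te))
```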

In [None]:
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder 
from sklearn.model_selection import train_test_split

# Create a column transformer
ct = make_column_transformer(
    (MinMaxScaler(), ["age", "bmi", "children"]), # turn all values in these columns between 0 and 1 
    (OneHotEncoder(handle_unknown="ignore"), ["sex", "smoker", "region"])
)

# Build our train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the column transformer to our training data
ct.fit(X_train)

# Transform training and test data with normalization (MinMaxScaler) and OneHotEncoder
X_train_normal = ct.transform(X_train)
X_test_normal = ct.transform(X_test)

# For comparison only: pandas can also one-hot encode the whole DataFrame (this result is not used below)
data_one_hot = pd.get_dummies(data)
data_one_hot.head()
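
The column transformer is fitted on `X_train` only and then reused on `X_test`. This keeps information from the test set from leaking into preprocessing. The sketch below uses hypothetical values to show two consequences. First, test values outside the training range scale to below 0 or above 1, which is expected. Second, with `handle_unknown="ignore"`, a category that never appeared in training is encoded as all zeros.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical train/test ages
train_ages = np.array([[20.0], [40.0], [60.0]])
test_ages = np.array([[30.0], [70.0]])

# Learn min (20) and max (60) from the training data only
scaler = MinMaxScaler().fit(train_ages)
test_scaled = scaler.transform(test_ages)
print(test_scaled.ravel())  # 70 lies outside the training range, so it maps above 1

# A category unseen during fitting becomes an all-zero row with handle_unknown="ignore"
encoder = OneHotEncoder(handle_unknown="ignore").fit(pd.DataFrame({"region": ["north", "south"]}))
unseen = encoder.transform(pd.DataFrame({"region": ["east"]})).toarray()
print(unseen)
```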

In [None]:
print("The split of the data is as follows:")
print("X_train: ", len(X_train_normal))
print("X_test: ", len(X_test_normal))
print("y_train: ", len(y_train))
print("y_test: ", len(y_test))

# 3. Building and Evaluating the Model

## Steps in modelling with TensorFlow

1. **Creating a model** - define the input and output layers, as well as the hidden layers of a deep learning model.

2. **Compiling a model** - define the loss function (in other words, the function which tells our model how wrong it is), the optimizer (which tells our model how to improve the patterns it's learning) and the evaluation metrics (what we can use to interpret the performance of our model).

3. **Fitting a model** - let the model try to find patterns between X & y (features and labels).
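
The loss used in all the models below is the mean absolute error (MAE): the average of `|y_true - y_pred|`. Here is a small sketch with hypothetical numbers that computes it by hand and checks it against Keras' `MeanAbsoluteError` loss class. That class is the class form of the `mae` loss passed to `compile()`.

```python
import numpy as np
import tensorflow as tf

# Hypothetical true charges and predictions
y_true = np.array([3.0, 5.0, 2.0], dtype="float32")
y_pred = np.array([2.0, 5.0, 4.0], dtype="float32")

# MAE by hand: mean of |y_true - y_pred| = (1 + 0 + 2) / 3
manual_mae = float(np.mean(np.abs(y_true - y_pred)))

# The same quantity computed by Keras' MAE loss
keras_mae = float(tf.keras.losses.MeanAbsoluteError()(y_true, y_pred))

print(manual_mae, keras_mae)
```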

In [None]:
# set a random seed
tf.random.set_seed(64)

# 1. create a model using the Sequential API
model = tf.keras.Sequential([
    tf.keras.layers.Dense(1)
])

# 2. compile the model
model.compile(loss = tf.keras.losses.mae,
             optimizer = tf.keras.optimizers.SGD(),
             metrics=['mae'])

# 3. Fit the model
model.fit(X_train_normal, y_train, epochs=100)


## Evaluating the model


In [None]:
# Check the results of the insurance model on the test data
model.evaluate(X_test_normal, y_test)

In [None]:
# Compare the test MAE above with the typical size of the charges
y_train.median(), y_train.mean()
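
One way to judge whether the MAE is "good" is to compare it with a trivial baseline that always predicts the same number. The median is the constant that minimises MAE, so the sketch below uses the training median. `baseline_mae` is a hypothetical helper, not part of any library:

```python
import numpy as np

def baseline_mae(y_train, y_test):
    """Hypothetical helper: MAE of always predicting the training median."""
    constant_prediction = np.median(y_train)
    return float(np.mean(np.abs(np.asarray(y_test) - constant_prediction)))

# Toy example: the training median is 2.0, so the errors on [2.0, 4.0] are 0.0 and 2.0
print(baseline_mae([1.0, 2.0, 3.0], [2.0, 4.0]))
```

In this notebook it would be called as `baseline_mae(y_train, y_test)`. A model is only worth keeping if its test MAE is clearly below that number.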

# 4. Improving the Model
Right now it looks like our model isn't performing too well... let's try and improve it!
To (try to) improve our model, we'll run 3 experiments:

1. Add an extra layer with more hidden units and use the Adam optimizer
2. Same as experiment 1, but train for longer (200 epochs)
3. Same as experiment 2, but add another extra layer with more hidden units
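
The three experiments differ only in the hidden layer sizes and the number of epochs. A hypothetical `build_model` helper (my own name, not a Keras function) could build them all from one place. This sketch mirrors the cells below: Dense hidden layers with no activation, one output unit, the Adam optimizer and MAE loss. It runs on random stand-in data just to show that the loop works.

```python
import numpy as np
import tensorflow as tf

def build_model(hidden_units, seed=64):
    """Hypothetical helper: Dense hidden layers of the given sizes, then one output unit."""
    tf.random.set_seed(seed)
    # No activation on the hidden layers, mirroring the model cells in this notebook
    layers = [tf.keras.layers.Dense(units) for units in hidden_units]
    layers.append(tf.keras.layers.Dense(1))
    model = tf.keras.Sequential(layers)
    model.compile(loss="mae",
                  optimizer=tf.keras.optimizers.Adam(),
                  metrics=["mae"])
    return model

# (hidden layer sizes, epochs) for each experiment in the list above
experiments = {
    "model_1": ([10], 100),
    "model_2": ([10], 200),
    "model_3": ([100, 10], 200),
}

# Random stand-in data with 11 features, like the transformed insurance data
rng = np.random.default_rng(0)
X_demo = rng.random((16, 11)).astype("float32")
y_demo = rng.random(16).astype("float32")

histories = {}
for name, (hidden_units, _epochs) in experiments.items():
    demo_model = build_model(hidden_units)
    # Only 2 epochs to keep the demo fast; the real runs would use _epochs
    histories[name] = demo_model.fit(X_demo, y_demo, epochs=2, verbose=0)
```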

In [None]:
# Building model_1
# set a random seed
tf.random.set_seed(64)

# 1. create a model using the Sequential API
model_1 = tf.keras.Sequential([
    tf.keras.layers.Dense(10),
    tf.keras.layers.Dense(1)
])

# 2. compile the model
model_1.compile(loss = tf.keras.losses.mae,
             optimizer = tf.keras.optimizers.Adam(),
             metrics=['mae'])

# 3. Fit the model
history_1 = model_1.fit(X_train_normal, y_train, epochs=100)

In [None]:
# Check the results of the model_1 on the test data
model_1.evaluate(X_test_normal, y_test)

In [None]:
# Plot history (also known as loss curve or a training curve)
pd.DataFrame(history_1.history).plot()
plt.ylabel("loss")
plt.xlabel("epochs")

In [None]:
# Building model_2
# set a random seed
tf.random.set_seed(64)

# 1. create a model using the Sequential API
model_2 = tf.keras.Sequential([
    tf.keras.layers.Dense(10),
    tf.keras.layers.Dense(1)
])

# 2. compile the model
model_2.compile(loss = tf.keras.losses.mae,
             optimizer = tf.keras.optimizers.Adam(),
             metrics=['mae'])

# 3. Fit the model
history_2 = model_2.fit(X_train_normal, y_train, epochs=200)

In [None]:
# Check the results of the model_2 on the test data
model_2.evaluate(X_test_normal, y_test)

In [None]:
# Plot history (also known as loss curve or a training curve)
pd.DataFrame(history_2.history).plot()
plt.ylabel("loss")
plt.xlabel("epochs")

In [None]:
# Building model_3
# set a random seed
tf.random.set_seed(64)

# 1. create a model using the Sequential API
model_3 = tf.keras.Sequential([
    tf.keras.layers.Dense(100),
    tf.keras.layers.Dense(10),
    tf.keras.layers.Dense(1)
])

# 2. compile the model
model_3.compile(loss = tf.keras.losses.mae,
             optimizer = tf.keras.optimizers.Adam(),
             metrics=['mae'])

# 3. Fit the model
model_3.fit(X_train_normal, y_train, epochs=200)

In [None]:
# Check the results of the model_3 on the test data
model_3.evaluate(X_test_normal, y_test)

# 5. Comparing the Models

Wait, which model is actually performing best?

In [None]:
# Establish the predictions for each model
y_preds_1 = model_1.predict(X_test_normal)
y_preds_2 = model_2.predict(X_test_normal)
y_preds_3 = model_3.predict(X_test_normal)

# Helper functions to compute MAE and MSE
# (tf.squeeze removes the extra axis from predictions of shape (n, 1))
def mae(y_true, y_pred):
    return tf.metrics.mean_absolute_error(y_true=y_true,
                                          y_pred=tf.squeeze(y_pred))

def mse(y_true, y_pred):
    return tf.metrics.mean_squared_error(y_true=y_true,
                                         y_pred=tf.squeeze(y_pred))

# Calculate each model's evaluation metrics on the test set
mae_1 = mae(y_test, y_preds_1)
mse_1 = mse(y_test, y_preds_1)
mae_2 = mae(y_test, y_preds_2)
mse_2 = mse(y_test, y_preds_2)
mae_3 = mae(y_test, y_preds_3)
mse_3 = mse(y_test, y_preds_3)
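
The helpers above call `tf.squeeze` on the predictions because `model.predict()` returns shape `(n, 1)`, while the labels have shape `(n,)`. Without squeezing, broadcasting silently compares every label with every prediction. This sketch uses hypothetical values and NumPy, which follows the same broadcasting rules as TensorFlow:

```python
import numpy as np

# Hypothetical labels, shape (3,), and predictions in the (n, 1) shape returned by predict()
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([[1.0], [2.0], [3.0]])

# Without squeezing, (3,) - (3, 1) broadcasts to a (3, 3) matrix of all pairwise differences
wrong = np.mean(np.abs(y_true - y_pred))

# Squeezing the trailing axis first gives the element-wise errors we actually want
right = np.mean(np.abs(y_true - np.squeeze(y_pred)))

print(wrong, right)
```

Even though every prediction here is perfect, the unsqueezed version reports a non-zero error.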


In [None]:
# Let's compare our models' results using a pandas DataFrame (pandas is already imported as pd)

model_results = [["model_1", mae_1.numpy(), mse_1.numpy()],
                 ["model_2", mae_2.numpy(), mse_2.numpy()],
                 ["model_3", mae_3.numpy(), mse_3.numpy()]]

all_results = pd.DataFrame(model_results, columns=["model", "mae", "mse"])
all_results
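
When reading the table, keep in mind that MSE squares each error, so a few large misses raise it much more than they raise MAE. MSE is also in squared units (dollars squared); its square root (RMSE) is back in dollars. Here is a small sketch with hypothetical errors, where both sets have the same MAE but different MSE. `mae_mse` is a hypothetical helper:

```python
import numpy as np

def mae_mse(errors):
    """Hypothetical helper: return (MAE, MSE) for an array of prediction errors."""
    errors = np.asarray(errors, dtype=float)
    return float(np.mean(np.abs(errors))), float(np.mean(errors ** 2))

even = mae_mse([2.0, 2.0, 2.0])      # every prediction off by 2
outlier = mae_mse([1.0, 1.0, 4.0])   # mostly close, one large miss

print(even, outlier)
```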

Of the models discussed, the last one (model_3) seems to work best on our data, but it's definitely not perfect... if you are reading this, **what could I do to improve my results?**

:) :)