## Insurance Premium Prediction

#### Regression exercise using TensorFlow and a neural network.

Predict the medical cost billed by health insurance.

The insurance.csv dataset contains 1338 observations (rows) and 7 columns: 6 features and the prediction target, charges. There are 4 numerical columns (age, bmi, children and charges) and 3 nominal columns (sex, smoker and region). The nominal columns are stored as text and must be converted to numerical values before training; this notebook one-hot encodes them.
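
As a rough illustration of what one-hot encoding does to the three nominal columns, here is a minimal pure-Python sketch. It is not how `pd.get_dummies` is implemented; the helper `one_hot_row` and the `LEVELS` table are hypothetical names introduced for this example, and the region level `northeast` is an assumption (it does not appear in the rows displayed in this notebook, but four region levels are consistent with the 11 input features reported by the model summary).

```python
# Assumed category levels for each nominal column.
# "northeast" is an assumption: it is not among the rows shown by dataset.head().
LEVELS = {
    "sex": ["female", "male"],
    "smoker": ["no", "yes"],
    "region": ["northeast", "northwest", "southeast", "southwest"],
}
NUMERIC = ["age", "bmi", "children"]


def one_hot_row(row):
    """Keep numeric values as-is and emit one 0/1 indicator per category level."""
    encoded = {name: row[name] for name in NUMERIC}
    for column, levels in LEVELS.items():
        for level in levels:
            encoded[f"{column}_{level}"] = 1 if row[column] == level else 0
    return encoded


# First row of the dataset as displayed by dataset.head()
sample = {"age": 19, "sex": "female", "bmi": 27.9, "children": 0,
          "smoker": "yes", "region": "southwest"}

encoded = one_hot_row(sample)
# 3 numeric columns + 2 (sex) + 2 (smoker) + 4 (region) indicator columns
print(len(encoded))
```

Under these assumptions the encoded row has 11 entries, which matches the input width of the first dense layer.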





In [3]:
# 0. IMPORT LIBRARIES

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' # avoid noisy TF oneDNN messages
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import InputLayer
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam


In [9]:
# 1. LOAD DATA
import pandas as pd

dataset_folder = "datasets/"
csv_file = "insurance.csv"
csv_data = dataset_folder + csv_file

dataset = pd.read_csv(csv_data)

dataset.head()

Unnamed: 0,age,sex,bmi,children,smoker,region,charges
0,19,female,27.9,0,yes,southwest,16884.924
1,18,male,33.77,1,no,southeast,1725.5523
2,28,male,33.0,3,no,southeast,4449.462
3,33,male,22.705,0,no,northwest,21984.47061
4,32,male,28.88,0,no,northwest,3866.8552


In [10]:
# 2. DATA PRE-PROCESSING

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Choose the first 6 columns as features (age, sex, bmi, children, smoker, region)
features = dataset.iloc[:,0:6]

# Choose the final column for prediction (charges)
labels = dataset.iloc[:,-1]

# one-hot encoding for categorical variables
features = pd.get_dummies(features) 

# Split the data into training and test data
features_train, features_test, labels_train, labels_test = train_test_split(features, labels, test_size=0.33, random_state=42) 
 
# Standardize the numeric columns using ColumnTransformer.
# ColumnTransformer applies transformers to columns of an array or pandas DataFrame.
# It allows different columns or column subsets of the input
# to be transformed separately; the features generated by each transformer
# are concatenated to form a single feature space.
# The scaler is fit on the training data only and then reused on the test data.
ct = ColumnTransformer([('standardize', StandardScaler(), ['age', 'bmi', 'children'])], remainder='passthrough')
features_train = ct.fit_transform(features_train)
features_test = ct.transform(features_test)
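
To make the fit/transform split above concrete, here is a minimal pure-Python sketch of standardization. It is not scikit-learn's implementation; `fit_standardizer` and `standardize` are hypothetical helpers, and the test values are invented for illustration. The one detail carried over from `StandardScaler` is that it divides by the population standard deviation (equivalent to `ddof=0`).

```python
from statistics import mean, pstdev


def fit_standardizer(values):
    """Learn the mean and population standard deviation from training values."""
    return mean(values), pstdev(values)


def standardize(values, mu, sigma):
    """Apply z = (x - mu) / sigma using statistics learned elsewhere."""
    return [(v - mu) / sigma for v in values]


train_ages = [19, 18, 28, 33, 32]  # ages from the rows shown by dataset.head()
test_ages = [40, 25]               # hypothetical held-out values

# Fit on the training values only, then reuse the same mu and sigma for the test values
mu, sigma = fit_standardizer(train_ages)
train_scaled = standardize(train_ages, mu, sigma)
test_scaled = standardize(test_ages, mu, sigma)

print(train_scaled)
print(test_scaled)
```

The scaled training values have mean 0 and standard deviation 1 by construction. The test values generally do not, because they are scaled with the training statistics; that is the point of calling `fit_transform` on the training set and only `transform` on the test set.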

In [11]:
## 3. TF.MODEL 

# Neural network model built with TensorFlow
def design_model(features):
    model = Sequential(name = "tf_model")
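    # Add an input layer sized to the number of feature columns.
    # The first Dense layer's weight matrix has shape (11, 128)
    # because we feed 11 features to 128 hidden neurons.
    # (Named input_layer to avoid shadowing the built-in input().)
    input_layer = InputLayer(input_shape=(features.shape[1],))
    model.add(input_layer)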
    
    # Add a hidden layer with 128 neurons
    # Using Activation Function type RELU (Rectified Linear Unit) 
    model.add(Dense(128, activation='relu')) 
    
    # The output layer has a weight matrix of shape (128, 1)
    # because it receives 128 inputs and has 1 neuron.
    model.add(Dense(1)) 

    # Use the Keras Adam optimizer to adjust the model's weights during training.
    # The learning rate determines how large a step the optimizer takes in parameter space.
    opt = Adam(learning_rate=0.01)
    model.compile(loss='mse', metrics=['mae'], optimizer=opt)
    return model

model = design_model(features_train)
print(model.summary())


Model: "tf_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_2 (Dense)             (None, 128)               1536      
                                                                 
 dense_3 (Dense)             (None, 1)                 129       
                                                                 
Total params: 1,665
Trainable params: 1,665
Non-trainable params: 0
_________________________________________________________________
None
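
The parameter counts in the summary above can be checked by hand: a dense layer has one weight per (input, unit) pair plus one bias per unit. A minimal sketch, where `dense_params` is a hypothetical helper written for this check and not a Keras function:

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


hidden = dense_params(11, 128)  # 11 input features -> 128 hidden neurons
output = dense_params(128, 1)   # 128 hidden neurons -> 1 output neuron
total = hidden + output

print(hidden, output, total)  # 1536 129 1665
```

These values agree with the `Param #` column and the total reported by `model.summary()`.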


In [16]:
# 4. FIT AND EVALUATE TF MODEL

# fit the model using 40 epochs and batch size 1
# epochs refers to the number of cycles through the full training dataset.
# batch_size is the number of data points to work through before updating the model parameters.
model.fit(features_train, labels_train, epochs=40, batch_size=1, verbose=0)

# evaluate the model on the test data
val_mse, val_mae = model.evaluate(features_test, labels_test, verbose=1)

# The regression loss function is the Mean Squared Error (MSE):
# the average squared difference between the predicted and the actual values.
# print("MSE: ", val_mse)

# We also track the Mean Absolute Error (MAE) because it gives
# a better idea than MSE of how far off we are from the true
# values, in the units we are predicting.
print("MAE: ", val_mae)

MAE:  2225.8291015625
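
To see why MAE is easier to read than MSE, here is a minimal pure-Python sketch that computes both on three charges taken from the rows shown earlier. The predictions are hypothetical numbers made up for illustration, not output from the trained model, and the helper functions are written for this example and are not Keras APIs.

```python
from math import sqrt


def mse(y_true, y_pred):
    """Mean of squared errors; units are the square of the target's units."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean of absolute errors; same units as the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [16884.924, 1725.5523, 4449.462]  # charges from dataset.head()
y_pred = [15000.0, 2000.0, 9000.0]         # hypothetical predictions

mse_value = mse(y_true, y_pred)
mae_value = mae(y_true, y_pred)
rmse_value = sqrt(mse_value)  # square root brings MSE back to the target's units

print("MSE: ", mse_value)
print("MAE: ", mae_value)
print("RMSE:", rmse_value)
```

MSE is expressed in squared units and is dominated by the largest error, while MAE reads directly as an average miss in the units of `charges`. RMSE is never smaller than MAE, so a large gap between the two suggests a few large errors.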
