<center><h1>Insurance Cross Sell Prediction TF Keras</h1></center>
<br>
<center><img src = 'https://static.wixstatic.com/media/5758c6_14a8fd304c2a4ad1831ff6259a352424~mv2.png/v1/fill/w_567,h_297,al_c,q_85,usm_0.66_1.00_0.01/5758c6_14a8fd304c2a4ad1831ff6259a352424~mv2.webp'></center>
<br>
<center><h3>About the Dataset</h3></center>
<br>
<center>Our client is an insurance company that has provided health insurance to its customers. It now needs your help building a model to predict whether last year's policyholders (customers) will also be interested in the vehicle insurance the company offers.
    

An insurance policy is an arrangement by which a company undertakes to provide a guarantee of compensation for specified loss, damage, illness, or death in return for the payment of a specified premium. A premium is a sum of money that the customer needs to pay regularly to an insurance company for this guarantee.

For example, you may pay a premium of Rs. 5,000 each year for a health insurance cover of Rs. 200,000, so that if, God forbid, you fall ill and need to be hospitalised that year, the insurer bears the cost of hospitalisation and related expenses up to Rs. 200,000. If you are wondering how the company can bear such high costs when it charges a premium of only Rs. 5,000, this is where probability comes into the picture. There may be 100 customers like you, each paying a premium of Rs. 5,000 every year, but only a few of them (say 2-3) will be hospitalised that year. This way, everyone shares the risk of everyone else.

Just like medical insurance, there is vehicle insurance, where the customer pays the insurer a premium of a certain amount every year so that, in the event of an accident involving the vehicle, the insurer pays compensation (called the ‘sum assured’) to the customer.

Building a model to predict whether a customer would be interested in Vehicle Insurance is extremely helpful for the company because it can then accordingly plan its communication strategy to reach out to those customers and optimise its business model and revenue.

To predict whether a customer would be interested in vehicle insurance, you have information about demographics (gender, age, region code type), vehicles (vehicle age, damage), policy (premium, sourcing channel), etc.</center>
<br>
- More Info - [HERE](https://www.kaggle.com/anmolkumar/health-insurance-cross-sell-prediction)
- Owner - [ANMOL KUMAR](https://www.kaggle.com/anmolkumar/health-insurance-cross-sell-prediction)

In [None]:
import tensorflow as tf
from tensorflow import keras

import os
import tempfile

import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

import plotly.figure_factory as ff
import plotly.offline as py
py.init_notebook_mode(connected=True)

import sklearn
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

mpl.rcParams['figure.figsize'] = (12, 10)
colors = plt.rcParams['axes.prop_cycle'].by_key()['color']

In [None]:
df = pd.read_csv('/kaggle/input/health-insurance-cross-sell-prediction/train.csv')
df.head()

### The dataset looks well structured. The target column is '**Response**', which indicates whether a health insurance policyholder will also buy vehicle insurance.

In [None]:
colors = ['#835AF1']

fig = ff.create_distplot([df['Age']], ['Age'], colors=colors,
                         show_curve=True, show_hist=True)

# Add title
fig.update(layout_title_text='Distribution of Age')
fig.show()

In [None]:
df['Response'].value_counts()

## Clearly, this is a case of imbalanced classes.
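One common way to account for a skewed target is to start the output layer at a bias that matches the base rate and to weight the classes so that both contribute equally to the loss. The sketch below is illustrative only: `imbalance_summary` is a hypothetical helper, and the toy labels are made up rather than taken from the dataset.

```python
import numpy as np

def imbalance_summary(labels):
    """Return class counts, a suggested initial output bias, and balanced class weights."""
    labels = np.asarray(labels)
    pos = int((labels == 1).sum())
    neg = int((labels == 0).sum())
    total = pos + neg
    # Bias b chosen so that sigmoid(b) == pos / total, i.e. b = log(pos / neg).
    initial_bias = float(np.log(pos / neg))
    # Scale each class so that both contribute equally to the total loss.
    class_weight = {0: total / (2.0 * neg), 1: total / (2.0 * pos)}
    return {"pos": pos, "neg": neg,
            "initial_bias": initial_bias, "class_weight": class_weight}

# Toy labels (hypothetical, not the real dataset): 9 negatives, 1 positive.
summary = imbalance_summary([0] * 9 + [1])
print(summary)
```

In this notebook, such values could be tried as `make_model(output_bias=...)` and as the `class_weight` argument of `model.fit`; whether they improve validation AUC here is untested.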

In [None]:
# Preprocess Block

gender = {'Male': 0, 'Female': 1}
driving_license = {0: 0, 1: 1}
previously_insured = {0: 1, 1: 0}
vehicle_age = {'> 2 Years': 3, '1-2 Year': 2, '< 1 Year': 1}
vehicle_damage = {'Yes': 1, 'No': 0}

def preprocess(df):
    df['Gender'] = df['Gender'].map(gender)
    df['Driving_License'] = df['Driving_License'].map(driving_license)
    df['Previously_Insured'] = df['Previously_Insured'].map(previously_insured)
    df['Vehicle_Age'] = df['Vehicle_Age'].map(vehicle_age)
    df['Vehicle_Damage'] = df['Vehicle_Damage'].map(vehicle_damage)

    # Cast the channel and region codes to integers.
    df['Policy_Sales_Channel'] = df['Policy_Sales_Channel'].astype(int)
    df['Region_Code'] = df['Region_Code'].astype(int)

    return df.drop('id', axis = 1)

See [this notebook](https://www.kaggle.com/anmolkumar/vehicle-insurance-cross-sell-roc-auc-85-6) for the reference behind this preprocessing.
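Because `Series.map` silently returns `NaN` for any category missing from the dictionary, a quick coverage check before training can be worthwhile. This is a minimal sketch: `unmapped_values` is a hypothetical helper, and the toy series contains a deliberately misspelled category.

```python
import pandas as pd

vehicle_age = {'> 2 Years': 3, '1-2 Year': 2, '< 1 Year': 1}

def unmapped_values(series, mapping):
    """Return the sorted unique values in `series` that `mapping` has no key for."""
    return sorted(set(series.dropna().unique()) - set(mapping))

# Toy column (hypothetical): the last entry is not a key of the mapping.
toy = pd.Series(['< 1 Year', '1-2 Year', '> 2 Years', '1-2 Years'])
missing = unmapped_values(toy, vehicle_age)
encoded = toy.map(vehicle_age)

print(missing)                      # categories the mapping does not cover
print(int(encoded.isna().sum()))    # number of rows that became NaN
```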

In [None]:
df = preprocess(df)

In [None]:
df.describe()

In [None]:
train, val = train_test_split(df, test_size=0.1)
print(len(train), 'train examples')
print(len(val), 'validation examples')
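With an imbalanced target, a purely random split can leave the validation set with a slightly different positive rate than the training set. A stratified, seeded split is one option; the sketch below uses made-up data, and the `random_state` value is an arbitrary choice.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame (hypothetical data): 90 negatives and 10 positives.
toy = pd.DataFrame({
    "feature": np.arange(100),
    "Response": [0] * 90 + [1] * 10,
})

# stratify keeps the positive rate the same in both splits;
# random_state makes the split reproducible.
toy_train, toy_val = train_test_split(
    toy, test_size=0.1, stratify=toy["Response"], random_state=42)

print(toy_train["Response"].mean(), toy_val["Response"].mean())
```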

In [None]:
train_labels = np.array(train['Response'])
val_labels = np.array(val['Response'])
train = train.drop('Response', axis = 1)
val = val.drop('Response', axis = 1)
bool_train_labels = train_labels != 0

In [None]:
scaler = StandardScaler()
train_features = scaler.fit_transform(train)
val_features = scaler.transform(val)
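The scaler is fit on the training data only and then reused, so that later data is expressed in the same units the model was trained on. The toy sketch below (made-up numbers) shows how refitting on new data would instead re-centre it and hide the shift.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

train_toy = np.array([[0.0], [2.0], [4.0]])   # mean 2, population std sqrt(8/3)
new_toy = np.array([[10.0], [12.0]])          # clearly shifted relative to training

scaler_toy = StandardScaler().fit(train_toy)

# Reusing the training statistics preserves the shift.
reused = scaler_toy.transform(new_toy)

# Refitting on the new data re-centres it around zero and hides the shift.
refit = StandardScaler().fit_transform(new_toy)

print(reused.ravel(), refit.ravel())
```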

In [None]:
METRICS = [
      keras.metrics.TruePositives(name='tp'),
      keras.metrics.FalsePositives(name='fp'),
      keras.metrics.TrueNegatives(name='tn'),
      keras.metrics.FalseNegatives(name='fn'), 
      keras.metrics.BinaryAccuracy(name='accuracy'),
      keras.metrics.Precision(name='precision'),
      keras.metrics.Recall(name='recall'),
      keras.metrics.AUC(name='auc'),
]

def make_model(metrics = METRICS, output_bias=None):
  if output_bias is not None:
    output_bias = tf.keras.initializers.Constant(output_bias)
  model = keras.Sequential([
      keras.layers.Dense(
          16, activation='relu',
          input_shape=(train_features.shape[-1],)),
      keras.layers.Dense(
          32, activation='relu'),
      keras.layers.Dropout(0.5),
      keras.layers.Dense(1, activation='sigmoid',
                         bias_initializer=output_bias),
  ])

  model.compile(
      optimizer=keras.optimizers.Adam(learning_rate=1e-3),
      loss=keras.losses.BinaryCrossentropy(),
      metrics=metrics)

  return model

In [None]:
EPOCHS = 100
BATCH_SIZE = 2048

early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor='val_auc', 
    verbose=1,
    patience=10,
    mode='max',
    restore_best_weights=True)
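To make the `patience` and `mode='max'` settings concrete, here is a simplified pure-Python simulation of the stopping rule. It is only a sketch of the idea, not Keras's implementation: `best_epoch_with_patience` is a hypothetical helper and the scores are made up.

```python
def best_epoch_with_patience(scores, patience):
    """Simulate max-mode early stopping; return (best_epoch, stopped_epoch)."""
    best_score = float("-inf")
    best_epoch = -1
    wait = 0
    for epoch, score in enumerate(scores):
        if score > best_score:
            # Strict improvement: remember it and reset the counter.
            best_score, best_epoch, wait = score, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                # No improvement for `patience` epochs in a row: stop here.
                return best_epoch, epoch
    # Training ran to the end without triggering the rule.
    return best_epoch, len(scores) - 1

# Toy validation AUC per epoch (hypothetical values).
val_auc_toy = [0.80, 0.83, 0.85, 0.84, 0.85, 0.84, 0.83]
result = best_epoch_with_patience(val_auc_toy, patience=3)
print(result)
```

With `restore_best_weights=True`, the weights kept after stopping correspond to the best epoch rather than the last one.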

In [None]:
model = make_model()
model.summary()

In [None]:
baseline_history = model.fit(
    train_features,
    train_labels,
    batch_size=BATCH_SIZE,
    epochs=EPOCHS,
    callbacks = [early_stopping],
    validation_data=(val_features, val_labels))

In [None]:
def plot_metrics(history):
  metrics =  ['loss', 'auc', 'precision', 'recall']
  for n, metric in enumerate(metrics):
    name = metric.replace("_"," ").capitalize()
    plt.subplot(2,2,n+1)
    plt.plot(history.epoch,  history.history[metric], color=colors[0], label='Train')
    plt.plot(history.epoch, history.history['val_'+metric],
             color=colors[0], linestyle="--", label='Val')
    plt.xlabel('Epoch')
    plt.ylabel(name)
    if metric == 'loss':
      plt.ylim([0, plt.ylim()[1]])
    elif metric == 'auc':
      plt.ylim([0.8,1])
    else:
      plt.ylim([0,1])

    plt.legend()

In [None]:
plot_metrics(baseline_history)
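Precision and recall depend on the decision threshold, so it can help to look at the confusion matrix at more than one cut-off. The sketch below uses made-up labels and probabilities; `confusion_at` is a hypothetical helper built on `sklearn.metrics.confusion_matrix`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels and predicted probabilities (hypothetical values).
y_true = np.array([0, 0, 0, 0, 1, 1, 0, 1])
y_prob = np.array([0.1, 0.4, 0.6, 0.2, 0.7, 0.3, 0.05, 0.9])

def confusion_at(y_true, y_prob, threshold=0.5):
    """Return (tn, fp, fn, tp) after thresholding the probabilities."""
    y_pred = (y_prob > threshold).astype(int)
    # labels=[0, 1] guarantees a 2x2 matrix even if one class is never predicted.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return int(tn), int(fp), int(fn), int(tp)

default_cut = confusion_at(y_true, y_prob, threshold=0.5)
lower_cut = confusion_at(y_true, y_prob, threshold=0.25)
print(default_cut, lower_cut)
```

Lowering the threshold trades false negatives for false positives, which may suit a cross-sell campaign where missing an interested customer costs more than an extra contact.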

In [None]:
test = pd.read_csv('../input/health-insurance-cross-sell-prediction/test.csv')
test = preprocess(test)
# Reuse the statistics learned from the training data; do not refit on the test set.
test_features = scaler.transform(test)
preds = model.predict(test_features, batch_size=BATCH_SIZE)

In [None]:
prediction = pd.read_csv('../input/health-insurance-cross-sell-prediction/sample_submission.csv')
# model.predict returns shape (n, 1); flatten it to a 1-D column.
prediction['Response'] = preds.ravel()

In [None]:
prediction.to_csv('submission.csv',index=False)
prediction.head()

### This relatively small model reaches a val_auc of 0.8506 (85.06%) and trains quickly.
### Stay tuned for updates.