# __Vehicle Insurance__

# Context

Our client is an insurance company that has provided health insurance to its customers. It now needs your help building a model to predict whether last year's policyholders (customers) will also be interested in the vehicle insurance the company offers.

An insurance policy is an arrangement by which a company undertakes to provide a guarantee of compensation for specified loss, damage, illness, or death in return for the payment of a specified premium. A premium is a sum of money that the customer needs to pay regularly to an insurance company for this guarantee.

For example, you may pay a premium of Rs. 5,000 each year for a health insurance cover of Rs. 200,000 so that if, God forbid, you fall ill and need to be hospitalised that year, the insurance provider will bear the cost of hospitalisation etc. up to Rs. 200,000. If you are wondering how the company can bear such a high hospitalisation cost when it charges a premium of only Rs. 5,000, that is where the concept of probabilities comes into the picture. Like you, there may be 100 customers paying a premium of Rs. 5,000 every year, but only a few of them (say 2-3) would be hospitalised that year. This way everyone shares the risk of everyone else.
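
The arithmetic behind this risk pooling can be sketched in a few lines. The figures are the ones from the example above; treating every claim as a full-cover payout is an assumption made only to show the worst case, since real claims would usually be smaller than the cover limit.

```python
# Figures from the example above; full-cover claims are a worst-case assumption.
n_customers = 100
premium = 5_000   # Rs. paid by each customer per year
cover = 200_000   # Rs. maximum payout per customer

premium_pool = n_customers * premium  # total premium collected in a year

# Compare the pool with the payout if 2 or 3 customers claim the full cover.
for n_claims in (2, 3):
    payout = n_claims * cover
    print(f"{n_claims} full claims: payout Rs. {payout:,} vs pool Rs. {premium_pool:,}")

# Number of full-cover claims the pool can absorb before running out.
break_even_claims = premium_pool / cover
print(f"Pool covers up to {break_even_claims} full-cover claims")
```

With these numbers the pool absorbs two full-cover claims but not three, which is why an insurer has to price premiums against the expected claim rate and claim size.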

Just like medical insurance, there is vehicle insurance, where every year the customer pays a premium of a certain amount to the insurance provider so that, in case of an unfortunate accident involving the vehicle, the provider pays compensation (called the ‘sum assured’) to the customer.

# 1. Importing Relevant Libraries

Importing the relevant libraries and the data required for building our model.

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib
import plotly.express as px
import warnings
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split

import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score
from sklearn.metrics import auc
from sklearn.metrics import roc_curve
sns.set()
warnings.filterwarnings('ignore')

In [None]:
df = pd.read_csv('../input/health-insurance-cross-sell-prediction/train.csv')

# 2. Data Processing

In [None]:
df.head()

In [None]:
df.drop('id', axis=1, inplace=True)  # Drop the ID column

In [None]:
df['Vehicle_Age'].value_counts()

In [None]:
# Collect the object-dtype (string) columns that need label encoding
cols_to_label = [col for col in df.columns if df[col].dtype == 'O']

cols_to_label

Now we apply `LabelEncoder` to each categorical column, replacing its string values with integer codes.

In [None]:
df[cols_to_label] = df[cols_to_label].apply(LabelEncoder().fit_transform) 
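
To see which integer each category receives, the sketch below fits a `LabelEncoder` on a toy column. The category strings are an assumption about what `Vehicle_Age` contains in `train.csv`, and the `age_order` dictionary is a hypothetical alternative, not part of the original notebook.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy column; the category strings are assumed to match those in train.csv.
toy = pd.Series(['> 2 Years', '1-2 Year', '< 1 Year', '1-2 Year'], name='Vehicle_Age')

encoder = LabelEncoder().fit(toy)
# classes_ is sorted, and each class's position is its integer code.
mapping = {str(label): code for code, label in enumerate(encoder.classes_)}
print(mapping)

# Hypothetical alternative: an explicit map that keeps the natural age order.
age_order = {'< 1 Year': 0, '1-2 Year': 1, '> 2 Years': 2}
print(toy.map(age_order).tolist())
```

Because `LabelEncoder` sorts the labels as strings, the codes need not follow the natural order of the categories. That rarely matters for tree models, but a linear model such as logistic regression will read the codes as an ordered scale.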

In [None]:
df.head()

# 3. Exploratory Data Analysis

In [None]:
sns.countplot(x='Response', data=df)

In [None]:
sns.histplot(df['Age'], kde=True)  # histplot replaces the deprecated distplot

In [None]:
sns.histplot(df['Vintage'], kde=True)

In [None]:
sns.histplot(df['Annual_Premium'], kde=True)

In [None]:
sns.scatterplot(x='Age', y='Annual_Premium', data=df)

In [None]:
sns.countplot(x='Gender', data=df)

In [None]:
for i in df.columns:
    print(i)
    print(df[i].value_counts())

# 4. Data Scaling

In [None]:
df.isnull().sum()

In [None]:
cols_to_scale = ['Age', 'Annual_Premium', 'Policy_Sales_Channel', 'Vintage']

# Standardise the selected columns to zero mean and unit variance
scaler = StandardScaler().fit(df[cols_to_scale])
# Assign the array directly so the result does not depend on index alignment
df[cols_to_scale] = scaler.transform(df[cols_to_scale])
df.head()
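
The cell above fits the scaler on the full dataset before it is split. A common alternative, sketched here on made-up data, is to fit the scaler on the training rows only and reuse those statistics for the test rows, so that nothing from the test set leaks into preprocessing. The `toy` frame and its values are placeholders, not the insurance data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for two of the numeric columns (values are made up).
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    'Age': rng.integers(20, 80, size=200),
    'Annual_Premium': rng.normal(30_000, 10_000, size=200),
})

train, test = train_test_split(toy, test_size=0.2, random_state=0)

# Fit on the training rows only, then apply the same statistics to the test rows.
scaler = StandardScaler().fit(train)
train_scaled = pd.DataFrame(scaler.transform(train), columns=train.columns, index=train.index)
test_scaled = pd.DataFrame(scaler.transform(test), columns=test.columns, index=test.index)

print(train_scaled.describe().loc[['mean', 'std']])
```

The training columns come out with mean zero by construction, while the test columns are only approximately centred, which is the expected behaviour.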

In [None]:
sns.heatmap(df.corr(), cmap='coolwarm', vmax=0.7, vmin=-0.7)

# 5. Data Modelling

In [None]:
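# Hold out 20% of the rows for testing; a fixed random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    df.drop('Response', axis=1), df['Response'], test_size=0.2, random_state=365
)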
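
If the target classes are unevenly represented, a plain random split can give the train and test sets slightly different class ratios. The sketch below uses made-up labels (10% positives, an assumed ratio rather than the real one) to show how the `stratify` argument of `train_test_split` keeps the ratio identical in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up imbalanced labels: 100 positives and 900 negatives.
y = np.array([1] * 100 + [0] * 900)
X = np.arange(len(y)).reshape(-1, 1)

# stratify=y preserves the class proportions in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=365
)
print(f"positive rate: train={y_tr.mean():.3f}, test={y_te.mean():.3f}")
```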

In [None]:
model1 = LogisticRegression(random_state=365).fit(X_train, y_train)
preds = model1.predict(X_test)
# accuracy_score expects the true labels first, then the predictions
print(f'The accuracy score of Logistic Regression model is: {accuracy_score(y_test, preds)}')

In [None]:
model2 = XGBClassifier().fit(X_train, y_train)
preds = model2.predict(X_test)
# accuracy_score expects the true labels first, then the predictions
print(f'The accuracy score of XGBClassifier model is: {accuracy_score(y_test, preds)}')
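
Accuracy alone can look flattering when one class dominates, because always predicting the majority class already scores well. The notebook imports `roc_curve` and `auc` without using them, so here is a minimal sketch of how they could complement accuracy. It runs on synthetic data from `make_classification` with an assumed 90/10 class split, not on the insurance dataset, and `clf` is a stand-in for either model above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, auc, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (about 90% negatives); not the insurance dataset.
X, y = make_classification(n_samples=2000, n_features=8, weights=[0.9], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Accuracy uses hard labels; ROC AUC uses the predicted probability of class 1.
acc = accuracy_score(y_te, clf.predict(X_te))
fpr, tpr, _ = roc_curve(y_te, clf.predict_proba(X_te)[:, 1])
roc_auc = auc(fpr, tpr)

# Baseline: accuracy obtained by always predicting the majority class.
baseline = max(y_te.mean(), 1 - y_te.mean())
print(f"accuracy={acc:.3f}  baseline={baseline:.3f}  roc_auc={roc_auc:.3f}")
```

Comparing the accuracy with the majority-class baseline shows how much the model adds, and the ROC AUC measures how well it ranks positives above negatives regardless of the decision threshold.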
