# Vehicle Insurance Interest Response Classification

Our client is an insurance company that has provided health insurance to its customers. It now needs your help building a model to predict whether last year's policyholders (customers) will also be interested in the vehicle insurance the company offers.

Just like medical insurance, vehicle insurance requires the customer to pay the insurer a premium of a certain amount every year, so that in the event of an unfortunate accident involving the vehicle, the insurer pays the customer compensation (called the ‘sum assured’).

Building a model to predict whether a customer would be interested in Vehicle Insurance is extremely helpful for the company because it can then accordingly plan its communication strategy to reach out to those customers and optimise its business model and revenue.

## Problem Statement 

**An insurance company has provided health insurance to its customers. It now wants a model that predicts whether last year's policyholders (customers) will also be interested in the vehicle insurance the company offers.**

## Data

|Variable|Definition|
|-----|-----|
|id	|Unique ID for the customer|
|Gender	|Gender of the customer|
|Age	|Age of the customer|
|Driving_License	|0 : Customer does not have DL, 1 : Customer already has DL|
|Region_Code	|Unique code for the region of the customer|
|Previously_Insured	|1 : Customer already has Vehicle Insurance, 0 : Customer doesn't have Vehicle Insurance|
|Vehicle_Age	|Age of the Vehicle|
|Vehicle_Damage	|1 : Customer got his/her vehicle damaged in the past. 0 : Customer didn't get his/her vehicle damaged in the past.|
|Annual_Premium	|The amount customer needs to pay as premium in the year|
|Policy_Sales_Channel	|Anonymized code for the channel used to reach the customer, i.e. different agents, over mail, over phone, in person, etc.|
|Vintage	|Number of days the customer has been associated with the company|
|Response	|1 : Customer is interested, 0 : Customer is not interested|

## Import libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score,roc_auc_score,confusion_matrix,classification_report
from sklearn.utils import resample,shuffle

In [None]:
!ls /kaggle/input/**/*

## Read the data

In [None]:
import glob
csv_list = glob.glob('/kaggle/input/**/*')
csv_list

In [None]:
df = pd.read_csv(csv_list[0])
df.head()
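
The cell above reads whichever path happens to come first in `csv_list`, and that order is not guaranteed. A minimal sketch of a more predictable selection, assuming the goal is simply to keep only `.csv` files in sorted order, is shown below. The temporary directory and the file names `train.csv` and `notes.txt` are hypothetical stand-ins for the Kaggle input folder.

```python
import glob
import os
import tempfile

# Build a throwaway folder that mimics an input directory (names are made up).
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "dataset"))
    for name in ["train.csv", "notes.txt"]:
        open(os.path.join(root, "dataset", name), "w").close()

    # recursive=True lets ** match nested folders; the *.csv suffix filters file types.
    pattern = os.path.join(root, "**", "*.csv")
    csv_files = sorted(glob.glob(pattern, recursive=True))
    names = [os.path.basename(p) for p in csv_files]

print(names)
```

Sorting makes the choice of `csv_files[0]` reproducible between runs, though it is still worth printing the list and picking the intended file by name.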

In [None]:
df.info()

In [None]:
df.isnull().sum()

In [None]:
df.dtypes

## Summary of the data


In [None]:
def convert_to_str(val):
    if isinstance(val,float):
        return str(int(val))
    elif isinstance(val,int):
        return str(val)
    return val
    

In [None]:
for col in ['Region_Code','Policy_Sales_Channel']:
    df[col] = df[col].map(convert_to_str)
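
The same conversion can also be sketched with vectorized `astype` calls instead of a Python-level `map`. The toy frame below uses made-up values and assumes the columns contain no missing values, since `astype(int)` raises on NaN.

```python
import pandas as pd

# Toy frame standing in for the real data (values are illustrative only).
toy = pd.DataFrame({
    "Region_Code": [28.0, 3.0, 28.0],
    "Policy_Sales_Channel": [26.0, 152.0, 160.0],
})

for col in ["Region_Code", "Policy_Sales_Channel"]:
    # float -> int -> str, so 28.0 becomes "28" rather than "28.0"
    toy[col] = toy[col].astype(int).astype(str)

print(toy)
```

Treating these codes as strings keeps `describe()` and the correlation matrix from interpreting them as ordered quantities.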

In [None]:
df.describe()

In [None]:
df.drop(['id','Driving_License','Previously_Insured','Response'],axis=1).describe().round(2)

In [None]:
df.describe(exclude=np.number)

In [None]:
plt.hist(df['Age'])

In [None]:
for col in ['Age','Annual_Premium','Vintage']:
    fig = plt.figure(figsize=(12,5))
    plt.hist(df[col])
    plt.title('histogram')
    plt.xlabel(col)
    plt.show()

In [None]:
for col in ['Driving_License','Previously_Insured','Gender','Vehicle_Age','Vehicle_Damage','Response']:
    fig = plt.figure(figsize=(12,5))
    sns.countplot(x=df[col])  # pass the column as x; recent seaborn versions require keyword arguments
    plt.title(f'countplot for {col}', color = 'navy', fontsize=16)
    plt.show()

## Get a count of the target variable and note down your observations

In [None]:
df['Response'].value_counts()


In [None]:
df['Response'].value_counts(normalize=True).round(2)
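
One observation worth recording from these counts is how unbalanced the target is. The sketch below, on a made-up `Response` series, computes the majority-to-minority ratio and the accuracy a model would get by always predicting the majority class; the variable names are illustrative.

```python
import pandas as pd

# Hypothetical target column: seven "not interested" rows and one "interested" row.
response = pd.Series([0, 0, 0, 0, 0, 0, 0, 1], name="Response")

counts = response.value_counts()

# How many majority-class rows there are for each minority-class row.
imbalance_ratio = counts.max() / counts.min()

# Accuracy of a constant predictor that always outputs the majority class.
baseline_accuracy = counts.max() / counts.sum()

print(imbalance_ratio, baseline_accuracy)
```

Any classifier trained later has to beat `baseline_accuracy` to be useful, which is one reason accuracy alone can be misleading on an unbalanced target.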

## What is the ratio of male and female in our dataset?

In [None]:
df['Gender'].value_counts(normalize=True).round(2)

In [None]:
df['Gender'].value_counts(normalize=True).round(2).plot(kind='bar',figsize=(7,5))


## Check the gender ratio in the interested customers, what are your observations?

In [None]:
df[df['Response']==1]['Gender'].value_counts(normalize=True)

In [None]:
df[df['Response']==1]['Gender'].value_counts(normalize=True).round(2).plot(kind='bar',figsize=(7,5))
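
The share of each gender among interested customers depends partly on how many customers of each gender there are overall. A complementary view, sketched here on made-up rows, is the response rate within each gender, which is just the mean of the 0/1 `Response` column per group.

```python
import pandas as pd

# Illustrative rows only; not taken from the real dataset.
toy = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Female", "Female"],
    "Response": [1, 0, 1, 0, 1],
})

# Mean of a 0/1 column is the fraction of 1s, i.e. the interest rate per gender.
rate_by_gender = toy.groupby("Gender")["Response"].mean()

print(rate_by_gender)
```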

## Find out the distribution of customers age

In [None]:
fig = plt.figure(figsize=(7,7))
plt.boxplot(df['Age'])
plt.show()

In [None]:
print("Age distribution according to Response")
facetgrid = sns.FacetGrid(df,hue="Response",aspect = 4)
facetgrid.map(sns.kdeplot,"Age",fill = True)  # `shade` is deprecated in recent seaborn; `fill` replaces it
facetgrid.set(xlim = (0,df["Age"].max()))
facetgrid.add_legend()
plt.title('Age distribution according to Response',color='navy',fontsize=16)
plt.show()

In [None]:
print("Age distribution according to Gender")
facetgrid = sns.FacetGrid(df,hue="Gender",aspect = 4)
facetgrid.map(sns.kdeplot,"Age",fill = True)  # `shade` is deprecated in recent seaborn; `fill` replaces it
facetgrid.set(xlim = (0,df["Age"].max()))
facetgrid.add_legend()
plt.title('Age distribution according to Gender',fontsize=16,color='navy')
plt.show()

In [None]:
for col in ['Region_Code','Policy_Sales_Channel']:
    df[col].value_counts(normalize=True)[:10].plot(kind='bar',figsize=(12,5))
    plt.title(f'top 10 values for {col}',fontsize=16,color='navy')
    plt.show()

## Which regions have the most interested customers?

In [None]:
df[df['Response']==1]['Region_Code'].value_counts(normalize=True)[:10].plot(kind='barh',figsize=(12,5))
plt.title('top 10 regions among interested customers',fontsize=16,color='navy')
plt.show()

In [None]:
pd.crosstab(df['Response'], df['Previously_Insured'])

## Check the ratio of previously insured, note down your observations

In [None]:
df['Previously_Insured'].value_counts()

In [None]:
pd.crosstab(df['Response'],df['Previously_Insured']).plot(kind='bar',figsize=(10,5))
plt.title('Do customers with existing vehicle insurance want insurance again?',fontsize=12,color='navy')
plt.show()
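
Raw counts in a crosstab can be hard to compare when the two groups differ in size. A minimal sketch using the `normalize="columns"` option of `pd.crosstab` turns each `Previously_Insured` column into shares that sum to 1; the rows below are invented for illustration.

```python
import pandas as pd

# Made-up rows: four customers without prior insurance, two with.
toy = pd.DataFrame({
    "Previously_Insured": [0, 0, 0, 0, 1, 1],
    "Response": [1, 0, 1, 0, 0, 0],
})

# Each column now holds the share of Response values within that insurance group.
shares = pd.crosstab(toy["Response"], toy["Previously_Insured"], normalize="columns")

print(shares)
```

Reading across the `Response == 1` row then gives the interest rate for each group directly.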

## How old are most of the vehicles? Does vehicle damage have any effect on the Response variable?

In [None]:
df['Vehicle_Age'].value_counts(normalize=True)

In [None]:
pd.crosstab(df['Response'],df['Vehicle_Damage']).plot(kind='bar',figsize=(10,5))
plt.title('Response by past vehicle damage',fontsize=12,color='navy')
plt.show()

In [None]:
top_10_region=df['Region_Code'].value_counts()[:10].index
top_10_channel=df['Policy_Sales_Channel'].value_counts()[:10].index

df['Region_code'] = df['Region_Code'].map(lambda x: x if x in top_10_region else 'others')
# Bucket channels against the top-10 channel list (not the region list)
df['policy_sales_channel'] = df['Policy_Sales_Channel'].map(lambda x: x if x in top_10_channel else 'others')
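
Since the same bucketing is applied to two columns, it could be factored into a small helper so each column is always compared against its own top values. The function name `keep_top_n` and the sample codes are hypothetical.

```python
import pandas as pd

def keep_top_n(series, n=10, other="others"):
    """Keep the n most frequent values of a series and replace the rest with `other`."""
    top = series.value_counts().index[:n]
    # where() keeps values that satisfy the condition and swaps in `other` elsewhere.
    return series.where(series.isin(top), other)

# Illustrative codes: "28" appears three times, "8" twice, "46" once.
codes = pd.Series(["28", "28", "28", "8", "8", "46"])
bucketed = keep_top_n(codes, n=2)

print(bucketed.tolist())
```

With a helper like this, the two assignments above would reduce to one call per column, which avoids mixing up the region and channel lists.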


In [None]:
plt.figure(figsize=(12,10))
print("Correlation matrix-")
plt.rcParams['figure.figsize']=(8,6)
sns.heatmap(df.corr(numeric_only=True),cmap='Spectral',annot = True)  # skip string columns; pandas >= 2.0 raises otherwise

In [None]:
df.corr(numeric_only=True)['Response'].sort_values()