![](https://www.shriramgi.com/images/travelproductbg.jpg)

### **Context**
A tour and travel company is offering a travel insurance package to its customers.
The new insurance package also includes COVID cover.
The company needs to know which customers would be interested in buying it, based on its database history.
The insurance was offered to some of the customers in 2019, and the given data has been extracted from the performance/sales of the package during that period.
The data is provided for almost 2000 of its previous customers, and you are required to build an intelligent model that can predict whether a customer will be interested in buying the travel insurance package based on the parameters given below.

### Loading libraries

In [None]:
import numpy as np # for linear algebra
import pandas as pd # data processing
import warnings
warnings.filterwarnings('ignore')

# for data visualization
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
import plotly.express as px
import plotly.graph_objects as go


from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split, KFold
from lightgbm import LGBMClassifier
from sklearn.metrics import roc_auc_score

#for hyperparameters tuning
import optuna  

### Reading Data using pandas read_csv()

In [None]:
data = pd.read_csv(r'../input/travel-insurance-prediction-data/TravelInsurancePrediction.csv')

In [None]:
data.shape # for checking rows and cols count

In [None]:
data.dtypes # for checking what type of data is present

In [None]:
data.head()

### **About Data**
* **Age** - Age of the customer.
* **Employment Type** - The sector in which the customer is employed.
* **GraduateOrNot** - Whether the customer is a college graduate.
* **AnnualIncome** - The yearly income of the customer in Indian rupees (rounded to the nearest 50 thousand rupees).
* **FamilyMembers** - Number of members in the customer's family.
* **ChronicDiseases** - Whether the customer suffers from any major disease or condition such as diabetes, high blood pressure, or asthma.
* **FrequentFlyer** - Derived from the customer's history of booking air tickets on at least 4 different instances in the last 2 years (2017-2019).
* **EverTravelledAbroad** - Whether the customer has ever travelled to a foreign country (not necessarily using the company's services).
* **TravelInsurance** - Whether the customer bought the travel insurance package during the introductory offering held in 2019.

In [None]:
data.nunique() # to check unique values in cols

In [None]:
data.isnull().sum() # to check null values if present

* It looks like there are no missing values.
* Let's check the columns:

In [None]:
data.columns 

* drop **Unnamed: 0** as it contains index values

In [None]:
data.drop('Unnamed: 0',axis=1,inplace=True)
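
An `Unnamed: 0` column typically appears when a DataFrame's index was written to the CSV and then read back as an ordinary column. As a small sketch on made-up data (the two-row frame below is purely illustrative, not the insurance dataset), passing `index_col=0` to `read_csv` is one way to avoid creating the column in the first place:

```python
import io

import pandas as pd

# Toy frame standing in for the real data (values are invented).
toy = pd.DataFrame({"Age": [25, 31], "FamilyMembers": [4, 6]})

# Writing with the default index=True stores the index as an unnamed first column.
csv_text = toy.to_csv()

# Default read: the stored index comes back as a column named "Unnamed: 0".
with_extra = pd.read_csv(io.StringIO(csv_text))

# Reading with index_col=0 treats that first column as the index instead.
without_extra = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_extra.columns))
print(list(without_extra.columns))
```

Dropping the column after loading, as done above, gives the same set of feature columns.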

### Data Visualization

* **TravelInsurance:**

In [None]:
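# Map the 0/1 target to readable labels for plotting (converted back before modelling).
# Assigning the result avoids a chained in-place replace, which newer pandas versions do not apply.
data['TravelInsurance'] = data['TravelInsurance'].replace({0: 'No', 1: 'Yes'})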


fig = go.Figure(data=[go.Pie(labels=data.TravelInsurance, hole=.4)])
fig.add_annotation(text='TravelInsurance',
                   x=0.5,y=0.5,showarrow=False,font_size=14,opacity=0.7,font_family='monospace')
fig.update_traces(hoverinfo='label+percent+value',
                  marker=dict(colors=['darkorange','blue'], line=dict(color='#000000', width=2)))
fig.update_layout(
    font_family='monospace',
    title=dict(text='how many will take Travel Insurance?',x=0.47,y=0.98,
               font=dict(color='black',size=20)),
    legend=dict(x=0.37,y=-0.05,orientation='h',traceorder='reversed'),
    hoverlabel=dict(bgcolor='white'))
fig.update_traces(textposition='outside', textinfo='percent+label')
fig.show()

* **Age:**

In [None]:
data['Age'].value_counts()

* The ages appear to range from 25 to 35.

In [None]:
fig = go.Figure(data=[go.Pie(labels=data.Age, hole=.2)])
fig.add_annotation(text='Age',
                   x=0.5,y=0.5,showarrow=False,font_size=18,opacity=0.7,font_family='monospace')
fig.update_traces(hoverinfo='label+percent+value',
                  marker=dict(colors=['darkorange','blue'], line=dict(color='#000000', width=2)))
fig.update_layout(
    font_family='monospace',
    title=dict(text='Age distribution',x=0.47,y=0.98,
               font=dict(color='black',size=20)),
    legend=dict(orientation='v',traceorder='reversed'),
    hoverlabel=dict(bgcolor='white'))
fig.update_traces(textposition='outside', textinfo='percent+label')
fig.show()

* **Employment Type:**

In [None]:
fig = go.Figure(data=[go.Pie(labels=data['Employment Type'], hole=.4)])
fig.add_annotation(text='Employment Type',
                   x=0.5,y=0.5,showarrow=False,font_size=14,opacity=0.7,font_family='monospace')
fig.update_traces(hoverinfo='label+percent+value',
                  marker=dict(colors=['darkorange','blue'], line=dict(color='#000000', width=2)))
fig.update_layout(
    font_family='monospace',
    title=dict(text='Employment Type',x=0.47,y=0.98,
               font=dict(color='black',size=20)),
    legend=dict(x=0.37,y=-0.05,orientation='h',traceorder='reversed'),
    hoverlabel=dict(bgcolor='white'))
fig.update_traces(textposition='outside', textinfo='percent+label')
fig.show()

* **GraduateOrNot:**

In [None]:
fig = go.Figure(data=[go.Pie(labels=data['GraduateOrNot'], hole=.4)])
fig.add_annotation(text='GraduateOrNot',
                   x=0.5,y=0.5,showarrow=False,font_size=14,opacity=0.7,font_family='monospace')
fig.update_traces(hoverinfo='label+percent+value',
                  marker=dict(colors=['darkorange','blue'], line=dict(color='#000000', width=2)))
fig.update_layout(
    font_family='monospace',
    title=dict(text='how many Graduate or Not?',x=0.47,y=0.98,
               font=dict(color='black',size=20)),
    legend=dict(x=0.37,y=-0.05,orientation='h',traceorder='reversed'),
    hoverlabel=dict(bgcolor='white'))
fig.update_traces(textposition='outside', textinfo='percent+label')
fig.show()

* **FamilyMembers:**

In [None]:
data['FamilyMembers'].value_counts()

In [None]:
fig = go.Figure(data=[go.Pie(labels=data['FamilyMembers'], hole=.4)])
fig.add_annotation(text='FamilyMembers',
                   x=0.5,y=0.5,showarrow=False,font_size=14,opacity=0.7,font_family='monospace')
fig.update_traces(hoverinfo='label+percent+value',
                  marker=dict(colors=['darkorange','blue'], line=dict(color='#000000', width=2)))
fig.update_layout(
    font_family='monospace',
    title=dict(text='Family Members',x=0.47,y=0.98,
               font=dict(color='black',size=20)),
    legend=dict(orientation='v',traceorder='reversed'),
    hoverlabel=dict(bgcolor='white'))
fig.update_traces(textposition='outside', textinfo='percent+label')
fig.show()

* **ChronicDiseases:**

In [None]:
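# Map the 0/1 flag to readable labels for plotting (converted back before modelling).
data['ChronicDiseases'] = data['ChronicDiseases'].replace({0: 'No', 1: 'Yes'})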

fig = go.Figure(data=[go.Pie(labels=data['ChronicDiseases'], hole=.4)])
fig.add_annotation(text='ChronicDiseases',
                   x=0.5,y=0.5,showarrow=False,font_size=14,opacity=0.7,font_family='monospace')
fig.update_traces(hoverinfo='label+percent+value',
                  marker=dict(colors=['darkorange','blue'], line=dict(color='#000000', width=2)))
fig.update_layout(
    font_family='monospace',
    title=dict(text='Chronic Diseases',x=0.47,y=0.98,
               font=dict(color='black',size=20)),
    legend=dict(x=0.37,y=-0.05,orientation='h',traceorder='reversed'),
    hoverlabel=dict(bgcolor='white'))
fig.update_traces(textposition='outside', textinfo='percent+label')
fig.show()

* **FrequentFlyer:**


In [None]:
fig = go.Figure(data=[go.Pie(labels=data['FrequentFlyer'], hole=.4)])
fig.add_annotation(text='FrequentFlyer',
                   x=0.5,y=0.5,showarrow=False,font_size=14,opacity=0.7,font_family='monospace')
fig.update_traces(hoverinfo='label+percent+value',
                  marker=dict(colors=['darkorange','blue'], line=dict(color='#000000', width=2)))
fig.update_layout(
    font_family='monospace',
    title=dict(text='FrequentFlyer',x=0.47,y=0.98,
               font=dict(color='black',size=20)),
    legend=dict(x=0.37,y=-0.05,orientation='h',traceorder='reversed'),
    hoverlabel=dict(bgcolor='white'))
fig.update_traces(textposition='outside', textinfo='percent+label')
fig.show()

* **EverTravelledAbroad:**

In [None]:
fig = go.Figure(data=[go.Pie(labels=data['EverTravelledAbroad'], hole=.5)])
fig.add_annotation(text='EverTravelledAbroad',
                   x=0.5,y=0.5,showarrow=False,font_size=16,opacity=0.7,font_family='monospace')
fig.update_traces(hoverinfo='label+percent+value',
                  marker=dict(colors=['darkorange','blue'], line=dict(color='#000000', width=2)))
fig.update_layout(
    font_family='monospace',
    title=dict(text='how many people Ever Travelled Abroad?',x=0.47,y=0.98,
               font=dict(color='black',size=20)),
    legend=dict(x=0.37,y=-0.05,orientation='h',traceorder='reversed'),
    hoverlabel=dict(bgcolor='white'))

fig.update_traces(textposition='outside', textinfo='percent+label')

fig.show()

### Impact of features on Travel Insurance:

* **Age:**

In [None]:
plt.figure(figsize=(15,7))
ax = sns.countplot(x='Age',hue ='TravelInsurance', data=data,linewidth=1, edgecolor=".2",palette="Set3")
for container in ax.containers:
    ax.bar_label(container)
plt.title("Impact of Age on Travel Insurance", fontdict={'fontsize':15,'fontweight':0})
plt.xlabel('Age (in years)')
plt.show()

* **Employment Type:**

In [None]:
plt.figure(figsize=(15,7))
ax = sns.countplot(x='Employment Type',hue ='TravelInsurance', data=data,linewidth=1, edgecolor=".2",palette="Set3")
for container in ax.containers:
    ax.bar_label(container)
plt.title("Impact of Employment Type on Travel Insurance", fontdict={'fontsize':15})
plt.xlabel('Employment Type')
plt.show()

* **GraduateOrNot:**

In [None]:
plt.figure(figsize=(15,7))
ax = sns.countplot(x='GraduateOrNot',hue ='TravelInsurance', data=data,linewidth=1, edgecolor=".2",palette="Set3")
for container in ax.containers:
    ax.bar_label(container)
plt.title("Impact of GraduateOrNot on Travel Insurance", fontdict={'fontsize':15})
plt.xlabel('Graduate')
plt.show()

* **FamilyMembers:**

In [None]:
plt.figure(figsize=(15,7))
ax = sns.countplot(x='FamilyMembers',hue ='TravelInsurance', data=data,linewidth=1, edgecolor=".2",palette="Set3")
for container in ax.containers:
    ax.bar_label(container)
plt.title("Impact of FamilyMembers on Travel Insurance", fontdict={'fontsize':15})
plt.xlabel('Family Members')
plt.show()

* **ChronicDiseases:**

In [None]:
plt.figure(figsize=(15,7))
ax = sns.countplot(x='ChronicDiseases',hue ='TravelInsurance', data=data,linewidth=1, edgecolor=".2",palette="Set3")
for container in ax.containers:
    ax.bar_label(container)
plt.title("Impact of ChronicDiseases on Travel Insurance", fontdict={'fontsize':15})
plt.xlabel('Chronic Diseases')
plt.show()

* **FrequentFlyer:**

In [None]:
plt.figure(figsize=(15,7))
ax = sns.countplot(x='FrequentFlyer',hue ='TravelInsurance', data=data,linewidth=1, edgecolor=".2",palette="Set3")
for container in ax.containers:
    ax.bar_label(container)
plt.title("Impact of FrequentFlyer on Travel Insurance", fontdict={'fontsize':15})
plt.xlabel('Frequent Flyer')
plt.show()

* **EverTravelledAbroad:**

In [None]:
plt.figure(figsize=(15,7))
ax = sns.countplot(x='EverTravelledAbroad',hue ='TravelInsurance', data=data,linewidth=1, edgecolor=".2",palette="Set3")
for container in ax.containers:
    ax.bar_label(container)
plt.title("Impact of EverTravelledAbroad on Travel Insurance", fontdict={'fontsize':15})
plt.xlabel('Ever Travelled Abroad')
plt.show()
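
The count plots above show raw counts; the same comparison can also be read as a purchase rate per group. A minimal sketch with `pd.crosstab` on a small invented frame (the values below are illustrative only; on the notebook's data the call would use `data['FrequentFlyer']` and `data['TravelInsurance']`):

```python
import pandas as pd

# Invented rows, only to show the shape of the result.
toy = pd.DataFrame({
    "FrequentFlyer":   ["Yes", "Yes", "No", "No", "No", "Yes"],
    "TravelInsurance": ["Yes", "No",  "No", "No", "Yes", "Yes"],
})

# normalize="index" turns each row into proportions that sum to 1,
# so the "Yes" column is the purchase rate within each group.
rates = pd.crosstab(toy["FrequentFlyer"], toy["TravelInsurance"], normalize="index")
print(rates)
```

Rates make groups of very different sizes easier to compare than bar heights alone.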

In [None]:
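# Convert the plotting labels back to 0/1 integers before modelling.
data['ChronicDiseases'] = data['ChronicDiseases'].replace({'No': 0, 'Yes': 1}).astype(int)
data['TravelInsurance'] = data['TravelInsurance'].replace({'No': 0, 'Yes': 1}).astype(int)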

### **Data preprocessing**

* Convert all object (string) columns to integer codes using LabelEncoder.

In [None]:
le = LabelEncoder()
var = ['Employment Type','GraduateOrNot','FrequentFlyer','EverTravelledAbroad']
for i in var:
    data[i] = le.fit_transform(data[i])
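
`LabelEncoder` assigns integer codes in sorted order of the distinct values, so for a Yes/No column `No` becomes 0 and `Yes` becomes 1. Because the loop above refits the same encoder for every column, only the last column's mapping stays available afterwards. A small sketch on invented values, keeping one encoder per column in a hypothetical `encoders` dict so that codes can be decoded later:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Invented values, only to illustrate the encoding.
toy = pd.DataFrame({
    "FrequentFlyer": ["No", "Yes", "No"],
    "GraduateOrNot": ["Yes", "Yes", "No"],
})

encoders = {}  # hypothetical helper: column name -> fitted encoder
for col in toy.columns:
    enc = LabelEncoder()
    toy[col] = enc.fit_transform(toy[col])
    encoders[col] = enc

print(toy)
print(list(encoders["FrequentFlyer"].classes_))                    # codes 0 and 1, in order
print(list(encoders["GraduateOrNot"].inverse_transform([1, 0])))   # decode codes back to labels
```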

In [None]:
data.head()

In [None]:
y = data['TravelInsurance']
data.drop('TravelInsurance',axis=1,inplace=True)

### **Building Model**

### hyper-parameters tuning using optuna:

Optuna is an automatic hyperparameter optimization software framework, particularly designed for machine learning.

In [None]:
from lightgbm import early_stopping  # LightGBM 4.0 removed the early_stopping_rounds/verbose fit() arguments


def fit_lgb(trial, x_train, y_train, x_test, y_test):
    params = {
        'reg_alpha': trial.suggest_float('reg_alpha', 1e-4, 1e4, log=True),  # L1 regularization
        'reg_lambda': trial.suggest_float('reg_lambda', 1e-4, 1e4, log=True),  # L2 regularization
        'learning_rate': trial.suggest_float('learning_rate', 0.03, 0.07),  # shrinkage applied to each tree's contribution
        'max_depth': trial.suggest_int('max_depth', 1, 20),  # maximum depth of each tree
        'n_estimators': trial.suggest_int('n_estimators', 100, 20000)  # upper bound on boosting rounds; early stopping may end training sooner
    }

    model = LGBMClassifier(**params)
    # Stop when the validation metric has not improved for 150 rounds.
    model.fit(x_train, y_train, eval_set=[(x_test, y_test)],
              callbacks=[early_stopping(stopping_rounds=150, verbose=False)])

    # Predicted probability of the positive class (customer buys the insurance).
    y_train_pred = model.predict_proba(x_train)[:, 1]
    y_test_pred = model.predict_proba(x_test)[:, 1]

    # Despite the key names, these values are ROC AUC scores, not accuracy.
    log = {
        "train accuracy": roc_auc_score(y_train, y_train_pred),
        "valid accuracy": roc_auc_score(y_test, y_test_pred)
    }

    return model, log
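
The values logged under `"train accuracy"` and `"valid accuracy"` come from `roc_auc_score`, so they measure how well the predicted probabilities rank buyers above non-buyers rather than the share of correct labels. A small sketch on invented scores shows that the two metrics can differ considerably:

```python
from sklearn.metrics import accuracy_score, roc_auc_score

y_true = [0, 0, 1, 1]
y_score = [0.20, 0.30, 0.40, 0.45]  # invented probabilities

# Every positive is scored above every negative, so the ranking is perfect.
auc = roc_auc_score(y_true, y_score)

# Thresholding at 0.5 labels every sample as 0, so only half are correct.
y_pred = [int(s >= 0.5) for s in y_score]
acc = accuracy_score(y_true, y_pred)

print(auc, acc)  # 1.0 0.5
```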

In [None]:
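def objective(trial):
    # Hold out 20% of the rows for validation; the fixed seed and stratification
    # keep the split identical across trials so their scores are comparable.
    x_train, x_test, y_train, y_test = train_test_split(
        data, y, test_size=0.20, random_state=42, stratify=y)
    model, log = fit_lgb(trial, x_train, y_train, x_test, y_test)
    return log['valid accuracy']  # ROC AUC on the validation split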

In [None]:
study = optuna.create_study(direction = 'maximize')
study.optimize(objective,n_trials=15)

* These are the best parameters found by Optuna:

In [None]:
lgb_params = study.best_params
lgb_params

### LightGBM Model:

In [None]:
from lightgbm import early_stopping

folds = KFold(n_splits=5, shuffle=True)

for fold, (trn_idx, val_idx) in enumerate(folds.split(data)):
    print(f"Fold: {fold}")
    X_train, X_test = data.iloc[trn_idx], data.iloc[val_idx]
    y_train, y_test = y.iloc[trn_idx], y.iloc[val_idx]

    model = LGBMClassifier(**lgb_params)

    # Early stopping is passed as a callback; LightGBM 4.0 removed the
    # early_stopping_rounds and verbose arguments of fit().
    model.fit(X_train, y_train,
              eval_set=[(X_test, y_test)],
              callbacks=[early_stopping(stopping_rounds=400, verbose=False)])
    pred = model.predict_proba(X_test)[:, 1]
    roc = roc_auc_score(y_test, pred)
    print(f" roc_auc_score: {roc}")
    print("-" * 50)
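
Each fold above trains on four fifths of the rows and validates on the remaining fifth. A small sketch of how `KFold` partitions row positions, using ten dummy rows and an assumed `random_state=0` (the notebook's split is unseeded); the names `rows`, `toy_folds`, `val_sizes`, and `seen` are illustrative:

```python
import numpy as np
from sklearn.model_selection import KFold

rows = np.arange(10)  # stand-in for the row positions of `data`
toy_folds = KFold(n_splits=5, shuffle=True, random_state=0)

val_sizes = []
seen = []
for fold, (trn_idx, val_idx) in enumerate(toy_folds.split(rows)):
    # Training and validation positions never overlap within a fold.
    assert set(trn_idx).isdisjoint(val_idx)
    val_sizes.append(len(val_idx))
    seen.extend(val_idx.tolist())
    print(f"Fold {fold}: validate on rows {sorted(val_idx.tolist())}")

# Across the five folds every row is used for validation exactly once.
print(sorted(seen) == rows.tolist())
```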
    

In [None]:
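import pickle
# save the model to disk (this is the model fitted on the last fold)
filename = 'finalized_model.pkl'
with open(filename, 'wb') as f:  # the context manager closes the file after writing
    pickle.dump(model, f)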
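
To reuse the saved file later, it can be read back with `pickle.load`. A minimal round-trip sketch, using a small `LogisticRegression` on invented data as a stand-in for the LightGBM model and a temporary directory instead of the notebook's working directory (both are assumptions for illustration):

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in model fitted on invented data.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
labels = np.array([0, 0, 1, 1])
stand_in = LogisticRegression().fit(X, labels)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "finalized_model.pkl")
    with open(path, "wb") as f:
        pickle.dump(stand_in, f)
    with open(path, "rb") as f:
        loaded = pickle.load(f)

# The restored model should reproduce the original predictions exactly.
same = np.array_equal(stand_in.predict_proba(X), loaded.predict_proba(X))
print(same)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.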