# Factors Affecting Purchase of Travel Insurance

Travel insurance is an essential consideration for many people when planning their trips. It provides financial protection and peace of mind in case of unexpected events during travel. However, not everyone chooses to purchase travel insurance. In this blog post, we use a dataset of nearly 2,000 individuals to explore some factors that influence the purchase of travel insurance, and we build a predictive model to analyze these factors.

## Data Exploration

To begin with, we import Pandas for data manipulation (Plotly Express, used for data visualization, is imported later). We load the dataset from a CSV file hosted on GitHub, which contains information about individuals and whether they have purchased travel insurance.

In [1]:
import pandas as pd

# download dataset
data = pd.read_csv("https://raw.githubusercontent.com/amankharwal/Website-data/master/TravelInsurancePrediction.csv")
data.head()

| Unnamed: 0.1 | Unnamed: 0 | Age | Employment Type | GraduateOrNot | AnnualIncome | FamilyMembers | ChronicDiseases | FrequentFlyer | EverTravelledAbroad | TravelInsurance |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 31 | Government Sector | Yes | 400000 | 6 | 1 | No | No | 0 |
| 1 | 1 | 31 | Private Sector/Self Employed | Yes | 1250000 | 7 | 0 | No | No | 0 |
| 2 | 2 | 34 | Private Sector/Self Employed | Yes | 500000 | 4 | 1 | No | No | 1 |
| 3 | 3 | 28 | Private Sector/Self Employed | Yes | 700000 | 3 | 1 | No | No | 0 |
| 4 | 4 | 28 | Private Sector/Self Employed | Yes | 700000 | 8 | 1 | Yes | No | 0 |


After reading the dataset, we drop an unnecessary index column, check for missing values, and convert the numerical labels of the "TravelInsurance" column into more meaningful labels.

In [2]:
# Drop unnecessary column
data.drop(columns=["Unnamed: 0"], inplace=True)
data.isnull().sum()

In [4]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1987 entries, 0 to 1986
Data columns (total 9 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   Age                  1987 non-null   int64 
 1   Employment Type      1987 non-null   object
 2   GraduateOrNot        1987 non-null   object
 3   AnnualIncome         1987 non-null   int64 
 4   FamilyMembers        1987 non-null   int64 
 5   ChronicDiseases      1987 non-null   int64 
 6   FrequentFlyer        1987 non-null   object
 7   EverTravelledAbroad  1987 non-null   object
 8   TravelInsurance      1987 non-null   int64 
dtypes: int64(5), object(4)
memory usage: 139.8+ KB


In [5]:
data["TravelInsurance"] = data["TravelInsurance"].map({0: "Not Purchased", 1: "Purchased"})
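
Once the labels are readable, it can be useful to check how the two classes are balanced, since a skewed split changes how an accuracy score should be read. The following is a minimal sketch on a small hypothetical sample (the `sample` frame and its values are made up for illustration, not taken from the dataset):

```python
import pandas as pd

# Hypothetical stand-in for the target column; values are illustrative only.
sample = pd.DataFrame({"TravelInsurance": [0, 0, 0, 1, 0, 1, 0, 1, 0, 0]})
sample["TravelInsurance"] = sample["TravelInsurance"].map(
    {0: "Not Purchased", 1: "Purchased"}
)

# Share of each class as a fraction of all rows.
class_share = sample["TravelInsurance"].value_counts(normalize=True)
print(class_share)
```

On the real data, the same call would be `data["TravelInsurance"].value_counts(normalize=True)`.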

## Data Visualization

We create histograms with Plotly Express to visualize how age, employment type, and annual income are distributed among those who purchased travel insurance and those who did not.

In [6]:
import plotly.express as px

# Visualize factors affecting travel insurance purchase
figure = px.histogram(data, x = "Age",
                      color = "TravelInsurance",
                      title= "Factors Affecting Purchase of Travel Insurance: Age")
figure.show()

In [7]:
# Effect of employment type on insurance purchase
figure = px.histogram(data, x = "Employment Type",
                      color = "TravelInsurance",
                      title= "Factors Affecting Purchase of Travel Insurance: Employment Type")
figure.show()

In [8]:
# Effect of income on insurance purchase
figure = px.histogram(data, x = "AnnualIncome",
                      color = "TravelInsurance",
                      title= "Factors Affecting Purchase of Travel Insurance: Income")
figure.show()
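
Histograms show counts, so groups of different sizes can be hard to compare by eye. One possible complement is the purchase rate per group. The sketch below uses a few hypothetical rows (the `sample` frame is invented for illustration) and assumes the target has already been mapped to the "Purchased" / "Not Purchased" labels:

```python
import pandas as pd

# Hypothetical rows mimicking two of the dataset's columns.
sample = pd.DataFrame({
    "Employment Type": [
        "Government Sector", "Government Sector",
        "Private Sector/Self Employed", "Private Sector/Self Employed",
        "Private Sector/Self Employed", "Government Sector",
    ],
    "TravelInsurance": [
        "Not Purchased", "Purchased", "Purchased",
        "Purchased", "Not Purchased", "Not Purchased",
    ],
})

# Fraction of rows labelled "Purchased" within each employment type.
purchase_rate = (
    sample["TravelInsurance"].eq("Purchased")
    .groupby(sample["Employment Type"])
    .mean()
)
print(purchase_rate)
```

The same pattern should work for any categorical column, for example `FrequentFlyer` or `EverTravelledAbroad`.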

## Building the Predictive Model

Next, we prepare the data for training a predictive model. We map the Yes/No categorical variables to 1 and 0, select seven feature columns (Employment Type is not included), and hold out 10% of the data as a testing set. We use a decision tree classifier to build the model and make predictions on the testing set. Finally, we evaluate the model's accuracy.

In [9]:
import numpy as np

# Map categorical variables to numerical values
data["GraduateOrNot"] = data["GraduateOrNot"].map({"No": 0, "Yes": 1})
data["FrequentFlyer"] = data["FrequentFlyer"].map({"No": 0, "Yes": 1})
data["EverTravelledAbroad"] = data["EverTravelledAbroad"].map({"No": 0, "Yes": 1})
# Prepare the input and output variables
X = np.array(data[["Age", "GraduateOrNot",
                   "AnnualIncome", "FamilyMembers",
                   "ChronicDiseases", "FrequentFlyer",
                   "EverTravelledAbroad"]])
# Use a 1-D target array, the usual shape for single-output classification
y = np.array(data["TravelInsurance"])
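
The feature list leaves out "Employment Type", even though it was visualized earlier. If one wanted to include it, one option (an assumption here, not part of the original workflow) is one-hot encoding with `pd.get_dummies`. The sketch below uses a tiny hypothetical frame with the two employment types visible in the data preview:

```python
import pandas as pd

# Hypothetical rows; only the columns needed for the illustration.
sample = pd.DataFrame({
    "Age": [31, 34, 28],
    "Employment Type": [
        "Government Sector",
        "Private Sector/Self Employed",
        "Private Sector/Self Employed",
    ],
})

# One-hot encode the text column. drop_first=True keeps a single 0/1 column
# when the category has two levels, which avoids a redundant feature.
encoded = pd.get_dummies(
    sample, columns=["Employment Type"], drop_first=True, dtype=int
)
print(encoded.columns.tolist())
```

The resulting 0/1 column could then be appended to the feature list before building `X`.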

In [10]:
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.10, random_state=42)

# Build and train the decision tree classifier
model = DecisionTreeClassifier()
model.fit(X_train, y_train)

# Make predictions on the testing set
predictions = model.predict(X_test)
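
A single 10% split leaves only about 200 rows for testing, so the resulting score can vary with the split. A common way to get a steadier estimate is k-fold cross-validation. The sketch below is illustrative only: it runs on synthetic data from `make_classification` (an assumption, standing in for the real `X` and `y`), and fixes `random_state` so the run is repeatable:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with 7 features, matching the width of X.
X_demo, y_demo = make_classification(
    n_samples=500, n_features=7, n_informative=4, random_state=42
)

# Train and score the tree on 5 different train/test partitions.
demo_model = DecisionTreeClassifier(random_state=42)
scores = cross_val_score(demo_model, X_demo, y_demo, cv=5, scoring="accuracy")
print(f"Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```

With the real data, `X_demo` and `y_demo` would be replaced by `X` and `y`.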

In [12]:
from sklearn.metrics import accuracy_score

# Evaluate the model's accuracy
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy}")

Accuracy: 0.8140703517587939
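
An accuracy figure is easier to judge next to a confusion matrix and a simple baseline that always predicts the most frequent class. The sketch below shows both on hypothetical labels (`y_true` and `y_pred` are invented for illustration and are not the model's actual output):

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true labels and predictions.
y_true = np.array(["Not Purchased"] * 6 + ["Purchased"] * 4)
y_pred = np.array(
    ["Not Purchased"] * 5 + ["Purchased"] * 1      # predictions for the 6 true "Not Purchased"
    + ["Not Purchased"] * 2 + ["Purchased"] * 2    # predictions for the 4 true "Purchased"
)

labels = ["Not Purchased", "Purchased"]
# Rows are true classes, columns are predicted classes, in the order of `labels`.
cm = confusion_matrix(y_true, y_pred, labels=labels)
model_accuracy = accuracy_score(y_true, y_pred)

# Baseline: accuracy obtained by always predicting the most frequent class.
_, counts = np.unique(y_true, return_counts=True)
baseline_accuracy = counts.max() / counts.sum()

print(cm)
print(f"Model: {model_accuracy:.2f}, baseline: {baseline_accuracy:.2f}")
```

For the real model, the same calls would take `y_test` and `predictions` in place of the hypothetical arrays.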


## Conclusion

In this post, we analyzed factors affecting the purchase of travel insurance. We visualized the distribution of age, employment type, and annual income among those who purchased travel insurance and those who did not. We then built a decision tree classifier to predict travel insurance purchase from seven features; it achieved an accuracy of about 81% (0.814) on the held-out testing set. Understanding these factors can help insurance providers tailor their offerings and marketing strategies to target potential customers more effectively.