## A. Introduction

### **HEALTH INSURANCE RECOMMENDER SYSTEM**

### **Abstract** 

Health insurance is crucial for financial security and access to quality healthcare. However, selecting the right insurance plan remains a challenge due to affordability constraints, policy complexities, and limited coverage for chronic conditions. This study aims to develop a personalized Health Insurance Recommender System that suggests the most suitable insurance plans based on users’ health conditions, financial capacity, and required benefits. The system will use demographic-based recommendations, collaborative filtering, and affordability-based segmentation to guide users toward informed decisions. The study will also analyze socioeconomic factors affecting health insurance adoption, identify coverage gaps, and assess how policy attributes influence user preferences. The results are intended to give insurers insights for improving policy accessibility and affordability, with the broader aim of raising health insurance penetration in Kenya.

### **Problem Statement**

In Kenya, health insurance penetration remains low, with many individuals struggling to find appropriate coverage due to affordability constraints and policy complexity. The lack of a structured recommendation system leads to misinformed choices, leaving many underinsured or uninsured. Individuals with chronic illnesses, such as asthma, diabetes, hypertension, cancer, and HIV, face even greater difficulties in obtaining sufficient coverage at an affordable rate.

Current health insurance selection relies mainly on users' own research or on recommendations from agents, both of which can be biased and limited in scope. In addition, low-income individuals often prioritize cost over comprehensive coverage, which increases their financial burden during medical emergencies. There is a need for a data-driven recommender system that simplifies insurance selection by considering users' health conditions, financial status, and policy preferences. This study aims to bridge that gap with a recommender system that supports better decision-making, improves insurance accessibility, and helps individuals obtain coverage that matches their needs.

### **Health Insurance Coverage in Kenya and Challenges**  

Health insurance in Kenya is provided primarily through the Social Health Authority (SHA) and private insurance companies. Enrollment in SHA is mandatory for formal sector employees, while informal sector workers and low-income individuals must opt in. Private insurers offer a range of policies tailored to different income groups, including employer-sponsored health plans, family insurance, and individual policies.  

Despite these options, several challenges persist:  
- **Low Penetration Rates**: A large portion of Kenya’s population remains uninsured, with affordability and lack of awareness being major barriers.  
- **Affordability Issues**: High insurance premiums, deductibles, and co-payment requirements deter low-income individuals from enrolling.  
- **Limited Coverage for Chronic Conditions**: Many insurance providers impose exclusions or high costs for individuals with pre-existing conditions like diabetes, cancer, and hypertension.  
- **Complex Policy Structures**: The variety of policies with different benefits, exclusions, and pricing makes it difficult for users to choose the best plan.  
- **Inadequate Public Health Insurance**: Benefits under the public scheme (NHIF, since replaced by SHA) are sometimes insufficient, forcing patients to pay out-of-pocket for essential services.  
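To make the affordability point concrete, the sketch below combines the three cost levers named above into a rough annual cost for a member. It is a simplified model, not any insurer's actual formula: it assumes the member pays every expense up to the deductible, then a fixed coinsurance share (the `coinsurance_rate` parameter, an assumed 20%), with the member's total share capped at the annual out-of-pocket maximum. The argument names echo the dataset's `plan_cost` (monthly premium), `deductible_amount` and `out_of_pocket_max` columns; the plan figures are made up.

```python
def estimated_annual_cost(monthly_premium: float,
                          deductible: float,
                          out_of_pocket_max: float,
                          annual_medical_expenses: float,
                          coinsurance_rate: float = 0.2) -> float:
    """Premiums for the year plus the member's share of medical bills (simplified model)."""
    if annual_medical_expenses <= deductible:
        # Below the deductible the member pays the whole bill.
        member_share = annual_medical_expenses
    else:
        # Above it, the member pays the deductible plus an assumed coinsurance share of the rest.
        member_share = deductible + coinsurance_rate * (annual_medical_expenses - deductible)
    # The out-of-pocket maximum caps what the member pays in a year.
    member_share = min(member_share, out_of_pocket_max)
    return 12 * monthly_premium + member_share


# Hypothetical plans: a low premium can still cost more overall when medical bills are high.
cheap = estimated_annual_cost(20, deductible=500, out_of_pocket_max=2000, annual_medical_expenses=3000)
pricey = estimated_annual_cost(60, deductible=50, out_of_pocket_max=400, annual_medical_expenses=3000)
print(f"low-premium plan: {cheap:.2f}, higher-premium plan: {pricey:.2f}")
```

This is why comparing premiums alone can mislead users with chronic conditions, whose expected expenses are high.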

### **Objectives**



#### Primary Objectives

1. Develop a recommender system that provides personalized insurance recommendations by analyzing a user’s pre-existing conditions, financial capacity, and required benefits, ensuring they select the most suitable coverage.

2. Implement techniques such as demographic-based recommendations, collaborative filtering, and default user profiles to ensure that new users receive relevant insurance suggestions even with minimal historical data.
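One way to realize the cold-start idea in objective 2 is a popularity fallback over progressively broader demographic peer groups: recommend the plan most often recommended to similar users, widening the group when it is empty. The sketch assumes a history table with `age`, `region` and `recommended_plan` columns (names taken from the dataset); the age bands, the helper name `cold_start_recommendation`, and the plan names in the toy data are hypothetical.

```python
import pandas as pd

# Hypothetical age bands spanning the dataset's observed ages (18-79).
AGE_BINS = [17, 30, 45, 60, 80]
AGE_LABELS = ["18-30", "31-45", "46-60", "61-80"]


def cold_start_recommendation(history: pd.DataFrame, age: float, region: str):
    """Return the most common recommended_plan among demographically similar users.

    Peer groups are tried from most to least specific:
    same age band and region -> same age band -> all users.
    """
    bands = pd.cut(history["age"], bins=AGE_BINS, labels=AGE_LABELS)
    band = pd.cut([age], bins=AGE_BINS, labels=AGE_LABELS)[0]
    same_band = bands == band
    same_region = history["region"] == region
    for mask in (same_band & same_region, same_band, pd.Series(True, index=history.index)):
        peers = history.loc[mask, "recommended_plan"]
        if not peers.empty:
            return peers.mode().iloc[0]  # mode() sorts, so ties resolve alphabetically
    return None  # only reached when the history is empty


# Toy history with made-up plan names.
history = pd.DataFrame({
    "age": [25, 28, 29, 40, 42, 65],
    "region": ["Urban", "Urban", "Rural", "Urban", "Urban", "Rural"],
    "recommended_plan": ["Basic", "Basic", "Standard", "Standard", "Premium", "Standard"],
})
print(cold_start_recommendation(history, age=27, region="Semi-Urban"))
```

The fallback order is a design choice: a narrow peer group gives more personalized suggestions, while the global default guarantees that every new user receives some recommendation.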

#### Secondary Objectives

3. Analyze the impact of demographic and socioeconomic factors (e.g., age, gender, income, and location) on health insurance adoption and affordability.

4. Identify gaps in insurance coverage by evaluating how individuals with chronic illnesses (asthma, diabetes, hypertension, cancer, HIV) are insured across different income levels.

5. Develop an affordability-based insurance segmentation model to recommend the best plans for low-income individuals while ensuring sufficient coverage for chronic conditions.

6. Assess the role of policy attributes (e.g., premium cost, deductibles, and benefits) in determining user preferences for health insurance selection.
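Objectives 4 and 5 both rest on splitting users into income bands and comparing coverage across them. A minimal sketch, assuming quartiles of `monthly_income` as the bands and assuming `chronic_illness_coverage` holds "Yes"/"No" values (its actual encoding is not shown in this section). The helper name is hypothetical and the toy rows are invented.

```python
import pandas as pd

CHRONIC = ["Asthma", "Diabetes", "Hypertension", "Cancer", "HIV"]


def chronic_coverage_by_income(df: pd.DataFrame, n_bands: int = 4) -> pd.Series:
    """Share of chronically ill users whose plan covers chronic illness, per income band."""
    # Quantile-based bands hold roughly equal numbers of users (Q1 = lowest income).
    bands = pd.qcut(df["monthly_income"], q=n_bands,
                    labels=[f"Q{i}" for i in range(1, n_bands + 1)])
    is_chronic = df["pre_existing_conditions"].isin(CHRONIC)
    covered = df["chronic_illness_coverage"].eq("Yes")  # assumed Yes/No encoding
    # observed=False keeps bands with no chronically ill users (they show as NaN).
    return covered[is_chronic].groupby(bands[is_chronic], observed=False).mean()


toy = pd.DataFrame({
    "monthly_income": [10, 20, 30, 40, 50, 60, 70, 80],
    "pre_existing_conditions": ["Diabetes", "Cancer", "HIV", None,
                                "Asthma", "Hypertension", "Diabetes", None],
    "chronic_illness_coverage": ["No", "No", "No", "Yes", "Yes", "No", "Yes", "Yes"],
})
print(chronic_coverage_by_income(toy))
```

A low share in the bottom bands would point to the kind of coverage gap objective 4 targets, and the same bands can serve as the segments in objective 5.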


## B. Data Understanding & Exploration

### Load the Dataset

```python
import pandas as pd

# Load the raw dataset (absolute local path; adjust it to your own checkout)
df = pd.read_csv(r"N:\Moringa\afterM\Health Insurance\Health-Insurance-Recommender-System\health_insurance_recommender.csv")
df.head()
```

| | user_id | age | gender | country | region | employment_status | monthly_income | pre_existing_conditions | number_of_dependents | current_health_expenditure | ... | education_level | recent_hospital_visits | smoking_habit | alcohol_consumption | existing_medications | disability_status | co_payment_preference | lifetime_coverage_limit | preferred_hospital | claim_reimbursement_speed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 78.0 | Female | Kenya | Urban | Self-Employed | 342.147528 | Cancer | 2 | 2614.37 | ... | Bachelor's | 5 | No | Yes | No | Yes | High | 50000 | Life Healthcare | Medium |
| 1 | 2 | 27.0 | Male | Kenya | Semi-Urban | Unemployed | 23.271139 | NaN | 1 | 3343.49 | ... | Master's | 10 | Yes | Yes | No | Yes | High | 1000000 | Netcare | Slow |
| 2 | 3 | 74.0 | Male | Kenya | Urban | Unemployed | 7605.332784 | NaN | 5 | 4615.26 | ... | Diploma | 1 | No | Yes | Yes | Yes | High | 1000000 | Netcare | Fast |
| 3 | 4 | 38.0 | Male | Kenya | Rural | Employed | 2349.243288 | Cancer | 3 | 612.38 | ... | Diploma | 5 | No | Yes | No | No | High | 1000000 | Netcare | Medium |
| 4 | 5 | 43.0 | Male | Kenya | Urban | Employed | 17.187938 | Asthma | 4 | 2314.12 | ... | Bachelor's | 0 | No | Yes | No | Yes | High | 1000000 | Aga Khan Hospital | Fast |
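The `...` column in the preview above hides the middle columns of the 42-column frame. pandas controls this with the `display.max_columns` option, where `None` means no limit; `pd.option_context` applies the change only inside a `with` block, so the rest of the notebook keeps its display settings. The toy frame `wide` below stands in for `df`.

```python
import pandas as pd

# Toy 42-column frame standing in for df.
wide = pd.DataFrame([range(42)], columns=[f"col_{i}" for i in range(42)])

# Temporarily show every column (None = no limit) on one line (no wrapping).
with pd.option_context("display.max_columns", None, "display.expand_frame_repr", False):
    preview = repr(wide.head())

print(preview)
```

In the notebook, `pd.set_option("display.max_columns", None)` makes the same change for the whole session.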


#### Column/Feature Descriptions

```python
# Feature definitions file stored alongside the dataset
dfcols = pd.read_excel(r"N:\Moringa\afterM\Health Insurance\Health-Insurance-Recommender-System\health_insurance_features_definitions.xlsx")
dfcols.head(30)
```

| | Feature | Definition |
|---|---|---|
| 0 | user_id | Unique identifier for each user |
| 1 | age | Age of the user |
| 2 | gender | Gender of the user (Male, Female, Other) |
| 3 | country | Country of residence |
| 4 | monthly_income | User's monthly income in USD |
| 5 | health_condition | Primary health condition of the user (e.g., Di... |
| 6 | plan_cost | Monthly cost of the insurance plan in USD |
| 7 | deductible_amount | Amount user pays before insurance covers expenses |
| 8 | out_of_pocket_max | Maximum amount user pays per year before full ... |
| 9 | family_size | Number of dependents in the family |
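The definitions file returned only ten rows even though `head(30)` was requested, so it appears to document ten features, while `df` has 42 columns. Its names also do not obviously match every column: it lists `health_condition`, whereas the dataset preview shows `pre_existing_conditions`. A set comparison is a quick way to list both kinds of gap. The helper name `compare_definitions` is hypothetical, and the toy frames stand in for `df` and `dfcols`.

```python
import pandas as pd


def compare_definitions(data: pd.DataFrame, definitions: pd.DataFrame):
    """Return (columns without a definition, documented features absent from the data)."""
    documented = set(definitions["Feature"])
    actual = set(data.columns)
    return sorted(actual - documented), sorted(documented - actual)


# Toy stand-ins for df and dfcols.
data = pd.DataFrame(columns=["user_id", "age", "pre_existing_conditions"])
definitions = pd.DataFrame({"Feature": ["user_id", "age", "health_condition"]})
undocumented, not_in_data = compare_definitions(data, definitions)
print(undocumented, not_in_data)  # ['pre_existing_conditions'] ['health_condition']
```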


#### Summary Statistics

```python
df.shape  # (rows, columns)
```

(52500, 42)

The dataset contains 52,500 records, each described by 42 attributes covering demographic details, health status, financial capacity, and plan characteristics. These features will support trend analysis, predictive modeling, and personalized insurance recommendations.
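The summary statistics further down report a maximum `user_id` of 50,000 across 52,500 rows. Because the IDs are integers starting at 1, at least 2,500 rows must reuse an ID, so it is worth counting repeated IDs and exact duplicate rows before any deduplication. A minimal sketch on a toy frame; the helper name `duplicate_summary` is hypothetical.

```python
import pandas as pd


def duplicate_summary(df: pd.DataFrame) -> dict:
    """Count repeated user IDs and fully identical rows."""
    return {
        "rows": len(df),
        "unique_user_ids": int(df["user_id"].nunique()),
        # Rows whose user_id already appeared earlier in the frame.
        "repeated_id_rows": int(df["user_id"].duplicated().sum()),
        # Rows identical to an earlier row in every column.
        "exact_duplicate_rows": int(df.duplicated().sum()),
    }


toy = pd.DataFrame({"user_id": [1, 2, 2, 3, 3], "age": [30, 40, 40, 50, 51]})
print(duplicate_summary(toy))

# Drop only exact copies; rows that share an ID but differ elsewhere need their own rule.
deduped = toy.drop_duplicates()
```

Whether a repeated ID with different values is an error or a legitimate second record (for example, a later policy) is a data question the counts alone cannot answer.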

```python
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 52500 entries, 0 to 52499
Data columns (total 42 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   user_id                      52500 non-null  int64  
 1   age                          49859 non-null  float64
 2   gender                       52500 non-null  object 
 3   country                      52500 non-null  object 
 4   region                       52500 non-null  object 
 5   employment_status            52500 non-null  object 
 6   monthly_income               49864 non-null  float64
 7   pre_existing_conditions      43941 non-null  object 
 8   number_of_dependents         52500 non-null  int64  
 9   current_health_expenditure   52500 non-null  float64
 10  hospital_preference          52500 non-null  object 
 11  previous_insurance_coverage  52500 non-null  object 
 12  preferred_coverage_type      52500 non-null  object 
 13  insurance_provid
```
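The non-null counts above show gaps in `age` (49,859 of 52,500), `monthly_income` (49,864) and `pre_existing_conditions` (43,941). One possible strategy, sketched below, is median imputation for the two numeric columns with an indicator flag, and treating a blank condition as "None reported". That last choice is an assumption: a blank might instead mean the condition is unknown, which would call for a separate "Unknown" category. The helper name and toy rows are hypothetical.

```python
import pandas as pd


def impute_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with age/monthly_income median-imputed and conditions filled."""
    out = df.copy()
    for col in ["age", "monthly_income"]:
        # Keep a flag so later models can still tell imputed values apart.
        out[f"{col}_was_missing"] = out[col].isna()
        out[col] = out[col].fillna(out[col].median())
    # Assumption: a blank entry means no condition was reported.
    out["pre_existing_conditions"] = out["pre_existing_conditions"].fillna("None reported")
    return out


toy = pd.DataFrame({
    "age": [20, None, 40],
    "monthly_income": [100.0, 300.0, None],
    "pre_existing_conditions": ["Asthma", None, "Cancer"],
})
print(impute_missing(toy))
```

If the data is later split for modeling, the medians should be computed on the training portion only to avoid leaking information from the test set.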

```python
df.describe()
```

| | user_id | age | monthly_income | number_of_dependents | current_health_expenditure | plan_cost | deductible_amount | out_of_pocket_max | user_satisfaction_rating | duplicate_plan_cost | waiting_period_months | family_size | recent_hospital_visits | lifetime_coverage_limit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 52500.0 | 49859.0 | 49864.0 | 52500.0 | 52500.0 | 52500.0 | 52500.0 | 52500.0 | 49881.0 | 52500.0 | 52500.0 | 52500.0 | 52500.0 | 52500.0 |
| mean | 24980.571486 | 48.535911 | 2229.294781 | 2.492286 | 2542.757659 | 599.710239 | 113.86427 | 381.808335 | 3.001423 | 27581.73084 | 4.911219 | 3.501067 | 4.99059 | 315353.333333 |
| std | 14433.032416 | 17.922048 | 3993.264625 | 1.705581 | 1416.930992 | 1029.239746 | 200.843563 | 634.008554 | 1.41954 | 13051.249725 | 6.010216 | 1.709279 | 3.155993 | 347630.969987 |
| min | 1.0 | 18.0 | 1.281549 | 0.0 | 100.03 | 1.260237 | 0.126121 | 1.260728 | 1.0 | 5000.16 | 0.0 | 1.0 | 0.0 | 0.0 |
| 25% | 12479.75 | 33.0 | 59.004709 | 1.0 | 1316.6525 | 22.398519 | 3.185938 | 21.623265 | 2.0 | 16220.6125 | 1.0 | 2.0 | 2.0 | 50000.0 |
| 50% | 24974.5 | 48.0 | 611.07948 | 2.0 | 2540.765 | 168.51024 | 31.528224 | 107.380008 | 3.0 | 27641.475 | 3.0 | 4.0 | 5.0 | 100000.0 |
| 75% | 37471.25 | 64.0 | 1360.54818 | 4.0 | 3766.235 | 354.35295 | 69.125184 | 215.736066 | 4.0 | 38970.1425 | 6.0 | 5.0 | 8.0 | 500000.0 |
| max | 50000.0 | 79.0 | 19434.9874 | 5.0 | 4999.99 | 4859.763804 | 971.92224 | 2915.811432 | 5.0 | 49999.71 | 24.0 | 6.0 | 10.0 | 1000000.0 |
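The summary table shows that `monthly_income` is strongly right-skewed: the mean (about 2,229) is more than three times the median (about 611), and the maximum is about 19,435. A log transform is a common way to compress such a tail before distance-based steps such as clustering or nearest-neighbor recommendations. `np.log1p` computes log(1 + x), so it stays finite even for incomes close to zero. The toy incomes below are made up.

```python
import numpy as np
import pandas as pd

income = pd.Series([10.0, 50.0, 100.0, 500.0, 20000.0])  # toy right-skewed incomes

log_income = np.log1p(income)  # elementwise log(1 + x); returns a Series

print(f"skew before: {income.skew():.2f}, after: {log_income.skew():.2f}")
```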


### Data Cleaning

#### Handling Missing Values

```python
df.isnull().sum()
```

user_id                           0
age                            2641
gender                            0
country                           0
region                            0
employment_status                 0
monthly_income                 2636
pre_existing_conditions        8559
number_of_dependents              0
current_health_expenditure        0
hospital_preference               0
previous_insurance_coverage       0
preferred_coverage_type           0
insurance_provider                0
plan_cost                         0
deductible_amount                 0
out_of_pocket_max                 0
medication_coverage               0
maternity_coverage                0
chronic_illness_coverage          0
emergency_coverage                0
dental_coverage                   0
vision_coverage                   0
user_satisfaction_rating       2619
recommended_plan                  0
policy_expiry_date                0
duplicate_plan_cost               0
duplicate_country           