## Problem Statement

Insurance companies need to accurately predict the cost of health insurance for individuals to set premiums appropriately.  
However, traditional methods of cost prediction often rely on broad actuarial tables and historical averages, which may not account for the nuanced differences among individuals.  

By leveraging machine learning techniques, insurers can more accurately predict insurance costs tailored to individual profiles, leading to:

- More competitive pricing  
- Improved risk management  
- Better alignment between premiums and actual health risks  
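
One way to make "more accurate than historical averages" measurable is to compare any model against a constant baseline that quotes every applicant the same average premium. The sketch below assumes mean absolute error (MAE) as the comparison metric. Both that choice and the helper names `mae` and `mean_baseline_predictions` are illustrative assumptions, not part of the original project.

```python
import numpy as np

def mae(y_true, y_pred) -> float:
    """Mean absolute error between actual and predicted premiums."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

def mean_baseline_predictions(y_train, n: int) -> np.ndarray:
    """Quote each of n applicants the average training premium.

    This mimics a one-size-fits-all, average-based price; an
    individualized model is only worth using if its MAE is lower.
    """
    return np.full(n, np.mean(np.asarray(y_train, dtype=float)))

# Illustration with a handful of premium values
y_train = [25000, 29000, 23000]
y_test = [28000, 23000]
baseline = mean_baseline_predictions(y_train, len(y_test))
print(mae(y_test, baseline))
```

A per-individual model would be evaluated with the same `mae` call on its own predictions for `y_test`. The useful comparison is the gap between the two numbers.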

## Insurance Cost Prediction Need

The primary need for this project arises from the challenges insurers face in pricing policies accurately while remaining competitive in the market.  
Inaccurate predictions are costly in both directions: underestimating costs leads to losses for insurers, while overestimating them leads to unfairly high premiums for policyholders.  

By implementing a machine learning model, insurers can:

- **Enhance Precision in Pricing**: Use individual data points to determine premiums that reflect actual risk more closely than generic estimates.  
- **Increase Competitiveness**: Offer rates that are attractive to consumers while ensuring that the pricing is sustainable for the insurer.  
- **Improve Customer Satisfaction**: Fair and transparent pricing based on personal health data can increase trust and satisfaction among policyholders.  
- **Enable Personalized Offerings**: Create customized insurance packages based on predicted costs, catering more directly to individual needs and preferences.  
- **Sharpen Risk Assessment**: Identify the factors that most strongly influence costs and refine risk assessment processes accordingly.  
- **Inform Policy Development**: Use insights from the model to design new insurance products or adjust existing ones.  
- **Support Strategic Decision-Making**: Ground broader decisions, such as entering new markets or adjusting policy terms, in risk predictions.  
- **Strengthen Customer Engagement**: Apply model insights to customer engagement initiatives, including personalized marketing and tailored advice for policyholders.  
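
For risk assessment, a quick first pass at "which factors most influence costs" is to rank each column by the strength of its linear association with `PremiumPrice`. The sketch below is an assumption, not the project's method. It uses absolute Pearson correlation via pandas `DataFrame.corrwith`, which ignores interactions and non-linear effects, so treat the ranking as exploratory. The function name `rank_features_by_correlation` is hypothetical.

```python
import pandas as pd

def rank_features_by_correlation(df: pd.DataFrame, target: str = "PremiumPrice") -> pd.Series:
    """Rank numeric feature columns by |Pearson correlation| with the target.

    Exploratory only: correlation measures one-at-a-time linear association,
    not causal effect. Constant columns yield NaN, which sorts last.
    """
    features = df.drop(columns=[target]).select_dtypes(include="number")
    return features.corrwith(df[target]).abs().sort_values(ascending=False)
```

Any stray index column (such as `Unnamed: 0` in the preview of this dataset) should be dropped before calling the function. Otherwise it is ranked as if it were a real feature.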

## Data Description

The dataset comprises the following 11 attributes: 10 predictors and the target variable, PremiumPrice.

1. **Age**: Numeric, ranging from 18 to 66 years.  
2. **Diabetes**: Binary (0 or 1), where 1 indicates the presence of diabetes.  
3. **BloodPressureProblems**: Binary (0 or 1), indicating the presence of blood pressure-related issues.  
4. **AnyTransplants**: Binary (0 or 1), where 1 indicates the person has had a transplant.  
5. **AnyChronicDiseases**: Binary (0 or 1), indicating the presence of any chronic diseases.  
6. **Height**: Numeric, measured in centimeters, ranging from 145 cm to 188 cm.  
7. **Weight**: Numeric, measured in kilograms, ranging from 51 kg to 132 kg.  
8. **KnownAllergies**: Binary (0 or 1), where 1 indicates known allergies.  
9. **HistoryOfCancerInFamily**: Binary (0 or 1), indicating a family history of cancer.  
10. **NumberOfMajorSurgeries**: Numeric, the number of major surgeries the person has undergone, ranging from 0 to 3.  
11. **PremiumPrice**: Numeric, the premium price (currency not specified), ranging from 15,000 to 40,000.  
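
Because the attribute list doubles as a data contract, it can be encoded and checked whenever the CSV is loaded. This is a minimal sketch that assumes the documented ranges are inclusive bounds and that binary columns should contain only 0 and 1. `EXPECTED_RANGES`, `BINARY_COLUMNS`, and `check_schema` are hypothetical names.

```python
import pandas as pd

# Bounds transcribed from the attribute list (assumed inclusive)
EXPECTED_RANGES = {
    "Age": (18, 66),
    "Height": (145, 188),
    "Weight": (51, 132),
    "NumberOfMajorSurgeries": (0, 3),
    "PremiumPrice": (15_000, 40_000),
}

# Columns documented as Binary (0 or 1)
BINARY_COLUMNS = [
    "Diabetes",
    "BloodPressureProblems",
    "AnyTransplants",
    "AnyChronicDiseases",
    "KnownAllergies",
    "HistoryOfCancerInFamily",
]

def check_schema(df: pd.DataFrame) -> list[str]:
    """Return human-readable problems; an empty list means the data matches the description."""
    problems = []
    for col in list(EXPECTED_RANGES) + BINARY_COLUMNS:
        if col not in df.columns:
            problems.append(f"missing column: {col}")
    for col, (lo, hi) in EXPECTED_RANGES.items():
        if col in df.columns:
            # between() is inclusive on both ends; NaN also counts as out of range
            bad = ~df[col].between(lo, hi)
            if bad.any():
                problems.append(f"{col}: {int(bad.sum())} value(s) outside [{lo}, {hi}]")
    for col in BINARY_COLUMNS:
        if col in df.columns:
            bad = ~df[col].isin([0, 1])
            if bad.any():
                problems.append(f"{col}: {int(bad.sum())} non-binary value(s)")
    return problems
```

Columns not listed in the description (for example a leftover index column) are ignored by this check rather than reported.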

```python
# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
```

```python
# Load the dataset (the relative path assumes insurance.csv is in the working directory)
df = pd.read_csv('insurance.csv')
```

```python
# Preview the first five rows
df.head()
```

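| Unnamed: 0 | Age | Diabetes | BloodPressureProblems | AnyTransplants | AnyChronicDiseases | Height | Weight | KnownAllergies | HistoryOfCancerInFamily | NumberOfMajorSurgeries | PremiumPrice |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 45 | 0 | 0 | 0 | 0 | 155 | 57 | 0 | 0 | 0 | 25000 |
| 1 | 60 | 1 | 0 | 0 | 0 | 180 | 73 | 0 | 0 | 0 | 29000 |
| 2 | 36 | 1 | 1 | 0 | 0 | 158 | 59 | 0 | 0 | 1 | 23000 |
| 3 | 52 | 1 | 1 | 0 | 1 | 183 | 93 | 0 | 0 | 2 | 28000 |
| 4 | 38 | 0 | 0 | 0 | 1 | 166 | 88 | 0 | 0 | 1 | 23000 |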
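
The first column in the preview, `Unnamed: 0`, is the name pandas gives a column whose header cell is empty. Here it is most likely a row index that was written out when the CSV was saved. It is not one of the documented attributes, so it should not reach a model. Below is a minimal sketch, assuming that interpretation, that drops any `Unnamed:` columns at load time. The helper name `load_insurance` is hypothetical. Passing `index_col=0` to `pd.read_csv` is an alternative that reuses the column as the DataFrame index instead of discarding it.

```python
import io
import pandas as pd

def load_insurance(path_or_buffer) -> pd.DataFrame:
    """Load the insurance CSV, dropping leftover unnamed index columns if present."""
    df = pd.read_csv(path_or_buffer)
    # pandas labels columns with an empty header cell "Unnamed: <n>"; these typically
    # come from a DataFrame index saved by to_csv() without index=False
    unnamed = [c for c in df.columns if str(c).startswith("Unnamed:")]
    return df.drop(columns=unnamed)

# Example with a tiny in-memory CSV whose first header cell is empty
sample = io.StringIO(",Age,PremiumPrice\n0,45,25000\n1,60,29000\n")
print(load_insurance(sample).columns.tolist())  # ['Age', 'PremiumPrice']
```

Before dropping the column for real, it is worth confirming that its values simply run 0, 1, 2, ... so that no genuine information is discarded.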
