# U.S. Medical Insurance Costs

The CSV file has 7 columns: [age, sex, bmi, children, smoker, region, charges]

Important notes:
- Age ranges from 18 to 64 years (numerical)
- Sex is male or female (categorical)
- BMI ranges from 15.96 to 53.13 (numerical)
- Children ranges from 0 to 5 (numerical, a discrete count that is also used as a grouping variable)
- Smoker is yes or no (categorical, binary)
- Region takes one of 4 values: southwest, southeast, northwest, northeast (categorical)
- Charges range from 1121.87 to 63770.43 dollars (numerical)
- Column indexes: {age: 0, sex: 1, bmi: 2, children: 3, smoker: 4, region: 5, charges: 6}
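The index mapping above can be avoided by reading each row by column name. This is a minimal sketch, assuming the header row uses exactly the seven column names listed above. The two sample rows are made up for illustration and stand in for the real file, and `load_records` is a hypothetical helper name.

```python
import csv
import io

# Hypothetical two-row sample standing in for the real CSV file.
SAMPLE_CSV = """age,sex,bmi,children,smoker,region,charges
30,female,25.0,2,no,northwest,5000.00
45,male,31.5,0,yes,southeast,30000.50
"""

def load_records(file_obj):
    """Read rows as dicts keyed by column name, converting numeric columns."""
    records = []
    for row in csv.DictReader(file_obj):
        records.append({
            "age": int(row["age"]),
            "sex": row["sex"],
            "bmi": float(row["bmi"]),
            "children": int(row["children"]),
            "smoker": row["smoker"],
            "region": row["region"],
            "charges": round(float(row["charges"]), 2),
        })
    return records

records = load_records(io.StringIO(SAMPLE_CSV))
print(records[0]["age"], records[0]["smoker"])
```

With the real file, the same function could be called as `load_records(open('insurance.csv', newline=''))`, assuming the file sits next to the notebook.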

US Medical Insurance Costs Project Roadmap

This document outlines the roadmap for our US Medical Insurance Costs project, detailing the goals, analysis, data, evaluation, and output. This structured approach will guide us in conducting a thorough analysis using Python and presenting our findings in a professional manner.

Project Goals
The primary objectives of this project are to:
- Make Predictions for Future Projects: Utilize the insights gained from the analysis to forecast potential trends and costs in medical insurance.
- Analyze the Data with Learned Skills: Apply the analytical techniques and skills acquired through prior learning to extract meaningful insights from the dataset.

Analysis
The analysis will focus on several key aspects of the dataset to understand the factors influencing medical insurance costs:
- Average Age of the Patients: Determine the mean age of individuals in the dataset.
- Average BMI of the Patients: Calculate the average Body Mass Index (BMI) to assess health trends.
- Average Number of Children per Patient: Analyze the family size of patients.
- Patient Distribution by Region: Count the number of patients in each region to identify regional patterns.
- Average Charge per Patient: Compute the mean insurance charge to understand overall cost trends.
- Impact of Factors on Charges: Investigate how the number of children, BMI, smoking status, and region affect insurance charges.
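Most of the items above reduce to the same operation: group the charges by some key and average each group. A minimal sketch of that pattern follows. `average_by_group` is a hypothetical helper name, and the sample values are invented, not taken from the dataset.

```python
def average_by_group(groups, values, ndigits=2):
    """Return {group: mean of the values that belong to that group}."""
    buckets = {}
    for group, value in zip(groups, values):
        # setdefault creates the empty list the first time a group is seen
        buckets.setdefault(group, []).append(value)
    return {group: round(sum(vals) / len(vals), ndigits)
            for group, vals in buckets.items()}

# Hypothetical sample values for illustration only.
smoker_sample = ["yes", "no", "no", "yes"]
charges_sample = [30000.0, 8000.0, 9000.0, 34000.0]

print(average_by_group(smoker_sample, charges_sample))
```

The same function would work for any pair of equally long lists, for example region labels and charges.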

Data
The project uses the us-medical-insurance-cost.csv file. Note that the code cells below open a file named insurance.csv, so the file is assumed to be saved under that name. The dataset has been checked for cleanliness and completeness, so every column needed for the analysis is available.
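A completeness check of this kind could look like the sketch below. It is only an illustration of one possible check, not the procedure that was actually used. `find_incomplete_rows` and the sample rows are hypothetical, and the expected header is assumed to match the seven columns listed at the top of this document.

```python
import csv
import io

EXPECTED_COLUMNS = ["age", "sex", "bmi", "children", "smoker", "region", "charges"]

# Hypothetical sample: the second data row has an empty bmi field.
SAMPLE_CSV = """age,sex,bmi,children,smoker,region,charges
30,female,25.0,2,no,northwest,5000.00
45,male,,0,yes,southeast,30000.50
"""

def find_incomplete_rows(file_obj):
    """Return the file line numbers of rows with missing, empty, or surplus fields."""
    reader = csv.DictReader(file_obj)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"Unexpected header: {reader.fieldnames}")
    bad_lines = []
    for row in reader:
        # DictReader stores surplus fields under the key None
        has_extra = None in row
        # Short rows are padded with None; empty fields arrive as ''
        has_missing = any(
            value is None or value.strip() == ""
            for key, value in row.items()
            if key is not None
        )
        if has_extra or has_missing:
            bad_lines.append(reader.line_num)
    return bad_lines

print(find_incomplete_rows(io.StringIO(SAMPLE_CSV)))
```

An empty result would mean every row has a value in all seven columns. It would not by itself confirm that the values are plausible.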

Evaluation
The evaluation phase will involve interpreting the results from the analysis. This will include:
- Summarizing findings based on the average calculations and distributions.
- Evaluating how demographic and lifestyle factors influence insurance costs.
- Formulating predictions and recommendations for future insurance cost trends.

Output
The final output will be a presentation or dashboard that communicates our findings effectively. This will include:
- Simple and Professional Design: Using a minimalistic color palette to maintain a professional appearance.
- Key Insights and Visualizations: Presenting data through clear charts and graphs to enhance understanding.
- Actionable Recommendations: Offering insights that stakeholders can utilize for decision-making.

By following this roadmap, the project aims to deliver comprehensive insights into US medical insurance costs and provide valuable predictions for future considerations.


In [None]:
# This script reads a CSV file named 'insurance.csv' and prints the maximum and minimum values of the first column (age).
# It assumes that the first column contains integer values representing ages.
# The CSV file is expected to have a header row, which will be skipped.
# The script uses the csv module to read the file and stores the ages in a list.
# It then calculates and prints the minimum and maximum ages.
import csv

ages = []

with open('insurance.csv') as file:
    reader = csv.reader(file)
    next(reader)  # Skip header
    for row in reader:
        age = int(row[0])  # The first column is 'age'
        ages.append(age)

print("Minimum age:", min(ages))
print("Maximum age:", max(ages))

Minimum age: 18
Maximum age: 64


In [None]:
# This script reads a CSV file named 'insurance.csv' and prints the minimum and maximum values of the third column (bmi).
import csv

bmis = []

with open('insurance.csv') as file:
    reader = csv.reader(file)
    next(reader)  # Skip header
    for row in reader:
        bmi = float(row[2])  # 3rd column is index 2
        bmis.append(bmi)

print("Minimum BMI:", min(bmis))
print("Maximum BMI:", max(bmis))


Minimum BMI: 15.96
Maximum BMI: 53.13


In [None]:
# This script reads a CSV file named 'insurance.csv' and prints the minimum and maximum values of the fourth and seventh columns (children and charges).
import csv

children_list = []
charges_list = []

with open('insurance.csv') as file:
    reader = csv.reader(file)
    next(reader)  # Skip header
    for row in reader:
        children = int(row[3])           # 4th column (index 3)
        charges = round(float(row[6]), 2)  # 7th column (index 6), rounded
        children_list.append(children)
        charges_list.append(charges)

print("Children - Min:", min(children_list), "Max:", max(children_list))
print("Charges - Min:", min(charges_list), "Max:", max(charges_list))



Children - Min: 0 Max: 5
Charges - Min: 1121.87 Max: 63770.43


In [3]:
# Initialize lists for each column
import csv
ages_list = []
sex_list = []
bmi_list = []
children_list = []
smoker_list = []
region_list = []
charges_list = []

# Read the CSV file and populate the lists
with open('insurance.csv') as file:
    reader = csv.reader(file)
    next(reader)  # Skip header
    for row in reader:
        ages_list.append(int(row[0]))          # Age column
        sex_list.append(row[1])               # Sex column
        bmi_list.append(float(row[2]))        # BMI column
        children_list.append(int(row[3]))     # Children column
        smoker_list.append(row[4])            # Smoker column
        region_list.append(row[5])            # Region column
        charges_list.append(round(float(row[6]), 2))  # Charges column

In [11]:
# General average age
general_average_age = sum(ages_list) / len(ages_list)
print("General Average Age:", round(general_average_age, 0), "years old")

# Average age for males
male_ages = [age for i, age in enumerate(ages_list) if sex_list[i] == 'male']
average_male_age = sum(male_ages) / len(male_ages)
print("Average Male Age:", round(average_male_age, 0), "years old")

# Average age for females
female_ages = [age for i, age in enumerate(ages_list) if sex_list[i] == 'female']
average_female_age = sum(female_ages) / len(female_ages)
print("Average Female Age:", round(average_female_age, 0), "years old")

General Average Age: 39.0 years old
Average Male Age: 39.0 years old
Average Female Age: 40.0 years old


In [12]:
# Calculate the average BMI
average_bmi = sum(bmi_list) / len(bmi_list)
print("Average BMI of the Patients:", round(average_bmi, 2))

Average BMI of the Patients: 30.66


In [16]:
# Patient distribution by region without using Counter
region_distribution = {}
for region in region_list:
    if region in region_distribution:
        region_distribution[region] += 1
    else:
        region_distribution[region] = 1

print("Patient Distribution by Region:", region_distribution)

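# Calculate the average number of children per patient.
# Use true division and round to the nearest whole number; floor division (//) would always round down.
average_children = round(sum(children_list) / len(children_list))
print("Average Number of Children per Patient:", average_children)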

Patient Distribution by Region: {'southwest': 325, 'southeast': 364, 'northwest': 325, 'northeast': 324}
Average Number of Children per Patient: 1


In [17]:
# Impact of number of children on charges
charges_by_children = {}
for children, charge in zip(children_list, charges_list):
    if children not in charges_by_children:
        charges_by_children[children] = []
    charges_by_children[children].append(charge)

average_charges_by_children = {k: round(sum(v) / len(v), 2) for k, v in charges_by_children.items()}
print("Average Charges by Number of Children:", average_charges_by_children)

# Impact of BMI on charges (categorizing BMI into underweight, normal, overweight, obese)
bmi_categories = {'underweight': [], 'normal': [], 'overweight': [], 'obese': []}
for bmi, charge in zip(bmi_list, charges_list):
    if bmi < 18.5:
        bmi_categories['underweight'].append(charge)
    elif 18.5 <= bmi < 25:
        bmi_categories['normal'].append(charge)
    elif 25 <= bmi < 30:
        bmi_categories['overweight'].append(charge)
    else:
        bmi_categories['obese'].append(charge)

average_charges_by_bmi_category = {k: round(sum(v) / len(v), 2) for k, v in bmi_categories.items()}
print("Average Charges by BMI Category:", average_charges_by_bmi_category)

# Impact of smoking status on charges
charges_by_smoker = {'yes': [], 'no': []}
for smoker, charge in zip(smoker_list, charges_list):
    charges_by_smoker[smoker].append(charge)

average_charges_by_smoker = {k: round(sum(v) / len(v), 2) for k, v in charges_by_smoker.items()}
print("Average Charges by Smoking Status:", average_charges_by_smoker)

# Impact of region on charges
charges_by_region = {}
for region, charge in zip(region_list, charges_list):
    if region not in charges_by_region:
        charges_by_region[region] = []
    charges_by_region[region].append(charge)

average_charges_by_region = {k: round(sum(v) / len(v), 2) for k, v in charges_by_region.items()}
print("Average Charges by Region:", average_charges_by_region)

# Calculate the average charge per patient
average_charge = sum(charges_list) / len(charges_list)
print("Average Charge per Patient:", round(average_charge, 2))

Average Charges by Number of Children: {0: 12365.98, 1: 12731.17, 3: 15355.32, 2: 15073.56, 5: 8786.04, 4: 13850.66}
Average Charges by BMI Category: {'underweight': 8852.2, 'normal': 10409.34, 'overweight': 10987.51, 'obese': 15552.34}
Average Charges by Smoking Status: {'yes': 32050.23, 'no': 8434.27}
Average Charges by Region: {'southwest': 12346.94, 'southeast': 14735.41, 'northwest': 12417.58, 'northeast': 13406.38}
Average Charge per Patient: 13270.42


Medical Insurance Cost Analysis Report

Objective
This report analyzes key demographic and lifestyle factors that influence medical insurance charges using the dataset us-medical-insurance-cost.csv. The goal is to identify trends and patterns in the data that can inform healthcare and insurance policy decisions.

1. Demographic and Health Profile of the Patients
Average Age
- Overall average age: 39 years
- Males: 39 years
- Females: 40 years

The dataset includes a balanced adult population with a slight age difference by gender. Since the average age is close to 40, age-related health issues may begin to impact medical costs.

Average BMI
Overall average BMI: 30.66

This value is in the "obese" category according to WHO standards (BMI ≥ 30), suggesting a significant proportion of the population may be at risk of obesity-related health conditions, which are known to increase medical costs.
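A mean above 30 does not by itself say how many patients fall in the obese range, so the share of patients per category is worth checking directly. The sketch below uses the same BMI thresholds as the code cell above. `category_shares` is a hypothetical helper, and the sample list is invented for illustration.

```python
def bmi_category(bmi):
    """Map a BMI value to the category used in this analysis."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    return "obese"

def category_shares(bmis):
    """Return the percentage of values that fall in each BMI category."""
    counts = {"underweight": 0, "normal": 0, "overweight": 0, "obese": 0}
    for bmi in bmis:
        counts[bmi_category(bmi)] += 1
    total = len(bmis)
    return {name: round(100 * count / total, 1) for name, count in counts.items()}

# Hypothetical sample values, not taken from the dataset.
sample_bmis = [17.0, 22.5, 27.0, 31.0, 35.5]
print(category_shares(sample_bmis))
```

In the notebook, passing `bmi_list` instead of the sample would give the actual shares.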

Average Number of Children
Average children per patient: 1

Patients tend to have small family sizes, which may slightly influence insurance premiums due to dependents but is not the dominant cost driver.

2. Regional and Cost Distribution
Patient Distribution by Region

- Southwest: 325
- Southeast: 364
- Northwest: 325
- Northeast: 324

The dataset is regionally well-balanced, allowing for fair comparisons of insurance costs by region.

Average Charge per Patient
Overall average charge: $13,270.42

This is the baseline cost, influenced by age, lifestyle, and health indicators.

3. Factors Influencing Insurance Charges

Impact of Number of Children

| Children | Average Charge ($) |
|---|---|
| 0 | 12,365.98 |
| 1 | 12,731.17 |
| 2 | 15,073.56 |
| 3 | 15,355.32 |
| 4 | 13,850.66 |
| 5 | 8,786.04 |

Average charges rise with the number of children up to 3, then fall for patients with 4 and 5 children. The low average at 5 children may reflect a small group or outliers rather than a real effect. The dataset alone cannot confirm an explanation such as subsidies for large families.
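One way to judge how much weight each average deserves is to count how many patients stand behind it. The sketch below shows the idea. `group_sizes` is a hypothetical helper, and the sample list is invented for illustration.

```python
def group_sizes(keys):
    """Count how many records fall under each key, sorted by key."""
    sizes = {}
    for key in keys:
        sizes[key] = sizes.get(key, 0) + 1
    return dict(sorted(sizes.items()))

# Hypothetical sample values, not taken from the dataset.
sample_children = [0, 0, 1, 2, 0, 5, 1]
print(group_sizes(sample_children))
```

In the notebook, `group_sizes(children_list)` would show whether the 4-child and 5-child groups are small enough for a few unusual patients to move the average.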

Impact of BMI Category

| BMI Category | Average Charge ($) |
|---|---|
| Underweight | 8,852.20 |
| Normal | 10,409.34 |
| Overweight | 10,987.51 |
| Obese | 15,552.34 |

There is a clear trend of increasing insurance costs with higher BMI. Obese individuals incur significantly higher medical expenses, highlighting the cost implications of obesity-related health risks.

Impact of Smoking Status
| Smoking Status | Average Charge ($) |
|---|---|
| Yes | 32,050.23 |
| No | 8,434.27 |

On average, smokers are charged about 3.8 times as much as non-smokers ($32,050.23 vs. $8,434.27), the largest gap of any factor examined here. This points to a strong association between smoking and higher charges.
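The size of the gap can be checked with a short calculation from the two averages reported above. The variable names are illustrative.

```python
# Averages taken from the smoking-status results reported above.
smoker_average = 32050.23
non_smoker_average = 8434.27

ratio = smoker_average / non_smoker_average
difference = smoker_average - non_smoker_average

print("Smokers pay", round(ratio, 2), "times the non-smoker average")
print("Absolute difference:", round(difference, 2), "dollars")
```

The ratio comes out at roughly 3.8, and the absolute difference is about 23,616 dollars per patient.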

Impact of Region
| Region | Average Charge ($) |
|---|---|
| Southwest | 12,346.94 |
| Southeast | 14,735.41 |
| Northwest | 12,417.58 |
| Northeast | 13,406.38 |

The Southeast has the highest average charges, possibly due to regional health disparities, lifestyle differences, or access to healthcare services. The Southwest and Northwest show lower average charges.
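To put the regional differences in proportion, each regional average can be expressed as a percentage above or below the overall average charge. The sketch below uses the figures reported in this document. `percent_vs_overall` is a hypothetical helper name.

```python
# Figures copied from the results reported in this document.
overall_average = 13270.42
average_by_region = {
    "southwest": 12346.94,
    "southeast": 14735.41,
    "northwest": 12417.58,
    "northeast": 13406.38,
}

def percent_vs_overall(group_averages, overall):
    """Return each group's average as a percent difference from the overall average."""
    return {name: round(100 * (avg - overall) / overall, 1)
            for name, avg in group_averages.items()}

region_gap = percent_vs_overall(average_by_region, overall_average)

# Print regions from the highest to the lowest relative charge.
for region, gap in sorted(region_gap.items(), key=lambda item: item[1], reverse=True):
    print(f"{region}: {gap:+.1f}%")
```

On these figures the Southeast sits about 11% above the overall average, while the other three regions stay within roughly 7% of it.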

4. Key Insights and Recommendations

Findings
- Lifestyle Choices Matter: Smoking and obesity are the most influential factors in raising insurance charges.
- Family Size Has a Moderate Effect: More children generally mean higher costs, but the effect plateaus or drops after three children.
- Regional Differences Are Noticeable: The Southeast shows higher insurance costs, which may require targeted regional interventions.
- Demographics Are Stable: Age and gender distributions are fairly consistent and do not drastically skew the data.

Recommendations
- Promote Preventive Health Programs: Encourage smoking cessation and weight management programs to reduce long-term insurance costs.
- Tailor Insurance Plans by Risk Factors: Consider personalized premiums based on lifestyle-related risk assessments.
- Investigate Regional Healthcare Access: Conduct deeper regional analyses to understand why Southeast costs are higher and address disparities.
- Policy Incentives for Healthy Living: Implement premium discounts or benefits for maintaining healthy BMI and non-smoking status.

Future Predictions
This dataset is a single snapshot with no time dimension, so it cannot by itself show whether insurance costs are rising. It does show that smokers and individuals with high BMI carry the highest average charges, so these groups are the most likely to drive future claims.
Insurers and policymakers can prepare by investing in public health campaigns and offering incentives for healthier lifestyles.