# U.S. Medical Insurance Costs

## Goals / Scope
- Average age of patients in dataset
- Average age of patients with at least one child
- Where most patients are from
- Smoker vs non-smoker cost difference
- Number of patients and average insurance cost by BMI categories

### What is the average age of patients in the dataset?

In [119]:
# importing the dataset
import csv

with open('insurance.csv', newline='') as i_data:
    insurance_csv_data = csv.DictReader(i_data)

    # no names are included, so each patient gets a numeric id
    insurance_data = {patient_id: row for patient_id, row in enumerate(insurance_csv_data)}

# calculating the average age of patients

def average_age_calc(insurance_data):
    total_age = 0
    
    for patient in insurance_data.values():
        total_age += int(patient['age'])
    
    # print(total_age) # 52,459 years
    
    average_age = total_age / len(insurance_data)

    return average_age

print(average_age_calc(insurance_data)) # average age rounded to 39.21 years old
        

39.20702541106129


#### Conclusion
The average age of patients in the dataset is 39.21 years old.

### What is the average age of patients with at least one child?

In [109]:
def average_age_of_patients_with_child(insurance_data):

    patients_with_child_counter = 0

    total_age_with_child = 0

    for patient in insurance_data.values():
        if int(patient['children']) > 0:
            total_age_with_child += int(patient['age'])
            patients_with_child_counter += 1
    
    average_age_of_patients_with_child = total_age_with_child / patients_with_child_counter

    return average_age_of_patients_with_child

print(average_age_of_patients_with_child(insurance_data)) # average age of patients with one or more children rounded to 39.78 years old


39.78010471204188


#### Conclusion
The average age of patients in the dataset with one or more children is 39.78 years old.
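Both averages above follow the same pattern: filter the rows, then take the mean of the `age` column. A minimal sketch of that pattern using the standard library's `statistics.mean` is shown below. The `average_age` helper and the `sample_rows` data are hypothetical and only stand in for the rows read from `insurance.csv`.

```python
from statistics import mean

# Hypothetical sample rows shaped like csv.DictReader output (all values are strings).
sample_rows = [
    {'age': '19', 'children': '0'},
    {'age': '33', 'children': '2'},
    {'age': '45', 'children': '1'},
    {'age': '60', 'children': '0'},
]

def average_age(rows, keep=lambda row: True):
    """Mean age over the rows for which keep(row) is true."""
    return mean(int(row['age']) for row in rows if keep(row))

print(average_age(sample_rows))                                        # all patients
print(average_age(sample_rows, lambda row: int(row['children']) > 0))  # patients with children
```

Passing the filter as an argument avoids writing a separate counter and total for each question. Note that `statistics.mean` raises `StatisticsError` if no row passes the filter.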

### Where are most patients from?

In [105]:
from collections import defaultdict

def patient_distribution_calc(insurance_data):

    number_of_patients_by_region = defaultdict(int)

    for patient in insurance_data.values():
        origin = patient.get('region')
        if origin:
            number_of_patients_by_region[origin] += 1

    return number_of_patients_by_region

print(patient_distribution_calc(insurance_data))
    

defaultdict(<class 'int'>, {'southwest': 325, 'southeast': 364, 'northwest': 325, 'northeast': 324})


#### Conclusion
Based on my calculations, the distribution of patients is fairly even at these numbers:
- Southwest: 325
- Southeast: 364
- Northwest: 325
- Northeast: 324

The Southeast has the largest share of patients (364 of 1,338, about 27%), though no single region accounts for a majority.
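To put the regional counts in proportion, here is a minimal sketch that converts them to percentage shares with `collections.Counter`. The counts are copied from the output above; `region_shares` is a hypothetical helper name.

```python
from collections import Counter

# Region counts copied from the recorded output above.
region_counts = Counter({'southwest': 325, 'southeast': 364, 'northwest': 325, 'northeast': 324})

def region_shares(counts):
    """Return each region's share of all patients, as a percentage."""
    total = sum(counts.values())
    return {region: count / total * 100 for region, count in counts.items()}

shares = region_shares(region_counts)
top_region, top_count = region_counts.most_common(1)[0]
print(top_region, top_count, round(shares[top_region], 1))  # southeast 364 27.2
```

A share of roughly 27% is the largest of the four regions, but it is well short of half of the dataset.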

### What is the cost difference between smokers and non-smokers?

In [140]:
def smoker_vs_non_smoker_average_cost_calc(insurance_data):
    smoker_total_costs = 0
    total_smokers = 0
    non_smoker_total_costs = 0
    total_non_smokers = 0
    
    for patient in insurance_data.values():
        if patient['smoker'] == 'yes':
            total_smokers += 1
            smoker_total_costs += float(patient['charges'])
        else:
            total_non_smokers += 1
            non_smoker_total_costs += float(patient['charges'])

    average_cost_for_smokers = smoker_total_costs / total_smokers

    average_cost_for_non_smokers = non_smoker_total_costs / total_non_smokers

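    # symmetric percent difference: |a - b| divided by the mean of a and b
    cost_gap = abs(average_cost_for_smokers - average_cost_for_non_smokers)
    mean_of_averages = (average_cost_for_smokers + average_cost_for_non_smokers) / 2
    percent_diff = cost_gap / mean_of_averages * 100

    return percent_diff

print(smoker_vs_non_smoker_average_cost_calc(insurance_data)) # 116.67% symmetric percent difference, not a percent increase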
            

116.66669198433817


#### Conclusion
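The symmetric percent difference between the two averages is about 116.67%. Because that figure is measured against the mean of the two averages, it is not the same as a percent increase: it implies that smokers are charged roughly 3.8 times as much as non-smokers on average, or about 280% more. I would recommend quitting smoking if possible.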
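The formula used in the cell above divides the gap by the mean of the two averages, which gives a different number than dividing by the non-smoker baseline. The sketch below contrasts the two measures. The function names are hypothetical, and the two averages are illustrative round numbers chosen to reproduce the same ratio, not values taken from `insurance.csv`.

```python
def symmetric_percent_difference(a, b):
    """|a - b| relative to the mean of a and b, as a percentage."""
    return abs(a - b) / ((a + b) / 2) * 100

def percent_increase(baseline, value):
    """How much larger value is than baseline, as a percentage of baseline."""
    return (value - baseline) / baseline * 100

# Illustrative round numbers only, NOT values from insurance.csv.
non_smoker_avg = 10_000.0
smoker_avg = 38_000.0

print(round(symmetric_percent_difference(smoker_avg, non_smoker_avg), 2))  # 116.67
print(round(percent_increase(non_smoker_avg, smoker_avg), 2))              # 280.0
```

The symmetric measure can never exceed 200%, so it understates large gaps when read as "percent more". For a statement like "smokers pay X% more", the percent increase over the non-smoker average is the figure to report.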

### What is the number of patients and the average cost in each BMI category?

In [189]:
def categorize_and_average_by_bmi(insurance_data):
    # bmi_categories = {
    #     'Underweight': '< 18.5',
    #     'Normal weight': '18.5 to 24.9',
    #     'Overweight': '25 to 29.9',
    #     'Obese': '>= 30'
    # }

    underweight_total = 0
    normal_weight_total = 0
    overweight_total = 0
    obese_total = 0

    # initialized dictionary
    patients_by_bmi_category = {
        'Underweight': {'Num_of_patients': 0, 'Average_cost': 0}, 
        'Normal Weight': {'Num_of_patients': 0, 'Average_cost': 0}, 
        'Overweight': {'Num_of_patients': 0, 'Average_cost': 0}, 
        'Obese': {'Num_of_patients': 0, 'Average_cost': 0}
    }

    for patient in insurance_data.values():
        bmi = float(patient['bmi'])
        cost = float(patient['charges'])

        if bmi < 18.5:
            category = 'Underweight'
            underweight_total += cost
            
        elif bmi >= 18.5 and bmi <= 24.9:
            category = 'Normal Weight'
            normal_weight_total += cost
            
        elif bmi >= 25 and bmi <= 29.9:
            category = 'Overweight'
            overweight_total += cost
        
        else:
            category = 'Obese'
            obese_total += cost

        patients_by_bmi_category[category]['Num_of_patients'] += 1
        
    # Average calculations for each category
    underweight_average = underweight_total / float(patients_by_bmi_category['Underweight']['Num_of_patients'])
    patients_by_bmi_category['Underweight']['Average_cost'] += underweight_average

    normal_weight_average = normal_weight_total / float(patients_by_bmi_category['Normal Weight']['Num_of_patients'])
    patients_by_bmi_category['Normal Weight']['Average_cost'] += normal_weight_average

    overweight_average = overweight_total / float(patients_by_bmi_category['Overweight']['Num_of_patients'])
    patients_by_bmi_category['Overweight']['Average_cost'] += overweight_average

    obese_average = obese_total / float(patients_by_bmi_category['Obese']['Num_of_patients'])
    patients_by_bmi_category['Obese']['Average_cost'] += obese_average
    

    return patients_by_bmi_category
    

comparison_dict = categorize_and_average_by_bmi(insurance_data)
print(comparison_dict)

def calc_percent_diffs(comparison_dict):
    # symmetric percent difference: |a - b| divided by the mean of a and b
    normal_cost = comparison_dict['Normal Weight']['Average_cost']
    obese_cost = comparison_dict['Obese']['Average_cost']
    return abs(normal_cost - obese_cost) / ((normal_cost + obese_cost) / 2) * 100

print(calc_percent_diffs(comparison_dict)) # 39.44

{'Underweight': {'Num_of_patients': 20, 'Average_cost': 8852.200585000002}, 'Normal Weight': {'Num_of_patients': 222, 'Average_cost': 10379.499732162163}, 'Overweight': {'Num_of_patients': 377, 'Average_cost': 10993.994037132627}, 'Obese': {'Num_of_patients': 719, 'Average_cost': 15479.549772628647}}
39.444992280335875


#### Conclusion
Body Mass Index is widely criticized as an imprecise measure of individual health, and yet medical insurance pricing still appears to take it into account. Considering how many patients in this dataset qualify as 'obese' by BMI (719 of 1,338) and are charged more on average, I would argue that it should not be included in insurance cost calculations.

Here are my calculations rounded:

| BMI Category | Number of Patients | Average Cost of Medical Insurance ($) |
| :-: | :-: | :-: |
| Underweight | 20 | 8852.20 |
| Normal Weight | 222 | 10379.50 |
| Overweight | 377 | 10993.99 |
| Obese | 719 | 15479.55 |

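Assuming that being underweight by BMI standards is unhealthy and should not be aspired to, the relevant comparison is between the 'normal' and 'obese' categories. The symmetric percent difference in average cost between them is 39.44%, which corresponds to patients in the 'obese' category being charged about 49% more than patients in the 'normal' category.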
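One caveat on the category boundaries: the cell above uses inclusive upper bounds (`bmi <= 24.9`, `bmi <= 29.9`), so a value such as 24.95 or 29.95 matches none of the ranges and falls through to the `else` branch, which labels it 'Obese'. If the dataset contains such values, the counts and averages above would shift slightly. A minimal sketch of gap-free boundaries, assuming the standard 18.5 / 25 / 30 cutoffs listed in the commented-out dictionary, is shown below. `bmi_category` is a hypothetical helper name.

```python
def bmi_category(bmi):
    """Map a BMI value to a category using half-open ranges with no gaps."""
    if bmi < 18.5:
        return 'Underweight'
    if bmi < 25:
        return 'Normal Weight'
    if bmi < 30:
        return 'Overweight'
    return 'Obese'

# Values chosen to sit near each boundary.
for value in (18.4, 24.95, 29.95, 30.0):
    print(value, bmi_category(value))
```

Because each check only tests the upper bound, every possible BMI lands in exactly one category.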

## Final Thoughts

There are many factors that go into calculating medical insurance costs, and some of them are outdated. Of the factors examined here, smoking status appears to be the most influential: the symmetric percent difference in average charges between smokers and non-smokers (116.67%) is far larger than the difference between the 'normal' and 'obese' BMI categories (39.44%).