# U.S. Medical Insurance Costs

## Codecademy's Prompts for Analysis:
- Find out the average age of the patients in the dataset.
- Analyze where a majority of the individuals are from.
- Look at the different costs between smokers vs. non-smokers.
- Figure out what the average age is for someone who has at least one child in this dataset.

## My Analysis
- Gender: What are the average insurance costs by gender?
- Regional Costs: What are the average insurance costs by region? Is there a region with notably higher or lower costs?
- BMI vs. Costs: Is there a correlation between BMI and insurance costs? How does BMI affect insurance costs?
- Age vs. Costs: How do insurance costs vary with age? Is there a noticeable trend?

In [2]:
# Import the CSV and store each row in a dictionary keyed by a 1-based record ID.

import csv

insurance_records = {}
with open('insurance.csv', newline='') as insurance_csv:
    insurance_data = csv.DictReader(insurance_csv)
    # enumerate supplies the counter, so the built-in id() is not shadowed
    for record_id, row in enumerate(insurance_data, start=1):
        insurance_records[record_id] = row
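
`csv.DictReader` returns every field as a string, which is why the cells below wrap values in `int()` and `float()`. A minimal sketch of converting a row once, up front: `parse_record` is a hypothetical helper, and the two sample rows are illustrative stand-ins read from an in-memory string rather than from `insurance.csv`.

```python
import csv
import io

# Illustrative rows using the dataset's column names (not read from insurance.csv).
SAMPLE_CSV = """age,sex,bmi,children,smoker,region,charges
19,female,27.9,0,yes,southwest,16884.924
18,male,33.77,1,no,southeast,1725.5523
"""

def parse_record(row):
    """Return a copy of a DictReader row with numeric fields converted (hypothetical helper)."""
    return {
        'age': int(row['age']),
        'sex': row['sex'],
        'bmi': float(row['bmi']),
        'children': int(row['children']),
        'smoker': row['smoker'] == 'yes',
        'region': row['region'],
        'charges': float(row['charges']),
    }

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
typed_records = {record_id: parse_record(row) for record_id, row in enumerate(reader, start=1)}
print(typed_records[1]['age'] + 1)  # arithmetic now works without calling int()
```

The rest of this notebook keeps the string-valued `insurance_records`, so the sketch stores its result under a separate name.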

In [69]:
# Find out the average age of the patients in the dataset.

def average_age(insurance_records):
    # Sum every patient's age, then divide by the number of records.
    total_age = 0
    for individual in insurance_records:
        age = insurance_records[individual].get('age')
        total_age += int(age)
    mean_age = total_age / len(insurance_records)
    print("The average age of a patient in this data set is " + str(round(mean_age, 0)))

average_age(insurance_records)

The average age of a patient in this data set is 39.0


In [71]:
# Analyze where a majority of the individuals are from.

regions = {'southwest': 0, 'southeast': 0, 'northwest': 0, 'northeast':0}
total_patients = len(insurance_records)

for individual in insurance_records:
    region = insurance_records[individual].get('region')
    regions[region] += 1

for x in regions:
    print("The number of patients in the " + x + " is " + str(regions[x]) + ", which is " + str(round(regions[x]/total_patients * 100, 2)) 
          + " percent of the sample.")

The number of patients in the southwest is 325, which is 24.29 percent of the sample.
The number of patients in the southeast is 364, which is 27.2 percent of the sample.
The number of patients in the northwest is 325, which is 24.29 percent of the sample.
The number of patients in the northeast is 324, which is 24.22 percent of the sample.
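
No region reaches a majority (more than 50 percent) of the sample; the southeast is simply the largest group. A short sketch that checks this from the counts printed above. The names `plurality_region` and `has_majority` are introduced here for illustration.

```python
# Region counts copied from the output above.
regions = {'southwest': 325, 'southeast': 364, 'northwest': 325, 'northeast': 324}
total_patients = sum(regions.values())

# The plurality region is the one with the highest count.
plurality_region = max(regions, key=regions.get)
# A true majority would need more than half of all patients.
has_majority = regions[plurality_region] > total_patients / 2

print(plurality_region, has_majority)
```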


In [73]:
# Look at the different costs between smokers vs. non-smokers.

smokers = 0
non_smokers = 0
smoker_costs = 0
non_smoker_costs = 0

for individual in insurance_records:
    status = insurance_records[individual].get('smoker')
    insurance_cost = insurance_records[individual].get('charges')
    if status == 'yes':
        smokers += 1
        smoker_costs += float(insurance_cost)
    else:
        non_smokers += 1
        non_smoker_costs += float(insurance_cost)
print("There are " + str(smokers) + " smokers in the dataset. The total cost for this group is: " + str(round(smoker_costs, 2))
      + " dollars. The average cost is: " + str(round(smoker_costs/smokers, 2)) + " dollars.")
print("There are " + str(non_smokers) + " non-smokers in the dataset. The total cost for this group is: " + str(round(non_smoker_costs, 2))
      + " dollars. The average cost is: " + str(round(non_smoker_costs/non_smokers, 2)) + " dollars.")

There are 274 smokers in the dataset. The total cost for this group is: 8781763.52 dollars. The average cost is: 32050.23 dollars.
There are 1064 non-smokers in the dataset. The total cost for this group is: 8974061.47 dollars. The average cost is: 8434.27 dollars.
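
The two averages are easier to compare as a ratio and a difference. A small worked calculation using the averages printed above; the variable names are chosen here for illustration.

```python
# Average costs copied from the output above, in dollars.
smoker_avg = 32050.23
non_smoker_avg = 8434.27

ratio = round(smoker_avg / non_smoker_avg, 2)
difference = round(smoker_avg - non_smoker_avg, 2)

print("Smokers pay about " + str(ratio) + " times as much on average, a difference of "
      + str(difference) + " dollars.")
```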


In [8]:
# Figure out what the average age is for someone who has at least one child in this dataset.

total_ages = 0
parents = 0
for individual in insurance_records:
    age_of_individual = int(insurance_records[individual].get('age'))
    number_of_children = int(insurance_records[individual].get('children'))
    
    # "At least one child" includes patients with exactly one child.
    if number_of_children >= 1:
        total_ages += age_of_individual
        parents += 1

print("The average age of a patient with at least one child is: " + str(round(total_ages/parents, 2)))

The average age of a patient with at least one child is: 39.78
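
The boundary in the filter matters: `children >= 1` counts parents with exactly one child, whereas `children > 1` would skip them. A minimal sketch on made-up records (not rows from `insurance.csv`); `average_parent_age` and its `min_children` parameter are hypothetical.

```python
from statistics import mean

# Made-up records for illustration only (not rows from insurance.csv).
sample_records = {
    1: {'age': '30', 'children': '0'},
    2: {'age': '40', 'children': '1'},
    3: {'age': '50', 'children': '3'},
}

def average_parent_age(records, min_children=1):
    """Mean age of patients with at least `min_children` children (hypothetical helper)."""
    ages = [int(r['age']) for r in records.values() if int(r['children']) >= min_children]
    return round(mean(ages), 2)

print(average_parent_age(sample_records))                  # one or more children
print(average_parent_age(sample_records, min_children=2))  # two or more children
```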


In [27]:
# Gender: What are the average insurance costs by gender?

gender_costs = {'female': {'sum': 0, 'count': 0}, 'male': {'sum': 0, 'count': 0}}

for individual in insurance_records:
    sex = insurance_records[individual].get('sex')
    cost = float(insurance_records[individual].get('charges'))
    gender_costs[sex]['sum'] += cost
    gender_costs[sex]['count'] += 1
female_avg_cost = round(gender_costs['female']['sum']/gender_costs['female']['count'], 2)
male_avg_cost = round(gender_costs['male']['sum']/gender_costs['male']['count'], 2)

print("The average cost of insurance for a female is " + str(female_avg_cost) + " dollars and the average cost for a male is " 
      + str(male_avg_cost) + " dollars")

The average cost of insurance for a female is 12569.58 dollars and the average cost for a male is 13956.75 dollars


In [45]:
# Regional Costs: What are the average insurance costs by region? Is there a region with notably higher or lower costs?

region_costs = {'southwest': 0, 'southeast': 0, 'northwest': 0, 'northeast':0}

for individual in insurance_records:
    region = insurance_records[individual].get('region')
    cost = float(insurance_records[individual].get('charges'))
    region_costs[region] += round(cost, 2)
   
region_costs = {region: round(total, 2) for region, total in region_costs.items()}
print("The total cost of insurance for each region is " + str(region_costs))

average_region_costs = {'southwest': 0, 'southeast': 0, 'northwest': 0, 'northeast':0}

for region in region_costs:
    average_region_costs[region] = round(region_costs[region]/regions[region], 2)
print("The average cost of insurance for each region is " + str(average_region_costs))

The total cost of insurance for each region is {'southwest': 4012754.69, 'southeast': 5363689.78, 'northwest': 4035711.93, 'northeast': 4343668.61}
The average cost of insurance for each region is {'southwest': 12346.94, 'southeast': 14735.41, 'northwest': 12417.58, 'northeast': 13406.38}
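
The gender and region cells repeat the same sum-and-count pattern. One way to factor it out, sketched with a hypothetical `average_by` helper and made-up records (not rows from `insurance.csv`):

```python
from collections import defaultdict

def average_by(records, key, value='charges'):
    """Average a numeric column per category (hypothetical helper, not part of the notebook)."""
    totals = defaultdict(lambda: {'sum': 0.0, 'count': 0})
    for record in records.values():
        totals[record[key]]['sum'] += float(record[value])
        totals[record[key]]['count'] += 1
    return {group: round(t['sum'] / t['count'], 2) for group, t in totals.items()}

# Made-up records for illustration only.
sample_records = {
    1: {'sex': 'female', 'region': 'southwest', 'charges': '100.0'},
    2: {'sex': 'male', 'region': 'southeast', 'charges': '300.0'},
    3: {'sex': 'female', 'region': 'southeast', 'charges': '200.0'},
}

print(average_by(sample_records, 'sex'))
print(average_by(sample_records, 'region'))
```

Because the helper discovers categories as it reads, it does not need a dictionary pre-filled with every region or gender.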


In [8]:
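# BMI vs. Costs: Is there a correlation between BMI and insurance costs? How does BMI affect insurance costs?

# Find the smallest and largest BMI in the dataset.
min_bmi = float("inf")
max_bmi = 0
for individual in insurance_records:
    bmi = float(insurance_records[individual].get('bmi'))
    if bmi < min_bmi:
        min_bmi = bmi
    # Check both bounds independently so the first record can set the maximum as well as the minimum.
    if bmi > max_bmi:
        max_bmi = bmi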

# Bin edges; each bin covers lower <= bmi < upper and lines up with the labels below.
bmi_bins = [0, 18.5, 24.9, 39.9, float("inf")]
bmi_costs = {
    "0-18.5": {'sum': 0, 'count': 0},
    "18.5-24.9": {'sum': 0, 'count': 0},
    "24.9-39.9": {'sum': 0, 'count': 0},
    "39.9+": {'sum': 0, 'count': 0}
}
bmi_labels = list(bmi_costs.keys())  # same order as the bins

for individual in insurance_records:
    bmi = float(insurance_records[individual].get('bmi'))
    cost = float(insurance_records[individual].get('charges'))
    for i in range(len(bmi_bins)-1):
        if bmi_bins[i] <= bmi < bmi_bins[i + 1]:
            bmi_costs[bmi_labels[i]]["sum"] += cost
            bmi_costs[bmi_labels[i]]["count"] += 1
            break  # each BMI falls in exactly one bin
bmi_costs = {label: {'sum': round(data['sum'],2), 'count': data['count']} for label, data in bmi_costs.items()}

for bmi in bmi_costs:
    print("The average cost of insurance for patients in the bmi range " + bmi + " is " 
          + str(round(bmi_costs[bmi]['sum']/bmi_costs[bmi]['count'], 2)))

The average cost of insurance for patients in the bmi range 0-18.5 is 8852.2
The average cost of insurance for patients in the bmi range 18.5-24.9 is 10379.5
The average cost of insurance for patients in the bmi range 24.9-39.9 is 13648.97
The average cost of insurance for patients in the bmi range 39.9+ is 17002.78
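
The binned averages suggest that costs rise with BMI, but they do not measure how strong the relationship is. A Pearson correlation coefficient would address the correlation question directly. A minimal sketch, assuming records shaped like the notebook's: `pearson_correlation` is a hypothetical helper and the sample values are made up, so no coefficient for the real dataset is claimed here.

```python
from math import sqrt

def pearson_correlation(xs, ys):
    """Pearson r for two equal-length sequences of numbers (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Made-up records for illustration only (not rows from insurance.csv).
sample_records = {
    1: {'bmi': '20.0', 'charges': '1000.0'},
    2: {'bmi': '25.0', 'charges': '2000.0'},
    3: {'bmi': '30.0', 'charges': '3000.0'},
}
bmis = [float(r['bmi']) for r in sample_records.values()]
costs = [float(r['charges']) for r in sample_records.values()]
print(round(pearson_correlation(bmis, costs), 2))  # the sample is perfectly linear
```

With the notebook's data, the same call would take the `bmi` and `charges` values from `insurance_records`. A result near 1 or -1 indicates a strong linear relationship, and a result near 0 indicates a weak one.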


In [12]:
# Age vs. Costs: How do insurance costs vary with age? Is there a noticeable trend?

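# Find the youngest and oldest ages in the dataset.
min_age = float("inf")
max_age = 0
for individual in insurance_records:
    age = int(insurance_records[individual].get('age'))
    if age < min_age:
        min_age = age
    # Check both bounds independently so the first record can set the maximum as well as the minimum.
    if age > max_age:
        max_age = age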

age_ranges = {"18-24": {'sum': 0, 'count': 0},
              "25-34": {'sum': 0, 'count': 0},
              "35-44": {'sum': 0, 'count': 0},
              "45-54": {'sum': 0, 'count': 0},
              "55-64": {'sum': 0, 'count': 0},
             }
age_groups = [
    (18, 24, "18-24"),
    (25, 34, "25-34"),
    (35, 44, "35-44"),
    (45, 54, "45-54"),
    (55, 64, "55-64")
]

for individual in insurance_records:
    age = int(insurance_records[individual].get('age'))
    cost = float(insurance_records[individual].get('charges'))
    for start, end, group in age_groups:
        if start <= age <= end:
            age_ranges[group]['sum'] += cost
            age_ranges[group]['count'] += 1

for group in age_ranges:
    print("The average cost of insurance for patients aged " + group + " is: " + str(round(age_ranges[group]['sum']/age_ranges[group]['count'],2)))

The average cost of insurance for patients aged 18-24 is: 9011.34
The average cost of insurance for patients aged 25-34 is: 10352.39
The average cost of insurance for patients aged 35-44 is: 13134.17
The average cost of insurance for patients aged 45-54 is: 15853.93
The average cost of insurance for patients aged 55-64 is: 18513.28
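
The averages rise in every bracket. A short sketch that computes the step from each bracket to the next, using the averages printed above; the variable names are introduced here for illustration.

```python
# Average costs per age bracket, copied from the output above (dollars).
age_averages = {
    "18-24": 9011.34,
    "25-34": 10352.39,
    "35-44": 13134.17,
    "45-54": 15853.93,
    "55-64": 18513.28,
}

groups = list(age_averages)
# Change in average cost from each bracket to the next.
increases = {
    groups[i] + " to " + groups[i + 1]: round(age_averages[groups[i + 1]] - age_averages[groups[i]], 2)
    for i in range(len(groups) - 1)
}
print(increases)
# True when every bracket costs more than the one before it.
print(all(step > 0 for step in increases.values()))
```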
