# U.S. Medical Insurance Costs

Some goals of this project include:
* Investigate the age breakdown of the individuals in the data
* Look at how the individuals are distributed geographically across the U.S.
* Find the average cost for smokers vs non-smokers
* Find the average cost for people with 0, 1, 2, or 3+ children


First, I will convert the CSV file to a dictionary whose keys are row numbers (serving as an ID for each individual) and whose values are dictionaries mapping each column name to that row's value.

In [34]:
import csv

filename = 'insurance.csv'
fields = []
rows = []

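# newline='' is the setting the csv module documentation recommends for file objects.
with open(filename, 'r', newline='') as csvfile:
    csvreader = csv.reader(csvfile)
    fields = next(csvreader)  # the first row holds the column names
    for row in csvreader:
        rows.append(row)

# Build {row_number: {column_name: value}}, casting the numeric columns to float.
insurance_dict = {}
for i, info in enumerate(rows):
    ind_dict = {}
    for j, field in enumerate(fields):
        if field in ['age', 'bmi', 'children', 'charges']:
            info[j] = float(info[j])
        ind_dict[field] = info[j]
    insurance_dict[i] = ind_dict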

print(insurance_dict)

{0: {'age': 19.0, 'sex': 'female', 'bmi': 27.9, 'children': 0.0, 'smoker': 'yes', 'region': 'southwest', 'charges': 16884.924}, 1: {'age': 18.0, 'sex': 'male', 'bmi': 33.77, 'children': 1.0, 'smoker': 'no', 'region': 'southeast', 'charges': 1725.5523}, 2: {'age': 28.0, 'sex': 'male', 'bmi': 33.0, 'children': 3.0, 'smoker': 'no', 'region': 'southeast', 'charges': 4449.462}, 3: {'age': 33.0, 'sex': 'male', 'bmi': 22.705, 'children': 0.0, 'smoker': 'no', 'region': 'northwest', 'charges': 21984.47061}, 4: {'age': 32.0, 'sex': 'male', 'bmi': 28.88, 'children': 0.0, 'smoker': 'no', 'region': 'northwest', 'charges': 3866.8552}, 5: {'age': 31.0, 'sex': 'female', 'bmi': 25.74, 'children': 0.0, 'smoker': 'no', 'region': 'southeast', 'charges': 3756.6216}, 6: {'age': 46.0, 'sex': 'female', 'bmi': 33.44, 'children': 1.0, 'smoker': 'no', 'region': 'southeast', 'charges': 8240.5896}, 7: {'age': 37.0, 'sex': 'female', 'bmi': 27.74, 'children': 3.0, 'smoker': 'no', 'region': 'northwest', 'charges': 72
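The same row-number-to-record structure can also be built with `csv.DictReader`, which pairs each value with its column name automatically. The following is only a minimal sketch of that alternative, not the approach used in this notebook: `load_records`, `NUMERIC_FIELDS`, and `sample_csv` are hypothetical names, and the two sample rows are reconstructed from the printed dictionary above rather than read from `insurance.csv`.

```python
import csv
import io

# Two sample rows reconstructed from the output above (an assumption about the
# raw file's exact text); the notebook itself reads insurance.csv.
sample_csv = io.StringIO(
    "age,sex,bmi,children,smoker,region,charges\n"
    "19,female,27.9,0,yes,southwest,16884.924\n"
    "18,male,33.77,1,no,southeast,1725.5523\n"
)

# The same columns that the cell above converts to float.
NUMERIC_FIELDS = {"age", "bmi", "children", "charges"}


def load_records(csvfile):
    """Return {row_number: {column: value}} with numeric columns cast to float."""
    records = {}
    for i, row in enumerate(csv.DictReader(csvfile)):
        records[i] = {
            field: float(value) if field in NUMERIC_FIELDS else value
            for field, value in row.items()
        }
    return records


sample_dict = load_records(sample_csv)
print(sample_dict[0]["region"], sample_dict[1]["charges"])
# southwest 1725.5523
```

Passing an open file object instead of `sample_csv` should give the same shape of dictionary as `insurance_dict`.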

I will investigate how many individuals fall into each of the following age brackets and what their average insurance costs are:
* 0-9
* 10-19
* 20-29
* 30-39
* 40-49
* 50-59
* 60-69
* 70-79
* 80+


In [60]:
age_brackets = {
    '0-9':{},
    '10-19':{},
    '20-29':{},
    '30-39':{},
    '40-49':{},
    '50-59':{},
    '60-69':{},
    '70-79':{},
    '80+':{}
}

# Ages are stored as floats; whole-number floats such as 19.0 still pass the range() membership test.
for id_num, info in insurance_dict.items():
    if info['age'] in range(0,10):
        age_brackets['0-9'][id_num] = info
    elif info['age'] in range(10,20):
        age_brackets['10-19'][id_num] = info
    elif info['age'] in range(20,30):
        age_brackets['20-29'][id_num] = info
    elif info['age'] in range(30,40):
        age_brackets['30-39'][id_num] = info
    elif info['age'] in range(40,50):
        age_brackets['40-49'][id_num] = info
    elif info['age'] in range(50,60):
        age_brackets['50-59'][id_num] = info
    elif info['age'] in range(60,70):
        age_brackets['60-69'][id_num] = info
    elif info['age'] in range(70,80):
        age_brackets['70-79'][id_num] = info
    else:
        age_brackets['80+'][id_num] = info

for age, ids in age_brackets.items():
    age_count = len(ids)
    total_cost = 0
    for individual_stats in ids.values():
        total_cost += individual_stats['charges']
    if age_count != 0:
        avg_cost = round(total_cost/age_count, 2)
    else:
        avg_cost = None
    # Adjacent f-strings are joined, so no indentation leaks into the printed text.
    print(f"The number of people in the age range {age} is: {age_count}\n"
          f"and their average insurance cost is: {avg_cost} dollars.\n")



The number of people in the age range 0-9 is: 0
and their average insurance cost is: None dollars.

The number of people in the age range 10-19 is: 137
and their average insurance cost is: 8407.35 dollars.

The number of people in the age range 20-29 is: 280
and their average insurance cost is: 9561.75 dollars.

The number of people in the age range 30-39 is: 257
and their average insurance cost is: 11738.78 dollars.

The number of people in the age range 40-49 is: 279
and their average insurance cost is: 14399.2 dollars.

The number of people in the age range 50-59 is: 271
and their average insurance cost is: 16495.23 dollars.

The number of people in the age range 60-69 is: 114
and their average insurance cost is: 21248.02 dollars.

The number of people in the age range 70-79 is: 0
and their average insurance cost is: None dollars.

The number of people in the age range 80+ is: 0
and their average insurance cost is: None dollars.
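The chain of `range` checks above assigns each person to a ten-year bracket. As a rough sketch of the same idea, the bracket label can instead be computed arithmetically from the age. `age_bracket` is a hypothetical helper that is not part of the notebook, and it assumes ages are non-negative.

```python
def age_bracket(age):
    """Return a label such as '10-19' for a non-negative age (hypothetical helper)."""
    # Integer-divide by 10 to find the start of the decade the age falls in.
    decade = int(age) // 10 * 10
    if decade >= 80:
        return '80+'
    return f"{decade}-{decade + 9}"


print(age_bracket(19.0))  # 10-19
print(age_bracket(64))    # 60-69
print(age_bracket(85))    # 80+
```

One design difference: because the age is truncated with `int`, a fractional age such as 19.5 would land in `10-19` here, whereas the `range` membership test would send it to the final `else` branch.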



Now I will look at how individuals are spread out across the country and what the average cost paid in each region is.

In [61]:
locations = {
    'southwest':{},
    'southeast':{},
    'northeast':{},
    'northwest':{}
}

for id_num, info in insurance_dict.items():
    if info['region'] == 'southwest':
        locations['southwest'][id_num] = info
    elif info['region'] == 'southeast':
        locations['southeast'][id_num] = info
    elif info['region'] == 'northeast':
        locations['northeast'][id_num] = info
    elif info['region'] == 'northwest':
        locations['northwest'][id_num] = info
        

for location, ids in locations.items():
    location_count = len(ids.values())
    total_cost = 0
    for individual_stats in ids.values():
        total_cost += individual_stats['charges']
    if location_count != 0:
        avg_cost = round(total_cost/location_count, 2)
    else:
        avg_cost = None
    # Adjacent f-strings are joined, so no indentation leaks into the printed text.
    print(f"The number of people in the {location} is: {location_count}\n"
          f"and their average insurance cost is: {avg_cost} dollars.\n")


The number of people in the southwest is: 325
and their average insurance cost is: 12346.94 dollars.

The number of people in the southeast is: 364
and their average insurance cost is: 14735.41 dollars.

The number of people in the northeast is: 324
and their average insurance cost is: 13406.38 dollars.

The number of people in the northwest is: 325
and their average insurance cost is: 12417.58 dollars.
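The age and region cells follow the same pattern: group the records by some attribute, then compute a count and an average per group. One possible way to factor that pattern out is sketched below. `average_by` is a hypothetical helper, not code from this notebook, and the three sample records are taken from the printed dictionary near the top (trimmed to the two fields used here).

```python
from collections import defaultdict


def average_by(records, key_func):
    """Group records by key_func(record); return {group: (count, average charges)}."""
    charges_by_group = defaultdict(list)
    for info in records.values():
        charges_by_group[key_func(info)].append(info['charges'])
    return {
        group: (len(charges), round(sum(charges) / len(charges), 2))
        for group, charges in charges_by_group.items()
    }


# Rows 1-3 of the printed dictionary, keeping only the fields needed here.
sample = {
    1: {'region': 'southeast', 'charges': 1725.5523},
    2: {'region': 'southeast', 'charges': 4449.462},
    3: {'region': 'northwest', 'charges': 21984.47061},
}

region_stats = average_by(sample, lambda info: info['region'])
print(region_stats)
# {'southeast': (2, 3087.51), 'northwest': (1, 21984.47)}
```

Unlike the hand-written dictionaries, this sketch only reports groups that actually occur, so an empty bracket would simply be missing rather than printed with a count of 0.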



Now I will investigate the difference in average cost between smokers and non-smokers.

In [62]:
smokers = {
    'smokers':{},
    'non-smokers':{}
}

for id_num, info in insurance_dict.items():
    if info['smoker'] == 'yes':
        smokers['smokers'][id_num] = info
    elif info['smoker'] == 'no':
        smokers['non-smokers'][id_num] = info
        
for status, ids in smokers.items():
    smoker_count = len(ids.values())
    total_cost = 0
    for individual_stats in ids.values():
        total_cost += individual_stats['charges']
    if smoker_count != 0:
        avg_cost = round(total_cost/smoker_count, 2)
    else:
        avg_cost = None
    # Adjacent f-strings are joined, so no indentation leaks into the printed text.
    print(f"The number of people who are {status} is: {smoker_count}\n"
          f"and their average insurance cost is: {avg_cost} dollars.\n")


The number of people who are smokers is: 274
and their average insurance cost is: 32050.23 dollars.

The number of people who are non-smokers is: 1064
and their average insurance cost is: 8434.27 dollars.
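The cell above prints the two averages but not the difference itself. A short follow-up calculation, using only the two averages printed above (the variable names are illustrative), makes the gap explicit:

```python
# Averages copied from the output above.
smoker_avg = 32050.23
non_smoker_avg = 8434.27

difference = round(smoker_avg - non_smoker_avg, 2)
ratio = round(smoker_avg / non_smoker_avg, 2)

print(f"Smokers pay {difference} dollars more on average.")
print(f"That is about {ratio} times the non-smoker average.")
# Smokers pay 23615.96 dollars more on average.
# That is about 3.8 times the non-smoker average.
```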



Finally, I will look into how costs compare for people with 0, 1, 2, or 3+ children.

In [66]:
children = {
    '0':{},
    '1':{},
    '2':{},
    '3+':{},
}

for id_num, info in insurance_dict.items():
    if info['children'] == 0:
        children['0'][id_num] = info
    elif info['children'] == 1:
        children['1'][id_num] = info
    elif info['children'] == 2:
        children['2'][id_num] = info
    elif info['children'] >= 3:
        children['3+'][id_num] = info
    
        
for kid_num, ids in children.items():
    household_count = len(ids.values())
    total_cost = 0
    for individual_stats in ids.values():
        total_cost += individual_stats['charges']
    if household_count != 0:
        avg_cost = round(total_cost/household_count, 2)
    else:
        avg_cost = None
    # Adjacent f-strings are joined, so no indentation leaks into the printed text.
    if kid_num == '1':
        print(f"The number of people who have {kid_num} child is: {household_count}\n"
              f"and their average insurance cost is: {avg_cost} dollars.\n")
    else:
        print(f"The number of people who have {kid_num} children is: {household_count}\n"
              f"and their average insurance cost is: {avg_cost} dollars.\n")

The number of people who have 0 children is: 574
and their average insurance cost is: 12365.98 dollars.

The number of people who have 1 child is: 324
and their average insurance cost is: 12731.17 dollars.

The number of people who have 2 children is: 240
and their average insurance cost is: 15073.56 dollars.

The number of people who have 3+ children is: 200
and their average insurance cost is: 14576.0 dollars.
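Each of the four groupings (age, region, smoking status, number of children) should place every individual in exactly one group, so the group counts should add up to the same total each time. As a quick sanity check, the snippet below sums the counts printed in the outputs above; `group_counts` is just an illustrative name.

```python
# Counts copied from the printed outputs of the four analysis cells.
group_counts = {
    'age': [0, 137, 280, 257, 279, 271, 114, 0, 0],
    'region': [325, 364, 324, 325],
    'smoker': [274, 1064],
    'children': [574, 324, 240, 200],
}

totals = {name: sum(counts) for name, counts in group_counts.items()}
print(totals)
# {'age': 1338, 'region': 1338, 'smoker': 1338, 'children': 1338}

# Every grouping should account for the same number of individuals.
assert len(set(totals.values())) == 1
```

All four groupings account for 1338 individuals, which suggests no record was dropped or double-counted by any of the `if`/`elif` chains.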

