# U.S. Medical Insurance Costs

In this document, we examine a dataset of insurance costs for individuals in four U.S. regions. Our goal is to find the average insurance cost for each region and see which region is the most expensive.

First, we will import the csv module and the relevant file of data, and assemble the data into an easy-to-read list of dictionary items.

In [2]:
import csv
data_rows = []
with open('insurance.csv', newline='') as file:
    csv_reader = csv.DictReader(file)
    for row in csv_reader:
        data_rows.append(row)
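
One detail worth keeping in mind: `csv.DictReader` returns every value as a string, which is why the charges are passed through `float()` later on. As a minimal sketch, the numeric columns could instead be converted once, right after loading. The two sample rows and the `convert_row` helper below are made up for illustration and are not part of the dataset or the original notebook.

```python
import csv
import io

# Hypothetical two-row sample standing in for insurance.csv (values are made up).
sample_csv = io.StringIO(
    "age,sex,bmi,children,smoker,region,charges\n"
    "30,female,25.5,1,no,southwest,1000.50\n"
    "45,male,31.2,2,yes,northeast,2500.25\n"
)

def convert_row(row):
    """Return a copy of a DictReader row with the numeric columns converted."""
    converted = dict(row)
    converted["age"] = int(row["age"])
    converted["bmi"] = float(row["bmi"])
    converted["children"] = int(row["children"])
    converted["charges"] = float(row["charges"])
    return converted

typed_rows = [convert_row(row) for row in csv.DictReader(sample_csv)]

# The charges can now be added directly, without calling float() each time.
print(typed_rows[0]["charges"] + typed_rows[1]["charges"])
```

Converting up front keeps the later cells shorter, at the cost of an extra pass over the data.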


Looking at the data, each row contains 7 categories: **age, sex, bmi, children, smoker, region, and charges**.

To make the data simpler, we will put each category into its own list.

In [3]:
age = []
sex = []
bmi = []
children = []
smoker = []
region = []
charges = []

for row in data_rows:
    age.append(row['age'])
    sex.append(row['sex'])
    bmi.append(row['bmi'])
    children.append(row['children'])
    smoker.append(row['smoker'])
    region.append(row['region'])
    charges.append(row['charges'])
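
As an alternative sketch, the seven lists could be held in a single dictionary keyed by column name, so that a new column would not need its own variable. The `sample_rows` and `columns` names below are hypothetical, and the rows are made-up stand-ins for `data_rows`.

```python
# Hypothetical rows in the same shape that csv.DictReader produces (values are made up).
sample_rows = [
    {"age": "30", "region": "southwest", "charges": "1000.50"},
    {"age": "45", "region": "northeast", "charges": "2500.25"},
]

# Build one list per column, using the keys of the first row as the column names.
columns = {key: [row[key] for row in sample_rows] for key in sample_rows[0]}

print(columns["region"])
```

This assumes every row has the same keys, which holds for rows read by `csv.DictReader` from a well-formed file.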
    

Now that we have the relevant data sorted, we can return to our research question:

(1) ***What region had the highest average costs?***

To figure this out, we first need to find the unique regions in the data, and then create a separate list for each region.

In [4]:
#Find unique regions
myset = set(region)
print(myset)

#Create separate lists for each region
southwest_data = []
northwest_data = []
northeast_data = []
southeast_data = []

{'southwest', 'northwest', 'southeast', 'northeast'}


Now, we need to fill our lists with the appropriate data.

In [5]:
#Populate the lists
for entry in data_rows:
    if entry['region'] == 'southwest':
        southwest_data.append(entry)
    elif entry['region'] == 'northwest':
        northwest_data.append(entry)
    elif entry['region'] == 'northeast':
        northeast_data.append(entry)
    elif entry['region'] == 'southeast':
        southeast_data.append(entry)
    else:
        print('Region not found!')
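
The if/elif chain above works because the four region names are known in advance. A minimal sketch of a more general approach groups rows with `collections.defaultdict`, so no region name has to be written out by hand. The `rows_by_region` name and the sample rows are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical rows standing in for data_rows (values are made up).
sample_rows = [
    {"region": "southwest", "charges": "1000.50"},
    {"region": "northeast", "charges": "2500.25"},
    {"region": "southwest", "charges": "1999.50"},
]

# Each new region key automatically starts with an empty list.
rows_by_region = defaultdict(list)
for entry in sample_rows:
    rows_by_region[entry["region"]].append(entry)

for region_name, rows in rows_by_region.items():
    print(region_name, len(rows))
```

One trade-off: this version accepts any region value it encounters, so it would not flag a misspelled region the way the `else` branch above does.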
        

Each row is now in the list for its region, so we can go through each of the above lists and find the average cost for each region.

In [6]:
#Average cost for insurance in the southwest
total_southwest_costs = 0
for entry in southwest_data:
    entry_cost = float(entry['charges'])
    total_southwest_costs += entry_cost

southwest_average_cost = round(total_southwest_costs / len(southwest_data))

print(f"The average insurance costs in the southwest region are {southwest_average_cost} dollars.")

The average insurance costs in the southwest region are 12347 dollars.


In [7]:
#Average cost for insurance in the northwest
total_northwest_costs = 0
for entry in northwest_data:
    entry_cost = float(entry['charges'])
    total_northwest_costs += entry_cost

northwest_average_cost = round(total_northwest_costs / len(northwest_data))

print(f"The average insurance costs in the northwest region are {northwest_average_cost} dollars.")

The average insurance costs in the northwest region are 12418 dollars.


In [10]:
#Average cost for insurance in the northeast
total_northeast_costs = 0
for entry in northeast_data:
    entry_cost = float(entry['charges'])
    total_northeast_costs += entry_cost

northeast_average_cost = round(total_northeast_costs / len(northeast_data))

print(f"The average insurance costs in the northeast region are {northeast_average_cost} dollars.")

The average insurance costs in the northeast region are 13406 dollars.


In [9]:
#Average cost for insurance in the southeast
total_southeast_costs = 0
for entry in southeast_data:
    entry_cost = float(entry['charges'])
    total_southeast_costs += entry_cost

southeast_average_cost = round(total_southeast_costs / len(southeast_data))

print(f"The average insurance costs in the southeast region are {southeast_average_cost} dollars.")

The average insurance costs in the southeast region are 14735 dollars.
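
The four cells above repeat the same steps, so the calculation could be folded into one helper, and the research question can then be answered by comparing the results. The sketch below is one possible way to do this. The `average_charges` helper and the `reported_averages` dictionary are hypothetical names, and the dictionary simply restates the four averages printed above.

```python
def average_charges(rows):
    """Return the rounded mean of the 'charges' field across a list of rows."""
    if not rows:
        raise ValueError("cannot average an empty list of rows")
    total = sum(float(row["charges"]) for row in rows)
    return round(total / len(rows))

# The four averages printed in the cells above, keyed by region.
reported_averages = {
    "southwest": 12347,
    "northwest": 12418,
    "northeast": 13406,
    "southeast": 14735,
}

# max() with key=dict.get returns the key whose value is largest.
highest_region = max(reported_averages, key=reported_averages.get)
print(f"The region with the highest average cost is the {highest_region}.")
```

With the averages computed above, this identifies the southeast, at 14735 dollars, as the region with the highest average insurance cost.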
