# U.S. Medical Insurance Costs

## Given a .csv of U.S. Medical Insurance Costs, can we:

1. Examine the average cost of insurance.
    * Which region has the highest cost?
    * Which region has the lowest cost?
    * What's the highest overall cost?
    * What's the lowest overall cost? 
    
    <br>
2. Examine the average age of insurance holders.
    * What's the oldest age overall?
    * What's the youngest age overall?
    * Is there an oldest region?
    * Is there a youngest region?
    
    <br>
3. Examine the BMI of the insurance holders.
    * What is the average BMI?
    * Is there a region with a particularly high overall BMI?
    * Is there a region with a particularly low overall BMI?
    
    <br>
4. Examine the impact of smoking status on cost.
    * Does smoking mean that insurance cost is higher?
    
    <br>
5. Examine the impact of having children on cost.
    * Does cost increase with the number of children?
    
    <br>
6. What's the average age of insurance holders with at least one child?

In [10]:
import csv

In [11]:
# Create variables (as lists) for the data contained in the .csv file
age = []
sex = []
bmi = []
num_of_children = []
smoking_status = []
region = []
charges = []

In [12]:
# To get the data in the .csv file into the list variables, we create a function that
# opens the .csv file and reads it with the csv module's DictReader class.

def list_to_data(lst, csv_file, column_name):
    # Open the .csv file; newline='' is what the csv module documentation recommends
    with open(csv_file, newline='') as csv_info:
        csv_dict = csv.DictReader(csv_info)
        # Loop through the rows and append this column's value (a string) to the list
        for row in csv_dict:
            lst.append(row[column_name])
    # Return the filled list
    return lst

To get the information stored in the insurance.csv file into the previously defined variables, we call the
`list_to_data` function with the list, the .csv file, and the name of the column that matches the list.

In [13]:
ages = list_to_data(age, 'insurance.csv', 'age')
sexes = list_to_data(sex, 'insurance.csv', 'sex')
bmis = list_to_data(bmi, 'insurance.csv', 'bmi')
num_of_children = list_to_data(num_of_children, 'insurance.csv', 'children')
smoking_status = list_to_data(smoking_status, 'insurance.csv', 'smoker')
regions = list_to_data(region, 'insurance.csv', 'region')
cost = list_to_data(charges, 'insurance.csv', 'charges')
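The cell above calls `list_to_data` seven times, so `insurance.csv` is opened and parsed once per column. A minimal sketch of a one-pass alternative follows, assuming the same column names; `read_columns` is a hypothetical helper, and the two sample rows are invented for illustration, not taken from the dataset.

```python
import csv
import io


def read_columns(csv_file_obj, column_names):
    """Read every requested column in a single pass (hypothetical helper)."""
    # One empty list per requested column
    columns = {name: [] for name in column_names}
    for row in csv.DictReader(csv_file_obj):
        for name in column_names:
            # Values stay as strings, matching what list_to_data produces
            columns[name].append(row[name])
    return columns


# Invented sample rows standing in for insurance.csv
sample = io.StringIO(
    "age,sex,bmi,children,smoker,region,charges\n"
    "30,female,25.0,1,no,northeast,5000.00\n"
    "45,male,31.5,0,yes,southwest,20000.00\n"
)
data = read_columns(sample, ["age", "bmi", "charges"])
print(data["age"])  # ['30', '45']
```

With the real file you would pass an open file object instead, e.g. `with open('insurance.csv', newline='') as f:` followed by `data = read_columns(f, [...])`.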


We now have organized, labeled lists of data to analyze. <br>

To analyze the collected data, we will need functions that perform the analysis on the lists and return the desired value: an average, a count, or the highest and lowest values.
* Analyze cost
    * return the average cost of insurance
    * return the average cost per region
    * return the highest and lowest cost
* Analyze age
    * return the average age of an insurance holder
    * return the average age per region
    * return the highest and lowest ages
* Analyze BMI
    * return the average overall BMI 
    * return the average BMI per region
* Analyze smoking_status
    * return how many smokers vs non-smokers
    * return the cost difference between smokers vs non-smokers
* Analyze children
    * return the average cost of insurance as number of children increases 
    * return the average age of an insurance holder who has at least one child
     <br>
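Several items in the plan above (average cost, age, and BMI per region, and average cost by number of children) are the same operation: group one list by the matching entries of a parallel list and average each group. A minimal sketch, assuming parallel lists like the ones built earlier; `average_by_group` is a hypothetical name and the sample values are invented.

```python
from collections import defaultdict


def average_by_group(values, groups):
    """Average `values` for each distinct label in the parallel list `groups`."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    # zip pairs each value with the group label at the same position
    for value, group in zip(values, groups):
        totals[group] += float(value)
        counts[group] += 1
    # Round to two decimals, like find_average
    return {group: round(totals[group] / counts[group], 2) for group in totals}


sample_costs = ['100.0', '300.0', '50.0']
sample_regions = ['northeast', 'northeast', 'southwest']
print(average_by_group(sample_costs, sample_regions))
# {'northeast': 200.0, 'southwest': 50.0}
```

Under these assumptions, `average_by_group(cost, regions)` would give the average cost per region and `average_by_group(cost, num_of_children)` the average cost for each number of children; the highest and lowest groups can then be read off with `max(result, key=result.get)` and `min(result, key=result.get)`.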

Since much of what we want to look at is averages (average cost, average age, average BMI, etc.), a helper function would be useful. We can then use it to compute each average and save the result to its own variable for later use.

Another helper function that may be useful is one that will return the highest and lowest data points in a list. 
These returned values can also be stored for later use if necessary.

A class may also be useful in analyzing some of the other data. 


In [14]:
# A helper function to find the average is useful since we want to see a lot of averages,
# and creating multiple functions that essentially do the same thing would be repetitive

def find_average(lst):
    # Create a running total
    total = 0
    # iterate through every item in the list
    for item in lst:
        # add that item to total as a float 
        total += float(item)
    # divide the total by the length of the list and return it to find the average, rounded to the second decimal
    return round(total / len(lst), 2)
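`find_average` raises `ZeroDivisionError` if it is handed an empty list. The standard library's `statistics.mean` computes the same arithmetic mean; the sketch below wraps it with the same float conversion and rounding and returns `None` for an empty list. Returning `None` is an assumption made for illustration (raising a clearer error would be equally reasonable), and `find_average_stdlib` is a hypothetical name.

```python
import statistics


def find_average_stdlib(lst):
    """Same result as find_average, but returns None for an empty list."""
    if not lst:
        return None
    # Convert the string values read from the CSV before averaging
    return round(statistics.mean(float(item) for item in lst), 2)


print(find_average_stdlib(['18', '64', '39']))  # 40.33
print(find_average_stdlib([]))  # None
```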


In [15]:
# We can then use this function to save the averages to variables with corresponding names for later use 
# if necessary.
avg_cost = find_average(cost)
avg_age = find_average(ages)
avg_bmi = find_average(bmis)

print("The average cost of insurance is ${avg_cost}.".format(avg_cost=avg_cost))
print("The average age of insurance holders per our data is {avg_age} years old.".format(avg_age=avg_age))
print("The average BMI of insurance holders per our data is {avg_bmi}.".format(avg_bmi=avg_bmi))

The average cost of insurance is $13270.42.
The average age of insurance holders per our data is 39.21 years old.
The average BMI of insurance holders per our data is 30.66.


In [16]:
# The second helper function will determine the highest and lowest values in a given list
def find_high_low(lst):
    # set two variables to track our highest and lowest values
    highest = float('-inf')
    lowest = float('inf')
    # iterate through the list
    for item in lst:
        # if the item is higher than our highest variable, we assign that item to it
        if float(item) > float(highest):
            highest = item
        # if the item is lower than our lowest variable, we assign that item to it
        if float(item) < float(lowest):
            lowest = item
    # finally we return our variables
    return highest, lowest
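The built-in `max` and `min` accept a `key` function, so the same result can be had without a manual loop. Passing `key=float` compares the values numerically while still returning the original strings, which matches the string tuples `find_high_low` produces. `find_high_low_builtin` is a hypothetical name; like `max` and `min`, it raises `ValueError` on an empty list.

```python
def find_high_low_builtin(lst):
    """Return (highest, lowest) from a list of numeric strings."""
    # key=float compares numerically; the original strings are returned
    return max(lst, key=float), min(lst, key=float)


print(find_high_low_builtin(['18', '64', '9.5']))  # ('64', '9.5')
```

The `key` matters here: compared as plain strings, `'9.5'` sorts above `'64'`, so `max` without a key would return the wrong value.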


In [17]:
# The `find_high_low` function can be used to save the highest and lowest values into variables with names corresponding
# to the list passed for later use if necessary

highest_lowest_cost = find_high_low(cost)
highest_lowest_age = find_high_low(ages)
highest_lowest_bmi = find_high_low(bmis)

print("The highest and lowest cost of insurance are: {highest_lowest_cost}".format(highest_lowest_cost=highest_lowest_cost))
print("The oldest and youngest insurance holders are: {highest_lowest_age}".format(highest_lowest_age=highest_lowest_age))
print("The highest and lowest BMIs are: {highest_lowest_bmi}".format(highest_lowest_bmi=highest_lowest_bmi))

The highest and lowest cost of insurance are: ('63770.42801', '1121.8739')
The oldest and youngest insurance holders are: ('64', '18')
The highest and lowest BMIs are: ('53.13', '15.96')


Before we build the class that will handle the rest of the analysis, we want a dictionary that holds the bulk of our data. The key for each entry will be its row number, from 0 up to one less than the number of rows in the file. The value will be a list containing that row's data (i.e. age, sex, bmi, etc.).
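A minimal sketch of that dictionary, assuming the parallel lists built earlier all have the same length: `enumerate` supplies the keys 0 through one less than the number of rows, and `zip` gathers the matching entries of each list into one record. `build_patient_dict` is a hypothetical name, and the short lists in the example are invented.

```python
def build_patient_dict(*columns):
    """Map each row index to a list of that row's values, one from each list."""
    # zip stops at the shortest list, so all lists should have the same length
    return {index: list(row) for index, row in enumerate(zip(*columns))}


# Invented sample lists, not taken from insurance.csv
sample_ages = ['30', '45']
sample_sexes = ['female', 'male']
sample_charges = ['1000.00', '2000.00']

patients = build_patient_dict(sample_ages, sample_sexes, sample_charges)
print(patients)  # {0: ['30', 'female', '1000.00'], 1: ['45', 'male', '2000.00']}
```

With the real data the call would be `build_patient_dict(ages, sexes, bmis, num_of_children, smoking_status, regions, cost)`, giving records in the order age, sex, BMI, children, smoker, region, charges.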

In [None]:
# A class that stores the patient information (our seven data lists) as attributes
class Patient_Info:
    def __init__(self, age_list, sex_list, bmi_list, children_list, smoking_list, region_list, cost_list):
        self.age = age_list
        self.sex = sex_list
        self.bmi = bmi_list
        self.children = children_list
        self.smoking_status = smoking_list
        self.regions = region_list
        self.cost = cost_list

    

Patient = Patient_Info(ages, sexes, bmis, num_of_children, smoking_status, regions, cost)
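`Patient_Info` currently only stores the lists. A minimal sketch of how methods could cover the smoking and children items in the plan follows. `PatientInfoSketch` and its method names are hypothetical, it assumes smokers are marked with the string `'yes'` in the smoker column, and the sample data is invented.

```python
class PatientInfoSketch:
    """Hypothetical variant of Patient_Info with a few analysis methods."""

    def __init__(self, age_list, children_list, smoking_list, cost_list):
        self.age = age_list
        self.children = children_list
        self.smoking_status = smoking_list
        self.cost = cost_list

    def count_smokers(self):
        # Assumes smokers are marked 'yes'; every other value counts as a non-smoker
        smokers = self.smoking_status.count('yes')
        return smokers, len(self.smoking_status) - smokers

    def average_cost_by_smoking(self):
        # Group charges by smoking status, then average each group
        groups = {}
        for status, charge in zip(self.smoking_status, self.cost):
            groups.setdefault(status, []).append(float(charge))
        return {status: round(sum(vals) / len(vals), 2) for status, vals in groups.items()}

    def average_age_with_children(self):
        # Keep only the ages of holders with at least one child
        ages = [float(a) for a, c in zip(self.age, self.children) if int(c) >= 1]
        return round(sum(ages) / len(ages), 2) if ages else None


# Invented sample data, not taken from insurance.csv
sketch = PatientInfoSketch(
    ['30', '45', '50'], ['1', '0', '2'], ['no', 'yes', 'no'], ['1000.0', '4000.0', '2000.0']
)
print(sketch.count_smokers())              # (1, 2)
print(sketch.average_cost_by_smoking())    # {'no': 1500.0, 'yes': 4000.0}
print(sketch.average_age_with_children())  # 40.0
```

Under the same assumptions, it would be constructed from the real lists as `PatientInfoSketch(ages, num_of_children, smoking_status, cost)`.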