# U.S. Medical Insurance Costs
#### The goals of this project are: 

- Find out the average BMI of patients.
- What percentage of patients are smokers.
    - How do insurance costs differ between smokers and non-smokers.
- How do insurance costs differ between people with and without children.

In [10]:
import csv

First, filter out and organize the data in order to fulfill the goals of this project. 
1. Create a ``BMI``list to find ``average_bmi``. 
2. Create a ``smoker`` list to find ``percent_smokers``. 
    2. Create two lists, ``ins_costs_smokers``and ``ins_costs_nons`` in order to find the average insurance costs of people who smoke and people who don't.
3. Create two lists, ``ins_costs_with``and ``ins_costs_without`` in order to find the average insurance costs of people with and without children. 

In [11]:
bmis = []
smoker = []
insurance_costs = []
ins_costs_smokers = []
ins_costs_nons = []
ins_costs_with = []
ins_costs_without = []

Now, that the lists have been created, populate the lists from csv file ``insurance.csv``.

In [12]:
with open("insurance.csv") as insurance_data:
    insurance_data_dict = csv.DictReader(insurance_data)
    for row in insurance_data_dict:
        bmis.append(row["bmi"])
        smoker.append(row["smoker"])
        insurance_costs.append(row["charges"])
        if row["children"] == "children":
            continue
        elif row["children"] == "0":
            ins_costs_without.append(row["charges"])
        else:
            ins_costs_with.append(row["charges"])
        if row["smoker"] == "yes":
            ins_costs_smokers.append(row["charges"])
        else:
            ins_costs_nons.append(row["charges"])

Now that the data has been organized in lists, it is ready for analysis. To analyze the data, I will create the following methods:
- ``calc_average_bmi`` to find the ``average_bmi``.
- ``calc_percent_smokers`` to find the ``percent_smokers``.
- ``compare_costs_smokers`` find the insurance cost difference between people who smoke and people who don't.
- ``compare_costs_children``to find the insurance cost difference between ``ins_costs_with`` and ``ins_costs_without``.

In [52]:
def calc_average_bmi(bmis):
    total = 0
    for bmi in bmis:
        total += float(bmi)
    average_bmi = total/len(bmis)
    return "The average BMI of this dataset of US patients is {:.2f}".format(average_bmi)

def calc_percent_smokers(smokers_list):
    people_who_smoke = 0
    for smoker in smokers_list:
        if smoker == "yes":
            people_who_smoke += 1
    percent_smokers =  people_who_smoke / len(smokers_list)*100
    return "{:.2f}% of the patients in this dataset smoke.".format(percent_smokers)

def compare_costs_smokers(data_s, data_n):
    total_s = 0
    for i in data_s:
        total_s += float(i)
    average_smoker = total_s/len(data_s)
    
    total_n = 0
    for i in data_n:
        total_n += float(i)
    average_nonsmoker = total_n/len(data_n)
    
    if average_smoker > average_nonsmoker:
        difference = average_smoker - average_nonsmoker 
        return "In this data set, the insurance cost for smokers was higher by ${:.2f}".format(difference)
    else:
        difference = average_nonsmoker - average_smoker
        return "In this data set, the insurance cost for non_smokers was higher by ${:.2f}".format(difference)

def compare_costs_children(ins_costs_with, ins_costs_without):
    total_w = 0
    for i in ins_costs_with:
        total_w += float(i)
    average_cost_w_children = total_w/len(ins_costs_with)
    
    total_wo = 0
    for i in ins_costs_without:
        total_wo += float(i)
    average_cost_wo_children = total_wo/len(ins_costs_with)
    
    if average_cost_w_children > average_cost_wo_children:
        difference = average_cost_w_children - average_cost_wo_children 
        return "In this data set, the insurance cost for people with children was higher by ${:.2f}".format(difference)
    else:
        difference = average_nonsmoker - average_smoker
        return "In this data set, the insurance cost for people without children was higher by ${:.2f}".format(difference)

#### Now, test the function on the lists to retrieve the analysis:

In [54]:
average_bmi = calc_average_bmi(bmis)
print(average_bmi)

percent_smokers = calc_percent_smokers(smoker)
print(percent_smokers)

cost_comparison_smokers = compare_costs_smokers(ins_costs_smokers, ins_costs_nons)
print(cost_comparison_smokers)

cost_comparison_children = compare_costs_children(ins_costs_with, ins_costs_without)
print(cost_comparison_children)

The average BMI of this dataset of US patients is 30.66
20.48% of the patients in this dataset smoke.
In this data set, the insurance cost for smokers was higher by $23615.96
In this data set, the insurance cost for people with children was higher by $4659.27
