# U.S. Medical Insurance Costs

## Overview
The database provided contains information on individuals purchasing insurance through company X. The data gathered includes the following:
- Age
- Sex (Binary-normative)
- BMI
- Number of Children
- Smoking/Nonsmoking status
- Region
- Insurance Cost

I've come up with the following five questions to probe the data for:

* How does the average bmi compare across the different regions? Where do the healthiest men and women live?
* Which region smokes the most? How does their average BMI relate to the number of smokers? 
* Which region pays the lowest cost on average? How does it compare to the national average?
* Rank the attributes from those that affect costs the most, to those that affect costs the least.
* Which is the youngest region?
* What are some of the issues with this data set? What information could be gathered to improve analysis?


## Initialization

In [1]:
import csv


In [2]:
#test area


## Question One:
### How does the average bmi compare across the different regions? Where do the healthiest men and women live?

First I'll break up the data, compiling lists of each region's bmi:

In [3]:
sw_bmi_list = []
nw_bmi_list = []
se_bmi_list = []
ne_bmi_list = []

with open('insurance.csv') as ins_csv:
    ins_reader = csv.DictReader(ins_csv)
    for row in ins_reader:
        if row['region'] == 'southwest':
            sw_bmi_list.append(row['bmi'])
        elif row['region'] == 'northwest':
            nw_bmi_list.append(row['bmi'])
        elif row['region'] == 'southeast':
            se_bmi_list.append(row['bmi'])
        elif row['region'] == 'northeast':
            ne_bmi_list.append(row['bmi'])
        else:
            pass

Next, I'll create a function to get the average bmi:

In [4]:
def bmi_avg(list):
    total_bmi = 0.0
    for i in list:
        total_bmi += float(i)
    avg_bmi = round(total_bmi/len(list), 2)
    
    return avg_bmi

Finally, let's get our averages for each region

In [5]:
avg_bmi_dict = {
    'southwest': bmi_avg(sw_bmi_list),
    'northwest': bmi_avg(nw_bmi_list),
    'southeast': bmi_avg(se_bmi_list),
    'northeast': bmi_avg(ne_bmi_list),
    'entire us': (
        (bmi_avg(sw_bmi_list)+bmi_avg(nw_bmi_list)+bmi_avg(se_bmi_list)+bmi_avg(ne_bmi_list))/4
    )
}
                
for key in avg_bmi_dict:
    print("The average bmi for the {key} is: {value}".format(key=key, value=avg_bmi_dict[key]))

The average bmi for the southwest is: 30.6
The average bmi for the northwest is: 29.2
The average bmi for the southeast is: 33.36
The average bmi for the northeast is: 29.17
The average bmi for the entire us is: 30.5825


### What does this tell us?
First, let's acknowledge that bmi is inherently flawed. There are a litany of examples of why this is, but explaining the why is out of this project's scope. With that out of the way, we see that the Northeast region is the healthiest of the regions, while the Southeast is the least healthy. It would be interesting to compare this to economic or environmental data for each region and see if it tracks to bmi.

### What else can we do with this data?
Getting the average bmi across each region is a bit broad. Now that I've worked with the data a bit, I'd like to find data on more specific demographics.

We can create a class for each region that would break bmi data down by sex, age group, smoking status, and how many children they have:

In [6]:
class BmiBreakdown:
    def __init__(self, dict):
        self.under_30_bmi_lst = []
        self.over_29_bmi_lst = []
        self.female_bmi_lst = []
        self.male_bmi_lst = []
        #Parents will be segregated by parenthood status as well as number of children (1, 2, and >=3)
        self.nonparent_bmi_lst = []
        self.parent_bmi_lst = []
        self.parent_1_bmi_lst = []
        self.parent_2_bmi_lst = []
        self.parent_3up_bmi_lst = []
        self.smoker_bmi_lst = []
        self.nonsmoker_bmi_lst = []
        self.population = len(dict)
        self.total_bmi_lst = []
        for key in dict:
            #BMI based on age
            for value in dict[key]:
                self.total_bmi_lst.append(dict[key]['bmi'])
                if value == 'age':
                    if int(dict[key]['age']) < 30:
                        self.under_30_bmi_lst.append(dict[key]['bmi'])
                    elif int(dict[key]['age']) >= 30:
                        self.over_29_bmi_lst.append(dict[key]['bmi'])
            #BMI based on sex
                elif value == 'sex':
                    if dict[key]['sex'] == 'female':
                        self.female_bmi_lst.append(dict[key]['bmi'])
                    elif dict[key]['sex'] == 'male':
                        self.male_bmi_lst.append(dict[key]['bmi'])
            #BMI based on parent status
                elif value == 'children':
                    if int(dict[key]['children']) > 0:
                        self.parent_bmi_lst.append(dict[key]['bmi'])
                    #BMI based on number of children    
                        if int(dict[key]['children']) == 1:
                            self.parent_1_bmi_lst.append(dict[key]['bmi'])
                        elif int(dict[key]['children']) == 2:
                            self.parent_2_bmi_lst.append(dict[key]['bmi'])
                        elif int(dict[key]['children']) >= 3:
                            self.parent_3up_bmi_lst.append(dict[key]['bmi'])       
                    else:
                        self.nonparent_bmi_lst.append(dict[key]['bmi'])
                #BMI based on smoking status
                elif value == 'smoker':
                    if dict[key]['smoker'] == 'yes':
                        self.smoker_bmi_lst.append(dict[key]['bmi'])
                    else:
                        self.nonsmoker_bmi_lst.append(dict[key]['bmi'])
                else:
                    pass
                
    def avg_bmi(self):
        total_bmi = 0.0
        for i in self.total_bmi_lst:
            total_bmi += float(i)
        avg_bmi = total_bmi / len(self.total_bmi_lst)
        return round(avg_bmi, 2)

#Methods based on age:
    def avg_under_30(self):
        total_bmi = 0.0
        for i in self.under_30_bmi_lst:
            total_bmi += float(i)
        avg_bmi = total_bmi / len(self.under_30_bmi_lst)
        return round(avg_bmi, 2)

    def avg_over_29(self):
        total_bmi = 0.0
        for i in self.over_29_bmi_lst:
            total_bmi += float(i)
        avg_bmi = total_bmi / len(self.over_29_bmi_lst)
        return round(avg_bmi, 2)
    
#Methods based on sex:
    def avg_female(self):
        total_bmi = 0.0
        for i in self.female_bmi_lst:
            total_bmi += float(i)
        avg_bmi = total_bmi / len(self.female_bmi_lst)
        return round(avg_bmi, 2)
    
    def avg_male(self):
        total_bmi = 0.0
        for i in self.male_bmi_lst:
            total_bmi += float(i)
        avg_bmi = total_bmi / len(self.male_bmi_lst)
        return round(avg_bmi, 2)
    
#Methods based on parental status:
    def avg_nonparent(self):
        total_bmi = 0.0
        for i in self.nonparent_bmi_lst:
            total_bmi += float(i)
        avg_bmi = total_bmi / len(self.nonparent_bmi_lst)
        return round(avg_bmi, 2)
    
    def avg_parent(self):
        total_bmi = 0.0
        for i in self.parent_bmi_lst:
            total_bmi += float(i)
        avg_bmi = total_bmi / len(self.parent_bmi_lst)
        return round(avg_bmi, 2)

#Parents broken down into number of children
    def avg_parent_1(self):
        total_bmi = 0.0
        for i in self.parent_1_bmi_lst:
            total_bmi += float(i)
        avg_bmi = total_bmi / len(self.parent_1_bmi_lst)
        return round(avg_bmi, 2)
    
    def avg_parent_2(self):
        total_bmi = 0.0
        for i in self.parent_2_bmi_lst:
            total_bmi += float(i)
        avg_bmi = total_bmi / len(self.parent_2_bmi_lst)
        return round(avg_bmi, 2)

    def avg_parent_3up(self):
        total_bmi = 0.0
        for i in self.parent_3up_bmi_lst:
            total_bmi += float(i)
        avg_bmi = total_bmi / len(self.parent_3up_bmi_lst)
        return round(avg_bmi, 2)
    
#Methods based on Smoking status:
    def avg_smoker(self):
        total_bmi = 0.0
        for i in self.smoker_bmi_lst:
            total_bmi += float(i)
        avg_bmi = total_bmi / len(self.smoker_bmi_lst)
        return round(avg_bmi, 2)
    
    def avg_nonsmoker(self):
        total_bmi = 0.0
        for i in self.nonsmoker_bmi_lst:
            total_bmi += float(i)
        avg_bmi = total_bmi / len(self.nonsmoker_bmi_lst)
        return round(avg_bmi, 2)

### Dictionary Creation
The Class I've created's constructor requires a dictionary as an argument, so let's break the data down into four dictionaries, one for each region:

In [7]:
sw_dict = {}
nw_dict = {}
se_dict = {}
ne_dict = {}

with open('insurance.csv') as ins_csv:
    primary_key = 0
    ins_reader = csv.DictReader(ins_csv)
    for row in ins_reader:
        if row['region'] == 'southwest':
            sw_dict[primary_key] = row
            primary_key += 1
        elif row['region'] == 'northwest':
            nw_dict[primary_key] = row
            primary_key += 1
        elif row['region'] == 'southeast':
            se_dict[primary_key] = row
            primary_key += 1
        elif row['region'] == 'northeast':
            ne_dict[primary_key] = row
            primary_key += 1
        else:
            pass

## Using the BmiBreakdown class

Let's probe the data a bit, see what we come up with. First, create our objects:

In [8]:
sw_bmi_breakdown = BmiBreakdown(sw_dict)
nw_bmi_breakdown = BmiBreakdown(nw_dict)
se_bmi_breakdown = BmiBreakdown(se_dict)
ne_bmi_breakdown = BmiBreakdown(ne_dict)



### Question 1.a:
How fit is each regions under 30 population? How does it compare to those 30 and older? How does each demographic compare to the mean BMI of the region?

In [9]:
under_30_avg_bmi_dict = {
    'southwest': sw_bmi_breakdown.avg_under_30(),
    'northwest': nw_bmi_breakdown.avg_under_30(),
    'southeast': se_bmi_breakdown.avg_under_30(),
    'northeast': ne_bmi_breakdown.avg_under_30()
}

over_29_avg_bmi_dict = {
    'southwest': sw_bmi_breakdown.avg_over_29(),
    'northwest': nw_bmi_breakdown.avg_over_29(),
    'southeast': se_bmi_breakdown.avg_over_29(),
    'northeast': ne_bmi_breakdown.avg_over_29()
}

for key in under_30_avg_bmi_dict:
    print("The average bmi of people under 30 in the " + key + " is: " + str(under_30_avg_bmi_dict[key]))
    print("The average bmi of over the age of 29 in the " + key + " is: " + str(over_29_avg_bmi_dict[key]))
    print("The average bmi of the region is: " + str(avg_bmi_dict[key]))
    print("The average bmi of the entire US is: " + str(avg_bmi_dict['entire us']))
    print('')


The average bmi of people under 30 in the southwest is: 29.08
The average bmi of over the age of 29 in the southwest is: 31.26
The average bmi of the region is: 30.6
The average bmi of the entire US is: 30.5825

The average bmi of people under 30 in the northwest is: 28.54
The average bmi of over the age of 29 in the northwest is: 29.5
The average bmi of the region is: 29.2
The average bmi of the entire US is: 30.5825

The average bmi of people under 30 in the southeast is: 33.3
The average bmi of over the age of 29 in the southeast is: 33.38
The average bmi of the region is: 33.36
The average bmi of the entire US is: 30.5825

The average bmi of people under 30 in the northeast is: 28.0
The average bmi of over the age of 29 in the northeast is: 29.71
The average bmi of the region is: 29.17
The average bmi of the entire US is: 30.5825



### Take-away
- While it may seem obvious that younger people will, on average, have a lower BMI than their older counterparts, we see that in one region, the South East, this is just barely the situation. \n
 
#### A note: 
- After completing question 1.a, I think my outputs are a bit verbose and difficult to immediately grasp. Going forward I'm going to assign a *distance from local mean* and *distance to national mean* variable to each demographic.

### Question 1.b
How many smokers are in each region? How does their BMI compare to nonsmokers?

In [10]:
#create dictionaries, one key per region, each key is assigned three values: number of smokers/nonsmokers, avg bmi of smokers/nonsmokers, and total population of the region
smoker_bmi_dict = {
    'southwest': {
        "smoker_pop": len(sw_bmi_breakdown.smoker_bmi_lst), 
        "avg_bmi": sw_bmi_breakdown.avg_smoker(), 
        "region_pop": sw_bmi_breakdown.population
        }, 
    'northwest': {
        "smoker_pop": len(nw_bmi_breakdown.smoker_bmi_lst), 
        "avg_bmi": nw_bmi_breakdown.avg_smoker(), 
        "region_pop": nw_bmi_breakdown.population
        },
    'southeast': {
        "smoker_pop": len(se_bmi_breakdown.smoker_bmi_lst), 
        "avg_bmi": se_bmi_breakdown.avg_smoker(), 
        "region_pop": se_bmi_breakdown.population
        },
    'northeast': {
        "smoker_pop": len(ne_bmi_breakdown.smoker_bmi_lst), 
        "avg_bmi": ne_bmi_breakdown.avg_smoker(), 
        "region_pop": ne_bmi_breakdown.population
        }
    }


nonsmoker_bmi_dict = {
    'southwest': {
        "nonsmoker_pop": len(sw_bmi_breakdown.nonsmoker_bmi_lst), 
        "avg_bmi": sw_bmi_breakdown.avg_nonsmoker(), 
        "region_pop": nw_bmi_breakdown.population
        },
    'northwest': {
        "nonsmoker_pop": len(nw_bmi_breakdown.nonsmoker_bmi_lst), 
        "avg_bmi": nw_bmi_breakdown.avg_nonsmoker(), 
        "region_pop": nw_bmi_breakdown.population
        },
    'southeast': {
        "nonsmoker_pop": len(se_bmi_breakdown.nonsmoker_bmi_lst), 
        "avg_bmi": se_bmi_breakdown.avg_nonsmoker(), 
        "region_pop": se_bmi_breakdown.population
        },
    'northeast': {
        "nonsmoker_pop": len(ne_bmi_breakdown.nonsmoker_bmi_lst), 
        "avg_bmi": ne_bmi_breakdown.avg_nonsmoker(), 
        "region_pop": ne_bmi_breakdown.population
        }
    }

#output the smoker results
for key in smoker_bmi_dict:
    print("There are {num} smokers in the {region} ({pct}% of the regional population)".format(
        num=smoker_bmi_dict[key]["smoker_pop"], region=key, pct=round(smoker_bmi_dict[key]["smoker_pop"]/smoker_bmi_dict[key]["region_pop"]*100, 2)
        )
    )
    print("Their average bmi is {bmi}".format(bmi=smoker_bmi_dict[key]["avg_bmi"]))
    #Compare smoker results to the regional mean
    if avg_bmi_dict[key] > smoker_bmi_dict[key]["avg_bmi"]:
        print("This is {diff} less than the regional mean bmi of {avg}".format(diff=round(abs(avg_bmi_dict[key]-smoker_bmi_dict[key]["avg_bmi"]),2), avg=avg_bmi_dict[key]))
    elif avg_bmi_dict[key] < smoker_bmi_dict[key]["avg_bmi"]:
        print("This is {diff} higher than the regional mean bmi of {avg}".format(diff=round(abs(avg_bmi_dict[key]-smoker_bmi_dict[key]["avg_bmi"]),2), avg=avg_bmi_dict[key]))
    #Compare smoker results to the national mean
    if avg_bmi_dict['entire us'] > smoker_bmi_dict[key]["avg_bmi"]:
        print("This is {diff} less than the national mean bmi of {avg}".format(diff=round(abs(avg_bmi_dict['entire us']-smoker_bmi_dict[key]["avg_bmi"]),2), avg=avg_bmi_dict['entire us']))
    elif avg_bmi_dict['entire us'] < smoker_bmi_dict[key]["avg_bmi"]:
        print("This is {diff} higher than the national mean bmi of {avg}".format(diff=round(abs(avg_bmi_dict['entire us']-smoker_bmi_dict[key]["avg_bmi"]),2), avg=avg_bmi_dict['entire us']))
    else:
        print("This is about the same BMI as the national average.")
    print("")

#output the nonsmoker results
    print("There are {num} nonsmokers in the {region} ({pct}% of the population)".format(
        num=nonsmoker_bmi_dict[key]["nonsmoker_pop"], region=key, pct=round(nonsmoker_bmi_dict[key]["nonsmoker_pop"]/nonsmoker_bmi_dict[key]["region_pop"]*100, 2)
        )
    )
    print("Their average bmi is {bmi}".format(bmi=nonsmoker_bmi_dict[key]["avg_bmi"]))
    #Compare nonsmoker results to the regional mean
    if avg_bmi_dict[key] > nonsmoker_bmi_dict[key]["avg_bmi"]:
        print("This is {diff} less than the regional mean bmi of {avg}".format(diff=round(abs(avg_bmi_dict[key]-nonsmoker_bmi_dict[key]["avg_bmi"]),2), avg=avg_bmi_dict[key]))
    elif avg_bmi_dict[key] < nonsmoker_bmi_dict[key]["avg_bmi"]:
        print("This is {diff} higher than the regional mean bmi of {avg}".format(diff=round(abs(avg_bmi_dict[key]-nonsmoker_bmi_dict[key]["avg_bmi"]),2), avg=avg_bmi_dict[key]))
    #Compare nonsmoker results to the national mean
    if avg_bmi_dict['entire us'] > nonsmoker_bmi_dict[key]["avg_bmi"]:
        print("This is {diff} less than the national mean bmi of {avg}".format(diff=round(abs(avg_bmi_dict['entire us']-nonsmoker_bmi_dict[key]["avg_bmi"]),2), avg=avg_bmi_dict['entire us']))
    elif avg_bmi_dict['entire us'] < nonsmoker_bmi_dict[key]["avg_bmi"]:
        print("This is {diff} higher than the national mean bmi of {avg}".format(diff=round(abs(avg_bmi_dict['entire us']-nonsmoker_bmi_dict[key]["avg_bmi"]),2), avg=avg_bmi_dict['entire us']))
    else:
        print("This is about the same BMI as the national average.")
    print("\n \n")
    
print("END")


There are 58 smokers in the southwest (17.85% of the regional population)
Their average bmi is 31.01
This is 0.41 higher than the regional mean bmi of 30.6
This is 0.43 higher than the national mean bmi of 30.5825

There are 267 nonsmokers in the southwest (82.15% of the population)
Their average bmi is 30.51
This is 0.09 less than the regional mean bmi of 30.6
This is 0.07 less than the national mean bmi of 30.5825

 

There are 58 smokers in the northwest (17.85% of the regional population)
Their average bmi is 29.14
This is 0.06 less than the regional mean bmi of 29.2
This is 1.44 less than the national mean bmi of 30.5825

There are 267 nonsmokers in the northwest (82.15% of the population)
Their average bmi is 29.21
This is 0.01 higher than the regional mean bmi of 29.2
This is 1.37 less than the national mean bmi of 30.5825

 

There are 91 smokers in the southeast (25.0% of the regional population)
Their average bmi is 33.1
This is 0.26 less than the regional mean bmi of 33.36
T

### Take Aways:
- The East smokes more than the West
- Smoking status correlates with lower BMI in the NW, SE and NE
    - Follow up question: why? I would love to cross reference this with:
        - regional grocery store data
        - economic data (wealth, income, predominant job sectors)
        - legal data (smoking laws)
        - average population density (like, the average walkability score of households)

### Question 2:
Which region pays the lowest cost on average? How does it compare to the national average?

I will re-jigger the BmiBreakdown class to a ChargeBreakdown class:

In [11]:
class ChargesBreakdown:
    def __init__(self, dict):
        self.all_charges = []
        self.under_30_charges_lst = []
        self.over_29_charges_lst = []
        self.female_charges_lst = []
        self.male_charges_lst = []
        #Parents will be segregated by parenthood status as well as number of children (1, 2, and >=3)
        self.nonparent_charges_lst = []
        self.parent_charges_lst = []
        self.parent_1_charges_lst = []
        self.parent_2_charges_lst = []
        self.parent_3up_charges_lst = []
        self.smoker_charges_lst = []
        self.nonsmoker_charges_lst = []
        self.population = len(dict)
        for key in dict:
            #charges based on age
            for value in dict[key]:
                if value == 'age':
                    if int(dict[key]['age']) < 30:
                        self.under_30_charges_lst.append(dict[key]['charges'])
                    elif int(dict[key]['age']) >= 30:
                        self.over_29_charges_lst.append(dict[key]['charges'])
            #charges based on sex
                elif value == 'sex':
                    if dict[key]['sex'] == 'female':
                        self.female_charges_lst.append(dict[key]['charges'])
                    elif dict[key]['sex'] == 'male':
                        self.male_charges_lst.append(dict[key]['charges'])
            #charges based on parent status
                elif value == 'children':
                    if int(dict[key]['children']) > 0:
                        self.parent_charges_lst.append(dict[key]['charges'])
                    #BMI based on number of children    
                        if int(dict[key]['children']) == 1:
                            self.parent_1_charges_lst.append(dict[key]['charges'])
                        elif int(dict[key]['children']) == 2:
                            self.parent_2_charges_lst.append(dict[key]['charges'])
                        elif int(dict[key]['children']) >= 3:
                            self.parent_3up_charges_lst.append(dict[key]['charges'])       
                    else:
                        self.nonparent_charges_lst.append(dict[key]['charges'])
                #charges based on smoking status
                elif value == 'smoker':
                    if dict[key]['smoker'] == 'yes':
                        self.smoker_charges_lst.append(dict[key]['charges'])
                    else:
                        self.nonsmoker_charges_lst.append(dict[key]['charges'])
                elif value == 'charges':
                    self.all_charges.append(dict[key][value])
                else:
                    pass

#average charges method:
    def avg_charges(self):
        total_charges = 0.0
        for i in self.all_charges:
            total_charges += float(i)
        avg_charges = total_charges / len(self.all_charges)
        return round(avg_charges, 2)

#Methods based on age:
    def avg_under_30(self):
        total_charges = 0.0
        for i in self.under_30_charges_lst:
            total_charges += float(i)
        avg_charges = total_charges / len(self.under_30_charges_lst)
        return round(avg_charges, 2)

    def avg_over_29(self):
        total_charges = 0.0
        for i in self.over_29_charges_lst:
            total_charges += float(i)
        avg_charges = total_charges / len(self.over_29_charges_lst)
        return round(avg_charges, 2)
    
#Methods based on sex:
    def avg_female(self):
        total_charges = 0.0
        for i in self.female_charges_lst:
            total_charges += float(i)
        avg_charges = total_charges / len(self.female_charges_lst)
        return round(avg_charges, 2)
    
    def avg_male(self):
        total_charges = 0.0
        for i in self.male_charges_lst:
            total_charges += float(i)
        avg_charges = total_charges / len(self.male_charges_lst)
        return round(avg_charges, 2)
    
#Methods based on parental status:
    def avg_nonparent(self):
        total_charges = 0.0
        for i in self.nonparent_charges_lst:
            total_charges += float(i)
        avg_charges = total_charges / len(self.nonparent_charges_lst)
        return round(avg_charges, 2)
    
    def avg_parent(self):
        total_charges = 0.0
        for i in self.parent_charges_lst:
            total_charges += float(i)
        avg_charges = total_charges / len(self.parent_charges_lst)
        return round(avg_charges, 2)

#Parents broken down into number of children
    def avg_parent_1(self):
        total_charges = 0.0
        for i in self.parent_1_charges_lst:
            total_charges += float(i)
        avg_charges = total_charges / len(self.parent_1_charges_lst)
        return round(avg_charges, 2)
    
    def avg_parent_2(self):
        total_charges = 0.0
        for i in self.parent_2_charges_lst:
            total_charges += float(i)
        avg_charges = total_charges / len(self.parent_2_charges_lst)
        return round(avg_charges, 2)

    def avg_parent_3up(self):
        total_charges = 0.0
        for i in self.parent_3up_charges_lst:
            total_charges += float(i)
        avg_charges = total_charges / len(self.parent_3up_charges_lst)
        return round(avg_charges, 2)
    
#Methods based on Smoking status:
    def avg_smoker(self):
        total_charges = 0.0
        for i in self.smoker_charges_lst:
            total_charges += float(i)
        avg_charges = total_charges / len(self.smoker_charges_lst)
        return round(avg_charges, 2)
    
    def avg_nonsmoker(self):
        total_charges = 0.0
        for i in self.nonsmoker_charges_lst:
            total_charges += float(i)
        avg_charges = total_charges / len(self.nonsmoker_charges_lst)
        return round(avg_charges, 2)

#### I'll just copy the process that I used for BMI to break the charges into different demographics

In [12]:
sw_charges_breakdown = ChargesBreakdown(sw_dict)
nw_charges_breakdown = ChargesBreakdown(nw_dict)
se_charges_breakdown = ChargesBreakdown(se_dict)
ne_charges_breakdown = ChargesBreakdown(ne_dict)

In [26]:
sw_average_charges = sw_charges_breakdown.avg_charges()
nw_average_charges = nw_charges_breakdown.avg_charges()
se_average_charges = se_charges_breakdown.avg_charges()
ne_average_charges = ne_charges_breakdown.avg_charges()

avg_charge_dict = {
    'SouthWest': sw_average_charges,
    'NorthWest': nw_average_charges,
    'SouthEast': se_average_charges,
    'NorthEast': ne_average_charges,
    'Entire US': round(
        ((sw_average_charges+nw_average_charges+se_average_charges+ne_average_charges)/4), 2
    )
}
I seem to be writing a lot of code to print out results, can I automate this?

for key in avg_charge_dict:
    print('The average charge for medical insurance in the {key} is: \n \t ${value}'.format(key=key, value=avg_charge_dict[key]))
    if avg_charge_dict[key] > avg_charge_dict['Entire US']:
        diff_mean = round((avg_charge_dict[key] - avg_charge_dict['Entire US']), 2)
        print(
            'This is ${diff} more than the national average of ${nationalavg} \n'.format(
                diff=diff_mean, nationalavg=avg_charge_dict['Entire US']
            )
        )
    elif avg_charge_dict[key] < avg_charge_dict['Entire US']:
        diff_mean = round((avg_charge_dict['Entire US'] - avg_charge_dict[key]),2)
        print(
            'This is ${diff} less than the national average of ${nationalavg} \n'.format(
                diff=diff_mean, nationalavg=avg_charge_dict['Entire US']
            )
        ) 
    else:
        print("This is about the same as the national average")


Object `this` not found.
The average charge for medical insurance in the SouthWest is: 
 	 $12346.94
This is $879.64 less than the national average of $13226.58 

The average charge for medical insurance in the NorthWest is: 
 	 $12417.58
This is $809.0 less than the national average of $13226.58 

The average charge for medical insurance in the SouthEast is: 
 	 $14735.41
This is $1508.83 more than the national average of $13226.58 

The average charge for medical insurance in the NorthEast is: 
 	 $13406.38
This is $179.8 more than the national average of $13226.58 

The average charge for medical insurance in the Entire US is: 
 	 $13226.58
This is about the same as the national average


### I need a more comprehensive dictionary in order to break things down and compare things methinks.
#### Am I out of my element here?
#### What are the parameters again?

    Age
    Sex (Binary-normative)
    BMI
    Number of Children
    Smoking/Nonsmoking status
    Region
    Insurance Cost



In [16]:
comprehensive_dict = {
    'Southwest': {
        'population': {
            #'regional_pop': len(sw_bmi_breakdown.total_bmi_lst),
            'under_30': len(sw_bmi_breakdown.under_30_bmi_lst),
            'over_29': len(sw_bmi_breakdown.over_29_bmi_lst),
            'male': len(sw_bmi_breakdown.male_bmi_lst),
            'female': len(sw_bmi_breakdown.female_bmi_lst),
            'nonparent': len(sw_bmi_breakdown.nonparent_bmi_lst),
            'parent': len(sw_bmi_breakdown.parent_bmi_lst),
            'parent1': len(sw_bmi_breakdown.parent_1_bmi_lst),
            'parent2': len(sw_bmi_breakdown.parent_2_bmi_lst),
            'parent3up': len(sw_bmi_breakdown.parent_3up_bmi_lst),
            #'nonsmoker': len(sw_bmi_breakdown.nonsmoker_bmi_lst),
            'smoker': len(sw_bmi_breakdown.smoker_bmi_lst)
        },
        'avg_charges': {
            'regional_avg': sw_charges_breakdown.avg_charges(),
            'under_30': sw_charges_breakdown.avg_under_30(),
            'over_29': sw_charges_breakdown.avg_over_29(),
            'male': sw_charges_breakdown.avg_male(),
            'female': sw_charges_breakdown.avg_female(),
            'nonparent': sw_charges_breakdown.avg_nonparent(),
            'parent': sw_charges_breakdown.avg_parent(),
            'parent1': sw_charges_breakdown.avg_parent_1(),
            'parent2': sw_charges_breakdown.avg_parent_2(),
            'parent3up': sw_charges_breakdown.avg_parent_3up(),
            'nonsmoker': sw_charges_breakdown.avg_nonsmoker(),
            'smoker': sw_charges_breakdown.avg_smoker()
            },
        'avg_bmi': {
            'regional_avg': sw_bmi_breakdown.avg_bmi(),
            'under_30': sw_bmi_breakdown.avg_under_30(),
            'over_29': sw_bmi_breakdown.avg_over_29(),
            'male': sw_bmi_breakdown.avg_male(),
            'female': sw_bmi_breakdown.avg_female(),
            'nonparent': sw_bmi_breakdown.avg_nonparent(),
            'parent': sw_bmi_breakdown.avg_parent(),
            'parent1': sw_bmi_breakdown.avg_parent_1(),
            'parent2': sw_bmi_breakdown.avg_parent_2(),
            'parent3up': sw_bmi_breakdown.avg_parent_3up(),
            'nonsmoker': sw_bmi_breakdown.avg_nonsmoker(),
            'smoker': sw_bmi_breakdown.avg_smoker()
        }
    },
    'Northwest': {
        'population': {
            #'regional_pop': len(nw_bmi_breakdown.total_bmi_lst),
            'under_30': len(nw_bmi_breakdown.under_30_bmi_lst),
            'over_29': len(nw_bmi_breakdown.over_29_bmi_lst),
            'male': len(nw_bmi_breakdown.male_bmi_lst),
            'female': len(nw_bmi_breakdown.female_bmi_lst),
            'nonparent': len(nw_bmi_breakdown.nonparent_bmi_lst),
            'parent': len(nw_bmi_breakdown.parent_bmi_lst),
            'parent1': len(nw_bmi_breakdown.parent_1_bmi_lst),
            'parent2': len(nw_bmi_breakdown.parent_2_bmi_lst),
            'parent3up': len(nw_bmi_breakdown.parent_3up_bmi_lst),
            #'nonsmoker': len(nw_bmi_breakdown.nonsmoker_bmi_lst),
            'smoker': len(nw_bmi_breakdown.smoker_bmi_lst)
        },
        'avg_charges': {
            'regional_avg': nw_charges_breakdown.avg_charges(),
            'under_30': nw_charges_breakdown.avg_under_30(),
            'over_29': nw_charges_breakdown.avg_over_29(),
            'male': nw_charges_breakdown.avg_male(),
            'female': nw_charges_breakdown.avg_female(),
            'nonparent': nw_charges_breakdown.avg_nonparent(),
            'parent': nw_charges_breakdown.avg_parent(),
            'parent1': nw_charges_breakdown.avg_parent_1(),
            'parent2': nw_charges_breakdown.avg_parent_2(),
            'parent3up': nw_charges_breakdown.avg_parent_3up(),
            'nonsmoker': nw_charges_breakdown.avg_nonsmoker(),
            'smoker': nw_charges_breakdown.avg_smoker()
            },
        'avg_bmi': {
            'regional_avg': nw_bmi_breakdown.avg_bmi(),
            'under_30': nw_bmi_breakdown.avg_under_30(),
            'over_29': nw_bmi_breakdown.avg_over_29(),
            'male': nw_bmi_breakdown.avg_male(),
            'female': nw_bmi_breakdown.avg_female(),
            'nonparent': nw_bmi_breakdown.avg_nonparent(),
            'parent': nw_bmi_breakdown.avg_parent(),
            'parent1': nw_bmi_breakdown.avg_parent_1(),
            'parent2': nw_bmi_breakdown.avg_parent_2(),
            'parent3up': nw_bmi_breakdown.avg_parent_3up(),
            'nonsmoker': nw_bmi_breakdown.avg_nonsmoker(),
            'smoker': nw_bmi_breakdown.avg_smoker()
        }
    },
    "Southeast": {
        'population': {
            #'regional_pop': len(se_bmi_breakdown.total_bmi_lst),
            'under_30': len(se_bmi_breakdown.under_30_bmi_lst),
            'over_29': len(se_bmi_breakdown.over_29_bmi_lst),
            'male': len(se_bmi_breakdown.male_bmi_lst),
            'female': len(se_bmi_breakdown.female_bmi_lst),
            'nonparent': len(se_bmi_breakdown.nonparent_bmi_lst),
            'parent': len(se_bmi_breakdown.parent_bmi_lst),
            'parent1': len(se_bmi_breakdown.parent_1_bmi_lst),
            'parent2': len(se_bmi_breakdown.parent_2_bmi_lst),
            'parent3up': len(se_bmi_breakdown.parent_3up_bmi_lst),
            'nonsmoker': len(se_bmi_breakdown.nonsmoker_bmi_lst),
            'smoker': len(se_bmi_breakdown.smoker_bmi_lst)
        },
        'avg_charges': {
            'regional_avg': se_charges_breakdown.avg_charges(),
            'under_30': se_charges_breakdown.avg_under_30(),
            'over_29': se_charges_breakdown.avg_over_29(),
            'male': se_charges_breakdown.avg_male(),
            'female': se_charges_breakdown.avg_female(),
            'nonparent': se_charges_breakdown.avg_nonparent(),
            'parent': se_charges_breakdown.avg_parent(),
            'parent1': se_charges_breakdown.avg_parent_1(),
            'parent2': se_charges_breakdown.avg_parent_2(),
            'parent3up': se_charges_breakdown.avg_parent_3up(),
            'nonsmoker': se_charges_breakdown.avg_nonsmoker(),
            'smoker': se_charges_breakdown.avg_smoker()
            },
        'avg_bmi': {
            'regional_avg': se_bmi_breakdown.avg_bmi(),
            'under_30': se_bmi_breakdown.avg_under_30(),
            'over_29': se_bmi_breakdown.avg_over_29(),
            'male': se_bmi_breakdown.avg_male(),
            'female': se_bmi_breakdown.avg_female(),
            'nonparent': se_bmi_breakdown.avg_nonparent(),
            'parent': se_bmi_breakdown.avg_parent(),
            'parent1': se_bmi_breakdown.avg_parent_1(),
            'parent2': se_bmi_breakdown.avg_parent_2(),
            'parent3up': se_bmi_breakdown.avg_parent_3up(),
            'nonsmoker': se_bmi_breakdown.avg_nonsmoker(),
            'smoker': se_bmi_breakdown.avg_smoker()
        }
    },
    "Northeast": {
        'population': {
            #'regional_pop': len(ne_bmi_breakdown.total_bmi_lst),
            'under_30': len(ne_bmi_breakdown.under_30_bmi_lst),
            'over_29': len(ne_bmi_breakdown.over_29_bmi_lst),
            'male': len(ne_bmi_breakdown.male_bmi_lst),
            'female': len(ne_bmi_breakdown.female_bmi_lst),
            'nonparent': len(ne_bmi_breakdown.nonparent_bmi_lst),
            'parent': len(ne_bmi_breakdown.parent_bmi_lst),
            'parent1': len(ne_bmi_breakdown.parent_1_bmi_lst),
            'parent2': len(ne_bmi_breakdown.parent_2_bmi_lst),
            'parent3up': len(ne_bmi_breakdown.parent_3up_bmi_lst),
            #'nonsmoker': len(ne_bmi_breakdown.nonsmoker_bmi_lst),
            'smoker': len(ne_bmi_breakdown.smoker_bmi_lst)
        },
        'avg_charges': {
            'regional_avg': ne_charges_breakdown.avg_charges(),
            'under_30': ne_charges_breakdown.avg_under_30(),
            'over_29': ne_charges_breakdown.avg_over_29(),
            'male': ne_charges_breakdown.avg_male(),
            'female': ne_charges_breakdown.avg_female(),
            'nonparent': ne_charges_breakdown.avg_nonparent(),
            'parent': ne_charges_breakdown.avg_parent(),
            'parent1': ne_charges_breakdown.avg_parent_1(),
            'parent2': ne_charges_breakdown.avg_parent_2(),
            'parent3up': ne_charges_breakdown.avg_parent_3up(),
            'nonsmoker': ne_charges_breakdown.avg_nonsmoker(),
            'smoker': ne_charges_breakdown.avg_smoker()
            },
        'avg_bmi': {
            'regional_avg': ne_bmi_breakdown.avg_bmi(),
            'under_30': ne_bmi_breakdown.avg_under_30(),
            'over_29': ne_bmi_breakdown.avg_over_29(),
            'male': ne_bmi_breakdown.avg_male(),
            'female': ne_bmi_breakdown.avg_female(),
            'nonparent': ne_bmi_breakdown.avg_nonparent(),
            'parent': ne_bmi_breakdown.avg_parent(),
            'parent1': ne_bmi_breakdown.avg_parent_1(),
            'parent2': ne_bmi_breakdown.avg_parent_2(),
            'parent3up': ne_bmi_breakdown.avg_parent_3up(),
            'nonsmoker': ne_bmi_breakdown.avg_nonsmoker(),
            'smoker': ne_bmi_breakdown.avg_smoker()
        }
    }
}

for region in comprehensive_dict:
    print(region)
    for key in comprehensive_dict[region]:
        highest_value = 0.0
        lowest_value = 100000.0
        for demo in comprehensive_dict[region][key]:
            if comprehensive_dict[region][key][demo] > highest_value:
                highest_value = comprehensive_dict[region][key][demo]
                highest_demo = demo
            elif comprehensive_dict[region][key][demo] < lowest_value:
                lowest_value = comprehensive_dict[region][key][demo]
                lowest_demo = demo
            else:
                pass
        print("\t The demo with the highest " + key + " is: ")
        print("\t \t" + highest_demo + " with " + str(highest_value))
        print("\t The demo with the lowest " + key + " is: ")
        print("\t \t" + lowest_demo + " with " + str(lowest_value) + "\n")


Southwest
	 The demo with the highest population is: 
	 	over_29 with 226
	 The demo with the lowest population is: 
	 	parent3up with 52

	 The demo with the highest avg_charges is: 
	 	smoker with 32269.06
	 The demo with the lowest avg_charges is: 
	 	nonsmoker with 8019.28

	 The demo with the highest avg_bmi is: 
	 	parent2 with 32.28
	 The demo with the lowest avg_bmi is: 
	 	under_30 with 29.08

Northwest
	 The demo with the highest population is: 
	 	over_29 with 223
	 The demo with the lowest population is: 
	 	parent3up with 53

	 The demo with the highest avg_charges is: 
	 	smoker with 30192.0
	 The demo with the lowest avg_charges is: 
	 	under_30 with 7981.65

	 The demo with the highest avg_bmi is: 
	 	parent3up with 30.27
	 The demo with the lowest avg_bmi is: 
	 	under_30 with 28.54

Southeast
	 The demo with the highest population is: 
	 	nonsmoker with 273
	 The demo with the lowest population is: 
	 	parent3up with 46

	 The demo with the highest avg_charges is: 
	 

Blerrrrgh I'm ready to move onto the next lesson. I'd love to break this down and really finish the project, but the issue I'm running into is that I've got such a small amount of time to do this kind of thing and I feel like I'm forgetting the fundamentals. I'm going to put a pin in this jupyter and move onto the next unit.