# U.S. Medical Insurance Costs
This notebook analyzes insurance.csv, a dataset of medical insurance costs for people in the United States. Throughout this notebook, we compare smokers across four age decades and measure how each decade of smokers differs from non-smokers of the same age.

## Goals
### To Determine:

What is the average cost of medical insurance for smokers in each decade, and how does it compare with non-smokers?

What is the most common ("average") gender among smokers in each decade, and how does it compare with non-smokers?

What is the average BMI of smokers in each decade, and how does it compare with non-smokers?

What is the average number of children of smokers in each decade, and how does it compare with non-smokers?

What is the most common ("average") region of smokers in each decade, and how does it compare with non-smokers?

### Decades Being Examined: 20-29, 30-39, 40-49, 50-59
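Each decade above is an inclusive age band. A minimal sketch of mapping an age to its band, assuming whole-number ages as in the `age` column; the function name `decade_of` is hypothetical and not part of this notebook:

```python
def decade_of(age):
    """Return the inclusive (start, end) decade band for an integer age, e.g. 34 -> (30, 39)."""
    start = (age // 10) * 10  # floor the age to the nearest multiple of ten
    return (start, start + 9)

# A few ages from the examined range and the band each one falls into
for age in (20, 29, 30, 45, 59):
    print(age, decade_of(age))
```

Ages outside 20-59 still map to a band; they are simply not among the decades examined here.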

First, we import the library we need.

In [1]:
import csv

Next, we create a helper function that returns the list of records whose age falls within a given range (inclusive) and whose smoker status matches the one requested.

In [2]:
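def age_range_maker(age_start, age_end, smoker_status):
    # Return every record whose age lies in [age_start, age_end] (inclusive)
    # and whose 'smoker' column matches smoker_status ('yes' or 'no').
    with open('insurance.csv', newline='') as insurance_csv:
        insurance_data = csv.DictReader(insurance_csv)
        age_range = []
        for record in insurance_data:
            # CSV values are strings, so convert the age before comparing
            if age_start <= int(record['age']) <= age_end and record['smoker'] == smoker_status:
                age_range.append(record)
        return age_range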
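`age_range_maker` reopens and rereads `insurance.csv` on every call, so building all eight groups parses the file eight times. A minimal alternative sketch, assuming the file fits comfortably in memory: load the rows once with `csv.DictReader`, then filter the cached list. The names `load_records` and `filter_records` are hypothetical, and the example reads a small made-up CSV through `io.StringIO` instead of the real file.

```python
import csv
import io

def load_records(file_obj):
    """Read every row of an open CSV file into a list of dicts (done once)."""
    return list(csv.DictReader(file_obj))

def filter_records(records, age_start, age_end, smoker_status):
    """Same filter as age_range_maker, applied to rows that are already loaded."""
    return [
        record for record in records
        if age_start <= int(record['age']) <= age_end
        and record['smoker'] == smoker_status
    ]

# Hypothetical sample rows using the same column names as insurance.csv
sample_csv = io.StringIO(
    "age,sex,bmi,children,smoker,region,charges\n"
    "25,male,30.1,0,yes,southeast,20000.00\n"
    "27,female,22.5,1,no,northwest,3000.00\n"
    "41,male,28.0,2,no,southwest,8000.00\n"
)
records = load_records(sample_csv)
print(len(filter_records(records, 20, 29, 'yes')))  # 1
```

With the real file, the loading step would be `with open('insurance.csv', newline='') as f: records = load_records(f)`.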

We then create a list of records for each decade of smokers and non-smokers we plan to examine.

In [3]:
twenties_list = age_range_maker(20,29,'no')
twenties_smokers_list = age_range_maker(20,29,'yes')
thirties_list = age_range_maker(30,39,'no')
thirties_smokers_list = age_range_maker(30,39,'yes')
forties_list = age_range_maker(40,49,'no')
forties_smokers_list = age_range_maker(40,49,'yes')
fifties_list = age_range_maker(50,59,'no')
fifties_smokers_list = age_range_maker(50,59,'yes')
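The eight assignments above follow one pattern, so they could also be generated in a loop and stored in a dictionary. A minimal sketch, assuming a filter function with the same `(age_start, age_end, smoker_status)` signature as `age_range_maker`; `build_groups`, `stand_in_filter` and the sample rows are hypothetical, included only so the block runs on its own:

```python
# Hypothetical sample rows; in the notebook these would come from insurance.csv
sample_records = [
    {'age': '23', 'smoker': 'yes'},
    {'age': '35', 'smoker': 'no'},
    {'age': '52', 'smoker': 'no'},
]

def stand_in_filter(age_start, age_end, smoker_status):
    """Stand-in for age_range_maker that filters sample_records instead of the file."""
    return [r for r in sample_records
            if age_start <= int(r['age']) <= age_end and r['smoker'] == smoker_status]

def build_groups(filter_func, decade_starts=(20, 30, 40, 50)):
    """Return {(decade_start, smoker_status): records} for every decade and status."""
    groups = {}
    for start in decade_starts:
        for status in ('no', 'yes'):
            groups[(start, status)] = filter_func(start, start + 9, status)
    return groups

groups = build_groups(stand_in_filter)
print(len(groups))              # 8
print(len(groups[(30, 'no')]))  # 1
```

In the notebook, `build_groups(age_range_maker)` would produce the same eight lists, with `groups[(20, 'yes')]` playing the role of `twenties_smokers_list`.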

Next, we create a pair of helper functions that the class methods will use to compute the type of average each category needs.

These are the types of averages we will be using for each category:

* Average Charge -> Mean
* Average Gender -> Mode
* Average BMI -> Mean
* Average Number of Children -> Mean (rounded to the nearest whole number)
* Average Region -> Mode
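Python's standard-library `statistics` module provides these measures directly. A short sketch comparing `statistics.mean`, `statistics.median` and `statistics.mode` on a few hypothetical values (not taken from insurance.csv); it shows how far the mean and the median of the same data can drift apart when one value is much larger than the rest:

```python
import statistics

# Hypothetical charges with one large value
charges = [1000.0, 2000.0, 3000.0, 30000.0]
print(statistics.mean(charges))    # 9000.0
print(statistics.median(charges))  # 2500.0 (average of the two middle values)

# Mode works on categorical data such as the 'sex' or 'region' columns
sexes = ['male', 'female', 'male']
print(statistics.mode(sexes))      # male
```

In Python 3.8 and later, `statistics.mode` returns the first value encountered when several values tie for most common.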

In [4]:
def average_by_mean(list_values):
    # Values come from the CSV as strings, so convert each one to float before summing
    total_value = 0.0
    for value in list_values:
        total_value += float(value)
    final_value = total_value / len(list_values)
    return round(final_value, 2)

In [5]:
def average_by_mode(*groups):
    # Each argument is a list of records sharing one category value: two lists
    # split by 'sex' (gender) or four lists split by 'region'.
    # The mode is the category whose list holds the most records.
    field = 'sex' if len(groups) == 2 else 'region'
    non_empty = [group for group in groups if group]
    if not non_empty:
        return None
    # max() keeps the earliest argument on a tie, and empty lists are skipped,
    # so the result is always the largest group
    largest = max(non_empty, key=len)
    return largest[0][field]
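`average_by_mode` infers which column to report from the number of lists it receives, which ties it to the gender and region cases. A more general sketch counts a column's values directly with `collections.Counter`; the function name `mode_of_field` and the sample records are hypothetical:

```python
from collections import Counter

def mode_of_field(records, field):
    """Return the most common value of `field` across the records, or None if empty."""
    counts = Counter(record[field] for record in records)
    if not counts:
        return None
    # most_common(1) returns [(value, count)]; equal counts keep first-seen order
    return counts.most_common(1)[0][0]

sample = [
    {'sex': 'male', 'region': 'southeast'},
    {'sex': 'female', 'region': 'southeast'},
    {'sex': 'male', 'region': 'northwest'},
]
print(mode_of_field(sample, 'sex'))     # male
print(mode_of_field(sample, 'region'))  # southeast
```

This version needs no pre-split lists, so the same function serves any categorical column.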

Next, we create a `decade_analyzer` class. Each object is built from one of the record lists above and provides methods that analyze those records.

The analysis methods will be:

* `average_charge()`
* `average_gender()`
* `average_BMI()`
* `average_children()`
* `average_region()`

In [6]:
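class decade_analyzer:
    # Wraps the records for one decade and smoker status and prints summary
    # statistics; `definition` labels the group in each printed message.
    def __init__(self, records):
        self.records = records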

    def average_charge(self, definition):
        list_values = []
        for record in self.records:
            list_values.append(record['charges'])
        avg_charge = average_by_mean(list_values)
        print('The average charge for {group_definition} is: ${avg}'.format(group_definition=definition, avg=avg_charge))

    def average_gender(self, definition):
        males_list = []
        females_list = []
        for record in self.records:
            if record['sex'] == 'male':
                males_list.append(record)
            else:
                females_list.append(record)
        avg_gender = average_by_mode(males_list, females_list)
        print('The average gender for {group_definition} is: {avg}'.format(group_definition=definition, avg=avg_gender))

    def average_BMI(self, definition):
        list_values = []
        for record in self.records:
            list_values.append(record['bmi'])
        avg_bmi = average_by_mean(list_values)
        print('The average BMI for {group_definition} is: {avg}'.format(group_definition=definition, avg=avg_bmi))

    def average_children(self, definition):
        list_values = []
        for record in self.records:
            list_values.append(record['children'])
        avg_children = round(average_by_mean(list_values))
        print('The average amount of children for {group_definition} is: {avg}'.format(group_definition=definition, avg=avg_children))

    def average_region(self, definition):
        north_west = []
        north_east = []
        south_west = []
        south_east = []
        for record in self.records:
            if record['region'] == 'northwest':
                north_west.append(record)
            elif record['region'] == 'northeast':
                north_east.append(record)
            elif record['region'] == 'southwest':
                south_west.append(record)
            else:
                south_east.append(record)
        avg_region = average_by_mode(north_west, north_east, south_west, south_east)
        print('The average region for {group_definition} is: {avg}'.format(group_definition=definition, avg=avg_region))
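Because each `decade_analyzer` method prints its result instead of returning it, the smoker vs. non-smoker differences named in the goals have to be read off the printed output. A minimal sketch of the alternative, assuming the analysis returned numbers; `mean_of_field`, `compare_groups` and the sample records are hypothetical:

```python
def mean_of_field(records, field):
    """Mean of a numeric CSV column (values arrive as strings), rounded to 2 places."""
    values = [float(record[field]) for record in records]
    return round(sum(values) / len(values), 2)

def compare_groups(non_smokers, smokers, field):
    """Return (non_smoker_mean, smoker_mean, difference) for one numeric column."""
    non_smoker_mean = mean_of_field(non_smokers, field)
    smoker_mean = mean_of_field(smokers, field)
    return non_smoker_mean, smoker_mean, round(smoker_mean - non_smoker_mean, 2)

# Hypothetical records, not rows from insurance.csv
non_smokers = [{'charges': '4000.00'}, {'charges': '6000.00'}]
smokers = [{'charges': '25000.00'}, {'charges': '35000.00'}]
print(compare_groups(non_smokers, smokers, 'charges'))  # (5000.0, 30000.0, 25000.0)
```

Returning values keeps computation and presentation separate, so the caller decides whether to print, tabulate or compare.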

We then create an analyzer object for each decade of smokers and non-smokers.

In [7]:
twenties = decade_analyzer(twenties_list)
twenties_smokers = decade_analyzer(twenties_smokers_list)

thirties = decade_analyzer(thirties_list)
thirties_smokers = decade_analyzer(thirties_smokers_list)

forties = decade_analyzer(forties_list)
forties_smokers = decade_analyzer(forties_smokers_list)

fifties = decade_analyzer(fifties_list)
fifties_smokers = decade_analyzer(fifties_smokers_list)

With the methods and data in place, we analyze each decade independently, comparing smokers with non-smokers.

### Twenties

In [8]:
twenties.average_charge('twenties non-smokers')
twenties_smokers.average_charge('twenties smokers')

The average charge for twenties non-smokers is: $4921.63
The average charge for twenties smokers is: $28122.22


In [9]:
twenties.average_gender('twenties non-smokers')
twenties_smokers.average_gender('twenties smokers')

The average gender for twenties non-smokers is: male
The average gender for twenties smokers is: male


In [10]:
twenties.average_BMI('twenties non-smokers')
twenties_smokers.average_BMI('twenties smokers')

The average BMI for twenties non-smokers is: 29.64
The average BMI for twenties smokers is: 30.37


In [11]:
twenties.average_children('twenties non-smokers')
twenties_smokers.average_children('twenties smokers')

The average amount of children for twenties non-smokers is: 1
The average amount of children for twenties smokers is: 1


In [12]:
twenties.average_region('twenties non-smokers')
twenties_smokers.average_region('twenties smokers')

The average region for twenties non-smokers is: northwest
The average region for twenties smokers is: southeast


### Thirties

In [13]:
thirties.average_charge('thirties non-smokers')
thirties_smokers.average_charge('thirties smokers')

The average charge for thirties non-smokers is: $6337.36
The average charge for thirties smokers is: $30271.25


In [14]:
thirties.average_gender('thirties non-smokers')
thirties_smokers.average_gender('thirties smokers')

The average gender for thirties non-smokers is: female
The average gender for thirties smokers is: male


In [15]:
thirties.average_BMI('thirties non-smokers')
thirties_smokers.average_BMI('thirties smokers')

The average BMI for thirties non-smokers is: 30.42
The average BMI for thirties smokers is: 30.54


In [16]:
thirties.average_children('thirties non-smokers')
thirties_smokers.average_children('thirties smokers')

The average amount of children for thirties non-smokers is: 2
The average amount of children for thirties smokers is: 2


In [17]:
thirties.average_region('thirties non-smokers')
thirties_smokers.average_region('thirties smokers')

The average region for thirties non-smokers is: southeast
The average region for thirties smokers is: southwest


### Forties

In [18]:
forties.average_charge('forties non-smokers')
forties_smokers.average_charge('forties smokers')

The average charge for forties non-smokers is: $9183.34
The average charge for forties smokers is: $32654.72


In [19]:
forties.average_gender('forties non-smokers')
forties_smokers.average_gender('forties smokers')

The average gender for forties non-smokers is: female
The average gender for forties smokers is: male


In [20]:
forties.average_BMI('forties non-smokers')
forties_smokers.average_BMI('forties smokers')

The average BMI for forties non-smokers is: 30.87
The average BMI for forties smokers is: 30.14


In [21]:
forties.average_children('forties non-smokers')
forties_smokers.average_children('forties smokers')

The average amount of children for forties non-smokers is: 1
The average amount of children for forties smokers is: 1


In [22]:
forties.average_region('forties non-smokers')
forties_smokers.average_region('forties smokers')

The average region for forties non-smokers is: southwest
The average region for forties smokers is: southeast


### Fifties

In [23]:
fifties.average_charge('fifties non-smokers')
fifties_smokers.average_charge('fifties smokers')

The average charge for fifties non-smokers is: $12749.34
The average charge for fifties smokers is: $37508.75


In [24]:
fifties.average_gender('fifties non-smokers')
fifties_smokers.average_gender('fifties smokers')

The average gender for fifties non-smokers is: female
The average gender for fifties smokers is: male


In [25]:
fifties.average_BMI('fifties non-smokers')
fifties_smokers.average_BMI('fifties smokers')

The average BMI for fifties non-smokers is: 31.48
The average BMI for fifties smokers is: 31.66


In [26]:
fifties.average_children('fifties non-smokers')
fifties_smokers.average_children('fifties smokers')

The average amount of children for fifties non-smokers is: 1
The average amount of children for fifties smokers is: 1


In [27]:
fifties.average_region('fifties non-smokers')
fifties_smokers.average_region('fifties smokers')

The average region for fifties non-smokers is: southwest
The average region for fifties smokers is: southeast
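The charge averages printed above can be compared directly. A short sketch that computes, for each decade, how much more smokers pay than non-smokers and the ratio between the two, using the values hard-coded from the outputs above:

```python
# Average charges (non-smokers, smokers) as printed in the outputs above
reported_charges = {
    '20-29': (4921.63, 28122.22),
    '30-39': (6337.36, 30271.25),
    '40-49': (9183.34, 32654.72),
    '50-59': (12749.34, 37508.75),
}

for decade, (non_smoker, smoker) in reported_charges.items():
    difference = round(smoker - non_smoker, 2)
    ratio = round(smoker / non_smoker, 2)
    print(f'{decade}: smokers pay ${difference} more ({ratio}x non-smokers)')
```

The absolute gap stays between roughly $23,000 and $25,000 in every decade, while the ratio falls from about 5.71x to about 2.94x because non-smoker charges rise with age.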


## Overall Analysis Conclusion