# U.S. Medical Insurance Costs
This notebook analyzes `insurance.csv`, a dataset of medical insurance costs for people in the United States. Throughout the notebook we compare smokers across age decades and check how each decade of smokers compares with non-smokers of the same age.

## Goals
### To Determine:

What is the average cost of medical insurance for smokers in each decade, and how does it differ from non-smokers?

What is the most common gender among smokers in each decade, and how does it differ from non-smokers?

What is the average BMI of smokers in each decade, and how does it differ from non-smokers?

What is the average number of children for smokers in each decade, and how does it differ from non-smokers?

What is the most common region among smokers in each decade, and how does it differ from non-smokers?

### Decades Being Examined: (20-29), (30-39), (40-49), (50-59)

First, we import the library we need.

In [9]:
import csv

Next, we create a helper function that builds a list of the records that fall within a given age range and have a given smoker status.

In [10]:
def age_range_maker(age_start, age_end, smoker_status):
    # Return every record whose age lies in [age_start, age_end] (inclusive)
    # and whose 'smoker' column equals smoker_status ('yes' or 'no').
    with open('insurance.csv', newline='') as insurance_csv:
        insurance_data = csv.DictReader(insurance_csv)
        age_range = []
        for record in insurance_data:
            age = int(record['age'])
            if age_start <= age <= age_end and record['smoker'] == smoker_status:
                age_range.append(record)
        return age_range
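
Because `age_range_maker` reopens `insurance.csv` on every call, one possible variation is to read the file once and filter the rows in memory. The sketch below assumes that approach. `load_records` and `filter_records` are hypothetical names that do not appear in this notebook, and the sample rows are made up purely so the example runs without the real file.

```python
import csv
import io

def load_records(csv_file):
    """Read every row of an open CSV file into a list of dicts."""
    return list(csv.DictReader(csv_file))

def filter_records(records, age_start, age_end, smoker_status):
    """Keep rows whose age is in [age_start, age_end] with the given smoker flag."""
    return [
        record for record in records
        if age_start <= int(record['age']) <= age_end
        and record['smoker'] == smoker_status
    ]

# Made-up rows that use the column names this notebook reads from insurance.csv.
sample_csv = io.StringIO(
    "age,sex,bmi,children,smoker,region,charges\n"
    "25,male,22.0,0,yes,southwest,100.0\n"
    "29,female,30.5,1,no,northeast,200.0\n"
    "31,male,27.1,2,yes,southeast,300.0\n"
)
records = load_records(sample_csv)
twenties_smokers_sample = filter_records(records, 20, 29, 'yes')
print(len(twenties_smokers_sample))
```

With the real data, `load_records` would be called once on the opened `insurance.csv`, and each decade would then be a cheap in-memory filter.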

We then create a list of records for each decade of smokers and non-smokers we plan to examine.

In [12]:
twenties = age_range_maker(20,29,'no')
twenties_smokers = age_range_maker(20,29,'yes')
thirties = age_range_maker(30,39,'no')
thirties_smokers = age_range_maker(30,39,'yes')
forties = age_range_maker(40,49,'no')
forties_smokers = age_range_maker(40,49,'yes')
fifties = age_range_maker(50,59,'no')
fifties_smokers = age_range_maker(50,59,'yes')
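
The eight lists above could also be built in a single pass by bucketing each record under a `(decade, smoker)` key. This is only a sketch of that idea: `group_by_decade` is a hypothetical helper, and the sample records are invented and contain only the two columns the helper needs.

```python
from collections import defaultdict

def group_by_decade(records, first_decade=20, last_decade=50):
    """Bucket records by (decade, smoker flag), for example (20, 'yes')."""
    groups = defaultdict(list)
    for record in records:
        # Integer division maps ages 20-29 to 20, 30-39 to 30, and so on.
        decade = (int(record['age']) // 10) * 10
        if first_decade <= decade <= last_decade:
            groups[(decade, record['smoker'])].append(record)
    return groups

# Invented records; ages 19 and 60 fall outside the decades being examined.
sample_records = [
    {'age': '19', 'smoker': 'no'},
    {'age': '24', 'smoker': 'yes'},
    {'age': '27', 'smoker': 'no'},
    {'age': '59', 'smoker': 'yes'},
    {'age': '60', 'smoker': 'yes'},
]
groups = group_by_decade(sample_records)
for key in sorted(groups):
    print(key, len(groups[key]))
```

Separate named variables, as used in this notebook, are easier to read in later cells. The dictionary form mainly helps when looping over every decade.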

Our next goal is to create a pair of helper functions that the class methods will use to compute the appropriate type of average.

These are the types of average we will use for each category:

* Average Charge -> Mean
* Average Gender -> Mode
* Average BMI -> Mean
* Average Amount of Children -> Mean (rounded to a whole number)
* Average Region -> Mode
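
Mean, median, and mode can give quite different answers on the same data, so it helps to be explicit about which one each category uses. As a rough illustration, here is how the three compare using Python's standard `statistics` module on made-up values. The numbers and region labels are invented for the example and are not taken from `insurance.csv`.

```python
import statistics

# Made-up charges, deliberately skewed by one large value.
charges = [1000.0, 1200.0, 1500.0, 40000.0]
# Made-up categorical values.
regions = ['southeast', 'northwest', 'southeast']

mean_charge = statistics.mean(charges)        # 10925.0, pulled up by the outlier
median_charge = statistics.median(charges)    # 1350.0, the middle of the sorted values
most_common_region = statistics.mode(regions) # 'southeast'

print(mean_charge, median_charge, most_common_region)
```

Numeric columns such as charges and BMI can use either mean or median. Categorical columns such as gender and region only have a meaningful mode.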

In [15]:
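def average_by_mean(list_values):
    # Values come straight from the CSV as strings, so convert each to float.
    total_value = 0.0
    for value in list_values:
        total_value += float(value)
    # Note: an empty list raises ZeroDivisionError here.
    final_value = total_value / len(list_values)
    return round(final_value, 2)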

In [14]:
def average_by_mode(*args):
    # Each argument is a list of records that all share one category value:
    # two lists when grouping by sex, four lists when grouping by region.
    if len(args) == 2:
        key = 'sex'
    elif len(args) == 4:
        key = 'region'
    else:
        raise ValueError('expected 2 groups (sex) or 4 groups (region)')
    # The mode is the value held by the largest group. On a tie, max() keeps
    # the earliest group passed in rather than falling through to the last one.
    largest_group = max(args, key=len)
    if not largest_group:
        raise ValueError('cannot take the mode of empty groups')
    return largest_group[0][key]
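
`average_by_mode` relies on the caller splitting the records into one list per category first. An alternative is to count the values of a column directly with `collections.Counter`. This is a minimal sketch of that approach: `mode_of_column` is a hypothetical name, and the sample records are invented.

```python
from collections import Counter

def mode_of_column(records, column):
    """Return the most frequent value in `column`, or None if there are no records."""
    counts = Counter(record[column] for record in records)
    if not counts:
        return None
    # most_common(1) returns a one-element list of (value, count) pairs.
    value, _count = counts.most_common(1)[0]
    return value

# Invented records with only the columns needed here.
sample_records = [
    {'sex': 'female', 'region': 'southeast'},
    {'sex': 'male', 'region': 'northwest'},
    {'sex': 'female', 'region': 'southeast'},
]
print(mode_of_column(sample_records, 'sex'))
print(mode_of_column(sample_records, 'region'))
```

This version works for any categorical column and does not need to know how many distinct values exist. On a tie, `Counter.most_common` returns the value that was encountered first.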

Next, we create a `decade_analyzer` class. Each object is built from the list of records passed in at creation and provides methods that run an analysis on those records.

The analysis methods will be:

* `average_charge()`
* `average_gender()`
* `average_BMI()`
* `average_children()`
* `average_region()`

In [16]:
class decade_analyzer:
    def __init__(self, records):
        self.records = records

    def average_charge(self):
        list_values = []
        for record in self.records:
            list_values.append(record['charges'])
        a_charge = average_by_mean(list_values)
        print('The average charge is: ' + str(a_charge))

    def average_gender(self):
        males_list = []
        females_list = []
        for record in self.records:
            if record['sex'] == 'male':
                males_list.append(record)
            else:
                females_list.append(record)
        avg_gender = average_by_mode(males_list, females_list)
        print('The average gender is: ' + avg_gender)

    def average_BMI(self):
        list_values = []
        for record in self.records:
            list_values.append(record['bmi'])
        a_bmi = average_by_mean(list_values)
        print('The average BMI is: ' + str(a_bmi))

    def average_children(self):
        list_values = []
        for record in self.records:
            list_values.append(record['children'])
        a_children = round(average_by_mean(list_values))
        print('The average amount of children is: ' + str(a_children))

    def average_region(self):
        north_west = []
        north_east = []
        south_west = []
        south_east = []
        for record in self.records:
            if record['region'] == 'northwest':
                north_west.append(record)
            elif record['region'] == 'northeast':
                north_east.append(record)
            elif record['region'] == 'southwest':
                south_west.append(record)
            else:
                south_east.append(record)
        avg_region = average_by_mode(north_west, north_east, south_west, south_east)
        print('The average region is: ' + avg_region)
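
The methods above print their results, which is convenient for reading but makes it hard to compute the smoker versus non-smoker differences listed in the goals. One way to get at those differences, sketched here under the assumption that returning numbers is acceptable, is a pair of small functions. `mean_of` and `difference_in_means` are hypothetical helpers, and the sample charges are invented.

```python
def mean_of(records, column):
    """Mean of a numeric column, rounded to 2 decimals (CSV values are strings)."""
    values = [float(record[column]) for record in records]
    return round(sum(values) / len(values), 2)

def difference_in_means(smokers, non_smokers, column):
    """How much higher (positive) or lower (negative) the smokers' mean is."""
    return round(mean_of(smokers, column) - mean_of(non_smokers, column), 2)

# Invented records with only the 'charges' column.
smokers_sample = [{'charges': '30000.0'}, {'charges': '34000.0'}]
non_smokers_sample = [{'charges': '4000.0'}, {'charges': '6000.0'}]

gap = difference_in_means(smokers_sample, non_smokers_sample, 'charges')
print('Smokers pay on average ' + str(gap) + ' more')
```

The same two functions would work for the `bmi` and `children` columns, since both are numeric.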

We then create an analyzer object for each decade of smokers and non-smokers.

In [18]:
twenties_analyzed = decade_analyzer(twenties)
twenties_smokers_analyzed = decade_analyzer(twenties_smokers)

thirties_analyzed = decade_analyzer(thirties)
thirties_smokers_analyzed = decade_analyzer(thirties_smokers)

forties_analyzed = decade_analyzer(forties)
forties_smokers_analyzed = decade_analyzer(forties_smokers)

fifties_analyzed = decade_analyzer(fifties)
fifties_smokers_analyzed = decade_analyzer(fifties_smokers)