# U.S. Medical Insurance Costs

In this project, a CSV file (insurance.csv) containing patient data and medical insurance charges is analyzed using Python. The goal is to investigate how each patient characteristic recorded in insurance.csv (age, sex, BMI, number of children, smoking status, and region) relates to the patient's medical insurance charges.

## Code breakdown:

```python
import csv
```

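The `csv` module is needed to read the file conveniently. Its `csv.DictReader` class, used in parse_data below, returns each row as a dictionary keyed by the column names in the header line.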
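The following is a minimal sketch of what `csv.DictReader` returns. It reads two made-up rows from an in-memory string, with `io.StringIO` standing in for the real insurance.csv. The rows and the `sample_csv` name are hypothetical.

```python
import csv
import io

# Hypothetical rows in the same column layout as insurance.csv;
# io.StringIO stands in for open('insurance.csv', newline='').
sample_csv = io.StringIO(
    "age,sex,bmi,children,smoker,region,charges\n"
    "30,female,25.0,1,no,northeast,5000.0\n"
    "45,male,31.5,2,yes,southeast,30000.0\n"
)

# Each row is a dict keyed by the header line; every value is still a string.
rows = list(csv.DictReader(sample_csv))
print(rows[0]["age"], type(rows[0]["age"]).__name__)  # 30 str
```

Every value arrives as a string, so the numeric columns have to be converted after parsing.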

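```python
def average_check(iterable, check, values_to_average):
    # Average the entries of values_to_average whose matching entry in
    # iterable passes the membership test `value in check`.
    # `check` should be a collection (list, range, ...): with a bare string,
    # `in` becomes a substring test ('male' in 'female' is True).
    total = 0
    count = 0
    for value, charge in zip(iterable, values_to_average):
        if value in check:
            count += 1
            total += charge
    # Return 0 for an empty group instead of dividing by zero.
    if count == 0:
        return 0
    return total / count
```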

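This function pairs each entry of `iterable` with the matching entry of `values_to_average` and keeps the values whose entry passes the filter `entry in check`. It returns their average, or 0 if nothing matches. Because the filter uses Python's `in` operator, `check` should be a collection of accepted values such as a list or range. With a bare string, `in` becomes a substring test, so `'male' in 'female'` is `True`.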
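Here is a small, self-contained sketch of the substring pitfall and one way to avoid it. The helper `average_where` is hypothetical and not part of the original notebook. It wraps a lone string in a set, so `in` always tests whole-value membership. The data are made up.

```python
def average_where(column, accepted, charges):
    """Average `charges` over rows whose `column` value is in `accepted`.

    Hypothetical helper: a lone string is treated as a single accepted value,
    so the membership test compares whole values instead of substrings.
    """
    accepted = {accepted} if isinstance(accepted, str) else set(accepted)
    selected = [c for value, c in zip(column, charges) if value in accepted]
    # Like average_check, return 0 when no row matches.
    return sum(selected) / len(selected) if selected else 0


sex = ['male', 'female', 'female']  # made-up sample data
charges = [300.0, 100.0, 200.0]

# The pitfall: with a bare string, `in` is a substring test, so every row matches.
naive = [c for value, c in zip(sex, charges) if value in 'female']
print(sum(naive) / len(naive))                # 200.0
print(average_where(sex, 'female', charges))  # 150.0
print(average_where(sex, 'male', charges))    # 300.0
```

Because `accepted` is turned into a set, the same helper also accepts a `range` or list of ages.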

```python
def store_averages():
    # Average charges for age bands. range(start, end) stops at end - 1, so
    # each band covers start to start + 8 and ages ending in 9 are skipped.
    for start, end in zip(range(0, 80, 10), range(9, 80, 10)):
        average_data['age_averages'].append(average_check(
            data['age'], list(range(start, end)), data['charges']))

    # Average charges by sex (male first, then female). With a bare string,
    # `in` is a substring test and 'male' in 'female' is True, so the
    # 'female' filter also matches male rows and averages every patient.
    average_data['sex_averages'].append(
        average_check(data['sex'], 'male', data['charges']))
    average_data['sex_averages'].append(
        average_check(data['sex'], 'female', data['charges']))

    # Average charges for BMI bands 20-29, 30-39 and 40-49. BMI values are
    # floats and range() yields integers, so only whole-number BMIs match.
    for start, end in zip(range(20, 51, 10), range(30, 51, 10)):
        average_data['bmi_averages'].append(average_check(
            data['bmi'], list(range(start, end)), data['charges']))

    # Average charges by smoking status (smokers first, then non-smokers).
    average_data['smoking_averages'].append(average_check(
        data['smoker'], 'yes', data['charges']))
    average_data['smoking_averages'].append(average_check(
        data['smoker'], 'no', data['charges']))

    # Average charges by number of children, from 0 to 6.
    for children_count in range(7):
        average_data['children_averages'].append(average_check(
            data['children'], [children_count], data['charges']))

    # Average charges by region. Set iteration order is not guaranteed and
    # the region names are not stored, so the output is unlabeled.
    for region in set(data['region']):
        average_data['region_averages'].append(average_check(
            data['region'], region, data['charges']))
```

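This function calls average_check for each grouping (age bands, sex, BMI bands, smoking status, number of children, and region). It appends the results to the average_data dictionary for later analysis. Three details affect the results:

* `range(start, end)` excludes `end`. The age bands built from `range(0, 80, 10)` and `range(9, 80, 10)` therefore cover 0 to 8, 10 to 18, and so on, and every age ending in 9 is skipped.
* BMI values are floats while `range` yields integers. A BMI such as 27.9 never matches a band, so only whole-number BMIs are averaged.
* Region averages are appended in set iteration order, without the region names.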
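This is a minimal sketch of band averages that uses comparisons instead of `range` membership, so integer ages and float BMIs are bucketed the same way. The helper `average_in_band` and the sample values are hypothetical. Each band is the half-open interval `[low, high)`.

```python
def average_in_band(values, low, high, charges):
    """Average charges for rows with low <= value < high (hypothetical helper)."""
    selected = [c for v, c in zip(values, charges) if low <= v < high]
    return sum(selected) / len(selected) if selected else 0


# Made-up sample data.
ages = [19, 29, 25, 41]
bmis = [22.5, 29.9, 30.0, 35.2]
charges = [1000.0, 3000.0, 2000.0, 8000.0]

# Why range() membership misses rows:
print(29 in range(20, 29))    # False: range stops before its end value
print(22.5 in range(20, 30))  # False: no integer equals 22.5

# Ten-year bands [10, 20), [20, 30), ... include ages ending in 9.
age_averages = [average_in_band(ages, s, s + 10, charges) for s in range(10, 50, 10)]
# BMI bands [20, 30), [30, 40), [40, 50) also catch fractional BMIs.
bmi_averages = [average_in_band(bmis, s, s + 10, charges) for s in range(20, 50, 10)]
print(age_averages)  # [1000.0, 2500.0, 0, 8000.0]
print(bmi_averages)  # [2000.0, 5000.0, 0]
```

Half-open intervals make adjacent bands meet without overlapping, so a BMI of exactly 30.0 lands in exactly one band. In the notebook, the same calls would read `data['age']` or `data['bmi']` together with `data['charges']`.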

```python
def print_averages():
    # Print each list of averages stored in average_data.
    for key, value in average_data.items():
        print(f'The average costs for {key} are: {value}')
```

This function retrieves the data that was stored in the dictionary and prints it to the standard output.
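The region averages are printed without names. One option is to keep each label next to its average. Below is a minimal sketch using a hypothetical helper `averages_by_group`, which returns a dictionary ordered by group name. The regions and charges are made up.

```python
def averages_by_group(column, charges):
    """Map each distinct value in `column` to its average charge (hypothetical helper)."""
    totals = {}
    counts = {}
    for value, charge in zip(column, charges):
        totals[value] = totals.get(value, 0.0) + charge
        counts[value] = counts.get(value, 0) + 1
    # sorted() gives a stable, alphabetical order instead of set iteration order.
    return {value: totals[value] / counts[value] for value in sorted(totals)}


regions = ['southwest', 'southeast', 'southwest', 'northeast']  # made-up data
charges = [100.0, 400.0, 300.0, 250.0]

region_averages = averages_by_group(regions, charges)
for region, average in region_averages.items():
    print(f'{region}: {average:.2f}')
# northeast: 250.00
# southeast: 400.00
# southwest: 200.00
```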

```python
def parse_data():
    # Parse the CSV rows into the column lists of the data dictionary.
    # newline='' is how the csv module documentation recommends opening files.
    with open('insurance.csv', newline='') as insurance_data:
        data_reader = csv.DictReader(insurance_data)
        for row in data_reader:
            for column, value in row.items():
                data[column].append(value)
    # Convert the quantitative columns from strings to numbers.
    data['age'] = list(map(int, data['age']))
    data['bmi'] = list(map(float, data['bmi']))
    data['charges'] = list(map(float, data['charges']))
    data['children'] = list(map(int, data['children']))
```

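This function reads insurance.csv row by row and appends each value to the matching column list in the `data` dictionary. It then converts the numeric columns from strings: `age` and `children` to `int`, and `bmi` and `charges` to `float`.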
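As an alternative, the conversion can happen while rows are read, driven by a table of converters. The following is a minimal sketch under that assumption. The names `CONVERTERS` and `parse_columns` are hypothetical, and the sample rows are made up.

```python
import csv
import io
from collections import defaultdict

# Hypothetical converter table mirroring parse_data's conversions.
CONVERTERS = {'age': int, 'children': int, 'bmi': float, 'charges': float}


def parse_columns(handle):
    """Read a CSV into a dict of column lists, converting numeric columns per row."""
    columns = defaultdict(list)
    for row in csv.DictReader(handle):
        for name, raw in row.items():
            convert = CONVERTERS.get(name, str)  # non-numeric columns stay strings
            columns[name].append(convert(raw))
    return dict(columns)


sample_csv = io.StringIO(
    "age,sex,bmi,children,smoker,region,charges\n"
    "30,female,25.0,1,no,northeast,5000.0\n"
    "45,male,31.5,2,yes,southeast,30000.0\n"
)
parsed = parse_columns(sample_csv)
print(parsed['age'], parsed['bmi'])  # [30, 45] [25.0, 31.5]
```

To use it on the real file, open insurance.csv with `newline=''` and pass the file object. Using `defaultdict(list)` also means the column names do not have to be listed in advance.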

```python
# Data storage for parsed csv.
data = {
    'age': [],
    'sex': [],
    'bmi': [],
    'children': [],
    'smoker': [],
    'region': [],
    'charges': [],
}
```

The dictionary above holds one list per CSV column, so the values at the same index across the lists describe one patient. parse_data fills these lists, and store_averages reads them when it calls average_check.

```python
# Variables to hold averages stored by store_averages.
average_data = {
    'age_averages': [],
    'sex_averages': [],
    'bmi_averages': [],
    'children_averages': [],
    'smoking_averages': [],
    'region_averages': [],
}
```

The dictionary above is where store_averages appends the averages for each grouping, for later analysis.

```python
parse_data()
store_averages()
print_averages()
```

```text
The average costs for age_averages are: [0, 7086.2175563623205, 9469.075096521734, 11734.532088491378, 14589.201669003984, 16251.265503739825, 21248.021884912272, 0]
The average costs for sex_averages are: [13956.751177721886, 13270.422265141257]
The average costs for bmi_averages are: [10961.087229, 7778.5103288888895, 13063.883]
The average costs for children_averages are: [12365.975601635882, 12731.171831635793, 15073.563733958328, 15355.31836681528, 13850.656311199999, 8786.035247222222, 0]
The average costs for smoking_averages are: [32050.23183153285, 8434.268297856199]
The average costs for region_averages are: [12346.93737729231, 13406.3845163858, 14735.411437609895, 12417.575373969228]
```


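Finally, the averages are printed, each list in the order store_averages appended it. Groups with no matching patients are reported as 0, for example the 0 to 8 and 70 to 78 age bands and the six-children group.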

## Observations:
* Average charges rise steadily with age, from about 7,086 in the 10 to 18 band to about 21,248 in the 60 to 68 band.
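* Males average about 13,957. The figure printed for females (about 13,270) is actually the average over all patients, because the substring test finds 'male' inside 'female'. The overall average lies between the male and female averages, and it is below the male average, so the true female average must be lower still. Males therefore do appear to pay more on average.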
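* The BMI averages cover only patients whose BMI is a whole number, because float BMIs are compared against integer ranges. The dip in the 30 to 39 band (about 7,779, versus about 10,961 and 13,064 for the other bands) is therefore not reliable evidence that mid-range BMIs cost less.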
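* Average charges rise with the number of children from 0 (about 12,366) to 3 (about 15,355), then fall for 4 and 5 children (about 13,851 and 8,786). No patients have 6 children.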
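* One region's average (about 14,735) is noticeably higher than the others (about 12,347 to 13,406). The output does not label the regions, and a `set` has no guaranteed iteration order. Reading this peak as the southwest should therefore be confirmed by printing each region name next to its average.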