# U.S. Medical Insurance Costs

The purpose of this project is to investigate a U.S. medical insurance dataset for variables that are likely to affect a person's insurance costs. Each variable under consideration will also be analysed for its tendencies within the dataset, such as its average value, and the variables will be compared with one another to determine whether any of them are related.

## Data Overview

### Importing Dataset

In [21]:
import csv

with open('insurance.csv', newline='') as insurance_csv:
    reader = csv.DictReader(insurance_csv, delimiter=',')
    data = list(reader)
    
print(data[0])

{'age': '19', 'sex': 'female', 'bmi': '27.9', 'children': '0', 'smoker': 'yes', 'region': 'southwest', 'charges': '16884.924'}
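As the output shows, `csv.DictReader` returns every field as a string, including the numeric columns. The minimal sketch below makes that explicit. It assumes an in-memory `io.StringIO` stand-in holding only the header and the first row printed above, in place of the real `insurance.csv`.

```python
import csv
import io

# Assumed stand-in for insurance.csv: the header plus the first row printed above
sample_csv = io.StringIO(
    "age,sex,bmi,children,smoker,region,charges\n"
    "19,female,27.9,0,yes,southwest,16884.924\n"
)

rows = list(csv.DictReader(sample_csv))

# Every value comes back as a str, even age, bmi, children and charges
for key, value in rows[0].items():
    print(key, type(value).__name__)
```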


### Data Cleaning

In [26]:
# Converting numerical values from string to integer and float values (age, bmi, children, charges)
# Conditionals will be used for converting sex and smoker to int data types
for patient in data:
    patient['age'] = int(patient['age'])
    patient['bmi'] = float(patient['bmi'])
    patient['children'] = int(patient['children'])
    patient['charges'] = float(patient['charges'])

    if patient['sex'] == 'female':
        patient['sex'] = 0
    elif patient['sex'] == 'male':
        patient['sex'] = 1
    
    if patient['smoker'] == 'no':
        patient['smoker'] = 0
    elif patient['smoker'] == 'yes':
        patient['smoker'] = 1

print(data[0])

{'age': 19, 'sex': 0, 'bmi': 27.9, 'children': 0, 'smoker': 1, 'region': 'southwest', 'charges': 16884.924}
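One thing to watch in the `if`/`elif` chain above: a value it does not expect (for example `'Male'` with a capital letter) is silently left as a string. A lookup dictionary makes such a value fail loudly instead. This is a sketch only; `SEX_CODES`, `SMOKER_CODES` and `clean_patient` are hypothetical names, and the function returns a new dict rather than modifying rows in place.

```python
# Hypothetical lookup tables for the two binary categorical variables
SEX_CODES = {'female': 0, 'male': 1}
SMOKER_CODES = {'no': 0, 'yes': 1}


def clean_patient(raw):
    """Return a new dict with numeric columns converted and categories encoded.

    A sex or smoker value missing from the lookup tables raises KeyError
    instead of being silently left as a string.
    """
    return {
        'age': int(raw['age']),
        'sex': SEX_CODES[raw['sex']],
        'bmi': float(raw['bmi']),
        'children': int(raw['children']),
        'smoker': SMOKER_CODES[raw['smoker']],
        'region': raw['region'],
        'charges': float(raw['charges']),
    }


# The raw first row printed after importing the dataset
raw_row = {'age': '19', 'sex': 'female', 'bmi': '27.9', 'children': '0',
           'smoker': 'yes', 'region': 'southwest', 'charges': '16884.924'}
print(clean_patient(raw_row))
```

Used in place of the loop, this would be applied to freshly read rows, e.g. `data = [clean_patient(p) for p in data]`. It should not be applied to rows that have already been converted, because their `sex` and `smoker` values are already integers.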


In [32]:
# Creating lists for individual variables
age_list = [patient['age'] for patient in data]
sex_list = [patient['sex'] for patient in data]
bmi_list = [patient['bmi'] for patient in data]
children_list = [patient['children'] for patient in data]
smoker_list = [patient['smoker'] for patient in data]
region_list = [patient['region'] for patient in data]
charges_list = [patient['charges'] for patient in data]
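The seven comprehensions above all follow the same pattern. A possible alternative, sketched here, builds every column at once in a dictionary keyed by variable name. The one-row `data` list below is an assumed stand-in containing only the cleaned first patient shown above, and `columns` is a hypothetical name.

```python
# Assumed one-row stand-in for data: the cleaned first patient shown above
data = [
    {'age': 19, 'sex': 0, 'bmi': 27.9, 'children': 0, 'smoker': 1,
     'region': 'southwest', 'charges': 16884.924},
]

# One list per variable, keyed by column name (same contents as age_list, sex_list, ...)
columns = {key: [patient[key] for patient in data] for key in data[0]}

age_list = columns['age']
charges_list = columns['charges']
print(columns)
```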

### Variables

In [18]:
print("Number of patients in the dataset: {}".format(len(data)))
print("Variables in the dataset: {}".format(data[0].keys()))

Number of patients in the dataset: 1338
Variables in the dataset: dict_keys(['age', 'sex', 'bmi', 'children', 'smoker', 'region', 'charges'])


The medical insurance cost dataset contains records for 1,338 patients. For each patient, the dataset records seven characteristics:

- Age
- Sex
- BMI
- Number of Children
- Smoker or Non-Smoker
- Region
- Medical Insurance Charges


Of the seven variables, medical insurance charges will be treated as the dependent variable in this project, and the remaining six will be treated as the independent variables.
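The split between dependent and independent variables can be made explicit in code. The sketch below uses the hypothetical helper `split_features_target`, which separates each patient record into its six independent variables and its charges. The one-row `data` list is an assumed stand-in for the full dataset.

```python
# Dependent variable, as chosen in the text above
TARGET = 'charges'


def split_features_target(records, target=TARGET):
    """Split patient dicts into (independent variables, dependent variable)."""
    features = [{k: v for k, v in r.items() if k != target} for r in records]
    targets = [r[target] for r in records]
    return features, targets


# Assumed one-row stand-in for data: the cleaned first patient
data = [
    {'age': 19, 'sex': 0, 'bmi': 27.9, 'children': 0, 'smoker': 1,
     'region': 'southwest', 'charges': 16884.924},
]
X, y = split_features_target(data)
print(sorted(X[0]))  # names of the six independent variables
print(y)             # the dependent variable for each patient
```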

## Analysis

### Descriptive Statistics

#### Age

In [None]:
# Calculating average age of patients in the dataset
sum_age = 0
for age in age_list:
    sum_age += age
average_age = sum_age / len(age_list)

print('The average age of patients is {:.2f} years old.'.format(average_age))
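The loop above computes the mean by hand, which gives the same result as `sum(age_list) / len(age_list)`. For the remaining descriptive statistics, Python's standard `statistics` module provides `mean`, `median` and `stdev`. The sketch below wraps them in a hypothetical `describe` helper. The example ages are illustrative values only, not drawn from `insurance.csv`; in the notebook, `age_list` (or any other numeric column list) would be passed instead.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        'mean': statistics.mean(values),
        'median': statistics.median(values),
        'stdev': statistics.stdev(values),  # sample standard deviation; needs at least 2 values
        'min': min(values),
        'max': max(values),
    }


# Illustrative values only, not from the dataset
example_ages = [18, 30, 42, 54]
print(describe(example_ages))
```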