# U.S. Medical Insurance Costs

## Introducing the Project

**insurance.csv Columns**
- age
- sex
- bmi
- number of children
- smoker/non-smoker
- region
- charges


**What questions are we interested in answering?**
- Find out the average age of the patients in the dataset.
- Analyze where a majority of the individuals are from.
- Look at the different costs between smokers vs. non-smokers.
- Figure out the average age of someone in this dataset who has at least one child.
- Find the average number of children for different age groupings.





### Initializing the Data

First, we import the packages needed for the analysis and load the dataset into a pandas DataFrame.

Calling the DataFrame's `head` method shows the first five rows, which gives a quick idea of what the dataset looks like.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

insurance_data = pd.read_csv("insurance.csv")
insurance_data.head()

Unnamed: 0,age,sex,bmi,children,smoker,region,charges
0,19,female,27.9,0,yes,southwest,16884.924
1,18,male,33.77,1,no,southeast,1725.5523
2,28,male,33.0,3,no,southeast,4449.462
3,33,male,22.705,0,no,northwest,21984.47061
4,32,male,28.88,0,no,northwest,3866.8552


### Answering the Important Questions 

In [None]:
# 1.) Finding the Average Age of the Patients

average_age = insurance_data['age'].mean()
print('The average age of the patients is {}'.format(average_age))




# 2.) Where Are the Majority of Patients From

print(insurance_data['region'].value_counts())




# 3.) Look at difference in costs between smokers and non-smokers

smoker_cost = 0
smoker_count = 0
non_smoker_cost = 0
non_smoker_count = 0

for i in range(len(insurance_data)):
    if insurance_data.loc[i,'smoker'] == 'yes':
        smoker_cost += insurance_data.loc[i, 'charges']
        smoker_count += 1
    else:
        non_smoker_cost += insurance_data.loc[i, 'charges']
        non_smoker_count += 1

avg_smo_cost = smoker_cost / smoker_count
avg_non_cost = non_smoker_cost / non_smoker_count

print('The average cost for a smoker is {}, and the average cost of a non-smoker is {}'.format(avg_smo_cost, avg_non_cost))




# 4.) Find the average age for someone who has at least one child

total_age_child = 0
total_people_with_child = 0

for i in range(len(insurance_data)):
    if insurance_data.loc[i, 'children'] > 0:
        total_age_child += insurance_data.loc[i, 'age']
        total_people_with_child += 1

avg_age_child = total_age_child / total_people_with_child
print('The average age for someone who has at least one child is {}'.format(avg_age_child))




# 5.) Average number of children for different age groupings

age_groups = {0 : 0,
              1 : 20,
              2 : 40,
              3 : 60,
              4 : 80}

# Lower bound of each age group (groups cover 0-19, 20-39, 40-59, 60-79, 80+)

age_group_counter = {0 : 0,
                     1 : 0,
                     2 : 0,
                     3 : 0,
                     4 : 0}

# Counts the number of people in each group

child_counter = {0 : 0,
                 1 : 0,
                 2 : 0,
                 3 : 0,
                 4 : 0}

# Counts the total number of children in each age group

for i in range(len(insurance_data)):
    if insurance_data.loc[i, 'age'] < age_groups[1]:
        age_group_counter[0] += 1
        child_counter[0] += insurance_data.loc[i, 'children']
    elif insurance_data.loc[i, 'age'] < age_groups[2]:
        age_group_counter[1] += 1
        child_counter[1] += insurance_data.loc[i, 'children']
    elif insurance_data.loc[i, 'age'] < age_groups[3]:
        age_group_counter[2] += 1
        child_counter[2] += insurance_data.loc[i, 'children']
    elif insurance_data.loc[i, 'age'] < age_groups[4]:
        age_group_counter[3] += 1
        child_counter[3] += insurance_data.loc[i, 'children']
    elif insurance_data.loc[i, 'age'] >= age_groups[4]:
        age_group_counter[4] += 1
        child_counter[4] += insurance_data.loc[i, 'children']

print('The number of people in each group is : {}'.format(age_group_counter))
# Number of people in each age group (groups 0 through 4): 137, 537, 550, 114, 0

print('The total children for each age group is: {}'.format(child_counter))
# Total number of children in each age group (1465 across all groups)

avg_child_dict = {}

for i in range(len(age_group_counter)):
    if age_group_counter[i] == 0:
        avg_child_dict[i] = 0
    else:
        avg_child_dict[i] = child_counter[i] / age_group_counter[i]

print('The average number of children for each age group is: {}'.format(avg_child_dict))


The average age of the patients is 39.20702541106129
region
southeast    364
southwest    325
northwest    325
northeast    324
Name: count, dtype: int64
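
The loops used for questions 3 through 5 can also be written with pandas' built-in grouping tools. The block below is a minimal sketch of that alternative, not a replacement for the analysis above. To stay self-contained it assumes a small DataFrame named `sample`, built from the five rows shown in the `head()` output; the names `sample`, `bins`, `labels`, and `age_group` are illustrative and do not appear in the original notebook. With the full dataset, `sample` would be replaced by `insurance_data`.

```python
import pandas as pd

# Illustrative data: the five rows shown in the head() output above.
# With the real file, use insurance_data = pd.read_csv("insurance.csv") instead.
sample = pd.DataFrame({
    "age": [19, 18, 28, 33, 32],
    "sex": ["female", "male", "male", "male", "male"],
    "bmi": [27.9, 33.77, 33.0, 22.705, 28.88],
    "children": [0, 1, 3, 0, 0],
    "smoker": ["yes", "no", "no", "no", "no"],
    "region": ["southwest", "southeast", "southeast", "northwest", "northwest"],
    "charges": [16884.924, 1725.5523, 4449.462, 21984.47061, 3866.8552],
})

# 3.) Average charges for smokers vs. non-smokers:
# group the rows by the 'smoker' column and average 'charges' within each group.
avg_cost_by_smoker = sample.groupby("smoker")["charges"].mean()
print(avg_cost_by_smoker)

# 4.) Average age of people with at least one child:
# a boolean mask selects the matching rows before taking the mean.
avg_age_with_child = sample.loc[sample["children"] > 0, "age"].mean()
print(avg_age_with_child)

# 5.) Average number of children per age group:
# pd.cut assigns each age to a left-closed bin, e.g. [20, 40) is labeled "20-39".
bins = [0, 20, 40, 60, 80, float("inf")]
labels = ["0-19", "20-39", "40-59", "60-79", "80+"]
age_group = pd.cut(sample["age"], bins=bins, labels=labels, right=False)

# observed=False keeps empty age groups in the result; their mean is NaN,
# whereas the loop version above reports 0 for an empty group.
avg_children_by_group = sample.groupby(age_group, observed=False)["children"].mean()
print(avg_children_by_group)
```

One difference to keep in mind: empty age groups come back as `NaN` here rather than `0`. Calling `.fillna(0)` on the result would reproduce the loop's behavior if that is preferred.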
