# U.S. Medical Insurance Costs

By: Abraham Mendoza, Completed on: 
Reach me at: mendoabr@oregonstate.edu

# What to Analyze
- Average age
- Average BMI
- Average Charges
- Where the majority of individuals are from
- Difference in costs between smokers and non-smokers

Things to include:
- Goals
- Data
- Analysis

# Goals

The aim of this project is to find out the following information:
- Average age
- Average BMI
- Average Charges
- Where the majority of individuals are from
- Difference in costs between smokers and non-smokers

# Importing Data & Creating Temporary Variables

In [96]:
import csv
import pandas

age = []
sex = []
bmi = []
children = []
smoker = []
region = []
charges = []

with open('insurance.csv') as insurance_data:
    data = csv.reader(insurance_data, delimiter = ",")
    
    for dataline in data:
        age.append(dataline[0])
        sex.append(dataline[1])
        bmi.append(dataline[2])
        children.append(dataline[3])
        smoker.append(dataline[4])
        region.append(dataline[5])
        charges.append(dataline[6])
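
The index-based loop above depends on the column order. As a minimal sketch, assuming the header row names the columns `age`, `sex`, `bmi`, `children`, `smoker`, `region`, and `charges`, `csv.DictReader` reads each row by column name and consumes the header row itself. The helper name `read_columns` and the in-memory sample rows are hypothetical and used only for illustration.

```python
import csv
import io

# Hypothetical sample rows standing in for insurance.csv (illustration only).
SAMPLE_CSV = """age,sex,bmi,children,smoker,region,charges
30,female,25.0,0,no,southeast,4000.0
45,male,31.5,2,yes,northwest,30000.0
"""

def read_columns(file_obj):
    """Return a dict mapping each column name to a list of its string values."""
    reader = csv.DictReader(file_obj)
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(value)
    return columns

columns = read_columns(io.StringIO(SAMPLE_CSV))
print(columns["age"])
```

With the real file, the same helper could be called as `read_columns(open_file)` inside a `with open('insurance.csv', newline='') as open_file:` block. Because the header is consumed by the reader, no label entry ends up in the lists.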

# Data Variables

The first entry in each list is the column label from the header row, so we slice it off and keep only the data values.

In [97]:
age_updated = age[1:]
sex_updated = sex[1:]
bmi_updated = bmi[1:]
children_updated = children[1:]
smoker_updated = smoker[1:]
region_updated = region[1:]
charges_updated = charges[1:]

# Average Age

We first compute the mean age across all recorded individuals.

In [98]:
# Return the mean age from a list of ages (stored as strings)
def mean_finder(age_list):
    sum_ages = 0
    for age in age_list:
        sum_ages += int(age)
    return sum_ages / len(age_list)

# Use the function to find the mean age of all participants
mean_age = round(mean_finder(age_updated), 2)
print("Mean Age is: " + str(mean_age) + " Years")
        

Mean Age is: 39.21 Years


# Average BMI
We obtain the average BMI with a function similar to the one used for the mean age.

In [99]:
def mean_bmi_finder(bmi_list):
    sum_bmi = 0
    for bmi in bmi_list:
        sum_bmi += float(bmi)
    return sum_bmi / len(bmi_list)

mean_bmi = round((mean_bmi_finder(bmi_updated)), 2)
print("Mean BMI is: " + str(mean_bmi))

Mean BMI is: 30.66


# Average Charges

In [100]:
def mean_charges_finder(charges_list):
    sum_charges = 0
    for charge in charges_list:
        sum_charges += float(charge)
    return sum_charges/len(charges_list)
mean_charges = round(mean_charges_finder(charges_updated), 2)
print("Mean Charges is: $" + str(mean_charges))

Mean Charges is: $13270.42
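
The three mean functions above differ only in how each string is converted to a number. A minimal sketch of a single helper that takes the conversion as a parameter is shown here; the name `mean_of` and the sample values are hypothetical and not part of the original notebook.

```python
def mean_of(values, convert=float):
    """Convert each entry with `convert` and return the arithmetic mean."""
    if not values:
        raise ValueError("cannot take the mean of an empty list")
    return sum(convert(v) for v in values) / len(values)

sample_ages = ["20", "30", "40"]       # hypothetical values
sample_charges = ["100.5", "200.5"]    # hypothetical values

print(mean_of(sample_ages, int))
print(mean_of(sample_charges))
```

Under this sketch, `mean_of(age_updated, int)`, `mean_of(bmi_updated)`, and `mean_of(charges_updated)` would replace the three separate functions.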


# Where are the Majority of Individuals From
We create a function that returns the region with the most individuals. It first counts the number of individuals in each region, then picks the region with the highest count.

In [101]:
def location_counter(location_list):
    location_dict = {}
    highest_location_count = 0
    highest_location = ""
    
    
    for location in location_list:
        if location not in location_dict.keys():
            location_dict[location] = 1
        else:
            location_dict[location] += 1
    
    for key,value in location_dict.items():
        if value > highest_location_count:
            highest_location_count = value
            highest_location = key
    
    return highest_location
            
    
    
majority_region = location_counter(region_updated)
print("The majority of individuals are from: the " + majority_region)

The majority of individuals are from: the southeast
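
The counting step can also be sketched with `collections.Counter` from the standard library, which keeps the per-region counts available as well as the top region. The helper name `region_counts` and the sample list are hypothetical.

```python
from collections import Counter

def region_counts(regions):
    """Count how many individuals fall in each region."""
    return Counter(regions)

# Hypothetical sample, not taken from the dataset.
sample = ["southeast", "northwest", "southeast", "northeast"]

counts = region_counts(sample)
# most_common(1) returns a list with one (region, count) pair.
top_region, top_count = counts.most_common(1)[0]
print(top_region, top_count)
```

Note that the most common region is not necessarily a strict majority (more than half) of the individuals; comparing `top_count` with `len(sample)` would show whether it is.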


# Comparing Costs Between Smokers and Non-Smokers
We compare the average charges of smokers with the average charges of non-smokers.

In [102]:
def smoker_status_coster(smoker_updated, charges_updated):
    smoker_cost = 0
    smoker_count = 0
    non_smoker_cost = 0
    non_smoker_count = 0
    for i in range(0, len(smoker_updated)):
        if smoker_updated[i] == 'yes':
            smoker_cost += float(charges_updated[i])
            smoker_count += 1
        elif smoker_updated[i] == 'no':
            non_smoker_cost += float(charges_updated[i])
            non_smoker_count += 1
    avg_smoker_cost = round(smoker_cost/smoker_count, 2)
    avg_nonsmoker_cost = round(non_smoker_cost/non_smoker_count, 2)
    difference = avg_smoker_cost - avg_nonsmoker_cost
    print("Smoker | ", "Non-Smoker | ", "Difference")
    return avg_smoker_cost, avg_nonsmoker_cost, difference

smoker_status_coster(smoker_updated, charges_updated)


Smoker |  Non-Smoker |  Difference


(32050.23, 8434.27, 23615.96)

Comparing only the average charges, a smoker in this dataset pays on average $23,615.96 more in health insurance costs than a non-smoker.
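
The absolute difference can be complemented with a ratio, which shows how many times larger the average smoker charge is. The sketch below reuses the two averages printed above; the variable names are chosen for illustration.

```python
avg_smoker_cost = 32050.23      # average from the output above
avg_nonsmoker_cost = 8434.27    # average from the output above

difference = round(avg_smoker_cost - avg_nonsmoker_cost, 2)
ratio = avg_smoker_cost / avg_nonsmoker_cost

print("Difference:", difference)
print("Ratio:", round(ratio, 2))
```

By these averages, smokers are charged roughly 3.8 times as much as non-smokers.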

# Creating a Class

In [116]:
class MedicalAnalyzer:
    
    def __init__(self, age_list, bmi_list, charges_list, region_list, smoker_list):
        self.age_list = age_list
        self.bmi_list = bmi_list
        self.charges_list = charges_list
        self.region_list = region_list
        self.smoker_list = smoker_list
    
    def mean_age(self):
        sum_ages = 0
        for age in self.age_list:
            sum_ages += int(age)
        return sum_ages / len(self.age_list)
    
    def mean_bmi(self):
        sum_bmi = 0
        for bmi in self.bmi_list:
            sum_bmi += float(bmi)
        return sum_bmi / len(self.bmi_list)
    
    def mean_charges(self):
        sum_charges = 0
        for charge in self.charges_list:
            sum_charges += float(charge)
        return sum_charges/len(self.charges_list)
    
    def highest_region(self):
        location_dict = {}
        highest_location_count = 0
        highest_location = ""
        
        for location in self.region_list:
            if location not in location_dict.keys():
                location_dict[location] = 1
            else:
                location_dict[location] += 1

        for key,value in location_dict.items():
            if value > highest_location_count:
                highest_location_count = value
                highest_location = key

        return highest_location
    
    def smoker_diff(self):
        smoker_cost = 0
        smoker_count = 0
        non_smoker_cost = 0
        non_smoker_count = 0
        # Use the lists stored on the object, not the global variables
        for i in range(0, len(self.smoker_list)):
            if self.smoker_list[i] == 'yes':
                smoker_cost += float(self.charges_list[i])
                smoker_count += 1
            elif self.smoker_list[i] == 'no':
                non_smoker_cost += float(self.charges_list[i])
                non_smoker_count += 1
        avg_smoker_cost = round(smoker_cost/smoker_count, 2)
        avg_nonsmoker_cost = round(non_smoker_cost/non_smoker_count, 2)
        difference = avg_smoker_cost - avg_nonsmoker_cost
        return difference        

Testing out the class

In [117]:
#creating object

analysis = MedicalAnalyzer(age_updated, bmi_updated, charges_updated, region_updated, smoker_updated)

# Check that the methods work
print(analysis.mean_age())
print(analysis.mean_bmi())
print(analysis.highest_region())

39.20702541106129
30.663396860986538
southeast


The tested methods work as expected. One can now create an object by passing in the corresponding lists and then get exactly the result they want by calling the matching method on that object. I mostly reused the code written earlier; the main change was making each method refer to the lists stored on the object (the lists created previously) instead of the standalone variables.

# Potential Influential Features
One of the most obvious features with a large impact on your health insurance charges is whether you are a smoker. Other features may also have an impact. To compare them, the function below groups charges by the values of a given feature.

In [167]:
def feature_costs_comparison(feature, charges_list):
    # Group the charges by feature value: each key maps to a list of charges
    feature_cost_dict = {}
    for key, charge in zip(feature, charges_list):
        feature_cost_dict.setdefault(key, []).append(float(charge))

    # Map each feature value to the mean of its list of charges
    mean_cost_dict = {}
    for key, costs in feature_cost_dict.items():
        mean_cost_dict[key] = round(sum(costs) / len(costs), 2)
    return mean_cost_dict


feature_costs_comparison(age_updated, charges_updated)
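
The grouping idea behind `feature_costs_comparison` can be sketched compactly with `collections.defaultdict`. This is an illustrative variant, not the notebook's own implementation; the name `mean_charges_by` and the sample lists are hypothetical.

```python
from collections import defaultdict

def mean_charges_by(feature_list, charges_list):
    """Group charges by feature value and return the mean charge per value."""
    grouped = defaultdict(list)
    for key, charge in zip(feature_list, charges_list):
        grouped[key].append(float(charge))
    return {key: sum(vals) / len(vals) for key, vals in grouped.items()}

# Hypothetical sample values, not taken from the dataset.
sample_smoker = ["yes", "no", "no", "yes"]
sample_charges = ["300.0", "100.0", "200.0", "500.0"]

result = mean_charges_by(sample_smoker, sample_charges)
print(result)
```

The same call works for any categorical list, for example `mean_charges_by(region_updated, charges_updated)`. For a feature with many distinct values such as age, grouping the values into ranges first may give a more readable comparison.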