# U.S. Medical Insurance Costs

## Create data dictionary per patient and lists per column
- We will create a dictionary for every patient that contains all of that patient's details. Each dictionary will be appended to a list of patients, which makes the data easier to manipulate.

- Some values will have to be converted from strings to integers or floats.

- At the same time, we will create one list per column, which we will use in later analysis.

In [168]:
import csv

age = []
sex = []
bmi = []
children = []
smoker = []
region = []
charges = []

with open("insurance.csv") as insurance_file:
    patients_list = []
    csv_dict = csv.DictReader(insurance_file)

    print("Fields:",csv_dict.fieldnames)
    
    for row in csv_dict:
        row["age"] = int(row["age"])
        row["children"] = int(row["children"])
        row["bmi"] = round(float(row["bmi"]), 1)
        row["charges"] = round(float(row["charges"]),2)
        age.append(row["age"])
        sex.append(row["sex"])
        bmi.append(row["bmi"])
        children.append(row["children"])
        smoker.append(row["smoker"])
        region.append(row["region"])
        charges.append(row["charges"])
        patients_list.append(row)

Fields: ['age', 'sex', 'bmi', 'children', 'smoker', 'region', 'charges']
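
The conversion step can be tried without the real file. The following is a minimal sketch, assuming a small in-memory sample with the same header as `insurance.csv`; the `sample_csv` text, its values, and the `load_patients` helper are hypothetical and not part of the notebook.

```python
import csv
import io

# Hypothetical two-row sample using the same header as insurance.csv.
sample_csv = """age,sex,bmi,children,smoker,region,charges
19,female,27.9,0,yes,southwest,16884.924
18,male,33.77,1,no,southeast,1725.5523
"""

def load_patients(file_obj):
    """Read CSV rows into dictionaries, converting the numeric fields."""
    patients = []
    for row in csv.DictReader(file_obj):
        row["age"] = int(row["age"])
        row["children"] = int(row["children"])
        row["bmi"] = round(float(row["bmi"]), 1)
        row["charges"] = round(float(row["charges"]), 2)
        patients.append(row)
    return patients

sample_patients = load_patients(io.StringIO(sample_csv))

# Column lists can also be derived from the patient dictionaries afterwards.
sample_ages = [patient["age"] for patient in sample_patients]
sample_bmis = [patient["bmi"] for patient in sample_patients]
print(sample_ages, sample_bmis)
```

Deriving the column lists from the list of dictionaries is one possible alternative to appending to seven lists inside the loop; either approach yields the same data.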


## Percentage of Patients per Regions
As the dictionary below shows, patients are spread almost evenly across the regions, with the Southeast slightly larger at 27.2%.

In [19]:
def data_percentage(data_list):
    data_dict_perc = {}
    for data in data_list:
        if data not in data_dict_perc:
            data_dict_perc[data] = round(data_list.count(data)/len(data_list)*100, 1)
    return data_dict_perc

print(data_percentage(region))

{'southwest': 24.3, 'southeast': 27.2, 'northwest': 24.3, 'northeast': 24.2}
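
The same percentages could also be computed in a single pass over the data. This is a sketch only, assuming `collections.Counter` from the standard library; the name `data_percentage_counter` and the toy input are hypothetical.

```python
from collections import Counter

def data_percentage_counter(data_list):
    """Return the percentage share of each distinct value, rounded to 1 decimal."""
    counts = Counter(data_list)  # counts every value in one pass
    total = len(data_list)
    return {value: round(count / total * 100, 1) for value, count in counts.items()}

# Hypothetical toy input, not the real region column.
toy_regions = ["southwest", "southeast", "southeast", "northwest"]
print(data_percentage_counter(toy_regions))
```

Calling `list.count` once per distinct value rescans the whole list each time, so a counter may be preferable for larger datasets.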


## Percentage of each BMI category by Region
#### Our BMI categories are:
- Underweight: BMI <= 18.5
- Normal weight: 18.5 < BMI <= 25
- Overweight: 25 < BMI <= 29.9
- Obesity: BMI > 29.9

Overweight and obesity are associated with several health problems, which is why we will determine the percentage of patients in each BMI category.

This function returns a dictionary with the percentage of patients in each BMI category for one value of any key in our patient dictionaries (for example, one region). We then collect these dictionaries in a list.

In [60]:
def bmi_category_vs_anything(data, dict_key):
    n_data = 0
    n_under = 0
    n_normal = 0
    n_over = 0
    n_obesity = 0
    for patient in patients_list:
        if patient[dict_key] == data:
            n_data += 1
            if patient["bmi"] <= 18.5:
                n_under += 1
            if 25 >= patient["bmi"] > 18.5:
                n_normal += 1
            if 29.9 >= patient["bmi"] > 25:
                n_over += 1
            if patient["bmi"] > 29.9:
                n_obesity += 1
                
    percentage_under = round(n_under/n_data*100, 1)
    percentage_normal = round(n_normal/n_data*100, 1)
    percentage_over = round(n_over/n_data*100, 1)
    percentage_obesity = round(n_obesity/n_data*100, 1)
    
    data_bmi_dict = {data: {"Underweight": percentage_under, "Normal": percentage_normal, "Overweight": percentage_over, "Obesity": percentage_obesity}}
    return data_bmi_dict

region_bmi_list = []
region_bmi_list.append(bmi_category_vs_anything("southwest", "region"))
region_bmi_list.append(bmi_category_vs_anything("southeast", "region"))
region_bmi_list.append(bmi_category_vs_anything("northwest", "region"))
region_bmi_list.append(bmi_category_vs_anything("northeast", "region"))

print(region_bmi_list)

[{'southwest': {'Underweight': 1.2, 'Normal': 15.1, 'Overweight': 30.5, 'Obesity': 53.2}}, {'southeast': {'Underweight': 0.0, 'Normal': 11.3, 'Overweight': 22.0, 'Obesity': 66.8}}, {'northwest': {'Underweight': 2.2, 'Normal': 19.4, 'Overweight': 32.9, 'Obesity': 45.5}}, {'northeast': {'Underweight': 3.1, 'Normal': 22.5, 'Overweight': 30.2, 'Obesity': 44.1}}]


We can conclude from these dictionaries that obesity is common among the patients: the lowest rate, in the Northeast, is 44.1% and the highest, in the Southeast, is 66.8%.

The share of patients with a "Normal" BMI ranges from 11.3% in the Southeast to 22.5% in the Northeast.
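
The category thresholds are repeated in several functions in this notebook. As a sketch of one possible refactoring, they could live in a single helper. The names `bmi_category` and `bmi_breakdown` and the toy patients are hypothetical; the thresholds are copied from the code above.

```python
def bmi_category(bmi):
    """Classify a BMI value with the thresholds used in this notebook's code."""
    if bmi <= 18.5:
        return "Underweight"
    if bmi <= 25:
        return "Normal"
    if bmi <= 29.9:
        return "Overweight"
    return "Obesity"

def bmi_breakdown(patients, key, value):
    """Percentage of each BMI category among patients where patient[key] == value."""
    counts = {"Underweight": 0, "Normal": 0, "Overweight": 0, "Obesity": 0}
    matching = [p for p in patients if p[key] == value]
    for patient in matching:
        counts[bmi_category(patient["bmi"])] += 1
    if not matching:
        return {value: counts}  # empty group: return zero counts, avoid dividing by zero
    return {value: {name: round(n / len(matching) * 100, 1) for name, n in counts.items()}}

# Hypothetical toy data, not taken from insurance.csv.
toy_patients = [
    {"region": "southwest", "bmi": 17.0},
    {"region": "southwest", "bmi": 25.0},
    {"region": "southwest", "bmi": 27.5},
    {"region": "southwest", "bmi": 31.2},
    {"region": "northeast", "bmi": 22.0},
]
print(bmi_breakdown(toy_patients, "region", "southwest"))
```

Passing the patient list as a parameter, rather than reading a global, also makes the function easier to test on small inputs like this one.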

## Percentage of Male and Female Patients

As the dictionary below shows, male and female patients are almost evenly distributed (50.5% and 49.5%). Here we can reuse the function **data_percentage(data_list)**.

In [22]:
print(data_percentage(sex))

{'female': 49.5, 'male': 50.5}


## BMI Category vs Sex
Now we will compare the BMI categories of male and female patients. Here we can reuse our **bmi_category_vs_anything(data, dict_key)** function.

In [17]:
sex_bmi_list = []
sex_bmi_list.append(bmi_category_vs_anything("male", "sex"))
sex_bmi_list.append(bmi_category_vs_anything("female", "sex"))

print(sex_bmi_list)

[{'male': {'Underweight': 1.2, 'Normal': 16.0, 'Overweight': 27.7, 'Obesity': 55.2}}, {'female': {'Underweight': 2.0, 'Normal': 17.8, 'Overweight': 29.8, 'Obesity': 50.5}}]


The dictionaries tell us that the obesity rate is slightly higher among men than among women, at 55.2% and 50.5% respectively.

## Percentage of Smokers

We reuse the function **data_percentage(data_list)**.

As the dictionary below shows, 20.5% of the patients are smokers and 79.5% are non-smokers.

In [54]:
smoke_status_dict = data_percentage(smoker)
smoke_status_dict["smoker"] = smoke_status_dict.pop("yes")
smoke_status_dict["non-smoker"] = smoke_status_dict.pop("no")
print(smoke_status_dict)

{'smoker': 20.5, 'non-smoker': 79.5}


## Average Insurance Cost by Smoking Status
We assume that a patient's smoking status is relevant to their charges, considering that smoking is extremely harmful to a patient's health, which is why we will also analyze the average insurance cost by smoking status.

In [106]:
def cost_smoker_status(argument, header):
    total = 0
    n_patients = 0
    for patient in patients_list:
        if patient[header] == argument:
            n_patients += 1
            total += patient["charges"]
    cost_smoker_status_dict = {argument: round(total/n_patients,2)}
    return cost_smoker_status_dict

cost_smoker_status_list = []

smokers = cost_smoker_status("yes", "smoker")
nonsmokers = cost_smoker_status("no", "smoker")
cost_smoker_status_list.append(smokers)
cost_smoker_status_list.append(nonsmokers)
cost_smoker = smokers["yes"]
cost_nonsmoker = nonsmokers["no"]

smoker_pay_more = round(((cost_smoker - cost_nonsmoker)/cost_nonsmoker)*100, 1)
print(cost_smoker_status_list)

print("On average, smokers pay {cost}% more compared to non-smokers".format(cost = smoker_pay_more))

[{'yes': 32050.23}, {'no': 8434.27}]
On average, smokers pay 280.0% more compared to non-smokers
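
The "pays X% more" figure is a relative difference against a baseline: subtract the baseline, divide by it, and multiply by 100. A minimal sketch of that arithmetic, where the helper name `percent_more` is hypothetical and the two averages are the ones printed above:

```python
def percent_more(value, baseline):
    """Relative difference of value over baseline, in percent, rounded to 1 decimal."""
    return round((value - baseline) / baseline * 100, 1)

# Average charges printed above for smokers and non-smokers.
print(percent_more(32050.23, 8434.27))
```

A result of 280% more means smokers pay about 3.8 times what non-smokers pay on average, not 2.8 times.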


## Average insurance cost in each BMI category


In [128]:
def bmi_category_avg_cost(header):
    cost_under = 0
    cost_normal = 0
    cost_over = 0
    cost_obesity = 0
    n_under = 0
    n_normal = 0
    n_over = 0
    n_obesity = 0
    for patient in patients_list:
        if patient["bmi"] <= 18.5:
            n_under += 1
            cost_under += patient[header]
        if 25 >= patient["bmi"] > 18.5:
            n_normal += 1
            cost_normal += patient[header]
        if 29.9 >= patient["bmi"] > 25:
            n_over += 1
            cost_over += patient[header]
        if patient["bmi"] > 29.9:
            n_obesity += 1
            cost_obesity += patient[header]
            
    percentage_nor_vs_obe = round((((cost_obesity/n_obesity)-(cost_normal/n_normal))/(cost_normal/n_normal))*100, 1)
    percentage_nor_over = round((((cost_over/n_over)-(cost_normal/n_normal))/(cost_normal/n_normal))*100, 1)
    print("On average, obese patients pay {x}% more than patients with a normal BMI, but patients considered overweight pay only {y}% more than patients with a normal BMI.".format(x = percentage_nor_vs_obe, y = percentage_nor_over))
    
    bmi_category_cost = {"Underweight": round(cost_under/n_under, 2), "Normal": round(cost_normal/n_normal, 2), "Overweight": round(cost_over/n_over, 2), "Obesity": round(cost_obesity/n_obesity, 2)}
    
    return bmi_category_cost

print(bmi_category_avg_cost("charges"))

On average, obese patients pay 49.0% more than patients with a normal BMI, but patients considered overweight pay only 5.3% more than patients with a normal BMI.
{'Underweight': 8657.62, 'Normal': 10435.44, 'Overweight': 10989.85, 'Obesity': 15552.34}
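
Averaging a field per group follows the same pattern each time: accumulate a running total and a count per label, then divide. The sketch below shows one generic way to write it. The helper `average_by`, its `group_fn` parameter, and the toy data are hypothetical and not part of the notebook.

```python
from collections import defaultdict

def average_by(patients, group_fn, field="charges"):
    """Average of `field` per group; group_fn maps a patient to a group label."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for patient in patients:
        label = group_fn(patient)
        totals[label] += patient[field]
        counts[label] += 1
    # Only labels that actually occurred are present, so no count is zero here.
    return {label: round(totals[label] / counts[label], 2) for label in totals}

# Hypothetical toy data, not taken from insurance.csv.
toy = [
    {"bmi": 22.0, "charges": 1000.0},
    {"bmi": 24.0, "charges": 3000.0},
    {"bmi": 31.0, "charges": 5000.0},
]
print(average_by(toy, lambda p: "Obesity" if p["bmi"] > 29.9 else "Other"))
```

Because the grouping rule is passed in as a function, the same helper could group by region, sex, smoking status, or a combination of them.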


## Average cost by smoking status and BMI category

In [164]:
def bmi_smoker_cost(header, argument, status):
    cost_under = 0
    cost_normal = 0
    cost_over = 0
    cost_obesity = 0
    n_under = 0
    n_normal = 0
    n_over = 0
    n_obesity = 0
    for patient in patients_list:
        if patient["smoker"] == argument:
            if patient["bmi"] <= 18.5:
                n_under += 1
                cost_under += patient[header]
            if 25 >= patient["bmi"] > 18.5:
                n_normal += 1
                cost_normal += patient[header]
            if 29.9 >= patient["bmi"] > 25:
                n_over += 1
                cost_over += patient[header]
            if patient["bmi"] > 29.9:
                n_obesity += 1
                cost_obesity += patient[header]
              
    bmi_smoker_cost_dict = {status: {"Underweight": round(cost_under/n_under, 2), "Normal": round(cost_normal/n_normal, 2), "Overweight": round(cost_over/n_over, 2), "Obesity": round(cost_obesity/n_obesity, 2)}}
    
    return bmi_smoker_cost_dict

bmi_smoker_cost_list = []
x = bmi_smoker_cost("charges", "yes", "smoker")
y = bmi_smoker_cost("charges", "no", "non-smoker")
bmi_smoker_cost_list.append(x)
bmi_smoker_cost_list.append(y)
print(bmi_smoker_cost_list)

val1 = x["smoker"]["Obesity"]
val2 = y["non-smoker"]["Obesity"]
val3 = round(((val1 - val2)/ val2)*100, 1)

print("On average, a patient's smoking status influences the cost of their insurance a lot more than their BMI category. For example, an obese patient who smokes pays ${val1} on average, while an obese non-smoker pays ${val2}. This means that obese smokers pay {val3}% more than obese non-smokers.".format(val1 = val1, val2 = val2, val3 = val3))

[{'smoker': {'Underweight': 18809.83, 'Normal': 19942.22, 'Overweight': 22495.87, 'Obesity': 41557.99}}, {'non-smoker': {'Underweight': 5485.06, 'Normal': 7734.65, 'Overweight': 8243.26, 'Obesity': 8842.69}}]
On average, a patient's smoking status influences the cost of their insurance a lot more than their BMI category. For example, an obese patient who smokes pays $41557.99 on average, while an obese non-smoker pays $8842.69. This means that obese smokers pay 370.0% more than obese non-smokers.
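
Splitting the patients by two factors at once makes small or empty groups more likely, and dividing a total by a count of zero raises `ZeroDivisionError`. This did not happen with this dataset, but a guard could look like the following sketch, where `safe_average` is a hypothetical helper:

```python
def safe_average(total, count, ndigits=2):
    """Return the rounded average, or None when the group is empty."""
    if count == 0:
        return None
    return round(total / count, ndigits)

print(safe_average(300.0, 2))
print(safe_average(0, 0))
```

Returning `None` for an empty group is one design choice; raising a descriptive error or omitting the category would work as well.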


According to the National Center for Biotechnology Information, estimates suggest that obesity accounts for 5 to 15% of deaths each year in the United States and smoking for 18%. Compared with these estimates, the charges seem to weigh the risks of smoking far more heavily than the risks of obesity.