# U.S. Medical Insurance Costs

In this project, I analyze data on U.S. medical insurance costs using the Python fundamentals I learned on Codecademy.

## Import the data

I import the data with Python's csv module and store it in the list "insurance_list", where each element is a dictionary holding one person's record from the file.

In [16]:
import csv
with open("insurance.csv") as insurance_csv:
    insurance_reader = csv.DictReader(insurance_csv)
    insurance_list = [row for row in insurance_reader]
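`csv.DictReader` returns every field as a string, which is why the functions in this notebook call `float()` on each value they read. Below is a minimal sketch of converting the numeric columns once at load time. It assumes the column names used in this notebook (`age`, `bmi`, `children`, `smoker`, `region`, `charges`); the function name `load_insurance` is hypothetical, and the sample rows are made up, with `io.StringIO` standing in for the real file.

```python
import csv
import io

# Made-up sample standing in for insurance.csv (only the columns used here).
SAMPLE = """age,bmi,children,smoker,region,charges
25,22.5,0,no,northeast,3000.0
40,31.2,2,yes,southeast,25000.0
"""

def load_insurance(file_obj):
    """Read rows with DictReader and convert the numeric columns once."""
    rows = []
    for row in csv.DictReader(file_obj):
        row["age"] = int(row["age"])
        row["bmi"] = float(row["bmi"])
        row["children"] = int(row["children"])
        row["charges"] = float(row["charges"])
        rows.append(row)
    return rows

insurance_list = load_insurance(io.StringIO(SAMPLE))
print(type(insurance_list[0]["age"]).__name__, insurance_list[1]["charges"])  # int 25000.0
```

With the real file, the call would be `load_insurance(f)` inside `with open("insurance.csv", newline="") as f:`, since the csv documentation recommends opening files with `newline=""`.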

## Divide by region

Next, I split the data by region to support the analysis.

In [17]:
southwest_region = [insurance for insurance in insurance_list if insurance["region"] == "southwest"]
southeast_region = [insurance for insurance in insurance_list if insurance["region"] == "southeast"]
northwest_region = [insurance for insurance in insurance_list if insurance["region"] == "northwest"]
northeast_region = [insurance for insurance in insurance_list if insurance["region"] == "northeast"]
regions = [southwest_region, southeast_region, northwest_region, northeast_region]
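The four comprehensions above each scan the whole list once. As an alternative, here is a sketch that groups every region in a single pass with `collections.defaultdict`, picking up whatever region names appear in the data. The function name `group_by_region` and the sample rows are hypothetical.

```python
from collections import defaultdict

def group_by_region(persons):
    """Return a dict mapping each region name to the list of its rows."""
    groups = defaultdict(list)
    for person in persons:
        groups[person["region"]].append(person)
    return dict(groups)

# Hypothetical rows; only the "region" key matters here.
sample = [
    {"region": "southwest"},
    {"region": "southeast"},
    {"region": "southwest"},
]
by_region = group_by_region(sample)
print({name: len(rows) for name, rows in by_region.items()})
# {'southwest': 2, 'southeast': 1}
```

With the full dataset, `by_region["southwest"]` would hold the same rows as `southwest_region`.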

## Average of the data

Here I define a function that calculates the average of any numeric column, such as age, BMI, number of children, or charges.

In [18]:
def average_function(persons,parameter):
    soma = 0
    for person in persons:
        soma += float(person[parameter])
    return soma/(len(persons))
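`average_function` divides by `len(persons)`, so an empty group (for example, a filter that matches nobody) raises `ZeroDivisionError`. A minimal sketch of a guarded version using the standard library's `statistics.fmean` (Python 3.8+); the name `average_of` and the choice to return `None` for an empty group are assumptions for illustration.

```python
from statistics import fmean

def average_of(persons, parameter):
    """Mean of one column; returns None for an empty group instead of raising."""
    values = [float(person[parameter]) for person in persons]
    if not values:
        return None
    return fmean(values)

rows = [{"age": "20"}, {"age": "30"}, {"age": "43"}]
print(average_of(rows, "age"))  # 31.0
print(average_of([], "age"))    # None
```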

In [19]:
#average age
average_age = average_function(insurance_list,"age")
"The average age in the data is {}".format(round(average_age,1))

'The average age in the data is 39.2'

In [20]:
#average bmi
average_bmi = average_function(insurance_list,"bmi")
"The average BMI value is {}".format(round(average_bmi,3))

'The average BMI value is 30.663'

In [21]:
#average children
average_children = average_function(insurance_list,"children")
"These people have on average {} children.".format(round(average_children,2))

'These people have on average 1.09 children.'

In [22]:
#average insurance cost
average_insurance_cost = {"Southwest":round(average_function(southwest_region,"charges"),3), "Southeast": round(average_function(southeast_region,"charges"),3), 
                          "Northwest":round(average_function(northwest_region,"charges"),3), "Northeast":round(average_function(northeast_region,"charges"),3)}
"The average insurance cost by region is: {}.".format(average_insurance_cost)

"The average insurance cost by region is: {'Southwest': 12346.937, 'Southeast': 14735.411, 'Northwest': 12417.575, 'Northeast': 13406.385}."
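The dictionary above repeats the same `round(average_function(...), 3)` call four times. A sketch of the same idea as a dictionary comprehension over a mapping from display name to group (in the notebook that mapping could be `{"Southwest": southwest_region, ...}`). The function name `average_by_region` and the sample groups are hypothetical, and every group is assumed to be non-empty, because `fmean` raises `StatisticsError` on empty input.

```python
from statistics import fmean

def average_by_region(groups, parameter, digits=3):
    """Map each region name to the rounded mean of `parameter` in that region.

    Assumes every group is non-empty (fmean raises StatisticsError otherwise).
    """
    return {
        name: round(fmean(float(p[parameter]) for p in persons), digits)
        for name, persons in groups.items()
    }

# Hypothetical groups keyed by display name.
groups = {
    "Southwest": [{"charges": "1000.0"}, {"charges": "3000.0"}],
    "Northeast": [{"charges": "2500.5"}],
}
print(average_by_region(groups, "charges"))
# {'Southwest': 2000.0, 'Northeast': 2500.5}
```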

## Number of smokers

Next, I write a function that counts the smokers in a group of people and apply it to each region.

In [23]:
#Number of smokers
def number_smoker(persons):
    # count the people in the group whose "smoker" field is "yes"
    soma = 0
    for person in persons:
        if person["smoker"] == "yes":
            soma += 1
    return soma

smokers = {"Southwest":number_smoker(southwest_region), "Southeast": number_smoker(southeast_region),
            "Northwest":number_smoker(northwest_region), "Northeast":number_smoker(northeast_region)}
"The number of smokers by region is: {}".format(smokers)

"The number of smokers by region is: {'Southwest': 58, 'Southeast': 91, 'Northwest': 58, 'Northeast': 67}"
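Raw counts depend on how many people each region has, so the share of smokers can be easier to compare across regions. A minimal sketch that returns both the count and the share for one group; the name `smoker_stats` and the sample rows are hypothetical, and an empty group is assumed to have a share of 0.0.

```python
def smoker_stats(persons):
    """Return (count of smokers, share of smokers) for a list of rows."""
    count = sum(1 for person in persons if person["smoker"] == "yes")
    # Avoid dividing by zero for an empty group.
    share = count / len(persons) if persons else 0.0
    return count, share

sample = [{"smoker": "yes"}, {"smoker": "no"}, {"smoker": "no"}, {"smoker": "yes"}]
print(smoker_stats(sample))  # (2, 0.5)
```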

## BMI classification

Next, I write a function that sorts people into the standard BMI categories: "Underweight" (BMI below 18.5), "Normal weight" (18.5 up to but not including 25), "Overweight" (25 up to but not including 30), and "Obesity" (30 or higher).

In [24]:
#Classification bmi
def bmi_classification_function(persons):
    underweight = 0
    normal_weight = 0
    overweight = 0
    obesity = 0
    for person in persons:
        if float(person["bmi"]) < 18.5:
            underweight += 1
        elif 18.5 <= float(person["bmi"]) < 25:
            normal_weight += 1
        elif 25 <= float(person["bmi"]) < 30:
            overweight += 1
        else:
            obesity += 1
    return {"Underweight": underweight, "Normal weight": normal_weight, "Overweight":overweight, "Obesity":obesity}

In [27]:
bmi_classification_regions = [bmi_classification_function(region) for region in regions]
region_name = ["Southwest", "Southeast",  "Northwest", "Northeast"]
for i in range(len(regions)):
    print("{region}: {bmi_classification}".format(region=region_name[i], bmi_classification=bmi_classification_regions[i]))

Southwest: {'Underweight': 3, 'Normal weight': 48, 'Overweight': 101, 'Obesity': 173}
Southeast: {'Underweight': 0, 'Normal weight': 41, 'Overweight': 80, 'Obesity': 243}
Northwest: {'Underweight': 7, 'Normal weight': 63, 'Overweight': 107, 'Obesity': 148}
Northeast: {'Underweight': 10, 'Normal weight': 73, 'Overweight': 98, 'Obesity': 143}


In [29]:
bmi_classification_general = bmi_classification_function(insurance_list)
"All regions: {bmi_classification}".format(bmi_classification=bmi_classification_general)

"All regions: {'Underweight': 20, 'Normal weight': 225, 'Overweight': 386, 'Obesity': 707}"
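The if/elif chain in `bmi_classification_function` can also be written as a lookup against a sorted list of cutoffs with the standard library's `bisect` module. A sketch under that approach; the names `BMI_CUTOFFS`, `bmi_category`, and `count_bmi_categories` are hypothetical. `bisect_right` places a value equal to a cutoff in the higher category, which matches the `<` comparisons in the original function.

```python
from bisect import bisect_right

# Boundaries between categories: <18.5, 18.5 to <25, 25 to <30, >=30.
BMI_CUTOFFS = [18.5, 25, 30]
BMI_LABELS = ["Underweight", "Normal weight", "Overweight", "Obesity"]

def bmi_category(bmi):
    """Return the BMI label; a value equal to a cutoff goes to the higher group."""
    return BMI_LABELS[bisect_right(BMI_CUTOFFS, bmi)]

def count_bmi_categories(persons):
    """Count people per BMI label, keeping every label even if its count is 0."""
    counts = dict.fromkeys(BMI_LABELS, 0)
    for person in persons:
        counts[bmi_category(float(person["bmi"]))] += 1
    return counts

print([bmi_category(x) for x in (17.0, 18.5, 24.9, 25.0, 30.0)])
# ['Underweight', 'Normal weight', 'Normal weight', 'Overweight', 'Obesity']
```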

## Age classification

Next, I group people by age: 20 or younger, over 20 up to 30, over 30 up to 40, over 40 up to 50, over 50 up to 60, and over 60. Each group includes its upper bound, so a 30-year-old is counted in the "20-30" group.

In [31]:
def age_classification_function(persons):
    age_0_20 = 0
    age_20_30 = 0
    age_30_40 = 0
    age_40_50 = 0
    age_50_60 = 0
    age_60 = 0
    for person in persons:
        if float(person["age"]) <= 20:
            age_0_20 += 1
        elif 20 < float(person["age"]) <= 30:
            age_20_30 += 1
        elif 30 < float(person["age"]) <= 40:
            age_30_40 += 1
        elif 40 < float(person["age"]) <= 50:
            age_40_50 += 1
        elif 50 < float(person["age"]) <= 60:
            age_50_60 += 1
        else:
            age_60 += 1
    return {"0-20": age_0_20, "20-30": age_20_30, "30-40": age_30_40, "40-50": age_40_50, "50-60":age_50_60, ">60":age_60}


In [32]:
age_classification_regions = [age_classification_function(region) for region in regions]
region_name = ["Southwest", "Southeast",  "Northwest", "Northeast"]
for i in range(len(regions)):
    print("{region}: {age_classification}".format(region=region_name[i], age_classification=age_classification_regions[i]))

Southwest: {'0-20': 39, '20-30': 67, '30-40': 62, '40-50': 69, '50-60': 66, '>60': 22}
Southeast: {'0-20': 48, '20-30': 75, '30-40': 69, '40-50': 78, '50-60': 69, '>60': 25}
Northwest: {'0-20': 41, '20-30': 67, '30-40': 64, '40-50': 66, '50-60': 64, '>60': 23}
Northeast: {'0-20': 38, '20-30': 69, '30-40': 62, '40-50': 68, '50-60': 66, '>60': 21}


In [33]:
age_classification_general = age_classification_function(insurance_list)
"All regions: {age_classification}".format(age_classification=age_classification_general)

"All regions: {'0-20': 166, '20-30': 278, '30-40': 257, '40-50': 281, '50-60': 265, '>60': 91}"
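The same cutoff-lookup idea works for the age groups. Here the upper bound of each group is inclusive (`<=` in `age_classification_function`), so the sketch uses `bisect_left`, which keeps a value equal to a cutoff in the lower group. The names `AGE_CUTOFFS`, `age_group`, and `count_age_groups` are hypothetical.

```python
from bisect import bisect_left

# Inclusive upper bounds of the first five groups; anything above 60 is ">60".
AGE_CUTOFFS = [20, 30, 40, 50, 60]
AGE_LABELS = ["0-20", "20-30", "30-40", "40-50", "50-60", ">60"]

def age_group(age):
    """bisect_left keeps a value equal to a cutoff in the lower group."""
    return AGE_LABELS[bisect_left(AGE_CUTOFFS, age)]

def count_age_groups(persons):
    """Count people per age group, keeping every label even if its count is 0."""
    counts = dict.fromkeys(AGE_LABELS, 0)
    for person in persons:
        counts[age_group(float(person["age"]))] += 1
    return counts

print([age_group(a) for a in (18, 20, 21, 60, 64)])
# ['0-20', '0-20', '20-30', '50-60', '>60']
```

Choosing `bisect_left` versus `bisect_right` is what mirrors the choice between `<=` and `<` in the original if/elif chains.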

## Number of children

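Finally, I count how many people have zero, one, two, or three or more children. The group labeled ">3" in the output also includes people with exactly three children.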

In [34]:
def children_classification_function(persons):
    zero_children = 0
    one_child = 0
    two_children = 0
    three_more_children = 0
    for person in persons:
        if float(person["children"]) == 0:
            zero_children += 1
        elif float(person["children"]) == 1:
            one_child += 1
        elif float(person["children"]) == 2:
            two_children += 1
        else:
            three_more_children += 1
    return {"0":zero_children, "1": one_child, "2": two_children, ">3":three_more_children}


In [35]:
children_classification_regions = [children_classification_function(region) for region in regions]
region_name = ["Southwest", "Southeast",  "Northwest", "Northeast"]
for i in range(len(regions)):
    print("{region}: {children_classification}".format(region=region_name[i], children_classification=children_classification_regions[i]))

Southwest: {'0': 138, '1': 78, '2': 57, '>3': 52}
Southeast: {'0': 157, '1': 95, '2': 66, '>3': 46}
Northwest: {'0': 132, '1': 74, '2': 66, '>3': 53}
Northeast: {'0': 147, '1': 77, '2': 51, '>3': 49}


In [36]:
children_classification_general = children_classification_function(insurance_list)
"All regions: {children_classification}".format(children_classification=children_classification_general)

"All regions: {'0': 574, '1': 324, '2': 240, '>3': 200}"
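A compact alternative sketch using `collections.Counter`: capping each value with `min(n, 3)` pools three or more children into one bucket, labeled "3+" here so that the label reflects that exactly three children are included. The function name `children_counts`, the `cap` parameter, and the "3+" label are hypothetical, and the `children` column is assumed to hold whole numbers.

```python
from collections import Counter

def children_counts(persons, cap=3):
    """Count people by number of children, pooling `cap` or more into one bucket."""
    counts = Counter(min(int(person["children"]), cap) for person in persons)
    # Emit every bucket in order, including empty ones, with the last labeled "3+".
    return {str(n) if n < cap else f"{cap}+": counts.get(n, 0) for n in range(cap + 1)}

sample = [{"children": "0"}, {"children": "2"}, {"children": "5"}, {"children": "3"}]
print(children_counts(sample))
# {'0': 1, '1': 0, '2': 1, '3+': 2}
```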

## Conclusion

What do we conclude about this data?
- The average age is about 39 years, and people have about one child on average (1.09).
- The average BMI of the population is 30.663, which falls in the obesity range (30 or higher).
- The Southeast has the most smokers (91). It is also the largest region (364 people), but even as a share, its 25% of smokers is the highest of the four regions.
- In terms of BMI, the number of people in each category is broadly similar across regions except for obesity, where the Southeast has the most people (243, compared with 143 to 173 in the other regions).
- The distributions of age and number of children are very similar across regions, so they do not help explain the regional differences in cost.
- Finally, the Southeast has the highest average insurance cost, which is consistent with it having both the most people with obesity and the most smokers. The two regions with the same number of smokers (Southwest and Northwest, 58 each) have very similar average costs, while the regions with more smokers (Southeast and Northeast) have higher average costs. Based on my analysis, these facts suggest that smoking has a strong influence on medical insurance charges.