# U.S. Medical Insurance Costs

This project will use Python to manipulate a set of U.S. medical insurance data from 2018. The goal is to organize and analyze the data in **insurance.csv** to learn how patient attributes such as sex, region, BMI, and smoking status relate to insurance costs.

In [12]:
import csv

First, we import the only library we need: Python's built-in <code>csv</code> module.

After inspecting **insurance.csv**, we find that the following pieces of information are given for each patient:
- Age
- Sex
- BMI
- Number of Children
- Smoker Status
- Region
- Charges

Empty lists are created for each of these, which will be populated by the corresponding values from the **.csv** file.

In [13]:
ages = []
sexes = []
bmis = []
num_children = []
smoker_stauses = []
regions = []
insurance_costs = []

Next, we define a function that reads **insurance.csv** and appends the values from one column to a given list, which we will use to populate the lists above.

In [14]:
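def load_data(lst, csv_file, column_name):
    # open the file and append every value from the given column to lst (values are read as strings)
    with open(csv_file, newline='') as csv_info:
        csv_dict = csv.DictReader(csv_info)
        for row in csv_dict:
            lst.append(row[column_name])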

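This function saves us from repeating the file-opening and appending code for every column: we simply call it once for each list, passing the list to fill, the file name, and the column header, as shown below. Note that <code>csv.DictReader</code> reads every value as a string, so numeric columns such as <code>age</code>, <code>bmi</code>, and <code>charges</code> must be converted (for example, with <code>float()</code>) before doing arithmetic on them.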
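Because <code>load_data()</code> reopens and rereads the file for every column, an alternative is to collect all the columns in a single pass. The sketch below is one way to do that; <code>load_columns</code> is a hypothetical helper, not part of this project, and it takes an already-open file so it can be demonstrated on a small in-memory sample with made-up rows.

```python
import csv
import io


def load_columns(csv_info, column_names):
    """Read several columns from an open CSV file in a single pass.

    Hypothetical helper (not part of the original notebook). Returns a dict
    mapping each column name to a list of its values, kept as strings.
    """
    columns = {name: [] for name in column_names}
    for row in csv.DictReader(csv_info):
        for name in column_names:
            columns[name].append(row[name])
    return columns


# Made-up sample rows standing in for insurance.csv
sample = io.StringIO("age,sex,charges\n19,female,100.5\n33,male,200.0\n")
sample_columns = load_columns(sample, ["age", "charges"])
print(sample_columns)  # {'age': ['19', '33'], 'charges': ['100.5', '200.0']}
```

With the real file, you would open it once, e.g. <code>with open('insurance.csv', newline='') as csv_info: columns = load_columns(csv_info, ['age', 'sex', 'bmi', 'children', 'smoker', 'region', 'charges'])</code>, and then read <code>columns['age']</code> and so on.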

In [15]:
load_data(ages, 'insurance.csv', 'age')
load_data(sexes, 'insurance.csv', 'sex')
load_data(bmis, 'insurance.csv', 'bmi')
load_data(num_children, 'insurance.csv', 'children')
load_data(smoker_stauses, 'insurance.csv', 'smoker')
load_data(regions, 'insurance.csv', 'region')
load_data(insurance_costs, 'insurance.csv', 'charges')

Now that our data is organized into lists, we can begin our analysis. There are many aspects we could look into, but we have chosen the following:
- find whether male or female insurance is more expensive, on average.
- find the average cost per region.
- compare the average cost for patients with a healthy BMI to that of patients outside the healthy range.
- find how much money is saved on average when a patient is not a smoker.

To find these answers, we will create a class called <code>PatientsInfo</code> that will contain the following methods:
- <code>analyze_sexes_costs()</code>
- <code>analyze_regions()</code>
- <code>analyze_healthy_bmi()</code>
- <code>analyze_smoking()</code>

We will also create a <code>make_dictionary()</code> method to put all the data into a Python dictionary for possible future use.

In [16]:
class PatientsInfo:
    def __init__(self, ages, sexes, bmis, nums_children, smoker_statuses, regions, costs):
        # each argument is a list of strings taken from one column of insurance.csv
        self.ages = ages
        self.sexes = sexes
        self.bmis = bmis
        self.nums_children = nums_children
        self.smoker_statuses = smoker_statuses
        self.regions = regions
        self.costs = costs
    
    def analyze_sexes_costs(self):
        # separating male and female insurance costs into new lists
        male_costs = []
        female_costs = []
        for sex, cost in zip(self.sexes, self.costs):
            if sex == "male":
                male_costs.append(float(cost))
            else:
                female_costs.append(float(cost))
        # finding the average cost for each sex and the difference between them, rounded to two decimal places
        male_average_cost = round(sum(male_costs) / len(male_costs), 2)
        female_average_cost = round(sum(female_costs) / len(female_costs), 2)
        cost_difference = round(abs(male_average_cost - female_average_cost), 2)
        # returning a string with our results, naming whichever sex has the lower average
        cheaper_sex = "male" if male_average_cost < female_average_cost else "female"
        return "The average cost for male insurance is ${}, and the average cost for female insurance is ${}, so {} insurance is cheaper by ${} on average.".format(male_average_cost, female_average_cost, cheaper_sex, cost_difference)

    def analyze_regions(self):
        # separating costs from each region into their own respective lists
        ne_costs = []
        se_costs = []
        sw_costs = []
        nw_costs = []
        for region, cost in zip(self.regions, self.costs):
            if region == 'northeast':
                ne_costs.append(float(cost))
            elif region == 'southeast':
                se_costs.append(float(cost))
            elif region == 'southwest':
                sw_costs.append(float(cost))
            elif region == 'northwest':
                nw_costs.append(float(cost))
        # finding the average cost for each list, rounded to two decimal places
        ne_avg_cost = round(sum(ne_costs) / len(ne_costs), 2)
        se_avg_cost = round(sum(se_costs) / len(se_costs), 2)
        sw_avg_cost = round(sum(sw_costs) / len(sw_costs), 2)
        nw_avg_cost = round(sum(nw_costs) / len(nw_costs), 2)
        # printing a string with our results
        print("The average insurance cost per region of the U.S. is as follows:\nNortheast: ${} \nSoutheast: ${} \nSouthwest: ${} \nNorthwest: ${}".format(ne_avg_cost, se_avg_cost, sw_avg_cost, nw_avg_cost))

    def analyze_healthy_bmi(self):
        # according to the CDC, a healthy adult BMI is between 18.5 and 24.9 for both men and women
        # this method uses running sums and counts instead of lists to obtain the averages
        healthy_bmi_cost_sum = 0
        healthy_count = 0
        poor_bmi_cost_sum = 0
        poor_count = 0
        # adding each insurance cost to the correct sum depending on BMI, and incrementing the matching count
        for bmi, cost in zip(self.bmis, self.costs):
            if 18.5 <= float(bmi) <= 24.9:
                healthy_bmi_cost_sum += float(cost)
                healthy_count += 1
            else:
                poor_bmi_cost_sum += float(cost)
                poor_count += 1
        # finding averages by dividing the cost sums by the counts, rounded to two decimal places
        healthy_bmi_avg_cost = round(healthy_bmi_cost_sum / healthy_count, 2)
        poor_bmi_avg_cost = round(poor_bmi_cost_sum / poor_count, 2)
        avg_diff = round(poor_bmi_avg_cost - healthy_bmi_avg_cost, 2)
        # printing a string with our results
        print("According to the CDC, a healthy adult BMI is between 18.5 and 24.9 for both male and female. \nThe average insurance cost for those within a healthy BMI range is ${}, compared to ${} for those outside of a healthy range. That's a difference of ${}!".format(healthy_bmi_avg_cost, poor_bmi_avg_cost, avg_diff))

    def analyze_smoking(self):
        # finding smoker and non-smoker insurance cost sums with the same approach as analyze_healthy_bmi()
        smoker_cost_sum = 0
        smoker_count = 0
        nonsmoker_cost_sum = 0
        nonsmoker_count = 0
        for smoker, cost in zip(self.smoker_statuses, self.costs):
            if smoker == 'yes':
                smoker_cost_sum += float(cost)
                smoker_count += 1
            else:
                nonsmoker_cost_sum += float(cost)
                nonsmoker_count += 1
        # finding averages by dividing the sums by the counts, rounded to two decimal places
        smoker_cost_avg = round(smoker_cost_sum / smoker_count, 2)
        nonsmoker_cost_avg = round(nonsmoker_cost_sum / nonsmoker_count, 2)
        avg_diff = round(smoker_cost_avg - nonsmoker_cost_avg, 2)
        # printing a string with our results
        print("On average, smokers pay ${} more for their insurance than non-smokers.".format(avg_diff))

    def make_dictionary(self):
        # storing each column under a descriptive key, converting the numeric columns from strings
        self.dictionary = {
            "ages": [int(age) for age in self.ages],
            "sexes": self.sexes,
            "bmis": [float(bmi) for bmi in self.bmis],
            "num_children": [int(num) for num in self.nums_children],
            "smoker_statuses": self.smoker_statuses,
            "regions": self.regions,
            "insurance_costs": [float(cost) for cost in self.costs],
        }
        return self.dictionary

Now we create an instance of the <code>PatientsInfo</code> class and call each method to see the results of our analysis.

In [17]:
patient_info = PatientsInfo(ages, sexes, bmis, num_children, smoker_stauses, regions, insurance_costs)

In [18]:
patient_info.analyze_sexes_costs()

'The average cost for male insurance is $13956.75, and the average cost for female insurance is $12569.58, so female insurance is cheaper by $1387.17 on average.'

In [19]:
patient_info.analyze_regions()

The average insurance cost per region of the U.S. is as follows:
Northeast: $13406.38 
Southeast: $14735.41 
Southwest: $12346.94 
Northwest: $12417.58


In [20]:
patient_info.analyze_healthy_bmi()

According to the CDC, a healthy adult BMI is between 18.5 and 24.9 for both male and female. 
The average insurance cost for those within a healthy BMI range is $10379.5, compared to $13845.5 for those outside of a healthy range. That's a difference of $3466.0!
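The "outside of a healthy range" group above mixes underweight, overweight, and obese patients. One way to break it down further is to classify each BMI into the CDC's standard adult categories (underweight below 18.5, healthy weight 18.5 to under 25, overweight 25 to under 30, obesity 30 or higher). The sketch below is an illustration only; <code>bmi_category</code> is a hypothetical helper, and it uses "below 25" as the healthy upper bound rather than the 24.9 cutoff in <code>analyze_healthy_bmi()</code>, so a value such as 24.95 counts as healthy here.

```python
def bmi_category(bmi):
    """Return the CDC adult BMI category for a BMI value (string or number)."""
    bmi = float(bmi)  # values read with csv.DictReader are strings
    if bmi < 18.5:
        return "underweight"
    elif bmi < 25.0:
        return "healthy weight"
    elif bmi < 30.0:
        return "overweight"
    else:
        return "obesity"


# Made-up BMI values, not taken from insurance.csv
for value in ["17.4", "22.0", "24.95", "27.9", "33.1"]:
    print(value, bmi_category(value))
```

Pairing each category with its cost (for example, by zipping <code>bmis</code> with <code>insurance_costs</code>) would then allow an average to be computed per category instead of for a single "poor" group.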


In [21]:
patient_info.analyze_smoking()

On average, smokers pay $23615.96 more for their insurance than non-smokers.


Now we have the results of the analyses we chose to do. There are many more attributes we could explore, such as age or number of children, and because each analysis lives in its own method, we can extend our work by adding more methods to the <code>PatientsInfo</code> class.
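For example, an analysis of cost by number of children would follow the same pattern as the existing methods: group the costs by an attribute, then average each group. Here is a minimal sketch of that pattern as a standalone function, assuming parallel lists of strings like those built with <code>load_data()</code>. The name <code>average_cost_by_group</code> is hypothetical; inside <code>PatientsInfo</code> it could be called as <code>average_cost_by_group(self.nums_children, self.costs)</code>.

```python
from collections import defaultdict


def average_cost_by_group(groups, costs):
    """Average the insurance costs for each distinct value in `groups`.

    Hypothetical helper: `groups` and `costs` are parallel lists of strings;
    averages are rounded to two decimal places, as in the existing methods.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, cost in zip(groups, costs):
        totals[group] += float(cost)
        counts[group] += 1
    return {group: round(totals[group] / counts[group], 2) for group in totals}


# Made-up sample values, not taken from insurance.csv
sample_children = ["0", "2", "0", "1"]
sample_costs = ["1000.0", "3000.0", "2000.0", "1500.5"]
print(average_cost_by_group(sample_children, sample_costs))
# {'0': 1500.0, '2': 3000.0, '1': 1500.5}
```

Because the groups are discovered from the data, the same function would also work for regions without hard-coding one list per region.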

Finally, to offer another format for further analysis, we put our lists of data into a dictionary.

In [22]:
patient_info_dictionary = patient_info.make_dictionary()
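The dictionary stores the data column by column. For analyses that look at one patient at a time, it can be convenient to turn it into a list of per-patient dictionaries. A minimal sketch follows; <code>to_records</code> is a hypothetical helper, and it assumes every list in the dictionary has the same length (<code>zip</code> silently stops at the shortest list otherwise).

```python
def to_records(columns):
    """Convert a dict of equal-length column lists into one dict per patient."""
    names = list(columns)
    # zip(*columns.values()) yields one tuple per patient, in column order
    return [dict(zip(names, row)) for row in zip(*columns.values())]


# Made-up sample with the same column-oriented shape as make_dictionary()'s result
sample = {"ages": [19, 33], "regions": ["southwest", "southeast"]}
print(to_records(sample))
# [{'ages': 19, 'regions': 'southwest'}, {'ages': 33, 'regions': 'southeast'}]
```

Applied to this project, <code>patient_records = to_records(patient_info_dictionary)</code> would give one dictionary per row of **insurance.csv**.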