# U.S. Medical Insurance Costs
In this project, using Python's CSV module, I analyze __insurance.csv__ provided by Codecademy. 

In [2]:
#First, I have to import the CSV module
import csv

After taking a look at __insurance.csv__, I created seven empty lists for the different attributes in the dataset.

In [3]:
ages = []
sexes = []
bmis = []
num_of_children = []
smoking = []
regions = []
insurance_charges = []

To fill my empty lists with the relevant data, I wrote a function that iterates through the rows and appends the attribute (specified in the function's parameter) to the list.

In [4]:
def list_filler(csv_file, list, column):
    with open(csv_file) as open_file:
        file = csv.DictReader(open_file)
        for row in file:
            list.append(row[column])

list_filler("insurance.csv", ages, "age")
list_filler("insurance.csv", sexes, "sex")
list_filler("insurance.csv", bmis, "bmi")
list_filler("insurance.csv", num_of_children, "children")
list_filler("insurance.csv", smoking, "smoker")
list_filler("insurance.csv", regions, "region")
list_filler("insurance.csv", insurance_charges, "charges")

After determining the __average age__, and the __male-to-female ratio__ in the dataset, the scope of my analysis is to find the relationship between:
-  age, smoking, and the insurance charges
-  smoking and gender
-  number of children and insurance charges
-  smoking status and BMI
-  smoking and the region, in regards to and regardless of gender

I also wanted to write a method that creates a dictionary of the data for additional future analysis, `dictionary_creator()`.

To keep my code modular, I created the InsuranceInfo class with different analytic tools:
-  `average_age()`
-  `male_to_female()`
-  `tuple_creator()`
-  `smoking_by_gender()`
-  `average_charges_by_age_smoking()`
-  `children_insurance_charges()`
-  `smoking_and_avg_bmi()`
-  `smoking_by_region_gender()`
-  `smoking_and_num_of_children()`

In [5]:
class InsuranceInfo:
    def __init__(self, ages, sexes, bmis, num_of_children, smoking, regions, insurance_charges):
        self.ages = ages
        self.sexes = sexes
        self.bmis = bmis
        self.num_of_children = num_of_children
        self.smoking = smoking
        self.regions = regions
        self.insurance_charges = insurance_charges
    
    def average_age(self):
        total_age = 0
        for age in self.ages:
            total_age += int(age)
        return "The average age is " + str(round(total_age/len(self.ages), 2))
    
    def male_to_female(self):
        males = 0
        females = 0
        total = len(self.sexes)
        for sex in self.sexes:
            if sex == "male":
                males += 1
            if sex == "female":
                females += 1
        return "The number of males is {males} ({percentage_males}%), the number of females is {females} ({percentage_females}%)".format(males=males, percentage_males=round((males/total * 100), 2), females=females, percentage_females=round((females/total * 100), 2))
    
    def tuple_creator(self):
        list_to_return = []
        for i in range(len(self.ages)):
            list_to_return.append({"age": int(self.ages[i]), "sex": self.sexes[i], "bmi": float(self.bmis[i]), "num_of_children": int(self.num_of_children[i]), "smoking": self.smoking[i], "region": self.regions[i], "insurance_charges": float(self.insurance_charges[i])})
        return list_to_return
 
    def dictionary_creator(self):
        dict_to_return = {}
        for i in range(len(ages)):
            dict_to_return[i] = {"age": int(ages[i]), "sex": sexes[i], "bmi": float(bmis[i]), "num_of_children": int(num_of_children[i]), "smoking": smoking[i], "region": regions[i], "insurance_charges": float(insurance_charges[i])}
        return dict_to_return

    def smoking_by_gender(self):
        smoking_tuple = self.tuple_creator()
        smoking_male = 0
        non_smoking_male = 0
        smoking_female = 0
        non_smoking_female = 0
        total = len(smoking_tuple)
        for person in smoking_tuple:
            if person["sex"] == "male" and person["smoking"] == "yes":
                smoking_male += 1
            elif person["sex"] == "male" and person["smoking"] == "no":
                non_smoking_male += 1
            elif person["sex"] == "female" and person["smoking"] == "yes":
                smoking_female += 1
            else:
                non_smoking_female += 1
        return """Smoking males: {smoking_male} ({percentage_s_m}%)
Non-smoking males: {non_smoking_male} ({percentage_n_s_m}%)
Smoking females: {smoking_female} ({percentage_s_f}%)
Non-smoking females: {non_smoking_female} ({percentage_n_s_f}%)""".format(smoking_male=smoking_male, percentage_s_m=round((smoking_male/total * 100), 2), non_smoking_male=non_smoking_male, percentage_n_s_m=round((non_smoking_male/total * 100), 2), smoking_female=smoking_female, percentage_s_f=round((smoking_female/total * 100), 2), non_smoking_female=non_smoking_female, percentage_n_s_f=round((non_smoking_female/total * 100), 2))
    
    def average_charges_by_age_smoking(self):
        age_18_30 = []
        age_31_45 = []
        age_46_60 = []
        age_over_60 = []
        insurance_tuple = self.tuple_creator()
        for person in insurance_tuple:
            if person["age"] >= 18 and person["age"] <= 30:
                age_18_30.append(person)
            elif person["age"] > 30 and person["age"] <= 45:
                age_31_45.append(person)
            elif person["age"] > 45 and person["age"] <= 60:
                age_46_60.append(person)
            else:
                age_over_60.append(person)
                
        total_smoking_18_30 = 0
        t_s_18_30_count = 0
        total_non_smoking_18_30 = 0
        t_n_s_18_30_count = 0
        total_smoking_31_45 = 0
        t_s_31_45_count = 0
        total_non_smoking_31_45 = 0
        t_n_s_31_45_count = 0
        total_smoking_46_60 = 0
        t_s_46_60_count = 0
        total_non_smoking_46_60 = 0
        t_n_s_46_60_count = 0
        total_smoking_over_60 = 0
        t_s_over_60_count = 0
        total_non_smoking_over_60 = 0
        t_n_s_over_60_count = 0
        for person in age_18_30:
            if person["smoking"] == "yes":
                total_smoking_18_30 += person["insurance_charges"]
                t_s_18_30_count += 1
            else:
                total_non_smoking_18_30 += person["insurance_charges"]
                t_n_s_18_30_count += 1
        for person in age_31_45:
            if person["smoking"] == "yes":
                total_smoking_31_45 += person["insurance_charges"]
                t_s_31_45_count += 1
            else:
                total_non_smoking_31_45 += person["insurance_charges"]
                t_n_s_31_45_count += 1
        for person in age_46_60:
            if person["smoking"] == "yes":
                total_smoking_46_60 += person["insurance_charges"]
                t_s_46_60_count += 1
            else:
                total_non_smoking_46_60 += person["insurance_charges"]
                t_n_s_46_60_count += 1
        for person in age_over_60:
            if person["smoking"] == "yes":
                total_smoking_over_60 += person["insurance_charges"]
                t_s_over_60_count += 1
            else:
                total_non_smoking_over_60 += person["insurance_charges"]
                t_n_s_over_60_count += 1
                
        return """The average insurance charges, grouped by age groups and smoking status:
18-30 smoking: {average_1830_s}
18-30 non-smoking: {average_1830_ns}
31-45 smoking: {average_3145_s}
31-45 non-smoking: {average_3145_ns}
46-60 smoking: {average_4660_s}
46-60 non-smoking {average_4660_ns}
60+ smoking: {average_over60_s}
60+ non-smoking: {average_over60_ns}""".format(average_1830_s=round((total_smoking_18_30/t_s_18_30_count), 3), average_1830_ns=round((total_non_smoking_18_30/t_n_s_18_30_count), 3), average_3145_s=round((total_smoking_31_45/t_s_31_45_count), 3), average_3145_ns=round((total_non_smoking_31_45/t_n_s_31_45_count), 3), average_4660_s=round((total_smoking_46_60/t_s_46_60_count), 3), average_4660_ns=round((total_non_smoking_46_60/t_n_s_46_60_count), 3), average_over60_s=round((total_smoking_over_60/t_s_over_60_count), 3), average_over60_ns=round((total_non_smoking_over_60/t_n_s_over_60_count), 3))
        
        
    def children_insurance_charges(self):
        children_0_count = 0
        children_0_total = 0
        children_1_count = 0
        children_1_total = 0
        children_2_count = 0
        children_2_total = 0
        children_3_count = 0
        children_3_total = 0
        children_4_count = 0
        children_4_total = 0
        children_5plus_count = 0
        children_5plus_total = 0
        children_tuple = self.tuple_creator()
        for person in children_tuple:
            if person["num_of_children"] == 0:
                children_0_count += 1
                children_0_total += person["insurance_charges"]
            elif person["num_of_children"] == 1:
                children_1_count += 1
                children_1_total += person["insurance_charges"]
            elif person["num_of_children"] == 2:
                children_2_count += 1
                children_2_total += person["insurance_charges"]
            elif person["num_of_children"] == 3:
                children_3_count += 1
                children_3_total += person["insurance_charges"]
            elif person["num_of_children"] == 4:
                children_4_count += 1
                children_4_total += person["insurance_charges"]
            else:
                children_5plus_count += 1
                children_5plus_total += person["insurance_charges"]
        average_0 = children_0_total/children_0_count
        average_1 = children_1_total/children_1_count
        average_2 = children_2_total/children_2_count
        average_3 = children_3_total/children_3_count
        average_4 = children_4_total/children_4_count
        average_5plus = children_5plus_total/children_5plus_count
        return """Average insurance cost by number of children:
0 children: {average_0}
1 child: {average_1}
2 children: {average_2}
3 children: {average_3}
4 children: {average_4}
5 children or more: {average_5plus}""".format(average_0=round(average_0, 2), average_1=round(average_1, 2), average_2=round(average_2, 2), average_3=round(average_3, 2), average_4=round(average_4, 2), average_5plus=round(average_5plus, 2))
    
    def smoking_and_avg_bmi(self):
        smokers = 0
        smokers_bmi_total = 0
        non_smokers = 0
        non_smokers_bmi_total = 0
        smokers_tuple = self.tuple_creator()
        for person in smokers_tuple:
            if person["smoking"] == "yes":
                smokers += 1
                smokers_bmi_total += person["bmi"]
            else:
                non_smokers += 1
                non_smokers_bmi_total += person["bmi"]
        smokers_avg_bmi = smokers_bmi_total/smokers
        non_smokers_avg_bmi = non_smokers_bmi_total/non_smokers
        return """Average BMI of smokers: {smokers_avg_bmi}
Average BMI of non-smokers: {non_smokers_avg_bmi}""".format(smokers_avg_bmi=round(smokers_avg_bmi, 2), non_smokers_avg_bmi=round(non_smokers_avg_bmi, 2))
        
    def smoking_by_region_gender(self):
        southwest_smoking_male = 0
        southwest_non_smoking_male = 0
        southwest_smoking_female = 0
        southwest_non_smoking_female = 0
        southeast_smoking_male = 0
        southeast_non_smoking_male = 0
        southeast_smoking_female = 0
        southeast_non_smoking_female = 0
        northwest_smoking_male = 0
        northwest_non_smoking_male = 0
        northwest_smoking_female = 0
        northwest_non_smoking_female = 0
        northeast_smoking_male = 0
        northeast_non_smoking_male = 0
        northeast_smoking_female = 0
        northeast_non_smoking_female = 0
        smoking_tuple = self.tuple_creator()
        for person in smoking_tuple:
            if person["smoking"] == "yes" and person["sex"] == "male" and person["region"] == "southwest":
                southwest_smoking_male += 1
            elif person["smoking"] == "no" and person["sex"] == "male" and person["region"] == "southwest":
                southwest_non_smoking_male += 1
            elif person["smoking"] == "yes" and person["sex"] == "female" and person["region"] == "southwest":
                southwest_smoking_female += 1
            elif person["smoking"] == "no" and person["sex"] == "female" and person["region"] == "southwest":
                southwest_non_smoking_female += 1
            elif person["smoking"] == "yes" and person["sex"] == "male" and person["region"] == "southeast":
                southeast_smoking_male += 1
            elif person["smoking"] == "no" and person["sex"] == "male" and person["region"] == "southeast":
                southwest_non_smoking_male += 1
            elif person["smoking"] == "yes" and person["sex"] == "female" and person["region"] == "southeast":
                southwest_smoking_female += 1
            elif person["smoking"] == "no" and person["sex"] == "female" and person["region"] == "southeast":
                southwest_non_smoking_female += 1
            elif person["smoking"] == "yes" and person["sex"] == "male" and person["region"] == "northwest":
                northwest_smoking_male += 1
            elif person["smoking"] == "no" and person["sex"] == "male" and person["region"] == "northwest":
                northwest_non_smoking_male += 1
            elif person["smoking"] == "yes" and person["sex"] == "female" and person["region"] == "northwest":
                northwest_smoking_female += 1
            elif person["smoking"] == "no" and person["sex"] == "female" and person["region"] == "northwest":
                northwest_non_smoking_female += 1
            elif person["smoking"] == "yes" and person["sex"] == "male" and person["region"] == "northeast":
                northeast_smoking_male += 1
            elif person["smoking"] == "no" and person["sex"] == "male" and person["region"] == "northeast":
                northeast_non_smoking_male += 1
            elif person["smoking"] == "yes" and person["sex"] == "female" and person["region"] == "northeast":
                northeast_smoking_female += 1
            elif person["smoking"] == "no" and person["sex"] == "female" and person["region"] == "northeast":
                northeast_non_smoking_female += 1
        return """Of the total number of people ({total}) in the dataset:
Smokers:
    Southwest: {southwest_smoking}% ({southwest_smoking_men}% men, {southwest_smoking_women}% women)
    Southeast: {southeast_smoking}% ({southeast_smoking_men}% men, {southeast_smoking_women}% women)
    Northwest: {northwest_smoking}% ({northwest_smoking_men}% men, {northwest_smoking_women}% women)
    Northeast: {northeast_smoking}% ({northeast_smoking_men}% men, {northeast_smoking_women}% women)
Non-smokers:
    Southwest: {southwest_non_smoking}% ({southwest_non_smoking_men}% men, {southwest_non_smoking_women}% women)
    Southeast: {southeast_non_smoking}% ({southeast_non_smoking_men}% men, {southeast_non_smoking_women}% women)
    Northwest: {northwest_non_smoking}% ({northwest_non_smoking_men}% men, {northwest_non_smoking_women}% women)
    Northeast: {northeast_non_smoking}% ({northeast_non_smoking_men}% men, {northeast_non_smoking_women}% women)""".format(total=len(smoking_tuple), southwest_smoking=(round((southwest_smoking_male+southwest_smoking_female)/len(smoking_tuple)*100, 2)), southwest_smoking_men=round((southwest_smoking_male/len(smoking_tuple)*100), 2), southwest_smoking_women=round((southwest_smoking_female/len(smoking_tuple)*100), 2),\
                                                                                                                          southeast_smoking=round(((southeast_smoking_male+southeast_smoking_female)/len(smoking_tuple)*100), 2), southeast_smoking_men=round((southeast_smoking_male/len(smoking_tuple)*100), 2), southeast_smoking_women=round((southeast_smoking_female/len(smoking_tuple)*100), 2),\
                                                                                                                          northwest_smoking=round(((northwest_smoking_male+northwest_smoking_female)/len(smoking_tuple)*100), 2), northwest_smoking_men=round((northwest_smoking_male/len(smoking_tuple)*100), 2), northwest_smoking_women=round((northwest_smoking_female/len(smoking_tuple)*100), 2),\
                                                                                                                          northeast_smoking=round(((northeast_smoking_male+northeast_smoking_female)/len(smoking_tuple)*100), 2), northeast_smoking_men=round((northeast_smoking_male/len(smoking_tuple)*100), 2), northeast_smoking_women=round((northeast_smoking_female/len(smoking_tuple)*100), 2),\
                                                                                                                          southwest_non_smoking=round(((southwest_non_smoking_male+southwest_non_smoking_female)/len(smoking_tuple)*100), 2), southwest_non_smoking_men=round((southwest_non_smoking_male/len(smoking_tuple)*100), 2), southwest_non_smoking_women=round((southwest_non_smoking_female/len(smoking_tuple)*100), 2),\
                                                                                                                          southeast_non_smoking=round(((southeast_non_smoking_male+southeast_non_smoking_female)/len(smoking_tuple)*100), 2), southeast_non_smoking_men=round((southeast_non_smoking_male/len(smoking_tuple)*100), 2), southeast_non_smoking_women=round((southeast_non_smoking_female/len(smoking_tuple)*100), 2),\
                                                                                                                          northwest_non_smoking=round(((northwest_non_smoking_male+northwest_non_smoking_female)/len(smoking_tuple)*100), 2), northwest_non_smoking_men=round((northwest_non_smoking_male/len(smoking_tuple)*100), 2), northwest_non_smoking_women=round((northwest_non_smoking_female/len(smoking_tuple)*100), 2),\
                                                                                                                          northeast_non_smoking=round(((northeast_non_smoking_male+northeast_non_smoking_female)/len(smoking_tuple)*100), 2), northeast_non_smoking_men=round((northeast_non_smoking_male/len(smoking_tuple)*100), 2), northeast_non_smoking_women=round((northeast_non_smoking_female/len(smoking_tuple)*100), 2))
            

insurance = InsuranceInfo(ages, sexes, bmis, num_of_children, smoking, regions, insurance_charges)

### The average age and male-to-female ratio:

In [6]:
print(insurance.average_age())
print(insurance.male_to_female())

The average age is 39.21
The number of males is 676 (50.52%), the number of females is 662 (49.48%)


### Smoking and gender
Looking at the dataset in regards to smoking and gender, the majority of people are non-smokers, but there's no significant difference between the number of males and females. The relationship between the number of male and female smokers to their non-smoking counterparts is roughly similar.

In [7]:
print(insurance.smoking_by_gender())

Smoking males: 159 (11.88%)
Non-smoking males: 517 (38.64%)
Smoking females: 115 (8.59%)
Non-smoking females: 547 (40.88%)


### Average insurance charges by age and smoking
I wrote the `average_charges_by_age_smoking()` method to see how much of a difference smoking means to people of different age. Without knowing the insurance contract terms it seems that the younger the person, the bigger the difference is between the insurance charges of smokers and non-smokers:

In [8]:
print(insurance.average_charges_by_age_smoking())

The average insurance charges, grouped by age groups and smoking status:
18-30 smoking: 27528.078
18-30 non-smoking: 4462.309
31-45 smoking: 31707.164
31-45 non-smoking: 7246.17
46-60 smoking: 36451.732
46-60 non-smoking 12188.334
60+ smoking: 38929.615
60+ non-smoking: 15366.613


### Number of children and insurance charges
On average, people with 2-3 children pay the most in insurance charges:

In [9]:
print(insurance.children_insurance_charges())

Average insurance cost by number of children:
0 children: 12365.98
1 child: 12731.17
2 children: 15073.56
3 children: 15355.32
4 children: 13850.66
5 children or more: 8786.04


### Smoking status and BMI
In this dataset there doesn't seem to be any significant difference between smokers and non-smokers, regarding BMI:

In [10]:
print(insurance.smoking_and_avg_bmi())

Average BMI of smokers: 30.71
Average BMI of non-smokers: 30.65


### Smoking by region and gender
After using the `smoking_by_region_gender()` method, it's obvious that the relative majority of people are from the Southwest, and the Southeast is very under represented:

In [11]:
print(insurance.smoking_by_region_gender())

Of the total number of people (1338) in the dataset:
Smokers:
    Southwest: 7.03% (2.77% men, 4.26% women)
    Southeast: 4.11% (4.11% men, 0.0% women)
    Northwest: 4.33% (2.17% men, 2.17% women)
    Northeast: 5.01% (2.84% men, 2.17% women)
Non-smokers:
    Southwest: 40.36% (19.43% men, 20.93% women)
    Southeast: 0.0% (0.0% men, 0.0% women)
    Northwest: 19.96% (9.87% men, 10.09% women)
    Northeast: 19.21% (9.34% men, 9.87% women)
