# U.S. Medical Insurance Costs

In this project, a **CSV** file with medical insurance costs will be investigated using Python fundamentals. The goal with this project will be to analyze various attributes within **insurance.csv** to learn more about the patient information in the file and gain insight into potential use cases for the dataset.

In [4]:
# import csv library
import csv

To start, all necessary libraries must be imported. For this project the only library needed is the `csv` library in order to work with the **insurance.csv** data. There are other potential libraries that could help with this project; however, for this analysis, using just the `csv` library will suffice.

**insurance.csv** contains the following columns:
* Patient Age
* Patient Sex 
* Patient BMI
* Patient Number of Children
* Patient Smoking Status
* Patient U.S. Geographical Region
* Patient Yearly Medical Insurance Cost

There are no signs of missing data. To store this information, seven empty lists will be created to hold each individual column of data from **insurance.csv**.
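
The claim that no data is missing can be checked directly with the `csv` library. The sketch below is one way to do so, assuming a missing value would show up as an empty or whitespace-only field. The function name `count_missing` and the in-memory sample are illustrative and not part of the original notebook.

```python
import csv
import io


def count_missing(file_obj):
    """Count empty fields per column in a CSV file object."""
    reader = csv.DictReader(file_obj)
    missing = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            value = row[name]
            # DictReader fills short rows with None; treat that as missing too
            if value is None or value.strip() == "":
                missing[name] += 1
    return missing


# Hypothetical two-row sample; the second row has an empty bmi field
sample = io.StringIO(
    "age,sex,bmi,children,smoker,region,charges\n"
    "19,female,27.9,0,yes,southwest,16884.924\n"
    "18,male,,1,no,southeast,1725.5523\n"
)
print(count_missing(sample))
```

Run against the real file (opened with `open('insurance.csv', newline='')`), every count should be zero if the statement above holds.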


In [5]:
#Create empty lists for the various attributes in insurance.csv
age = []
sex = []
bmi = []
children = []
smoker = []
region = []
charges = []

The helper function below was created to make loading data into the lists as efficient as possible. Without this function, one would have to open **insurance.csv** and rewrite the `for` loop seven times; with it, a single call to `load_list_data()` fills all seven lists, as shown below.

In [6]:
# helper function to load csv data
def load_list_data():
    with open('insurance.csv','r') as file:
        reader = csv.DictReader(file)
        
        for row in reader:
            age.append(int(row['age']))
            sex.append(row['sex'])
            bmi.append(float(row['bmi']))
            children.append(int(row['children']))
            smoker.append(row['smoker'])
            region.append(row['region'])
            charges.append(float(row['charges']))
           
load_list_data()
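
Because `load_list_data()` appends to module-level lists, running the cell a second time would load every record twice. A minimal sketch of a variant that builds and returns fresh lists on each call is shown here. The name `load_columns` and the converter table are assumptions made for illustration; the column names are the ones used in the helper above.

```python
import csv
import io


def load_columns(file_obj):
    """Read a CSV file object and return a dict of column name -> list."""
    # Converters for the numeric columns; all other columns stay strings
    converters = {"age": int, "bmi": float, "children": int, "charges": float}
    columns = {}
    for row in csv.DictReader(file_obj):
        for name, value in row.items():
            convert = converters.get(name, str)
            columns.setdefault(name, []).append(convert(value))
    return columns


# First two records of the dataset, used here as an in-memory sample
sample = io.StringIO(
    "age,sex,bmi,children,smoker,region,charges\n"
    "19,female,27.9,0,yes,southwest,16884.924\n"
    "18,male,33.77,1,no,southeast,1725.5523\n"
)
columns = load_columns(sample)
print(columns["age"])      # [19, 18]
print(columns["smoker"])   # ['yes', 'no']
```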



In [9]:
# look at the first five entries of each list
# the dataset contains 1338 records in total

print(age[:5])
print(sex[:5])
print(bmi[:5])
print(children[:5])
print(smoker[:5])
print(region[:5])
print(charges[:5])

[19, 18, 28, 33, 32]
['female', 'male', 'male', 'male', 'male']
[27.9, 33.77, 33.0, 22.705, 28.88]
[0, 1, 3, 0, 0]
['yes', 'no', 'no', 'no', 'no']
['southwest', 'southeast', 'southeast', 'northwest', 'northwest']
[16884.924, 1725.5523, 4449.462, 21984.47061, 3866.8552]


Now that all the data from **insurance.csv** is neatly organized into labeled lists, the analysis can begin. This is where one must plan what to investigate and how to perform the analysis. There are many aspects of the data that could be examined. The following operations will be implemented:
* find average age of the patients
* return the number of males vs. females counted in the dataset
* find geographical location of the patients
* return the average yearly medical charges of the patients
* return the effect of smoking on insurance costs
* return total insurance charges by age
* creating a dictionary that contains all patient information

To perform these inspections, a class called `PatientsInfo` has been built out which contains seven methods:
* `analyze_ages()`
* `analyze_sexes()`
* `unique_regions()`
* `average_charges()`
* `smoking_effect()`
* `analyze_charges_by_age()`
* `create_dictionary()`

The class has been built out below. 

In [11]:
class PatientsInfo:
    def __init__(self,age,sex,bmi,children,smoker,region,charges):
        self.age = age
        self.sex = sex
        self.bmi = bmi
        self.children = children
        self.smoker = smoker
        self.region = region
        self.charges = charges
        self.data_dict = {}
    
    def analyze_ages(self):
        # Find average age of the patients
        total_age = 0
        for age in self.age:
            total_age += age
        
        return f"Average Age : {total_age / len(self.age)}"
            
    
    def analyze_sexes(self):
        # return the number of males vs. females counted in the dataset,
        # plus the average age, number of children, BMI and charges for each sex
        sex_counts = {}
        total_age_m = 0
        total_age_f = 0
        total_children_m = 0
        total_children_f = 0
        total_bmi_m = 0
        total_bmi_f = 0
        total_charges_m = 0
        total_charges_f = 0
        total_smoker_m = 0
        total_smoker_f = 0
        
        avg_age_m = 0
        avg_age_f = 0
        avg_children_m = 0
        avg_children_f = 0
        avg_bmi_m = 0
        avg_bmi_f = 0
        avg_charges_m = 0
        avg_charges_f = 0
        avg_smoker_m = 0
        avg_smoker_f = 0
        

        # count each distinct value once instead of recounting for every patient
        for i in set(self.sex):
            sex_counts[i] = self.sex.count(i)
            
        datas = self.create_dictionary()
        
        for keys,values in datas.items():
            if values['sex'] == 'female':
                total_age_f += values['age']
                total_children_f += values['children']
                total_bmi_f += values['bmi']
                total_charges_f += values['charges']
                total_smoker_f += values['smoker'].count('yes')
            else:
                total_age_m += values['age']
                total_children_m += values['children']
                total_bmi_m += values['bmi']
                total_charges_m += values['charges']
                total_smoker_m += values['smoker'].count('yes')
       
        avg_age_f = total_age_f / sex_counts['female']
        avg_age_m = total_age_m / sex_counts['male']
        avg_children_f = total_children_f / sex_counts['female']
        avg_children_m = total_children_m / sex_counts['male']
        avg_bmi_f = total_bmi_f / sex_counts['female']
        avg_bmi_m = total_bmi_m / sex_counts['male']
        avg_charges_f = total_charges_f / sex_counts['female']
        avg_charges_m = total_charges_m / sex_counts['male'] 
    
    
        percentage_of_f = 100 * sex_counts['female'] / (sex_counts['female'] + sex_counts['male']) 
        percentage_of_m = 100 - percentage_of_f
        
        info_message = f"There is {sex_counts['female']} female patient, {percentage_of_f:.2f}% of all patients and {sex_counts['male']} male patient, {percentage_of_m:.2f}% of all patients."
        
        return avg_age_f,avg_age_m,avg_children_f,avg_children_m,avg_bmi_f,avg_bmi_m,avg_charges_f,avg_charges_m,info_message
        
    
    def unique_regions(self):
        #return unique regions and how many times those regions are counted
        region_counts = {}
        
        for region in self.region:
            region_counts[region] = self.region.count(region)
        
        return region_counts
    
    def average_charges(self):
        #return the average yearly medical charges of the patients
        total_charges = 0
        for charge in self.charges:
            total_charges += charge
        
        return f"Average Charge : {(total_charges / len(self.charges)):.2f}"
    
    
    def smoking_effect(self):
        # returns the effect of smoking on insurance costs
        total_smoker_insurance_cost = 0
        total_nonsmoker_insurance_cost = 0
        
        total_smoker = 0
        total_nonsmoker = 0
        
        data = self.create_dictionary()
        for keys,values in data.items():
            if values["smoker"] == "yes":
                total_smoker_insurance_cost += values["charges"]
                total_smoker += 1
            else:
                total_nonsmoker_insurance_cost += values["charges"]
                total_nonsmoker += 1
        
        avg_smoker_insurance_cost = total_smoker_insurance_cost / total_smoker
        avg_nonsmoker_insurance_cost = total_nonsmoker_insurance_cost / total_nonsmoker
        
        percent_smokers = total_smoker / (total_smoker + total_nonsmoker) * 100
        percent_nonsmokers = 100 - percent_smokers
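        # note: the gap is divided by the smoker average, so this is how much
        # less non-smokers pay, expressed as a share of what smokers pay
        percent_more = (avg_smoker_insurance_cost - avg_nonsmoker_insurance_cost) / avg_smoker_insurance_cost * 100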
        
        return total_smoker,percent_smokers,total_nonsmoker,percent_nonsmokers,avg_smoker_insurance_cost,avg_nonsmoker_insurance_cost,percent_more
    
    def analyze_charges_by_age(self):
        # returns total insurance charges by age
        charges_by_age = {}
        
        data = self.create_dictionary()
        for keys,values in data.items():
            age = values["age"]
            charges = values["charges"]
            
            if age in charges_by_age:
                charges_by_age[age] += charges
            else:
                charges_by_age[age] = charges
        
        return charges_by_age
    
    
    
    def create_dictionary(self):
        
        for i in range(len(self.age)):
            self.data_dict[i] = {
                'age' : self.age[i],
                'sex' : self.sex[i],
                'bmi' : self.bmi[i],
                'children' : self.children[i],
                'smoker' : self.smoker[i],
                'region' : self.region[i],
                'charges' : self.charges[i]
            }
        return self.data_dict
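
As a side note, `create_dictionary()` indexes seven parallel lists by position. A sketch of the same structure built with `zip` and `enumerate` is given below. `build_records` is a hypothetical standalone function, not a method of the class above, and unlike the index-based loop it silently stops at the shortest list instead of raising an `IndexError`.

```python
def build_records(age, sex, bmi, children, smoker, region, charges):
    """Combine parallel lists into a dict of index -> patient record."""
    fields = ("age", "sex", "bmi", "children", "smoker", "region", "charges")
    rows = zip(age, sex, bmi, children, smoker, region, charges)
    return {i: dict(zip(fields, row)) for i, row in enumerate(rows)}


# First two records from the lists loaded earlier
records = build_records(
    [19, 18], ["female", "male"], [27.9, 33.77], [0, 1],
    ["yes", "no"], ["southwest", "southeast"], [16884.924, 1725.5523],
)
print(records[1]["region"])  # southeast
```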
    

The next step is to create an instance of the class called `patient_info`. With this instance, each method can be used to see the results of the analysis.

In [15]:
patient_info = PatientsInfo(age,sex,bmi,children,smoker,region,charges)
#patient_info.create_dictionary()

In [12]:
patient_info.analyze_ages()



'Average Age : 39.20702541106129'

The average age of the patients in **insurance.csv** is about 39 years old. 
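
The same average can also be obtained from the standard library instead of a manual loop. The sketch below uses `statistics.fmean` on the first five ages printed earlier, so its result differs from the full-dataset average; the variable name `sample_ages` is illustrative.

```python
from statistics import fmean

# First five ages from the dataset, as printed earlier in the notebook
sample_ages = [19, 18, 28, 33, 32]
print(f"Average Age : {fmean(sample_ages):.2f}")  # Average Age : 26.00
```

One difference from the manual version: on an empty list `fmean` raises `statistics.StatisticsError`, whereas dividing by `len(self.age)` would raise `ZeroDivisionError`.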

In [44]:
# call analyze_sexes() once and reuse the returned tuple
sex_stats = patient_info.analyze_sexes()
print(f"Average Female Age : {sex_stats[0]:.2f}")
print(f"Average Male Age : {sex_stats[1]:.2f}")
print(f"Average number of children in females : {sex_stats[2]:.2f}")
print(f"Average number of children in males : {sex_stats[3]:.2f}")
print(f"Average BMI in females : {sex_stats[4]:.2f}")
print(f"Average BMI in males : {sex_stats[5]:.2f}")
print(f"Average charges in females : {sex_stats[6]:.2f}")
print(f"Average charges in males : {sex_stats[7]:.2f}")
print(sex_stats[-1])

Average Female Age : 39.50
Average Male Age : 38.92
Average number of children in females : 1.07
Average number of children in males : 1.12
Average BMI in females : 30.38
Average BMI in males : 30.94
Average charges in females : 12569.58
Average charges in males : 13956.75
There is 662 female patient, 49.48% of all patients and 676 male patient, 50.52% of all patients.
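
The per-sex averages above all follow the same pattern: sum a numeric column within each group, then divide by the group size. A minimal sketch of that pattern as one reusable helper is shown below. `group_average` is a hypothetical name, and the sample uses only the first five records, so the numbers differ from the full-dataset output.

```python
from collections import defaultdict


def group_average(keys, values):
    """Return {key: mean of the values that share that key}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for key, value in zip(keys, values):
        totals[key] += value
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# First five records from the dataset
sample_sex = ["female", "male", "male", "male", "male"]
sample_charges = [16884.924, 1725.5523, 4449.462, 21984.47061, 3866.8552]
averages = group_average(sample_sex, sample_charges)
for group, mean_charge in averages.items():
    print(f"Average charges in {group}s : {mean_charge:.2f}")
```

The same helper could be called with `bmi`, `children`, or `age` as the second argument, or with `smoker` or `region` as the grouping key.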


There are four unique geographical regions in this dataset, and it is important to note that all the patients come from the United States.

In [24]:
patient_info.unique_regions()

{'southwest': 325, 'southeast': 364, 'northwest': 325, 'northeast': 324}
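
`unique_regions()` calls `list.count` once per patient, which rescans the whole list each time. A single-pass alternative is sketched below with `collections.Counter`; the sample holds only the first five regions printed earlier, so the counts are not those of the full dataset.

```python
from collections import Counter

# First five regions from the dataset
sample_regions = ["southwest", "southeast", "southeast", "northwest", "northwest"]
region_counts = Counter(sample_regions)
print(dict(region_counts))
# {'southwest': 1, 'southeast': 2, 'northwest': 2}
```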

In [25]:
patient_info.average_charges()

'Average Charge : 13270.42'

The average yearly medical insurance charge per individual is about 13,270 US dollars.

In [77]:
# call smoking_effect() once and reuse the returned tuple
smoking_stats = patient_info.smoking_effect()
print(f"Total number of smokers : {smoking_stats[0]} ({smoking_stats[1]:.2f}%)")
print(f"Total number of nonsmokers : {smoking_stats[2]} ({smoking_stats[3]:.2f}%)")
print(f"Average smoker insurance costs : {smoking_stats[4]:.2f}")
print(f"Average non-smoker insurance costs : {smoking_stats[5]:.2f}")
# smoking_stats[6] is the gap expressed as a share of the smoker average
print(f"The insurance costs of non-smokers are {smoking_stats[6]:.2f}% lower than those of smokers.")

Total number of smokers : 274 (20.48%)
Total number of nonsmokers : 1064 (79.52%)
Average smoker insurance costs : 32050.23
Average non-smoker insurance costs : 8434.27
The insurance costs of non-smokers are 73.68% lower than those of smokers.


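The output above compares the insurance costs paid by smokers and non-smokers. Smokers make up about 20% of the patients but pay roughly 3.8 times as much on average (32050.23 versus 8434.27). Note that the 73.68% figure expresses the gap as a share of the smoker average; measured against the non-smoker average, smokers pay about 280% more.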
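
A percentage difference depends on which average is used as the baseline. The sketch below reproduces the 73.68% figure from the two averages printed above and also computes the gap relative to the non-smoker average. The variable names are illustrative, and the inputs are the rounded values from the output, not the full-precision ones.

```python
# Averages as reported in the output above
avg_smoker = 32050.23
avg_nonsmoker = 8434.27

difference = avg_smoker - avg_nonsmoker

# Share of the smoker average: "non-smokers pay this much less"
pct_of_smoker = difference / avg_smoker * 100
# Share of the non-smoker average: "smokers pay this much more"
pct_of_nonsmoker = difference / avg_nonsmoker * 100

print(f"{pct_of_smoker:.2f}% less, {pct_of_nonsmoker:.2f}% more")
```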

In [80]:
patient_info.analyze_charges_by_age()

{19: 662857.8347499999,
 18: 488949.0113890001,
 28: 253937.25179999994,
 33: 321139.85766999994,
 32: 239727.80756000004,
 31: 275318.47547999996,
 46: 415935.12851999997,
 37: 450497.79693,
 60: 505526.62567,
 25: 275474.2287,
 62: 440768.70119,
 23: 347754.9611099999,
 56: 390663.41175,
 27: 341171.64820000005,
 52: 529431.8218599999,
 30: 343415.9796699999,
 34: 301951.73114,
 59: 472396.73828999995,
 63: 457354.96460000006,
 55: 420278.1827,
 22: 280362.11845,
 26: 171747.10864,
 35: 282679.55078000005,
 24: 298144.44694,
 41: 260651.13254,
 38: 202568.34185,
 36: 305111.90345,
 21: 132453.00123,
 48: 424342.51290999993,
 40: 317850.78537,
 58: 346973.20279,
 53: 448586.06114000006,
 43: 520216.52363999997,
 64: 512061.6784199999,
 20: 294631.23435000004,
 61: 506562.52499999997,
 44: 428203.70785,
 57: 427626.8165,
 29: 281614.28563000006,
 45: 430075.79583,
 54: 525239.30131,
 49: 355488.1754,
 47: 511965.9882,
 51: 454785.4201500001,
 42: 352648.04406,
 50: 454227.09572000016,


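The output above shows the total insurance charges summed over all patients of each age. Because these are totals rather than averages, each value also depends on how many patients of that age are in the dataset, and the keys appear in the order the ages were first encountered rather than in ascending order.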
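
For reading, the result is easier to scan when sorted by age and rounded. The sketch below does this for three entries copied from the output above; `sorted_charges` is an illustrative name, and with the real data `charges_by_age` would be the dictionary returned by `analyze_charges_by_age()`.

```python
# Three entries copied from the output above
charges_by_age = {
    19: 662857.8347499999,
    18: 488949.0113890001,
    28: 253937.25179999994,
}

# Sort by age and round to cents for display
sorted_charges = {
    age: round(total, 2) for age, total in sorted(charges_by_age.items())
}
for age, total in sorted_charges.items():
    print(f"{age}: {total:.2f}")
```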