# U.S. Medical Insurance Costs

In this project, I investigate a CSV file of medical insurance costs using Python fundamentals. The goal is to analyze the attributes in `insurance.csv` to learn more about the patients recorded in the file.
I will try to answer the following questions:


- What is the average age of the patients in the dataset?
- What are the average yearly medical charges of the patients?
- Where are the patients from?
- Which region has the highest average cost? The least?
- Are the average costs different for each generation?
- Do smokers pay more than non-smokers overall? 
- What is the number of males vs. females counted in the dataset?
- Do men or women pay more on average overall? 

Finally, I will create a dictionary that contains all patient information.

### 1. Import library

In [1]:
import csv

### 2. Importing the dataset

Checking the file `insurance.csv`, we can see that it contains the following columns:

- Patient Age
- Patient Sex
- Patient BMI
- Patient Number of Children
- Patient Smoking Status
- Patient U.S. Geographical Region
- Patient Yearly Medical Insurance Cost

To store this information, seven empty lists will be created to hold each individual column of data from `insurance.csv`. After that, I will import the dataset through a function that appends the value from each row to these lists.

In [169]:
ages = []
sexes = []
bmis = []
num_children = []
smoker_statuses = []
regions = []
insurance_charges = []

In [170]:
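def update_list(lst, file, column_name):
    # Append the value of `column_name` from every row of the CSV file to `lst`.
    # newline='' is what the csv module documentation recommends when opening files.
    with open(file, newline='') as file_dataset:
        csv_dicc = csv.DictReader(file_dataset)
        for row in csv_dicc:
            lst.append(row[column_name])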

In [191]:
update_list(ages, 'insurance.csv', 'age')
update_list(sexes, 'insurance.csv', 'sex')
update_list(bmis, 'insurance.csv', 'bmi')
update_list(num_children, 'insurance.csv', 'children')
update_list(smoker_statuses, 'insurance.csv', 'smoker')
update_list(regions, 'insurance.csv', 'region')
update_list(insurance_charges, 'insurance.csv', 'charges')
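
Calling `update_list` once per column opens and parses the file seven times. As a possible alternative, the sketch below fills every requested column in a single pass. The helper name `load_columns` and the in-memory sample rows are assumptions made for illustration; they are not part of the original notebook or of `insurance.csv`.

```python
import csv
import io

def load_columns(file_obj, column_names):
    """Read a CSV once and return a dict with one list per requested column."""
    columns = {name: [] for name in column_names}
    for row in csv.DictReader(file_obj):
        for name in column_names:
            columns[name].append(row[name])
    return columns

# Made-up rows standing in for insurance.csv, so the example runs without the file.
sample_csv = io.StringIO(
    "age,sex,bmi,children,smoker,region,charges\n"
    "30,female,25.0,0,no,southwest,1000.00\n"
    "40,male,30.0,2,yes,southeast,3000.00\n"
)

columns = load_columns(sample_csv, ["age", "region", "charges"])
print(columns["age"])      # ['30', '40']
print(columns["region"])   # ['southwest', 'southeast']
```

With the real file, the same helper could be given the object returned by `open('insurance.csv', newline='')`. As with `update_list`, every value comes back as a string and still needs converting with `int()` or `float()` before any arithmetic.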

To answer the initial questions, I will build a class called `PatientsInfo` with several methods:
- `analyze_ages()` returns the average patient age.
- `average_charges()` calculates the average insurance charge in the dataset.
- `regions()` returns the regions where the patients are from.
- `analyze_sex()` counts how many female and male patients are in the dataset.
- `create_dictionary()` returns a dictionary with all the patient information, stored by column.

After that, I will use for loops to get the answers to the remaining questions.

In [105]:
class PatientsInfo:
    def __init__ (self, patients_ages, patients_sexes, patients_bmis, patients_num_children, 
                 patients_smoker_statuses, patients_regions, patients_charges):
        self.patients_ages = patients_ages
        self.patients_sexes = patients_sexes
        self.patients_bmis = patients_bmis
        self.patients_num_children = patients_num_children
        self.patients_smoker_statuses = patients_smoker_statuses
        self.patients_regions = patients_regions
        self.patients_charges = patients_charges
    
    def analyze_ages(self):
        total_ages = 0
        for age in self.patients_ages: 
            total_ages += int(age)
        average_ages = round(total_ages / len(self.patients_ages))
        return('The Average Patient Age is: {average} years old'.format(average = average_ages))
    
    def average_charges(self):
        total_charge = 0
        for charge in self.patients_charges:
            total_charge += float(charge)
        average_charge = round(total_charge/len(self.patients_charges),2)
        return('The Average Patient Insurance charge is: {c} dollars'.format(c = average_charge))
    
    def regions (self):
        regions = []
        for region in self.patients_regions:
            if region not in regions:
                regions.append(region)
        return regions
    
    def analyze_sex (self):
        male = 0
        female = 0
        for sex in self.patients_sexes:
            if sex == 'male':
                male += 1
            else:
                female +=1
        print("Count for female: {f}".format(f = female))
        print("Count for male: {m}".format(m = male))
        
    def create_dictionary(self):
        self.patients_dictionary = {}
        self.patients_dictionary["age"] = [int(age) for age in self.patients_ages]
        self.patients_dictionary["sex"] = self.patients_sexes
        self.patients_dictionary["bmi"] = self.patients_bmis
        self.patients_dictionary["children"] = self.patients_num_children
        self.patients_dictionary["smoker"] = self.patients_smoker_statuses
        self.patients_dictionary["regions"] = self.patients_regions
        self.patients_dictionary["charges"] = self.patients_charges
        return self.patients_dictionary    
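
Note that `analyze_sex()` counts every value other than `'male'` as female, and `regions()` builds the list of unique values by hand. A hedged alternative using only the standard library is sketched below: `collections.Counter` tallies each distinct value separately, and `dict.fromkeys` keeps the unique values in order of first appearance. The sample lists are made up for illustration; in the notebook they would be `sexes` and `regions`.

```python
from collections import Counter

# Made-up samples standing in for the `sexes` and `regions` lists.
sample_sexes = ["male", "female", "female", "male", "male"]
sample_regions = ["southwest", "southeast", "southwest", "northeast"]

# Counter tallies every distinct label, so an unexpected value would show up
# as its own entry instead of being silently added to the female count.
sex_counts = Counter(sample_sexes)
for sex, count in sorted(sex_counts.items()):
    print("Count for {s}: {c}".format(s=sex, c=count))

# dict.fromkeys keeps only the first occurrence of each key, in insertion order.
unique_regions = list(dict.fromkeys(sample_regions))
print(unique_regions)  # ['southwest', 'southeast', 'northeast']
```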


### 3. Getting answers

In [106]:
patient_info = PatientsInfo(ages,sexes,bmis,num_children,smoker_statuses,regions,insurance_charges)

**1. What is the average age of the patients in the dataset?**

In [107]:
patient_info.analyze_ages()

'The Average Patient Age is: 39 years old'

**2. What are the average yearly medical charges of the patients?**

In [108]:
patient_info.average_charges()

'The Average Patient Insurance charge is: 13270.42 dollars'

**3. Where are the patients from? How many patients do we have per region?**

In [109]:
region_charge = []
for i in range(len(regions)):
    region_charge.append([regions[i], insurance_charges[i]])

regions_data = patient_info.regions()

print('The patients are from: {region}'.format(region = regions_data))
print('')

regions_insurance_cost = [0] * len(regions_data)
regions_patients = [0] * len(regions_data)

for index in range(len(region_charge)):
    if region_charge[index][0] == regions_data[0]:
        regions_insurance_cost[0] += float(region_charge[index][1])
        regions_patients[0] += 1
    elif region_charge[index][0] == regions_data[1]:
        regions_insurance_cost[1] += float(region_charge[index][1])
        regions_patients[1] += 1
    elif region_charge[index][0] == regions_data[2]:
        regions_insurance_cost[2] += float(region_charge[index][1])
        regions_patients[2] += 1
    else:
        regions_insurance_cost[-1] += float(region_charge[index][1])
        regions_patients[-1] += 1

average_cost_region = []

for index in range(len(regions_insurance_cost)):
    average_cost = regions_insurance_cost[index] / regions_patients[index]
    average_cost_region.append([regions_data[index], average_cost])

for index in range(len(average_cost_region)):
    print("In region {r} there are {p} patients.".format(r = average_cost_region[index][0],
                                                         p = regions_patients[index]))
    print("The average insurance cost in {r} is: {c} dollars".format(r = average_cost_region[index][0],
                                                                     c = round(average_cost_region[index][1],2)))


The patients are from: ['southwest', 'southeast', 'northwest', 'northeast']

In region southwest there are 325 patients.
The average insurance cost in southwest is: 12346.94 dollars
In region southeast there are 364 patients.
The average insurance cost in southeast is: 14735.41 dollars
In region northwest there are 325 patients.
The average insurance cost in northwest is: 12417.58 dollars
In region northeast there are 324 patients.
The average insurance cost in northeast is: 13406.38 dollars


**4. Which region has the highest average cost? The least?**

The region with the highest average cost is *southeast*, and the region with the lowest average cost is *southwest*.
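
The highest and lowest regions can also be picked programmatically instead of being read off the printed output. The following is a minimal sketch, assuming a hypothetical helper `average_by_group` (not part of the original notebook) that accumulates totals and counts per label in dictionaries. The sample values are made up; in the notebook the inputs would be `regions` and `insurance_charges`.

```python
from collections import defaultdict

def average_by_group(labels, values):
    """Return {label: mean of the values that share that label}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for label, value in zip(labels, values):
        totals[label] += float(value)  # values arrive from the CSV as strings
        counts[label] += 1
    return {label: totals[label] / counts[label] for label in totals}

# Made-up samples standing in for `regions` and `insurance_charges`.
sample_regions = ["southwest", "southeast", "southwest", "northeast"]
sample_charges = ["1000.0", "4000.0", "2000.0", "2500.0"]

averages = average_by_group(sample_regions, sample_charges)
highest = max(averages, key=averages.get)
lowest = min(averages, key=averages.get)
print(highest, round(averages[highest], 2))  # southeast 4000.0
print(lowest, round(averages[lowest], 2))    # southwest 1500.0
```

Because the helper only needs a list of labels and a list of values, the same function would work for any grouping column, not just the region.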

**5. Are the average costs different for each generation?**

First, I will check the minimum and maximum age. Then, I will define four categories: `-25 years`, `26-40 years`, `41-60 years` and `+60 years`, and use a for loop to count how many patients fall into each category. Finally, I will calculate the average insurance cost for each category and compare the results.

In [184]:
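# Ages were read from the CSV as strings, so convert them to int before comparing;
# otherwise min() and max() compare the values as text, character by character.
print('Minimum Age in dataset: {}'.format(min(int(age) for age in ages)))
print('Maximum Age in dataset: {}'.format(max(int(age) for age in ages)))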

Minimum Age in dataset: 18
Maximum Age in dataset: 64


In [185]:
cost_by_age = [[int(age), float(charge)] for age, charge in list(zip(ages,insurance_charges))]

categories = ['-25 years', '26-40 years', '41-60 years', '+60 years']
patients_per_category = [0] * len(categories)
cost_per_category = [0] * len(categories)

In [186]:
for index in range(len(cost_by_age)):
    
    if cost_by_age[index][0] <= 25:
        patients_per_category[0] += 1
        cost_per_category[0] += cost_by_age[index][1]
        
    elif cost_by_age[index][0] >= 26 and cost_by_age[index][0] <= 40:
        patients_per_category[1] += 1
        cost_per_category[1] += cost_by_age[index][1]
        
    elif cost_by_age[index][0] >= 41 and cost_by_age[index][0] <= 60:
        patients_per_category[2] += 1
        cost_per_category[2] += cost_by_age[index][1]
        
    else:
        patients_per_category[-1] += 1
        cost_per_category[-1] += cost_by_age[index][1]

avg_cost_category = []

for index in range(len(categories)):
    average_cost = round((cost_per_category[index] / patients_per_category[index]),2)
    avg_cost_category.append([categories[index], average_cost])
    print('The average insurance cost for people in category "{c}" is: {d} dollars'.format(
        c = avg_cost_category[index][0], d = avg_cost_category[index][1]))

The average insurance cost for people in category "-25 years" is: 9087.02 dollars
The average insurance cost for people in category "26-40 years" is: 11096.68 dollars
The average insurance cost for people in category "41-60 years" is: 15888.76 dollars
The average insurance cost for people in category "+60 years" is: 21063.16 dollars


The average insurance cost differs between the age categories: the older the group, the higher the average cost.
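
The four age bands used above could also be expressed as a small function, which keeps the boundaries in one place. This is only a sketch: the name `age_category` is hypothetical, and the sample ages are made up to exercise each boundary.

```python
def age_category(age):
    """Map an age in years to one of the four bands used in this section."""
    if age <= 25:
        return '-25 years'
    if age <= 40:
        return '26-40 years'
    if age <= 60:
        return '41-60 years'
    return '+60 years'

# Made-up ages chosen to hit both edges of each band (strings, as read from a CSV).
sample_ages = ["18", "26", "40", "41", "60", "64"]
bands = [age_category(int(age)) for age in sample_ages]
print(bands)
```

Each `if` only needs an upper bound because the earlier checks have already excluded the younger bands.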

**6. Do smokers pay more than non-smokers overall?**

In [192]:
cost_by_status = [[smoker, float(charge)] for smoker, charge in list(zip(smoker_statuses,insurance_charges))]

categories = ['Non-Smoker', 'Smoker']
patients_per_category = [0] * len(categories)
cost_per_category = [0] * len(categories)

for index in range(len(cost_by_status)):
    
    if cost_by_status[index][0] == 'no':
        patients_per_category[0] += 1
        cost_per_category[0] += cost_by_status[index][1]
            
    else:
        patients_per_category[-1] += 1
        cost_per_category[-1] += cost_by_status[index][1]

avg_cost_category = []

for index in range(len(categories)):
    average_cost = round((cost_per_category[index] / patients_per_category[index]),2)
    avg_cost_category.append([categories[index], average_cost])
    print('The average insurance cost for people in category "{c}" is: {d} dollars'.format(
        c = avg_cost_category[index][0], d = avg_cost_category[index][1]))

The average insurance cost for people in category "Non-Smoker" is: 8434.27 dollars
The average insurance cost for people in category "Smoker" is: 32050.23 dollars


The average insurance cost for smokers (32050.23 dollars) is higher than for people who don't smoke (8434.27 dollars).

**7. What is the number of males vs. females counted in the dataset?**

In [193]:
patient_info.analyze_sex()

Count for female: 662
Count for male: 676


There are more males than females in the dataset.

**8. Do men or women pay more on average overall?**

In [196]:
cost_by_sex = [[sex, float(charge)] for sex, charge in list(zip(sexes,insurance_charges))]

categories = ['Female', 'Male']
patients_per_category = [0] * len(categories)
cost_per_category = [0] * len(categories)

for index in range(len(cost_by_sex)):
    
    if cost_by_sex[index][0] == 'female':
        patients_per_category[0] += 1
        cost_per_category[0] += cost_by_sex[index][1]
            
    else:
        patients_per_category[-1] += 1
        cost_per_category[-1] += cost_by_sex[index][1]

avg_cost_category = []

for index in range(len(categories)):
    average_cost = round((cost_per_category[index] / patients_per_category[index]),2)
    avg_cost_category.append([categories[index], average_cost])
    print('The average insurance cost for people in category "{c}" is: {d} dollars'.format(
        c = avg_cost_category[index][0], d = avg_cost_category[index][1]))

The average insurance cost for people in category "Female" is: 12569.58 dollars
The average insurance cost for people in category "Male" is: 13956.75 dollars


On average, insurance is more expensive for males than for females.

## Patient Information

In [198]:
patients = patient_info.create_dictionary()
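
`create_dictionary()` returns a column-oriented dictionary: each key maps to the full list of values for that attribute. If one record per patient is more convenient, a row-oriented structure can be built with `zip`. The sketch below is an assumption about how that could look, not the notebook's method; the sample lists are made up and cover only three of the seven columns.

```python
# Made-up sample columns standing in for the lists built in section 2.
sample_ages = ["30", "40"]
sample_sexes = ["female", "male"]
sample_charges = ["1000.00", "3000.00"]

# zip walks the columns in parallel, yielding one tuple per patient.
patient_records = [
    {"age": int(age), "sex": sex, "charges": float(charge)}
    for age, sex, charge in zip(sample_ages, sample_sexes, sample_charges)
]
print(patient_records[0])  # {'age': 30, 'sex': 'female', 'charges': 1000.0}
```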