# U.S. Medical Insurance Costs

The goal of this project is to identify the mean insurance costs for different categories of people. 

1. We look at the mean charges for each sex and identify whether there is a premium for being male or female.
2. We calculate and compare the mean charges for each region to find out whether there is a premium for a specific region.
3. Next, we find out whether there is a premium for smoking, and how high it is.
4. Finally, we find out how the charges change with the BMI.

## Creating the necessary datasets

To do that, we import the csv module and read the file insurance.csv.

We also create one list for each column and a dictionary that holds the whole dataset. All values are read as strings.

In [1]:
import csv
        
def load_data(csv_file, col_name):
    # Return all values of one column as a list of strings.
    lst = []
    with open(csv_file, newline='') as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            lst.append(row[col_name])
    return lst

def load_dataset(csv_file):
    # Return the whole file as a dictionary that maps the row index to the row.
    dct = {}
    with open(csv_file, newline='') as csvfile:
        reader = csv.DictReader(csvfile)
        for i, row in enumerate(reader):
            dct[i] = row
    return dct
                
age_list = load_data('insurance.csv', 'age')
sex_list = load_data('insurance.csv', 'sex')
bmi_list = load_data('insurance.csv', 'bmi')
children_list = load_data('insurance.csv', 'children')
smoker_list = load_data('insurance.csv', 'smoker')
region_list = load_data('insurance.csv', 'region')
charges_list = load_data('insurance.csv', 'charges')
dataset = load_dataset('insurance.csv') 
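
The cell above reopens insurance.csv once per column and keeps every value as a string. As a possible alternative, the following minimal sketch reads a CSV source once and converts the numeric columns while loading. The function name 'load_columns' and the converter mapping 'NUMERIC_COLUMNS' are hypothetical, and the column names are assumed to match the header used above. Two made-up rows stand in for the real file.

```python
import csv
import io

# Assumed converters for the numeric columns; all other columns stay strings.
NUMERIC_COLUMNS = {'age': int, 'bmi': float, 'children': int, 'charges': float}

def load_columns(csvfile):
    """Read an open CSV file once and return a dict mapping column name to a list of values."""
    reader = csv.DictReader(csvfile)
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            convert = NUMERIC_COLUMNS.get(name, str)
            columns[name].append(convert(value))
    return columns

# Two made-up rows in the same layout, used here instead of the real file.
sample = io.StringIO(
    "age,sex,bmi,children,smoker,region,charges\n"
    "30,female,22.5,0,no,northeast,4000.0\n"
    "45,male,31.0,2,yes,southeast,30000.0\n"
)
columns = load_columns(sample)
print(columns['bmi'])     # [22.5, 31.0]
print(columns['smoker'])  # ['no', 'yes']
```

With the real file, the same function could be called inside a with open('insurance.csv', newline='') block.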

## Defining a class with the necessary methods

To calculate the charges, we define a class named 'Analysed_data'. It contains the methods we need to analyze the mean charges and the premiums.

In [2]:
class Analysed_data:
    def __init__(self, age, sex, bmi, children, smoker, region, charges):
        # Store the lists passed to the constructor instead of the global lists.
        self.age = age
        self.sex = sex
        self.bmi = bmi
        self.children = children
        self.smoker = smoker
        self.region = region
        self.charges = charges
        
    def set_self(self, category):
        # Return the list that belongs to the category, or None if it is unknown.
        columns = {
            'sex': self.sex,
            'age': self.age,
            'bmi': self.bmi,
            'children': self.children,
            'smoker': self.smoker,
            'region': self.region,
        }
        return columns.get(category)
        
    def mean_charge(self, category, item):
        # Mean charge of all rows whose value in the category equals the item.
        values = self.set_self(category)
        if values is None:
            return 'Invalid category'
        total = 0
        count = 0
        for value, charge in zip(values, self.charges):
            if value == item:
                total += float(charge)
                count += 1
        if count == 0:
            # Avoid a division by zero when no row matches the item.
            return 'No matching rows'
        return round(total/count, 2)
    
    def premium(self, mean_charge):
        # Premium = mean charge of a group minus the mean charge of all rows.
        total = 0
        count = 0
        for charge in self.charges:
            total += float(charge)
            count += 1
        overall_mean = round(total/count, 2)
        return round(mean_charge-overall_mean, 2)
    
data = Analysed_data(age_list, sex_list, bmi_list, children_list, smoker_list, region_list, charges_list)
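
The premium used in this project is the difference between the mean charge of a group and the mean charge of all rows. A minimal, self-contained sketch of that calculation on made-up numbers is shown below. The helper 'group_premiums' is hypothetical and is not part of the class above; it computes every group in a single pass.

```python
from collections import defaultdict

def group_premiums(groups, charges):
    """Return {group: (mean charge, premium)}, with premium = group mean - overall mean."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, charge in zip(groups, charges):
        totals[group] += float(charge)
        counts[group] += 1
    overall_mean = sum(totals.values()) / sum(counts.values())
    result = {}
    for group in totals:
        mean = totals[group] / counts[group]
        result[group] = (round(mean, 2), round(mean - overall_mean, 2))
    return result

# Made-up data; the values are strings, as they are when read with csv.DictReader.
smoker = ['yes', 'no', 'no', 'no']
charges = ['30000', '8000', '10000', '12000']
result = group_premiums(smoker, charges)
print(result)
# {'yes': (30000.0, 15000.0), 'no': (10000.0, -5000.0)}
```

Unlike the 'premium' method, this sketch rounds only the final results, so the last digit can differ slightly from the values of the class.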

## 1. Mean charges for each sex and premium

To calculate the mean charges for each sex, we use the object 'data' of the class 'Analysed_data' created above and call the method 'mean_charge' for the items 'female' and 'male'.

Afterwards, we calculate the premium for each sex, that is, the difference between its mean charges and the mean charges of the whole dataset.

In [7]:
female_mean_charges = data.mean_charge('sex', 'female')
male_mean_charges = data.mean_charge('sex', 'male')
male_sex_premium = data.premium(male_mean_charges)
female_sex_premium = data.premium(female_mean_charges)

print("The mean charges for women are " + str(female_mean_charges) + ".")
print("The mean charges for men are " + str(male_mean_charges) + ".")
print("Thus, the premium for being male is " + str(male_sex_premium) + " and the premium for being female is " + str(female_sex_premium) + ".")

The mean charges for women are 12569.58.
The mean charges for men are 13956.75.
Thus, the premium for being male is 686.33 and the premium for being female is -700.84.
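
Because the premium is the group mean minus the overall mean, the printed values can be cross-checked: both groups should imply the same overall mean. A short check that uses only the numbers printed above:

```python
# Values copied from the output above.
female_mean, female_premium = 12569.58, -700.84
male_mean, male_premium = 13956.75, 686.33

# overall mean = group mean - premium
overall_from_female = round(female_mean - female_premium, 2)
overall_from_male = round(male_mean - male_premium, 2)

print(overall_from_female, overall_from_male)
# 13270.42 13270.42
```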


## 2. Mean charges for each region and premium

As in the first calculation, we call the method 'mean_charge' with each region as the item and then calculate the premium for each region.



In [8]:
northeast_mean_charges = data.mean_charge('region', 'northeast')
northwest_mean_charges = data.mean_charge('region', 'northwest')
southeast_mean_charges = data.mean_charge('region', 'southeast')
southwest_mean_charges = data.mean_charge('region', 'southwest')
northeast_premium = data.premium(northeast_mean_charges)
northwest_premium = data.premium(northwest_mean_charges)
southeast_premium = data.premium(southeast_mean_charges)
southwest_premium = data.premium(southwest_mean_charges)

print("The mean charges for northeast are " + str(northeast_mean_charges) + " and the premium is " + str(northeast_premium) + ".")
print("The mean charges for northwest are " + str(northwest_mean_charges) + " and the premium is " + str(northwest_premium) + ".")
print("The mean charges for southeast are " + str(southeast_mean_charges) + " and the premium is " + str(southeast_premium) + ".")
print("The mean charges for southwest are " + str(southwest_mean_charges) + " and the premium is " + str(southwest_premium) + ".")

The mean charges for northeast are 13406.38 and the premium is 135.96.
The mean charges for northwest are 12417.58 and the premium is -852.84.
The mean charges for southeast are 14735.41 and the premium is 1464.99.
The mean charges for southwest are 12346.94 and the premium is -923.48.
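
To make the regional premiums easier to compare, they can be sorted and expressed as a percentage of the overall mean. The sketch below uses only the values printed above; the overall mean is recovered as the northeast mean minus the northeast premium.

```python
# (mean charge, premium) per region, copied from the output above.
regions = {
    'northeast': (13406.38, 135.96),
    'northwest': (12417.58, -852.84),
    'southeast': (14735.41, 1464.99),
    'southwest': (12346.94, -923.48),
}

# overall mean = group mean - premium (the same for every region)
overall_mean = round(regions['northeast'][0] - regions['northeast'][1], 2)

# Sort the regions from the highest to the lowest premium.
ranking = sorted(regions, key=lambda name: regions[name][1], reverse=True)
for name in ranking:
    premium = regions[name][1]
    print(f"{name}: {premium:+.2f} ({premium / overall_mean:+.1%} of the overall mean)")
```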


## 3. Premium for smoking

Now we calculate the mean charges for smokers and non-smokers, and how high the premium for each group is.

In [9]:
smoker_mean_charges = data.mean_charge('smoker', 'yes')
nonsmoker_mean_charges = data.mean_charge('smoker', 'no')
smoker_premium = data.premium(smoker_mean_charges)
nonsmoker_premium = data.premium(nonsmoker_mean_charges)

print("The mean charges for smokers are " + str(smoker_mean_charges) + " and the premium is " + str(smoker_premium) + ".")
print("On the other hand, the mean charges for non-smokers are " + str(nonsmoker_mean_charges) + " and the premium is " + str(nonsmoker_premium) + ".")

The mean charges for smokers are 32050.23 and the premium is 18779.81.
On the other hand, the mean charges for non-smokers are 8434.27 and the premium is -4836.15.
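
Two further figures follow from these results: the ratio between the two group means and the share of smokers in the dataset. The share can be derived because the overall mean is the weighted average of the two group means, so the premiums weighted by the group shares must cancel out: p * smoker_premium + (1 - p) * nonsmoker_premium = 0. The following short calculation uses only the printed values, so the results are approximate.

```python
# Values copied from the output above.
smoker_mean, smoker_premium = 32050.23, 18779.81
nonsmoker_mean, nonsmoker_premium = 8434.27, -4836.15

ratio = smoker_mean / nonsmoker_mean

# Solve p * smoker_premium + (1 - p) * nonsmoker_premium = 0 for p.
smoker_share = -nonsmoker_premium / (smoker_premium - nonsmoker_premium)

print(f"Smokers are charged about {ratio:.1f} times as much as non-smokers on average.")
print(f"Implied share of smokers in the dataset: about {smoker_share:.0%}.")
```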


## 4. Charges as a function of the BMI

For this final task, we will create a table that shows how the charges change with the BMI. The first column contains the BMI range, so we first look for the minimum and maximum BMI in the dataset.

In [12]:
# The BMI values are strings, so key=float compares them as numbers.
print("The smallest BMI is " + min(bmi_list, key=float) + " and the biggest is " + max(bmi_list, key=float) + ".")

The smallest BMI is 15.96 and the biggest is 53.13.
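
The BMI values are read from the CSV file as strings, and min and max compare strings character by character rather than by numeric value. The following sketch with made-up values shows the difference and how key=float avoids it.

```python
# Made-up BMI values as strings, the way csv.DictReader returns them.
bmi_strings = ['15.96', '9.5', '53.13', '100.2']

# Plain min/max compare the strings character by character.
print(min(bmi_strings), max(bmi_strings))
# 100.2 9.5

# With key=float the values are compared as numbers.
print(min(bmi_strings, key=float), max(bmi_strings, key=float))
# 9.5 100.2
```

A plain string comparison gives the numeric result only when all values have the same number of digits before the decimal point, which appears to be the case for the BMI column here.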


Now we know the lower and upper limits of our table. For each BMI interval, which runs from a lower bound (inclusive) to the lower bound plus the step size (exclusive), we calculate the mean charges. Afterwards, we can compare the values to see how the charges depend on the BMI.

In [25]:
def charges_table_bmi(steps):
    # Print the mean charge for each BMI interval [lower_limit, lower_limit + steps).
    # The BMI values are strings, so key=float compares them as numbers.
    lower_limit = int(float(min(bmi_list, key=float)))
    upper_limit = int(float(max(bmi_list, key=float)))
    while lower_limit <= upper_limit:
        total = 0
        count = 0
        for bmi, charge in zip(bmi_list, charges_list):
            if lower_limit <= float(bmi) < lower_limit+steps:
                total += float(charge)
                count += 1
        if count > 0:
            mean_charge = round(total/count, 2)
        else:
            mean_charge = 'no values'
        print("BMI: " + str(lower_limit) + " - " + str(lower_limit+steps) + " | Mean charge: " + str(mean_charge))
        lower_limit += steps
        
charges_table_bmi(5)

BMI: 15 - 20 | Mean charge: 8838.56
BMI: 20 - 25 | Mean charge: 10572.37
BMI: 25 - 30 | Mean charge: 10987.51
BMI: 30 - 35 | Mean charge: 14419.67
BMI: 35 - 40 | Mean charge: 17022.26
BMI: 40 - 45 | Mean charge: 16569.6
BMI: 45 - 50 | Mean charge: 17815.04
BMI: 50 - 55 | Mean charge: 16034.31
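
The function above scans all rows once per interval. A possible one-pass variant assigns each row to its interval with floor division. The sketch below uses made-up data, and the helper name 'mean_charge_per_bin' is hypothetical. Unlike 'charges_table_bmi', it aligns the intervals to multiples of the width rather than to the smallest BMI in the data, and it omits intervals without rows.

```python
from collections import defaultdict

def mean_charge_per_bin(bmis, charges, width):
    """Return {interval start: mean charge} for half-open intervals [start, start + width)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for bmi, charge in zip(bmis, charges):
        # Floor division maps a BMI to the start of its interval, e.g. 22.0 -> 20.
        start = int(float(bmi) // width) * width
        totals[start] += float(charge)
        counts[start] += 1
    return {start: round(totals[start] / counts[start], 2) for start in sorted(totals)}

# Made-up data; the values are strings, as they are when read with csv.DictReader.
bmis = ['18.5', '22.0', '24.9', '31.2']
charges = ['9000', '10000', '11000', '15000']
table = mean_charge_per_bin(bmis, charges, 5)
for start, mean in table.items():
    print(f"BMI: {start} - {start + 5} | Mean charge: {mean}")
# BMI: 15 - 20 | Mean charge: 9000.0
# BMI: 20 - 25 | Mean charge: 10500.0
# BMI: 30 - 35 | Mean charge: 15000.0
```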
