# U.S. Medical Insurance Costs

In this project, a `.csv` file with medical insurance costs will be investigated using Python fundamentals. The goal of this project will be to analyze various attributes within `insurance.csv` to learn more about the patient information in the file and gain insight into potential use cases for the dataset.

In [1]:
# import csv
import csv

Import all the necessary modules to start. For this project we will only use the `csv` library to work with the `insurance.csv` data. Other libraries could add depth to this project, but for this simple analysis the `csv` module alone will suffice.

The next step is to get acquainted with the data in `insurance.csv`. The file contains 7 columns: age, sex, bmi, children, smoker, region and charges, all filled with patient information (no missing data). To hold these 7 attributes, 7 empty lists will be created.
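Before creating the lists, it can help to confirm the column names straight from the file header. The sketch below shows one way to do this with `csv.DictReader`. It reads from an in-memory string holding two invented rows, an assumption made so that the example runs without `insurance.csv`; the name `sample_csv` is hypothetical.

```python
import csv
import io

# Two invented rows standing in for insurance.csv (illustrative values only).
sample_csv = io.StringIO(
    "age,sex,bmi,children,smoker,region,charges\n"
    "30,female,25.0,1,no,northeast,4000.00\n"
    "45,male,31.5,2,yes,southwest,21000.00\n"
)

reader = csv.DictReader(sample_csv)

# fieldnames is read from the header row the first time it is accessed
column_names = reader.fieldnames
print(column_names)

# Each row is a dict mapping a column name to a string value
first_row = next(reader)
print(first_row["age"], type(first_row["age"]))
```

Note that every value arrives as a string, which is why the analysis methods later convert ages and charges with `int()` and `float()`.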

In [2]:
# create empty lists for each column in insurance.csv
ages = []
sexes = []
bmis = []
num_children = []
smoker_statuses = []
regions = []
insurance_charges = []

To save ourselves from rewriting the same code every time we read from `insurance.csv`, we will define a 'helper' function that loads one column of the file into a list. Without this function we would have to write out the open-and-iterate code by hand 7 times.

In [3]:
# define helper function to load and write data to empty lists
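def load_list(target_list, csv_file, column_name):
    # open csv file (newline='' is recommended by the csv module docs)
    with open(csv_file, newline='') as csv_info:
        # read csv file as a dict
        csv_dict = csv.DictReader(csv_info)
        # iterate through each row of the file
        for row in csv_dict:
            # add the data from each row to the list
            target_list.append(row[column_name])

    # the parameter is named target_list to avoid shadowing the built-in list
    return target_list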

In [4]:
%%capture 
# the above executes the code whilst suppressing the cell's output for aesthetic reasons
# load csv file data into list
load_list(ages, 'insurance.csv', 'age')
load_list(sexes, 'insurance.csv', 'sex')
load_list(bmis, 'insurance.csv', 'bmi')
load_list(num_children, 'insurance.csv', 'children')
load_list(smoker_statuses, 'insurance.csv', 'smoker')
load_list(regions, 'insurance.csv', 'region')
load_list(insurance_charges, 'insurance.csv', 'charges')
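The cell above opens and reads the file once per column, 7 times in total. As an alternative, here is a minimal sketch that fills every column in a single pass. The function name `load_columns` and the invented sample rows are assumptions for illustration, not part of the original notebook.

```python
import csv
import io

def load_columns(csv_source):
    """Return {column_name: [values]} after one pass over an open CSV source."""
    reader = csv.DictReader(csv_source)
    # one empty list per column named in the header row
    columns = {name: [] for name in (reader.fieldnames or [])}
    for row in reader:
        for name in columns:
            columns[name].append(row[name])
    return columns

# Invented rows standing in for insurance.csv (illustrative values only).
sample_csv = io.StringIO(
    "age,sex,bmi,children,smoker,region,charges\n"
    "30,female,25.0,1,no,northeast,4000.00\n"
    "45,male,31.5,2,yes,southwest,21000.00\n"
)

columns = load_columns(sample_csv)
print(columns["age"])

# With the real file, the call would look like:
# with open('insurance.csv', newline='') as csv_info:
#     columns = load_columns(csv_info)
```

The trade-off is that the columns live in one dictionary rather than 7 separate variables, so `columns["age"]` would take the place of `ages`.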

Now that each column has been loaded into its respective list, we can start the analysis. There are many aspects of this dataset which we could look into. For this project the following operations will be implemented:

* calculating the average age of the patients
* counting the number of males vs females in the dataset
* finding the geographical regions of the patients
* calculating the average yearly medical charges of the patients
* creating a dictionary that contains all patient information

To carry out these operations, we will create a class `PatientsInfo` which contains 5 methods: `analyse_ages()`, `analyse_sexes()`, `analyse_regions()`, `analyse_charges()` and `create_dictionary()`.

In [5]:
class PatientsInfo:
    def __init__(self, patients_ages, patients_sexes, patients_bmis, patients_num_of_children, patients_smoker_statuses, patients_regions, patients_charges):
        # init method that takes in each parameter 
        self.patients_ages = patients_ages
        self.patients_sexes = patients_sexes
        self.patients_bmis = patients_bmis
        self.patients_num_children = patients_num_of_children
        self.patients_smoker_statuses = patients_smoker_statuses
        self.patients_regions = patients_regions
        self.patients_charges = patients_charges
    
    # method that calculates the average age of the patients in insurance.csv
    def analyse_ages(self):
        total_age = 0
        for age in self.patients_ages:
            total_age += int(age)

        return ("The average age of the patients is: " + str(round(total_age/len(self.patients_ages), 2)) + " years")

    # method that counts the number of males and females in insurance.csv
    def analyse_sexes(self):
        males = 0
        females = 0
        for sex in self.patients_sexes:
            if sex == "female":
                females += 1
            elif sex == "male":
                males += 1

        print("Males: " + str(males))
        print("Females: " + str(females))

    # method to find the unique regions that the patients are from in insurance.csv
    def analyse_regions(self):
        unique_regions = []
        for region in self.patients_regions:
            if region not in unique_regions:
                unique_regions.append(region)

        return unique_regions

    # method that calculates the average yearly medical costs of the patients in insurance.csv 
    def analyse_charges(self):
        total_charges = 0
        for charge in self.patients_charges:
            total_charges += float(charge)
        
        return ("The average yearly medical insurance cost incurred by the patients is: " + str(round(total_charges/len(self.patients_charges), 2)) + " dollars")

    # method to create a dictionary holding all the patients' information
    def create_dictionary(self):
        self.patients_dictionary = {}
        self.patients_dictionary["age"] = [int(age) for age in self.patients_ages]
        self.patients_dictionary["sex"] = self.patients_sexes
        self.patients_dictionary["bmi"] = self.patients_bmis
        self.patients_dictionary["children"] = self.patients_num_children
        self.patients_dictionary["smoker"] = self.patients_smoker_statuses
        # key matches the "region" column name in insurance.csv, like the other keys
        self.patients_dictionary["region"] = self.patients_regions
        self.patients_dictionary["charges"] = self.patients_charges

        return self.patients_dictionary

With our new class defined, we must first create an instance of it, `patient_info`. This allows us to call each method and see the results of the analysis.

In [6]:
patient_info = PatientsInfo(ages, sexes, bmis, num_children, smoker_statuses, regions, insurance_charges)

In [7]:
patient_info.analyse_ages()

'The average age of the patients is: 39.21 years'
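The same average can also be computed with the standard library's `statistics.mean`, which removes the need for a manual running total. A minimal sketch, assuming the ages are still strings as loaded by `csv.DictReader`; `sample_ages` is an invented list, not data from `insurance.csv`.

```python
from statistics import mean

# Invented ages, stored as strings the way csv.DictReader returns them
sample_ages = ["30", "45", "52", "23"]

# convert each age to int, average, and round to 2 decimal places
average_age = round(mean(int(age) for age in sample_ages), 2)
print(average_age)
```

Returning the number itself, rather than a formatted sentence, keeps the result available for further calculations.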

In [8]:
patient_info.analyse_sexes()

Males: 676
Females: 662
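The counting loop in `analyse_sexes()` could also be expressed with `collections.Counter`, which tallies every distinct value in one call. This is a sketch under the assumption that the values are the plain strings `"male"` and `"female"`; `sample_sexes` is an invented list.

```python
from collections import Counter

# Invented values, not data from insurance.csv
sample_sexes = ["female", "male", "male", "female", "male"]

# Counter maps each distinct value to the number of times it appears
sex_counts = Counter(sample_sexes)
print("Males: " + str(sex_counts["male"]))
print("Females: " + str(sex_counts["female"]))
```

A `Counter` returns 0 for a key it has never seen, so looking up a missing category does not raise an error.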


In [9]:
patient_info.analyse_regions()

['southwest', 'southeast', 'northwest', 'northeast']
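The regions above appear in the order they are first encountered. One compact way to get the same order-preserving result is `dict.fromkeys`, sketched below with an invented list `sample_regions`. This relies on dictionaries keeping insertion order, which is guaranteed from Python 3.7 onwards.

```python
# Invented values, not data from insurance.csv
sample_regions = ["southwest", "southeast", "southeast", "northwest", "southwest"]

# dict keys are unique and keep insertion order, so duplicates are dropped
# while first-seen order is preserved
unique_regions = list(dict.fromkeys(sample_regions))
print(unique_regions)
```

Unlike the `region not in unique_regions` check, which scans a list on every iteration, the dictionary uses hash lookups, although with only a handful of regions the difference is negligible.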

In [10]:
patient_info.analyse_charges()

'The average yearly medical insurance cost incurred by the patients is: 13270.42 dollars'

From the code cells above we can see that the first four methods work correctly and produce the desired analysis of the data in `insurance.csv`. The next thing to do is to use our `create_dictionary()` method to organise all of the data into a dictionary. This is convenient as it allows for further analysis and manipulation of the data without working in the `class` block of code.

In [15]:
patients_info_dict = patient_info.create_dictionary()
# print(patients_info_dict)
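As an illustration of the kind of further analysis the dictionary makes possible, the sketch below averages charges by smoker status. The function `average_charges_by_smoker` and the small dictionary `sample_dict` are hypothetical; they assume the `"smoker"` and `"charges"` keys hold parallel lists of strings, as `create_dictionary()` produces.

```python
# Invented stand-in for the dictionary returned by create_dictionary()
sample_dict = {
    "smoker": ["yes", "no", "no", "yes"],
    "charges": ["20000.00", "4000.00", "6000.00", "30000.00"],
}

def average_charges_by_smoker(patients):
    """Return {smoker_status: average charge} from parallel lists."""
    totals = {}
    counts = {}
    # zip pairs each patient's smoker status with that patient's charge
    for status, charge in zip(patients["smoker"], patients["charges"]):
        totals[status] = totals.get(status, 0.0) + float(charge)
        counts[status] = counts.get(status, 0) + 1
    return {status: round(totals[status] / counts[status], 2) for status in totals}

averages = average_charges_by_smoker(sample_dict)
print(averages)
```

With the real data, the call would be `average_charges_by_smoker(patients_info_dict)`.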