# U.S. Medical Insurance Costs

The dataset I am using is a CSV file of U.S. medical insurance costs. For each patient it records age, sex, whether the patient smokes, and other characteristics such as BMI, number of children, region, and insurance charges.

We are interested in the following:

- The average age of the patients.
- Which region most of the individuals are from.
- Costs between smokers vs. non-smokers.
- Average age for someone who has at least one child.

For this project I will use plain Python, i.e. I won't import the pandas or NumPy packages; only standard-library modules (csv and statistics) are used.

In [4]:
import csv
from statistics import mean as smean

To parse the data from the CSV file, I am creating a Patient class that represents a single patient, i.e. a single row of the file. Each column name becomes an object attribute (age, sex, bmi, children, smoker, region, charges).

In [3]:
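# Class to represent a single patient (one row of the insurance CSV).
# Note: Patient(*row) stores the raw CSV strings; InsuranceData converts them to int/float.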
class Patient:

    def __init__(self, age: int, sex: str, bmi: float, children: int,
                 smoker: str, region: str, charges: float):
        self.age = age
        self.sex = sex
        self.bmi = bmi
        self.children = children
        self.smoker = smoker
        self.region = region
        self.charges = charges
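
As written, Patient(*row) stores every field as the string read from the CSV, so the type hints do not match the stored values and the conversion to int/float happens later in InsuranceData. A minimal sketch of an alternative, assuming it is acceptable to convert types when the record is created: a frozen dataclass with a from_row constructor. PatientRecord and from_row are hypothetical names, and the sample row is made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientRecord:
    """Hypothetical typed variant of Patient: converts CSV strings on creation."""
    age: int
    sex: str
    bmi: float
    children: int
    smoker: str
    region: str
    charges: float

    @classmethod
    def from_row(cls, row):
        # row is a list of strings in the CSV column order:
        # age, sex, bmi, children, smoker, region, charges
        age, sex, bmi, children, smoker, region, charges = row
        return cls(int(age), sex, float(bmi), int(children),
                   smoker, region, float(charges))


# Made-up row, only to show the conversion
p = PatientRecord.from_row(["30", "male", "25.5", "2", "no", "northeast", "1234.5"])
print(p.age + 1, p.bmi)  # 31 25.5
```

Because the dataclass is frozen, assigning to a field after creation raises an error, which gives a read-only record without writing setters by hand.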
        

I am creating an InsuranceData class that loads and stores the data, along with some helper methods for calculating basic statistics. The class holds the column names and stores each column's data as a list. Its functionality is deliberately limited: it only covers what is needed to analyze specific traits of this particular dataset. It is not an alternative to pandas; I avoided pandas so that I could practice using plain Python syntax and built-ins.
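
The column-per-list layout described above can also be kept in a single dictionary keyed by column name. The sketch below assumes a CSV with a header row and uses csv.DictReader from the standard library; load_columns is a hypothetical helper, the sample data is invented, and values are left as strings, so numeric columns would still need int() or float().

```python
import csv
import io


def load_columns(file_obj):
    """Hypothetical helper: map each CSV column name to a list of its (string) values."""
    reader = csv.DictReader(file_obj)
    # fieldnames is read from the header row
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            columns[name].append(row[name])
    return columns


# Invented in-memory CSV; a real file opened with newline="" works the same way
sample = io.StringIO("age,sex,charges\n19,female,100.5\n40,male,200.0\n")
cols = load_columns(sample)
print(cols["age"])  # ['19', '40']
```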

In [26]:
# Class to represent the insurance data of all patients
class InsuranceData:

    def __init__(self, file_path: str):
        # All attributes are private (name-mangled) and read-only:
        # properties expose them, and their setters and deleters raise an exception
        self.__ages = []
        self.__sexes = []
        self.__bmis = []
        self.__children = []
        self.__smokers = []
        self.__regions = []
        self.__charges = []
        
        with open(file_path, "r", newline="") as file:
            reader = csv.reader(file)
            self.__colnames = next(reader)
            # For every row in the file, a Patient object captures the values
            # Each column data is appended to the respective column
            for row in reader:
                p = Patient(*row)
                self.__ages.append(int(p.age))
                self.__sexes.append(p.sex)
                self.__bmis.append(float(p.bmi))
                self.__children.append(int(p.children))
                self.__smokers.append(p.smoker)
                self.__regions.append(p.region)
                self.__charges.append(float(p.charges))

        self.__ncols = len(self.__colnames)  # Number of columns
        self.__nrows = len(self.__ages)  # Number of rows
        self.__categorical_col = ["sexes", "smokers", "regions", "children"]
        self.__numerical_col = ["ages", "bmis", "children", "charges"]
        # pandas DataFrame-like rows: (index, age, sex, bmi, children, smoker, region, charges)
        self.__table = list(zip(range(self.__nrows), self.__ages, self.__sexes, self.__bmis, self.__children,
                          self.__smokers, self.__regions, self.__charges))

    @property
    def table(self):
        return self.__table[:]

    @table.setter
    def table(self, table):
        raise Exception("Read-Only Attribute")

    @table.deleter
    def table(self):
        raise Exception("Read-Only Attribute")

    @property
    def colnames(self):
        return self.__colnames[:]

    @colnames.setter
    def colnames(self, colnames):
        raise Exception("Read-Only Attribute")

    @colnames.deleter
    def colnames(self):
        raise Exception("Read-Only Attribute")

    @property
    def ncols(self):
        return self.__ncols

    @ncols.setter
    def ncols(self, ncols):
        raise Exception("Read-Only Attribute")

    @ncols.deleter
    def ncols(self):
        raise Exception("Read-Only Attribute")

    @property
    def nrows(self):
        return self.__nrows

    @nrows.setter
    def nrows(self, nrows):
        raise Exception("Read-Only Attribute")

    @nrows.deleter
    def nrows(self):
        raise Exception("Read-Only Attribute")
            
    @property
    def ages(self):
        return self.__ages[:]

    @ages.setter
    def ages(self, ages):
        raise Exception("Read-Only Attribute")

    @ages.deleter
    def ages(self):
        raise Exception("Read-Only Attribute")

    @property
    def sexes(self):
        return self.__sexes[:]

    @sexes.setter
    def sexes(self, sexes):
        raise Exception("Read-Only Attribute")

    @sexes.deleter
    def sexes(self):
        raise Exception("Read-Only Attribute")

    @property
    def bmis(self):
        return self.__bmis[:]

    @bmis.setter
    def bmis(self, bmis):
        raise Exception("Read-Only Attribute")

    @bmis.deleter
    def bmis(self):
        raise Exception("Read-Only Attribute")

    @property
    def children(self):
        return self.__children[:]

    @children.setter
    def children(self, children):
        raise Exception("Read-Only Attribute")

    # A deleter receives only self
    @children.deleter
    def children(self):
        raise Exception("Read-Only Attribute")

    @property
    def smokers(self):
        return self.__smokers[:]

    @smokers.setter
    def smokers(self, smokers):
        raise Exception("Read-Only Attribute")

    @smokers.deleter
    def smokers(self):
        raise Exception("Read-Only Attribute")

    @property
    def regions(self):
        return self.__regions[:]

    @regions.setter
    def regions(self, regions):
        raise Exception("Read-Only Attribute")

    @regions.deleter
    def regions(self):
        raise Exception("Read-Only Attribute")

    @property
    def charges(self):
        return self.__charges[:]

    @charges.setter
    def charges(self, charges):
        raise Exception("Read-Only Attribute")

    @charges.deleter
    def charges(self):
        raise Exception("Read-Only Attribute")

    # Statistic Methods
    # calculate the mean of numeric columns by passing the column name
    def mean(self, col="charges"):
        if col in self.__numerical_col:
            # get the entire column as list
            data = getattr(self, col)
            if isinstance(data[0], int):
                # integer columns (ages, children) return the mean truncated to an int
                return int(smean(data))
            return smean(data)
        else:
            raise Exception(f"Column should be in {self.__numerical_col}")

    # count how often each value appears in a categorical column
    def values_count(self, col):
        if col in self.__categorical_col:
            col_data = getattr(self, col)
            val_count = dict.fromkeys(col_data, 0)
            for key in val_count:
                val_count[key] = col_data.count(key)
            return val_count
        else:
            raise Exception(f"Column should be in {self.__categorical_col}")

    # group by operation 
    # by: categorical column name
    # op: the aggregate to be performed
    # on: column to use for the aggregate
    def group(self, by, op, on):
        agg = ["sum", "mean"]
        if by in self.__categorical_col and op in agg and on in self.__numerical_col:
            by_col = getattr(self, by)  # column data 
            on_col = getattr(self, on)  # column data
            uniq = {key: [] for key in self.values_count(by)} # extracting unique values
            for key, value in zip(by_col, on_col):
                uniq[key].append(value)
            match op:
                case "sum":
                    result = {k : sum(v) for k, v in uniq.items()}
                case "mean":
                    result = {k : smean(v) for k, v in uniq.items()}
            return result
        else:
            raise Exception(f"By column should be in {self.__categorical_col}\nAggregate operation in {agg}\nOn column in {self.__numerical_col}")

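    def __str__(self):
        # Short summary: number of rows, number of columns, and column names
        return (f"InsuranceData: {self.__nrows} rows x {self.__ncols} columns "
                f"({', '.join(self.__colnames)})")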
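
The constructor comment says every attribute is read-only, which the class enforces with setters and deleters that raise Exception. For comparison, a property defined with only a getter already rejects assignment and deletion with AttributeError, so the explicit setter/deleter pairs could be dropped if AttributeError is an acceptable error type. ReadOnlyColumns below is a hypothetical, minimal class for illustration.

```python
class ReadOnlyColumns:
    """Hypothetical minimal class: getter-only properties are read-only."""

    def __init__(self, ages):
        self._ages = list(ages)

    @property
    def ages(self):
        # Return a copy so callers cannot mutate the internal list
        return self._ages[:]


data = ReadOnlyColumns([19, 40])

try:
    data.ages = []  # no setter defined
except AttributeError as err:
    print("assignment blocked:", type(err).__name__)

try:
    del data.ages  # no deleter defined
except AttributeError as err:
    print("deletion blocked:", type(err).__name__)

data.ages.append(99)  # changes only the returned copy
print(data.ages)  # [19, 40]
```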
        

Now we can answer the questions we are interested in:
1. What is the average age of the patients?
2. Which region are most of the individuals from?
3. How do costs compare between smokers and non-smokers?
4. What is the average age for someone who has at least one child?

In [27]:
# Load the insurance data with the InsuranceData class
df = InsuranceData("insurance.csv")


In [9]:
# Average age of the patients
avg_patients_age = df.mean("ages")
print(f"The average age of the patients is: {avg_patients_age}")

The average age of the patients is: 39
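
Because df.mean converts the mean of integer columns with int(), the reported age is truncated toward zero, not rounded. The sketch below, using a hypothetical whole_mean helper and made-up ages, shows the difference; note that Python's round() resolves exact .5 ties to the nearest even integer.

```python
from statistics import mean


def whole_mean(values, rounding=True):
    """Hypothetical helper: mean of values as an int.

    rounding=True uses round() (exact .5 ties go to the nearest even integer);
    rounding=False uses int(), which truncates toward zero like df.mean does.
    """
    m = mean(values)
    return round(m) if rounding else int(m)


ages = [20, 21, 21]  # made-up ages; their mean is about 20.67
print(whole_mean(ages, rounding=False))  # 20 (truncated)
print(whole_mean(ages))                  # 21 (rounded)
```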


In [20]:
# Region that most of the patients come from
origin_val_count = df.values_count("regions")
max_origin = max(origin_val_count, key=origin_val_count.get)
print(f"Most patients are from {max_origin}.")
print(f"{origin_val_count.get(max_origin)} out of {df.nrows} patients.")

Most patients are from southeast.
364 out of 1338 patients.
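
Strictly speaking, southeast is the most common region rather than a majority: 364 of 1338 patients is about 27.2%. The same count-then-max step can be written with collections.Counter from the standard library. The sketch uses an invented list of regions; with the notebook's object the input would be df.regions. When counts tie, most_common lists elements in the order they were first seen.

```python
from collections import Counter

# Made-up sample; with the notebook's object this would be Counter(df.regions)
regions = ["southeast", "northwest", "southeast", "southwest"]

counts = Counter(regions)
top_region, top_count = counts.most_common(1)[0]
share = top_count / len(regions)
print(f"{top_region}: {top_count} of {len(regions)} ({share:.1%})")
# southeast: 2 of 4 (50.0%)
```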


In [18]:
# Average charges for smokers vs. non-smokers
smokers_avg_charges = df.group("smokers", "mean", "charges")
smokers_yes = smokers_avg_charges.get("yes")
smokers_no = smokers_avg_charges.get("no")
print(f"The average cost for smokers is ${smokers_yes:.2f}")
print(f"The average cost for non-smokers is ${smokers_no:.2f}")

The average cost for smokers is $32050.23
The average cost for non-smokers is $8434.27
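
From the two averages above, smokers' charges are roughly 3.8 times those of non-smokers (32050.23 / 8434.27 ≈ 3.80). The grouping done by df.group can also be sketched with collections.defaultdict; group_mean is a hypothetical helper and the sample values are invented, so the printed ratio only illustrates the calculation.

```python
from collections import defaultdict
from statistics import mean


def group_mean(keys, values):
    """Hypothetical helper: mean of values for each distinct key, in first-seen order."""
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        groups[key].append(value)
    return {key: mean(vals) for key, vals in groups.items()}


# Invented sample; with the notebook's object: group_mean(df.smokers, df.charges)
smoker = ["yes", "no", "no", "yes"]
charges = [30000.0, 8000.0, 9000.0, 34000.0]

avg = group_mean(smoker, charges)
print(avg)                               # {'yes': 32000.0, 'no': 8500.0}
print(round(avg["yes"] / avg["no"], 2))  # 3.76
```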


In [33]:
# Average cost for people with one child
children_avg_cost = df.group("children", "mean", "charges")
print(f"The average cost for people with one child is ${children_avg_cost.get(1):.2f}")

The average cost for people with one child is $12731.17
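
The fourth question asks for the average age of someone with at least one child, whereas the cell above reports average charges for patients with exactly one child. A minimal sketch of the question as stated, assuming the filter is children >= 1: mean_age_with_children is a hypothetical helper, and with the notebook's object it would be called as mean_age_with_children(df.ages, df.children). The sample lists are invented, so no result for the real dataset is claimed here.

```python
from statistics import mean


def mean_age_with_children(ages, children, min_children=1):
    """Hypothetical helper: mean age of patients with at least min_children children."""
    selected = [age for age, kids in zip(ages, children) if kids >= min_children]
    if not selected:
        raise ValueError("no patients match the filter")
    return mean(selected)


# Invented sample data
sample_ages = [19, 28, 33, 46]
sample_children = [0, 1, 3, 0]
print(mean_age_with_children(sample_ages, sample_children))  # 30.5
```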
