# U.S. Medical Insurance Costs

## Overview 
Target dataset: insurance.csv, located in the same folder as this notebook (us-medical-insurance-costs.ipynb).  
This project investigates a dataset of U.S. medical insurance costs with Python in order to extract useful insights. Its main purpose is to practice Python coding fundamentals and apply them to analyzing an easy-to-understand dataset.

The dataset is organized into 7 columns corresponding to each patient's information: age, sex, bmi, children, smoker, region, charges.  
The analysis focuses on each individual's yearly insurance charges and relates them to the other attributes, so that the association between each factor and the charges becomes clear. The project does not go further than that (for example, it does not recommend actions based on the results); it only examines how each attribute relates to charges. Because the project is small, only pairs of attributes are studied, and one member of each pair is always the insurance charges.

## Project Ideation
Each row of the dataset represents a different individual, so information is not repeated across rows.  
For easier handling, the columns of the dataset can be extracted into separate lists. Because this extraction is reused, a function for extracting a column should be defined. Eventually, a class with several methods can be defined so that the dataset can be analyzed repeatedly, in various ways, through instances of the class.  

It might look simpler to load the CSV data into a single dictionary. However, a user may want to examine only a few columns and pass only those to the class instance, and that should work just as well as passing every column. Therefore, each column is first extracted into its own list.


Because the dataset is in CSV format, the csv module is imported to read it. Other modules could make the analysis easier, but csv alone is sufficient. For more advanced calculations, the statistics module is also imported.

In [40]:
import csv
import statistics

As mentioned earlier, the 7 columns of the dataset can each be extracted with one reusable function. The function opens the insurance.csv file, extracts the data of the requested column, and returns it as a list.

In [41]:
def column_extracter(target_list, csv_file, column_name):
    # Open the dataset in read mode inside a 'with' block so the file is closed automatically.
    with open(csv_file, newline='') as dataset:
        # Read each row as a dictionary keyed by the column names in the header
        data_dict = csv.DictReader(dataset)
        # Append the value of the requested column from every row
        for row in data_dict:
            target_list.append(row[column_name])

    return target_list

Parameters of the function above: 
- target_list: the list that receives the values of the requested column. It must be created (usually empty) outside the function.
- csv_file: the path of the CSV dataset to read. This parameter must be a string.
- column_name: the name of the requested column in the CSV file. This parameter must be a string.
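
As an alternative sketch (not part of the original notebook), the same extraction could read the file once and return every column as a new list. The helper name `load_columns` and the inline sample rows are assumptions for illustration only; with the real file, an object from `open('insurance.csv', newline='')` would be passed instead of `io.StringIO`.

```python
import csv
import io

def load_columns(file_obj):
    """Read a CSV file object once and return {column_name: [values as strings]}."""
    reader = csv.DictReader(file_obj)
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(value)
    return columns

# Illustrative two-row sample with the same header as insurance.csv.
sample = io.StringIO(
    "age,sex,bmi,children,smoker,region,charges\n"
    "19,female,27.9,0,yes,southwest,16884.924\n"
    "18,male,33.77,1,no,southeast,1725.5523\n"
)
columns = load_columns(sample)
print(columns["age"])      # ['19', '18']
print(columns["charges"])  # ['16884.924', '1725.5523']
```

Returning a fresh dictionary avoids having to create the empty lists beforehand, at the cost of always loading every column.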

The 7 lists representing the 7 columns of the dataset can now be created with the new function column_extracter().

In [42]:
# Assign 7 empty lists
ages = []
sexes = []
bmis = []
num_children = []
smoker_statuses = []
regions = []
insurance_charges = []

# Fill the 7 lists using column_extracter()
column_extracter(ages, 'insurance.csv', 'age')
column_extracter(sexes, 'insurance.csv', 'sex')
column_extracter(bmis, 'insurance.csv', 'bmi')
column_extracter(num_children, 'insurance.csv', 'children')
column_extracter(smoker_statuses, 'insurance.csv', 'smoker')
column_extracter(regions, 'insurance.csv', 'region')
column_extracter(insurance_charges, 'insurance.csv', 'charges')


['16884.924',
 '1725.5523',
 '4449.462',
 '21984.47061',
 '3866.8552',
 '3756.6216',
 '8240.5896',
 '7281.5056',
 '6406.4107',
 '28923.13692',
 '2721.3208',
 '27808.7251',
 '1826.843',
 '11090.7178',
 '39611.7577',
 '1837.237',
 '10797.3362',
 '2395.17155',
 '10602.385',
 '36837.467',
 '13228.84695',
 '4149.736',
 '1137.011',
 '37701.8768',
 '6203.90175',
 '14001.1338',
 '14451.83515',
 '12268.63225',
 '2775.19215',
 '38711',
 '35585.576',
 '2198.18985',
 '4687.797',
 '13770.0979',
 '51194.55914',
 '1625.43375',
 '15612.19335',
 '2302.3',
 '39774.2763',
 '48173.361',
 '3046.062',
 '4949.7587',
 '6272.4772',
 '6313.759',
 '6079.6715',
 '20630.28351',
 '3393.35635',
 '3556.9223',
 '12629.8967',
 '38709.176',
 '2211.13075',
 '3579.8287',
 '23568.272',
 '37742.5757',
 '8059.6791',
 '47496.49445',
 '13607.36875',
 '34303.1672',
 '23244.7902',
 '5989.52365',
 '8606.2174',
 '4504.6624',
 '30166.61817',
 '4133.64165',
 '14711.7438',
 '1743.214',
 '14235.072',
 '6389.37785',
 '5920.1041',
 '176

All 7 lists contain strings.
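
Because every value is a string at this point, numeric columns need converting before any arithmetic. A minimal sketch of such a conversion follows; the helper name `to_numbers` and the sample values are assumptions for illustration, not part of the notebook.

```python
def to_numbers(values, number_type=float):
    """Return a new list with every string converted by number_type (int or float)."""
    return [number_type(v) for v in values]

# Illustrative string values in the shape column_extracter() produces.
sample_ages = ['19', '18', '28']
sample_charges = ['16884.924', '1725.5523', '4449.462']

print(to_numbers(sample_ages, int))  # [19, 18, 28]
print(to_numbers(sample_charges))    # [16884.924, 1725.5523, 4449.462]
```

Columns such as region or sex would raise a ValueError here, which is why the class below guards the conversion with try-except.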

## Building Class for Analysis 

As mentioned, the analysis centers on the 'charges' column, now stored in the list insurance_charges. To find out which attribute influences insurance cost the most, we look for patterns between each attribute and the charges, then draw conclusions. Because charges are floating-point values, understanding them means understanding how they fluctuate: when they rise and when they fall.  

Note that the 6 attributes originally have different types: string (region), float (bmi), integer (age, children), and two-valued categories that behave like booleans (smoker: yes or no; sex: male or female). Because column_extracter() returns every column as a list of strings, the numeric columns should be converted back to numbers before any calculation.  

For clean and modular code, a class with several methods is a better option than many separate functions.  
The class should have methods that are practical for analyzing the dataset. To fit the goals of the project, the methods cover minimum, maximum, and average calculation, counting, and sorting.  

One key point: it is easy to sort the charges in a given order, but a single sorted column of charges reveals no new insight. Therefore, the class should have a method that shows at least 2 columns side by side, one of which is the charges column.
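
The idea of sorting one column while keeping each row intact can be sketched with `zip` and `sorted`. The toy lists below are illustrative assumptions, not the full dataset.

```python
toy_ages = ['19', '18', '28']
toy_charges = ['16884.924', '1725.5523', '4449.462']

# Pair the values row by row, then sort the pairs by the numeric charge.
rows = sorted(zip(toy_ages, toy_charges), key=lambda pair: float(pair[1]))
for age, charge in rows:
    print(age, charge)
# 18 1725.5523
# 28 4449.462
# 19 16884.924
```

Sorting the pairs rather than the charges alone keeps every age attached to its own charge.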


In [43]:
"""
As a reminder, the lists of the corresponding columns in the dataset are as follows:
    ages, sexes, bmis, num_children, smoker_statuses, regions, insurance_charges
"""
class CostAnalyzer:
    def __init__(self, _ages=None, _sexes=None, _bmis=None, _num_children=None, _smoker_statuses=None, _regions=None, _insurance_charges=None):
        self.values = {'Ages':_ages, 'Sexes':_sexes, 'BMIs':_bmis, 'Number Of Children':_num_children, 'Smoker Statuses':_smoker_statuses, 'Regions':_regions, 'Insurance Charges': _insurance_charges}

    # Most calculations need numeric data, so this helper converts a column to numbers.
    # Not every column can be converted (e.g. regions), hence the try-except below.
    # The other numeric methods all rely on this helper.
    def _get_column_as_numbers(self, target_column):  # target_column is a column key such as "Ages"
        col = self.values.get(target_column)
        # dict.get() returns the value stored under the given key, or None if the key is missing.
        if col is None:
            raise ValueError(f"Column '{target_column}' is None or not provided.")  # the column was not passed to __init__
        # Convert to float safely (works for int/float strings too).
        try:
            return [float(x) for x in col]
        except (TypeError, ValueError):
            raise ValueError(f"Column '{target_column}' contains non-numeric values.")  # e.g. 'Regions' or 'Sexes'
    
    def min_finder(self, target_column):  # target_column is a column key (string), as above
        values = self._get_column_as_numbers(target_column)  # the column is converted to numbers if possible
        return min(values)

    def max_finder(self, target_column):
        values = self._get_column_as_numbers(target_column)
        return max(values)

    def mean_finder(self, target_column):
        values = self._get_column_as_numbers(target_column)
        return (sum(values) / len(values))

    def median_finder(self, target_column):
        values = self._get_column_as_numbers(target_column)
        return statistics.median(values)

    def rows_sorter(self, target_column, direction="ascending", _format=False):
        values = self._get_column_as_numbers(target_column)
        if direction == "ascending":
            sorted_vals = sorted(values)
        elif direction == "descending":
            sorted_vals = sorted(values, reverse=True)
        else:
            raise ValueError("direction must be 'ascending' or 'descending'")  # reject any other direction explicitly
        # _format=True returns one value per line as a string; otherwise the plain list is returned for further analysis.
        if _format:
            return "\n".join(str(v) for v in sorted_vals) # string with one value per line
        else:
            return sorted_vals # list of sorted values

    def rows_counter(self, ages=None, sexes=None, bmis=None, children=None, smokers=None, regions=None, charges=None):
        # Map each argument to its column key to see which columns should be counted
        arg_map = {
            "Ages": ages,
            "Sexes": sexes,
            "BMIs": bmis,
            "Number Of Children": children,
            "Smoker Statuses": smokers,
            "Regions": regions,
            "Insurance Charges": charges
        }
        # Case 1 - no column requested: return the number of rows
        if all(v is None for v in arg_map.values()):
            provided = [col for col in self.values.values() if col is not None]
            return len(provided[0]) if provided else 0  # works even when 'Ages' was not passed in

        # Case 2 - at least one column requested (any truthy argument selects its column)
        result = {} # maps each requested column to its value counts
        for col_name, arg in arg_map.items():
            if arg:
                counts = {} # maps each distinct value in the column to how often it occurs
                for value in self.values[col_name]:
                    counts[value] = counts.get(value, 0) + 1 # start from 0 the first time a value is seen, then add 1
                result[col_name] = counts # store the finished counts under the column's key
        return result
                
    # This method formats several columns side by side so their values can be compared row by row.
    def rows_view(self, target_columns, sort_by=None, direction="ascending"):
        """
        Return the selected columns side by side, one dataset row per line.

        target_columns: list of column names (strings) to display, in display order.
        sort_by: optional column name (string) used as the sort key; the other columns
            are rearranged with it, so each printed line is still one original CSV row.
        direction: "ascending" or "descending"; only used when sort_by is given.
        """
        n = len(self.values[target_columns[0]])
        order = range(n)  # default: original row order
        if sort_by:  # sort_by is optional, so only sort when it is given
            if direction not in ("ascending", "descending"):
                raise ValueError("direction must be 'ascending' or 'descending'")
            key_column = self._get_column_as_numbers(sort_by)
            # Sort the row indices by the key column's value. Sorting indices (instead of looking up
            # each sorted value with list.index()) keeps rows that share the same value distinct.
            order = sorted(range(n), key=key_column.__getitem__, reverse=(direction == "descending"))

        # build rows
        lines = []  # one formatted line per row, values joined by "      "
        for i in order:
            # Collect the values of row i from every target column, in the order given.
            row = [str(self.values[col][i]) for col in target_columns]
            lines.append("      ".join(row))
        return "\n".join(lines)  # finally, join the lines with newline characters


    # Additional helper
    # Look up the keys of a dictionary by value; several keys may share the same value.
    @staticmethod
    def get_keys(dictionary, value):
        return [k for k, v in dictionary.items() if v == value]
    
    
    def __str__(self):
        return f"{self.values}"

        
# Creating example object
o = CostAnalyzer(_ages=ages, _sexes=sexes, _bmis=bmis, _num_children=num_children, _smoker_statuses=smoker_statuses, _regions=regions, _insurance_charges=insurance_charges)


To be fair, rows_sorter() is rarely needed in this project, unless the user wants to see a single column sorted in a given direction.

#### Ages Analyzing

In [44]:
print("---AGE ANALYZING---")
print("Min age:", o.min_finder("Ages"))
print("Max age:", o.max_finder("Ages"))
print("Mean age:", o.mean_finder("Ages"))
print("Median age:", o.median_finder("Ages"))
print("Ages with its counts:", (o.rows_counter(ages="Ages")))

print("\nComparing columns Ages and Insurance Charges:")
# NOTE: rows_view() returns a single string containing "\n", so wrap the call in print() to show one row per line.
print(o.rows_view(["Ages", "Insurance Charges"], sort_by="Insurance Charges"))


---AGE ANALYZING---
Min age: 18.0
Max age: 64.0
Mean age: 39.20702541106129
Median age: 39.0
Ages with its counts: {'Ages': {'19': 68, '18': 69, '28': 28, '33': 26, '32': 26, '31': 27, '46': 29, '37': 25, '60': 23, '25': 28, '62': 23, '23': 28, '56': 26, '27': 28, '52': 29, '30': 27, '34': 26, '59': 25, '63': 23, '55': 26, '22': 28, '26': 28, '35': 25, '24': 28, '41': 27, '38': 25, '36': 25, '21': 28, '48': 29, '40': 27, '58': 25, '53': 28, '43': 27, '64': 22, '20': 29, '61': 23, '44': 27, '57': 26, '29': 27, '45': 29, '54': 28, '49': 28, '47': 29, '51': 29, '42': 27, '50': 29, '39': 25}}

Comparing columns Ages and Insurance Charges:
18      1121.8739
18      1131.5066
18      1135.9407
18      1136.3994
18      1137.011
18      1137.4697
18      1141.4451
18      1146.7966
18      1149.3959
18      1163.4627
19      1241.565
19      1242.26
19      1242.816
19      1252.407
19      1253.936
19      1256.299
19      1261.442
19      1261.859
19      1263.249
20      1391.5287
21      

Besides the basic findings (minimum, maximum, mean, and median age, plus the counts), the two columns Ages and Insurance Charges are printed side by side above so that both can be observed together. The table shows that age tends to increase as insurance charges increase. The trend is not perfectly consistent: shortly after the start of the list, some noticeably larger ages appear among the smaller ones, and near the end of the list there is a small group of noticeably smaller ages among the larger ones. However, these exceptions do not change the overall picture, so it is reasonable to say that age generally increases together with insurance cost. In other words, age and medical charges are positively associated.
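
Reading a trend off a long printed table is subjective, so a correlation coefficient could be used to put a number on it. The sketch below implements the standard Pearson formula by hand; the function name `pearson_r` and the toy data are assumptions, and no result for the real dataset is claimed here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Toy data (not from insurance.csv): charges rise exactly linearly with age.
toy_ages = [20, 30, 40, 50]
toy_charges = [2000.0, 3000.0, 4000.0, 5000.0]
r = pearson_r(toy_ages, toy_charges)
print(round(r, 6))  # 1.0
```

In the notebook, the call would look like `pearson_r([float(a) for a in ages], [float(c) for c in insurance_charges])`; a value near 1 indicates a strong increasing relationship and a value near 0 indicates none.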

#### BMIs Analyzing

In [45]:
print("\n---BMIs ANALYZING---")
print("Min BMI:", o.min_finder("BMIs"))
print("Max BMI:", o.max_finder("BMIs"))
print("Mean BMI:", o.mean_finder("BMIs"))
print("Median BMI:", o.median_finder("BMIs"))
print("BMIs with its counts:", (o.rows_counter(bmis="BMIs")))

print("\nComparing columns BMIs and Insurance Charges:")
# NOTE: rows_view() returns a single string containing "\n", so wrap the call in print() to show one row per line.
print(o.rows_view(["BMIs", "Insurance Charges"], sort_by="Insurance Charges"))


---BMIs ANALYZING---
Min BMI: 15.96
Max BMI: 53.13
Mean BMI: 30.66339686098655
Median BMI: 30.4
BMIs with its counts: {'BMIs': {'27.9': 1, '33.77': 2, '33': 6, '22.705': 3, '28.88': 8, '25.74': 4, '33.44': 4, '27.74': 6, '29.83': 6, '25.84': 5, '26.22': 4, '26.29': 1, '34.4': 4, '39.82': 3, '42.13': 4, '24.6': 3, '30.78': 5, '23.845': 3, '40.3': 1, '35.3': 4, '36.005': 1, '32.4': 1, '34.1': 8, '31.92': 5, '28.025': 5, '27.72': 4, '23.085': 2, '32.775': 7, '17.385': 1, '36.3': 4, '35.6': 1, '26.315': 5, '28.6': 3, '28.31': 9, '36.4': 1, '20.425': 1, '32.965': 4, '20.8': 2, '36.67': 4, '39.9': 1, '26.6': 6, '36.63': 3, '21.78': 2, '30.8': 8, '37.05': 3, '37.3': 1, '38.665': 1, '34.77': 3, '24.53': 1, '35.2': 7, '35.625': 4, '33.63': 6, '28': 3, '34.43': 4, '28.69': 3, '36.955': 4, '31.825': 5, '31.68': 2, '22.88': 1, '37.335': 2, '27.36': 7, '33.66': 5, '24.7': 4, '25.935': 3, '22.42': 3, '28.9': 5, '39.1': 1, '36.19': 3, '23.98': 3, '24.75': 1, '28.5': 5, '28.1': 2, '32.01': 2, '27.4':

The minimum and maximum show the range of BMI in the dataset: from 15.96 to 53.13. The mean (30.66) and the median (30.40) are close, which suggests that the BMI distribution is not strongly skewed. The per-value counts are not useful here, because most BMI values occur only a few times.  
Comparing BMIs with insurance charges, no consistent change in BMI is visible as the charges increase. While the charges rise, BMI values fluctuate without a clear pattern, sometimes increasing and sometimes decreasing unexpectedly. Some of the smallest BMIs even appear next to some of the highest medical bills. Therefore, this view shows no clear relationship between BMI and medical insurance charges. 
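
Since individual BMI values rarely repeat, grouping them into bands before comparing charges may be easier to read than counts of exact values. The following is a minimal sketch; the function name `mean_by_band`, the band edges, and the toy values are all assumptions for illustration.

```python
def mean_by_band(values, charges, edges):
    """Group charges by the [low, high) band each value falls in; return {band: mean charge}."""
    totals = {}
    for value, charge in zip(values, charges):
        for low, high in zip(edges, edges[1:]):
            if low <= value < high:
                band = f"{low}-{high}"
                total, count = totals.get(band, (0.0, 0))
                totals[band] = (total + charge, count + 1)
                break
    return {band: total / count for band, (total, count) in totals.items()}

# Toy values (not from insurance.csv); the band edges are an arbitrary choice.
toy_bmis = [18.0, 22.0, 27.0, 31.0, 36.0]
toy_charges = [1000.0, 3000.0, 4000.0, 5000.0, 7000.0]
print(mean_by_band(toy_bmis, toy_charges, [15, 25, 30, 55]))
# {'15-25': 2000.0, '25-30': 4000.0, '30-55': 6000.0}
```

Similar band means would support the reading that BMI has little influence on charges; clearly different means would suggest otherwise.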

#### Number of Children Analyzing

In [46]:
print("\n---NUMBER OF CHILDREN ANALYZING---")
print("Min number of children:", o.min_finder("Number Of Children"))
print("Max number of children:", o.max_finder("Number Of Children"))
print("Mean number of children:", o.mean_finder("Number Of Children"))
print("Median number of children:", o.median_finder("Number Of Children"))

print("\nComparing columns Number Of Children and Insurance Charges:")
# NOTE: rows_view() returns a single string containing "\n", so wrap the call in print() to show one row per line.
print(o.rows_view(["Number Of Children", "Insurance Charges"], sort_by="Insurance Charges"))


---NUMBER OF CHILDREN ANALYZING---
Min number of children: 0.0
Max number of children: 5.0
Mean number of children: 1.0949177877429
Median number of children: 1.0

Comparing columns Number Of Children and Insurance Charges:
0      1121.8739
0      1131.5066
0      1135.9407
0      1136.3994
0      1137.011
0      1137.4697
0      1141.4451
0      1146.7966
0      1149.3959
0      1163.4627
0      1241.565
0      1242.26
0      1242.816
0      1252.407
0      1253.936
0      1256.299
0      1261.442
0      1261.859
0      1263.249
0      1391.5287
0      1515.3449
0      1526.312
0      1532.4697
0      1534.3045
0      1607.5101
0      1615.7667
0      1621.3402
0      1621.8827
0      1622.1885
0      1625.43375
0      1627.28245
0      1628.4709
0      1629.8335
0      1631.6683
0      1631.8212
0      1632.03625
0      1632.56445
0      1633.0444
0      1633.9618
0      1634.5734
0      1635.73365
0      1639.5631
0      1639.5631
0      1646.4297
0      1664.9996
0      1674.6323


Whether looking at the summary statistics for the number of children or scrolling through the two columns above, it is reasonable to state that there is no clear relationship between the number of children and medical charges: the number of children changes unpredictably as charges increase, with no consistent pattern. 

#### Analyzing the Changes in Sexes

In [47]:
print("\n---SEXES ANALYZING---")
print("Number of two sexes:", o.rows_counter(sexes="Sexes"))
m_bills = []
f_bills = []
for i in range(len(sexes)):
    if sexes[i] == "male":
        m_bills.append(float(insurance_charges[i]))
    else:
        f_bills.append(float(insurance_charges[i]))

print("\nMax medical insurance charges of men:", max(m_bills))
print("Min medical insurance charges of men:", min(m_bills))
print("Median value of medical insurance charges of men:", statistics.median(m_bills), "\n")

print("Max medical insurance charges of women:", max(f_bills))
print("Min medical insurance charges of women:", min(f_bills))
print("Median value of medical insurance charges of women:", statistics.median(f_bills))

print("\nComparing columns Sexes and Insurance Charges:")
# NOTE: rows_view() returns a single string containing "\n", so wrap the call in print() to show one row per line.
print(o.rows_view(["Sexes", "Insurance Charges"], sort_by="Insurance Charges"))


---SEXES ANALYZING---
Number of two sexes: {'Sexes': {'female': 662, 'male': 676}}

Max medical insurance charges of men: 62592.87309
Min medical insurance charges of men: 1121.8739
Median value of medical insurance charges of men: 9369.61575 

Max medical insurance charges of women: 63770.42801
Min medical insurance charges of women: 1607.5101
Median value of medical insurance charges of women: 9412.9625

Comparing columns Sexes and Insurance Charges:
male      1121.8739
male      1131.5066
male      1135.9407
male      1136.3994
male      1137.011
male      1137.4697
male      1141.4451
male      1146.7966
male      1149.3959
male      1163.4627
male      1241.565
male      1242.26
male      1242.816
male      1252.407
male      1253.936
male      1256.299
male      1261.442
male      1261.859
male      1263.249
male      1391.5287
male      1515.3449
male      1526.312
male      1532.4697
male      1534.3045
female      1607.5101
female      1615.7667
male      1621.3402
female    

In general, both sexes spend about the same on medical costs. Both sexes appear in almost every part of the list above. Even though the female values are higher than the male values in all three statistics (maximum, minimum, and median charges), the differences are small. In addition, the two sexes alternate throughout the list. Therefore, it is fair to say that sex does not have a major impact on insurance charges.

#### Smoker Statuses Analyzing

On its own, smoker status might not be expected to reveal much, although it could contribute more in combination with other attributes. Within the scope of this project, however, we only examine this one attribute paired with medical charges.

In [48]:
print("\n---SMOKER STATUSES ANALYZING---")
print(o.rows_counter(smokers="Smoker Statuses"))

print("\nComparing columns Smoker Statuses and Insurance Charges:")
# NOTE: rows_view() returns a single string containing "\n", so wrap the call in print() to show one row per line.
print(o.rows_view(["Smoker Statuses", "Insurance Charges"], sort_by="Insurance Charges"))


---SMOKER STATUSES ANALYZING---
{'Smoker Statuses': {'yes': 274, 'no': 1064}}

Comparing columns Smoker Statuses and Insurance Charges:
no      1121.8739
no      1131.5066
no      1135.9407
no      1136.3994
no      1137.011
no      1137.4697
no      1141.4451
no      1146.7966
no      1149.3959
no      1163.4627
no      1241.565
no      1242.26
no      1242.816
no      1252.407
no      1253.936
no      1256.299
no      1261.442
no      1261.859
no      1263.249
no      1391.5287
no      1515.3449
no      1526.312
no      1532.4697
no      1534.3045
no      1607.5101
no      1615.7667
no      1621.3402
no      1621.8827
no      1622.1885
no      1625.43375
no      1627.28245
no      1628.4709
no      1629.8335
no      1631.6683
no      1631.8212
no      1632.03625
no      1632.56445
no      1633.0444
no      1633.9618
no      1634.5734
no      1635.73365
no      1639.5631
no      1639.5631
no      1646.4297
no      1664.9996
no      1674.6323
no      1682.597
no      1694.7964
no     

As shown above, every row from the start of the list up to charges of about 13000 USD is a non-smoker. From there onwards, smokers appear much more frequently and consistently, with only occasional non-smokers in between, which points to a strong positive association between smoking and insurance costs for medical treatment. 
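
The same comparison could be summarized with one number per group, in the way the sexes section compares medians. A minimal sketch, assuming a hypothetical helper `median_by_category` and toy data rather than the real columns:

```python
import statistics

def median_by_category(categories, charges):
    """Return {category: median of the charges that share that category}."""
    groups = {}
    for category, charge in zip(categories, charges):
        groups.setdefault(category, []).append(charge)
    return {category: statistics.median(vals) for category, vals in groups.items()}

# Toy data (not from insurance.csv).
toy_smokers = ['no', 'yes', 'no', 'yes', 'no']
toy_charges = [1200.0, 17000.0, 1800.0, 21000.0, 1500.0]
print(median_by_category(toy_smokers, toy_charges))
# {'no': 1500.0, 'yes': 19000.0}
```

With the notebook's lists, the call would be `median_by_category(smoker_statuses, [float(c) for c in insurance_charges])`.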

#### Regions Analyzing

In [49]:
print("\n---REGIONS ANALYZING---")
print(o.rows_counter(regions="Regions"))



print("\nComparing columns Regions and Insurance Charges:")
# NOTE: rows_view() returns a single string containing "\n", so wrap the call in print() to show one row per line.
print(o.rows_view(["Regions", "Insurance Charges"], sort_by="Insurance Charges"))


---REGIONS ANALYZING---
{'Regions': {'southwest': 325, 'southeast': 364, 'northwest': 325, 'northeast': 324}}

Comparing columns Regions and Insurance Charges:
southeast      1121.8739
southeast      1131.5066
southeast      1135.9407
southeast      1136.3994
southeast      1137.011
southeast      1137.4697
southeast      1141.4451
southeast      1146.7966
southeast      1149.3959
southeast      1163.4627
southwest      1241.565
southwest      1242.26
southwest      1242.816
southwest      1252.407
southwest      1253.936
southwest      1256.299
southwest      1261.442
southwest      1261.859
southwest      1263.249
southeast      1391.5287
southeast      1515.3449
southwest      1526.312
southeast      1532.4697
southeast      1534.3045
southeast      1607.5101
southeast      1615.7667
northwest      1621.3402
southeast      1621.8827
southeast      1622.1885
northwest      1625.43375
northwest      1627.28245
northwest      1628.4709
southeast      1629.8335
southeast      1631.6683

Like the other attributes except age and smoker status, region changes inconsistently as medical costs increase. Across the whole range of charges, all four regions appear with roughly similar frequency. Therefore, it is reasonable to conclude that region is not an impactful attribute for medical costs.  

#### Additional Analyzing

It was stated earlier that studying multiple attributes at once would make this project longer and more complicated than it should be. However, the observations above show that age and smoker status are the most influential attributes when each is compared with insurance charges. It is therefore reasonable to go one step further and view three columns together (ages, smoker statuses, and insurance charges) to gain additional insights.

In [50]:
print("\nComparing columns Ages, Smoker Statuses and Insurance Charges:\n")
# NOTE: rows_view() returns a single string containing "\n", so wrap the call in print() to show one row per line.
print(o.rows_view(["Ages", "Smoker Statuses", "Insurance Charges"], sort_by="Insurance Charges"))


Comparing columns Ages, Smoker Statuses and Insurance Charges:

18      no      1121.8739
18      no      1131.5066
18      no      1135.9407
18      no      1136.3994
18      no      1137.011
18      no      1137.4697
18      no      1141.4451
18      no      1146.7966
18      no      1149.3959
18      no      1163.4627
19      no      1241.565
19      no      1242.26
19      no      1242.816
19      no      1252.407
19      no      1253.936
19      no      1256.299
19      no      1261.442
19      no      1261.859
19      no      1263.249
20      no      1391.5287
21      no      1515.3449
21      no      1526.312
21      no      1532.4697
21      no      1534.3045
18      no      1607.5101
18      no      1615.7667
19      no      1621.3402
18      no      1621.8827
18      no      1622.1885
19      no      1625.43375
19      no      1627.28245
19      no      1628.4709
18      no      1629.8335
18      no      1631.6683
18      no      1631.8212
19      no      1632.03625
19      

At the lowest insurance charges, the youngest ages appear together with non-smoker status. This matches the common expectation that younger non-smokers are healthier and need less medical treatment.  
Scrolling further, people aged about 50 to 60 usually pay higher medical fees regardless of smoker status.  
In addition, even young people can pay some of the highest fees if they are smokers, and the same pattern appears for smokers who are about 60 years old. 
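
The combined effect of age and smoker status could also be summarized by grouping on both attributes at once. The sketch below uses a tuple as the dictionary key; the function name, the cutoff of 50 years, and the toy data are assumptions for illustration.

```python
def mean_charge_by_age_and_smoker(ages, smokers, charges, senior_age=50):
    """Return {(age_group, smoker_status): mean charge}; senior_age is an assumed cutoff."""
    groups = {}
    for age, smoker, charge in zip(ages, smokers, charges):
        age_group = f"{senior_age}+" if age >= senior_age else f"under {senior_age}"
        groups.setdefault((age_group, smoker), []).append(charge)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}

# Toy data (not from insurance.csv).
toy_ages = [18, 19, 55, 60, 62]
toy_smokers = ['no', 'yes', 'no', 'yes', 'no']
toy_charges = [1100.0, 16000.0, 11000.0, 48000.0, 13000.0]
summary = mean_charge_by_age_and_smoker(toy_ages, toy_smokers, toy_charges)
for key, mean_charge in summary.items():
    print(key, mean_charge)
# ('under 50', 'no') 1100.0
# ('under 50', 'yes') 16000.0
# ('50+', 'no') 12000.0
# ('50+', 'yes') 48000.0
```

Four group means are easier to compare than a table of more than a thousand rows, although the choice of cutoff affects the result.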

### Conclusion

By pairing each attribute with the insurance charges column, we identified clear relationships between age and charges and between smoker status and charges. The combination of smoker status and higher age frequently leads to high medical charges. Younger people can also pay some of the highest fees if they smoke. In general, either higher age or smoker status on its own is associated with higher medical treatment fees.  
After studying each attribute, the dataset appears fairly balanced across several attributes: the values within a column often occur with similar frequency (smoker status is a notable exception, with 274 smokers against 1064 non-smokers). For example, in the regions list, all four regions have nearly the same number of rows, as shown in the Regions Analyzing part.  
##### Personal Thoughts after the Project
The project provides a clean and organized dataset, which is unusual in real-world projects, where datasets tend to be more ambiguous and unstructured. This makes the project within reach for a beginner.  
For a beginner on the path to becoming a professional data scientist, it is essential to get used to starting from scratch, that is, finding insights without a single hint, since this is simply the job of a data scientist. One has to understand the given dataset by inspecting it manually before touching any tools, and then find ways to reveal insights that help the organization develop and innovate. To do that, one starts by asking what, why, and other WH-questions to gradually uncover powerful but hidden patterns. And that's it, probably!