# U.S. Medical Insurance Costs

## Goals
- Analyze a dataset by building out functions or class methods
- Use libraries to assist in your analysis
- Optional: Document and organize your findings
- Optional: Make predictions about a dataset’s features based on your findings


## Data
Top 4 rows

![image.png](attachment:ea95cbaf-8fee-4b21-a8e0-e942d546d4ed.png)

## Analysis
- avarage cost by region
- avarage cost by age groups young, middle and older
- avarage by gender


In [27]:
# import data from csv
import csv



In [28]:
#Create empty lists for the various attributes in insurance.csv
ages = []
sexes = []
bmis = []
num_children = []
smoker_statuses = []
regions = []
insurance_charges = []

**insurance.csv** contains the following columns:
* Patient Age
* Patient Sex 
* Patient BMI
* Patient Number of Children
* Patient Smoking Status
* Patient U.S Geopraphical Region
* Patient Yearly Medical Insurance Cost

There are no signs of missing data. To store this information, seven empty lists will be created hold each individual column of data from **insurance.csv**.


In [29]:
# helper function to load csv data
def load(list_to_load, column_name):
    
    with open("insurance.csv", newline='') as data_csv:
        insurance_data = csv.DictReader(data_csv)
        for row in insurance_data:
            # add the data from each row to a list
            list_to_load.append(row[column_name])

In [30]:
# load lists
load(ages,  'age')
load(sexes, 'sex')
load(bmis, 'bmi')
load(num_children, 'children')
load(smoker_statuses, 'smoker')
load(regions,  'region')
load(insurance_charges, 'charges')

In [34]:
# common functions
def calculate_total(list_data):
    total = 0
    for item in list_data:
        total += item
    return total
def calculate_avg(list_data):
    return calculate_total(list_data)/len(list_data)

# cost by gender
def calculate_avg_cost_by_gender():
    result = dict()
    f_cost = []
    m_cost = []
    for row in data:
        cost = row['charges']
        if row['sex'] == 'female': f_cost.append(float(cost))
        else: m_cost.append(float(cost))
    
    result['male_avg'] = calculate_avg(m_cost)
    result['female_avg'] = calculate_avg(f_cost)
    print("Result of Avarage Cost by Gender")
    print("================================")
    print(result)

# cost by gender
calculate_avg_cost_by_gender()


    

Result of Avarage Cost by Gender
{'male_avg': 13956.751177721886, 'female_avg': 12569.57884383534}


In [24]:
# cost by region
def calculate_avg_cost_by_region():
    regions = dict()
    result = dict()
    for row in data:
        region = row['region']
        cost = row['charges']
        # if a region found first time create a empty list
        if region not in regions:
            regions[region] = []
        
        regions[region].append(float(cost)) 
    # get region avarages
    for r in regions:
        result[r] = calculate_avg(regions[r])
    
    print("Result of Avarage Cost by Region")
    print("================================")
    print(result)
        
calculate_avg_cost_by_region()

{'southwest': 12346.93737729231, 'southeast': 14735.411437609895, 'northwest': 12417.575373969228, 'northeast': 13406.3845163858}


In [33]:
# avarage cost by age groups
# young < 30
# middle > 30 and < 50
# old > 50
def calculate_cost_by_age_group():
    age_groups = {"young": [],"middle": [], "old":[]}
    result = dict()
    for i in range(len(data)):
        cost = float(insurance_charges[i])
        if int(ages[i]) < 30: age_groups["young"].append(cost)
        elif int(ages[i]) > 30 and int(ages[i]) < 50: age_groups["middle"].append(cost)
        else: age_groups["old"].append(cost)
        
    for age_group in age_groups:
        result[age_group] = calculate_avg(age_groups[age_group])
    
    print("Result of Avarage Cost by Age Group")
    print("================================")
    print(result)
        
calculate_cost_by_age_group()


Result of Avarage Cost by Age Group
{'young': 9182.487125153473, 'middle': 13145.047805127702, 'old': 17562.860501844654}


# Results
- more complicated analysis might be done.
- by age group insurance cost is significantly increasing
- Where people live doesnt have much effect on insurance base on regions
- Males pay more insurance than females