# U.S. Medical Insurance Costs

In this project I look at U.S. medical insurance cost data and how the costs relate to a person's attributes. Python is the main tool for the analysis. These are the objectives I am going to explore with this data:

* Count the male and female records in the data
* Average insurance cost
* Average savings a smoker could have if they quit smoking
* Is there a region where insurance is the most expensive, and which one is it?



In [2]:
# import csv library since the data is in this format
import csv

# Also, import locale library to format currency
# (the 'en_US' locale must be available on the system; some platforms name it 'en_US.UTF-8')
import locale
locale.setlocale(locale.LC_ALL, 'en_US')

# Create lists to store data to analyze
ages = []
sexes = []
bmis = []
num_children = []
smoker_statuses = []
regions = []
insurance_charges = []


Next, populate the empty lists with the data. With these lists we can calculate the average, maximum and minimum of the numerical columns. We can also count how often each value occurs in the categorical columns.

In [3]:
def load_file(file):
  # Open the file with the given filename (newline="" is recommended by the csv module docs)
  with open(file, newline="") as medical_costs_csv:
    medical_costs_reader = csv.DictReader(medical_costs_csv)
    # Create lists with each column from the file
    for row in medical_costs_reader:
      ages.append(row["age"])
      sexes.append(row["sex"])
      bmis.append(row["bmi"])
      num_children.append(row["children"])
      smoker_statuses.append(row["smoker"])
      regions.append(row["region"])
      insurance_charges.append(row["charges"])
  
load_file("insurance.csv")
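
One detail worth keeping in mind with the lists above: `csv.DictReader` returns every value as a string, so numeric columns need converting before taking a minimum or maximum. The sketch below uses made-up values (`sample_charges` is hypothetical, not taken from `insurance.csv`) to show the difference.

```python
# Hypothetical values, stored as strings the way csv.DictReader returns them
sample_charges = ["900.50", "10000.00", "2500.75"]

# Comparing strings is lexicographic, so "900.50" ranks above "10000.00"
max_as_text = max(sample_charges)

# Converting to float first gives the numeric minimum and maximum
numeric_charges = [float(charge) for charge in sample_charges]
lowest = min(numeric_charges)
highest = max(numeric_charges)

print(f"max of strings: {max_as_text}")
print(f"min: {lowest:.2f}, max: {highest:.2f}")
```

In the notebook itself, the same conversion could be applied to `insurance_charges`, `ages` or `bmis`.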

### How many females and males are in the data

In [4]:
def analyze_sexes():
  # initialize variable for the count
  female = 0
  male = 0
  # loop through sexes list, counting only the two expected labels
  for s in sexes:
    if s == "female":
      female += 1
    elif s == "male":
      male += 1

  print(f"There are {female} females and {male} males in the data")

analyze_sexes()

There are 662 females and 676 males in the data
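
As an alternative sketch, the same count can be done with `collections.Counter` from the standard library, which also exposes any unexpected labels as separate keys. The list `sample_sexes` is a hypothetical stand-in for the notebook's `sexes` list.

```python
from collections import Counter

# Hypothetical sample; in the notebook, the `sexes` list would be passed instead
sample_sexes = ["female", "male", "male", "female", "male"]

# Counter maps each distinct value to the number of times it occurs
sex_counts = Counter(sample_sexes)

print(f"There are {sex_counts['female']} females and {sex_counts['male']} males in the sample")

# Listing the keys shows every label present, including unexpected ones
print(sorted(sex_counts))
```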


### Calculate the average age of patients in the data

In [5]:
def calculate_avg(data):
  total = 0
  for d in data:
    total += float(d)
  return round(total/len(data), 2)

print(f"Average patient age is: {calculate_avg(ages)} years")


Average patient age is: 39.21 years
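
The hand-written average can be cross-checked against `statistics.mean` from the standard library. This is only a sketch on hypothetical values (`sample_ages` is made up); in the notebook, the same call could be applied to `ages` or to `insurance_charges`.

```python
from statistics import mean

# Hypothetical ages, kept as strings to mirror how the CSV columns are stored
sample_ages = ["19", "18", "28"]

# Convert each value to float, average, then round to two decimals
average_age = round(mean(float(age) for age in sample_ages), 2)

print(f"Average age in the sample: {average_age} years")
```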


### Average savings of non-smokers versus smokers

In [6]:
def average_savings_nonsmokers_vs_smoker():
  # Create a list pairing smoker status and cost for easier manipulation
  smoker_status_cost = list(zip(smoker_statuses, insurance_charges))

  smoker_total_cost = 0
  smoker_avg_cost = 0
  non_smoker_total_cost = 0
  non_smoker_avg_cost = 0
  total_smokers = 0
  total_non_smokers = 0
  for i in smoker_status_cost:
    if i[0] == "yes":
      smoker_total_cost += float(i[1])
      total_smokers += 1
    else:
      non_smoker_total_cost += float(i[1])
      total_non_smokers += 1

  smoker_avg_cost = smoker_total_cost / total_smokers
  non_smoker_avg_cost = non_smoker_total_cost / total_non_smokers

  print(f"Average Smoker insurance cost:        {locale.currency(smoker_avg_cost, grouping=True)}")
  print(f"Average Non-smoker insurance cost:    {locale.currency(non_smoker_avg_cost, grouping=True)}")
  print("------------------------------------------------")
  print(f"Non-smoker vs Smoker savings are:     {locale.currency(smoker_avg_cost - non_smoker_avg_cost, grouping=True)}")

average_savings_nonsmokers_vs_smoker()

Average Smoker insurance cost:        $32,050.23
Average Non-smoker insurance cost:    $8,434.27
------------------------------------------------
Non-smoker vs Smoker savings are:     $23,615.96
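
To make the arithmetic behind these figures explicit, the short check below recomputes the difference from the two averages printed above and also expresses it as a ratio. The variable names are illustrative and not part of the original notebook.

```python
# Averages as reported in the output above
smoker_avg_cost = 32050.23
non_smoker_avg_cost = 8434.27

# Absolute difference between the two group averages
savings = smoker_avg_cost - non_smoker_avg_cost

# How many times higher the average smoker cost is
ratio = smoker_avg_cost / non_smoker_avg_cost

print(f"Difference: {savings:,.2f}")
print(f"Smokers pay about {ratio:.1f} times the non-smoker average")
```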


### Find unique regions

To find the most expensive region, let's first find the unique regions.

In [7]:
def list_unique_regions():
    unique_regions = []

    for region in regions:
        if region not in unique_regions: 
            unique_regions.append(region)

    return unique_regions

list_unique_regions()

['southwest', 'southeast', 'northwest', 'northeast']
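
A shorter way to get the same result, sketched here on a hypothetical list (`sample_regions` is made up), is `dict.fromkeys`, which drops duplicates while keeping first-seen order on Python 3.7 and later.

```python
# Hypothetical sample with repeated values; the notebook would use `regions`
sample_regions = [
    "southwest", "southeast", "southeast",
    "northwest", "northwest", "northeast",
]

# Dictionary keys are unique and keep insertion order, so duplicates are dropped
unique_regions = list(dict.fromkeys(sample_regions))

print(unique_regions)
```

Using `set(sample_regions)` would also remove duplicates, but without a guaranteed order.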

### Calculate and compare average cost
Now, let's compare each region's average cost.

In [8]:
def avg_cost_per_region():
  region_costs = list(zip(regions, insurance_charges))

  unique_region_avg_cost = {}

  for region in list_unique_regions():
    unique_region_avg_cost[region] = 0
    unique_region_total_cost = 0
    count = 0
    for i in region_costs:
      if i[0] == region:
        unique_region_total_cost += float(i[1])
        count += 1
    unique_region_avg_cost[region] = unique_region_total_cost / count
    print(f"Average cost for {region}: {locale.currency(unique_region_avg_cost[region], grouping=True)}")

avg_cost_per_region()

Average cost for southwest: $12,346.94
Average cost for southeast: $14,735.41
Average cost for northwest: $12,417.58
Average cost for northeast: $13,406.38


The southeast region has the highest average insurance cost, at $14,735.41.
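
The per-region comparison can also be done in a single pass over the data, with the most expensive region picked programmatically rather than by reading the output. The sketch below runs on hypothetical pairs (`sample_pairs` is made up, not taken from `insurance.csv`); in the notebook, `zip(regions, insurance_charges)` could take its place.

```python
from collections import defaultdict

# Hypothetical (region, charge) pairs, with charges as strings like the CSV data
sample_pairs = [
    ("southwest", "100.00"),
    ("southeast", "300.00"),
    ("northwest", "200.00"),
    ("southeast", "500.00"),
]

# Accumulate the total and the count per region in one pass
totals = defaultdict(float)
counts = defaultdict(int)
for region, charge in sample_pairs:
    totals[region] += float(charge)
    counts[region] += 1

# Average per region, then the region whose average is largest
averages = {region: totals[region] / counts[region] for region in totals}
most_expensive = max(averages, key=averages.get)

print(f"Most expensive region in the sample: {most_expensive} ({averages[most_expensive]:.2f})")
```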