# U.S. Medical Insurance Costs

### 1. Goal | Research Questions (from Grok)
1. What is the average insurance charge for individuals under 30 years old?
2. How many individuals are smokers in the dataset?
3. What is the total number of children for all individuals in a specific region (e.g., “southwest”)?
4. Which sex (male or female) has the higher average insurance charge?
5. How many individuals have a BMI greater than 30?
6. What is the highest insurance charge for nonsmokers?
7. How many individuals in each region have exactly 0 children?
8. What is the average age of smokers versus nonsmokers?
9. How many individuals have charges above $10,000 and are female?
10. What is the lowest insurance charge for individuals with more than 2 children?

### 2. Prepare Data for Analysis

#### 2.1 Create List of Insurance Data

In [1]:
import csv

insurance_data = []
with open("insurance.csv", newline="") as insurance_csv:
    insurance_reader = csv.DictReader(insurance_csv)
    for row in insurance_reader:
        insurance_data.append(row)
#print(insurance_data)
#print(len(insurance_data))
#print(insurance_data[0].keys())

#### 2.2 Create Lists for Individual Features 

In [2]:
ages = []
sexes = []
bmis = []
num_children = []
smoker_status = []
regions = []
all_charges = []

for record in insurance_data:
    ages.append(int(record["age"]))
    sexes.append(record["sex"])
    bmis.append(float(record["bmi"]))
    num_children.append(int(record["children"]))
    smoker_status.append(record["smoker"])
    regions.append(record["region"])
    all_charges.append(float(record["charges"]))

#print(ages)
#print(sexes)
#print(bmis)
#print(num_children)
#print(smoker_status)
#print(regions)
#print(all_charges)

#### 2.3 Create List of Typed Records

In [3]:
insurance_db = []

for i in range(len(all_charges)):
    record = {}
    
    record["age"] = ages[i]
    record["sex"] = sexes[i]
    record["bmi"] = bmis[i]
    record["children"] = num_children[i]
    record["smoker"] = smoker_status[i]
    record["region"] = regions[i]
    record["charges"] = all_charges[i]
    insurance_db.append(record)
#print(insurance_db)
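
The three preparation cells above can also be folded into a single pass that converts each numeric column as the row is read. The following is only a sketch of that idea: `SAMPLE_CSV`, `CONVERTERS`, `load_records` and `sample_db` are hypothetical names, and the three rows are made up for illustration rather than taken from `insurance.csv`.

```python
import csv
import io

# Hypothetical rows in the same column layout as insurance.csv (values are made up).
SAMPLE_CSV = """age,sex,bmi,children,smoker,region,charges
25,female,22.5,0,no,northwest,3000.0
40,male,31.2,2,yes,southeast,21000.5
33,female,28.0,1,no,southwest,5200.25
"""

# Column name -> converter; columns not listed here stay as strings.
CONVERTERS = {"age": int, "bmi": float, "children": int, "charges": float}


def load_records(csv_file):
    """Read CSV rows into dicts, converting numeric columns on the way in."""
    records = []
    for row in csv.DictReader(csv_file):
        records.append(
            {key: CONVERTERS.get(key, str)(value) for key, value in row.items()}
        )
    return records


sample_db = load_records(io.StringIO(SAMPLE_CSV))
print(sample_db[0])
```

Passing an open file object for `insurance.csv` instead of the `io.StringIO` sample should give a list with the same shape as `insurance_db`.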


### 3. Analysis

#### Question 1: What is the average insurance charge for individuals under 30 years old?

In [4]:
sum_charges_under_30_years_old = 0
num_charges_under_30_years_old = 0
for i in range(len(ages)):
    if ages[i] < 30:
        sum_charges_under_30_years_old += all_charges[i]
        num_charges_under_30_years_old += 1
        
average_charge_under_30_years_old = round(sum_charges_under_30_years_old / num_charges_under_30_years_old)
print(average_charge_under_30_years_old)

9182


##### Answer 1: The average charge for individuals under 30 years old is 9,182 USD.
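
The running sum and counter above amount to a filtered mean. As a hedged alternative, the same calculation can be written with `statistics.mean` and a list comprehension. The helper name `average_charge_under` and the sample records are assumptions for illustration, not part of the original notebook.

```python
from statistics import mean

# Hypothetical records, not taken from insurance.csv.
sample_db = [
    {"age": 22, "charges": 2000.0},
    {"age": 28, "charges": 4000.0},
    {"age": 45, "charges": 9000.0},
]


def average_charge_under(records, age_limit):
    """Mean charge of records with age below age_limit, or None if none match."""
    charges = [r["charges"] for r in records if r["age"] < age_limit]
    # Guard against an empty selection, which would otherwise raise an error.
    return mean(charges) if charges else None


print(average_charge_under(sample_db, 30))
```

Returning `None` for an empty selection avoids the division by zero that the manual sum-and-count version would hit if no record matched.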

#### Question 2: How many individuals are smokers in the dataset?

In [5]:
num_smokers = 0
for i in range(len(smoker_status)):
    if smoker_status[i] == "yes":
        num_smokers += 1

print(num_smokers)
    

274


##### Answer 2: There are 274 individuals who are smokers.

#### Question 3: What is the total number of children for all individuals in a specific region (e.g., “southwest”)?

In [6]:
children_per_region = {}

for record in insurance_data:
    region = record["region"]
    if region in children_per_region:
        children_per_region[region] += int(record["children"])
    else:
        children_per_region[region] = int(record["children"])


print(children_per_region)

{'southwest': 371, 'southeast': 382, 'northwest': 373, 'northeast': 339}


##### Answer 3: There are 371 children in the southwest, 382 in the southeast, 373 in the northwest and 339 in the northeast.
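
Since the question asks about one specific region, it can help to wrap the per-region totals in a function and look up the region of interest. Below is a minimal sketch using `collections.defaultdict`, which removes the `if region in ...` branch. `total_children_by_region` and the sample records are hypothetical.

```python
from collections import defaultdict

# Hypothetical records, not taken from insurance.csv.
sample_db = [
    {"region": "southwest", "children": 2},
    {"region": "southeast", "children": 0},
    {"region": "southwest", "children": 1},
]


def total_children_by_region(records):
    """Sum the children column per region."""
    totals = defaultdict(int)  # missing regions start at 0
    for record in records:
        totals[record["region"]] += record["children"]
    return dict(totals)


totals = total_children_by_region(sample_db)
print(totals["southwest"])
```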

#### Question 4: Which sex (male or female) has the higher average insurance charge?

In [7]:
male_total_charge = 0
female_total_charge = 0
male_count = 0
female_count = 0

for i in range(len(sexes)):
    if sexes[i] == "male":
        male_total_charge += all_charges[i]
        male_count += 1
    elif sexes[i] == "female":
        female_total_charge += all_charges[i]
        female_count += 1

male_avg_charge = round(male_total_charge / male_count)
female_avg_charge = round(female_total_charge / female_count)
print(male_avg_charge)
print(female_avg_charge)

13957
12570


##### Answer 4: The average charge is 13,957 USD for men and 12,570 USD for women. So men have a higher average insurance charge.

#### Question 5: How many individuals have a BMI greater than 30?

In [8]:
def count_individuals_bmi_over_30(insurance_db):
    count = 0
    for record in insurance_db:
        if record["bmi"] > 30:
            count += 1
    return count

# Use a distinct variable name so the function is not overwritten by its result.
num_individuals_bmi_over_30 = count_individuals_bmi_over_30(insurance_db)
print(num_individuals_bmi_over_30)

705


##### Answer 5: There are 705 individuals with a BMI over 30.

#### Question 6: What is the highest insurance charge for nonsmokers?

In [9]:
highest_charge_nonsmokers = 0
for record in insurance_db:
    if record["smoker"] == "no" and record["charges"] > highest_charge_nonsmokers:
        highest_charge_nonsmokers = record["charges"]
highest_charge_nonsmokers = round(highest_charge_nonsmokers)
print(highest_charge_nonsmokers)

36911


##### Answer 6: The highest insurance charge for nonsmokers is 36,911 USD.
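
The loop above tracks a running maximum starting from 0. A possible alternative, sketched here under the assumption that records look like those in `insurance_db`, is the built-in `max` with a generator expression and a `default`. `highest_charge` and the sample records are hypothetical.

```python
# Hypothetical records, not taken from insurance.csv.
sample_db = [
    {"smoker": "no", "charges": 1500.0},
    {"smoker": "yes", "charges": 30000.0},
    {"smoker": "no", "charges": 8200.5},
]


def highest_charge(records, smoker):
    """Largest charge among records with the given smoker status, or None."""
    return max(
        (r["charges"] for r in records if r["smoker"] == smoker),
        default=None,  # returned when no record matches
    )


print(highest_charge(sample_db, "no"))
```

With `default=None`, an empty selection is reported explicitly rather than as a charge of 0.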

#### Question 7: How many individuals in each region have exactly 0 children?

In [10]:
def individuals_with_no_children_per_region(insurance_db):
    individuals_per_region = {}
    for record in insurance_db:
        if record["children"] > 0:
            continue
        region = record["region"]
        if region in individuals_per_region:
            individuals_per_region[region] += 1
        else:
            individuals_per_region[region] = 1
    return individuals_per_region

# Use a distinct variable name so the function is not overwritten by its result.
no_children_per_region = individuals_with_no_children_per_region(insurance_db)
print(no_children_per_region)

{'southwest': 138, 'northwest': 132, 'southeast': 157, 'northeast': 147}


##### Answer 7: In the southwest there are 138 individuals with no children, in the northwest 132, in the southeast 157 and in the northeast 147.
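
Counting occurrences per key is what `collections.Counter` is designed for. The sketch below shows how the per-region count might look with it. `childless_by_region` and the sample records are assumptions for illustration.

```python
from collections import Counter

# Hypothetical records, not taken from insurance.csv.
sample_db = [
    {"region": "southwest", "children": 0},
    {"region": "northeast", "children": 2},
    {"region": "southwest", "children": 0},
    {"region": "northeast", "children": 0},
]


def childless_by_region(records):
    """Count records with exactly 0 children, grouped by region."""
    return Counter(r["region"] for r in records if r["children"] == 0)


counts = childless_by_region(sample_db)
print(counts)
```

A `Counter` returns 0 for regions with no childless individuals, so lookups for such regions do not raise a `KeyError`.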

#### Question 8: What is the average age of smokers versus nonsmokers?

In [11]:
def average_age_smoker(insurance_db):
    total_age = 0
    smoker_count = 0
    for record in insurance_db:
        if record["smoker"] == "yes":
            
            total_age += record["age"]
            smoker_count += 1
    print(total_age)
    print(smoker_count)
    return round(total_age / smoker_count,2)

def average_age_nonsmoker(insurance_db):
    total_age = 0
    nonsmoker_count = 0
    for record in insurance_db:
        if record["smoker"] == "no":
            total_age += record["age"]
            nonsmoker_count += 1
    print(total_age)
    print(nonsmoker_count)
    return round(total_age / nonsmoker_count,2)

# Store the results under new names so the functions are not overwritten.
smoker_avg_age = average_age_smoker(insurance_db)
nonsmoker_avg_age = average_age_nonsmoker(insurance_db)
print("The average age of a smoker is " + str(smoker_avg_age))
print("The average age of a non-smoker is " + str(nonsmoker_avg_age))

10553
274
41906
1064
The average age of a smoker is 38.51
The average age of a non-smoker is 39.39


##### Answer 8: The average age is 38.51 years for smokers and 39.39 years for non-smokers, so roughly 39 years for both groups.
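
The two functions above differ only in the smoker value they filter on. One way to avoid the duplication, sketched here with hypothetical names (`average_by`, `sample_db`) and made-up records, is to group the values by a key in a single pass and then average each group.

```python
# Hypothetical records, not taken from insurance.csv.
sample_db = [
    {"smoker": "yes", "age": 30},
    {"smoker": "yes", "age": 41},
    {"smoker": "no", "age": 50},
]


def average_by(records, group_key, value_key):
    """Average of value_key for each distinct value of group_key."""
    groups = {}
    for record in records:
        # Collect the values of each group in a list keyed by the group value.
        groups.setdefault(record[group_key], []).append(record[value_key])
    return {
        group: round(sum(values) / len(values), 2)
        for group, values in groups.items()
    }


print(average_by(sample_db, "smoker", "age"))
```

The same helper could, in principle, also cover Question 4 by grouping on `"sex"` and averaging `"charges"`.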

#### Question 9: How many individuals have charges above $10,000 and are female?

In [12]:
females_over_10k_charges = 0

for record in insurance_db: 
    if record["sex"] == "female" and record["charges"] > 10000:
        females_over_10k_charges += 1

print(females_over_10k_charges)

307


##### Answer 9: There are 307 females with charges over 10,000 USD.

#### Question 10: What is the lowest insurance charge for individuals with more than 2 children?

In [13]:
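lowest_charge = float("inf")

# Compare unrounded charges and round only once, after the loop.
for record in insurance_db:
    if record["children"] > 2 and record["charges"] < lowest_charge:
        lowest_charge = record["charges"]
lowest_charge = round(lowest_charge)
print(lowest_charge)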

3443


##### Answer 10: The lowest charge for individuals with more than 2 children is 3,443 USD.