
# U.S. Medical Insurance Costs

This project analyzes a U.S. medical insurance costs dataset.
Within this scope, I use intermediate Python to perform a descriptive analysis with the following goals:

*   Find the average age of the patients in the dataset.
*   Determine which region most individuals come from.
*   Compare costs between smokers and non-smokers.
*   Find the average age of individuals who have at least one child.






**The variables in this dataset include:**


*   age: Age of the primary beneficiary.
*   sex: Gender of the primary beneficiary (male/female).
*   bmi: Body mass index, indicating whether the individual is underweight, normal weight, overweight, or obese.
*   children: Number of children/dependents covered by insurance.
*   smoker: Smoking status of the individual (yes/no).
*   region: Residential area of the beneficiary (northwest, southeast, southwest, northeast).
*   charges: Individual medical costs billed by health insurance.


In [3]:
# Read insurance.csv into a list of dictionaries, one per row
import csv
data = []
with open('insurance.csv', newline='') as insurance:
    reader = csv.DictReader(insurance)
    for row in reader:
        data.append(row)

# Display the first few rows
for row in data[:5]:
    print(row)

{'age': '19', 'sex': 'female', 'bmi': '27.9', 'children': '0', 'smoker': 'yes', 'region': 'southwest', 'charges': '16884.924'}
{'age': '18', 'sex': 'male', 'bmi': '33.77', 'children': '1', 'smoker': 'no', 'region': 'southeast', 'charges': '1725.5523'}
{'age': '28', 'sex': 'male', 'bmi': '33', 'children': '3', 'smoker': 'no', 'region': 'southeast', 'charges': '4449.462'}
{'age': '33', 'sex': 'male', 'bmi': '22.705', 'children': '0', 'smoker': 'no', 'region': 'northwest', 'charges': '21984.47061'}
{'age': '32', 'sex': 'male', 'bmi': '28.88', 'children': '0', 'smoker': 'no', 'region': 'northwest', 'charges': '3866.8552'}
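Every value comes back from `csv.DictReader` as a string, as the quotes around the numbers above show. As a minimal sketch, the numeric columns could be converted while each row is read. The helper name `load_insurance_rows` is hypothetical, and two rows copied from the output above stand in for `insurance.csv`.

```python
import csv
import io

# Hypothetical helper (not part of the original notebook): converts the
# numeric columns while reading and leaves categorical columns as strings.
def load_insurance_rows(file_obj):
    rows = []
    for row in csv.DictReader(file_obj):
        row['age'] = int(row['age'])
        row['bmi'] = float(row['bmi'])
        row['children'] = int(row['children'])
        row['charges'] = float(row['charges'])
        rows.append(row)
    return rows

# Two rows copied from the output above, used in place of insurance.csv.
sample_csv = io.StringIO(
    "age,sex,bmi,children,smoker,region,charges\n"
    "19,female,27.9,0,yes,southwest,16884.924\n"
    "18,male,33.77,1,no,southeast,1725.5523\n"
)
sample_rows = load_insurance_rows(sample_csv)
print(sample_rows[0])
```

Passing a file object rather than a path keeps the helper usable with both `open('insurance.csv', newline='')` and in-memory text.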


Data Preprocessing:
Convert categorical variables to numerical values.

In [6]:
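# Map categorical variables to integer codes.
# Rows that are already encoded are skipped, so the cell is safe to re-run.
regions = ['southwest', 'southeast', 'northwest', 'northeast']
for row in data:
    if isinstance(row['region'], int):
        continue
    row['sex'] = 1 if row['sex'] == 'male' else 0
    row['smoker'] = 1 if row['smoker'] == 'yes' else 0
    row['region'] = regions.index(row['region'])

Note: without the guard, running this cell a second time resets `sex` and `smoker` in the first row to 0 and then stops with `ValueError: 0 is not in list`, because `region` already holds an integer. The output below appears to come from such a re-run, which is why the first row shows `'smoker': 0` even though the raw data lists it as a smoker.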

In [7]:
# Display the updated first few rows
for row in data[:5]:
    print(row)

{'age': '19', 'sex': 0, 'bmi': '27.9', 'children': '0', 'smoker': 0, 'region': 0, 'charges': '16884.924'}
{'age': '18', 'sex': 1, 'bmi': '33.77', 'children': '1', 'smoker': 0, 'region': 1, 'charges': '1725.5523'}
{'age': '28', 'sex': 1, 'bmi': '33', 'children': '3', 'smoker': 0, 'region': 1, 'charges': '4449.462'}
{'age': '33', 'sex': 1, 'bmi': '22.705', 'children': '0', 'smoker': 0, 'region': 2, 'charges': '21984.47061'}
{'age': '32', 'sex': 1, 'bmi': '28.88', 'children': '0', 'smoker': 0, 'region': 2, 'charges': '3866.8552'}
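Because the mapping cell overwrites each row in place, running it twice changes the result. One alternative, sketched here under the assumption that the same integer codes are wanted, is to encode through lookup tables and return a copy of each row. The names `SEX_CODES`, `SMOKER_CODES`, `REGION_CODES`, and `encode_row` are hypothetical.

```python
# Hypothetical lookup tables; the integer codes match the ones used above.
SEX_CODES = {'female': 0, 'male': 1}
SMOKER_CODES = {'no': 0, 'yes': 1}
REGION_CODES = {'southwest': 0, 'southeast': 1, 'northwest': 2, 'northeast': 3}

def encode_row(row):
    """Return an encoded copy of row, leaving the original untouched."""
    encoded = dict(row)
    encoded['sex'] = SEX_CODES[row['sex']]
    encoded['smoker'] = SMOKER_CODES[row['smoker']]
    encoded['region'] = REGION_CODES[row['region']]
    return encoded

# First raw row from the dataset preview.
raw_row = {'age': '19', 'sex': 'female', 'bmi': '27.9', 'children': '0',
           'smoker': 'yes', 'region': 'southwest', 'charges': '16884.924'}
encoded_row = encode_row(raw_row)
print(encoded_row)
```

Encoding the raw row again gives the same result, and passing an already encoded row raises a `KeyError` instead of silently writing 0.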


Exploratory Data Analysis:
Calculate basic descriptive statistics.


In [8]:
# Convert data to appropriate types
for row in data:
    row['age'] = int(row['age'])
    row['bmi'] = float(row['bmi'])
    row['children'] = int(row['children'])
    row['charges'] = float(row['charges'])

# Calculate the mean of each numerical feature
ages = [row['age'] for row in data]
bmis = [row['bmi'] for row in data]
children = [row['children'] for row in data]
charges = [row['charges'] for row in data]

mean_age = sum(ages) / len(ages)
mean_bmi = sum(bmis) / len(bmis)
mean_children = sum(children) / len(children)
mean_charges = sum(charges) / len(charges)

print(f'Mean Age: {mean_age}')
print(f'Mean BMI: {mean_bmi}')
print(f'Mean Children: {mean_children}')
print(f'Mean Charges: {mean_charges}')


Mean Age: 39.20702541106129
Mean BMI: 30.663396860986538
Mean Children: 1.0949177877429
Mean Charges: 13270.422265141257
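The cell above reports means only. If the median and standard deviation are also of interest, the standard-library `statistics` module provides them. This is a minimal sketch: the ages of the first five rows stand in for the full `ages` list, and the `sample_*` names are illustrative.

```python
import statistics

# Ages of the first five rows shown earlier; a stand-in for the full list.
sample_ages = [19, 18, 28, 33, 32]

sample_mean = statistics.mean(sample_ages)
sample_median = statistics.median(sample_ages)
# stdev() is the sample standard deviation (divides by n - 1);
# pstdev() would give the population version.
sample_stdev = statistics.stdev(sample_ages)

print(f'Mean: {sample_mean}, Median: {sample_median}, Std Dev: {sample_stdev:.2f}')
```

The same three calls apply unchanged to `bmis`, `children`, and `charges`.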


Analysis of Insurance Dataset


1. Distribution of Individuals by Region
First, we determine which region most individuals come from by counting the occurrences of each region.

In [17]:
# Counting individuals in each region
region_counts = {'southwest': 0, 'southeast': 0, 'northwest': 0, 'northeast': 0}

for row in data:
    region_index = row['region']
    region_name = ['southwest', 'southeast', 'northwest', 'northeast'][region_index]
    region_counts[region_name] += 1

# Display the counts
for region, count in region_counts.items():
    print(f'{region.capitalize()}: {count} individuals')


Southwest: 325 individuals
Southeast: 364 individuals
Northwest: 325 individuals
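Northeast: 324 individuals

The southeast has the most individuals (364). The other three regions are almost evenly split, so no region holds an outright majority.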
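The same tally can be sketched with `collections.Counter`, which also makes it easy to sort regions by size and show each region's share. The region codes of the first five preprocessed rows stand in for `data`, and `REGION_NAMES` is an assumed name for the list used in the cell above.

```python
from collections import Counter

REGION_NAMES = ['southwest', 'southeast', 'northwest', 'northeast']

# Region codes of the first five preprocessed rows; a stand-in for data.
sample_rows = [{'region': 0}, {'region': 1}, {'region': 1},
               {'region': 2}, {'region': 2}]

# Translate each code back to its name and count occurrences.
region_counts = Counter(REGION_NAMES[row['region']] for row in sample_rows)
total = sum(region_counts.values())

# most_common() yields (region, count) pairs, largest count first.
for region, count in region_counts.most_common():
    print(f'{region.capitalize()}: {count} individuals ({count / total:.1%})')
```

Unlike the pre-filled dictionary, a `Counter` only iterates over regions that actually occur, although looking up a missing region still returns 0.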


2. Cost Comparison Between Smokers and Non-Smokers
Next, we compare the costs between smokers and non-smokers by calculating the average charges for each group.



In [18]:
# Separating charges for smokers and non-smokers
smoker_charges = []
non_smoker_charges = []

for row in data:
    if row['smoker'] == 1:
        smoker_charges.append(row['charges'])
    else:
        non_smoker_charges.append(row['charges'])

# Calculating average charges
average_smoker_charges = sum(smoker_charges) / len(smoker_charges)
average_non_smoker_charges = sum(non_smoker_charges) / len(non_smoker_charges)

print(f'Average charges for smokers: ${average_smoker_charges:.2f}')
print(f'Average charges for non-smokers: ${average_non_smoker_charges:.2f}')

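Average charges for smokers: $32105.78
Average charges for non-smokers: $8442.20

On average, smokers are billed roughly 3.8 times as much as non-smokers. Note that the first row, a smoker in the raw data, appears with `'smoker': 0` in the preprocessed output above, so it is counted as a non-smoker here and both averages are slightly off.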
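The split-and-average pattern can be generalized to any grouping column. The following is a sketch rather than the notebook's method: `mean_by_group` is a hypothetical helper, and the sample rows are the first five rows of the raw preview with `smoker` encoded as 1 for `yes` and 0 for `no`.

```python
from collections import defaultdict

def mean_by_group(rows, group_key, value_key):
    """Return {group value: mean of value_key} for the given rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += row[value_key]
        counts[row[group_key]] += 1
    return {group: totals[group] / counts[group] for group in totals}

# First five rows of the raw preview (smoker: 1 = yes, 0 = no).
sample_rows = [
    {'smoker': 1, 'charges': 16884.924},
    {'smoker': 0, 'charges': 1725.5523},
    {'smoker': 0, 'charges': 4449.462},
    {'smoker': 0, 'charges': 21984.47061},
    {'smoker': 0, 'charges': 3866.8552},
]

mean_charges_by_smoker = mean_by_group(sample_rows, 'smoker', 'charges')
for status, mean_charge in sorted(mean_charges_by_smoker.items()):
    label = 'smokers' if status == 1 else 'non-smokers'
    print(f'Average charges for {label}: ${mean_charge:.2f}')
```

Calling `mean_by_group(data, 'region', 'charges')` or `mean_by_group(data, 'sex', 'charges')` would reuse the same helper for other comparisons.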


3. Average Age of Individuals with at Least One Child
Finally, we calculate the average age of individuals who have at least one child in the dataset.

In [19]:
# Finding ages of individuals with at least one child
ages_with_children = [row['age'] for row in data if row['children'] >= 1]

# Calculating the average age
average_age_with_children = sum(ages_with_children) / len(ages_with_children)

print(f'Average age of individuals with at least one child: {average_age_with_children:.2f} years')


Average age of individuals with at least one child: 39.78 years
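To put this figure in context, the same calculation can be run for individuals without children. A minimal sketch, assuming a hypothetical helper `mean_age_where` that takes a filter function and returns `None` when no row matches, which avoids a division by zero. The first five rows stand in for `data`.

```python
def mean_age_where(rows, predicate):
    """Mean age of the rows matching predicate, or None if none match."""
    matching_ages = [row['age'] for row in rows if predicate(row)]
    return sum(matching_ages) / len(matching_ages) if matching_ages else None

# Age and children columns of the first five rows; a stand-in for data.
sample_rows = [
    {'age': 19, 'children': 0},
    {'age': 18, 'children': 1},
    {'age': 28, 'children': 3},
    {'age': 33, 'children': 0},
    {'age': 32, 'children': 0},
]

with_children = mean_age_where(sample_rows, lambda row: row['children'] >= 1)
without_children = mean_age_where(sample_rows, lambda row: row['children'] == 0)

print(f'With at least one child: {with_children:.2f} years')
print(f'Without children: {without_children:.2f} years')
```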
