# Medical Insurance Costs Analysis

## Introduction 

#### This project investigates a medical insurance costs dataset using Python. The goals of the analysis include:

##### Finding the average age of the patients in the dataset.
##### Analyzing the geographic distribution of the individuals.
##### Comparing insurance costs between smokers and non-smokers.
##### Determining the average age of someone with at least one child.

In [5]:
import pandas as pd

# Load the dataset
file_path = 'insurance.csv'
insurance_data = pd.read_csv(file_path)

# Display the first few rows of the dataset
insurance_data.head()


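|   | age | sex | bmi | children | smoker | region | charges |
|---|---|---|---|---|---|---|---|
| 0 | 19 | female | 27.9 | 0 | yes | southwest | 16884.924 |
| 1 | 18 | male | 33.77 | 1 | no | southeast | 1725.5523 |
| 2 | 28 | male | 33.0 | 3 | no | southeast | 4449.462 |
| 3 | 33 | male | 22.705 | 0 | no | northwest | 21984.47061 |
| 4 | 32 | male | 28.88 | 0 | no | northwest | 3866.8552 |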


#### Dataset Columns
#### age: Age of the primary beneficiary 
#### sex: Insurance contractor gender (female, male)
#### bmi: Body mass index (weight in kg divided by height in m squared), a rough indicator of body fat
#### children: Number of children covered by health insurance
#### smoker: Smoking status (yes, no)
#### region: Residential area in the US (northeast, southeast, southwest, northwest)
#### charges: Individual medical costs billed by health insurance
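
The column descriptions above can double as a lightweight sanity check on the loaded data. The following is a minimal sketch and not part of the original notebook: `find_schema_problems` and `EXPECTED_CATEGORIES` are hypothetical names, and `sample` is rebuilt from the five rows shown by `head()` so the block runs without `insurance.csv`.

```python
import pandas as pd

# Illustrative sample: the five rows shown by head() above, not the full dataset.
sample = pd.DataFrame({
    "age": [19, 18, 28, 33, 32],
    "sex": ["female", "male", "male", "male", "male"],
    "bmi": [27.9, 33.77, 33.0, 22.705, 28.88],
    "children": [0, 1, 3, 0, 0],
    "smoker": ["yes", "no", "no", "no", "no"],
    "region": ["southwest", "southeast", "southeast", "northwest", "northwest"],
    "charges": [16884.924, 1725.5523, 4449.462, 21984.47061, 3866.8552],
})

# Column names and allowed category values, taken from the descriptions above.
EXPECTED_COLUMNS = ["age", "sex", "bmi", "children", "smoker", "region", "charges"]
EXPECTED_CATEGORIES = {
    "sex": {"female", "male"},
    "smoker": {"yes", "no"},
    "region": {"northeast", "southeast", "southwest", "northwest"},
}

def find_schema_problems(data):
    """Return a list of problems; an empty list means the frame matches the description."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in data.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    for column, allowed in EXPECTED_CATEGORIES.items():
        if column in data.columns:
            unexpected = set(data[column].dropna().unique()) - allowed
            if unexpected:
                problems.append(f"unexpected values in {column}: {sorted(unexpected)}")
    return problems

schema_problems = find_schema_problems(sample)
print(schema_problems)
```

Passing `insurance_data` instead of `sample` would run the same check on the full dataset.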

# Step 2: Understand the Dataset
#### Check the data types of the columns and display summary statistics to get a basic understanding of the dataset.

In [8]:
# Check the data types of the columns
insurance_data.info()

# Display summary statistics
insurance_data.describe()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1338 entries, 0 to 1337
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   age       1338 non-null   int64  
 1   sex       1338 non-null   object 
 2   bmi       1338 non-null   float64
 3   children  1338 non-null   int64  
 4   smoker    1338 non-null   object 
 5   region    1338 non-null   object 
 6   charges   1338 non-null   float64
dtypes: float64(2), int64(2), object(3)
memory usage: 73.3+ KB


|   | age | bmi | children | charges |
|---|---|---|---|---|
| count | 1338.0 | 1338.0 | 1338.0 | 1338.0 |
| mean | 39.207025 | 30.663397 | 1.094918 | 13270.422265 |
| std | 14.04996 | 6.098187 | 1.205493 | 12110.011237 |
| min | 18.0 | 15.96 | 0.0 | 1121.8739 |
| 25% | 27.0 | 26.29625 | 0.0 | 4740.28715 |
| 50% | 39.0 | 30.4 | 1.0 | 9382.033 |
| 75% | 51.0 | 34.69375 | 2.0 | 16639.912515 |
| max | 64.0 | 53.13 | 5.0 | 63770.42801 |
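
Two details in this output are worth checking explicitly. `info()` reports 1338 non-null values in every column, so there is nothing to impute, and the mean charge (13270.42) sits well above the median (9382.03), which suggests a right-skewed distribution. The sketch below shows one way to compute both checks; it is an illustration rather than part of the original notebook, and it uses three columns from the five `head()` rows, so its numbers will not match the full dataset.

```python
import pandas as pd

# Illustrative sample: three columns from the five rows shown by head() earlier.
sample = pd.DataFrame({
    "age": [19, 18, 28, 33, 32],
    "smoker": ["yes", "no", "no", "no", "no"],
    "charges": [16884.924, 1725.5523, 4449.462, 21984.47061, 3866.8552],
})

# Count missing values per column; all zeros means nothing needs imputing.
missing_per_column = sample.isna().sum()
total_missing = int(missing_per_column.sum())

# A mean well above the median hints at a long right tail of expensive cases.
mean_charge = sample["charges"].mean()
median_charge = sample["charges"].median()

print(missing_per_column)
print(f"mean={mean_charge:.2f}, median={median_charge:.2f}")
```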


# Step 3: Analyze the Dataset

#### Find the Average Age of the Patients

In [12]:
# Calculate the average age of the patients
average_age = insurance_data['age'].mean()
print(f"The average age of the patients is {average_age:.2f} years.")


The average age of the patients is 39.21 years.


#### Analyze the Geographic Distribution of the Individuals

In [15]:
# Count how many individuals come from each region
region_counts = insurance_data['region'].value_counts()
print(region_counts)


region
southeast    364
southwest    325
northwest    325
northeast    324
Name: count, dtype: int64
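
Raw counts make it hard to see how evenly the regions are represented. As a small sketch (not part of the original notebook), the counts printed above can be turned into percentages; the Series here is typed in by hand from that output so the block runs on its own.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
region_counts = pd.Series(
    {"southeast": 364, "southwest": 325, "northwest": 325, "northeast": 324},
    name="count",
)

# Convert counts to percentages of all individuals.
region_share = region_counts / region_counts.sum() * 100
print(region_share.round(1))

# On the full DataFrame the same shares come from:
# insurance_data['region'].value_counts(normalize=True) * 100
```

The southeast has the largest share at about 27%, but no region holds more than half of the individuals.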


#### Compare Costs Between Smokers and Non-Smokers

In [18]:
# Calculate the average insurance cost for smokers and non-smokers
average_cost_smokers = insurance_data[insurance_data['smoker'] == 'yes']['charges'].mean()
average_cost_non_smokers = insurance_data[insurance_data['smoker'] == 'no']['charges'].mean()

print(f"The average insurance cost for smokers is ${average_cost_smokers:.2f}.")
print(f"The average insurance cost for non-smokers is ${average_cost_non_smokers:.2f}.")


The average insurance cost for smokers is $32050.23.
The average insurance cost for non-smokers is $8434.27.
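
The two filtered means can also be computed in a single pass with `groupby`, which scales better if more categories are added later. This is an alternative sketch, not the notebook's own method; `sample` holds only the five `head()` rows, so its group means differ from the full-dataset figures above. The ratio at the end uses the two averages printed above.

```python
import pandas as pd

# Illustrative sample: two columns from the five rows shown by head() earlier.
sample = pd.DataFrame({
    "smoker": ["yes", "no", "no", "no", "no"],
    "charges": [16884.924, 1725.5523, 4449.462, 21984.47061, 3866.8552],
})

# One pass over the data: mean charge for each smoking status.
mean_by_smoker = sample.groupby("smoker")["charges"].mean()
print(mean_by_smoker)

# Ratio of the two full-dataset averages printed above.
cost_ratio = 32050.23 / 8434.27
print(f"Smokers cost about {cost_ratio:.1f} times as much on average.")
```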


#### Average Age for Someone with at Least One Child

In [21]:
# Calculate the average age for someone who has at least one child
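# Stored as avg_age_with_children so the name does not clash with the
# average_age_with_children() function defined in Step 5
avg_age_with_children = insurance_data[insurance_data['children'] > 0]['age'].mean()
print(f"The average age for someone with at least one child is {avg_age_with_children:.2f} years.")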


The average age for someone with at least one child is 39.78 years.
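
This figure is close to the overall average of 39.21 years, so it helps to see both groups side by side. The sketch below is one possible way to do that and is not part of the original notebook: the `family` column and its labels are hypothetical, and `sample` contains only the five `head()` rows, so the printed ages will not match the full dataset.

```python
import numpy as np
import pandas as pd

# Illustrative sample: two columns from the five rows shown by head() earlier.
sample = pd.DataFrame({
    "age": [19, 18, 28, 33, 32],
    "children": [0, 1, 3, 0, 0],
})

# Label each row, then average age within each label.
labelled = sample.assign(
    family=np.where(sample["children"] > 0, "with children", "no children")
)
mean_age_by_family = labelled.groupby("family")["age"].mean()
print(mean_age_by_family)
```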


# Step 4: Organize the Data for Analysis
#### Store some columns in variables for further analysis.

In [24]:
ages = insurance_data['age']
regions = insurance_data['region']
charges = insurance_data['charges']
smokers = insurance_data['smoker']
children = insurance_data['children']
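
Because these Series share the DataFrame's index, a boolean mask built from one of them can select values from another. The following is a minimal sketch of that idea, assuming the same variable names as above but built from the five `head()` rows so it runs without `insurance.csv`.

```python
import pandas as pd

# Illustrative sample: three columns from the five rows shown by head() earlier.
sample = pd.DataFrame({
    "children": [0, 1, 3, 0, 0],
    "smoker": ["yes", "no", "no", "no", "no"],
    "charges": [16884.924, 1725.5523, 4449.462, 21984.47061, 3866.8552],
})
charges = sample["charges"]
smokers = sample["smoker"]
children = sample["children"]

# A mask from one Series filters another, since both share the same index.
smoker_charges = charges[smokers == "yes"]

# Combine conditions with & and wrap each comparison in parentheses.
non_smoker_parent_charges = charges[(smokers == "no") & (children > 0)]

print(smoker_charges.tolist())
print(non_smoker_parent_charges.tolist())
```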


# Step 5: Build Analysis Functions
#### Build functions to modularize the analysis.

In [27]:
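def calculate_average_age(data):
    """Return the mean of the 'age' column."""
    return data['age'].mean()

def calculate_region_distribution(data):
    """Return the number of rows per region, largest first."""
    return data['region'].value_counts()

def compare_smoker_costs(data):
    """Return (mean charges for smokers, mean charges for non-smokers)."""
    smokers_cost = data[data['smoker'] == 'yes']['charges'].mean()
    non_smokers_cost = data[data['smoker'] == 'no']['charges'].mean()
    return smokers_cost, non_smokers_cost

def average_age_with_children(data):
    """Return the mean age of rows with at least one child."""
    return data[data['children'] > 0]['age'].mean()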
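
One benefit of wrapping the analysis in functions is that they can be checked against a tiny frame whose answers are easy to work out by hand. The sketch below repeats two of the functions so it runs on its own; the `toy` frame is hypothetical data invented for the check, not rows from the dataset.

```python
import pandas as pd

def compare_smoker_costs(data):
    smokers_cost = data[data['smoker'] == 'yes']['charges'].mean()
    non_smokers_cost = data[data['smoker'] == 'no']['charges'].mean()
    return smokers_cost, non_smokers_cost

def average_age_with_children(data):
    return data[data['children'] > 0]['age'].mean()

# Hypothetical toy data with hand-checkable answers.
toy = pd.DataFrame({
    "age": [20, 30, 40, 50],
    "children": [0, 1, 2, 0],
    "smoker": ["yes", "no", "yes", "no"],
    "charges": [300.0, 100.0, 500.0, 200.0],
})

toy_smoker_cost, toy_non_smoker_cost = compare_smoker_costs(toy)
toy_parent_age = average_age_with_children(toy)

# Smokers: (300 + 500) / 2; non-smokers: (100 + 200) / 2; parents: (30 + 40) / 2.
print(toy_smoker_cost, toy_non_smoker_cost, toy_parent_age)
```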


# Step 6: Perform the Analysis Using Functions
#### Use the functions to perform the analysis.

In [30]:
# Calculate the average age
avg_age = calculate_average_age(insurance_data)
print(f"Average age: {avg_age:.2f} years")

# Calculate the region distribution
region_dist = calculate_region_distribution(insurance_data)
print("Region distribution:")
print(region_dist)

# Compare the costs for smokers and non-smokers
smoker_cost, non_smoker_cost = compare_smoker_costs(insurance_data)
print(f"Average cost for smokers: ${smoker_cost:.2f}")
print(f"Average cost for non-smokers: ${non_smoker_cost:.2f}")

# Calculate the average age for someone with at least one child
avg_age_children = average_age_with_children(insurance_data)
print(f"Average age for someone with at least one child: {avg_age_children:.2f} years")


Average age: 39.21 years
Region distribution:
region
southeast    364
southwest    325
northwest    325
northeast    324
Name: count, dtype: int64
Average cost for smokers: $32050.23
Average cost for non-smokers: $8434.27
Average age for someone with at least one child: 39.78 years


# Conclusion

#### Determined that the average age of patients is around 39.21 years.
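#### Found that the southeast region has the most individuals (364 of 1338, about 27%), while the other three regions are nearly tied at 324 to 325 each, so no region holds a majority.
#### Observed that smokers have a much higher average insurance cost (32050.23 dollars) than non-smokers (8434.27 dollars), roughly 3.8 times as much.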
#### Calculated that the average age for individuals with at least one child is around 39.78 years.