# U.S. Medical Insurance Costs

## Importing and Exploring the Dataset

In this step, we will import the medical insurance costs dataset and extract relevant columns for analysis. The dataset is stored in a CSV file named `insurance.csv`, which contains the following columns:

- `age`: Age of the primary beneficiary
- `sex`: Insurance contractor gender (male/female)
- `bmi`: Body mass index (kg/m²), a rough indicator of body fat
- `children`: Number of children/dependents covered by the insurance
- `smoker`: Smoking status (yes/no)
- `region`: The beneficiary’s residential area in the US (northeast, southeast, southwest, northwest)
- `charges`: Individual medical costs billed by health insurance

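```python
import csv

# Read insurance.csv and collect the columns used in this analysis into
# parallel lists: index i in every list refers to the same beneficiary.
with open('insurance.csv', newline='') as insurance_data:
    data = csv.DictReader(insurance_data)

    ages = []
    bmis = []
    charges = []
    regions = []
    smokers = []

    for row in data:
        # DictReader yields every value as a string, so numeric columns are converted.
        ages.append(float(row['age']))
        bmis.append(float(row['bmi']))
        charges.append(float(row['charges']))
        regions.append(row['region'])
        smokers.append(row['smoker'])
```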
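To show how `csv.DictReader` maps each line to a dictionary keyed by the header row, here is a minimal sketch that parses a small in-memory sample instead of `insurance.csv`. The two rows and the helper name `load_columns` are hypothetical, made up for illustration; they are not records from the dataset. The results go into `sample_*` names so they do not overwrite the real lists above.

```python
import csv
import io

# Hypothetical sample in the same column layout as insurance.csv;
# these values are invented for illustration only.
SAMPLE_CSV = """age,sex,bmi,children,smoker,region,charges
30,female,25.5,1,no,northeast,4500.25
52,male,31.2,0,yes,southeast,28000.75
"""


def load_columns(file_obj):
    """Read selected columns from a CSV file object into parallel lists (hypothetical helper)."""
    ages, bmis, charges, regions, smokers = [], [], [], [], []
    for row in csv.DictReader(file_obj):
        # Each row is a dict such as {'age': '30', 'sex': 'female', ...}.
        ages.append(float(row['age']))
        bmis.append(float(row['bmi']))
        charges.append(float(row['charges']))
        regions.append(row['region'])
        smokers.append(row['smoker'])
    return ages, bmis, charges, regions, smokers


sample_ages, sample_bmis, sample_charges, sample_regions, sample_smokers = load_columns(
    io.StringIO(SAMPLE_CSV)
)
print(sample_ages)     # [30.0, 52.0]
print(sample_regions)  # ['northeast', 'southeast']
```

Wrapping the loop in a function that accepts any file object makes it easy to test with `io.StringIO` and then reuse with `open('insurance.csv', newline='')`.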

## Calculating the Average Age

In this step, we will create a function to calculate the average age of the individuals in the dataset. This will help us understand the central tendency of the age distribution among the beneficiaries.

### Steps:
1. Define a function `calculate_average_age` that takes a list of ages as input.
2. Calculate the sum of all ages.
3. Divide the sum by the number of ages to get the average.
4. Return the calculated average age.

The result summarizes the typical age of the beneficiaries in a single number.

```python
def calculate_average_age(ages_list):
    """
    Calculate the average age from a list of ages.

    Parameters:
    ages_list (list of float): A list containing the ages of individuals.

    Returns:
    float: The average age of the individuals.
    """
    total_ages = sum(ages_list)
    average = total_ages / len(ages_list)
    return average


average_age = calculate_average_age(ages)
print(round(average_age))
```

```
39
```
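As a cross-check, the four steps above (sum, count, divide, return) can be compared with the standard library's `statistics.mean`. This is a minimal sketch on a hypothetical list of ages, not data from `insurance.csv`; the generic helper `calculate_average` is an illustrative name, and its explicit empty-list check is an added assumption (the original function would raise `ZeroDivisionError` on an empty list).

```python
from statistics import mean


def calculate_average(values):
    """Return the arithmetic mean of values (hypothetical helper)."""
    if not values:
        # Guard against dividing by zero when the list is empty.
        raise ValueError("cannot average an empty list")
    return sum(values) / len(values)


sample_ages = [19.0, 33.0, 46.0, 62.0]  # hypothetical ages, not from insurance.csv
print(calculate_average(sample_ages))  # 40.0  (160.0 / 4)
print(mean(sample_ages))               # 40.0
```

Both approaches agree here; `statistics.mean` raises `statistics.StatisticsError` instead when given an empty list.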


## Determining the Majority Region

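In this step, we will create a function to determine the region that the largest number of individuals in the dataset come from. Strictly speaking, this is the most common region (a plurality), which need not include more than half of all individuals. This will help us understand how individuals are distributed across regions.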

### Steps:
1. Define a function `calculate_majority_region` that takes a list of regions as input.
2. Use a dictionary to count the occurrences of each region.
3. Identify the region with the highest count.
4. Return the region with the highest count as the majority region.

The function will be documented using a docstring to provide clear information about its purpose, parameters, and return value.

```python
def calculate_majority_region(regions_list):
    """
    Determine the most common region, i.e. the region with the most individuals.

    Parameters:
    regions_list (list of str): A list containing the regions of individuals.

    Returns:
    str: The region with the highest number of individuals.
    """
    # Count how many individuals come from each region.
    regions_count = {}
    for region in regions_list:
        if region in regions_count:
            regions_count[region] += 1
        else:
            regions_count[region] = 1

    # Pick the region with the largest count.
    majority = max(regions_count, key=regions_count.get)
    return majority


majority_region = calculate_majority_region(regions)
print(majority_region)
```

```
southeast
```
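The counting dictionary above is exactly what `collections.Counter` provides. The sketch below, using a hypothetical list of regions rather than the real data, also returns the winning region's share of all individuals, which shows whether it is a true majority (share above 0.5) or only the most common value. The helper name `most_common_region` is illustrative.

```python
from collections import Counter


def most_common_region(regions_list):
    """Return (region, count, share) for the most frequent region (hypothetical helper)."""
    counts = Counter(regions_list)
    # most_common(1) returns a one-element list: [(value, count)].
    region, count = counts.most_common(1)[0]
    return region, count, count / len(regions_list)


sample_regions = ['southeast', 'northwest', 'southeast', 'northeast', 'southwest']
region, count, share = most_common_region(sample_regions)
print(region, count, share)  # southeast 2 0.4
```

Here the top region holds only 40% of the sample, so it is a plurality rather than a majority. On ties, both `Counter.most_common` and `max(..., key=dict.get)` return the value encountered first.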


## Calculating Cost Difference Between Smokers and Non-Smokers

In this step, we will create a function to calculate the difference in average medical insurance charges between smokers and non-smokers. This will help us quantify how much higher, on average, the charges billed to smokers are in this dataset.

### Steps:
1. Define a function `calculate_cost_difference` that takes an iterable of (smoker status, charge) pairs, such as `zip(smokers, charges)`, as input.
2. Separate the charges into two lists: one for smokers and one for non-smokers.
3. Calculate the average charge for smokers and the average charge for non-smokers.
4. Compute the absolute difference between the two average charges.
5. Return the cost difference.

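```python
def calculate_cost_difference(smokers_charges_zip):
    """
    Calculate the absolute difference in average charges between smokers and non-smokers.

    Parameters:
    smokers_charges_zip (iterable of (str, float)): Pairs of smoker status ('yes'/'no') and charge.

    Returns:
    float: The absolute difference between the two average charges.
    """
    smokers_charges = []
    non_smokers_charges = []

    # Split the charges by smoking status.
    for status, charge in smokers_charges_zip:
        if status == 'yes':
            smokers_charges.append(charge)
        else:
            non_smokers_charges.append(charge)

    average_smokers_charges = sum(smokers_charges) / len(smokers_charges)
    average_non_smokers_charges = sum(non_smokers_charges) / len(non_smokers_charges)
    smokers_cost_difference = average_smokers_charges - average_non_smokers_charges

    return abs(smokers_cost_difference)


# zip() returns a one-shot iterator, so it is created right before it is consumed.
smokers_charges_joined = zip(smokers, charges)
cost_difference = calculate_cost_difference(smokers_charges_joined)
print(cost_difference)
```

```
23615.96353367665
```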
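An alternative sketch takes the two lists directly, keeps the sign of the difference so it is clear which group pays more, and uses `statistics.mean` for the averages. Because a `zip` object can be iterated only once, passing the lists avoids silently getting empty groups (and a `ZeroDivisionError`) if the same zip were reused. The helper name and sample values below are hypothetical, not taken from `insurance.csv`.

```python
from statistics import mean


def smoker_cost_difference(smokers_list, charges_list):
    """Return mean smoker charge minus mean non-smoker charge (hypothetical helper)."""
    smoker = [c for s, c in zip(smokers_list, charges_list) if s == 'yes']
    non_smoker = [c for s, c in zip(smokers_list, charges_list) if s != 'yes']
    # A positive result means smokers are charged more on average.
    return mean(smoker) - mean(non_smoker)


sample_smokers = ['yes', 'no', 'no', 'yes']
sample_charges = [30000.0, 5000.0, 7000.0, 26000.0]
print(smoker_cost_difference(sample_smokers, sample_charges))  # 22000.0  (28000.0 - 6000.0)
```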
