# U.S. Medical Insurance Costs

In this project, a **CSV** file of medical insurance costs is investigated using Python fundamentals. The goal is to analyze the attributes in **insurance.csv** to learn more about the patient information it contains and to gain insight into potential use cases for the dataset.

In [5]:
import csv

# Initialize a list to hold the dictionaries
data_list = []

# Read the CSV file
with open('insurance.csv', mode='r') as file:
    reader = csv.DictReader(file)
    for row in reader:
        # Convert specific fields to appropriate data types
        row['age'] = int(row['age'])
        row['bmi'] = float(row['bmi'])
        row['children'] = int(row['children'])
        row['charges'] = float(row['charges'])
        data_list.append(row)

# Convert list of dictionaries to a dictionary with an identifier as key (e.g., row index)
data_dict = {i: row for i, row in enumerate(data_list)}


To start, the `csv` library is imported to read the insurance file. The `insurance.csv` file is read with `csv.DictReader`, and each row is processed to convert the numeric fields (`age`, `bmi`, `children`, `charges`) from strings to the correct data types. The rows are first stored in a list and then transformed into a dictionary with row indices as keys. This structure makes the data easier to manipulate and analyze.
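To make the resulting structure concrete, here is a minimal sketch that applies the same conversion to a few in-memory rows instead of the real file. The sample rows are made up for illustration, the helper name `rows_to_dict` is hypothetical, and the sketch assumes only the column names used elsewhere in this notebook.

```python
import csv
import io

# Made-up sample rows for illustration only (not taken from insurance.csv)
SAMPLE_CSV = """age,sex,bmi,children,region,charges
30,female,25.0,1,southwest,4000.0
40,male,31.5,2,northeast,8000.0
"""

def rows_to_dict(file_obj):
    """Hypothetical helper: read CSV rows and key them by row index."""
    # Map each numeric column to the type it should be converted to
    converters = {'age': int, 'bmi': float, 'children': int, 'charges': float}
    records = {}
    for i, row in enumerate(csv.DictReader(file_obj)):
        for field, convert in converters.items():
            row[field] = convert(row[field])
        records[i] = row
    return records

sample_dict = rows_to_dict(io.StringIO(SAMPLE_CSV))
print(sample_dict[0]['age'], sample_dict[1]['region'])  # 30 northeast
```

Each value in the outer dictionary is one row, so a single field is reached with two lookups: the row index first, then the column name.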

Now that the data is organized and ready to analyze, the next step is to define functions to begin investigating trends in the data.

In [6]:
def average_variable(insurance_x):
    total = 0
    for record in data_dict:
        variable = data_dict[record][insurance_x]
        total += variable
    average = total/len(data_dict)
    return average
print("The average cost of insurance is: " + str(round(average_variable('charges'),2)))
print("The average age: " + str(round(average_variable('age'),2)))
print("The average number of children is: " + str(round(average_variable('children'),2)))
print("The average bmi is: " + str(round(average_variable('bmi'),2)))

The average cost of insurance is: 13270.42
The average age: 39.21
The average number of children is: 1.09
The average bmi is: 30.66


The function `average_variable` calculates the average of a specified variable (`insurance_x`) across all records in `data_dict` and returns it. The `print` calls then report the average insurance cost, age, number of children, and BMI, each rounded to two decimal places.
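One possible variation is to pass the data in as an argument instead of reading the global `data_dict`, which makes the function easier to try on small inputs. The sketch below assumes the same dictionary-of-rows shape and Python 3.8 or later for `statistics.fmean`; the name `average_of` and the sample records are hypothetical.

```python
from statistics import fmean

def average_of(records, field):
    """Hypothetical variant: mean of `field` over a dict of row dicts."""
    return fmean(row[field] for row in records.values())

# Made-up records for illustration only
sample = {
    0: {'age': 30, 'charges': 4000.0},
    1: {'age': 40, 'charges': 8000.0},
}
print(round(average_of(sample, 'charges'), 2))  # 6000.0
print(round(average_of(sample, 'age'), 2))      # 35.0
```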

In [7]:
def average_bmi_by_sex(data_dict):
    total_male_bmi = 0
    total_males = 0
    total_female_bmi = 0
    total_females = 0
    for record in data_dict:
        sex = data_dict[record]['sex']
        if sex == 'male':
            total_male_bmi += data_dict[record]['bmi']
            total_males += 1
        elif sex == 'female':
            total_female_bmi += data_dict[record]['bmi']
            total_females += 1
    average_male_bmi = round(total_male_bmi / total_males, 2)
    average_female_bmi = round(total_female_bmi / total_females, 2)
    return average_male_bmi, average_female_bmi
print(average_bmi_by_sex(data_dict))

(30.94, 30.38)


The function `average_bmi_by_sex` calculates the average BMI for males and females separately from `data_dict` and returns both values, rounded to two decimal places, as a tuple. The result is then printed.

In [8]:
def bmi_by_region(data_dict):
    # Running BMI total and record count for each region
    totals = {}
    counts = {}
    for record in data_dict:
        region = data_dict[record]['region']
        bmi = data_dict[record]['bmi']
        # Count every record, including the first one seen for a region
        totals[region] = totals.get(region, 0) + bmi
        counts[region] = counts.get(region, 0) + 1
    # Average per region, rounded to two decimal places
    region_bmi = {region: round(totals[region] / counts[region], 2) for region in totals}
    return region_bmi
print(bmi_by_region(data_dict))

{'southwest': 30.6, 'southeast': 33.36, 'northwest': 29.2, 'northeast': 29.17}


The function `bmi_by_region` calculates the average BMI for each region (southwest, southeast, northwest, and northeast) from the provided `data_dict`. It iterates through the records, accumulating the total BMI and the record count for each region. The averages are rounded to two decimal places and stored in a dictionary, which is returned and then printed.
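Grouping by sex and grouping by region follow the same pattern: accumulate a running total and a count per category, then divide. A minimal generic sketch of that pattern is shown here, assuming the same dictionary-of-rows shape. The function `average_by_group` and the sample records are hypothetical and not part of the original notebook.

```python
from collections import defaultdict

def average_by_group(records, group_field, value_field):
    """Hypothetical helper: mean of `value_field` for each `group_field` value."""
    totals = defaultdict(float)  # running sum per group
    counts = defaultdict(int)    # number of records per group
    for row in records.values():
        totals[row[group_field]] += row[value_field]
        counts[row[group_field]] += 1
    return {group: round(totals[group] / counts[group], 2) for group in totals}

# Made-up records for illustration only
sample = {
    0: {'sex': 'female', 'region': 'southwest', 'bmi': 25.0},
    1: {'sex': 'male', 'region': 'northeast', 'bmi': 31.5},
    2: {'sex': 'male', 'region': 'southwest', 'bmi': 29.0},
}
print(average_by_group(sample, 'region', 'bmi'))  # {'southwest': 27.0, 'northeast': 31.5}
print(average_by_group(sample, 'sex', 'bmi'))     # {'female': 25.0, 'male': 30.25}
```

Because the group names are discovered from the data, the same helper would work for any categorical column without listing its values in advance.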

# Summary

The code snippets above demonstrate the process of reading, processing, and analyzing data from a CSV file named `insurance.csv`. The first snippet reads the CSV file and converts its contents into a dictionary of dictionaries (`data_dict`) with row indices as keys, converting fields such as `age`, `bmi`, `children`, and `charges` to numeric types.

The second snippet defines a function, `average_variable`, that returns the average of a specified variable (e.g., `charges`, `age`, `children`, `bmi`) across all records in `data_dict`; the results are then printed.

The third snippet introduces a function, `average_bmi_by_sex`, which computes the average BMI separately for males and females using `data_dict`; these averages are then printed.

The final snippet defines a function, `bmi_by_region`, which calculates the average BMI for each region (southwest, southeast, northwest, and northeast) by iterating through `data_dict` and returns a dictionary of these regional averages, which is then printed.

Together, these snippets illustrate a workflow for reading, processing, and analyzing CSV data: type conversion, averaging values, and grouping records by categories such as sex and region.

In [9]:
from os import getcwd

# Show the working directory, which is where the relative path 'insurance.csv' is resolved
getcwd()