
### Mid-term for HDS5210

Your supervisor is concerned about 4-year survival risks for COPD. She has asked you to do some analysis using a new metric, BODE. BODE is an improvement on a previous metric and promises to provide insight into survival risks.

BODE is defined here: https://www.mdcalc.com/calc/3916/bode-index-copd-survival#evidence

Your assignment is to create a BODE calculation and use it to calculate BODE scores and BODE survival rates for a group of patients. We then want to evaluate the average BODE score and average BODE survival rate for each area hospital.

Your patient input file will have the following columns:
NAME,SSN,LANGUAGE,JOB,HEIGHT_M,WEIGHT_KG,fev_pct,dyspnea_description,distance_in_meters,hospital

BODE calculations require a BMI value, so you will have to create a function for it.

Your output should be in the form of two CSV files, patient_output.csv and hospital_output.csv.

patient_output.csv will have the following columns:
NAME,BODE_SCORE,BODE_RISK,HOSPITAL

hospital_output.csv will have the following columns:
HOSPITAL_NAME,COPD_COUNT,PCT_OF_COPD_CASES_OVER_BEDS,AVG_SCORE,AVG_RISK

Each function you create should have documentation and a suitable number of test cases. If the input data could be wrong, make sure to raise a ValueError.

For this assignment, use the doctest, json, and csv libraries. Pandas is not allowed for this assignment.

In [232]:
import doctest
import json
import csv

### Step 1: Calculate BMI

In [233]:
def getBMI(weight_kg, height_m):
  """
  Calculate Body Mass Index based on weight in kilograms and height in meters.

  Parameters:
  weight_kg (float): Weight in kilograms.
  height_m (float): Height in meters.

  Returns:
  float: BMI value.

  Example:
  >>> getBMI(70, 1.75)
  22.857142857142858

  >>> getBMI(80, 1.8)
  24.691358024691358

  >>> getBMI(60, 1.6)
  23.437499999999996

  >>> getBMI(0, 1.75)
  Traceback (most recent call last):
      ...
  ValueError: Weight and height must be greater than zero.

  """
  #check if weight and height are greater than zero
  if weight_kg <= 0 or height_m <= 0:
    raise ValueError("Weight and height must be greater than zero.")

  #calculate bmi by dividing weight in kilograms by the squared height
  bmi = weight_kg / (height_m ** 2)

  #return calculated BMI
  return bmi

In [234]:
assert round(getBMI(70, 1.75), 2) == 22.86
assert round(getBMI(80, 1.8), 2) == 24.69
assert round(getBMI(60, 1.6),2) == 23.44

In [235]:
import doctest
doctest.run_docstring_examples(getBMI, globals(), verbose=True)

Finding tests in NoName
Trying:
    getBMI(70, 1.75)
Expecting:
    22.857142857142858
ok
Trying:
    getBMI(80, 1.8)
Expecting:
    24.691358024691358
ok
Trying:
    getBMI(60, 1.6)
Expecting:
    23.437499999999996
ok
Trying:
    getBMI(0, 1.75)
Expecting:
    Traceback (most recent call last):
        ...
    ValueError: Weight and height must be greater than zero.
ok
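
The expected value 23.437499999999996 in the doctest above is the exact floating-point result, so that test depends on how the arithmetic is carried out. The following is a minimal sketch of two more tolerant checks, assuming the same BMI formula. `bmi_sketch` is a hypothetical stand-in for getBMI and is not part of the assignment code.

```python
import math

def bmi_sketch(weight_kg, height_m):
    """Hypothetical stand-in for getBMI: weight divided by height squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("Weight and height must be greater than zero.")
    return weight_kg / (height_m ** 2)

# Option 1: round before comparing, as the assert cell above does.
rounded = round(bmi_sketch(60, 1.6), 2)

# Option 2: compare against the mathematically exact value with a tolerance.
close_enough = math.isclose(bmi_sketch(60, 1.6), 23.4375, rel_tol=1e-9)

print(rounded, close_enough)
```

Either form keeps the test meaningful while avoiding a dependency on the last digits of the float representation.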


### Step 2: Calculate BODE Score

In [236]:
def calculate_bode_score(fev1_percent, dyspnea_description, six_minute_walk_distance, bmi):
    """
    Calculate the BODE score based on FEV1%, dyspnea description,
    six-minute walk distance, and BMI.

    Parameters:
    fev1_percent (float): FEV1% value.
    dyspnea_description (str): Dyspnea description, mapped to a grade on the mMRC scale.
    six_minute_walk_distance (float): Six-minute walk distance in meters.
    bmi (float): BMI value.

    Returns:
    int: BODE score.

    Raises:
    ValueError: If the FEV1%, six-minute walk distance, or BMI is negative.

    Examples:
    >>> calculate_bode_score(70, "ONLY STRENUOUS EXERCISE", 350, 25)
    0
    >>> calculate_bode_score(55, "WALKING UPHILL", 300, 22)
    2
    >>> calculate_bode_score(40, "STOPS AFTER A FEW MINUTES", 100, 18)
    8
    >>> calculate_bode_score(30, "UNABLE TO LEAVE HOME", 50, 20)
    10

    """
    # Check for negative values
    if fev1_percent < 0 or six_minute_walk_distance < 0 or bmi < 0:
        raise ValueError("FEV1%, six-minute walk distance, and BMI cannot be negative.")

    # FEV1% score; chained lower bounds keep fractional values such as 64.5 in the right band
    if fev1_percent >= 65:
        fev1_score = 0
    elif fev1_percent >= 50:
        fev1_score = 1
    elif fev1_percent >= 36:
        fev1_score = 2
    else:  # fev1_percent < 36
        fev1_score = 3

    #Dyspnea scale based on descriptions
    dyspnea_scale = 0
    if dyspnea_description == "ONLY STRENUOUS EXERCISE":
        dyspnea_scale = 0
    elif dyspnea_description == "WALKING UPHILL" or dyspnea_description == "WHEN HURRYING":
        dyspnea_scale = 1
    elif dyspnea_description == "STOPS WHEN WALKING AT PACE" or dyspnea_description == "SLOWER THAN PEERS":
        dyspnea_scale = 2
    elif dyspnea_description == "STOPS AFTER 100 YARDS" or dyspnea_description == "STOPS AFTER A FEW MINUTES":
        dyspnea_scale = 3
    elif dyspnea_description == "BREATHLESS WHEN DRESSING" or dyspnea_description == "UNABLE TO LEAVE HOME":
        dyspnea_scale = 4

    # Dyspnea score (mMRC scale)
    if dyspnea_scale in [0, 1]:
        dyspnea_score = 0
    elif dyspnea_scale == 2:
        dyspnea_score = 1
    elif dyspnea_scale == 3:
        dyspnea_score = 2
    else:  # dyspnea_scale == 4
        dyspnea_score = 3

    # 6-Minute Walk Test score; chained lower bounds keep fractional values such as 349.5 in the right band
    if six_minute_walk_distance >= 350:
        walk_score = 0
    elif six_minute_walk_distance >= 250:
        walk_score = 1
    elif six_minute_walk_distance >= 150:
        walk_score = 2
    else:  # six_minute_walk_distance < 150
        walk_score = 3

    # BMI score: the BODE index gives 0 points for BMI > 21 and 1 point for BMI <= 21
    bmi_score = 0 if bmi > 21 else 1

    # Calculate total BODE score
    bode_score = fev1_score + dyspnea_score + walk_score + bmi_score
    return bode_score

In [237]:
assert calculate_bode_score(70, "ONLY STRENUOUS EXERCISE", 350, 25) == 0
assert calculate_bode_score(55, "WALKING UPHILL", 300, 22) == 2
assert calculate_bode_score(40, "STOPS AFTER A FEW MINUTES", 100, 18) == 8
assert calculate_bode_score(30, "UNABLE TO LEAVE HOME", 50, 20) == 10

In [238]:
import doctest
doctest.run_docstring_examples(calculate_bode_score, globals(), verbose=True)

Finding tests in NoName
Trying:
    calculate_bode_score(70, "ONLY STRENUOUS EXERCISE", 350, 25)
Expecting:
    0
ok
Trying:
    calculate_bode_score(55, "WALKING UPHILL", 300, 22)
Expecting:
    2
ok
Trying:
    calculate_bode_score(40, "STOPS AFTER A FEW MINUTES", 100, 18)
Expecting:
    8
ok
Trying:
    calculate_bode_score(30, "UNABLE TO LEAVE HOME", 50, 20)
Expecting:
    10
ok
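
calculate_bode_score treats any dyspnea description it does not recognize as grade 0, so a misspelled description lowers the score silently. The following is a minimal sketch of a dictionary lookup that raises a ValueError instead. The descriptions and point values are copied from the function above. The names `DYSPNEA_GRADE`, `GRADE_POINTS`, and `dyspnea_points` are hypothetical and not part of the assignment code.

```python
# mMRC grade for each dyspnea description used in calculate_bode_score
DYSPNEA_GRADE = {
    "ONLY STRENUOUS EXERCISE": 0,
    "WALKING UPHILL": 1,
    "WHEN HURRYING": 1,
    "STOPS WHEN WALKING AT PACE": 2,
    "SLOWER THAN PEERS": 2,
    "STOPS AFTER 100 YARDS": 3,
    "STOPS AFTER A FEW MINUTES": 3,
    "BREATHLESS WHEN DRESSING": 4,
    "UNABLE TO LEAVE HOME": 4,
}

# BODE points per mMRC grade: grades 0 and 1 score 0, then 1, 2, 3
GRADE_POINTS = {0: 0, 1: 0, 2: 1, 3: 2, 4: 3}

def dyspnea_points(description):
    """Return the BODE dyspnea points, raising ValueError for unknown text."""
    try:
        grade = DYSPNEA_GRADE[description]
    except KeyError:
        raise ValueError(f"Unknown dyspnea description: {description!r}") from None
    return GRADE_POINTS[grade]

print(dyspnea_points("WALKING UPHILL"), dyspnea_points("UNABLE TO LEAVE HOME"))
```

Whether unknown descriptions should stop processing or be skipped depends on the data, so this is one option rather than the required behavior.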


### Step 3: Calculate BODE Risk

In [239]:
def calculate_bode_risk(bode_score):
    """
    Calculate the BODE risk (4-year survival rate, in percent) based on the BODE score.

    Parameters:
    bode_score (int): BODE score.

    Returns:
    int: BODE risk, expressed as the 4-year survival percentage.

    Raises:
    ValueError: If the BODE score is negative.

    Risk categories:
    - 0 to 2: 80%
    - 3 to 4: 67%
    - 5 to 6: 57%
    - 7 to 10: 18%

    Examples:
    >>> calculate_bode_risk(0)
    80

    >>> calculate_bode_risk(3)
    67

    >>> calculate_bode_risk(6)
    57

    >>> calculate_bode_risk(8)
    18

    >>> calculate_bode_risk(-1)
    Traceback (most recent call last):
        ...
    ValueError: BODE score cannot be negative.
    """
    # A negative BODE score is not allowed
    if bode_score < 0:
        raise ValueError("BODE score cannot be negative.")

    # Map the BODE score to its survival percentage
    if bode_score <= 2:
        return 80
    elif bode_score <= 4:
        return 67
    elif bode_score <= 6:
        return 57
    else:
        return 18

In [240]:
assert calculate_bode_risk(0) == 80
assert calculate_bode_risk(3) == 67
assert calculate_bode_risk(6) == 57
assert calculate_bode_risk(8) == 18

In [241]:
import doctest
doctest.run_docstring_examples(calculate_bode_risk, globals(), verbose=True)

Finding tests in NoName
Trying:
    calculate_bode_risk(0)
Expecting:
    80
ok
Trying:
    calculate_bode_risk(3)
Expecting:
    67
ok
Trying:
    calculate_bode_risk(6)
Expecting:
    57
ok
Trying:
    calculate_bode_risk(8)
Expecting:
    18
ok
Trying:
    calculate_bode_risk(-1)
Expecting:
    Traceback (most recent call last):
        ...
    ValueError: BODE score cannot be negative.
ok
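
The risk bands can also be expressed as a lookup table. The following is a minimal sketch using the standard-library bisect module, assuming the same bands as calculate_bode_risk. It also rejects scores above 10, on the assumption that 10 is the largest score the four components can add up to (3 + 3 + 3 + 1). `bode_risk_sketch` and the two list names are hypothetical.

```python
import bisect

# Upper bound of each band except the last, and the survival percentage per band
UPPER_BOUNDS = [2, 4, 6]
SURVIVAL_PCT = [80, 67, 57, 18]

def bode_risk_sketch(bode_score):
    """Table-driven variant of the risk lookup with a 0-10 range check."""
    if not 0 <= bode_score <= 10:
        raise ValueError("BODE score must be between 0 and 10.")
    # bisect_left returns the index of the first band whose upper bound is >= score
    return SURVIVAL_PCT[bisect.bisect_left(UPPER_BOUNDS, bode_score)]

print([bode_risk_sketch(s) for s in (0, 3, 6, 8)])
```

Keeping the bands in two lists makes it easier to compare them against the published table in one place.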


### Step 4: Load Hospital Data

In [242]:
# Open the JSON file containing hospital data (json is imported in the first cell)
with open("hospitals.json") as f:
    hospital_data = json.load(f)
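
The code in Step 5 reads hospital_data as a list of hospital systems, each with a "hospitals" list whose entries have "name" and "beds" keys. The following is a minimal sketch of that shape using an inline sample rather than the real file. The two hospitals and their bed counts appear in the notebook output, while the "system" key and its value are assumptions made for illustration.

```python
import json

# Inline sample; the real hospitals.json may carry additional keys
sample = """
[
  {"system": "EXAMPLE SYSTEM",
   "hospitals": [{"name": "BJC", "beds": 2000},
                 {"name": "BJC WEST COUNTY", "beds": 1000}]}
]
"""

hospital_data_sketch = json.loads(sample)

# Flatten the nested structure into a name -> beds lookup
beds_by_hospital = {
    hospital["name"]: hospital["beds"]
    for system in hospital_data_sketch
    for hospital in system["hospitals"]
}

print(beds_by_hospital)
```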


### Step 5: Main business logic

Call BODE Score, BODE Risk functions for each patient.

For each hospital, calculate Avg BODE score and Avg BODE risk and count the number of cases for each hospital.

In [243]:
import csv
import json

# File paths for input and output files
patient_csv = "patient.csv"
hospital_json = "hospitals.json"
patient_output_file = "patient_output.csv"
hospital_output_file = "hospital_output.csv"

# Load hospital data from JSON file and initialize a dictionary for hospital statistics
all_unique_hospitals = {}

# Open the hospitals.json file and process it
with open(hospital_json) as f:
    hospital_data = json.load(f)
    #Iterate over hospital systems and their hospitals
    for system in hospital_data:
        for hospital in system["hospitals"]:
            hospital_name = hospital["name"]  #extract hospital names
            beds = hospital["beds"]  # extract number of beds
            # Initialize the hospital's statistics: total BODE score, patient count, total BODE risk, and beds
            all_unique_hospitals[hospital_name] = {'sum_bode_score': 0, 'count': 0, 'total_bode_risk': 0, 'beds': beds}

# Preparing lists for patient and hospital output
patient_results = [['NAME', 'BODE_SCORE', 'BODE_RISK', 'HOSPITAL']]
hospital_output_list = [['HOSPITAL_NAME', 'COPD_COUNT', 'PCT_OF_COPD_CASES_OVER_BEDS', 'AVG_SCORE', 'AVG_RISK']]

# Processing each patient from the CSV file
with open(patient_csv) as p:
    patient_data = csv.reader(p)
    headers = next(patient_data)

    # Loop over each row in the patient data
    for row in patient_data:
        try:
            # Extract relevant data from each row
            hospital_name = row[headers.index('hospital')]
            name = row[headers.index('NAME')]
            fev_pct = float(row[headers.index('fev_pct')])
            dyspnea_description = row[headers.index('dyspnea_description')]
            distance_in_meters = float(row[headers.index('distance_in_meters')])
            weight_kg = float(row[headers.index('WEIGHT_KG')])
            height_m = float(row[headers.index('HEIGHT_M')])

            bmi = getBMI(weight_kg, height_m)
            bode_score = calculate_bode_score(fev_pct, dyspnea_description, distance_in_meters, bmi)
            bode_risk = calculate_bode_risk(bode_score)

            # Update hospital statistics
            if hospital_name in all_unique_hospitals:
                all_unique_hospitals[hospital_name]['sum_bode_score'] += bode_score
                all_unique_hospitals[hospital_name]['count'] += 1
                all_unique_hospitals[hospital_name]['total_bode_risk'] += bode_risk

            # Adding the patient's results to the output
            patient_results.append([name, bode_score, bode_risk, hospital_name])
        except ValueError as e:
            # Report the patient whose row has invalid values and continue with the next row
            print(f"{row[headers.index('NAME')]}: {e}")

# Calculating statistics for each hospital
for hospital_name, stats in all_unique_hospitals.items():
    count = stats['count']
    beds = stats['beds']
    sum_bode_score = stats['sum_bode_score']
    total_bode_risk = stats['total_bode_risk']  # Get total BODE risk

    # Calculate the average BODE score and average BODE risk
    if count > 0:
        avg_bode_score = sum_bode_score / count
        avg_bode_risk = total_bode_risk / count  # Calculate average BODE risk directly
        pct_of_copd_cases_over_beds = (count / beds) * 100
    else:
        avg_bode_score = 0
        avg_bode_risk = 0
        pct_of_copd_cases_over_beds = 0

    # Adding hospital statistics to the output
    hospital_output_list.append([hospital_name, count, pct_of_copd_cases_over_beds, avg_bode_score, avg_bode_risk])

# Adding the patient results to CSV
with open(patient_output_file, 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(patient_results)

# Adding the hospital statistics to CSV
with open(hospital_output_file, 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(hospital_output_list)
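
The patient rows can also be read by column name with csv.DictReader, which avoids looking up each column position. The following is a minimal sketch, assuming the column names listed in the assignment. The sample row is invented for illustration and is read from an in-memory string rather than from patient.csv.

```python
import csv
import io

# One made-up patient row using the column layout from the assignment
sample_csv = io.StringIO(
    "NAME,SSN,LANGUAGE,JOB,HEIGHT_M,WEIGHT_KG,fev_pct,"
    "dyspnea_description,distance_in_meters,hospital\n"
    "Example Patient,000-00-0000,English,Tester,1.75,70,55,WALKING UPHILL,300,BJC\n"
)

parsed = []
for row in csv.DictReader(sample_csv):
    # float() raises ValueError on non-numeric text, which a caller can catch per row
    parsed.append({
        "name": row["NAME"],
        "height_m": float(row["HEIGHT_M"]),
        "weight_kg": float(row["WEIGHT_KG"]),
        "fev_pct": float(row["fev_pct"]),
        "dyspnea_description": row["dyspnea_description"],
        "distance_in_meters": float(row["distance_in_meters"]),
        "hospital": row["hospital"],
    })

print(parsed[0]["name"], parsed[0]["hospital"])
```

A missing column then surfaces as a KeyError naming the column, which can be easier to diagnose than a failed position lookup.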

In [244]:
print(hospital_output_list)

[['HOSPITAL_NAME', 'COPD_COUNT', 'PCT_OF_COPD_CASES_OVER_BEDS', 'AVG_SCORE', 'AVG_RISK'], ['BJC', 184, 9.2, 4.4021739130434785, 59.20108695652174], ['BJC WEST COUNTY', 171, 17.1, 4.12280701754386, 61.56140350877193], ['MISSOURI BAPTIST', 161, 20.125, 4.111801242236025, 60.7639751552795], ['SAINT LOUIS UNIVERSITY', 164, 16.400000000000002, 4.4939024390243905, 58.09756097560975], ["ST.MARY'S", 156, 31.2, 4.198717948717949, 60.76923076923077], ["ST.LUKE'S", 164, 20.5, 4.323170731707317, 59.33536585365854]]


In [245]:
print(all_unique_hospitals)

{'BJC': {'sum_bode_score': 810, 'count': 184, 'total_bode_risk': 10893, 'beds': 2000}, 'BJC WEST COUNTY': {'sum_bode_score': 705, 'count': 171, 'total_bode_risk': 10527, 'beds': 1000}, 'MISSOURI BAPTIST': {'sum_bode_score': 662, 'count': 161, 'total_bode_risk': 9783, 'beds': 800}, 'SAINT LOUIS UNIVERSITY': {'sum_bode_score': 737, 'count': 164, 'total_bode_risk': 9528, 'beds': 1000}, "ST.MARY'S": {'sum_bode_score': 655, 'count': 156, 'total_bode_risk': 9480, 'beds': 500}, "ST.LUKE'S": {'sum_bode_score': 709, 'count': 164, 'total_bode_risk': 9731, 'beds': 800}}
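
As a worked check of the hospital statistics, the BJC totals printed above can be turned back into the averages by hand. The following is a minimal sketch using only the numbers from that output. The variable names are chosen for illustration.

```python
# Totals for BJC, copied from the all_unique_hospitals output
bjc = {'sum_bode_score': 810, 'count': 184, 'total_bode_risk': 10893, 'beds': 2000}

avg_score = bjc['sum_bode_score'] / bjc['count']    # average BODE score
avg_risk = bjc['total_bode_risk'] / bjc['count']    # average BODE risk
pct_over_beds = bjc['count'] / bjc['beds'] * 100    # COPD cases as a percentage of beds

print(round(pct_over_beds, 1), round(avg_score, 2), round(avg_risk, 2))
```

After rounding, these values agree with the BJC row of hospital_output_list (9.2, 4.40..., 59.20...).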


In [246]:
print(patient_results)

[['NAME', 'BODE_SCORE', 'BODE_RISK', 'HOSPITAL'], ['Vanessa Roberts', 3, 67, "ST.LUKE'S"], ['Christopher Fox', 3, 67, 'SAINT LOUIS UNIVERSITY'], ['Benjamin Johnston', 4, 67, 'BJC'], ['Christopher Hernandez', 3, 67, 'MISSOURI BAPTIST'], ['Valerie Burch', 0, 80, 'BJC WEST COUNTY'], ['Heather Hart', 3, 67, 'SAINT LOUIS UNIVERSITY'], ['Ronald Cobb', 4, 67, "ST.MARY'S"], ['Austin French', 6, 57, 'SAINT LOUIS UNIVERSITY'], ['Mary Leonard', 8, 18, 'BJC'], ['Mrs. Nicole Smith', 5, 57, "ST.MARY'S"], ['Ashley Warren', 8, 18, 'BJC'], ['Jeffrey Jacobson', 4, 67, 'BJC WEST COUNTY'], ['Angela Bauer', 5, 57, 'BJC WEST COUNTY'], ['Jerry Rogers', 5, 57, 'BJC'], ['Lisa Beck', 3, 67, 'BJC'], ['Bryan Pena', 6, 57, 'SAINT LOUIS UNIVERSITY'], ['Jessica Henderson', 3, 67, 'SAINT LOUIS UNIVERSITY'], ['Daniel Mitchell', 3, 67, 'MISSOURI BAPTIST'], ['Melanie Graham', 4, 67, 'BJC'], ['Deborah Jimenez', 5, 57, 'MISSOURI BAPTIST'], ['Kathryn Rasmussen', 1, 80, 'BJC WEST COUNTY'], ['Brian Leon', 4, 67, 'BJC'], ['Ro