<a href="https://colab.research.google.com/github/hfatima2/HDS-assignments/blob/main/midterm/midterm.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Mid-term for HDS5210

Your supervisor is concerned about 4-year survival risks for COPD. She has asked for you to do some analysis using a new metric, BODE. BODE is an improvement on a previous metric and promises to provide insight on survival risks.

BODE is defined here. https://www.mdcalc.com/calc/3916/bode-index-copd-survival#evidence

Your assignment is to create a BODE calculation, use it to calculate BODE scores and BODE survival rates for a group of patients. Then we want to evaluate the average BODE scores and BODE survival rates for each area hospital.

Your patient input file will have the following columns:
NAME,SSN,LANGUAGE,JOB,HEIGHT_M,WEIGHT_KG,fev_pct,dyspnea_description,distance_in_meters,hospital

BODE calculations require a BMI value, so you will have to create a function for it.

Your output should be in the form of two CSV files, patient_output.csv and hospital_output.csv.

Patient_output will have the following columns:
NAME,BODE_SCORE,BODE_RISK,HOSPITAL

Hospital output will have the following columns:
HOSPITAL_NAME, COPD_COUNT, PCT_OF_COPD_CASES_OVER_BEDS, AVG_SCORE, AVG_RISK

Each function you create should have documentation and a suitable number of test cases. If the input data could be wrong, make sure to raise a Value Error.

For this assignment, use the doctest, json, and csv libraries. Pandas is not allowed for this assignment.

In [158]:
import doctest
import json
import csv

### Step 1: Calculate BMI

In [159]:
def getBMI(weight_kg, height_m):
  """
  Calculate Body Mass Index based on weight in kilograms and height in meters.

  Parameters:
  weight_kg (float): Weight in kilograms.
  height_m (float): Height in meters.

  Returns:
  float: BMI value.

  Example:
  >>> getBMI(70, 1.75)
  22.857142857142858

  >>> getBMI(80, 1.8)
  24.691358024691358

  >>> getBMI(60, 1.6)
  23.437499999999996

  >>> getBMI(0, 1.75)
  Traceback (most recent call last):
      ...
  ValueError: Weight and height must be greater than zero.

  """
  #check if weight and height are greater than zero
  if weight_kg <= 0 or height_m <= 0:
    raise ValueError("Weight and height must be greater than zero.")

  #calculate bmi by dividing weight in kilograms by the squared height
  bmi = weight_kg / (height_m ** 2)

  #return calculated BMI
  return bmi

In [160]:
assert round(getBMI(70, 1.75), 2) == 22.86
assert round(getBMI(80, 1.8), 2) == 24.69
assert round(getBMI(60, 1.6),2) == 23.44

In [161]:
import doctest
doctest.run_docstring_examples(getBMI, globals(), verbose=True)

Finding tests in NoName
Trying:
    getBMI(70, 1.75)
Expecting:
    22.857142857142858
ok
Trying:
    getBMI(80, 1.8)
Expecting:
    24.691358024691358
ok
Trying:
    getBMI(60, 1.6)
Expecting:
    23.437499999999996
ok
Trying:
    getBMI(0, 1.75)
Expecting:
    Traceback (most recent call last):
        ...
    ValueError: Weight and height must be greater than zero.
ok


### Step 2: Calculate BODE Score

In [162]:
def calculate_bode_score(fev1_percent, dyspnea_description, six_minute_walk_distance, bmi):
    """
    Calculate the BODE score based on FEV1%, dyspnea description,
    six-minute walk distance, and BMI.

    Parameters:
    fev1_percent (float): FEV1% value.
    dyspnea_description (int): Dyspnea score acc to description (mMRC scale).
    six_minute_walk_distance (float): Six- minute walk distance in meters.
    BMI (float): BMI value.

    Returns:
    int: BODE score.

    Raises:
    ValueError: If the FEV1%, six-minute walk distance, or BMI is negative.

    Examples:
    >>> calculate_bode_score(70, "ONLY STRENUOUS EXERCISE", 350, 25)
    0
    >>> calculate_bode_score(55, "WALKING UPHILL", 300, 22)
    2
    >>> calculate_bode_score(40, "STOPS AFTER A FEW MINUTES", 100, 18)
    8
    >>> calculate_bode_score(30, "UNABLE TO LEAVE HOME", 50, 20)
    10

    """
    # Check for negative values
    if fev1_percent < 0 or six_minute_walk_distance < 0 or bmi < 0:
        raise ValueError("FEV1%, six-minute walk distance, and BMI cannot be negative.")

    #FEV1% score
    if fev1_percent > 65:
        fev1_score = 0
    elif 50 <= fev1_percent <= 64:
        fev1_score = 1
    elif 36 <= fev1_percent <= 49:
        fev1_score = 2
    else:  # fev1_percent ≤ 35
        fev1_score = 3

    #Dyspnea scale based on descriptions
    dyspnea_scale = 0
    if dyspnea_description == "ONLY STRENUOUS EXERCISE":
        dyspnea_scale = 0
    elif dyspnea_description == "WALKING UPHILL" or dyspnea_description == "WHEN HURRYING":
        dyspnea_scale = 1
    elif dyspnea_description == "STOPS WHEN WALKING AT PACE" or dyspnea_description == "SLOWER THAN PEERS":
        dyspnea_scale = 2
    elif dyspnea_description == "STOPS AFTER 100 YARDS" or dyspnea_description == "STOPS AFTER A FEW MINUTES":
        dyspnea_scale = 3
    elif dyspnea_description == "BREATHLESS WHEN DRESSING" or dyspnea_description == "UNABLE TO LEAVE HOME":
        dyspnea_scale = 4

    # Dyspnea score (mMRC scale)
    if dyspnea_scale in [0, 1]:
        dyspnea_score = 0
    elif dyspnea_scale == 2:
        dyspnea_score = 1
    elif dyspnea_scale == 3:
        dyspnea_score = 2
    else:  # dyspnea_scale == 4
        dyspnea_score = 3

    # 6-Minute Walk Test score
    if six_minute_walk_distance >= 350:
        walk_score = 0
    elif 250 <= six_minute_walk_distance <= 349:
        walk_score = 1
    elif 150 <= six_minute_walk_distance <= 249:
        walk_score = 2
    else:  # six_minute_walk_distance < 149
        walk_score = 3

    # BMI score
    bmi_score = 0 if bmi >= 21 else 1

    # Calculate total BODE score
    bode_score = fev1_score + dyspnea_score + walk_score + bmi_score
    return bode_score

In [163]:
assert calculate_bode_score(70, "ONLY STRENUOUS EXERCISE", 350, 25) == 0
assert calculate_bode_score(55, "WALKING UPHILL", 300, 22) == 2
assert calculate_bode_score(40, "STOPS AFTER A FEW MINUTES", 100, 18) == 8
assert calculate_bode_score(30, "UNABLE TO LEAVE HOME", 50, 20) == 10

In [164]:
import doctest
doctest.run_docstring_examples(calculate_bode_score, globals(), verbose=True)

Finding tests in NoName
Trying:
    calculate_bode_score(70, "ONLY STRENUOUS EXERCISE", 350, 25)
Expecting:
    0
ok
Trying:
    calculate_bode_score(55, "WALKING UPHILL", 300, 22)
Expecting:
    2
ok
Trying:
    calculate_bode_score(40, "STOPS AFTER A FEW MINUTES", 100, 18)
Expecting:
    8
ok
Trying:
    calculate_bode_score(30, "UNABLE TO LEAVE HOME", 50, 20)
Expecting:
    10
ok


### Step 3: Calculate BODE Risk

In [165]:
def calculate_bode_risk(bode_score):
 """
Calculate the BODE risk based on the BODE score.

Parameters:
bode_score (int): BODE score.

Returns:
str: BODE risk.

Raises:
ValueError: If the BODE score is negative.

Risk categories
-0 to 2: '80% (Low Risk)'
-3 to 4: '67% (Moderate Risk)'
-5 to 6: '57% (High Risk)'
-7 to 10:'18% (Very High Risk)'

Examples:
>>> calculate_bode_risk(0)
'80% (Low Risk)'

>>> calculate_bode_risk(3)
'67% (Moderate Risk)'

>>> calculate_bode_risk(6)
'57% (High Risk)'

>>> calculate_bode_risk(8)
'18% (Very High Risk)'

>>> calculate_bode_risk(-1)
Traceback (most recent call last):
    ...
ValueError: BODE score cannot be negative.
 """

 if bode_score < 0:
    raise ValueError("BODE score cannot be negative.")

 if bode_score <= 2:
        return '80% (Low Risk)'
 elif bode_score <= 4:
        return '67% (Moderate Risk)'
 elif bode_score <= 6:
        return '57% (High Risk)'
 else:
        return '18% (Very High Risk)'

In [166]:
assert calculate_bode_risk(0) == '80% (Low Risk)'
assert calculate_bode_risk(3) == '67% (Moderate Risk)'
assert calculate_bode_risk(6) == '57% (High Risk)'
assert calculate_bode_risk(8) == '18% (Very High Risk)'

In [188]:
import doctest
doctest.run_docstring_examples(calculate_bode_risk, globals(), verbose=True)

Finding tests in NoName
Trying:
    calculate_bode_risk(0)
Expecting:
    '80% (Low Risk)'
ok
Trying:
    calculate_bode_risk(3)
Expecting:
    '67% (Moderate Risk)'
ok
Trying:
    calculate_bode_risk(6)
Expecting:
    '57% (High Risk)'
ok
Trying:
    calculate_bode_risk(8)
Expecting:
    '18% (Very High Risk)'
ok
Trying:
    calculate_bode_risk(-1)
Expecting:
    Traceback (most recent call last):
        ...
    ValueError: BODE score cannot be negative.
ok


### Step 4: Load Hospital Data

In [189]:
import json
import csv
# Open the JSON file containing hospital data
with open("hospitals.json") as f:
      hospital_data = json.load(f)
      # Iterate through each entry in the hospital data
      for data in hospital_data:
           for hospitals in data["hospitals"]:
               hospital_name = hospitals["name"] #extract the name of the hospitals
               beds = hospitals["beds"]  #extract the number of beds in hospitals
               print(f"Hospital: {hospital_name}, Beds: {beds}")

# Open the CSV file containing patient data
patient_csv = "patient.csv"
with open(patient_csv) as p:
     patient_data = csv.reader(p)


Hospital: BJC, Beds: 2000
Hospital: BJC WEST COUNTY, Beds: 1000
Hospital: MISSOURI BAPTIST, Beds: 800
Hospital: SAINT LOUIS UNIVERSITY, Beds: 1000
Hospital: ST.MARY'S, Beds: 500
Hospital: ST.LUKE'S, Beds: 800


### Step 5: Main business logic

Call BODE Score, BODE Risk functions for each patient.

For each hospital, calculate Avg BODE score and Avg BODE risk and count the number of cases for each hospital.

In [187]:
patient_csv = "patient.csv"
hospital_json = "hospitals.json"

patient_output_file = "patient_output.csv"
hospital_output_file = "hospital_output.csv"

# Create a dictionary to store hospital information and statistics
all_unique_hospital = {}

# Open and load hospital data from a JSON file
with open(hospital_json) as f:
      hospital_data = json.load(f)
      # Iterate through each entry in the hospital data
      for data in hospital_data:
           for hospitals in data["hospitals"]:
               hospital_name = hospitals["name"] #extract hospital names
               beds = hospitals["beds"]   # extract number of beds
               all_unique_hospital[hospital_name] = {"sum_bode_score":0, "count": 0, "beds": beds}

# Prepare lists for output results
patient_results = [["NAME","BODE_SCORE","BODE_RISK","HOSPITAL"]]
hospital_output_list = [["HOSPITAL_NAME", "COPD_COUNT", "PCT_OF_COPD_CASES_OVER_BEDS", "AVG_SCORE", "AVG_RISK"]]

# Open and read patient data from the CSV file
with open(patient_csv) as p:
     patient_data = csv.reader(p)
     #Extract the headers (first row)
     headers = next(patient_data)

    #Process each row in the patient data
     for row in patient_data:
          # Extract relevant patient data using header indices
          try :
               hospital = row[headers.index("hospital")]
               distance_in_meters = float(row[headers.index("distance_in_meters")])
               dyspnea_description = row[headers.index("dyspnea_description")]
               fev_pct = float(row[headers.index("fev_pct")])
               WEIGHT_KG= float(row[headers.index("WEIGHT_KG")])
               HEIGHT_M = float(row[headers.index("HEIGHT_M")])
               NAME = row[headers.index("NAME")]

               # Calculate BMI using the weight and height
               bmi = getBMI(WEIGHT_KG,HEIGHT_M)
               # Calculate BODE score based on patient data
               bode_score = calculate_bode_score(fev_pct, dyspnea_description, distance_in_meters, bmi)
               # Calculate BODE risk based on BODE score
               bode_risk = calculate_bode_risk(bode_score)

               # Update the hospital's statistics
               sum_bode_score = all_unique_hospital[hospital]["sum_bode_score"]
               count = all_unique_hospital[hospital]["count"]
               bed = all_unique_hospital[hospital]["beds"]

               # Increment the sum of BODE scores and count of patients for this hospital
               all_unique_hospital[hospital] = {"sum_bode_score": sum_bode_score + bode_score, "count" : count + 1, "beds" : beds}

               # Append the results for the patient to the patient_results list
               patient_results.append([NAME,bode_score,bode_risk,hospital])
          except ValueError:
               print(f"Cannot convert string to float")

# Loop through each unique hospital in the dictionary 'all_unique_hospital'
     for key in all_unique_hospital:
          hospital_name = key
          value = all_unique_hospital[key]
          count = value["count"]
          bed = value["beds"]
          sum_bode_score = value["sum_bode_score"]

# Calculate the average BODE score by dividing the total BODE score by the count of cases
          AVG_SCORE = sum_bode_score/count
          # Calculate the average risk
          AVG_RISK = calculate_bode_risk(AVG_SCORE)
          # Calculate the percentage of COPD cases relative to the number of beds available
          PCT_OF_COPD_CASES_OVER_BEDS = (count / bed) * 100

# Append the results for the hospital to the hospital_output_list
          hospital_output_list.append([hospital_name, count, PCT_OF_COPD_CASES_OVER_BEDS, AVG_SCORE, AVG_RISK])



# Write Patient_output.csv
with open(patient_output_file, "w", newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(patient_results)
#Write Hospital_output.csv
with open(hospital_output_file, "w", newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(hospital_output_list)

In [184]:
print (patient_results)

[['NAME', 'BODE_SCORE', 'BODE_RISK', 'HOSPITAL'], ['Vanessa Roberts', 3, '67% (Moderate Risk)', "ST.LUKE'S"], ['Christopher Fox', 3, '67% (Moderate Risk)', 'SAINT LOUIS UNIVERSITY'], ['Benjamin Johnston', 4, '67% (Moderate Risk)', 'BJC'], ['Christopher Hernandez', 3, '67% (Moderate Risk)', 'MISSOURI BAPTIST'], ['Valerie Burch', 0, '80% (Low Risk)', 'BJC WEST COUNTY'], ['Heather Hart', 3, '67% (Moderate Risk)', 'SAINT LOUIS UNIVERSITY'], ['Ronald Cobb', 4, '67% (Moderate Risk)', "ST.MARY'S"], ['Austin French', 6, '57% (High Risk)', 'SAINT LOUIS UNIVERSITY'], ['Mary Leonard', 8, '18% (Very High Risk)', 'BJC'], ['Mrs. Nicole Smith', 5, '57% (High Risk)', "ST.MARY'S"], ['Ashley Warren', 8, '18% (Very High Risk)', 'BJC'], ['Jeffrey Jacobson', 4, '67% (Moderate Risk)', 'BJC WEST COUNTY'], ['Angela Bauer', 5, '57% (High Risk)', 'BJC WEST COUNTY'], ['Jerry Rogers', 5, '57% (High Risk)', 'BJC'], ['Lisa Beck', 3, '67% (Moderate Risk)', 'BJC'], ['Bryan Pena', 6, '57% (High Risk)', 'SAINT LOUIS UN

In [185]:
print (hospital_output_list)

[['HOSPITAL_NAME', 'COPD_COUNT', 'PCT_OF_COPD_CASES_OVER_BEDS', 'AVG_SCORE', 'AVG_RISK'], ['BJC', 184, 23.0, 4.4021739130434785, '57% (High Risk)'], ['BJC WEST COUNTY', 171, 21.375, 4.12280701754386, '57% (High Risk)'], ['MISSOURI BAPTIST', 161, 20.125, 4.111801242236025, '57% (High Risk)'], ['SAINT LOUIS UNIVERSITY', 164, 20.5, 4.4939024390243905, '57% (High Risk)'], ["ST.MARY'S", 156, 19.5, 4.198717948717949, '57% (High Risk)'], ["ST.LUKE'S", 164, 20.5, 4.323170731707317, '57% (High Risk)']]


In [186]:
print (all_unique_hospital)

{'BJC': {'sum_bode_score': 810, 'count': 184, 'beds': 800}, 'BJC WEST COUNTY': {'sum_bode_score': 705, 'count': 171, 'beds': 800}, 'MISSOURI BAPTIST': {'sum_bode_score': 662, 'count': 161, 'beds': 800}, 'SAINT LOUIS UNIVERSITY': {'sum_bode_score': 737, 'count': 164, 'beds': 800}, "ST.MARY'S": {'sum_bode_score': 655, 'count': 156, 'beds': 800}, "ST.LUKE'S": {'sum_bode_score': 709, 'count': 164, 'beds': 800}}
