### Mid-term for HDS5210

Your supervisor is concerned about 4-year survival risks for COPD. She has asked for you to do some analysis using a new metric, BODE. BODE is an improvement on a previous metric and promises to provide insight on survival risks.

BODE is defined here. https://www.mdcalc.com/calc/3916/bode-index-copd-survival#evidence

Your assignment is to create a BODE calculation, use it to calculate BODE scores and BODE survival rates for a group of patients. Then we want to evaluate the average BODE scores and BODE survival rates for each area hospital.

Your patient input file will have the following columns:
NAME,SSN,LANGUAGE,JOB,HEIGHT_M,WEIGHT_KG,fev_pct,dyspnea_description,distance_in_meters,hospital

BODE calculations require a BMI value, so you will have to create a function for it.

Your output should be in the form of two CSV files, patient_output.csv and hospital_output.csv.

Patient_output will have the following columns:
NAME,BODE_SCORE,BODE_RISK,HOSPITAL

Hospital output will have the following columns:
HOSPITAL_NAME, COPD_COUNT, PCT_OF_COPD_CASES_OVER_BEDS, AVG_SCORE, AVG_RISK

Each function you create should have documentation and a suitable number of test cases. If the input data could be wrong, make sure to raise a Value Error.

For this assignment, use the doctest, json, and csv libraries. Pandas is not allowed for this assignment.

In [None]:
import doctest
import json
import csv

### Step 1: Calculate BMI

In [2]:
def calculate_bmi(weight_kg, height_m):
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("Weight and height must be positive.")

    bmi = weight_kg / (height_m ** 2)
    return bmi

# Testing the function manually
if __name__ == "__main__":
    # Test cases
    try:
        print(calculate_bmi(70, 1.75))  # Expected output: Approximately 22.86
        print(calculate_bmi(50, 1.60))  # Expected output: Approximately 19.53
        print(calculate_bmi(-70, 1.75))  # This should raise a ValueError
    except ValueError as e:
        print(e)  # Prints the error message
    try:
        print(calculate_bmi(70, 0))  # This should raise a ValueError
    except ValueError as e:
        print(e)  # Prints the error message

22.857142857142858
19.531249999999996
Weight and height must be positive.
Weight and height must be positive.


### Step 2: Calculate BODE Score

In [3]:
def bode_score(bmi, fev_pct, dyspnea_score, walk_distance):
    # Validate inputs
    if bmi <= 0:
        raise ValueError("BMI must be a positive number.")
    if not (0 <= fev_pct <= 100):
        raise ValueError("FEV1 percentage must be between 0 and 100.")
    if not (0 <= dyspnea_score <= 4):
        raise ValueError("Dyspnea score must be between 0 and 4.")
    if walk_distance < 0:
        raise ValueError("Walk distance must be non-negative.")

    # BMI Score
    if bmi > 21:
        bmi_score = 0
    else:
        bmi_score = 1

    # FEV1 Score
    if fev_pct >= 65:
        fev_score = 0
    elif 50 <= fev_pct < 65:
        fev_score = 1
    elif 36 <= fev_pct < 50:
        fev_score = 2
    else:
        fev_score = 3

    # Dyspnea Score
    # The dyspnea score is passed directly as dyspnea_score, so no need to change it.

    # 6-Minute Walk Distance Score
    if walk_distance >= 350:
        walk_score = 0
    elif 250 <= walk_distance < 350:
        walk_score = 1
    elif 150 <= walk_distance < 250:
        walk_score = 2
    else:
        walk_score = 3

    # Calculate the total BODE score
    total_bode_score = bmi_score + fev_score + dyspnea_score + walk_score

    return total_bode_score

In [4]:
# Example 1: High BMI, good lung function, mild dyspnea, good walk distance
print(bode_score(22, 70, 1, 360))  # Expected score: 0 (BMI 0 + FEV1 0 + Dyspnea 0 + Walk 0)

# Example 2: Low BMI, moderate lung function, moderate dyspnea, moderate walk distance
print(bode_score(20, 55, 2, 300))  # Expected score: 1 (BMI 1 + FEV1 1 + Dyspnea 1 + Walk 1)

# Example 3: Very low lung function, severe dyspnea, very low walk distance
print(bode_score(18, 30, 3, 100))  # Expected score: 8 (BMI 1 + FEV1 3 + Dyspnea 2 + Walk 3)


1
5
10


### Step 3: Calculate BODE Risk

In [5]:
def bode_risk(bode_score):
    # Check for valid BODE score
    if not (0 <= bode_score <= 10):
        raise ValueError("BODE score must be between 0 and 10.")

    # Determine risk category and survival probability
    if 0 <= bode_score <= 2:
        return ("Low Risk", 80.0)
    elif 3 <= bode_score <= 4:
        return ("Moderate Risk", 67.0)
    elif 5 <= bode_score <= 6:
        return ("High Risk", 57.0)
    else:
        return ("Very High Risk", 18.0)

# Test cases
print(bode_risk(1))  # Expected: ('Low Risk', 80.0)
print(bode_risk(4))  # Expected: ('Moderate Risk', 67.0)
print(bode_risk(6))  # Expected: ('High Risk', 57.0)
print(bode_risk(9))  # Expected: ('Very High Risk', 18.0)

('Low Risk', 80.0)
('Moderate Risk', 67.0)
('High Risk', 57.0)
('Very High Risk', 18.0)


### Step 4: Load Hospital Data

In [7]:
import csv
from google.colab import files

# Upload the file
uploaded = files.upload()
filename = 'patient.csv'

def load_hospital_data(filename):
    patient_data = []

    try:
        # Open the file
        with open(filename, mode='r') as file:
            # Use csv.DictReader to read the CSV into a list of dictionaries
            csv_reader = csv.DictReader(file)

            # Iterate over each row in the CSV
            for row in csv_reader:
                # Convert numeric fields to floats
                row['HEIGHT_M'] = float(row['HEIGHT_M'])
                row['WEIGHT_KG'] = float(row['WEIGHT_KG'])
                row['fev_pct'] = float(row['fev_pct'])
                row['distance_in_meters'] = float(row['distance_in_meters'])

                # Append the row (as a dictionary) to the patient_data list
                patient_data.append(row)

    except FileNotFoundError:
        print(f"Error: The file '{filename}' was not found.")
    except Exception as e:
        print(f"Error: {e}")

    return patient_data

# Test loading the hospital data
hospital_data = load_hospital_data(filename)

# Output the first few rows of data to verify
for patient in hospital_data[:5]:  # Print the first 5 patients
    print(patient)


Saving patient.csv to patient.csv
{'NAME': 'Vanessa Roberts', 'SSN': '295-82-3703', 'LANGUAGE': 'Belarusian', 'JOB': 'Teacher English as a foreign language', 'HEIGHT_M': 1.72, 'WEIGHT_KG': 90.28, 'fev_pct': 57.73, 'dyspnea_description': 'STOPS AFTER A FEW MINUTES', 'distance_in_meters': 367.9, 'hospital': "ST.LUKE'S"}
{'NAME': 'Christopher Fox', 'SSN': '286-30-9664', 'LANGUAGE': 'Macedonian', 'JOB': 'Local government officer', 'HEIGHT_M': 1.64, 'WEIGHT_KG': 83.09, 'fev_pct': 61.6, 'dyspnea_description': 'WHEN HURRYING', 'distance_in_meters': 184.16, 'hospital': 'SAINT LOUIS UNIVERSITY'}
{'NAME': 'Benjamin Johnston', 'SSN': '139-07-4381', 'LANGUAGE': 'Kirghiz', 'JOB': 'Multimedia programmer', 'HEIGHT_M': 1.61, 'WEIGHT_KG': 94.91, 'fev_pct': 83.11, 'dyspnea_description': 'BREATHLESS WHEN DRESSING', 'distance_in_meters': 260.66, 'hospital': 'BJC'}
{'NAME': 'Christopher Hernandez', 'SSN': '687-37-0804', 'LANGUAGE': 'South Ndebele', 'JOB': 'Community education officer', 'HEIGHT_M': 1.67, 'W

### Step 5: Main business logic

Call BODE Score, BODE Risk functions for each patient.

For each hospital, calculate Avg BODE score and Avg BODE risk and count the number of cases for each hospital.

In [8]:
import csv
import json
from google.colab import files

# Step 1: Upload patient.csv and hospitals.json
uploaded = files.upload()

# Step 2: Define filenames
patient_csv = "patient.csv"
hospital_json = "hospitals.json"

# Step 3: Define a function to calculate the BODE Score and Risk
def calculate_bode_score(patient):
    # Placeholder logic for BODE score (replace with actual calculation)
    # Assuming BODE Score and Risk are calculated based on patient data
    bode_score = (patient['HEIGHT_M'] + patient['WEIGHT_KG']) / 2  # Example calculation
    bode_risk = bode_score / 10  # Example calculation
    return bode_score, bode_risk

# Step 4: Load the patient data from CSV
def load_patient_data(filename):
    patient_data = []
    with open(filename, mode='r') as file:
        csv_reader = csv.DictReader(file)

        # Print the headers (column names)
        headers = csv_reader.fieldnames
        print("CSV Headers:", headers)  # Add this line to see the headers

        for row in csv_reader:
            row['HEIGHT_M'] = float(row['HEIGHT_M'])
            row['WEIGHT_KG'] = float(row['WEIGHT_KG'])
            patient_data.append(row)

    return patient_data

# Step 5: Load the hospital data from JSON
def load_hospital_data(filename):
    with open(filename, mode='r') as file:
        hospital_data = json.load(file)
    return hospital_data

# Step 6: Process the patients and calculate BODE Scores and Risks
def process_patients_and_hospitals(patient_data, hospital_data):
    patient_results = []
    hospital_aggregates = {}

    # Step 7: Process each patient
    for patient in patient_data:
        # Calculate the BODE score and risk for each patient
        bode_score, bode_risk = calculate_bode_score(patient)
        patient_id = patient['NAME']  # Use NAME as patient identifier
        hospital_id = patient['hospital']  # Use hospital as hospital identifier

        # Store patient result
        patient_results.append([patient_id, hospital_id, bode_score, bode_risk])

        # Aggregate data by hospital
        if hospital_id not in hospital_aggregates:
            hospital_aggregates[hospital_id] = {
                'total_bode_score': 0,
                'total_bode_risk': 0,
                'num_patients': 0
            }

        hospital_aggregates[hospital_id]['total_bode_score'] += bode_score
        hospital_aggregates[hospital_id]['total_bode_risk'] += bode_risk
        hospital_aggregates[hospital_id]['num_patients'] += 1

    # Step 8: Calculate the averages for each hospital
    hospital_output_list = []
    for hospital_id, aggregates in hospital_aggregates.items():
        avg_bode_score = aggregates['total_bode_score'] / aggregates['num_patients']
        avg_bode_risk = aggregates['total_bode_risk'] / aggregates['num_patients']
        hospital_output_list.append([hospital_id, avg_bode_score, avg_bode_risk, aggregates['num_patients']])

    return patient_results, hospital_output_list

# Step 9: Write the results to CSV files
def write_csv(filename, data, headers=None):
    with open(filename, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        if headers:
            writer.writerow(headers)
        writer.writerows(data)

# Step 10: Load data, process it, and save the results
patient_data = load_patient_data(patient_csv)
hospital_data = load_hospital_data(hospital_json)

# Process the data and get the results
patient_results, hospital_output_list = process_patients_and_hospitals(patient_data, hospital_data)

# Write the patient and hospital results to their respective CSV files
write_csv("patient_output.csv", patient_results, headers=["PATIENT_NAME", "HOSPITAL", "BODE_SCORE", "BODE_RISK"])
write_csv("hospital_output.csv", hospital_output_list, headers=["HOSPITAL", "AVG_BODE_SCORE", "AVG_BODE_RISK", "NUM_PATIENTS"])

# Output for verification (first few lines)
print("First 5 Patient Results:")
for row in patient_results[:5]:
    print(row)

print("\nFirst 5 Hospital Results:")
for row in hospital_output_list[:5]:
    print(row)

Saving hospitals.json to hospitals.json
CSV Headers: ['NAME', 'SSN', 'LANGUAGE', 'JOB', 'HEIGHT_M', 'WEIGHT_KG', 'fev_pct', 'dyspnea_description', 'distance_in_meters', 'hospital']
First 5 Patient Results:
['Vanessa Roberts', "ST.LUKE'S", 46.0, 4.6]
['Christopher Fox', 'SAINT LOUIS UNIVERSITY', 42.365, 4.2365]
['Benjamin Johnston', 'BJC', 48.26, 4.826]
['Christopher Hernandez', 'MISSOURI BAPTIST', 41.61, 4.161]
['Valerie Burch', 'BJC WEST COUNTY', 43.144999999999996, 4.3145]

First 5 Hospital Results:
["ST.LUKE'S", 49.28707317073169, 4.92870731707317, 164]
['SAINT LOUIS UNIVERSITY', 49.36060975609756, 4.936060975609758, 164]
['BJC', 49.58717391304347, 4.95871739130435, 184]
['MISSOURI BAPTIST', 49.856801242236, 4.985680124223601, 161]
['BJC WEST COUNTY', 49.26999999999999, 4.9270000000000005, 171]
