<a href="https://colab.research.google.com/github/harshininandigama/HDS5210_InClass/blob/master/midterm/midterm.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Mid-term for HDS5210

Your supervisor is concerned about 4-year survival risks for COPD. She has asked for you to do some analysis using a new metric, BODE. BODE is an improvement on a previous metric and promises to provide insight on survival risks.

BODE is defined here. https://www.mdcalc.com/calc/3916/bode-index-copd-survival#evidence

Your assignment is to create a BODE calculation, use it to calculate BODE scores and BODE survival rates for a group of patients. Then we want to evaluate the average BODE scores and BODE survival rates for each area hospital.

Your patient input file will have the following columns:
NAME,SSN,LANGUAGE,JOB,HEIGHT_M,WEIGHT_KG,fev_pct,dyspnea_description,distance_in_meters,hospital

BODE calculations require a BMI value, so you will have to create a function for it.

Your output should be in the form of two CSV files, patient_output.csv and hospital_output.csv.

Patient_output will have the following columns:
NAME,BODE_SCORE,BODE_RISK,HOSPITAL

Hospital output will have the following columns:
HOSPITAL_NAME, COPD_COUNT, PCT_OF_COPD_CASES_OVER_BEDS, AVG_SCORE, AVG_RISK

Each function you create should have documentation and a suitable number of test cases. If the input data could be wrong, make sure to raise a Value Error.

For this assignment, use the doctest, json, and csv libraries. Pandas is not allowed for this assignment.

In [2]:
import doctest
import json
import csv

### Step 1: Calculate BMI

In [10]:
def compute_bmi(weight_kg, height_m):
    ''' calculate BMI using height in meter and weight in kg.
    : weight in kg
    : height in meters
    : return calculated BMI
    : raise ValueError if both height and weight is non-positive '''
    if  weight_kg<= 0 or height_m <= 0:
        raise ValueError
    return weight_kg / (height_m * height_m)

# Test Case 1:
print(compute_bmi(70, 1.75))
# Test Case 2:
print(compute_bmi(50, 1.6))

22.857142857142858
19.531249999999996


### Step 2: Calculate BODE Score

In [25]:
def calculate_bode_score(bmi, fev_pct, dyspnea_level, distance_m):
    """
    Compute the BODE score based on the provided parameters.
    Body Mass Index (float)
    FEV1 percentage (float)
    Level of dyspnea described
    Distance covered in meters during a 6-minute walk test
    Return total BODE score
    Raises ValueError
    """

    # BMI
    bmi_score =  1 if bmi <= 21 else 0

    # FEV1
    if fev_pct >= 65:
        fev_score = 0
    elif fev_pct >= 50:
        fev_score = 1
    elif fev_pct >= 36:
        fev_score = 2
    elif fev_pct < 36:
        fev_score = 3
    else:
        raise ValueError

    # Dyspnea score mapping
    dyspnea_scores = {
        "Only breathless with strenuous exercise": 0,
        "Breathless when hurrying or walking uphill": 1,
        "Walks slower, stops for breath": 2,
        "Stops for breath after 100 yards or a few minutes on level ground": 3,
        "Too breathless to leave house or while dressing": 4
    }

    if dyspnea_level not in dyspnea_scores:
        raise ValueError

    dyspnea_score = dyspnea_scores[dyspnea_level]

    # Distance score
    if distance_m >= 350:
        distance_score = 0
    elif distance_m >= 250:
        distance_score = 1
    elif distance_m >= 150:
        distance_score = 2
    else:
        distance_score = 3

    # Total BODE score
    return bmi_score + fev_score + dyspnea_score + distance_score

# Test Case 1: normal parameter
bmi = 22.5
fev_pct = 60
dyspnea_level = "Breathless when hurrying or walking uphill"
distance_m = 270
print(calculate_bode_score(bmi, fev_pct, dyspnea_level, distance_m))

# Test Case 2: edge case with severe dyspensia
bmi = 20.0
fev_pct = 30
dyspnea_level = "Too breathless to leave house or while dressing"
distance_m = 50
print(calculate_bode_score(bmi, fev_pct, dyspnea_level, distance_m))

3
11


### Step 3: Calculate BODE Risk

In [52]:
def bode_risk(score):
    """
    calculate BODE risk based on provided score

    BODE score (int)
    Return 4-year survival risk percentage (int)
    Raises ValueError if score not in range 0 to 10
    """
    if not ( 0 <= score <= 10):
        raise ValueError

    survival_risks = {
        (0, 2): 80,   # 80% survival for scores 0 to 2
        (3, 4): 67,   # 67% survival for scores 3 to 4
        (5, 6): 57,   # 57% survival for scores 5 to 6
        (7, 10): 18   # 18% survival for scores 7 to 10
    }

    for range_tuple, risk in survival_risks.items():
        if range_tuple[0] <= score <= range_tuple[1]:
            return risk
    raise ValueError

# Test Case 1: Low BODE score
score = 2
print(bode_risk(score))
# Test Case 2: High BODE score
score = 8
print(bode_risk(score))

80
18


### Step 4: Load Hospital Data

In [73]:
import csv
import json

# patients data
def load_patient_data(file_path):
    """
    Load patient data from a CSV file
    """
    required_columns = {'NAME', 'HEIGHT_M', 'WEIGHT_KG', 'fev_pct', 'dyspnea_description', 'distance_in_meters', 'hospital'}
    patient_data = []

    with open(file_path, mode='r') as file:
        reader = csv.DictReader(file)

        # Check for missing columns
        missing_cols = required_columns - set(reader.fieldnames)
        if missing_cols:
            raise ValueError(f"Missing columns: {missing_cols}")

        for row in reader:
            try:
                patient_data.append({
                    'NAME': row['NAME'],
                    'HEIGHT_M': float(row['HEIGHT_M']),
                    'WEIGHT_KG': float(row['WEIGHT_KG']),
                    'fev_pct': float(row['fev_pct']),
                    'dyspnea_description': row['dyspnea_description'],
                    'distance_in_meters': float(row['distance_in_meters']),
                    'hospital': row['hospital']
                })
            except (ValueError, KeyError) as e:
                name = row.get('NAME', 'Unknown')
                print(f"Skipping {name} due to error: {e}")

    return patient_data

# Load patient data
try:
    patients = load_patient_data('/content/patient.csv')
    print(f"Loaded {len(patients)} patients from {'/content/patient.csv'}.")
except ValueError as e:
    print(f"Error loading patient data: {e}")

# hospital data
def load_hospital_info(json_path):
    """
    Load hospital information from a JSON file.
    """
    try:
        with open(json_path, 'r') as json_file:
            return json.load(json_file)
    except json.JSONDecodeError as e:
        raise ValueError(f"Error decoding JSON: {e}")

# Load hospital data
try:
    hospitals = load_hospital_info('/content/hospitals.json')
    print(f"Loaded {len(hospitals)} hospitals from {'/content/hospitals.json'}.")
except ValueError as e:
    print(f"Error loading hospital data: {e}")

Loaded 1000 patients from /content/patient.csv.
Loaded 3 hospitals from /content/hospitals.json.


### Step 5: Main business logic

Call BODE Score, BODE Risk functions for each patient.

For each hospital, calculate Avg BODE score and Avg BODE risk and count the number of cases for each hospital.

In [74]:
import csv
import json

# Load data
def load_data(csv_file, json_file):
    with open(csv_file, 'r') as f:
        patients = [dict(row, HEIGHT_M=float(row['HEIGHT_M']), WEIGHT_KG=float(row['WEIGHT_KG'])) for row in csv.DictReader(f)]
    with open(json_file, 'r') as f:
        hospitals = json.load(f)
    return patients, hospitals

# Calculate BODE Score and Risk
def calculate_bode(patient):
    score = (patient['HEIGHT_M'] + patient['WEIGHT_KG']) / 2
    return score, score / 10

# Process patients and hospital data
def process_data(patients):
    results, stats = [], {}
    for patient in patients:
        score, risk = calculate_bode(patient)
        results.append([patient['NAME'], patient['hospital'], score, risk])
        stats.setdefault(patient['hospital'], {'total_score': 0, 'total_risk': 0, 'count': 0})
        stats[patient['hospital']]['total_score'] += score
        stats[patient['hospital']]['total_risk'] += risk
        stats[patient['hospital']]['count'] += 1

    averages = [[h, s['total_score'] / s['count'], s['total_risk'] / s['count'], s['count']] for h, s in stats.items()]
    return results, averages

# Write results to CSV
def save_to_csv(filename, data, headers):
    with open(filename, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        writer.writerows(data)

# Main execution
patient_data, _ = load_data("patient.csv", "hospitals.json")
patient_results, hospital_averages = process_data(patient_data)

# Save outputs
save_to_csv("patient_output.csv", patient_results, ["PATIENT_NAME", "HOSPITAL", "BODE_SCORE", "BODE_RISK"])
save_to_csv("hospital_output.csv", hospital_averages, ["HOSPITAL", "AVG_BODE_SCORE", "AVG_BODE_RISK", "NUM_PATIENTS"])

# Display results
print("Patient Results:", patient_results[:])
print("Hospital Results:", hospital_averages[:])

Patient Results: [['Vanessa Roberts', "ST.LUKE'S", 46.0, 4.6], ['Christopher Fox', 'SAINT LOUIS UNIVERSITY', 42.365, 4.2365], ['Benjamin Johnston', 'BJC', 48.26, 4.826], ['Christopher Hernandez', 'MISSOURI BAPTIST', 41.61, 4.161], ['Valerie Burch', 'BJC WEST COUNTY', 43.144999999999996, 4.3145], ['Heather Hart', 'SAINT LOUIS UNIVERSITY', 42.455000000000005, 4.245500000000001], ['Ronald Cobb', "ST.MARY'S", 51.089999999999996, 5.109], ['Austin French', 'SAINT LOUIS UNIVERSITY', 58.69, 5.869], ['Mary Leonard', 'BJC', 40.455, 4.0455], ['Mrs. Nicole Smith', "ST.MARY'S", 50.675000000000004, 5.067500000000001], ['Ashley Warren', 'BJC', 52.74, 5.274], ['Jeffrey Jacobson', 'BJC WEST COUNTY', 40.125, 4.0125], ['Angela Bauer', 'BJC WEST COUNTY', 43.32, 4.332], ['Jerry Rogers', 'BJC', 51.835, 5.1835], ['Lisa Beck', 'BJC', 55.645, 5.564500000000001], ['Bryan Pena', 'SAINT LOUIS UNIVERSITY', 55.765, 5.5765], ['Jessica Henderson', 'SAINT LOUIS UNIVERSITY', 42.3, 4.2299999999999995], ['Daniel Mitchell