<a href="https://colab.research.google.com/github/harshininandigama/HDS5210_InClass/blob/master/midterm/midterm.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Mid-term for HDS5210

Your supervisor is concerned about 4-year survival risks for COPD. She has asked for you to do some analysis using a new metric, BODE. BODE is an improvement on a previous metric and promises to provide insight on survival risks.

BODE is defined here. https://www.mdcalc.com/calc/3916/bode-index-copd-survival#evidence

Your assignment is to create a BODE calculation, use it to calculate BODE scores and BODE survival rates for a group of patients. Then we want to evaluate the average BODE scores and BODE survival rates for each area hospital.

Your patient input file will have the following columns:
NAME,SSN,LANGUAGE,JOB,HEIGHT_M,WEIGHT_KG,fev_pct,dyspnea_description,distance_in_meters,hospital

BODE calculations require a BMI value, so you will have to create a function for it.

Your output should be in the form of two CSV files, patient_output.csv and hospital_output.csv.

Patient_output will have the following columns:
NAME,BODE_SCORE,BODE_RISK,HOSPITAL

Hospital output will have the following columns:
HOSPITAL_NAME, COPD_COUNT, PCT_OF_COPD_CASES_OVER_BEDS, AVG_SCORE, AVG_RISK

Each function you create should have documentation and a suitable number of test cases. If the input data could be wrong, make sure to raise a Value Error.

For this assignment, use the doctest, json, and csv libraries. Pandas is not allowed for this assignment.

In [7]:
import doctest
import json
import csv

### Step 1: Calculate BMI

In [10]:
def calculate_bmi(weight_kg, height_m):
    """
    Calculate the Body Mass Index (BMI) using weight in kilograms and height in meters.

    >>> calculate_bmi(70, 1.75)
    22.86
    >>> calculate_bmi(50, 1.6)
    19.53
    >>> calculate_bmi(0, 1.75)
    Traceback (most recent call last):
    ...
    ValueError: Weight and height must be positive numbers.
    >>> calculate_bmi(70, -1.75)
    Traceback (most recent call last):
    ...
    ValueError: Weight and height must be positive numbers.

    :Weight in kilograms
    :Height in meters
    :Calculated BMI value (float)
    :raises ValueError: If weight or height are non-positive
    """
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("Weight and height must be positive numbers.")

    bmi = weight_kg / (height_m ** 2)
    return round(bmi, 2)

# Testing the function with valid cases
print(calculate_bmi(70, 1.75))
print(calculate_bmi(50, 1.6))

# Testing invalid cases
for test_case in [(70, -1.75), (0, 1.75)]:
    try:
        print(calculate_bmi(*test_case))
    except ValueError as e:
        print(e)

22.86
19.53
Weight and height must be positive numbers.
Weight and height must be positive numbers.


### Step 2: Calculate BODE Score

In [12]:
def calculate_bode_score(bmi, fev_pct, dyspnea_level, distance_m):
    """
    Compute the BODE score based on the provided parameters.

    :Body Mass Index (float)
    :FEV1 percentage (float)
    :Descriptive level of dyspnea (str)
    :Distance covered in meters during a 6-minute walk test (float)
    :return: Total BODE score (int)
    :raises ValueError: If inputs are invalid or outside expected ranges

    >>> calculate_bode_score(22.5, 60, "Breathless when hurrying or walking uphill", 270)
    4
    >>> calculate_bode_score(20.0, 30, "Too breathless to leave house or while dressing", 50)
    8
    >>> calculate_bode_score(21, 70, "Only breathless with strenuous exercise", 400)
    1
    >>> calculate_bode_score(25, 20, "Walks slower, stops for breath", 100)
    Traceback (most recent call last):
    ...
    ValueError: Invalid dyspnea level.
    """

    # Calculate BMI score
    bmi_score = 1 if bmi <= 21 else 0

    # Calculate FEV score
    if fev_pct >= 65:
        fev_score = 0
    elif fev_pct >= 50:
        fev_score = 1
    elif fev_pct >= 36:
        fev_score = 2
    elif fev_pct < 36:
        fev_score = 3
    else:
        raise ValueError("Invalid FEV percentage.")

    # Dyspnea score mapping
    dyspnea_scores = {
        "Only breathless with strenuous exercise": 0,
        "Breathless when hurrying or walking uphill": 1,
        "Walks slower, stops for breath": 2,
        "Stops for breath after 100 yards or a few minutes on level ground": 3,
        "Too breathless to leave house or while dressing": 4
    }

    if dyspnea_level not in dyspnea_scores:
        raise ValueError("Invalid dyspnea level.")

    dyspnea_score = dyspnea_scores[dyspnea_level]

    # Calculate distance score
    if distance_m >= 350:
        distance_score = 0
    elif distance_m >= 250:
        distance_score = 1
    elif distance_m >= 150:
        distance_score = 2
    else:
        distance_score = 3

    # Total BODE score
    return bmi_score + fev_score + dyspnea_score + distance_score

# Testing the function according to valid parameters
print(calculate_bode_score(22.5, 60, "Breathless when hurrying or walking uphill", 270))
print(calculate_bode_score(20.0, 30, "Too breathless to leave house or while dressing", 50))

3
11


### Step 3: Calculate BODE Risk

In [14]:
def bode_risk(score):
    """
    Calculate the 4-year survival risk percentage based on the provided BODE score.

    :BODE score (int)
    :return: 4-year survival risk percentage (int)
    :raises ValueError: If the score is not in the range of 0 to 10

    >>> bode_risk(2)
    80
    >>> bode_risk(4)
    67
    >>> bode_risk(6)
    57
    >>> bode_risk(8)
    18
    >>> bode_risk(-1)
    Traceback (most recent call last):
    ...
    ValueError: Score must be between 0 and 10.
    >>> bode_risk(11)
    Traceback (most recent call last):
    ...
    ValueError: Score must be between 0 and 10.
    """
    if not (0 <= score <= 10):
        raise ValueError("Score must be between 0 and 10.")

    survival_risks = {
        (0, 2): 80,   # 80% survival for scores 0 to 2
        (3, 4): 67,   # 67% survival for scores 3 to 4
        (5, 6): 57,   # 57% survival for scores 5 to 6
        (7, 10): 18   # 18% survival for scores 7 to 10
    }

    for range_tuple, risk in survival_risks.items():
        if range_tuple[0] <= score <= range_tuple[1]:
            return risk

    raise ValueError("Unexpected error in risk calculation.")

# Test Cases
print(bode_risk(2))
print(bode_risk(8))

80
18


### Step 4: Load Hospital Data

In [20]:
import csv
import json

def load_patient_data(file_path):
    """
    Load patient data from a CSV file.

    :param file_path: Path to the CSV file.
    :return: List of patient dictionaries.
    :raises ValueError: If required columns are missing or data conversion fails.
    """
    required_columns = {'NAME', 'HEIGHT_M', 'WEIGHT_KG', 'fev_pct', 'dyspnea_description', 'distance_in_meters', 'hospital'}

    with open(file_path, mode='r') as file:
        reader = csv.DictReader(file)
        missing_cols = required_columns - set(reader.fieldnames)
        if missing_cols:
            raise ValueError(f"Missing columns: {missing_cols}")

        patient_data = []
        for row in reader:
            try:
                patient_data.append({
                    'NAME': row['NAME'],
                    'HEIGHT_M': float(row['HEIGHT_M']),
                    'WEIGHT_KG': float(row['WEIGHT_KG']),
                    'fev_pct': float(row['fev_pct']),
                    'dyspnea_description': row['dyspnea_description'],
                    'distance_in_meters': float(row['distance_in_meters']),
                    'hospital': row['hospital']
                })
            except (ValueError, KeyError) as e:
                print(f"Skipping {row.get('NAME', 'Unknown')} due to error: {e}")

    return patient_data

def load_hospital_info(json_path):
    """
    Load hospital information from a JSON file.

    :param json_path: Path to the JSON file.
    :return: Dictionary of hospital information.
    :raises ValueError: If JSON decoding fails.
    """
    with open(json_path, 'r') as json_file:
        try:
            return json.load(json_file)
        except json.JSONDecodeError as e:
            raise ValueError(f"Error decoding JSON: {e}")

# Load data
try:
    patients = load_patient_data('/content/patient.csv')
    print(f"Loaded {len(patients)} patients from '/content/patient.csv'.")
except ValueError as e:
    print(f"Error loading patient data: {e}")

try:
    hospitals = load_hospital_info('/content/hospitals.json')
    print(f"Loaded {len(hospitals)} hospitals from '/content/hospitals.json'.")
except ValueError as e:
    print(f"Error loading hospital data: {e}")

Loaded 1000 patients from '/content/patient.csv'.
Loaded 3 hospitals from '/content/hospitals.json'.


### Step 5: Main business logic

Call BODE Score, BODE Risk functions for each patient.

For each hospital, calculate Avg BODE score and Avg BODE risk and count the number of cases for each hospital.

In [25]:
import csv
import json

# Step 1: Define file paths
patient_csv = "/content/patient.csv"
hospital_json = "/content/hospitals.json"

# Step 2: Calculate BMI
def calculate_bmi(weight_kg, height_m):
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("Weight and height must be positive numbers.")
    return weight_kg / (height_m ** 2)

# Step 3: Calculate BODE Score and Risk
def calculate_bode_score(patient):
    bmi = calculate_bmi(patient['WEIGHT_KG'], patient['HEIGHT_M'])
    bmi_score = 0 if bmi > 21 else 1

    # FEV1% score
    fev_score = 0 if patient['fev_pct'] >= 65 else (1 if patient['fev_pct'] >= 50 else (2 if patient['fev_pct'] >= 36 else 3))

    # Dyspnea score
    dyspnea_mapping = {
        "Only breathless with strenuous exercise": 0,
        "Breathless when hurrying or walking uphill": 1,
        "Walks slower, stops for breath": 2,
        "Stops for breath after 100 yards or a few minutes on level ground": 3,
        "Too breathless to leave house or while dressing": 4
    }
    dyspnea_score = dyspnea_mapping.get(patient['dyspnea_description'], 0)

    # Distance score
    distance_score = 0 if patient['distance_in_meters'] >= 350 else (1 if patient['distance_in_meters'] >= 250 else (2 if patient['distance_in_meters'] >= 150 else 3))

    # Total BODE score and risk
    bode_score = bmi_score + fev_score + dyspnea_score + distance_score
    bode_risk = {0: 80, 1: 80, 2: 67, 3: 67, 4: 57, 5: 57, 6: 18}.get(bode_score, 18)

    return bode_score, bode_risk

# Step 4: Load patient data from CSV
def load_patient_data(filename):
    with open(filename, mode='r') as file:
        csv_reader = csv.DictReader(file)
        headers = csv_reader.fieldnames
        required_columns = ['NAME', 'HEIGHT_M', 'WEIGHT_KG', 'fev_pct', 'dyspnea_description', 'distance_in_meters', 'hospital']
        for column in required_columns:
            if column not in headers:
                raise ValueError(f"Missing required column: {column}")

        return [
            {**row,
             'HEIGHT_M': float(row['HEIGHT_M']),
             'WEIGHT_KG': float(row['WEIGHT_KG']),
             'fev_pct': float(row['fev_pct']),
             'distance_in_meters': float(row['distance_in_meters'])}
            for row in csv_reader
        ]

# Step 5: Load hospital data from JSON
def load_hospital_data(filename):
    try:
        with open(filename, mode='r') as file:
            return json.load(file)
    except FileNotFoundError:
        raise ValueError(f"File not found: {filename}")
    except json.JSONDecodeError:
        raise ValueError(f"Invalid JSON format in file: {filename}")

# Step 6: Process patients and hospitals
def process_patients_and_hospitals(patient_data):
    patient_results = []
    hospital_aggregates = {}

    for patient in patient_data:
        bode_score, bode_risk = calculate_bode_score(patient)
        patient_results.append([patient['NAME'], patient['hospital'], bode_score, bode_risk])

        hospital_id = patient['hospital']
        if hospital_id not in hospital_aggregates:
            hospital_aggregates[hospital_id] = {'total_bode_score': 0, 'total_bode_risk': 0, 'num_patients': 0}

        hospital_aggregates[hospital_id]['total_bode_score'] += bode_score
        hospital_aggregates[hospital_id]['total_bode_risk'] += bode_risk
        hospital_aggregates[hospital_id]['num_patients'] += 1

    return patient_results, [
        [hospital_id, aggregates['total_bode_score'] / aggregates['num_patients'],
         aggregates['total_bode_risk'] / aggregates['num_patients'], aggregates['num_patients']]
        for hospital_id, aggregates in hospital_aggregates.items()
    ]

# Step 7: Write results to CSV files
def write_csv(filename, data, headers=None):
    with open(filename, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        if headers:
            writer.writerow(headers)
        writer.writerows(data)

# Step 8: Load data, process it, and save results
patient_data = load_patient_data(patient_csv)
hospital_data = load_hospital_data(hospital_json)

patient_results, hospital_output_list = process_patients_and_hospitals(patient_data)

write_csv("patient_output.csv", patient_results, headers=["PATIENT_NAME", "HOSPITAL", "BODE_SCORE", "BODE_RISK"])
write_csv("hospital_output.csv", hospital_output_list, headers=["HOSPITAL", "AVG_BODE_SCORE", "AVG_BODE_RISK", "NUM_PATIENTS"])

# Output for verification
print("Patient Results:")
for row in patient_results[:25]:
    print(row)

print("\nHospital Results:")
for row in hospital_output_list[:6]:
    print(row)

Patient Results:
['Vanessa Roberts', "ST.LUKE'S", 1, 80]
['Christopher Fox', 'SAINT LOUIS UNIVERSITY', 3, 67]
['Benjamin Johnston', 'BJC', 1, 80]
['Christopher Hernandez', 'MISSOURI BAPTIST', 1, 80]
['Valerie Burch', 'BJC WEST COUNTY', 0, 80]
['Heather Hart', 'SAINT LOUIS UNIVERSITY', 3, 67]
['Ronald Cobb', "ST.MARY'S", 4, 57]
['Austin French', 'SAINT LOUIS UNIVERSITY', 6, 18]
['Mary Leonard', 'BJC', 5, 57]
['Mrs. Nicole Smith', "ST.MARY'S", 3, 67]
['Ashley Warren', 'BJC', 5, 57]
['Jeffrey Jacobson', 'BJC WEST COUNTY', 4, 57]
['Angela Bauer', 'BJC WEST COUNTY', 3, 67]
['Jerry Rogers', 'BJC', 2, 67]
['Lisa Beck', 'BJC', 2, 67]
['Bryan Pena', 'SAINT LOUIS UNIVERSITY', 4, 57]
['Jessica Henderson', 'SAINT LOUIS UNIVERSITY', 1, 80]
['Daniel Mitchell', 'MISSOURI BAPTIST', 3, 67]
['Melanie Graham', 'BJC', 1, 80]
['Deborah Jimenez', 'MISSOURI BAPTIST', 3, 67]
['Kathryn Rasmussen', 'BJC WEST COUNTY', 1, 80]
['Brian Leon', 'BJC', 3, 67]
['Robert Walker', 'MISSOURI BAPTIST', 5, 57]
['Drew Case', 