### Mid-term for HDS5210

Your supervisor is concerned about 4-year survival risks for COPD. She has asked you to do some analysis using a new metric, BODE. BODE is an improvement on a previous metric and promises to provide insight into survival risks.

BODE is defined here: https://www.mdcalc.com/calc/3916/bode-index-copd-survival#evidence

Your assignment is to create a BODE calculation and use it to calculate BODE scores and BODE survival rates for a group of patients. Then we want to evaluate the average BODE scores and BODE survival rates for each area hospital.

Your patient input file will have the following columns:
NAME,SSN,LANGUAGE,JOB,HEIGHT_M,WEIGHT_KG,fev_pct,dyspnea_description,distance_in_meters,hospital

BODE calculations require a BMI value, so you will have to create a function for it.

Your output should be in the form of two CSV files, patient_output.csv and hospital_output.csv.

patient_output.csv will have the following columns:
NAME,BODE_SCORE,BODE_RISK,HOSPITAL

hospital_output.csv will have the following columns:
HOSPITAL_NAME,COPD_COUNT,PCT_OF_COPD_CASES_OVER_BEDS,AVG_SCORE,AVG_RISK

Each function you create should have documentation and a suitable number of test cases. If the input data could be wrong, make sure to raise a ValueError.

For this assignment, use the doctest, json, and csv libraries. Pandas is not allowed for this assignment.

In [55]:
import doctest
import json
import csv

### Step 1: Calculate BMI

In [56]:
def compute_body_mass_index(mass_kg, height_meters):
    """
    Calculates Body Mass Index (BMI).

    Args:
        mass_kg (float): Body mass in kilograms.
        height_meters (float): Height in meters.

    Returns:
        float: The calculated BMI, rounded to two decimal places.

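    Raises:
        ValueError: If height_meters or mass_kg is less than or equal to 0.

    >>> compute_body_mass_index(70, 1.75)
    22.86
    >>> compute_body_mass_index(50, 1.6)
    19.53
    >>> compute_body_mass_index(70, 0)
    Traceback (most recent call last):
        ...
    ValueError: Height and weight must be positive numbers.
    """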
    if height_meters <= 0 or mass_kg <= 0:
        raise ValueError("Height and weight must be positive numbers.")
    bmi_value = mass_kg / (height_meters ** 2)
    return round(bmi_value, 2)
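
Since the assignment asks for test cases on every function, here is a minimal sketch of how doctest examples, including one that expects a `ValueError`, could be laid out and run for a single function. The function `bmi_sketch` is a hypothetical stand-in written only for this illustration; it is not part of the notebook.

```python
import doctest


def bmi_sketch(mass_kg, height_m):
    """
    Hypothetical stand-in for a BMI function, used only to show doctest layout.

    >>> bmi_sketch(70, 1.75)
    22.86
    >>> bmi_sketch(70, 0)
    Traceback (most recent call last):
        ...
    ValueError: Height and weight must be positive numbers.
    """
    if height_m <= 0 or mass_kg <= 0:
        raise ValueError("Height and weight must be positive numbers.")
    return round(mass_kg / height_m ** 2, 2)


# Parse the examples out of the docstring and run them with a DocTestRunner.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    bmi_sketch.__doc__, {"bmi_sketch": bmi_sketch}, "bmi_sketch", None, 0
)
runner = doctest.DocTestRunner(verbose=False)
runner.run(test)
results = runner.summarize(verbose=False)
print(f"failed={results.failed}, attempted={results.attempted}")
```

In a notebook, calling `doctest.testmod()` in a final cell is a simpler way to run every docstring example at once.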


### Step 2: Calculate BODE Score

In [57]:
def normalize_dyspnea_description(description):
    """
    Normalize variations of dyspnea descriptions to fit known categories.

    The substrings checked below are the values that appear in the
    dyspnea_description column of the dataset.

    >>> normalize_dyspnea_description("STOPS AFTER A FEW MINUTES")
    'Severe breathlessness'

    >>> normalize_dyspnea_description("WHEN HURRYING")
    'Moderate breathlessness'
    """
    description = description.upper().strip()
    if "STOPS AFTER A FEW MINUTES" in description:
        return "Severe breathlessness"
    elif "WHEN HURRYING" in description:
        return "Moderate breathlessness"
    elif "UNABLE TO LEAVE HOME" in description:
        return "Severe breathlessness"
    elif "SLOWER THAN PEERS" in description:
        return "Moderate breathlessness"
    elif "WALKING UPHILL" in description:
        return "Moderate breathlessness"
    elif "ONLY STRENUOUS EXERCISE" in description:
        return "Mild breathlessness"
    elif "BREATHLESS WHEN DRESSING" in description:
        return "Severe breathlessness"
    elif "STOPS WHEN WALKING AT PACE" in description:
        return "Severe breathlessness"
    elif "STOPS AFTER 100 YARDS" in description:
        return "Severe breathlessness"
    return description

def calculate_bode_score(bmi, fev_pct, dyspnea_description, distance_in_meters):
    """
    Calculate the BODE score based on BMI, FEV1 percentage, dyspnea description, and distance in meters.

    >>> calculate_bode_score(22, 70, 'ONLY STRENUOUS EXERCISE', 400)
    1
    >>> calculate_bode_score(18, 40, 'STOPS WHEN WALKING AT PACE', 200)
    8
    """
    bode_score = 0

    # Calculate BMI score
    if bmi > 21:
        bode_score += 0
    else:
        bode_score += 1

    # Calculate FEV1 score
    if fev_pct >= 65:
        bode_score += 0
    elif 50 <= fev_pct < 65:
        bode_score += 1
    elif 36 <= fev_pct < 50:
        bode_score += 2
    else:
        bode_score += 3

    # Normalize the dyspnea description and map it to a score
    dyspnea_description = normalize_dyspnea_description(dyspnea_description)
    dyspnea_mapping = {
        "No breathlessness": 0,
        "Mild breathlessness": 1,
        "Moderate breathlessness": 2,
        "Severe breathlessness": 3,
    }

    dyspnea_score = dyspnea_mapping.get(dyspnea_description)
    if dyspnea_score is None:
        raise ValueError(f"Invalid dyspnea description: {dyspnea_description}")

    bode_score += dyspnea_score

    # Calculate the distance-walked score (350 m or more scores 0 in the BODE index)
    if distance_in_meters >= 350:
        bode_score += 0
    elif 250 <= distance_in_meters < 350:
        bode_score += 1
    elif 150 <= distance_in_meters < 250:
        bode_score += 2
    else:
        bode_score += 3

    return bode_score
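
The FEV1 and walking-distance bands above follow the same pattern: the score rises by one each time the value drops below the next lower bound. A minimal sketch of that pattern as a small lookup helper is shown below. The names `band_points`, `FEV1_LOWER_BOUNDS`, and `WALK_LOWER_BOUNDS` are hypothetical, and the bounds assume the published BODE cut-offs in which a value exactly on a bound (for example 350 m) belongs to the better band.

```python
def band_points(value, lower_bounds):
    """Return 0 if value >= the first bound, 1 if >= the second, and so on.

    lower_bounds must be sorted from highest to lowest. A value below every
    bound receives len(lower_bounds) points.
    """
    for points, bound in enumerate(lower_bounds):
        if value >= bound:
            return points
    return len(lower_bounds)


# Assumed cut-offs: FEV1 % predicted and six-minute walk distance in meters.
FEV1_LOWER_BOUNDS = (65, 50, 36)
WALK_LOWER_BOUNDS = (350, 250, 150)

print(band_points(70, FEV1_LOWER_BOUNDS))   # 0 points
print(band_points(40, FEV1_LOWER_BOUNDS))   # 2 points
print(band_points(200, WALK_LOWER_BOUNDS))  # 2 points
print(band_points(100, WALK_LOWER_BOUNDS))  # 3 points
```

Keeping the cut-offs in one tuple per measurement makes boundary values easy to review against the metric definition.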

### Step 3: Calculate BODE Risk

In [58]:
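def calculate_bode_risk(bode_score):
    """
    Convert a BODE score into a 4-year mortality risk percentage.

    The risk is 100 minus the 4-year survival rate for the score band.

    >>> calculate_bode_risk(1)
    20
    >>> calculate_bode_risk(8)
    82
    >>> calculate_bode_risk(11)
    Traceback (most recent call last):
        ...
    ValueError: Invalid BODE score: 11
    """
    # Determine the survival rate based on the BODE score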
    if 0 <= bode_score <= 2:
        survival_rate = 80  # 80% survival, 20% risk
    elif 3 <= bode_score <= 4:
        survival_rate = 67  # 67% survival, 33% risk
    elif 5 <= bode_score <= 6:
        survival_rate = 57  # 57% survival, 43% risk
    elif 7 <= bode_score <= 10:
        survival_rate = 18  # 18% survival, 82% risk
    else:
        raise ValueError(f"Invalid BODE score: {bode_score}")

    # Calculate the risk percentage
    risk_percentage = 100 - survival_rate
    return risk_percentage
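
One quick way to sanity-check the score-to-risk mapping is to enumerate every valid score from 0 to 10 and confirm each one lands in a band. The sketch below does that with the same survival figures used in `calculate_bode_risk`; `SURVIVAL_BANDS` and `risk_from_score` are hypothetical names introduced only for this check.

```python
# (lowest score, highest score, 4-year survival %) for each band.
SURVIVAL_BANDS = [(0, 2, 80), (3, 4, 67), (5, 6, 57), (7, 10, 18)]


def risk_from_score(score):
    """Return the risk percentage (100 - survival) for a BODE score."""
    for low, high, survival in SURVIVAL_BANDS:
        if low <= score <= high:
            return 100 - survival
    raise ValueError(f"Invalid BODE score: {score}")


# Build the full table so gaps or overlaps between bands would be visible.
risk_table = {score: risk_from_score(score) for score in range(11)}
for score, risk in risk_table.items():
    print(score, risk)
```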


### Step 4: Load Hospital Data

In [59]:
import json

def load_hospital_data(json_file):
    """
    Load hospital data from a JSON file.

    Parameters:
    json_file (str): The path to the JSON file containing hospital data.

    Returns:
    dict, list, or None: The data loaded from the JSON file, or None if an error occurs.

    Note:
    A missing file, invalid JSON, or any other error is caught, reported
    with a printed message, and signaled by returning None.
    """
    try:
        with open(json_file, 'r') as file:
            data = json.load(file)
            return data
    except FileNotFoundError:
        print(f"Error: The file '{json_file}' was not found.")
        return None
    except json.JSONDecodeError:
        print(f"Error: The file '{json_file}' is not a valid JSON.")
        return None
    except Exception as e:
        print(f"An unexpected error occurred: {e}")
        return None
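
The Step 5 code reads the loaded data as a list of entries, each holding a `hospitals` list whose items have `name` and `beds` keys. The sketch below shows that shape with made-up sample data and flattens it into a name-to-beds dictionary. The `system` key and all values are hypothetical; only `hospitals`, `name`, and `beds` come from the notebook.

```python
import json

# Hypothetical sample mirroring the structure the Step 5 code expects.
sample_json = """
[
  {"system": "Example Health", "hospitals": [
      {"name": "Hospital A", "beds": 100},
      {"name": "Hospital B", "beds": 250}
  ]}
]
"""

data = json.loads(sample_json)

# Flatten the nested entries into {hospital name: bed count}.
beds_by_hospital = {
    hospital["name"]: hospital["beds"]
    for entry in data
    for hospital in entry["hospitals"]
}
print(beds_by_hospital)
```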


In [60]:
import csv

def load_dyspnea_descriptions(csv_file):
    """
    Load unique dyspnea descriptions from a CSV file.

    Parameters:
    csv_file (str): The path to the CSV file containing patient data.

    Returns:
    set: A set of unique dyspnea descriptions (empty if the file is not found).
    """
    dyspnea_descriptions = set()

    try:
        with open(csv_file, 'r') as csvfile:
            reader = csv.DictReader(csvfile)
            for row in reader:
                # Check if the 'dyspnea_description' key exists in the row
                if 'dyspnea_description' in row:
                    dyspnea_descriptions.add(row['dyspnea_description'])
                else:
                    print("Warning: 'dyspnea_description' column is missing in some rows.")
    except FileNotFoundError:
        print(f"Error: The file '{csv_file}' was not found.")

    return dyspnea_descriptions

# read dyspnea descriptions from the CSV file
descriptions = load_dyspnea_descriptions('patient.csv')

# Print the different types of dyspnea descriptions
print("Different types of dyspnea descriptions in the dataset:")
for description in descriptions:
    print(description)

Different types of dyspnea descriptions in the dataset:
WALKING UPHILL
BREATHLESS WHEN DRESSING
STOPS AFTER A FEW MINUTES
UNABLE TO LEAVE HOME
STOPS WHEN WALKING AT PACE
STOPS AFTER 100 YARDS
WHEN HURRYING
ONLY STRENUOUS EXERCISE
SLOWER THAN PEERS
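
Because the dataset contains only these nine descriptions, the substring chain in `normalize_dyspnea_description` could also be written as a dictionary lookup. The sketch below reproduces the same categories used in this notebook. `DYSPNEA_CATEGORY` and `categorize_dyspnea` are hypothetical names, and unlike the original function this version matches the whole string and raises a `ValueError` for anything it does not recognize.

```python
# Same description-to-category assignments as the notebook's normalizer.
DYSPNEA_CATEGORY = {
    "ONLY STRENUOUS EXERCISE": "Mild breathlessness",
    "WHEN HURRYING": "Moderate breathlessness",
    "WALKING UPHILL": "Moderate breathlessness",
    "SLOWER THAN PEERS": "Moderate breathlessness",
    "STOPS WHEN WALKING AT PACE": "Severe breathlessness",
    "STOPS AFTER 100 YARDS": "Severe breathlessness",
    "STOPS AFTER A FEW MINUTES": "Severe breathlessness",
    "UNABLE TO LEAVE HOME": "Severe breathlessness",
    "BREATHLESS WHEN DRESSING": "Severe breathlessness",
}


def categorize_dyspnea(description):
    """Return the category for a description, ignoring case and outer spaces."""
    key = description.upper().strip()
    try:
        return DYSPNEA_CATEGORY[key]
    except KeyError:
        raise ValueError(f"Unknown dyspnea description: {description!r}") from None


print(categorize_dyspnea(" when hurrying "))
```

A dictionary keeps all assignments in one place, which makes it easier to review them against the metric definition.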


### Step 5: Main business logic

Call BODE Score, BODE Risk functions for each patient.

For each hospital, calculate Avg BODE score and Avg BODE risk and count the number of cases for each hospital.

In [61]:
patient_csv = "patient.csv"
hospital_json = "hospitals.json"

patient_output_file = "patient_output.csv"
hospital_output_file = "hospital_output.csv"

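# Load hospital data (load_hospital_data returns None if the file could not be read)
hospital_data = load_hospital_data(hospital_json)
if hospital_data is None:
    raise ValueError(f"Could not load hospital data from '{hospital_json}'.")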

# Initialize the hospital metrics dictionary using the hospital names from the JSON data
hospital_metrics = {}
for entry in hospital_data:
    # Iterate over the hospitals list within the entry
    for hospital in entry['hospitals']:
        hospital_metrics[hospital['name']] = {
            'total_bode_score': 0,
            'total_risk': 0,
            'copd_count': 0,
            'beds': hospital['beds']
        }

patient_results = []

# Read patient data from the CSV file
with open(patient_csv, 'r') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        name = row['NAME']
        ssn = row['SSN']
        language = row['LANGUAGE']
        job = row['JOB']
        height_m = float(row['HEIGHT_M'])
        weight_kg = float(row['WEIGHT_KG'])
        fev_pct = float(row['fev_pct'])
        dyspnea_description = row['dyspnea_description']
        distance_in_meters = float(row['distance_in_meters'])
        hospital_name = row['hospital']

        # Calculate BMI, BODE score, and BODE risk
        bmi = compute_body_mass_index(weight_kg, height_m)
        bode_score = calculate_bode_score(bmi, fev_pct, dyspnea_description, distance_in_meters)
        bode_risk = calculate_bode_risk(bode_score)

        # Add patient results
        patient_results.append([name, bode_score, bode_risk, hospital_name])

        # Update hospital metrics
        if hospital_name in hospital_metrics:
            hospital_metrics[hospital_name]['total_bode_score'] += bode_score
            hospital_metrics[hospital_name]['total_risk'] += bode_risk
            hospital_metrics[hospital_name]['copd_count'] += 1

hospital_output_list = []

# Calculate hospital metrics
for hospital_name, metrics in hospital_metrics.items():
    copd_count = metrics['copd_count']
    avg_bode_score = metrics['total_bode_score'] / copd_count if copd_count > 0 else 0
    avg_bode_risk = metrics['total_risk'] / copd_count if copd_count > 0 else 0
    pct_of_copd_cases = (copd_count / metrics['beds']) * 100 if metrics['beds'] > 0 else 0
    hospital_output_list.append([hospital_name, copd_count, pct_of_copd_cases, avg_bode_score, avg_bode_risk])

# Write patient_output.csv
with open(patient_output_file, 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["NAME", "BODE_SCORE", "BODE_RISK", "HOSPITAL"])
    writer.writerows(patient_results)

# Write hospital_output.csv
with open(hospital_output_file, 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["HOSPITAL_NAME", "COPD_COUNT", "PCT_OF_COPD_CASES_OVER_BEDS", "AVG_SCORE", "AVG_RISK"])
    writer.writerows(hospital_output_list)
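
The per-hospital arithmetic can be checked by hand on a tiny example. The sketch below runs the same accumulate-then-average steps on three made-up patients and writes the rows to an in-memory buffer instead of a file. All patient names, hospital names, and bed counts are hypothetical.

```python
import csv
import io

# Hypothetical per-patient results: (name, BODE score, BODE risk, hospital).
patients = [
    ("Patient 1", 2, 20, "Hospital A"),
    ("Patient 2", 8, 82, "Hospital A"),
    ("Patient 3", 4, 33, "Hospital B"),
]
beds = {"Hospital A": 100, "Hospital B": 50}

# Accumulate totals per hospital.
totals = {name: {"score": 0, "risk": 0, "count": 0} for name in beds}
for _, score, risk, hospital in patients:
    totals[hospital]["score"] += score
    totals[hospital]["risk"] += risk
    totals[hospital]["count"] += 1

# Turn totals into averages and a cases-per-bed percentage.
rows = []
for hospital, t in totals.items():
    count = t["count"]
    rows.append([
        hospital,
        count,
        count / beds[hospital] * 100,
        t["score"] / count if count else 0,
        t["risk"] / count if count else 0,
    ])

# Write to an in-memory buffer so the sketch leaves no files behind.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["HOSPITAL_NAME", "COPD_COUNT", "PCT_OF_COPD_CASES_OVER_BEDS",
                 "AVG_SCORE", "AVG_RISK"])
writer.writerows(rows)
print(buffer.getvalue())
```

For Hospital A the two patients average to a score of 5.0 and a risk of 51.0, which is easy to confirm by hand: (2 + 8) / 2 and (20 + 82) / 2.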
