
### Mid-term for HDS5210

Your supervisor is concerned about 4-year survival risks for COPD. She has asked you to do some analysis using a new metric, BODE. BODE is an improvement on a previous metric and promises to provide insight into survival risks.

BODE is defined here: https://www.mdcalc.com/calc/3916/bode-index-copd-survival#evidence

Your assignment is to create a BODE calculation, use it to calculate BODE scores and BODE survival rates for a group of patients. Then we want to evaluate the average BODE scores and BODE survival rates for each area hospital.

Your patient input file will have the following columns:
NAME,SSN,LANGUAGE,JOB,HEIGHT_M,WEIGHT_KG,fev_pct,dyspnea_description,distance_in_meters,hospital
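
A minimal sketch of reading that layout with `csv.DictReader` and converting the numeric columns, assuming the headers are exactly as listed above. `parse_patient_row`, `NUMERIC_COLUMNS`, and the one-row sample are illustrative names and data, not part of the assignment. `float()` already raises `ValueError` on non-numeric text, which fits the requirement below to raise a ValueError for bad input.

```python
import csv
import io

# Columns that must be converted from text to numbers (assumed from the header list)
NUMERIC_COLUMNS = ["HEIGHT_M", "WEIGHT_KG", "fev_pct", "distance_in_meters"]

def parse_patient_row(row):
    """Return a copy of a DictReader row with the numeric columns as floats.

    Raises ValueError if a numeric column is missing, blank, or not a number.
    """
    parsed = dict(row)
    for column in NUMERIC_COLUMNS:
        value = row.get(column)
        if value is None or not value.strip():
            raise ValueError(f"Missing value for {column} in row for {row.get('NAME')!r}")
        parsed[column] = float(value)  # float() raises ValueError on bad text
    return parsed

# Hypothetical one-patient sample in the documented column layout
sample = (
    "NAME,SSN,LANGUAGE,JOB,HEIGHT_M,WEIGHT_KG,fev_pct,dyspnea_description,distance_in_meters,hospital\n"
    "Test Patient,000-00-0000,English,Clerk,1.72,90.28,57.73,STOPS AFTER A FEW MINUTES,367.9,ST.LUKE'S\n"
)
patients = [parse_patient_row(row) for row in csv.DictReader(io.StringIO(sample))]
```

Note that the hospital column is `hospital` in lowercase, so `row['HOSPITAL']` will not find it.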

BODE calculations require a BMI value, so you will have to create a function for it.
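
For reference, BMI is body mass divided by height squared. Worked through for the first patient row printed in the Step 4 output (1.72 m, 90.28 kg):

```latex
\mathrm{BMI} = \frac{m}{h^{2}}
= \frac{90.28\ \text{kg}}{(1.72\ \text{m})^{2}}
= \frac{90.28}{2.9584}
\approx 30.52\ \text{kg/m}^{2}
```

Since 30.52 is above 21, that patient's BMI contributes 0 points to the BODE score.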

Your output should be in the form of two CSV files, patient_output.csv and hospital_output.csv.

Patient_output will have the following columns:
NAME,BODE_SCORE,BODE_RISK,HOSPITAL

Hospital output will have the following columns:
HOSPITAL_NAME, COPD_COUNT, PCT_OF_COPD_CASES_OVER_BEDS, AVG_SCORE, AVG_RISK
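
A sketch of writing both files with `csv.DictWriter`, assuming the column names above. Passing the header list explicitly (rather than taking it from the first row) keeps the column order fixed and still writes a header when there are no rows. `rows_to_csv_text` and the sample row are hypothetical; the function writes to an in-memory `io.StringIO` buffer only so the example is self-contained. For the real files, open the path with `open(path, 'w', newline='')` instead.

```python
import csv
import io

PATIENT_FIELDS = ["NAME", "BODE_SCORE", "BODE_RISK", "HOSPITAL"]
HOSPITAL_FIELDS = ["HOSPITAL_NAME", "COPD_COUNT", "PCT_OF_COPD_CASES_OVER_BEDS", "AVG_SCORE", "AVG_RISK"]

def rows_to_csv_text(rows, fieldnames):
    """Serialize dict rows to CSV text with a fixed header order.

    DictWriter's default extrasaction="raise" makes a misspelled or extra
    key fail with ValueError instead of silently producing a wrong file.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical row, only to show the layout
patient_csv_text = rows_to_csv_text(
    [{"NAME": "Test Patient", "BODE_SCORE": 3, "BODE_RISK": 33, "HOSPITAL": "BJC"}],
    PATIENT_FIELDS,
)
```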

Each function you create should have documentation and a suitable number of test cases. If the input data could be wrong, make sure to raise a ValueError.

For this assignment, use the doctest, json, and csv libraries. Pandas is not allowed for this assignment.
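
One way to run docstring tests from inside a notebook cell, sketched with a small illustrative function `bmi_points` that is not part of the assignment. `doctest.testmod()` would check every docstring in the notebook's namespace at once; the runner below checks a single function and returns the failure and attempt counts.

```python
import doctest

def bmi_points(bmi):
    """Return the BODE points for a body-mass index (1 if BMI <= 21, else 0).

    >>> bmi_points(25.0)
    0
    >>> bmi_points(21.0)
    1
    """
    return 0 if bmi > 21 else 1

# Collect and run only this function's examples; globs supplies the names
# the examples need, so this works even outside a normal module.
runner = doctest.DocTestRunner(verbose=False)
for test in doctest.DocTestFinder().find(bmi_points, globs={"bmi_points": bmi_points}):
    runner.run(test)
results = runner.summarize(verbose=False)
```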

In [None]:
import doctest
import json
import csv

### Step 1: Calculate BMI

In [None]:
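def compute_body_mass_index(mass_kg, height_meters):
    """
    Calculates Body Mass Index (BMI) as mass divided by height squared.

    Args:
        mass_kg (float): Body mass in kilograms.
        height_meters (float): Height in meters.

    Returns:
        float: The calculated BMI, rounded to two decimal places.

    Raises:
        ValueError: If height_meters or mass_kg is not a positive number.

    >>> compute_body_mass_index(70, 1.75)
    22.86
    >>> compute_body_mass_index(90.28, 1.72)
    30.52
    >>> compute_body_mass_index(70, 0)
    Traceback (most recent call last):
        ...
    ValueError: Height and weight must be positive numbers.
    """
    # Convert inputs (e.g. CSV text) to numbers; bad values become ValueError
    try:
        mass_kg = float(mass_kg)
        height_meters = float(height_meters)
    except (TypeError, ValueError):
        raise ValueError("Height and weight must be numbers.")
    if height_meters <= 0 or mass_kg <= 0:
        raise ValueError("Height and weight must be positive numbers.")
    bmi_value = mass_kg / (height_meters ** 2)
    return round(bmi_value, 2)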


### Step 2: Calculate BODE Score

In [None]:
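def calculate_bode_index(bmi_val, fev_percentage, breathlessness, walk_distance_m):
    """
    Calculates the BODE index (0-10) from BMI, FEV1, dyspnea and exercise capacity.

    Args:
        bmi_val (float): Body mass index.
        fev_percentage (float): FEV1 as a percentage of the predicted value.
        breathlessness (str): Dyspnea description (case-insensitive).
        walk_distance_m (float): 6-minute walk distance in meters.

    Returns:
        int: The total BODE score.

    Raises:
        ValueError: If BMI is not positive, FEV1 % or walk distance is negative,
            or the dyspnea description is not recognized.

    >>> calculate_bode_index(30.52, 57.73, 'STOPS AFTER A FEW MINUTES', 367.9)
    3
    >>> calculate_bode_index(25.0, 70.0, 'only strenuous exercise', 400.0)
    0
    >>> calculate_bode_index(21.0, 30.0, 'unable to leave home', 100.0)
    10
    >>> calculate_bode_index(25.0, 70.0, 'sometimes', 400.0)
    Traceback (most recent call last):
        ...
    ValueError: Invalid dyspnea description: sometimes
    """
    if bmi_val <= 0 or fev_percentage < 0 or walk_distance_m < 0:
        raise ValueError("BMI must be positive; FEV1 % and walk distance cannot be negative.")

    # BMI points: 0 if BMI > 21, 1 if BMI <= 21
    bmi_score = 0 if bmi_val > 21 else 1

    # FEV1 (% predicted) points: >=65 -> 0, 50-64 -> 1, 36-49 -> 2, <36 -> 3
    if fev_percentage >= 65:
        fev1_score = 0
    elif fev_percentage >= 50:
        fev1_score = 1
    elif fev_percentage >= 36:
        fev1_score = 2
    else:
        fev1_score = 3

    # Dyspnea points: BODE scores mMRC grades 0-1 as 0, grade 2 as 1,
    # grade 3 as 2 and grade 4 as 3, so this component is at most 3.
    dyspnea_scores = {
        "only strenuous exercise": 0,
        "when hurrying or walking up a slight hill": 0,
        "slower than peers": 1,
        "stops after 100 yards": 2,
        "stops after a few minutes": 2,
        "unable to leave home": 3
    }

    if not isinstance(breathlessness, str):
        raise ValueError(f"Dyspnea description must be text: {breathlessness!r}")
    breathlessness = breathlessness.lower().strip()

    if breathlessness in dyspnea_scores:
        dyspnea_score = dyspnea_scores[breathlessness]
    else:
        raise ValueError(f"Invalid dyspnea description: {breathlessness}")

    # 6-minute walk distance points: >=350 -> 0, 250-349 -> 1, 150-249 -> 2, <150 -> 3
    if walk_distance_m >= 350:
        walk_score = 0
    elif walk_distance_m >= 250:
        walk_score = 1
    elif walk_distance_m >= 150:
        walk_score = 2
    else:
        walk_score = 3

    # Total BODE score (0-10)
    total_bode_score = bmi_score + fev1_score + dyspnea_score + walk_score
    return total_bode_score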
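
The FEV1 and walk-distance cut-points above can also be kept as data and looked up with the standard-library `bisect` module, which makes the band edges easy to audit. This is an alternative sketch, not the notebook's code; `FEV1_BOUNDS`, `WALK_BOUNDS` and `banded_points` are illustrative names, and the bounds mirror the ones used in `calculate_bode_index`.

```python
from bisect import bisect_right

# Lower bounds of each band in ascending order (assumed to match calculate_bode_index)
FEV1_BOUNDS = [36, 50, 65]    # <36 -> 3, 36-49 -> 2, 50-64 -> 1, >=65 -> 0
WALK_BOUNDS = [150, 250, 350]  # <150 -> 3, 150-249 -> 2, 250-349 -> 1, >=350 -> 0

def banded_points(value, bounds):
    """Return BODE points: the number of bounds minus how many bounds value reaches."""
    if value < 0:
        raise ValueError(f"Value cannot be negative: {value}")
    # bisect_right counts the bounds that are <= value
    return len(bounds) - bisect_right(bounds, value)
```

With this layout, changing a threshold means editing one list rather than several `elif` branches.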



### Step 3: Calculate BODE Risk

In [None]:
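def calculate_bode_risk(bode_score):
    """
    Returns the approximate 4-year mortality risk (%) for a BODE score,
    i.e. 100 minus the 4-year survival rate for the score's band.

    Args:
        bode_score (int): A BODE score from 0 to 10.

    Returns:
        int: The risk percentage.

    Raises:
        ValueError: If bode_score is outside the 0-10 bands.

    >>> calculate_bode_risk(0)
    20
    >>> calculate_bode_risk(4)
    33
    >>> calculate_bode_risk(10)
    82
    >>> calculate_bode_risk(11)
    Traceback (most recent call last):
        ...
    ValueError: Invalid BODE score: 11
    """
    # Determine the survival rate based on the BODE score
    if 0 <= bode_score <= 2:
        survival_rate = 80  # 80% survival, 20% risk
    elif 3 <= bode_score <= 4:
        survival_rate = 67  # 67% survival, 33% risk
    elif 5 <= bode_score <= 6:
        survival_rate = 57  # 57% survival, 43% risk
    elif 7 <= bode_score <= 10:
        survival_rate = 18  # 18% survival, 82% risk
    else:
        raise ValueError(f"Invalid BODE score: {bode_score}")

    # Calculate the risk percentage
    risk_percentage = 100 - survival_rate
    return risk_percentage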
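
The same score-to-survival bands can be held in a small table so that survival and risk come from one lookup. A sketch only: `SURVIVAL_BANDS` and `bode_survival_and_risk` are hypothetical names, and the percentages are the ones `calculate_bode_risk` uses (4-year survival of 80, 67, 57 and 18 percent).

```python
# (lowest score, highest score, 4-year survival %) for each inclusive band,
# matching the values in calculate_bode_risk
SURVIVAL_BANDS = [
    (0, 2, 80),
    (3, 4, 67),
    (5, 6, 57),
    (7, 10, 18),
]

def bode_survival_and_risk(score):
    """Return (survival_pct, risk_pct) for an integer BODE score from 0 to 10."""
    # BODE scores are whole numbers; reject bools and floats such as 3.5
    if isinstance(score, bool) or not isinstance(score, int):
        raise ValueError(f"BODE score must be an integer: {score!r}")
    for low, high, survival in SURVIVAL_BANDS:
        if low <= score <= high:
            return survival, 100 - survival
    raise ValueError(f"Invalid BODE score: {score}")
```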


### Step 4: Load Hospital Data

In [None]:
from google.colab import files
uploaded = files.upload()


Saving hospitals (1).json to hospitals (1).json
Saving patient (1).csv to patient (1).csv


In [None]:
def load_hospital_data(json_file_path):
    """
    Loads hospital systems from a JSON file and flattens them into a
    dictionary keyed by hospital name.

    Args:
        json_file_path (str): Path to the hospitals JSON file.

    Returns:
        dict: {hospital_name: {'system': system_name, 'beds': bed_count}}

    Raises:
        FileNotFoundError: If the file does not exist.
        ValueError: If the file is not valid JSON or is not a list of systems.
    """
    print(f"Trying to open file: {json_file_path}")
    try:
        # Open the JSON file and load its contents
        with open(json_file_path, 'r') as file:
            hospital_data = json.load(file)
    except FileNotFoundError:
        raise FileNotFoundError(f"The file {json_file_path} was not found.")
    except json.JSONDecodeError:
        raise ValueError(f"Failed to decode JSON from {json_file_path}")
    print("Loaded Hospital Data:", hospital_data)

    # The top level should be a list of hospital systems
    if not isinstance(hospital_data, list):
        raise ValueError("Hospital data should be a list of systems")

    # Map each hospital name to its system and bed count
    hospital_info = {}
    for system in hospital_data:
        system_name = system.get('system')
        for hospital in system.get('hospitals', []):
            name = hospital.get('name')
            beds = hospital.get('beds')
            hospital_info[name] = {'system': system_name, 'beds': beds}

    return hospital_info

# Specify the path to the uploaded file.
# Replace 'hospitals (1).json' with the actual name of your uploaded file.
hospital_json = 'hospitals (1).json'

# Load the hospital data from the specified JSON file
hospital_info = load_hospital_data(hospital_json)

# Print the loaded hospital information
print(hospital_info)



Trying to open file: hospitals (1).json
Loaded Hospital Data: [{'system': 'BJC', 'hospitals': [{'name': 'BJC', 'beds': 2000}, {'name': 'BJC WEST COUNTY', 'beds': 1000}, {'name': 'MISSOURI BAPTIST', 'beds': 800}]}, {'system': 'SSM', 'hospitals': [{'name': 'SAINT LOUIS UNIVERSITY', 'beds': 1000}, {'name': "ST.MARY'S", 'beds': 500}]}, {'system': "ST.LUKE'S", 'hospitals': [{'name': "ST.LUKE'S", 'beds': 800}]}]
{'BJC': {'system': 'BJC', 'beds': 2000}, 'BJC WEST COUNTY': {'system': 'BJC', 'beds': 1000}, 'MISSOURI BAPTIST': {'system': 'BJC', 'beds': 800}, 'SAINT LOUIS UNIVERSITY': {'system': 'SSM', 'beds': 1000}, "ST.MARY'S": {'system': 'SSM', 'beds': 500}, "ST.LUKE'S": {'system': "ST.LUKE'S", 'beds': 800}}
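
The flattening above can be combined with basic validation so that a bad bed count is caught when the JSON is loaded rather than later, when percentages are computed. A minimal sketch under that assumption; `flatten_hospitals` is a hypothetical helper that takes JSON text instead of a path, and the sample reuses two hospitals from the printed output. `json.loads` raises `json.JSONDecodeError`, which is a subclass of `ValueError`.

```python
import json

def flatten_hospitals(json_text):
    """Map hospital name -> {'system', 'beds'} from the nested systems JSON.

    Raises ValueError for invalid JSON, an unexpected structure,
    a missing name, a non-positive bed count, or a duplicate hospital.
    """
    systems = json.loads(json_text)  # JSONDecodeError is a ValueError
    if not isinstance(systems, list):
        raise ValueError("Hospital data should be a list of systems")
    hospitals = {}
    for system in systems:
        if not isinstance(system, dict):
            raise ValueError(f"Invalid system entry: {system!r}")
        for hospital in system.get("hospitals", []):
            name = hospital.get("name")
            beds = hospital.get("beds")
            # Reject entries that would break the cases-over-beds percentage
            if not name or not isinstance(beds, int) or beds <= 0:
                raise ValueError(f"Invalid hospital entry: {hospital!r}")
            if name in hospitals:
                raise ValueError(f"Duplicate hospital name: {name}")
            hospitals[name] = {"system": system.get("system"), "beds": beds}
    return hospitals

sample = """[{"system": "SSM", "hospitals": [{"name": "SAINT LOUIS UNIVERSITY", "beds": 1000}, {"name": "ST.MARY'S", "beds": 500}]}]"""
```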


In [None]:
import json
import csv

# Step 1: Load Hospital Data from JSON File
def load_hospital_data(json_file_path):
    """
    Loads the hospital JSON and returns {hospital_name: {'system', 'beds', 'patients'}},
    with an empty 'patients' list for each hospital.

    Raises:
        FileNotFoundError: If the file does not exist.
        ValueError: If the file is not valid JSON or is not a list of systems.
    """
    print(f"Trying to open file: {json_file_path}")
    try:
        # Open the JSON file and load the hospital data
        with open(json_file_path, 'r') as file:
            hospital_data = json.load(file)
    except FileNotFoundError:
        raise FileNotFoundError(f"The file {json_file_path} was not found.")
    except json.JSONDecodeError:
        raise ValueError(f"Failed to decode JSON from {json_file_path}")
    print("Loaded Hospital Data:", hospital_data)

    # Check that the hospital data is a list of systems, as expected
    if not isinstance(hospital_data, list):
        raise ValueError("Hospital data should be a list of systems")

    # Create a dictionary to store hospital info
    hospital_info = {}
    for system in hospital_data:
        system_name = system.get('system')

        # Loop through each hospital in the system
        for hospital in system.get('hospitals', []):
            name = hospital.get('name')
            beds = hospital.get('beds')
            # Store hospital name, system, bed count, and an empty patient list
            hospital_info[name] = {'system': system_name, 'beds': beds, 'patients': []}

    return hospital_info

# Step 2: Process Patient Data from CSV File
def process_patient_data(patient_csv, hospital_info):
    """
    Computes each patient's BMI, BODE score and BODE risk and attaches them
    to the patient's hospital in hospital_info.

    The patient CSV has no BODE columns, so the values are calculated from
    HEIGHT_M, WEIGHT_KG, fev_pct, dyspnea_description and distance_in_meters
    with the Step 1-3 functions. The hospital column header is lowercase.

    Raises:
        FileNotFoundError: If the CSV file does not exist.
        ValueError: If a numeric field is invalid or a dyspnea description is unknown.
    """
    patient_results = []  # To store patient information
    hospital_patient_counts = {}  # To count patients per hospital

    try:
        # Open the patient CSV file and read it
        with open(patient_csv, 'r', newline='') as file:
            reader = csv.DictReader(file)
            print("CSV Headers:", reader.fieldnames)

            # Loop through each row (patient) in the CSV file
            for row in reader:
                print("Row Data:", row)

                hospital_name = (row.get('hospital') or '').strip()
                if hospital_name not in hospital_info:
                    print(f"Warning: Hospital '{hospital_name}' not found in hospital data")
                    continue

                # float() raises ValueError for blank or non-numeric fields
                bmi = compute_body_mass_index(float(row['WEIGHT_KG']), float(row['HEIGHT_M']))
                bode_score = calculate_bode_index(bmi, float(row['fev_pct']),
                                                  row['dyspnea_description'],
                                                  float(row['distance_in_meters']))
                bode_risk = calculate_bode_risk(bode_score)

                # Append patient data to the corresponding hospital
                hospital_info[hospital_name]['patients'].append({'bode_score': bode_score, 'bode_risk': bode_risk})

                # Store patient results in the list
                patient_results.append({
                    'NAME': row['NAME'],
                    'BODE_SCORE': bode_score,
                    'BODE_RISK': bode_risk,
                    'HOSPITAL': hospital_name,
                })

                # Count the number of patients per hospital
                hospital_patient_counts[hospital_name] = hospital_patient_counts.get(hospital_name, 0) + 1

    except FileNotFoundError:
        raise FileNotFoundError(f"The file {patient_csv} was not found.")

    return patient_results, hospital_patient_counts

# Step 3: Calculate Hospital Statistics
def calculate_hospital_statistics(hospital_info):
    output_data = []  # To store the statistics for each hospital

    # Loop through each hospital in hospital info
    for hospital_name, data in hospital_info.items():
        patient_count = len(data['patients'])  # Number of patients
        total_beds = data['beds']  # Total number of beds in the hospital

        # Calculate the average BODE score and BODE risk
        avg_bode_score = sum(p['bode_score'] for p in data['patients']) / patient_count if patient_count > 0 else 0
        avg_bode_risk = sum(p['bode_risk'] for p in data['patients']) / patient_count if patient_count > 0 else 0

        # Calculate the percentage of COPD cases over available beds
        pct_copd_cases_over_beds = (patient_count / total_beds) * 100 if total_beds > 0 else 0

        # Append the calculated statistics to the output list
        output_data.append({
            'HOSPITAL_NAME': hospital_name,
            'COPD_COUNT': patient_count,
            'PCT_OF_COPD_CASES_OVER_BEDS': pct_copd_cases_over_beds,
            'AVG_SCORE': avg_bode_score,
            'AVG_RISK': avg_bode_risk
        })

    return output_data

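# Step 4: Write Hospital Statistics to a CSV File
def write_hospital_output(output_data, output_file):
    """Writes hospital rows with the column order required for hospital_output.csv."""
    # Explicit fieldnames keep the order fixed and still work if output_data is empty
    fieldnames = ['HOSPITAL_NAME', 'COPD_COUNT', 'PCT_OF_COPD_CASES_OVER_BEDS', 'AVG_SCORE', 'AVG_RISK']
    with open(output_file, 'w', newline='') as file:
        writer = csv.DictWriter(file, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(output_data)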

# Step 5: Execute the functions with your file paths
hospital_json = 'hospitals (1).json'  # Use the correct file path
patient_csv = 'patient (1).csv'  # Use the correct file path
hospital_output_csv = 'hospital_output.csv'  # Output file for the results

# Load hospital data
hospital_info = load_hospital_data(hospital_json)

# Process patient data and update hospital info
process_patient_data(patient_csv, hospital_info)

# Calculate statistics based on hospital info
hospital_statistics = calculate_hospital_statistics(hospital_info)

# Write the hospital statistics to an output CSV file
write_hospital_output(hospital_statistics, hospital_output_csv)

print("Hospital data processed and output written to CSV successfully.")


Trying to open file: hospitals (1).json
Loaded Hospital Data: [{'system': 'BJC', 'hospitals': [{'name': 'BJC', 'beds': 2000}, {'name': 'BJC WEST COUNTY', 'beds': 1000}, {'name': 'MISSOURI BAPTIST', 'beds': 800}]}, {'system': 'SSM', 'hospitals': [{'name': 'SAINT LOUIS UNIVERSITY', 'beds': 1000}, {'name': "ST.MARY'S", 'beds': 500}]}, {'system': "ST.LUKE'S", 'hospitals': [{'name': "ST.LUKE'S", 'beds': 800}]}]
CSV Headers: ['NAME', 'SSN', 'LANGUAGE', 'JOB', 'HEIGHT_M', 'WEIGHT_KG', 'fev_pct', 'dyspnea_description', 'distance_in_meters', 'hospital']
Row Data: {'NAME': 'Vanessa Roberts', 'SSN': '295-82-3703', 'LANGUAGE': 'Belarusian', 'JOB': 'Teacher English as a foreign language', 'HEIGHT_M': '1.72', 'WEIGHT_KG': '90.28', 'fev_pct': '57.73', 'dyspnea_description': 'STOPS AFTER A FEW MINUTES', 'distance_in_meters': '367.9', 'hospital': "ST.LUKE'S"}
Row Data: {'NAME': 'Christopher Fox', 'SSN': '286-30-9664', 'LANGUAGE': 'Macedonian', 'JOB': 'Local government officer', 'HEIGHT_M': '1.64', 'WEI

### Step 5: Main business logic

Call the BMI, BODE score, and BODE risk functions for each patient.

For each hospital, calculate Avg BODE score and Avg BODE risk and count the number of cases for each hospital.
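
The per-hospital aggregation can be done in one pass with running totals. A sketch under stated assumptions: `summarize_hospitals` is a hypothetical helper, `beds_by_hospital` maps each hospital name to a positive bed count, and patient rows use the patient_output.csv keys. As a design choice it raises `ValueError` for a patient at an unknown hospital, where the notebook prints a warning and skips the row.

```python
def summarize_hospitals(patients, beds_by_hospital):
    """Return hospital_output rows: case count, % of beds, average score and risk."""
    # Running totals per hospital, including hospitals with no COPD cases
    totals = {name: {"count": 0, "score": 0, "risk": 0} for name in beds_by_hospital}
    for patient in patients:
        name = patient["HOSPITAL"]
        if name not in totals:
            raise ValueError(f"Unknown hospital: {name}")
        totals[name]["count"] += 1
        totals[name]["score"] += patient["BODE_SCORE"]
        totals[name]["risk"] += patient["BODE_RISK"]

    rows = []
    for name, t in totals.items():
        n = t["count"]
        rows.append({
            "HOSPITAL_NAME": name,
            "COPD_COUNT": n,
            "PCT_OF_COPD_CASES_OVER_BEDS": round(100 * n / beds_by_hospital[name], 2),
            "AVG_SCORE": round(t["score"] / n, 2) if n else 0,
            "AVG_RISK": round(t["risk"] / n, 2) if n else 0,
        })
    return rows
```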

In [None]:
import json
import csv

# Function to load hospital data from JSON file
def load_hospital_data(json_file_path):
    """
    Returns {hospital_name: {'system', 'beds', 'patients': []}} from the JSON file.

    Raises:
        FileNotFoundError: If the file does not exist.
        ValueError: If the file is not valid JSON, so later steps never run
            on an empty hospital dictionary.
    """
    try:
        with open(json_file_path, 'r') as file:
            hospital_data = json.load(file)
    except FileNotFoundError:
        raise FileNotFoundError(f"File not found: {json_file_path}")
    except json.JSONDecodeError:
        raise ValueError(f"Error decoding JSON: {json_file_path}")

    # Create a dictionary to store hospital info (key: hospital name, value: data)
    hospital_info = {}
    for system in hospital_data:
        for hospital in system.get('hospitals', []):
            hospital_info[hospital['name']] = {
                'system': system.get('system'),
                'beds': hospital.get('beds'),
                'patients': []  # Filled in by process_patient_data
            }
    return hospital_info

# Function to compute BODE values for each patient and link them to hospital info
def process_patient_data(patient_csv_file, hospital_info):
    """
    Calculates BMI, BODE score and BODE risk for every patient in the CSV.

    The input file has no BODE columns, so the values are computed from
    HEIGHT_M, WEIGHT_KG, fev_pct, dyspnea_description and distance_in_meters
    using the Step 1-3 functions. The hospital column header is lowercase.

    Returns:
        tuple: (patient_results, hospital_patient_counts); each patient result
        has the keys NAME, BODE_SCORE, BODE_RISK and HOSPITAL.

    Raises:
        ValueError: If a numeric field is missing or invalid, or the dyspnea
            description is not recognized.
    """
    patient_results = []  # To store patient data
    hospital_patient_counts = {}  # To count patients per hospital

    with open(patient_csv_file, 'r', newline='') as file:
        reader = csv.DictReader(file)

        for row in reader:
            hospital_name = (row.get('hospital') or '').strip()
            if hospital_name not in hospital_info:
                print(f"Warning: Hospital '{hospital_name}' not found in hospital info.")
                continue

            # float() raises ValueError for blank or non-numeric fields
            bmi = compute_body_mass_index(float(row['WEIGHT_KG']), float(row['HEIGHT_M']))
            bode_score = calculate_bode_index(bmi, float(row['fev_pct']),
                                              row['dyspnea_description'],
                                              float(row['distance_in_meters']))
            bode_risk = calculate_bode_risk(bode_score)

            # Attach the patient to their hospital for the averages
            hospital_info[hospital_name]['patients'].append({'bode_score': bode_score, 'bode_risk': bode_risk})

            # Add to patient results in the patient_output.csv layout
            patient_results.append({
                'NAME': row['NAME'],
                'BODE_SCORE': bode_score,
                'BODE_RISK': bode_risk,
                'HOSPITAL': hospital_name,
            })

            # Track patient counts by hospital
            hospital_patient_counts[hospital_name] = hospital_patient_counts.get(hospital_name, 0) + 1

    return patient_results, hospital_patient_counts

# Function to calculate statistics for each hospital
def calculate_hospital_statistics(hospital_info):
    """
    Returns one row per hospital with the hospital_output.csv columns:
    HOSPITAL_NAME, COPD_COUNT, PCT_OF_COPD_CASES_OVER_BEDS, AVG_SCORE, AVG_RISK.
    Averages are 0 for hospitals with no COPD patients.
    """
    hospital_statistics = []  # To store hospital statistics

    for hospital_name, data in hospital_info.items():
        patient_count = len(data['patients'])
        total_beds = data['beds'] or 0  # Treat a missing bed count as 0

        avg_bode_score = sum(p['bode_score'] for p in data['patients']) / patient_count if patient_count > 0 else 0
        avg_bode_risk = sum(p['bode_risk'] for p in data['patients']) / patient_count if patient_count > 0 else 0

        # COPD cases as a percentage of the hospital's total beds
        pct_copd_cases_over_beds = (patient_count / total_beds) * 100 if total_beds > 0 else 0

        hospital_statistics.append({
            'HOSPITAL_NAME': hospital_name,
            'COPD_COUNT': patient_count,
            'PCT_OF_COPD_CASES_OVER_BEDS': round(pct_copd_cases_over_beds, 2),
            'AVG_SCORE': round(avg_bode_score, 2),
            'AVG_RISK': round(avg_bode_risk, 2),
        })

    return hospital_statistics

# Function to write data to a CSV file
def write_to_csv(output_file, data, fieldnames):
    with open(output_file, 'w', newline='') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(data)

# Main logic
hospital_json = "hospitals (1).json"  # Path to hospital JSON file
patient_csv = "patient (1).csv"  # Path to patient CSV file
patient_output_file = "patient_output.csv"  # Output file for patient data
hospital_output_file = "hospital_output.csv"  # Output file for hospital statistics

# Load hospitals, compute BODE values for each patient, then aggregate per hospital
hospital_info = load_hospital_data(hospital_json)
patient_results, hospital_patient_counts = process_patient_data(patient_csv, hospital_info)
hospital_statistics = calculate_hospital_statistics(hospital_info)

# Write patient_output.csv with the required column order
write_to_csv(patient_output_file, patient_results, ['NAME', 'BODE_SCORE', 'BODE_RISK', 'HOSPITAL'])

# Write hospital_output.csv with the required column order
write_to_csv(hospital_output_file, hospital_statistics,
             ['HOSPITAL_NAME', 'COPD_COUNT', 'PCT_OF_COPD_CASES_OVER_BEDS', 'AVG_SCORE', 'AVG_RISK'])

print("Patient and hospital statistics written to CSV files successfully.")



Patient and hospital statistics written to CSV files successfully.
