<a href="https://colab.research.google.com/github/saisrirao/HDS5210-02-Assignments/blob/main/midterm/midterm-2024.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# HDS5210-2024 Midterm

In the midterm, you're going to use all the programming and data management skills you've developed so far to build a risk calculator that pretends to be integrated with a clinical registry.  You'll compute the PRIEST COVID-19 Clinical Severity Score for a series of patients and, based on their risk of an adverse outcome, query a REST web service to find a hospital to transfer them to. The end result of your work will be a list of instructions on where each patient should be discharged given their risk and various characteristics of the patient.

Each step in the midterm will build up to form your complete solution.

**Make sure you write good docstrings and doctests along the way!!**

**The midterm is due at 11:59 PM CST on Monday, October 24th.**

---

### Step 1: Calculate PRIEST Clinical Severity Score

This scoring algorithm can be found [here on the MDCalc website](https://www.mdcalc.com/priest-covid-19-clinical-severity-score#evidence).  

1. You will need to write a function called **priest()** with the following input parameters.  
 * Sex (Gender assigned at birth)
 * Age in years
 * Respiratory rate in breaths per minute
 * Oxygen saturation as a percent between 0 and 1
 * Heart rate in beats per minute
 * Systolic BP in mmHg
 * Temperature in degrees C
 * Alertness as a string description
 * Inspired Oxygen as as string description
 * Performance Status as a string description
2. The function will need to follow the algorithm provided on the MDCalc website to compute a risk percentage that should be returned as a numeric value between 0 and 1.
3. Be sure to use docstring documentation and at least three built-in docstring test cases.
4. Assume that the input values that are strings could be any combination of upper or lower case. For example: 'male', 'Male', 'MALE', 'MalE' should all be interpretted by your code as male.
5. If any of the inputs are invalid (for example a sex value that is not recognizable as male or female) your code should return None.

NOTES:
1. In the final step there is a table that translates from **PRIEST Score** to **30-day probability of an outcome** but the last two probabilities are shown as ranges (59-88% and >99%).  Our code needs to output a single number, however. For our code, use the following rule:
 * If PRIEST score is between 17 and 25, the probability you return should be 0.59
 * If PRIEST score is greater than or equal to 26, the probability you return should be 0.99


In [1]:
def priest(sex, age, resp_rate, oxygen_sat, heart_rate, systolic_bp,
           temperature, alertness, inspired_oxygen, performance_status):
    """
    Calculate the PRIEST Clinical Severity Score and return 30-day probability of adverse outcome.

    Parameters:
        sex (str): Gender assigned at birth
            Accepted values (case-insensitive): 'male', 'female'
        age (float): Age in years
            Must be non-negative
        resp_rate (float): Respiratory rate in breaths per minute
            Typical range: 8-40
        oxygen_sat (float): Oxygen saturation as decimal between 0 and 1
            Example: 0.95 for 95% saturation
        heart_rate (float): Heart rate in beats per minute
            Typical range: 40-200
        systolic_bp (float): Systolic blood pressure in mmHg
            Typical range: 70-220
        temperature (float): Temperature in degrees Celsius
            Typical range: 35.0-42.0
        alertness (str): Mental status description (case-insensitive)
            Accepted values: 'alert', 'confusion', 'voice', 'pain', 'unresponsive'
        inspired_oxygen (str): Oxygen support level (case-insensitive)
            Accepted values: 'air', 'supplemental', 'high flow'
        performance_status (str): Activity level (case-insensitive)
            Accepted values: 'unrestricted', 'limited strenuous', 'limited activity',
                           'limited self care', 'bed bound'

    Returns:
        float or None: Probability of adverse outcome between 0 and 1, or None if inputs invalid
            - Returns specific probabilities: 0.01, 0.02, 0.03, 0.05, 0.13, 0.31, 0.59, 0.99
            - Returns None for any invalid input parameters

    Examples:
        >>> priest('male', 75, 22, 0.95, 90, 130, 37.5, 'alert', 'air', 'unrestricted')
        0.01

        >>> priest('female', 85, 28, 0.88, 120, 95, 38.5, 'confusion', 'supplemental', 'limited self care')
        0.59

        >>> priest('male', 68, 35, 0.85, 135, 85, 39.2, 'voice', 'high flow', 'bed bound')
        0.99

        # Test case-insensitive inputs
        >>> priest('MALE', 45, 20, 0.96, 85, 125, 37.0, 'ALERT', 'AIR', 'UNRESTRICTED')
        0.02

        # Test invalid inputs
        >>> priest('unknown', 75, 22, 0.95, 90, 130, 37.5, 'alert', 'air', 'unrestricted')
        None

        >>> priest('male', 75, 22, 1.5, 90, 130, 37.5, 'alert', 'air', 'unrestricted')
        None

        >>> priest('male', 75, 22, 0.95, 90, 130, 37.5, 'asleep', 'air', 'unrestricted')
        None
    """
    # [Rest of the function implementation remains the same as before]

    # Validate and standardize string inputs
    try:
        sex = sex.lower().strip()
        alertness = alertness.lower().strip()
        inspired_oxygen = inspired_oxygen.lower().strip()
        performance_status = performance_status.lower().strip()

        # Validate sex
        if sex not in ['male', 'female']:
            return None

        # Validate numeric ranges
        if not (0 <= oxygen_sat <= 1):
            return None

        # Validate categorical inputs
        if alertness not in ['alert', 'confusion', 'voice', 'pain', 'unresponsive']:
            return None
        if inspired_oxygen not in ['air', 'supplemental', 'high flow']:
            return None
        if performance_status not in ['unrestricted', 'limited strenuous', 'limited activity',
                                    'limited self care', 'bed bound']:
            return None

    except AttributeError:
        return None

    # Initialize score
    score = 0

    # Age scoring
    if age >= 80:
        score += 6
    elif age >= 70:
        score += 5
    elif age >= 60:
        score += 4
    elif age >= 50:
        score += 3
    elif age >= 40:
        score += 2
    elif age >= 30:
        score += 1

    # Sex scoring
    if sex == 'male':
        score += 1

    # Respiratory rate scoring
    if resp_rate >= 29:
        score += 5
    elif resp_rate >= 24:
        score += 4
    elif resp_rate >= 21:
        score += 3
    elif resp_rate <= 8:
        score += 3

    # Oxygen saturation scoring
    if oxygen_sat <= 0.91:
        score += 5
    elif oxygen_sat <= 0.93:
        score += 4
    elif oxygen_sat <= 0.95:
        score += 3

    # Heart rate scoring
    if heart_rate >= 130:
        score += 5
    elif heart_rate >= 120:
        score += 4
    elif heart_rate >= 110:
        score += 3
    elif heart_rate >= 100:
        score += 2
    elif heart_rate <= 49:
        score += 3

    # Systolic BP scoring
    if systolic_bp <= 89:
        score += 5
    elif systolic_bp <= 99:
        score += 4
    elif systolic_bp <= 109:
        score += 3
    elif systolic_bp >= 220:
        score += 3

    # Temperature scoring
    if temperature >= 39.1:
        score += 3
    elif temperature >= 38.1:
        score += 2
    elif temperature <= 35.0:
        score += 3

    # Alertness scoring
    alertness_scores = {
        'alert': 0,
        'confusion': 3,
        'voice': 4,
        'pain': 5,
        'unresponsive': 5
    }
    score += alertness_scores[alertness]

    # Inspired oxygen scoring
    oxygen_scores = {
        'air': 0,
        'supplemental': 2,
        'high flow': 4
    }
    score += oxygen_scores[inspired_oxygen]

    # Performance status scoring
    status_scores = {
        'unrestricted': 0,
        'limited strenuous': 1,
        'limited activity': 2,
        'limited self care': 3,
        'bed bound': 4
    }
    score += status_scores[performance_status]

    # Convert score to probability
    if score >= 26:
        return 0.99
    elif score >= 17:
        return 0.59
    elif score >= 15:
        return 0.31
    elif score >= 12:
        return 0.13
    elif score >= 9:
        return 0.05
    elif score >= 7:
        return 0.03
    elif score >= 5:
        return 0.02
    else:
        return 0.01

In [2]:
import doctest
doctest.run_docstring_examples(priest, globals(),verbose=True)


sys.settrace() should not be used when the debugger is being used.
This may cause the debugger to stop working correctly.
If this is needed, please check: 
http://pydev.blogspot.com/2007/06/why-cant-pydev-debugger-work-with.html
to see how to restore the debug tracing back correctly.
Call Location:
  File "/usr/lib/python3.10/doctest.py", line 1501, in run
    sys.settrace(save_trace)



Finding tests in NoName
Trying:
    priest('male', 75, 22, 0.95, 90, 130, 37.5, 'alert', 'air', 'unrestricted')
Expecting:
    0.01
**********************************************************************
File "__main__", line 35, in NoName
Failed example:
    priest('male', 75, 22, 0.95, 90, 130, 37.5, 'alert', 'air', 'unrestricted')
Expected:
    0.01
Got:
    0.13
Trying:
    priest('female', 85, 28, 0.88, 120, 95, 38.5, 'confusion', 'supplemental', 'limited self care')
Expecting:
    0.59
**********************************************************************
File "__main__", line 38, in NoName
Failed example:
    priest('female', 85, 28, 0.88, 120, 95, 38.5, 'confusion', 'supplemental', 'limited self care')
Expected:
    0.59
Got:
    0.99
Trying:
    priest('male', 68, 35, 0.85, 135, 85, 39.2, 'voice', 'high flow', 'bed bound')
Expecting:
    0.99
ok
Trying:
    priest('MALE', 45, 20, 0.96, 85, 125, 37.0, 'ALERT', 'AIR', 'UNRESTRICTED')
Expecting:
    0.02
**************************

## Part 2: Find a hospital

The next thing we have to do is figure out where to send this particular patient.  The guidelines on where to send a patient are based on their age (pediatric, adult, geriatric), sex, and risk percentage.  Luckily, you don't have to implement these rules. I already have. All you have to do is use a REST web service that I've created for you.

You'll want to use Python to make a call to my REST web service similar to the example URL below. The first part of the URL will be the same for everyone and every request that you make. What you will need to modify for each of your requests is the information after the question mark.

```
https://oumdj6oci2.execute-api.us-east-1.amazonaws.com/prd/?age=40&sex=male&risk_pct=0.1
```

The example above asks my web service where a 40-year old male with a risk of 10% should go.  What the web service will return back is a JSON string containing the information you need.  That JSON will look like this:

```json
{
  "age": "40",
  "sex": "male",
  "risk": "0.1",
  "hospital": "Southwest Hospital and Medical Center"
}
```

My function is not smart enough to understand `'MALE'` is the same as `'male'`.  You have to send it exactly `'male'` or `'female'`

1. Your job is to write a function called **find_hospital()** that takes age, sex, and risk as parameters.
2. Your function should call this REST web service using the `requests` module
3. Then your function will need to interpret the JSON it gets and return just the name of the hospital
4. If anything fails, return None
5. Include a good docstring with at least three test cases.


In [5]:
import requests

def find_hospital(age, sex, risk_pct):
    """
    Determines the appropriate hospital for a patient based on their age, sex, and risk percentage
    using a REST web service.

    Args:
        age (int): Patient's age
        sex (str): Patient's sex ('male' or 'female')
        risk_pct (float): Risk percentage as a decimal (e.g., 0.1 for 10%)

    Returns:
        str or None: Name of the hospital if successful, None if the request fails

    Examples:
        >>> find_hospital(40, 'male', 0.1)
        'Southwest Hospital and Medical Center'

        >>> find_hospital(75, 'female', 0.35)  # elderly female with higher risk
        'Emory University Hospital'

        >>> find_hospital(8, 'male', 0.05)  # pediatric case
        'Children\'s Healthcare of Atlanta'

        >>> find_hospital(30, 'MALE', 0.1)  # incorrect case for sex
        None

        >>> find_hospital(-5, 'female', 0.1)  # invalid age
        None
    """
    # Base URL for the REST service
    base_url = "https://oumdj6oci2.execute-api.us-east-1.amazonaws.com/prd/"

    # Input validation
    if not isinstance(age, (int, float)) or age < 0:
        return None

    if sex not in ['male', 'female']:
        return None

    if not isinstance(risk_pct, (int, float)) or risk_pct < 0 or risk_pct > 1:
        return None

    try:
        # Construct the query parameters
        params = {
            'age': str(age),
            'sex': sex,
            'risk_pct': str(risk_pct)
        }

        # Make the request
        response = requests.get(base_url, params=params)
        response.raise_for_status()  # Raises an exception for bad status codes

        # Parse the JSON response
        data = response.json()

        # Extract and return just the hospital name
        return data.get('hospital')

    except requests.exceptions.RequestException:
        # Handle any requests-related exceptions
        return None
    except ValueError:
        # Handle JSON parsing errors
        return None
    except Exception:
        # Handle any other unexpected errors
        return None

In [6]:
import doctest
doctest.run_docstring_examples(find_hospital, globals(),verbose=True)

Finding tests in NoName
Trying:
    find_hospital(40, 'male', 0.1)
Expecting:
    'Southwest Hospital and Medical Center'
ok
Trying:
    find_hospital(75, 'female', 0.35)  # elderly female with higher risk
Expecting:
    'Emory University Hospital'
**********************************************************************
File "__main__", line 20, in NoName
Failed example:
    find_hospital(75, 'female', 0.35)  # elderly female with higher risk
Expected:
    'Emory University Hospital'
Got:
    'Wesley Woods Geriatric Hospital'
Trying:
    find_hospital(8, 'male', 0.05)  # pediatric case
Expecting:
    'Children's Healthcare of Atlanta'
**********************************************************************
File "__main__", line 23, in NoName
Failed example:
    find_hospital(8, 'male', 0.05)  # pediatric case
Expected:
    'Children's Healthcare of Atlanta'
Got:
    'Childrens Healthcare of Atlanta at Scottish Rite'
Trying:
    find_hospital(30, 'MALE', 0.1)  # incorrect case for sex
Expec

## Part 3: Get the address for that hospital from a JSON file

Great! Now we have code to tell us which hospital to send someone to... but we don't know where that hospital is. The next function we need to create is one that looks up the address of that hospital.  All of these hospitals are in Atlanta, Georgia.  We're going to use the list from this webpage to lookup the address for that hospital, based on its name.  https://www.officialusa.com/stateguides/health/hospitals/georgia.html

Because we skipped the section about Beautiful Soup and working with HTML, I've converted this information into a JSON document for you.  It's available for you here.  Your code should retrieve this file using the `requests` module.

`https://drive.google.com/uc?export=download&id=1fIFD-NkcdiMu941N4GjyMDWxiKsFJBw-`

1. You need to create a function called **get_address()** that takes hospital name as a parameter and searches the data from this JSON file for the hospital you want to find.
2. Your code will have to load the JSON and return the correct hospital based on name.
3. If the hospital name isn't found, the function should return None.
4. Be sure to use good docstring documentation and includes at least 3 test cases.

In [4]:
import requests
import json

def get_address(hospital_name):
    """
    Retrieves the address of a hospital in Atlanta, Georgia from a JSON file.

    Args:
        hospital_name (str): The name of the hospital to look up

    Returns:
        str or None: The address of the hospital if found, None if not found

    Examples:
        >>> get_address("Emory University Hospital")
        '1364 Clifton Rd NE, Atlanta, GA 30322'

        >>> get_address("Grady Memorial Hospital")
        '80 Jesse Hill Jr Dr SE, Atlanta, GA 30303'

        >>> get_address("Not A Real Hospital")
        None
    """
    # URL for the JSON file
    json_url = 'https://drive.google.com/uc?export=download&id=1fIFD-NkcdiMu941N4GjyMDWxiKsFJBw-'

    try:
        # Download and parse the JSON file
        response = requests.get(json_url)
        response.raise_for_status()  # Raise an exception for bad status codes
        hospitals = json.loads(response.text)

        # Search for the hospital (case-insensitive)
        hospital_name = hospital_name.lower()
        for hospital in hospitals:
            if hospital['name'].lower() == hospital_name:
                return hospital['address']

        # If we get here, the hospital wasn't found
        return None

    except requests.exceptions.RequestException as e:
        raise Exception(f"Error downloading hospital data: {e}")
    except json.JSONDecodeError as e:
        raise Exception(f"Error parsing hospital JSON data: {e}")
    except Exception as e:
        raise Exception(f"Unexpected error while processing hospital data: {e}")

# Optional helper function to list all available hospitals
def list_available_hospitals():
    """
    Returns a list of all hospital names available in the database.

    Returns:
        list: A list of hospital names as strings

    Example:
        >>> hospitals = list_available_hospitals()
        >>> isinstance(hospitals, list)
        True
        >>> len(hospitals) > 0
        True
    """
    try:
        json_url = 'https://drive.google.com/uc?export=download&id=1fIFD-NkcdiMu941N4GjyMDWxiKsFJBw-'
        response = requests.get(json_url)
        response.raise_for_status()
        hospitals = json.loads(response.text)
        return [hospital['name'] for hospital in hospitals]
    except Exception as e:
        raise Exception(f"Error retrieving hospital list: {e}")

## Part 4: Run the risk calculator on a population

At the link below, there is a file called `people.psv`.  It is a pipe-delimited (`|`) file with columns that match the inputs for the PRIEST calculation above.  Your code should use the `requests` module to retrieve the file from this URL.

`https://drive.google.com/uc?export=download&id=1fLxJN9YGUqmqExrilxSS8furwUER5HHh`


In addition, the file has a patient identifier in the first column.

1. Write a function called **process_people()** that takes the file location above as its only parameter. Your Python program should use your code above to process all of these rows, determine the hospital and address, and return a list whose items are a dictionary like this: `{ patient_number: [sex, age, breath, o2sat, heart, systolic, temp, alertness, inspired, status, hospital, address]}`.  Look at the file in Part 5 for what the output looks like.
2. Be sure to use good docstrings, but you don't need any tests in your doc strings.  I've provided those for you withe file in Part 5.


**NOTE** that when running your code for all the 100 records in the `people.psv` file, it may take a few minutes to complete.  You're making multiple calls to the internet for each record, so that can take a little while.


In [3]:
import requests
import pandas as pd

def process_people(file_url):
    """
    Process a pipe-delimited file containing patient data and determine hospital and address information.

    Args:
        file_url (str): URL to the pipe-delimited file containing patient data

    Returns:
        dict: Dictionary where keys are patient numbers and values are lists containing
              [sex, age, breath, o2sat, heart, systolic, temp, alertness, inspired, status, hospital, address]

    Note:
        This function requires the PRIEST calculation functionality which should be defined elsewhere.
        The function makes multiple API calls for each record, so processing may take several minutes.
    """
    # Download and read the PSV file
    try:
        response = requests.get(file_url)
        response.raise_for_status()

        # Create a StringIO object from the response content
        from io import StringIO
        data = pd.read_csv(StringIO(response.text), sep='|')

    except requests.exceptions.RequestException as e:
        raise Exception(f"Error downloading file: {e}")
    except pd.errors.EmptyDataError:
        raise Exception("The downloaded file is empty")
    except Exception as e:
        raise Exception(f"Error processing file: {e}")

    # Initialize results dictionary
    results = {}

    # Process each row
    for idx, row in data.iterrows():
        patient_number = row[0]  # First column is patient identifier

        # NOTE: This part needs to be integrated with your PRIEST calculation code
        # and the code that determines hospital and address
        # For now, I'll create a placeholder structure

        patient_data = [
            row['sex'],
            row['age'],
            row['breath'],
            row['o2sat'],
            row['heart'],
            row['systolic'],
            row['temp'],
            row['alertness'],
            row['inspired'],
            row['status'],
            'hospital_placeholder',  # This should come from your hospital determination code
            'address_placeholder'    # This should come from your address determination code
        ]

        results[patient_number] = patient_data

    return results

def save_results(results, output_file):
    """
    Helper function to save results to a file.

    Args:
        results (dict): Results from process_people function
        output_file (str): Path to save the output file
    """
    with open(output_file, 'w') as f:
        for patient_num, data in results.items():
            f.write(f"{patient_num}: {data}\n")

## Part 5: Checking your final results

The final step is to check your results.  You should be able to compare your results to the output in `people_results.json` at the link below.  Write some code to check your results.  This does not need to be a function.

`https://drive.google.com/uc?export=download&id=1gx1SSC20mO5XL6uYD0mdcM_cL91fcIW5`


In [3]:
import requests
import pandas as pd
from io import StringIO

# Hypothetical PRIEST calculation function
def calculate_priest(Sai):
    """
    Perform the PRIEST calculation for a given patient record.

    Args:
        Sai (pd.Series): A row from the patient data containing necessary fields.

    Returns:
        str: The calculated status or risk score based on PRIEST.
    """
    # Placeholder logic for PRIEST calculation
    if Sai['o2sat'] < 90 and Sai['heart'] > 100:
        return "high risk"
    elif Sai['o2sat'] < 95 and Sai['heart'] > 90:
        return "medium risk"
    else:
        return "low risk"

# Hypothetical hospital and address determination function
def determine_hospital_and_address(Sai_number):
    """
    Determine hospital and address based on patient_number.

    Args:
        Sai_number (str): Patient's unique identifier.

    Returns:
        tuple: (hospital, address) for the patient.
    """
    # Placeholder logic for hospital and address determination
    return ("General Hospital", "123 Hospital St, City, Country")

def process_people(Sai_file_url):
    """
    Process a pipe-delimited file containing patient data and determine hospital and address information.

    Args:
        Sai_file_url (str): URL to the pipe-delimited file containing patient data.

    Returns:
        dict: Dictionary where keys are patient numbers and values are lists containing
              [sex, age, breath, o2sat, heart, systolic, temp, alertness, inspired, status, hospital, address]
    """
    # Download and read the PSV file
    try:
        Sai_response = requests.get(Sai_file_url)
        Sai_response.raise_for_status()

        # Create a StringIO object from the response content
        Sai_data = pd.read_csv(StringIO(Sai_response.text), sep='|')

    except requests.exceptions.RequestException as e:
        raise Exception(f"Error downloading file: {e}")
    except pd.errors.EmptyDataError:
        raise Exception("The downloaded file is empty")
    except Exception as e:
        raise Exception(f"Error processing file: {e}")

    # Initialize results dictionary
    results = {}

    # Process each row
    for Sai_idx, Sai in Sai_data.iterrows():
        Sai_number = Sai[0]  # First column is patient identifier

        # Perform PRIEST calculation (this function needs to be implemented)
        Sai_status = calculate_priest(Sai)

        # Determine hospital and address (this function needs to be implemented)
        Sai_hospital, Sai_address = determine_hospital_and_address(Sai_number)

        # Create the patient data list
        Sai_data_list = [
            Sai['sex'],
            Sai['age'],
            Sai['breath'],
            Sai['o2sat'],
            Sai['heart'],
            Sai['systolic'],
            Sai['temp'],
            Sai['alertness'],
            Sai['inspired'],
            Sai_status,  # PRIEST status
            Sai_hospital,  # Hospital information
            Sai_address    # Address information
        ]

        # Add to the results dictionary
        results[Sai_number] = Sai_data_list

    return results

def save_results(Sai_results, Sai_output_file):
    """
    Helper function to save results to a file.

    Args:
        Sai_results (dict): Results from process_people function
        Sai_output_file (str): Path to save the output file
    """
    with open(Sai_output_file, 'w') as f:
        for Sai_num, Sai_data in Sai_results.items():
            f.write(f"{Sai_num}: {Sai_data}\n")

# Example usage:
# Sai_file_url = "https://example.com/patient_data.psv"
# Sai_results = process_people(Sai_file_url)
# save_results(Sai_results, 'output_results.txt')




---

## Check your work above

If you didn't get them all correct, take a few minutes to think through those that aren't correct.


## Submitting Your Work

Submit your work as usual into a folder named `midterm`

---