The library that I chose is PyTest, a library that allows assertions/testing of functions. As a Data Science major, I've been in classes
like CS1420 and CS2420 (currently in CS3500), which make heavy use of testing functionality. I find that tests are very helpful in a plethora of ways, and PyTest has some really great functionality built into the library. Here is the link to its documentation: [Link text](https://docs.pytest.org/en/stable/).

Pytest allows for unit testing in python, which I find to be a powerful tool. Below is a basic introduction to some key functionalities of PyTest.

Advantages of PyTest:
    1. It is great at handling small unit tests
    2. It can parametrize tests, allowing all test cases to be stored in one space. This makes it readable and reusable.
    3. Detailed assertions tell the user exactly what's wrong (expected output vs. actual output, if incorrect).
    4. Test marking allows the user to fail tests on purpose, skip tests, and skip tests if some condition happens.
Limitations:
    1. Pytest isn't compatable with other languages, so multi-language programs can't use it.
    2. For projects that require really high level details, pytest doesn't usually suffice.
    3. Pytest requires python 3.8+, so anything written before these versions (anything with legacy code) is not compatible.
    4. It is a fairly slow, so large testing suites can take a long time to execute.

In [3]:
import pytest as pt
import pandas as pd
import ipytest
data = pd.read_csv("diabetic_data.csv")

# Creating a new column to showcase the assert functionality
def add_full_name(df):
    df['gender_and_race'] = df['gender'] + ' ' + df['race']
    return df

@pt.fixture
def sample_df():
    return data.head()
# This tests that the expected column of "gender_and_race" is created into the style that we want
# When called from a terminal using pytest, it will use the pytest runner. This block runs this function like a normal python function.
def test_add_full_name(sample_df):
    new_data_frame = add_full_name(sample_df)
    expected_full_names = [
        'Female Caucasian',
        'Female Caucasian',
        'Female AfricanAmerican',
        'Male Caucasian',
        'Male Caucasian',
    ]
    assert list(new_data_frame['gender_and_race']) == expected_full_names, \
        f"Expected full names {expected_full_names} but got {list(new_data_frame['gender_and_race'])}"
    print("Test passed!")


# Run the test
ipytest.run('-v')
#The warning below is becuase data.head() passes in a copy of the first 5 elements, rather than the actual first 5 elements. This is okay for this example.

platform win32 -- Python 3.13.0, pytest-8.3.3, pluggy-1.5.0 -- C:\Users\jacob\AppData\Local\Programs\Python\Python313\python.exe
cachedir: .pytest_cache
rootdir: C:\Users\jacob\OneDrive\Desktop\BMI6018\Pytest
plugins: anyio-4.6.2.post1
[1mcollecting ... [0mcollected 1 item

t_0364b4c1e070419aafc64c58d03ffabc.py::test_add_full_name [32mPASSED[0m[32m                             [100%][0m

t_0364b4c1e070419aafc64c58d03ffabc.py::test_add_full_name
  A value is trying to be set on a copy of a slice from a DataFrame.
  Try using .loc[row_indexer,col_indexer] = value instead
  
  See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
    df['gender_and_race'] = df['gender'] + ' ' + df['race']



<ExitCode.OK: 0>

In [4]:
# This tests that the expected column of "gender_and_race" is created into the style that we want
def test_add_full_name(sample_df):
    new_data_frame = add_full_name(sample_df)
    expected_full_names = [
        'Female AfricanAmerican',
        'Female Caucasian',
        'Female AfricanAmerican',
        'Male Caucasian',
        'Male Caucasian',
    ]
    assert list(new_data_frame['gender_and_race']) == expected_full_names, \
        f"Expected full names {expected_full_names} but got {list(new_data_frame['gender_and_race'])}"
    print("Test passed!")

ipytest.run('-v')

platform win32 -- Python 3.13.0, pytest-8.3.3, pluggy-1.5.0 -- C:\Users\jacob\AppData\Local\Programs\Python\Python313\python.exe
cachedir: .pytest_cache
rootdir: C:\Users\jacob\OneDrive\Desktop\BMI6018\Pytest
plugins: anyio-4.6.2.post1
[1mcollecting ... [0mcollected 1 item

t_0364b4c1e070419aafc64c58d03ffabc.py::test_add_full_name [31mFAILED[0m[31m                             [100%][0m

[31m[1m_______________________________________ test_add_full_name ________________________________________[0m

sample_df =    encounter_id  patient_nbr             race  gender      age weight  \
0       2278392      8222157        Caucasian...        NO          Male Caucasian  
4      Ch          Yes         NO          Male Caucasian  

[5 rows x 51 columns]

    [0m[94mdef[39;49;00m [92mtest_add_full_name[39;49;00m(sample_df):[90m[39;49;00m
        new_data_frame = add_full_name(sample_df)[90m[39;49;00m
        expected_full_names = [[90m[39;49;00m
            [33m'[39;49;00m[

<ExitCode.TESTS_FAILED: 1>

Above, you can see how easy it is to spot the difference between expected and actual outputs. One limitation of this is that data sets are not this small, so this method is really only useful for small tests like this.

In [16]:

# Function to calculate the mean of the `time_in_hospital` column
def calculate_mean_time_in_hospital(df, gender=None):
    if gender:
        df = df[df['gender'] == gender]
    return df['time_in_hospital'].mean()

# Parameterized test cases
@pt.mark.parametrize("gender, expected_mean", [
    (None, 1.8),  # Overall mean
    ('Female', 2.0),  # Mean for Female patients
    ('Male', 1.5),  # Mean for Male patients
])
def test_calculate_mean_time_in_hospital(gender, expected_mean, sample_df):
    result = calculate_mean_time_in_hospital(sample_df, gender)
    assert result == pt.approx(expected_mean), \
        f"Expected mean {expected_mean} but got {result}"

# Fixture to provide sample dataframe
@pt.fixture
def sample_df():
    return data.head()

Failing and skipping

In [15]:
# .xfail and .fail will both fail tests with a given reason. .xfail also doesn't allow code after it to be ran. The documentation says that
# "it is implemented internally by raising an exception."
@pt.xfail(reason="Not implemented yet")
def test_quicksort(array):
    pythonSortedArray = array.sort()
    userSortedArray = array.quickSort()
    assert pythonSortedArray == userSortedArray, f"expected {pythonSortedArray} but was {userSortedArray}"

XFailed: Not implemented yet

The test below is going to fail automatically because it's marked to fail. This is good because we don't want quicksort to be tested, as it's not implemented.

In [14]:
@pt.fixture
def array():
    return[5, 4, 3, 2, 1]

def quickSort(array):
    pass
    
@pt.fail(reason="Not implemented yet")
def test_quicksort(array):
    pythonSortedArray = array.sort()
    userSortedArray = quickSort(array)
    assert pythonSortedArray == userSortedArray, f"expected {pythonSortedArray} but was {userSortedArray}"

pytest.run('-v')

Failed: Not implemented yet

In [13]:
"""
I base this test off of my CS3500 class. We created a spreasheet class that contained formulas, and at one point we changed the public API of the
formula class and as a result lost a lot of our original tests. This skip functionality is really nice because we could keep the core idea of the test
in code, and then come back to it when we need to test the code.
"""
@pt.mark.skipif('evaluate' not in globals() or not callable(globals()['evaluate']), reason="deprecated test")
def test_evaluate_formula(formula):
    formula.evaluate(some_callback_function)

ipytest.run("-v")

platform win32 -- Python 3.13.0, pytest-8.3.3, pluggy-1.5.0
rootdir: C:\Users\jacob\OneDrive\Desktop\BMI6018\Pytest
plugins: anyio-4.6.2.post1
collected 3 items

t_0364b4c1e070419aafc64c58d03ffabc.py [31mF[0m[32m.[0m[33ms[0m[31m                                                    [100%][0m

[31m[1m_______________________________________ test_add_full_name ________________________________________[0m

sample_df =    encounter_id  patient_nbr             race  gender      age weight  \
0       2278392      8222157        Caucasian...        NO          Male Caucasian  
4      Ch          Yes         NO          Male Caucasian  

[5 rows x 51 columns]

    [0m[94mdef[39;49;00m [92mtest_add_full_name[39;49;00m(sample_df):[90m[39;49;00m
        new_data_frame = add_full_name(sample_df)[90m[39;49;00m
        expected_full_names = [[90m[39;49;00m
            [33m'[39;49;00m[33mFemale AfricanAmerican[39;49;00m[33m'[39;49;00m,[90m[39;49;00m
            [33m'[39;49;

<ExitCode.TESTS_FAILED: 1>

Raises

In [11]:
# Often times when testing, you want to ensure that an exception is thrown in certain cases. For example, we would expect division by zero to raise an
# exception.

# Function to test
def divide(a, b):
    return a / b

# Test function
def test_division_by_zero():
    with pt.raises(ZeroDivisionError, match="division by zero"):
        divide(1, 0)
    print('exception was raised!')

test_division_by_zero()
ipytest.run("-v")

exception was raised!
platform win32 -- Python 3.13.0, pytest-8.3.3, pluggy-1.5.0
rootdir: C:\Users\jacob\OneDrive\Desktop\BMI6018\Pytest
plugins: anyio-4.6.2.post1
collected 2 items

t_0364b4c1e070419aafc64c58d03ffabc.py [31mF[0m[32m.[0m[31m                                                     [100%][0m

[31m[1m_______________________________________ test_add_full_name ________________________________________[0m

sample_df =    encounter_id  patient_nbr             race  gender      age weight  \
0       2278392      8222157        Caucasian...        NO          Male Caucasian  
4      Ch          Yes         NO          Male Caucasian  

[5 rows x 51 columns]

    [0m[94mdef[39;49;00m [92mtest_add_full_name[39;49;00m(sample_df):[90m[39;49;00m
        new_data_frame = add_full_name(sample_df)[90m[39;49;00m
        expected_full_names = [[90m[39;49;00m
            [33m'[39;49;00m[33mFemale AfricanAmerican[39;49;00m[33m'[39;49;00m,[90m[39;49;00m
            

<ExitCode.TESTS_FAILED: 1>

In [10]:
# Function to test
def divide(a, b):
    return a / b

# Test function
def test_division_by_zero():
    with pt.raises(ZeroDivisionError, match="division by zero"):
        divide(1, 2)
    print('exception was raised!')

# This print statement was never called because an exception was not raised.
ipytest.run("-v")

platform win32 -- Python 3.13.0, pytest-8.3.3, pluggy-1.5.0
rootdir: C:\Users\jacob\OneDrive\Desktop\BMI6018\Pytest
plugins: anyio-4.6.2.post1
collected 2 items

t_0364b4c1e070419aafc64c58d03ffabc.py [31mF[0m[31mF[0m[31m                                                     [100%][0m

[31m[1m_______________________________________ test_add_full_name ________________________________________[0m

sample_df =    encounter_id  patient_nbr             race  gender      age weight  \
0       2278392      8222157        Caucasian...        NO          Male Caucasian  
4      Ch          Yes         NO          Male Caucasian  

[5 rows x 51 columns]

    [0m[94mdef[39;49;00m [92mtest_add_full_name[39;49;00m(sample_df):[90m[39;49;00m
        new_data_frame = add_full_name(sample_df)[90m[39;49;00m
        expected_full_names = [[90m[39;49;00m
            [33m'[39;49;00m[33mFemale AfricanAmerican[39;49;00m[33m'[39;49;00m,[90m[39;49;00m
            [33m'[39;49;00m[33mF

<ExitCode.TESTS_FAILED: 1>

In [17]:
"""
To run the testing suit, we use ipytest. After running ipytest, the results of each test that is found is printed below. There are 5 cases that pass, 
and then 1 case that fails. This is the expected behavior, as I wanted to show that tests will fail if there is a problem with the test. The failure message
is very descriptive, and it even shows the error where I used a copy of the dataframe instead of the actual data frame. I found the error accidentally,
but it showcases even more the power of PyTest.
"""
import ipytest
ipytest.autoconfig()
ipytest.run('-v')

platform win32 -- Python 3.13.0, pytest-8.3.3, pluggy-1.5.0
rootdir: C:\Users\jacob\OneDrive\Desktop\BMI6018\Pytest
plugins: anyio-4.6.2.post1
collected 6 items

t_0364b4c1e070419aafc64c58d03ffabc.py [31mF[0m[32m.[0m[33ms[0m[32m.[0m[32m.[0m[32m.[0m[31m                                                 [100%][0m

[31m[1m_______________________________________ test_add_full_name ________________________________________[0m

sample_df =    encounter_id  patient_nbr             race  gender      age weight  \
0       2278392      8222157        Caucasian...        NO          Male Caucasian  
4      Ch          Yes         NO          Male Caucasian  

[5 rows x 51 columns]

    [0m[94mdef[39;49;00m [92mtest_add_full_name[39;49;00m(sample_df):[90m[39;49;00m
        new_data_frame = add_full_name(sample_df)[90m[39;49;00m
        expected_full_names = [[90m[39;49;00m
            [33m'[39;49;00m[33mFemale AfricanAmerican[39;49;00m[33m'[39;49;00m,[90m[39;49;00m

<ExitCode.TESTS_FAILED: 1>