# Getting started

---
<h3> <font color = 'maroon'>
    <i> Estimated completion time: 11 minutes </i> </font> </h3>

---

This package is meant to handle patient data. Let's walk through an example of how to use it
with some toy data, since real patient data is typically protected health information.

Once you've installed the package following the instructions in `Installation`, you're ready to get started.
To begin with, we'll import the ``akiFlagger`` module.

### Installation

In [13]:
!pip install akiFlagger
import akiFlagger
print(akiFlagger.__version__)

# https://github.com/isaranwrap/StandardizingAKI/blob/master/PyPkg/src/akiFlagger.py

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
1.0.10


### Imports

In [14]:
from akiFlagger import AKIFlagger, generate_toy_data

### Let's start off by creating some toy data. 

The flagger comes with a built-in generator of a toy dataset to demonstrate how it works. Simply call the `generate_toy_data()` function. By default, the toy dataset has 100 patients, but let's initialize ours with 1000 patients.

In [15]:
toy = generate_toy_data(num_patients=1000)
print('Toy dataset shape: {}'.format(toy.shape))
toy.head()

Successfully generated toy data!

Toy dataset shape: (9115, 4)


|   | patient_id | inpatient | time | creatinine |
|---|---|---|---|---|
| 0 | 12732 | False | 2020-05-25 10:52:19 | 1.24 |
| 1 | 12732 | False | 2020-05-25 22:52:19 | 1.52 |
| 2 | 12732 | False | 2020-05-27 22:52:19 | 1.03 |
| 3 | 12732 | False | 2020-05-28 16:52:19 | 1.0 |
| 4 | 12732 | True | 2020-05-29 16:52:19 | 1.08 |


### Tip!
------------
In order to calculate AKI, the flagger expects a dataset with certain columns in it. Depending on the type of computation you are interested in, your dataset will need to have different columns. Here's a brief rundown of the necessary columns. 

* *Rolling-window*: **patient_id**, **inpatient/outpatient**, **creatinine**, and **time** 

    
* *Historical Baseline*: **patient_id**, **inpatient/outpatient**, **creatinine**, and **time**


* *eGFR-imputed baseline creatinine*: **age** and **sex** (female or not).
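
As a quick pre-flight check, the column requirements listed above can be encoded in a small lookup. This is only a sketch: `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names introduced here for illustration and are not part of akiFlagger, and the column labels are the generic ones from the list rather than the names in your own dataset.

```python
# Column requirements as listed in the tip above. The dictionary and the
# helper below are illustrative assumptions, not part of akiFlagger.
REQUIRED_COLUMNS = {
    "rolling_window": {"patient_id", "inpatient", "creatinine", "time"},
    "HB_trumping": {"patient_id", "inpatient", "creatinine", "time"},
    "eGFR_impute": {"age", "sex"},
}


def missing_columns(columns, method):
    """Return the sorted list of required columns that `columns` lacks."""
    return sorted(REQUIRED_COLUMNS[method] - set(columns))


toy_columns = ["patient_id", "inpatient", "time", "creatinine"]
print(missing_columns(toy_columns, "rolling_window"))  # []
print(missing_columns(toy_columns, "eGFR_impute"))     # ['age', 'sex']
```

With a pandas dataframe you would pass `df.columns` as the first argument.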

------------
By default, the naming system is as follows:

<h3 align='center'>
    <span style="color:#eb726f">

**patient_id &#8594; 'patient_id'** <p>

**inpatient/outpatient &#8594; 'inpatient'** <p>
    
**admission &#8594; 'admission'** <p>

**creatinine &#8594; 'creatinine'** <p>
    
**time &#8594; 'time'** <p>
    </span>
    <hr>
</h3> 

If you have different names for your columns, you **_must_ specify them.** The toy dataset's creatinine column is named *'creatinine'*, which matches the default, but the example below passes it explicitly so you can see where in the flagger an alternate name would be specified.

### Example: Rolling-window

------------

The next code block runs the flagger and returns those patients who satisfy the AKI conditions according to the [KDIGO guidelines](https://kdigo.org/guidelines/) for change in creatinine values<font color = 'purple'>*</font> by the rolling-window definition, categorized as follows:


*Stage 1:* $(1)$ $50\% \uparrow$ in creatinine in $ \le 7 $ days OR $(2)$ $0.3\, mg/dL \uparrow $  in creatinine in $ \le 48$ hours

*Stage 2:* $100\% \uparrow$ in creatinine (a doubling) in $ \le 7 $ days

*Stage 3:* $200\% \uparrow$ in creatinine (a tripling) in $ \le 7 $ days
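
To make the three stages concrete, here is a minimal sketch of the staging logic in plain Python. It is a simplified illustration and not the package's implementation: it assumes each measurement is compared against the lowest earlier value inside the window, that window boundaries are inclusive, and that thresholds are met with `>=`. The function name `rolling_window_stage` is hypothetical.

```python
from datetime import datetime, timedelta


def rolling_window_stage(measurements, index):
    """Stage the measurement at `index` against earlier values of one patient.

    `measurements` is a time-sorted list of (timestamp, creatinine in mg/dL).
    Assumption: the comparison value is the minimum earlier creatinine
    inside each look-back window.
    """
    now, value = measurements[index]
    earlier = measurements[:index]
    prior_7d = [c for t, c in earlier if now - t <= timedelta(days=7)]
    prior_48h = [c for t, c in earlier if now - t <= timedelta(hours=48)]
    if not prior_7d:
        return 0  # nothing to compare against
    low_7d = min(prior_7d)
    if value >= 3.0 * low_7d:  # 200% increase (tripling)
        return 3
    if value >= 2.0 * low_7d:  # 100% increase (doubling)
        return 2
    if value >= 1.5 * low_7d:  # 50% increase
        return 1
    if prior_48h and value - min(prior_48h) >= 0.3:  # 0.3 mg/dL rise in 48 h
        return 1
    return 0


# First five rows of the toy dataset shown earlier (patient 12732).
rows = [
    (datetime(2020, 5, 25, 10, 52, 19), 1.24),
    (datetime(2020, 5, 25, 22, 52, 19), 1.52),
    (datetime(2020, 5, 27, 22, 52, 19), 1.03),
    (datetime(2020, 5, 28, 16, 52, 19), 1.00),
    (datetime(2020, 5, 29, 16, 52, 19), 1.08),
]
print([rolling_window_stage(rows, i) for i in range(len(rows))])  # [0, 0, 0, 0, 0]
```

None of these five rows is flagged: the largest rise, 1.24 to 1.52, is 0.28 mg/dL (about 23%), which falls short of both Stage 1 conditions.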

[comment]: <> (<font color = 'purple'> *Except for the automatic stage 3 from creatinine > 4.0 mg/dL</font>)

In [16]:
flagger = AKIFlagger(rolling_window = True, creatinine = 'creatinine')
out = flagger.returnAKIpatients(toy)
out = out[['inpatient', 'creatinine', 'aki']] # Select the relevant columns
out.head()

| patient_id | time | inpatient | creatinine | aki |
|---|---|---|---|---|
| 12732 | 2020-05-25 10:52:19 | False | 1.24 | 0 |
| 12732 | 2020-05-25 22:52:19 | False | 1.52 | 0 |
| 12732 | 2020-05-27 22:52:19 | False | 1.03 | 0 |
| 12732 | 2020-05-28 16:52:19 | False | 1.0 | 0 |
| 12732 | 2020-05-29 16:52:19 | True | 1.08 | 0 |


**Note:** When initializing the flagger we specify the AKI-calculation method we are interested in (`rolling_window`) as well as the name of our creatinine column (`creatinine`). Here the name matches the default string, so passing it is optional; a column with a different name would have to be specified this way.

In [17]:
aki_counts = out.aki.value_counts()
print('AKI counts')
print('----------')
print('No AKI: {}\nStage 1: {}\nStage 2: {}\nStage 3: {}'.format(aki_counts[0], aki_counts[1], aki_counts[2], aki_counts[3]))

AKI counts
----------
No AKI: 5592
Stage 1: 2082
Stage 2: 858
Stage 3: 583


### Example: Historical Baseline Trumping

------------

Next, we'll run the flagger to calculate a baseline creatinine value using the **_median_ outpatient creatinine values from 365 to 7 days prior to admission**. Then, we'll apply the same KDIGO criteria (except for the 0.3 mg/dL increase), comparing each creatinine value to the baseline creatinine.
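
The baseline definition above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the package's code: the window boundaries are treated as inclusive, the admission time is supplied by the caller, and `historical_baseline` is a hypothetical helper name.

```python
from datetime import datetime, timedelta
from statistics import median


def historical_baseline(measurements, admission):
    """Median outpatient creatinine from 365 to 7 days before `admission`.

    `measurements` is a list of (timestamp, inpatient flag, creatinine).
    Returns None when no outpatient value falls inside the window.
    Assumption: both window boundaries are inclusive.
    """
    window_start = admission - timedelta(days=365)
    window_end = admission - timedelta(days=7)
    values = [
        creat
        for time, inpatient, creat in measurements
        if not inpatient and window_start <= time <= window_end
    ]
    return median(values) if values else None


admission = datetime(2020, 6, 1)
history = [
    (datetime(2020, 1, 10), False, 0.9),  # outpatient, inside the window
    (datetime(2020, 2, 15), False, 1.0),  # outpatient, inside the window
    (datetime(2020, 3, 1), False, 1.1),   # outpatient, inside the window
    (datetime(2020, 4, 1), True, 3.0),    # inpatient, ignored
    (datetime(2020, 5, 30), False, 2.0),  # within 7 days of admission, ignored
]
baseline = historical_baseline(history, admission)
print(baseline)  # 1.0
```

A patient whose measurements all lie within a few days of admission has no values in this window, so the function returns `None` and no baseline comparison is possible.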

In [18]:
flagger = AKIFlagger(HB_trumping = True)
out = flagger.returnAKIpatients(toy)
out.head()

| patient_id | time | inpatient | creatinine | aki |
|---|---|---|---|---|
| 12732 | 2020-05-25 10:52:19 | False | 1.24 | 0 |
| 12732 | 2020-05-25 22:52:19 | False | 1.52 | 0 |
| 12732 | 2020-05-27 22:52:19 | False | 1.03 | 0 |
| 12732 | 2020-05-28 16:52:19 | False | 1.0 | 0 |
| 12732 | 2020-05-29 16:52:19 | True | 1.08 | 0 |


### eGFR-based imputation of baseline creatinine
----
By default the toy dataset only has patient values within $\pm$ 5 days of the admission date, and because the baseline creatinine is calculated from values 365 to 7 days prior to admission, you'll notice that the flagger didn't flag a single row as having AKI. Normally, of course, patients won't have times restricted to just $\pm$ 5 days, but this is a good opportunity to showcase one of the flagger features: the **eGFR-based imputation of baseline creatinine**.



The [CKD-EPI equation](https://www.niddk.nih.gov/health-information/professionals/clinical-tools-patient-management/kidney-disease/laboratory-evaluation/glomerular-filtration-rate/estimating) shown below, developed via spline analysis by *Inker et al., 2021*, is the current equation used to estimate GFR from endogenous markers such as serum creatinine and cystatin C. The full article, including descriptions of the derivation analysis, can be found [here](https://www.nejm.org/doi/full/10.1056/NEJMoa2102953).

\begin{equation}
GFR = 142 \times min(S_{cr} / \kappa, 1)^{\alpha} \times max(S_{cr} / \kappa, 1)^{-1.200} \times 0.9938^{Age} \times (1 + 0.012 f)
\end{equation}
where:

- $GFR$ $(\frac{mL/min}{1.73m^2})$ is the glomerular filtration rate
- $S_{cr}$ $(\frac{mg}{dL})$ is the serum creatinine
- $\kappa$ (unitless) is 0.7 for females and 0.9 for males
- $\alpha$ (unitless) is -0.241 for females and -0.302 for males
- $f$ is 1 if female, 0 if male
- Age in years

When baseline creatinine is missing, the [ADQI](https://ccforum.biomedcentral.com/articles/10.1186/cc2872) (Acute Dialysis Quality Initiative) workgroup recommends estimating it by assuming an eGFR of 75 mL/min per 1.73 $m^2$. Based on the above equation, we can therefore fix the GFR at 75 and use the age and sex to solve for an estimate of the baseline creatinine. Theory aside, simply pass `eGFR_impute = True` into the flagger and it will impute a baseline for patients who are missing outpatient values prior to admission.
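
For readers who want to see the back-calculation, the sketch below inverts the equation above for serum creatinine at a target GFR of 75. It is a worked illustration only; the function names are hypothetical and the flagger's internal implementation may differ. Because $\alpha$ and $-1.200$ are both negative, the branch is chosen by whether the target GFR lies above or below the age- and sex-dependent scale factor.

```python
def ckd_epi_2021(creatinine, age, female):
    """Estimated GFR from the creatinine-based equation shown above."""
    kappa = 0.7 if female else 0.9
    alpha = -0.241 if female else -0.302
    ratio = creatinine / kappa
    sex_factor = 1.012 if female else 1.0
    return (142 * min(ratio, 1) ** alpha * max(ratio, 1) ** -1.200
            * 0.9938 ** age * sex_factor)


def impute_baseline_creatinine(age, female, target_gfr=75.0):
    """Solve the equation above for creatinine, assuming GFR == target_gfr."""
    kappa = 0.7 if female else 0.9
    alpha = -0.241 if female else -0.302
    scale = 142 * 0.9938 ** age * (1.012 if female else 1.0)
    relative = target_gfr / scale
    # relative >= 1 means creatinine / kappa <= 1, so the alpha term is active;
    # otherwise the -1.200 term is active.
    exponent = alpha if relative >= 1 else -1.200
    return kappa * relative ** (1 / exponent)


# A 64.5-year-old female, matching the first toy patient further below.
estimate = impute_baseline_creatinine(age=64.5, female=True)
print(round(estimate, 2))
print(round(ckd_epi_2021(estimate, age=64.5, female=True), 6))  # 75.0
```

Feeding the imputed value back into the forward equation recovers the assumed GFR of 75, which is a convenient sanity check on the inversion.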

**Note:** The toy dataset doesn't come with demographic information by default, but passing `include_demographic_info=True` adds the age and sex columns. Because the toy dataset's sex column is named `female`, we also need to pass `sex = 'female'` to the flagger.

In [19]:
toy = generate_toy_data(num_patients=100, include_demographic_info = True)
toy.head()

Successfully generated toy data!



|   | patient_id | age | female | inpatient | time | creatinine |
|---|---|---|---|---|---|---|
| 0 | 12732 | 64.5 | True | False | 2020-02-24 17:42:42 | 1.45 |
| 1 | 12732 | 64.5 | True | False | 2020-02-25 11:42:42 | 1.59 |
| 2 | 12732 | 64.5 | True | False | 2020-02-26 05:42:42 | 1.46 |
| 3 | 12732 | 64.5 | True | False | 2020-02-26 11:42:42 | 1.33 |
| 4 | 12732 | 64.5 | True | True | 2020-02-29 05:42:42 | 1.52 |


In [20]:
flagger = AKIFlagger(HB_trumping = True, #back_calculate = True,
                     eGFR_impute = True, sex = 'female')
out = flagger.returnAKIpatients(toy)
out = out[['inpatient', 'age', 'female', 'creatinine', 'aki']] # This just orders the columns to match the initial order
out.head()

| patient_id | time | inpatient | age | female | creatinine | aki |
|---|---|---|---|---|---|---|
| 12732 | 2020-02-24 17:42:42 | False | 64.5 | True | 1.45 | 0 |
| 12732 | 2020-02-25 11:42:42 | False | 64.5 | True | 1.59 | 0 |
| 12732 | 2020-02-26 05:42:42 | False | 64.5 | True | 1.46 | 0 |
| 12732 | 2020-02-26 11:42:42 | False | 64.5 | True | 1.33 | 0 |
| 12732 | 2020-02-29 05:42:42 | True | 64.5 | True | 1.52 | 1 |


## Additional features & common use cases
---
That about does it! For most use cases, you will just need to specify the AKI definition methodology (_i.e._, `rolling_window` or `HB_trumping` or `eGFR_impute`) and the AKI-column will be returned. There are a slew of other features, some of which are listed below. For a full listing of the features and appropriate use cases, see the `Documentation` at [akiflagger.readthedocs.io](https://akiflagger.readthedocs.io/en/latest/).

---

<h3> $\rightarrow$ Adding  padding to the rolling window (52 hour & 172 hour windows, instead, for example)  </h3>

It's often the case that you want to add some padding to the window to account for variations occurring on the floor. The parameters `pad1time` and `pad2time` allow you to add just this padding to the initial windows of 48 and 168 hours. If you wanted a window of 36 hours, you could even set `pad1time = '-12hours'`; this is one way to modify the rolling window.
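
The arithmetic behind the padded windows is simple enough to check by hand. The sketch below uses the standard library's `timedelta`; `BASE_WINDOWS` and `padded_windows` are hypothetical names for illustration, and the base windows are assumed to be 48 hours and 7 days (168 hours), per the KDIGO definitions above.

```python
from datetime import timedelta

# Base rolling windows: 48 hours and 7 days (168 hours).
BASE_WINDOWS = (timedelta(hours=48), timedelta(days=7))


def padded_windows(padding_hours):
    """Return both windows widened (or narrowed) by `padding_hours`."""
    pad = timedelta(hours=padding_hours)
    return tuple(window + pad for window in BASE_WINDOWS)


def as_hours(window):
    """Express a timedelta as a number of hours."""
    return window.total_seconds() / 3600


print([as_hours(w) for w in padded_windows(4)])    # [52.0, 172.0]
print([as_hours(w) for w in padded_windows(-12)])  # [36.0, 156.0]
```

A 4-hour padding gives the 52-hour and 172-hour windows mentioned in the heading, and a negative padding narrows the windows instead.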

<h3> $\rightarrow$ Working with different column names</h3>

As an additional example, the patient identifier will often come in as *'PAT_MRN_ID'* or *'PAT_ENC_CSN_ID'* (or something of the sort) if it is coming from a typical clinical data warehouse/repository. Accordingly, these should be passed in as options to the flagger. 

<h3> $\rightarrow$ Adding in rolling-window minimum creatinines  </h3>

To add in the rolling-window minimum creatinine values, simply pass the flag `add_min_creat = True` to the flagger. The output then includes the minimum creatinine over each rolling window; these appear as the `min_creat52` and `min_creat172` columns in Example 2 below.
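
As an illustration of what a rolling-window minimum means, the sketch below computes, for each measurement, the lowest creatinine seen within a look-back window that ends at that measurement. It assumes the current value is included in its own window and that the window boundary is inclusive; `rolling_minimum` is a hypothetical helper, not the package's implementation.

```python
from datetime import datetime, timedelta


def rolling_minimum(measurements, window):
    """Minimum creatinine within `window` up to and including each time point.

    `measurements` is a time-sorted list of (timestamp, creatinine) for one
    patient. Assumption: the current value counts toward its own minimum.
    """
    minima = []
    for i, (now, _) in enumerate(measurements):
        in_window = [c for t, c in measurements[: i + 1] if now - t <= window]
        minima.append(min(in_window))
    return minima


# First four rows of the toy dataset with demographic info (patient 12732).
rows = [
    (datetime(2020, 2, 24, 17, 42, 42), 1.45),
    (datetime(2020, 2, 25, 11, 42, 42), 1.59),
    (datetime(2020, 2, 26, 5, 42, 42), 1.46),
    (datetime(2020, 2, 26, 11, 42, 42), 1.33),
]
print(rolling_minimum(rows, timedelta(hours=52)))  # [1.45, 1.45, 1.45, 1.33]
```

All four measurements fall within 52 hours of one another, so each minimum is simply the lowest value seen so far.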

<h3> $\rightarrow$ Adding in baseline creatinine  </h3>

To add in the baseline creatinine, simply pass the flag `add_baseline_creat = True` to the flagger. Note that the baseline creatinine is not defined for outpatient measurements. Baseline creatinine can be thought of as the "resting" creatinine before coming into the hospital, so it doesn't make much sense to define the baseline creatinine outside of a hospital visit. 

<h3> $\rightarrow$ Bare-bones dataset  </h3>

As stated above, the bare minimum columns necessary for the flagger to run are the **patient_id, inpatient/outpatient, time,** and **creatinine**. In this case, any other columns used in intermediate steps will be imputed (admission, for example).


In [21]:
# Example 0: Adding 4-hour padding to windows
padding = '4hours'
flagger = AKIFlagger(rolling_window = True, padding=padding)
example0 = flagger.returnAKIpatients(toy)
example0[example0.aki > 0].head(3)

| patient_id | time | age | female | inpatient | creatinine | aki |
|---|---|---|---|---|---|---|
| 19845 | 2020-05-13 18:02:54 | 57.4 | True | True | 1.01 | 1 |
| 19845 | 2020-05-15 18:02:54 | 57.4 | True | True | 0.89 | 1 |
| 13264 | 2020-01-10 05:16:57 | 62.6 | False | False | 0.48 | 1 |


In [22]:
# Example 1: Working with different column names 
dataframe = toy.rename(columns = {'patient_id': 'PAT_MRN_ID', 'creatinine':'CREATININE',
                                  'age': 'AGE', 'female': 'SEX', 'inpatient': 'INPATIENT',
                                  'admission': 'ADMISSION', 'time': 'TIME'})
flagger = AKIFlagger(rolling_window = True, patient_id = 'PAT_MRN_ID', encounter_id = 'PAT_ENC_CSN_ID', 
                     inpatient = 'INPATIENT', admission = 'ADMISSION', time = 'TIME', creatinine = 'CREATININE')
example1 = flagger.returnAKIpatients(dataframe)

In [23]:
# Example 2: Adding in rolling-window minima
flagger = AKIFlagger(rolling_window = True, add_min_creat = True)
example2 = flagger.returnAKIpatients(toy)
example2.head(3)

| patient_id | time | age | female | inpatient | creatinine | min_creat52 | min_creat172 | aki |
|---|---|---|---|---|---|---|---|---|
| 12732 | 2020-02-24 17:42:42 | 64.5 | True | False | 1.45 | 1.45 | 1.45 | 0 |
| 12732 | 2020-02-25 11:42:42 | 64.5 | True | False | 1.59 | 1.45 | 1.45 | 0 |
| 12732 | 2020-02-26 05:42:42 | 64.5 | True | False | 1.46 | 1.45 | 1.45 | 0 |


In [25]:
# Example 3: Bare-bones dataset
barebones = toy.loc[:,['patient_id', 'inpatient', 'time', 'creatinine']]
print('Barebones head:')
print(barebones.head())
flagger = AKIFlagger(rolling_window = True)
example3 = flagger.returnAKIpatients(barebones)
example3[example3.aki > 0].head(3)

Barebones head:
   patient_id  inpatient                time  creatinine
0       12732      False 2020-02-24 17:42:42        1.45
1       12732      False 2020-02-25 11:42:42        1.59
2       12732      False 2020-02-26 05:42:42        1.46
3       12732      False 2020-02-26 11:42:42        1.33
4       12732       True 2020-02-29 05:42:42        1.52


| patient_id | time | inpatient | creatinine | aki |
|---|---|---|---|---|
| 19845 | 2020-05-13 18:02:54 | True | 1.01 | 1 |
| 19845 | 2020-05-15 18:02:54 | True | 0.89 | 1 |
| 13264 | 2020-01-10 05:16:57 | False | 0.48 | 1 |
