# Getting started

---
<h3> <font color = 'maroon'>
    <i> Estimated completion time: 10 minutes </i> </font> </h3>

---

This package flags acute kidney injury (AKI) in patient creatinine data. Because real patient data is usually protected health information, this walkthrough uses toy data to show how the package works.

Once you've installed the package following the instructions in `Installation`, you're ready to get started.
To begin with, we'll import the ``akiFlagger`` module as well as the trifecta ``pandas``, ``numpy``, and ``matplotlib``.

### Installation

In [10]:
!pip install akiFlagger



### Imports

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import akiFlagger
print(akiFlagger.__version__)
from akiFlagger import AKIFlagger, generate_toy_data

0.1.0


### Let's start off by creating some toy data. 

The flagger comes with a built-in generator of a toy dataset to demonstrate how it works. Simply call the `generate_toy_data()` function. By default, the toy dataset has 100 patients, but let's initialize ours with 1000 patients.

In [3]:
toy = generate_toy_data(num_patients=1000)
print('Toy dataset shape: {}'.format(toy.shape))
toy.head()

Successfully generated toy data!

Toy dataset shape: (9111, 6)


|   | mrn | enc | inpatient | admission | time | creat |
|---|-----|-----|-----------|-----------|------|-------|
| 0 | 12732 | 41002 | False | 2020-05-29 16:52:19 | 2020-05-24 16:52:19 | 1.523256 |
| 1 | 12732 | 30477 | False | 2020-05-29 16:52:19 | 2020-05-24 22:52:19 | 0.812085 |
| 2 | 12732 | 30477 | False | 2020-05-29 16:52:19 | 2020-05-25 04:52:19 | 1.093111 |
| 3 | 12732 | 41002 | False | 2020-05-29 16:52:19 | 2020-05-25 16:52:19 | 0.98967 |
| 4 | 12732 | 30477 | False | 2020-05-29 16:52:19 | 2020-05-25 22:52:19 | 1.082264 |


### Tip!
------------
In order to calculate AKI, the flagger expects a dataset with certain columns in it. Depending on the type of computation you are interested in, your dataset will need to have different columns. Here's a brief rundown of the necessary columns. 

* *Rolling-window*: **patient_id**, **inpatient/outpatient**, **time**, and **creatinine** 

    
* *Back-calculate*: **patient_id**, **inpatient/outpatient**, **time**, and **creatinine**


* *eGFR-imputed baseline creatinine*: **age**, **sex** (female or not), and **race** (black or not).

------------
By default, the naming system is as follows:

<h3 align='center'>
    <span style="color:#eb726f">

**patient_id &#8594; 'mrn'** <p>
    
**encounter_id &#8594; 'enc'** <p>

**inpatient/outpatient &#8594; 'inpatient'** <p>
    
**admission &#8594; 'admission'** <p>

**creatinine &#8594; 'creatinine'** <p>
    
**time &#8594; 'time'** <p>
    </span>
    <hr>
</h3> 

If your columns have different names, you **_must_ specify them** when initializing the flagger. The toy dataset names its creatinine column *'creat'*, so the examples below pass `creatinine = 'creat'` to show where an alternate name is specified.
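A quick pre-flight check can catch a naming mismatch before the flagger runs. The sketch below is not part of akiFlagger: `DEFAULT_NAMES` and `missing_columns` are hypothetical helpers written for illustration, and the default names are simply copied from the list above.

```python
import pandas as pd

# Default column names as listed above (role -> expected column name).
DEFAULT_NAMES = {
    'patient_id': 'mrn',
    'encounter_id': 'enc',
    'inpatient': 'inpatient',
    'admission': 'admission',
    'creatinine': 'creatinine',
    'time': 'time',
}

def missing_columns(df, required=('patient_id', 'inpatient', 'time', 'creatinine'), **overrides):
    """Return the roles whose column is absent from df (hypothetical helper).

    overrides maps a role to the column name actually used, e.g. creatinine='creat'.
    """
    names = {**DEFAULT_NAMES, **overrides}
    return [role for role in required if names[role] not in df.columns]

# Tiny hand-made frame that, like the toy data, uses 'creat' instead of 'creatinine'.
example = pd.DataFrame({
    'mrn': [1],
    'inpatient': [False],
    'time': pd.to_datetime(['2020-05-24 16:52:19']),
    'creat': [1.52],
})
print(missing_columns(example))                      # roles missing under the default names
print(missing_columns(example, creatinine='creat'))  # roles missing once the override is given
```

Any role reported as missing is one whose column name would need to be passed to the flagger explicitly.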

### Example: Rolling-window

------------

The next code block runs the flagger and returns those patients who satisfy the AKI conditions according to the KDIGO guidelines by the rolling-window definition, categorized as follows:


*Stage 1:* $(1)$ $50\% \uparrow$ in creatinine in $< 7$ days OR $(2)$ $0.3$ mg/dL $\uparrow$ in creatinine in $< 48$ hours

*Stage 2:* $100\% \uparrow$ in creatinine (a doubling) in $< 7$ days

*Stage 3:* $200\% \uparrow$ in creatinine (a tripling) in $< 7$ days
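To make the staging rules concrete, here is a minimal pandas sketch for a single patient's measurements. It is not the package's implementation: `rolling_window_stage` is a hypothetical helper, creatinine is assumed to be in mg/dL, each value is compared against the minimum in the trailing 7-day and 2-day (48-hour) windows, and stage 3 is taken to be a tripling within 7 days.

```python
import numpy as np
import pandas as pd

def rolling_window_stage(times, creatinine):
    """Stage each measurement of ONE patient against trailing-window minima (illustrative only)."""
    s = pd.Series(list(creatinine), index=pd.DatetimeIndex(times)).sort_index()

    # Time-based windows cover (t - window, t], so a value taken exactly
    # 2 or 7 days earlier falls outside, matching the "<" in the rules.
    min_48h = s.rolling('2D').min()
    min_7d = s.rolling('7D').min()

    stage = np.zeros(len(s), dtype=int)
    # Assign lower stages first so that higher stages overwrite them.
    stage[((s >= 1.5 * min_7d) | (s - min_48h >= 0.3)).to_numpy()] = 1
    stage[(s >= 2.0 * min_7d).to_numpy()] = 2
    stage[(s >= 3.0 * min_7d).to_numpy()] = 3
    return pd.Series(stage, index=s.index, name='stage')

# Hand-made measurements: a steady rise over four days, then a late normal value.
demo_times = pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04', '2020-01-13'])
demo_stages = rolling_window_stage(demo_times, [1.0, 1.35, 2.1, 3.2, 1.0])
print(demo_stages)
```

For a multi-patient dataset, the same function could be applied per patient with a `groupby` on the patient identifier.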

In [4]:
%%time
flagger = AKIFlagger(rolling_window = True, creatinine = 'creat')
rw = flagger.returnAKIpatients(toy)
rw = rw[['mrn', 'enc', 'inpatient', 'admission', 'time', 'creat', 'rw']] # This just orders the columns to match the initial order
rw.head()

CPU times: user 5.93 s, sys: 48.1 ms, total: 5.98 s
Wall time: 6.05 s


|   | mrn | enc | inpatient | admission | time | creat | rw |
|---|-----|-----|-----------|-----------|------|-------|----|
| 0 | 12732 | 41002 | False | 2020-05-29 16:52:19 | 2020-05-24 16:52:19 | 1.523256 | 0 |
| 1 | 12732 | 30477 | False | 2020-05-29 16:52:19 | 2020-05-24 22:52:19 | 0.812085 | 0 |
| 2 | 12732 | 30477 | False | 2020-05-29 16:52:19 | 2020-05-25 04:52:19 | 1.093111 | 0 |
| 3 | 12732 | 41002 | False | 2020-05-29 16:52:19 | 2020-05-25 16:52:19 | 0.98967 | 0 |
| 4 | 12732 | 30477 | False | 2020-05-29 16:52:19 | 2020-05-25 22:52:19 | 1.082264 | 0 |


**Note:** When initializing the flagger we specify the AKI-calculation method we are interested in (`rolling_window`) as well as the name for our creatinine column (`creat`) which didn't match the default string of `creatinine`. 

In [5]:
aki_counts = rw.rw.value_counts()
print('AKI counts')
print('----------')
print('No AKI: {}\nStage 1: {}\nStage 2: {}\nStage 3: {}'.format(aki_counts[0], aki_counts[1], aki_counts[2], aki_counts[3]))

AKI counts
----------
No AKI: 6528
Stage 1: 1576
Stage 2: 609
Stage 3: 398
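Indexing `aki_counts[3]` directly raises a `KeyError` when no row reaches that stage, which can easily happen on a small dataset. A slightly more defensive variant, sketched here on a hand-made Series of flags rather than on the flagger output, reindexes the counts so that every stage is present:

```python
import pandas as pd

# Hand-made flags in which no measurement reaches stage 3.
stages = pd.Series([0, 0, 1, 2, 0, 1])

# reindex guarantees an entry for every stage, filling absent ones with 0.
counts = stages.value_counts().reindex([0, 1, 2, 3], fill_value=0)

labels = ['No AKI', 'Stage 1', 'Stage 2', 'Stage 3']
for label, n in zip(labels, counts):
    print('{}: {}'.format(label, n))
```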


### Example: Back-calculation

------------

Next, we'll run the flagger to "back-calculate" AKI; that is, we use the **_median_ outpatient creatinine values from 365 to 7 days prior to admission** to impute a baseline creatinine value. Then we apply the same KDIGO criteria (except for the 0.3 increase), comparing each creatinine value to the baseline creatinine.
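The baseline described here can be sketched in a few lines of pandas for a single patient. This is an illustration under stated assumptions, not the package's code: `outpatient_baseline` is a hypothetical helper, both ends of the 365-to-7-day window are treated as inclusive, and the column names follow the toy dataset.

```python
import pandas as pd

def outpatient_baseline(df, admission, creat_col='creat'):
    """Median outpatient creatinine measured 365 to 7 days before admission (NaN if none)."""
    admission = pd.Timestamp(admission)
    in_window = (
        ~df['inpatient']
        & (df['time'] >= admission - pd.Timedelta(days=365))
        & (df['time'] <= admission - pd.Timedelta(days=7))
    )
    return df.loc[in_window, creat_col].median()

# Hand-made measurements for one patient admitted on 2020-05-29.
labs = pd.DataFrame({
    'time': pd.to_datetime(['2019-12-01', '2020-02-01', '2020-05-01', '2020-05-28', '2020-05-30']),
    'inpatient': [False, False, False, False, True],
    'creat': [0.8, 1.0, 0.9, 2.0, 1.6],
})
baseline = outpatient_baseline(labs, '2020-05-29')
ratio = labs['creat'] / baseline  # compared against the 1.5x, 2x, and 3x thresholds
print(baseline)
print(ratio.round(2).tolist())
```

Only the first three rows fall inside the window: the 2020-05-28 value is too close to admission and the last value is inpatient.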

In [6]:
%%time
flagger = AKIFlagger(back_calculate = True, creatinine = 'creat')
bc = flagger.returnAKIpatients(toy)
bc.head()

CPU times: user 2.24 s, sys: 15.9 ms, total: 2.26 s
Wall time: 2.27 s


|   | enc | time | mrn | inpatient | admission | creat | bc |
|---|-----|------|-----|-----------|-----------|-------|----|
| 0 | 41002 | 2020-05-24 16:52:19 | 12732 | False | 2020-05-29 16:52:19 | 1.523256 | False |
| 1 | 30477 | 2020-05-24 22:52:19 | 12732 | False | 2020-05-29 16:52:19 | 0.812085 | False |
| 2 | 30477 | 2020-05-25 04:52:19 | 12732 | False | 2020-05-29 16:52:19 | 1.093111 | False |
| 3 | 41002 | 2020-05-25 16:52:19 | 12732 | False | 2020-05-29 16:52:19 | 0.98967 | False |
| 4 | 30477 | 2020-05-25 22:52:19 | 12732 | False | 2020-05-29 16:52:19 | 1.082264 | False |


### eGFR-based imputation of baseline creatinine
----
By default, the toy dataset only has patient values within $\pm$ 5 days of the admission date. Because the baseline creatinine is calculated from values 365 to 7 days prior to admission, you'll notice that the flagger didn't mark a single row as having AKI. Normally, of course, patients won't have times restricted to just $\pm$ 5 days, but this is a good opportunity to showcase one of the flagger's features: the **eGFR-based imputation of baseline creatinine**.


\begin{equation}
GFR = 141 \times \min(S_{cr} / \kappa, 1)^{\alpha} \times \max(S_{cr} / \kappa, 1)^{-1.209} \times 0.993^{Age} \times (1 + 0.018 f) \times (1 + 0.159 b)
\end{equation}
where:

- $GFR$ $(\frac{mL/min}{1.73m^2})$ is the glomerular filtration rate
- $S_{cr}$ $(\frac{mg}{dL})$ is the serum creatinine
- $\kappa$ (unitless) is 0.7 for females and 0.9 for males
- $\alpha$ (unitless) is -0.329 for females and -0.411 for males
- $f$ is 1 if female, 0 if male
- $b$ is 1 if black, 0 if another race

The equation above is known as the [CKD-EPI equation](https://www.niddk.nih.gov/health-information/professionals/clinical-tools-patient-management/kidney-disease/laboratory-evaluation/glomerular-filtration-rate/estimating), developed via spline analysis by *Levey et al., 2009*. The full paper, along with the derived constants, can be found [here](https://pubmed.ncbi.nlm.nih.gov/19414839/).

The idea is as follows: we assume a GFR of 75 and solve the above equation for $S_{cr}$, using the patient's age, sex, and race, to estimate the baseline creatinine. Theory aside, simply pass `eGFR_impute = True` into the flagger and it will impute a baseline for patients who have no outpatient values 365 to 7 days prior to admission.
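Solving the CKD-EPI equation for serum creatinine is short enough to sketch directly. The code below is an illustration rather than the package's implementation: `ckd_epi_gfr` and `baseline_creat_from_egfr` are hypothetical names, and the only inputs are the constants listed above plus the assumed GFR of 75. Because estimated GFR falls as creatinine rises, the branch of the equation is chosen by comparing the target GFR with the value the equation gives at $S_{cr} = \kappa$.

```python
def ckd_epi_gfr(scr, age, female, black):
    """CKD-EPI (2009) GFR estimate from serum creatinine in mg/dL, using the constants above."""
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr / kappa
    return (141.0 * min(ratio, 1.0) ** alpha * max(ratio, 1.0) ** -1.209
            * 0.993 ** age * (1.018 if female else 1.0) * (1.159 if black else 1.0))

def baseline_creat_from_egfr(age, female, black, gfr=75.0):
    """Invert the equation: the serum creatinine that yields the assumed GFR."""
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    # GFR predicted when scr == kappa, i.e. when both min(...) and max(...) equal 1.
    scale = 141.0 * 0.993 ** age * (1.018 if female else 1.0) * (1.159 if black else 1.0)
    # A target at or above that value means scr/kappa <= 1 (alpha branch);
    # a target below it means scr/kappa > 1 (the -1.209 branch).
    exponent = alpha if gfr >= scale else -1.209
    return kappa * (gfr / scale) ** (1.0 / exponent)

# Round trip for a 61.3687-year-old non-black female.
scr = baseline_creat_from_egfr(age=61.3687, female=True, black=False)
print(round(scr, 3), round(ckd_epi_gfr(scr, 61.3687, True, False), 3))
```

For this patient the sketch gives roughly 0.838 mg/dL, which agrees with the `baseline_creat` shown for the same age and sex in the Example 3 output further down, though that agreement alone does not establish that the package computes it the same way.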

**Note:** The toy dataset doesn't come with demographic information by default, but passing `include_demographic_info=True` adds the age, race, and sex columns. In the toy data the sex column is named `female` and the race column is named `black`, so we pass `sex = 'female'` and `race = 'black'` to the flagger as well.

In [7]:
%%time
toy = generate_toy_data(num_patients=1000, include_demographic_info = True)
flagger = AKIFlagger(back_calculate = True, creatinine = 'creat',
                    eGFR_impute = True, sex = 'female', race = 'black')
bc = flagger.returnAKIpatients(toy)
bc = bc[['mrn', 'enc', 'inpatient', 'admission', 'time', 'creat', 'bc']] # This just orders the columns to match the initial order
bc.head()

Successfully generated toy data!

CPU times: user 3.89 s, sys: 28.7 ms, total: 3.92 s
Wall time: 3.95 s


|   | mrn | enc | inpatient | admission | time | creat | bc |
|---|-----|-----|-----------|-----------|------|-------|----|
| 0 | 12732 | 41002 | False | 2020-05-29 16:52:19 | 2020-05-24 16:52:19 | 0.542048 | False |
| 1 | 12732 | 30477 | False | 2020-05-29 16:52:19 | 2020-05-25 22:52:19 | 1.403329 | False |
| 2 | 12732 | 30477 | False | 2020-05-29 16:52:19 | 2020-05-26 16:52:19 | 1.215593 | False |
| 3 | 12732 | 30477 | False | 2020-05-29 16:52:19 | 2020-05-26 22:52:19 | 0.789235 | False |
| 4 | 12732 | 30477 | False | 2020-05-29 16:52:19 | 2020-05-27 10:52:19 | 0.917523 | False |


## Additional features & common use cases
---
That about does it! For most use cases, you will just need to specify `rolling_window = True` or `back_calculate = True` and the AKI column will be returned. There are a slew of other features, some of which are listed below. For a full listing of the features and appropriate use cases, see the `Documentation` at [akiflagger.readthedocs.io](https://akiflagger.readthedocs.io/en/latest/).

---

<h3> $\rightarrow$ Working with different column names</h3>

The patient and encounter identifiers will often come in as *'PAT_MRN_ID'* and *'PAT_ENC_CSN_ID'* (or something of the sort) if the data comes from a typical clinical data warehouse/repository. Accordingly, these names should be passed in as options to the flagger (for example, `patient_id = 'PAT_MRN_ID'`).

<h3> $\rightarrow$ Adding in rolling-window minimum creatinines  </h3>

To add in the rolling-window minimum creatinines, simply pass the flag `add_min_creat = True` to the flagger, as in Example 2 below.

<h3> $\rightarrow$ Adding in baseline creatinine  </h3>

To add in the baseline creatinine, simply pass the flag `add_baseline_creat = True` to the flagger. Note that the baseline creatinine is not defined for outpatient measurements. Baseline creatinine can be thought of as the "resting" creatinine before coming into the hospital, so it doesn't make much sense to define the baseline creatinine outside of a hospital visit. 

<h3> $\rightarrow$ Bare-bones dataset  </h3>

As stated above, the bare minimum columns necessary for the flagger to run are the **patient_id, inpatient/outpatient, time,** and **creatinine**. In this case, any other columns used in intermediate steps will be imputed (admission, for example).
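As an illustration of what imputing `admission` could look like, the sketch below fills it with each patient's first inpatient timestamp. This rule is an assumption: it is consistent with the Example 4 output shown further down, but it is not taken from the package source, and `impute_admission` is a hypothetical helper.

```python
import pandas as pd

def impute_admission(df, patient_id='mrn', inpatient='inpatient', time='time'):
    """Add an 'admission' column holding each patient's first inpatient timestamp (assumed rule)."""
    out = df.copy()
    # Earliest inpatient measurement per patient; patients with none are absent from this Series.
    first_inpatient = out.loc[out[inpatient]].groupby(patient_id)[time].min()
    # map leaves NaT for patients that never had an inpatient measurement.
    out['admission'] = out[patient_id].map(first_inpatient)
    return out

# Hand-made bare-bones frame: patient 1 is admitted, patient 2 is outpatient only.
bare = pd.DataFrame({
    'mrn': [1, 1, 1, 2],
    'inpatient': [False, True, True, False],
    'time': pd.to_datetime(['2020-05-27 10:52:19', '2020-05-31 10:52:19',
                            '2020-05-31 16:52:19', '2020-06-01 08:00:00']),
    'creat': [0.92, 0.97, 0.99, 1.10],
})
imputed = impute_admission(bare)
print(imputed)
```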

In [8]:
# Example 1: Working with different column names 
dataframe = toy.rename(columns = {'mrn': 'PAT_MRN_ID', 'enc': 'PAT_ENC_CSN_ID', 'creat':'CREATININE',
                                  'age': 'AGE', 'female': 'SEX', 'black': 'RACE', 'inpatient': 'INPATIENT',
                                  'admission': 'ADMISSION', 'time': 'TIME'})
flagger = AKIFlagger(rolling_window = True, patient_id = 'PAT_MRN_ID', encounter_id = 'PAT_ENC_CSN_ID', 
                     inpatient = 'INPATIENT', admission = 'ADMISSION', time = 'TIME', creatinine = 'CREATININE')
example1 = flagger.returnAKIpatients(dataframe)

# Example 2: Adding in rolling-window minima
flagger = AKIFlagger(rolling_window = True, creatinine = 'creat', add_min_creat = True)
example2 = flagger.returnAKIpatients(toy)

# Example 3: Adding in baseline creatinine 
flagger = AKIFlagger(rolling_window = True, back_calculate = True, #Specifying both calculation methods
                     patient_id = 'PAT_MRN_ID', encounter_id = 'PAT_ENC_CSN_ID', inpatient = 'INPATIENT', #Specifying col names
                     age = 'AGE', sex = 'SEX', race = 'RACE', time = 'TIME', admission = 'ADMISSION', creatinine = 'CREATININE',#Specifying col names
                     eGFR_impute = True, add_baseline_creat = True) #Specifying additional columns to add
example3 = flagger.returnAKIpatients(dataframe)
example3 = example3[['PAT_MRN_ID', 'PAT_ENC_CSN_ID', 'INPATIENT', 'AGE', 'SEX', 'RACE', 'ADMISSION', 'TIME', 'CREATININE', 'baseline_creat', 'rw', 'bc']]
example3.head(20)

|   | PAT_MRN_ID | PAT_ENC_CSN_ID | INPATIENT | AGE | SEX | RACE | ADMISSION | TIME | CREATININE | baseline_creat | rw | bc |
|---|------------|----------------|-----------|-----|-----|------|-----------|------|------------|----------------|----|----|
| 0 | 19724 | 10043 | False | 61.3687 | True | False | 2020-01-13 05:03:20 | 2020-01-08 05:03:20 | 1.628688 |  | 0 | False |
| 1 | 19724 | 10043 | False | 61.3687 | True | False | 2020-01-13 05:03:20 | 2020-01-09 17:03:20 | 1.2417 |  | 0 | False |
| 2 | 19724 | 10043 | False | 61.3687 | True | False | 2020-01-13 05:03:20 | 2020-01-10 23:03:20 | 1.322671 |  | 0 | False |
| 3 | 19724 | 10043 | False | 61.3687 | True | False | 2020-01-13 05:03:20 | 2020-01-11 05:03:20 | 1.482045 |  | 0 | False |
| 4 | 19724 | 10043 | False | 61.3687 | True | False | 2020-01-13 05:03:20 | 2020-01-11 11:03:20 | 1.569201 |  | 1 | False |
| 5 | 19724 | 10043 | False | 61.3687 | True | False | 2020-01-13 05:03:20 | 2020-01-12 11:03:20 | 1.225677 |  | 0 | False |
| 6 | 19724 | 10043 | False | 61.3687 | True | False | 2020-01-13 05:03:20 | 2020-01-12 23:03:20 | 0.739676 | 0.83833 | 0 | False |
| 7 | 19724 | 10043 | True | 61.3687 | True | False | 2020-01-13 05:03:20 | 2020-01-16 23:03:20 | 0.743025 | 0.83833 | 0 | False |
| 8 | 17521 | 10106 | False | 54.3445 | True | False | 2020-05-16 03:04:19 | 2020-05-13 03:04:19 | 1.084961 |  | 0 | False |
| 9 | 17521 | 10106 | False | 54.3445 | True | False | 2020-05-16 03:04:19 | 2020-05-14 21:04:19 | 0.905758 |  | 0 | False |


In [9]:
# Example 4: Bare-bones dataset
barebones = toy.loc[:,['mrn', 'inpatient', 'time', 'creat']]
print('The spar:')
print(barebones.head())
flagger = AKIFlagger(rolling_window = True, creatinine = 'creat')
example4 = flagger.returnAKIpatients(barebones)
example4.head(20)

The spar:
     mrn  inpatient                time     creat
0  12732      False 2020-05-24 16:52:19  0.542048
1  12732      False 2020-05-25 22:52:19  1.403329
2  12732      False 2020-05-26 16:52:19  1.215593
3  12732      False 2020-05-26 22:52:19  0.789235
4  12732      False 2020-05-27 10:52:19  0.917523


|   | enc | time | mrn | inpatient | creat | admission | rw |
|---|-----|------|-----|-----------|-------|-----------|----|
| 0 | 1.0 | 2020-05-24 16:52:19 | 12732.0 | False | 0.542048 | 2020-05-31 10:52:19 | 0 |
| 1 | 1.0 | 2020-05-25 22:52:19 | 12732.0 | False | 1.403329 | 2020-05-31 10:52:19 | 2 |
| 2 | 1.0 | 2020-05-26 16:52:19 | 12732.0 | False | 1.215593 | 2020-05-31 10:52:19 | 2 |
| 3 | 1.0 | 2020-05-26 22:52:19 | 12732.0 | False | 0.789235 | 2020-05-31 10:52:19 | 0 |
| 4 | 1.0 | 2020-05-27 10:52:19 | 12732.0 | False | 0.917523 | 2020-05-31 10:52:19 | 1 |
| 5 | 1.0 | 2020-05-31 10:52:19 | 12732.0 | True | 0.968307 | 2020-05-31 10:52:19 | 1 |
| 6 | 1.0 | 2020-05-31 16:52:19 | 12732.0 | True | 0.994704 | 2020-05-31 10:52:19 | 0 |
| 7 | 1.0 | 2020-05-31 22:52:19 | 12732.0 | True | 1.162822 | 2020-05-31 10:52:19 | 0 |
| 8 | 1.0 | 2020-06-01 10:52:19 | 12732.0 | True | 1.057039 | 2020-05-31 10:52:19 | 0 |
| 9 | 1.0 | 2020-06-02 04:52:19 | 12732.0 | True | 1.151936 | 2020-05-31 10:52:19 | 0 |
