Getting started
===============
This package is meant to handle patient data. Because real patient data is usually protected health information,
this walkthrough uses toy data to show how the package works.
Once you've installed the package following the instructions in `Installation`, you're ready to get started.
To begin with, we'll import the ``akiFlagger`` module.
.. option:: Python
.. code-block:: python

    import akiFlagger
    from akiFlagger import AKIFlagger, generate_toy_data

    # Confirm which version of the package is installed
    print(akiFlagger.__version__)

    >> 1.0.8.0

.. option:: R
.. code-block:: R
library(akiFlagger)
?returnAKIpatients
> ℹ Rendering development documentation for 'returnAKIpatients'
Let's start off by creating some toy data.
------------------------------------------
.. option:: Python
The flagger comes with a built-in toy-data generator to demonstrate how it works. Simply call the ``generate_toy_data()`` function. By default, the toy dataset has 100 patients, but let's initialize ours with 1000 patients.
.. code-block:: python
toy = generate_toy_data(num_patients=1000)
print('Toy dataset shape: {}'.format(toy.shape))
>> Successfully generated toy data!
Toy dataset shape: (9094, 6)
The toy dataset comes with columns for the patient identifier, whether the measurement was taken in an inpatient or outpatient setting, the creatinine measurement and time at which the measurement was taken. ``toy.head()`` should yield something like this:
.. csv-table::
:file: ../doc_csvs/python/toy_head.csv
.. option:: R
The R package comes with a built-in dataset, ``toy``. The toy dataset comes with columns for the patient identifier, whether the measurement was taken in an inpatient or outpatient setting, the creatinine measurement and the time at which the measurement was taken. ``head(toy)`` should yield something like this:
.. csv-table::
:file: ../doc_csvs/r/toy_headR.csv
.. admonition:: Tip!
In order to calculate AKI, the flagger expects a dataset with certain columns in it. Depending on the type of computation you are interested in, your dataset will need to have different columns. Here's a brief rundown of the necessary columns.
* *Rolling Minimum Window*: **patient_id**, **inpatient**, **time**, and **creatinine**
* *Historical Baseline Trumping*: **patient_id**, **inpatient**, **time**, and **creatinine**
* *Baseline Creatinine Imputation*: **age** and **sex** (which defaults to female).
------------
By default, the naming system is as follows:
**patient_id → 'patient_id'**
**inpatient/outpatient → 'inpatient'**
**creatinine → 'creatinine'**
**time → 'time'**
If you have different names for your columns, you **must specify them.**
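
If your data uses other column names, one alternative to specifying them is to rename the columns to the defaults before handing the data to the flagger. The snippet below is a minimal pandas-only sketch of that idea; the names ``mrn``, ``is_inpatient``, ``lab_time`` and ``scr`` are hypothetical stand-ins for whatever your dataset uses.

.. code-block:: python

    import pandas as pd

    # Hypothetical raw data with non-default column names
    raw = pd.DataFrame({
        "mrn": [101, 101],
        "is_inpatient": [False, True],
        "lab_time": pd.to_datetime(["2020-01-01 08:00", "2020-01-03 09:30"]),
        "scr": [0.8, 1.3],
    })

    # Map the hypothetical names onto the default names listed above
    renamed = raw.rename(columns={
        "mrn": "patient_id",
        "is_inpatient": "inpatient",
        "lab_time": "time",
        "scr": "creatinine",
    })

    print(list(renamed.columns))
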
Example: Rolling Minimum Window
-------------------------------
The next code block runs the flagger and returns those patients who satisfy the AKI conditions according to the `KDIGO guidelines <https://kdigo.org/guidelines/>`_ for change in creatinine values by the rolling-window definition, categorized as follows:
*Stage 1:* **(1)** 50% ↑ in creatinine in <= 7 days OR **(2)** 0.3 mg/dL ↑ in creatinine in <= 48 hours

*Stage 2:* 100% ↑ in (or doubling of) creatinine in <= 7 days

*Stage 3:* 200% ↑ in (or tripling of) creatinine in <= 7 days

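
The staging rules above can be sketched as a small function. This only illustrates the thresholds and is not the package's implementation; ``kdigo_stage`` and its arguments are hypothetical names, and treating each threshold as inclusive (``>=``) is an assumption.

.. code-block:: python

    def kdigo_stage(creatinine, min_7d, min_48h):
        """Return the AKI stage (0-3) for one measurement.

        creatinine: current value in mg/dL
        min_7d:     lowest value seen in the previous 7 days
        min_48h:    lowest value seen in the previous 48 hours
        """
        ratio = creatinine / min_7d
        if ratio >= 3.0:      # 200% increase, i.e. tripling
            return 3
        if ratio >= 2.0:      # 100% increase, i.e. doubling
            return 2
        if ratio >= 1.5 or creatinine - min_48h >= 0.3:
            return 1          # 50% increase, or 0.3 mg/dL within 48 hours
        return 0

    print(kdigo_stage(1.12, 0.34, 0.34))  # creatinine more than tripled -> 3
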
.. option:: Python
.. code-block:: python
flagger = AKIFlagger()
out = flagger.returnAKIpatients(toy)
out = out.reset_index() # By default, the returned output has the patient_id and time as hierarchical indices
out.head()
We can take a look at what our dataframe looks like. ``out.head()`` yields this:
.. csv-table::
:file: ../doc_csvs/python/rw_out.csv
Notice that the dataframe looks exactly like the one we passed to the flagger, except for one added column, ``aki``. This column has values of either 0, 1, 2, or 3, depending on which stage of AKI the flagger found.
The flagger runs on a row-wise basis, meaning that each row is checked for the increase in creatinine. Should, for example, a patient meet the criterion multiple times within a single encounter, the flagger will flag each measurement as a case of AKI.
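
To picture the row-wise check, the sketch below compares each measurement with the lowest creatinine seen for the same patient in the preceding 48 hours. It uses pandas only and is an illustration under stated assumptions, not the flagger's internal code; the column names ``min_creat48``, ``delta`` and ``flag_48h`` are hypothetical.

.. code-block:: python

    import pandas as pd

    df = pd.DataFrame({
        "patient_id": [1, 1, 1, 1],
        "time": pd.to_datetime([
            "2020-01-01 00:00", "2020-01-01 12:00",
            "2020-01-02 06:00", "2020-01-04 12:00",
        ]),
        "creatinine": [0.9, 0.6, 1.0, 1.1],
    })

    # Sort so the rolling result lines up row for row with the data
    df = df.sort_values(["patient_id", "time"]).set_index("time")

    # Lowest creatinine per patient in the 48 hours up to and including each row
    rolling_min = df.groupby("patient_id")["creatinine"].rolling("48h").min()
    df["min_creat48"] = rolling_min.to_numpy()

    # Every row is checked independently against its own window minimum
    df["delta"] = df["creatinine"] - df["min_creat48"]
    df["flag_48h"] = df["delta"] >= 0.3

    print(df)

Because each row is compared against its own window, a patient whose creatinine stays elevated can be flagged on several consecutive measurements.
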
.. warning::
    The column names specified within the flagger should match the dataset exactly. The full list of acceptable names can be found
    in the *returnAKIpatients()* function in the :ref:`genindex` section. For certain cases, the flagger understands special names. For example,
    ``sex = 'male'`` will autoconvert the sex column from female to male, but you still need a column named ``male`` in your data frame; otherwise an error will occur.

We can take a look at what the flagger flagged as AKI. ``out[out.aki > 0].head()`` should give a list of some patients which were flagged. From that, we can subset the dataset on any given patient:
.. code-block:: python

    # Rows that the flagger marked as AKI (any stage)
    out[out.aki > 0].head()

    # Pick one flagged patient from that list and view all of their rows
    out[out.patient_id == 19845]

.. csv-table::
:file: ../doc_csvs/python/rw_flagged.csv
Notice that, as we would expect, when the creatinine more than tripled from 0.34 to 1.12, the flagger correctly identified it as Stage 3 AKI.
You can also look at aggregate counts as follows (but don't take the numbers too seriously, of course, because this is toy data):
.. code-block:: python
aki_counts = out.aki.value_counts()
print('AKI counts')
print('----------')
print('No AKI: {}\nStage 1: {}\nStage 2: {}\nStage 3: {}'.format(aki_counts[0], aki_counts[1], aki_counts[2], aki_counts[3]))
>> AKI counts
----------
No AKI: 571
Stage 1: 211
Stage 2: 99
Stage 3: 70
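
One caveat with indexing ``aki_counts`` by stage: if a stage never occurs in your data, that label is missing from the counts and the lookup fails. A minimal pandas sketch of one way around this is shown below; the small ``aki`` series is made-up illustration data, not flagger output.

.. code-block:: python

    import pandas as pd

    # Made-up stage labels in which Stage 3 never occurs
    aki = pd.Series([0, 0, 1, 0, 2, 1, 0], name="aki")

    # Reindex so every stage is present, with 0 for stages that never occur
    aki_counts = aki.value_counts().reindex([0, 1, 2, 3], fill_value=0)

    print('No AKI: {}\nStage 1: {}\nStage 2: {}\nStage 3: {}'.format(
        aki_counts[0], aki_counts[1], aki_counts[2], aki_counts[3]))
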
You can explore the output of the ``returnAKIpatients()`` function in more depth to get a better understanding of how the flagger operates. The flagger also has optional parameters, such as ``add_min_creat = True``, which add to the output some of the intermediate values the flagger generates on the way to calculating AKI.
Next, we'll take a look at an example of the other AKI-calculation method, the back-calculation method.
.. option:: R
.. code-block:: R
library(akiFlagger)
out <- returnAKIpatients(toy)
head(out)
We can take a look at what the flagger returns. ``head(out)`` should return:
.. csv-table::
:file: ../doc_csvs/r/rw_out.csv
Notice that the dataframe looks exactly like the one we passed to the flagger, except for one added column, ``aki``. This column has values of either 0, 1, 2, or 3, depending on which stage of AKI the flagger found.
The flagger runs on a row-wise basis, meaning that each row is checked for the increase in creatinine. Should, for example, a patient meet the criterion multiple times within a single encounter, the flagger will flag each measurement as a case of AKI.
.. warning::
The patient dataset you input should have minimally these columns: ``patient_id``, ``inpatient``, ``time``, and ``creatinine``. If you are interested in demographic-based imputation,
you'll also want to include the ``age`` and ``sex`` columns.
We can take a look at what the flagger flagged as AKI. ``head(out[out$aki > 0])`` should give a list of some patients which were flagged. From that, we can subset the dataset on any given patient:
.. code-block:: R
head(out[out$aki > 0])
out[out$patient_id == 13264]
.. csv-table::
:file: ../doc_csvs/r/rw_flagged.csv
Notice that, as we would expect, when the creatinine more than tripled from 0.1 to 0.72, the flagger correctly identified it as Stage 3 AKI. Additionally,
row 11 was flagged as Stage 1 because it was a greater than 50% increase from 0.27, and row 12 was flagged because it was a greater than 100% increase from 0.27. Even though
the flagger is performing a row-wise computation, it is comparing the current creatinine value with the minimum in the past ``window1`` hours (defaults to 48 hours).
You can also look at aggregate counts as follows (but don't take the numbers too seriously, of course, because this is toy data):
.. code-block:: R
table(out$aki)
>> 0 1 2 3
1001 44 19 14
Example: Historical Baseline Trumping
-------------------------------------
Next, we'll run the flagger to "back-calculate" AKI; that is, we'll use the **median of the outpatient creatinine values from 365 to 7 days prior to admission** as a baseline creatinine value. Then, we'll apply the same KDIGO criteria (except for the 0.3 mg/dL increase), comparing each creatinine value to the baseline creatinine.
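
The baseline described above can be sketched for a single patient with plain pandas. This is an illustration only, not the flagger's internal code: the ``admission`` timestamp and the measurements are made up, and treating both ends of the 365-to-7-day window as inclusive is an assumption.

.. code-block:: python

    import pandas as pd

    admission = pd.Timestamp("2020-06-01")  # hypothetical admission time

    df = pd.DataFrame({
        "time": pd.to_datetime([
            "2019-05-01", "2019-09-01", "2020-01-15",
            "2020-05-20", "2020-05-28", "2020-06-01",
        ]),
        "inpatient": [False, False, False, False, False, True],
        "creatinine": [2.0, 0.8, 1.0, 1.2, 3.0, 1.9],
    })

    # Whole days between each measurement and admission
    days_before = (admission - df["time"]).dt.days

    # Keep outpatient values measured 365 to 7 days before admission
    in_window = ~df["inpatient"] & days_before.between(7, 365)

    # The baseline is the median of the values that remain
    baseline = df.loc[in_window, "creatinine"].median()
    print(baseline)

Here the first value is too old, the fifth is too close to admission, and the last is an inpatient value, so only three measurements contribute to the median.
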
.. option:: Python
.. code-block:: python
flagger = AKIFlagger(HB_trumping = True, add_baseline_creat = True)
out = flagger.returnAKIpatients(toy)
out.head()
.. csv-table::
:file: ../doc_csvs/python/bc_out.csv
.. option:: R
.. code-block:: R
out <- returnAKIpatients(toy, HB_trumping = T, add_baseline_creat = T)
head(out)
.. csv-table::
:file: ../doc_csvs/r/bc_out.csv
Actually, by default the toy dataset only has patient values :math:`\pm` 5 days from the admission date, and because the baseline creatinine value is calculated from values 365 to 7 days prior, you'll notice that the flagger reverts to the rolling-window definition.
This is important: in the absence of available baseline creatinine values, the flagger defaults to a rolling minimum comparison. Indeed, most of the checking for AKI occurs outside of the period of hospitalization.
Normally, of course, patients won't have times restricted to just :math:`\pm` 5 days, but this is a good opportunity to showcase one of the flagger features: the **eGFR-based imputation of baseline creatinine**.
The following equation is known as the `CKD-EPI equation <https://www.kidney.org/content/ckd-epi-creatinine-equation-2021>`_.

.. math::

    GFR = 142 \times \min(S_{cr} / \kappa, 1)^{\alpha} \times \max(S_{cr} / \kappa, 1)^{-1.200} \times 0.9938^{Age} \times (1 + 0.012 f)

where:
- :math:`GFR` :math:`(\frac{mL/min}{1.73m^2})` is the glomerular filtration rate
- :math:`S_{cr}` :math:`(\frac{mg}{dL})` is the serum creatinine
- :math:`\kappa` (unitless) is 0.7 for females and 0.9 for males
- :math:`\alpha` (unitless) is -0.241 for females and -0.302 for males
- :math:`f` is 1 if female, 0 if male
The idea is as follows: based on the above equation, we assume a GFR of 75 and then use the age and sex of the patient to determine an estimate for the baseline creatinine. Theory aside, simply pass ``eGFR_impute = True`` into the flagger and this will add values where the patient was missing outpatient values 365 to 7 days prior to admission.
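
To make the idea concrete, the sketch below writes out the equation above and then solves it for creatinine at an assumed GFR of 75. It is an illustration rather than the package's implementation; the function names ``ckd_epi_2021`` and ``impute_baseline_creatinine`` are hypothetical.

.. code-block:: python

    def ckd_epi_2021(scr, age, female):
        """eGFR (mL/min/1.73 m^2) from creatinine (mg/dL), age and sex."""
        kappa = 0.7 if female else 0.9
        alpha = -0.241 if female else -0.302
        sex_factor = 1.012 if female else 1.0
        ratio = scr / kappa
        return (142 * min(ratio, 1) ** alpha * max(ratio, 1) ** -1.200
                * 0.9938 ** age * sex_factor)


    def impute_baseline_creatinine(age, female, target_gfr=75.0):
        """Solve the equation for creatinine at an assumed GFR."""
        kappa = 0.7 if female else 0.9
        alpha = -0.241 if female else -0.302
        sex_factor = 1.012 if female else 1.0
        # GFR the equation gives when creatinine equals kappa (ratio of 1)
        gfr_at_kappa = 142 * 0.9938 ** age * sex_factor
        x = target_gfr / gfr_at_kappa
        # A target at or above gfr_at_kappa means ratio <= 1 (alpha branch);
        # otherwise ratio > 1 and the -1.200 exponent applies
        exponent = alpha if x >= 1 else -1.200
        return kappa * x ** (1 / exponent)


    scr = impute_baseline_creatinine(age=60, female=True)
    # Plugging the imputed value back in recovers the assumed GFR of 75
    print(scr, ckd_epi_2021(scr, age=60, female=True))
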
.. option:: Python
**Note:** The toy dataset doesn't come with demographic information by default, but simply passing ``include_demographic_info = True`` adds in a column for the age and sex variables.
.. code-block:: python
toy = generate_toy_data(num_patients=100, include_demographic_info = True)
toy.head()
.. csv-table::
:file: ../doc_csvs/python/toy_demo.csv
.. code-block:: python
flagger = AKIFlagger(HB_trumping = True, eGFR_impute = True, add_baseline_creat = True,
sex = 'female')
out = flagger.returnAKIpatients(toy)
out.head()
.. csv-table::
:file: ../doc_csvs/python/egfr_out.csv
.. option:: R
There are actually two toy datasets that come with the package: ``toy`` and ``toy.demo``. ``toy.demo`` is the toy dataframe with demographic information added in. As such, all we have to do is run

.. code-block:: R

    out <- returnAKIpatients(toy.demo, HB_trumping = T, eGFR_impute = T)
    head(out)

.. csv-table::
:file: ../doc_csvs/r/egfr_out.csv
That about does it for the basics! There are a slew of other features, some of which are listed in the `Additional Features` section. For a full listing of the features and appropriate use cases, see the `Documentation` at `akiflagger.readthedocs.io <https://akiflagger.readthedocs.io/en/latest/>`_.
Example: Baseline Creatinine Imputation
---------------------------------------
.. option:: Python
.. code-block:: python
flagger = AKIFlagger(HB_trumping = True, eGFR_impute = True, add_baseline_creat = True)
out = flagger.returnAKIpatients(toy)
.. option:: R
.. code-block:: R
out <- returnAKIpatients(toy.demo, HB_trumping = T, eGFR_impute = T)