Incorrect time grid in urine24 #61

Open
prockenschaub opened this issue Apr 11, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@prockenschaub
Collaborator

`urine24` maps all observed urine output onto a grid of `interval`-sized steps. If no output was observed in a given interval, the function uses `fill_gaps` to fill in that step. Unfortunately, `fill_gaps` does not handle a `win_tbl` passed as `limits` correctly: it reads the duration column as an absolute end time and produces an incorrect grid. See the following reprex.

library(ricu)
#> ℹ Loading ricu
#> 
#> ── ricu 0.6.0 ──────────────────────────────────────────────────────────────────
#> 
#> The following data sources are configured to be attached:
#> (the environment variable `RICU_SRC_LOAD` controls this)
#> 
#> ✔ mimic: 26 of 26 tables available
#> ✔ mimic_demo: 25 of 25 tables available
#> ✔ eicu: 31 of 31 tables available
#> ✔ eicu_demo: 31 of 31 tables available
#> ✔ hirid: 5 of 5 tables available
#> ✔ aumc: 7 of 7 tables available
#> ✔ miiv: 31 of 31 tables available
#> ✖ sic: 7 of 8 tables available
#> 
#> ────────────────────────────────────────────────────────────────────────────────

res = load_concepts("urine", src = "mimic_demo")
#> ── Loading 1 concept ───────────────────────────────────────────────────────────
#> • urine  ◯ removed 20 (0.22%) of rows due to `NA` values  ◯ removed 1 (0.01%) of rows due to out of range entries  ◯ not all units are in [mL]: NA (0.08%)
#> ────────────────────────────────────────────────────────────────────────────────

# Code as currently used in the `urine24` function ------------------------
limits = collapse(res)
class(limits) # <-- is win_tbl
#> [1] "win_tbl"    "ts_tbl"     "id_tbl"     "data.table" "data.frame"

# Take the example of patient 204201. Their observations range from hour -1 to 
# hour 14. 
res[icustay_id == 204201]
#> # A `ts_tbl`: 14 ✖ 3
#> # Id var:     `icustay_id`
#> # Units:      `urine` [mL]
#> # Index var:  `charttime` (1 hours)
#>    icustay_id charttime urine
#>         <int> <drtn>    <dbl>
#> 1      204201 -1 hours   1500
#> 2      204201  0 hours   1000
#> 3      204201  1 hours    250
#> 4      204201  2 hours    300
#> 5      204201  4 hours    360
#> 6      204201  5 hours    100
#> 7      204201  6 hours    100
#> 8      204201  7 hours    260
#> 9      204201  9 hours    350
#> 10     204201 10 hours     60
#> 11     204201 11 hours    160
#> 12     204201 12 hours    160
#> 13     204201 13 hours    160
#> 14     204201 14 hours    200

# In limits, this is encoded as a start of observation at hour -1 and a duration of 
# observation of 15 hours (-1 + 15 = 14). This duration is confusingly named "end".
limits[icustay_id == 204201]
#> # A `win_tbl`:  1 ✖ 3
#> # Id var:       `icustay_id`
#> # Index var:    `start` (1 hours)
#> # Duration var: `end`
#>   icustay_id start    end
#>        <int> <drtn>   <drtn>
#> 1     204201 -1 hours 15 hours

# However, in `fill_gaps`, end is not interpreted as a duration but as an 
# absolute point in time (as can be seen in the artificially added row at charttime == 15).
filled_cur = fill_gaps(res, limits = limits)
filled_cur[icustay_id == 204201]
#> # A `ts_tbl`: 17 ✖ 3
#> # Id var:     `icustay_id`
#> # Units:      `urine` [mL]
#> # Index var:  `charttime` (1 hours)
#>    icustay_id charttime urine
#>         <int> <drtn>    <dbl>
#> 1      204201 -1 hours   1500
#> 2      204201  0 hours   1000
#> 3      204201  1 hours    250
#> 4      204201  2 hours    300
#> 5      204201  3 hours     NA
#> 6      204201  4 hours    360
#> 7      204201  5 hours    100
#> 8      204201  6 hours    100
#> 9      204201  7 hours    260
#> 10     204201  8 hours     NA
#> 11     204201  9 hours    350
#> 12     204201 10 hours     60
#> 13     204201 11 hours    160
#> 14     204201 12 hours    160
#> 15     204201 13 hours    160
#> 16     204201 14 hours    200
#> 17     204201 15 hours     NA    <--- this row was artificially added by fill_gaps



# This should be fixed, which can be done as follows ----------------------

limits = collapse(res, as_win_tbl = FALSE) 
class(limits) # <-- is returned as id_tbl, no longer a win_tbl
#> [1] "id_tbl"     "data.table" "data.frame"
#> attr(,"previous")
#> [1] "ts_tbl"     "id_tbl"     "data.table" "data.frame"
limits = as_ts_tbl(limits, index_var = "start", interval = interval(res)) # <-- turn into ts_tbl

# The `end` column in limits now encodes an absolute time and not a duration
limits[icustay_id == 204201]
#> # A `ts_tbl`: 1 ✖ 3
#> # Id var:     `icustay_id`
#> # Index var:  `start` (1 hours)
#>   icustay_id start    end
#>        <int> <drtn>   <drtn>
#> 1     204201 -1 hours 14 hours

# `fill_gaps` now also works correctly and does not add artificial rows
filled_upd = fill_gaps(res, limits = limits)
filled_upd[icustay_id == 204201]
#> # A `ts_tbl`: 16 ✖ 3
#> # Id var:     `icustay_id`
#> # Units:      `urine` [mL]
#> # Index var:  `charttime` (1 hours)
#>    icustay_id charttime urine
#>         <int> <drtn>    <dbl>
#> 1      204201 -1 hours   1500
#> 2      204201  0 hours   1000
#> 3      204201  1 hours    250
#> 4      204201  2 hours    300
#> 5      204201  3 hours     NA
#> 6      204201  4 hours    360
#> 7      204201  5 hours    100
#> 8      204201  6 hours    100
#> 9      204201  7 hours    260
#> 10     204201  8 hours     NA
#> 11     204201  9 hours    350
#> 12     204201 10 hours     60
#> 13     204201 11 hours    160
#> 14     204201 12 hours    160
#> 15     204201 13 hours    160
#> 16     204201 14 hours    200

Created on 2024-04-11 with reprex v2.1.0

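The problem is not just additional rows but also the complete omission of patients. This happens when observation starts late and the duration of observation is shorter than the start offset: read as an absolute time, the end then falls before the start, which makes it look as if the patient was observed backward in time. See patient 228977 for an example (start at hour 13, duration of 6 hours).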

res[icustay_id == 228977]
#> # A `ts_tbl`: 3 ✖ 3
#> # Id var:     `icustay_id`
#> # Units:      `urine` [mL]
#> # Index var:  `charttime` (1 hours)
#>   icustay_id charttime urine
#>        <int> <drtn>    <dbl>
#> 1     228977 13 hours     25
#> 2     228977 16 hours     25
#> 3     228977 19 hours     25

limits[icustay_id == 228977]
#> # A `win_tbl`:  1 ✖ 3
#> # Id var:       `icustay_id`
#> # Index var:    `start` (1 hours)
#> # Duration var: `end`
#>   icustay_id start    end
#>        <int> <drtn>   <drtn>
#> 1     228977 13 hours 6 hours

filled_cur[icustay_id == 228977]
#> # A `ts_tbl`: 0 ✖ 3
#> # Id var:     `icustay_id`
#> # Units:      `urine` [mL]
#> # Index var:  `charttime` (1 hours)
#> # ℹ 3 variables: icustay_id <int>, charttime <drtn>, urine <dbl>

filled_upd[icustay_id == 228977]
#> # A `ts_tbl`: 7 ✖ 3
#> # Id var:     `icustay_id`
#> # Units:      `urine` [mL]
#> # Index var:  `charttime` (1 hours)
#>   icustay_id charttime urine
#>        <int> <drtn>    <dbl>
#> 1     228977 13 hours     25
#> 2     228977 14 hours     NA
#> 3     228977 15 hours     NA
#> 4     228977 16 hours     25
#> 5     228977 17 hours     NA
#> 6     228977 18 hours     NA
#> 7     228977 19 hours     25
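
The arithmetic behind both failure modes can be reproduced without `ricu`. The following is a minimal base R sketch: `hourly_grid` is a hypothetical helper made up for illustration and only mimics the behaviour observed above (an empty grid when the end precedes the start). It is not how `fill_gaps` is implemented internally.

```r
# Hypothetical helper (not part of ricu): hourly grid from `start` to an
# absolute `end`; returns an empty grid if `end` lies before `start`.
hourly_grid <- function(start, end) {
  if (end < start) {
    return(numeric(0))
  }
  seq(start, end, by = 1)
}

# Patient 204201: start = -1 hours, duration = 15 hours
extra_wrong <- hourly_grid(-1, 15)       # duration misread as absolute end
extra_right <- hourly_grid(-1, -1 + 15)  # end = start + duration

# Patient 228977: start = 13 hours, duration = 6 hours
omit_wrong <- hourly_grid(13, 6)         # end before start: empty grid
omit_right <- hourly_grid(13, 13 + 6)    # end = start + duration

c(length(extra_wrong), length(extra_right))  # 17 16
c(length(omit_wrong), length(omit_right))    # 0 7
```

The grid lengths match the row counts in the reprex: 17 versus 16 rows for patient 204201, and 0 versus 7 rows for patient 228977.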