Reproducibility gap in ICU pathogen prediction benchmarks (MIMIC-IV) #1154
netanelcyber started this conversation in Show and tell
Hi PyHealth team,
I have been working on an open-source ICU pathogen prediction pipeline using MIMIC-IV:
https://github.com/netanelcyber/PenuX
During development we noticed something that may be relevant for the broader clinical ML community:
Observation
Many ICU prediction pipelines report strong AUROC values, but relatively few evaluate:
In our experiments, relatively small preprocessing choices produced unexpectedly large differences in:
This raises a broader reproducibility question:
Potential contribution ideas
I would be interested in contributing PyHealth-compatible examples for:
Technical direction
Current experiments include combinations of:
Questions for maintainers/community
I would also appreciate feedback from others working on:
Project:
https://github.com/netanelcyber/PenuX
Thanks again for building PyHealth — it has been extremely useful for rapid experimentation in clinical ML.