[ACL 2023] Clinical Note Owns its Hierarchy: Multi-Level Hypergraph Neural Networks for Patient-Level Representation Learning
- CUDA=11.3
- cuDNN=8.2.0
- python=3.9.12
- pandas=1.4.2
- torch=1.11.0
- torch_geometric=2.1.0
We follow the MIMIC-III Benchmark (Harutyunyan et al.) to preprocess clinical notes.
The preprocessed NOTEEVENTS data for in-hospital-mortality should be in data/DATA_RAW/in-hospital-mortality, divided into two folders (train_note and test_note).
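Before running the scripts, it can help to confirm that the expected layout is in place. The following is a minimal sketch, not part of the repository: the helper name `missing_note_folders` is hypothetical, and only the folder names come from the description above.

```python
from pathlib import Path

# Folder names taken from the description above; the helper itself is hypothetical.
EXPECTED_SPLITS = ("train_note", "test_note")


def missing_note_folders(task_dir):
    """Return the expected split folders that are absent under task_dir."""
    root = Path(task_dir)
    return [name for name in EXPECTED_SPLITS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_note_folders("data/DATA_RAW/in-hospital-mortality")
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("Note folders found.")
```

An empty list means both split folders exist, so the preprocessing scripts should be able to find their input.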
python graph_construction/prepare_notes/extract_cleaned_notes.py
python graph_construction/prepare_notes/create_hyper_df.py
extract_cleaned_notes.py cleans the clinical notes in data/DATA_RAW/in-hospital-mortality and writes the cleaned text to a "Fixed TEXT" column in each csv file. Word2vec token embeddings with 100 dimensions are created and saved in data/DATA_RAW/root/word2vec_100.
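To illustrate what a cleaning step of this kind might look like, here is a rough sketch. The actual rules in extract_cleaned_notes.py are not described here, so everything below is an assumption: the function names are hypothetical, and the rules (removing MIMIC-III de-identification placeholders of the form `[** ... **]`, lowercasing, dropping punctuation) are just common choices for these notes. Only the "TEXT" and "Fixed TEXT" column names come from the data and the description above.

```python
import re

# MIMIC-III de-identification placeholders look like [** ... **].
DEID_PATTERN = re.compile(r"\[\*\*.*?\*\*\]")
# Anything that is not a lowercase letter, digit, or whitespace (an assumed rule).
NON_ALNUM_PATTERN = re.compile(r"[^a-z0-9\s]")


def clean_note(text):
    """Drop de-identification placeholders, lowercase, and strip punctuation."""
    text = DEID_PATTERN.sub(" ", text)
    text = text.lower()
    text = NON_ALNUM_PATTERN.sub(" ", text)
    # Collapse runs of whitespace left behind by the substitutions.
    return " ".join(text.split())


def add_fixed_text(rows):
    """Return copies of the rows with a "Fixed TEXT" field added."""
    return [{**row, "Fixed TEXT": clean_note(row["TEXT"])} for row in rows]


if __name__ == "__main__":
    sample = [{"TEXT": "Pt admitted on [**2101-10-20**] with CHF."}]
    print(add_fixed_text(sample)[0]["Fixed TEXT"])
```

The rows are plain dictionaries here to keep the sketch dependency-free; the same function could be applied to a pandas column instead.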
create_hyper_df.py creates a dataframe from data/DATA_RAW/in-hospital-mortality in which each row represents one word. The results are stored in data/DATA_PRE/in-hospital-mortality, divided into two folders (train_hyper and test_hyper).
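The one-row-per-word layout can be pictured with a small sketch. This is not the code in create_hyper_df.py: the function name and the field names (`note_id`, `category`, `position`, `word`) are hypothetical, and the real dataframe may carry different or additional columns.

```python
def notes_to_word_rows(notes):
    """Flatten notes into one record per word.

    `notes` is a list of (note_id, category, cleaned_text) tuples. These field
    names are illustrative, not the repository's actual schema.
    """
    rows = []
    for note_id, category, text in notes:
        for position, word in enumerate(text.split()):
            rows.append(
                {
                    "note_id": note_id,
                    "category": category,
                    "position": position,
                    "word": word,
                }
            )
    return rows


if __name__ == "__main__":
    example = [(1, "Nursing", "pt stable"), (2, "Radiology", "chest clear")]
    for row in notes_to_word_rows(example):
        print(row)
```

Keeping the note identifier and category on every word row means a later step can group words by note or by category without going back to the raw text.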
python graph_construction/prepare_notes/PygNotesGraphDataset.py --split train
python graph_construction/prepare_notes/PygNotesGraphDataset.py --split test
PygNotesGraphDataset.py creates multi-level hypergraphs with cutoff in data/IMDB_HCUT/in-hospital-mortality.
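As a rough picture of how word rows could become a multi-level hypergraph, consider the sketch below. It rests on several assumptions that are not confirmed here: each unique word is a node, each note and each note category is a hyperedge (two levels), and "cutoff" means keeping only the first few words of each note. The function name, field names, and `max_words_per_note` parameter are all hypothetical. The output is a pair of parallel lists (node indices, hyperedge indices), the same layout that `torch_geometric.nn.HypergraphConv` takes as `hyperedge_index`.

```python
def build_incidence(rows, max_words_per_note):
    """Build (node, hyperedge) incidence lists from one-record-per-word rows.

    Each unique word is a node. Each note and each note category becomes a
    hyperedge. Words beyond `max_words_per_note` in a note are dropped, which
    is one possible reading of "cutoff" (an assumption).
    """
    node_ids = {}
    edge_ids = {}
    pairs = set()
    kept_per_note = {}
    for row in rows:
        note_id = row["note_id"]
        kept = kept_per_note.get(note_id, 0)
        if kept >= max_words_per_note:
            continue  # cutoff reached for this note
        kept_per_note[note_id] = kept + 1
        node = node_ids.setdefault(row["word"], len(node_ids))
        # One hyperedge per note and one per category (assumed levels).
        for key in (("note", note_id), ("category", row["category"])):
            edge = edge_ids.setdefault(key, len(edge_ids))
            pairs.add((node, edge))
    ordered = sorted(pairs)
    incidence = [[n for n, _ in ordered], [e for _, e in ordered]]
    return node_ids, edge_ids, incidence


if __name__ == "__main__":
    rows = [
        {"note_id": 1, "category": "Nursing", "word": "pt"},
        {"note_id": 1, "category": "Nursing", "word": "stable"},
        {"note_id": 1, "category": "Nursing", "word": "today"},
        {"note_id": 2, "category": "Nursing", "word": "pt"},
    ]
    nodes, edges, incidence = build_incidence(rows, max_words_per_note=2)
    print(nodes)
    print(edges)
    print(incidence)
```

Because a word shared by two notes maps to a single node, the note-level hyperedges overlap through that node, while the category-level hyperedge groups all notes of the same type.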
python tmhgnn/train.py