## Step 1: data processing with pyhealth.datasets

In [1]:
from pyhealth.datasets import eICUDataset

# Load the three eICU tables used as features for this task.
dataset = eICUDataset(
    root="/srv/local/data/physionet.org/files/eicu-crd/2.0",
    tables=["diagnosis", "medication", "physicalExam"],
)
dataset.stat()


INFO: Pandarallel will run on 64 workers.
INFO: Pandarallel will use Memory file system to transfer data between the main process and workers.


Parsing patients: 100%|███████████████| 166355/166355 [01:52<00:00, 1474.20it/s]


finish basic patient information parsing : 115.66474342346191s
finish parsing diagnosis : 232.50637197494507s
finish parsing medication : 529.3215832710266s
finish parsing physicalExam : 253.76233100891113s


Mapping codes: 100%|█████████████████| 166355/166355 [00:09<00:00, 17176.55it/s]



Statistics of base dataset (dev=False):
	- Dataset: eICUDataset
	- Number of patients: 166355
	- Number of visits: 200859
	- Number of visits per patient: 1.2074
	- Number of events per visit in diagnosis: 22.6781
	- Number of events per visit in medication: 23.3808
	- Number of events per visit in physicalExam: 45.8646



## Step 2: task processing with pyhealth.tasks

In [2]:
from pyhealth.tasks import length_of_stay_prediction_eicu_fn
los_dataset = dataset.set_task(length_of_stay_prediction_eicu_fn)
los_dataset.stat()

Generating samples for length_of_stay_prediction_eicu_fn: 100%|█| 166355/166355 


Statistics of sample dataset:
	- Dataset: eICUDataset
	- Task: length_of_stay_prediction_eicu_fn
	- Number of samples: 124564
	- Number of patients: 114473
	- Number of visits: 124564
	- Number of visits per patient: 1.0882
	- conditions:
		- Number of conditions per sample: 6.8110
		- Number of unique conditions: 1670
		- Distribution of conditions (Top-10): [('518.81', 21925), ('J96.00', 21451), ('I10', 17728), ('401.9', 16440), ('584.9', 14759), ('N17.9', 14548), ('486', 13704), ('J18.9', 13697), ('038.9', 12423), ('427.31', 12110)]
	- procedures:
		- Number of procedures per sample: 25.1875
		- Number of unique procedures: 461
		- Distribution of procedures (Top-10): [('notes/Progress Notes/Physical Exam/Physical Exam Obtain Options/Performed - Structured', 121653), ('notes/Progress Notes/Physical Exam/Physical Exam/Neurologic/GCS/Score/scored', 118406), ('notes/Progress Notes/Physical Exam/Physical Exam/Constitutional/Weight and I&O/Weight (kg)/Admission', 107202), ('notes/Progres
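
The ratios printed by `stat()` can be reproduced from the raw counts, which also shows how much data the task function filters out. This is a minimal sketch that uses only the numbers printed above; the variable names are illustrative and are not PyHealth attributes.

```python
# Counts copied from the two stat() printouts above.
base_patients, base_visits = 166355, 200859
task_patients, task_visits = 114473, 124564

# "Number of visits per patient" is the plain ratio of the two counts.
base_ratio = round(base_visits / base_patients, 4)
task_ratio = round(task_visits / task_patients, 4)

# Share of visits and patients that remain after the task function's filtering.
visit_retention = task_visits / base_visits
patient_retention = task_patients / base_patients

print(base_ratio, task_ratio)
print(f"{visit_retention:.1%} of visits, {patient_retention:.1%} of patients kept")
```

The two ratios match the printed values (1.2074 and 1.0882), and roughly 62% of visits and 69% of patients remain in the sample dataset.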

In [3]:
los_dataset.samples[0]

{'visit_id': '224606',
 'patient_id': '002-10009+193705',
 'conditions': [['785.52',
   'R65.21',
   '287.5',
   'D69.6',
   '205.10',
   'C92.10',
   '567.9',
   'K65.0']],
 'procedures': [['notes/Progress Notes/Physical Exam/Physical Exam/Neurologic/GCS/Score/scored',
   'notes/Progress Notes/Physical Exam/Physical Exam Obtain Options/Performed - Structured',
   'notes/Progress Notes/Physical Exam/Physical Exam/Constitutional/Vital Sign and Physiological Data/HR/HR Current',
   'notes/Progress Notes/Physical Exam/Physical Exam/Constitutional/Vital Sign and Physiological Data/HR/HR Lowest',
   'notes/Progress Notes/Physical Exam/Physical Exam/Constitutional/Vital Sign and Physiological Data/HR/HR Highest',
   'notes/Progress Notes/Physical Exam/Physical Exam/Constitutional/Vital Sign and Physiological Data/BP (systolic)/BP (systolic) Current',
   'notes/Progress Notes/Physical Exam/Physical Exam/Constitutional/Vital Sign and Physiological Data/BP (systolic)/BP (systolic) Lowest',
   '
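
The printed sample is cut off before its `label` field. PyHealth's documentation describes the length-of-stay label as a 10-way bucket of the stay duration in days. The sketch below is an assumption based on that description, not the library's code, and `categorize_los_sketch` is a hypothetical name.

```python
def categorize_los_sketch(days: int) -> int:
    """Map a length of stay in whole days to one of 10 class labels (assumed scheme)."""
    if days < 1:
        return 0      # shorter than one day
    if days <= 7:
        return days   # one class per day of the first week
    if days <= 14:
        return 8      # more than one week, up to two weeks
    return 9          # more than two weeks


for days in (0, 3, 7, 10, 30):
    print(days, "->", categorize_los_sketch(days))
```

Under this assumed scheme the task is a 10-class problem, which is why the model in Step 3 uses `mode="multiclass"`.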

In [13]:
from pyhealth.datasets import split_by_patient, get_dataloader

train_dataset, val_dataset, test_dataset = split_by_patient(
    los_dataset, [0.8, 0.1, 0.1]
)
train_dataloader = get_dataloader(train_dataset, batch_size=256, shuffle=True)
val_dataloader = get_dataloader(val_dataset, batch_size=256, shuffle=False)
test_dataloader = get_dataloader(test_dataset, batch_size=256, shuffle=False)
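
`split_by_patient` splits at the patient level, so all samples of one patient end up in the same split and no patient leaks from training into evaluation. The sketch below illustrates that idea in plain Python. It is not PyHealth's implementation; `split_by_patient_sketch`, its `seed` argument, and the toy samples are hypothetical.

```python
import random
from collections import defaultdict


def split_by_patient_sketch(samples, ratios, seed=0):
    """Return three lists of sample indices; each patient lands in exactly one list."""
    # Group sample indices by the patient they belong to.
    by_patient = defaultdict(list)
    for idx, sample in enumerate(samples):
        by_patient[sample["patient_id"]].append(idx)

    # Shuffle patients (not samples), then cut the patient list by the ratios.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)
    train_end = int(len(patients) * ratios[0])
    val_end = int(len(patients) * (ratios[0] + ratios[1]))
    groups = (patients[:train_end], patients[train_end:val_end], patients[val_end:])

    # Expand each patient group back into sample indices.
    return [[i for p in group for i in by_patient[p]] for group in groups]


# Toy data: 25 samples spread over 10 hypothetical patients.
toy_samples = [{"patient_id": f"p{i % 10}", "visit_id": str(i)} for i in range(25)]
train_idx, val_idx, test_idx = split_by_patient_sketch(toy_samples, [0.8, 0.1, 0.1])
print(len(train_idx), len(val_idx), len(test_idx))
```

Because the cut is made on patients, the sample counts per split only approximate the requested 80/10/10 ratios.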

## Step 3: build the RETAIN model from pyhealth.models

In [6]:
from pyhealth.models import RETAIN

# STEP 3: define model
model = RETAIN(
    dataset=los_dataset,
    feature_keys=["conditions", "procedures", "drugs"],
    label_key="label",
    mode="multiclass",
    dropout=0.8,
    embedding_dim=256,
)
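
RETAIN weights each visit with a scalar attention `alpha` (softmax over visits) and each embedding dimension with a vector attention `beta` (tanh), then sums the weighted visit embeddings into one context vector. The sketch below shows only that combination step in plain Python. In the published model the raw scores come from two GRUs run over the visits in reverse time order; here they are passed in as hypothetical inputs, and `retain_context` is an illustrative name rather than a PyHealth function.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def retain_context(visit_embs, alpha_scores, beta_pre):
    """Compute c = sum_i alpha_i * (tanh(beta_pre_i) * v_i), elementwise per dimension."""
    alpha = softmax(alpha_scores)
    context = [0.0] * len(visit_embs[0])
    for a, v, b in zip(alpha, visit_embs, beta_pre):
        for d in range(len(v)):
            context[d] += a * math.tanh(b[d]) * v[d]
    return alpha, context


# Three visits with 2-dimensional embeddings (toy values).
visits = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
alpha, context = retain_context(
    visits,
    alpha_scores=[0.1, 0.5, 2.0],
    beta_pre=[[0.3, -0.2], [0.0, 0.4], [1.0, 1.0]],
)
print(alpha, context)
```

When a sample holds a single visit, as in the sample printed in Step 2, the softmax over one score is 1, so only the `beta` weighting changes the embedding.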

## Step 4: use pyhealth.trainer.Trainer to train the RETAIN model

In [14]:
from pyhealth.trainer import Trainer

# STEP 4: define trainer
trainer = Trainer(model=model, metrics=["accuracy", "cohen_kappa"])

trainer.train(
    train_dataloader=train_dataloader,
    val_dataloader=val_dataloader,
    epochs=3,
    monitor="cohen_kappa",
) # model is training ...

RETAIN(
  (embeddings): ModuleDict(
    (conditions): Embedding(1672, 256, padding_idx=0)
    (procedures): Embedding(463, 256, padding_idx=0)
    (drugs): Embedding(1413, 256, padding_idx=0)
  )
  (linear_layers): ModuleDict()
  (retain): ModuleDict(
    (conditions): RETAINLayer(
      (dropout_layer): Dropout(p=0.8, inplace=False)
      (alpha_gru): GRU(256, 256, batch_first=True)
      (beta_gru): GRU(256, 256, batch_first=True)
      (alpha_li): Linear(in_features=256, out_features=1, bias=True)
      (beta_li): Linear(in_features=256, out_features=256, bias=True)
    )
    (procedures): RETAINLayer(
      (dropout_layer): Dropout(p=0.8, inplace=False)
      (alpha_gru): GRU(256, 256, batch_first=True)
      (beta_gru): GRU(256, 256, batch_first=True)
      (alpha_li): Linear(in_features=256, out_features=1, bias=True)
      (beta_li): Linear(in_features=256, out_features=256, bias=True)
    )
    (drugs): RETAINLayer(
      (dropout_layer): Dropout(p=0.8, inplace=False)
      (al

Epoch 0 / 3:   0%|          | 0/390 [00:00<?, ?it/s]

--- Train epoch-0, step-390 ---
loss: 1.6861


Evaluation: 100%|███████████████████████████████| 49/49 [00:01<00:00, 25.14it/s]

--- Eval epoch-0, step-390 ---
accuracy: 0.3642
cohen_kappa: 0.1460
loss: 1.6052
New best cohen_kappa score (0.1460) at epoch-0, step-390






Epoch 1 / 3:   0%|          | 0/390 [00:00<?, ?it/s]

--- Train epoch-1, step-780 ---
loss: 1.6721


Evaluation: 100%|███████████████████████████████| 49/49 [00:01<00:00, 24.68it/s]

--- Eval epoch-1, step-780 ---
accuracy: 0.3628
cohen_kappa: 0.1586
loss: 1.6047
New best cohen_kappa score (0.1586) at epoch-1, step-780








Epoch 2 / 3:   0%|          | 0/390 [00:00<?, ?it/s]

--- Train epoch-2, step-1170 ---
loss: 1.6688


Evaluation: 100%|███████████████████████████████| 49/49 [00:02<00:00, 24.49it/s]

--- Eval epoch-2, step-1170 ---
accuracy: 0.3672
cohen_kappa: 0.1569
loss: 1.6010
Loaded best model
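
The progress bars above show 390 training steps per epoch and 49 evaluation batches. These counts follow from the number of samples, the split ratios, and the batch size. The sketch below is an approximation: it assumes each split receives the same share of samples as of patients, which holds only roughly because the split is made by patient.

```python
import math

num_samples = 124564   # from the sample dataset statistics
batch_size = 256       # passed to get_dataloader
ratios = {"train": 0.8, "val": 0.1, "test": 0.1}

# The final partial batch still counts as a step, hence the ceiling.
batches = {
    name: math.ceil(num_samples * share / batch_size)
    for name, share in ratios.items()
}
print(batches)
```

The estimate gives 390 training batches and 49 batches each for validation and test, matching the logs.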





## Step 5: model evaluation with two methods

In [15]:
# evaluation method 1
result = trainer.evaluate(test_dataloader)
print(result)

# evaluation method 2
from pyhealth.metrics.multiclass import multiclass_metrics_fn

y_true, y_prob, loss = trainer.inference(test_dataloader)
multiclass_metrics_fn(
    y_true,
    y_prob,
    metrics=["accuracy", "cohen_kappa"]
)

Evaluation: 100%|███████████████████████████████| 49/49 [00:02<00:00, 24.32it/s]


{'accuracy': 0.35814177784922036, 'cohen_kappa': 0.16105721580856514, 'loss': 1.630508296343745}


Evaluation: 100%|███████████████████████████████| 49/49 [00:01<00:00, 25.25it/s]


{'accuracy': 0.35814177784922036, 'cohen_kappa': 0.16105721580856514}
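
Both methods report the same accuracy and Cohen's kappa because they score the same predictions. To make the two metrics concrete, the sketch below derives them from true labels and predicted class probabilities in plain Python. It assumes `y_prob` holds one row of class probabilities per sample and computes the unweighted kappa; `accuracy_and_kappa` is a hypothetical helper, not part of `pyhealth.metrics`.

```python
from collections import Counter


def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)


def accuracy_and_kappa(y_true, y_prob):
    """Accuracy and unweighted Cohen's kappa from class-probability rows."""
    y_pred = [argmax(row) for row in y_prob]
    n = len(y_true)

    # Observed agreement: fraction of samples predicted correctly (the accuracy).
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Agreement expected by chance, from the label and prediction frequencies.
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)

    # Undefined when p_e == 1 (all labels and predictions fall in one class).
    kappa = (p_o - p_e) / (1 - p_e)
    return p_o, kappa


# Toy binary example: the last sample is misclassified.
y_true = [0, 0, 1, 1]
y_prob = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.7, 0.3]]
acc, kappa = accuracy_and_kappa(y_true, y_prob)
print(acc, kappa)
```

Kappa discounts the agreement that would occur by chance given the class frequencies, which is why it is much lower than the accuracy in the results above.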