# Drug Recommendation using MoleRec Model on MIMIC-IV Dataset

This notebook demonstrates how to use the MoleRec model for drug recommendation on the MIMIC-IV dataset. The model is implemented with the PyHealth 2.0 framework.


# 1. Import Required Libraries

We'll now import the libraries and classes needed to run the MoleRec model.

In [1]:
import os
import numpy as np
import pandas as pd
import torch
from pyhealth.datasets import (
    MIMIC4Dataset,
    split_by_patient,
    get_dataloader,
)
from pyhealth.tasks import DrugRecommendationMIMIC4
from pyhealth.models import MoleRec
from pyhealth.trainer import Trainer



# 2. Load and Process the MIMIC-IV Data

We'll load the MIMIC-IV dataset using PyHealth's built-in dataset loader to include patients, admissions, diagnoses, procedures, and prescriptions.

In [2]:
# Root directory of the MIMIC-IV 2.2 files (contains the hosp/ subdirectory)
MIMIC4_PATH = "/srv/local/data/physionet.org/files/mimiciv/2.2/"

base_dataset = MIMIC4Dataset(
    ehr_root=MIMIC4_PATH,
    ehr_tables=[
        "patients",
        "admissions",
        "diagnoses_icd",
        "procedures_icd",
        "labevents",
    ],
)

base_dataset.stat()

Memory usage Starting MIMIC4Dataset init: 522.9 MB
Initializing MIMIC4EHRDataset with tables: ['patients', 'admissions', 'diagnoses_icd', 'procedures_icd', 'labevents'] (dev mode: False)
Using default EHR config: /Users/kierankelly/PyHealth/pyhealth/datasets/configs/mimic4_ehr.yaml
Memory usage Before initializing mimic4_ehr: 522.9 MB
Duplicate table names in tables list. Removing duplicates.
Initializing mimic4_ehr dataset from /srv/local/data/physionet.org/files/mimiciv/2.2/ (dev mode: False)
Scanning table: labevents from /srv/local/data/physionet.org/files/mimiciv/2.2/hosp/labevents.csv.gz


FileNotFoundError: Neither path exists: /srv/local/data/physionet.org/files/mimiciv/2.2/hosp/labevents.csv.gz or /srv/local/data/physionet.org/files/mimiciv/2.2/hosp/labevents.csv
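
The traceback above shows the loader looking for `hosp/labevents.csv.gz` and then `hosp/labevents.csv` under the dataset root. A quick pre-flight check can surface missing files before the dataset is initialized. The following is a minimal sketch: `find_missing_tables` is a hypothetical helper, not part of PyHealth, and it assumes every requested table lives in the `hosp` subdirectory as either `.csv.gz` or `.csv`, as the error message suggests.

```python
from pathlib import Path


def find_missing_tables(ehr_root, tables, subdir="hosp"):
    """Return the tables with neither <table>.csv.gz nor <table>.csv on disk.

    Hypothetical helper (not a PyHealth API). Assumes all tables live under
    <ehr_root>/<subdir>, mirroring the paths in the error message.
    """
    base = Path(ehr_root) / subdir
    missing = []
    for table in tables:
        candidates = [base / f"{table}.csv.gz", base / f"{table}.csv"]
        if not any(path.is_file() for path in candidates):
            missing.append(table)
    return missing


# Same root and table names as the cell above.
root = "/srv/local/data/physionet.org/files/mimiciv/2.2/"
tables = ["patients", "admissions", "diagnoses_icd", "procedures_icd", "labevents"]
print("Missing tables:", find_missing_tables(root, tables))
```

An empty list means every requested table was found. Any name that is printed points at a file to download or a path to correct.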

Generate the sample dataset using PyHealth's built-in task `DrugRecommendationMIMIC4`. Split the dataset by patient into training, validation, and test sets, then create data loaders with PyHealth's `get_dataloader()`.

In [None]:
sample_dataset = base_dataset.set_task(
    DrugRecommendationMIMIC4(),
    num_workers=4,
    cache_path="./mimic4_drugrec_cache",
)

sample_dataset.stat()

train_dataset, val_dataset, test_dataset = split_by_patient(
    sample_dataset,
    ratios=[0.8, 0.1, 0.1]
)

train_dataloader = get_dataloader(train_dataset, batch_size=32, shuffle=True)
val_dataloader = get_dataloader(val_dataset, batch_size=32, shuffle=False)
test_dataloader = get_dataloader(test_dataset, batch_size=32, shuffle=False)
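
Splitting by patient rather than by sample keeps all visits of one patient in a single partition, so the model is never evaluated on a patient it saw during training. The sketch below illustrates the idea in plain Python. It is not PyHealth's implementation of `split_by_patient`: the function `split_patient_ids`, the `patient_id` and `visit_id` field names, and the toy samples are assumptions made for illustration.

```python
import random


def split_patient_ids(patient_ids, ratios=(0.8, 0.1, 0.1), seed=0):
    """Partition unique patient IDs into train/val/test sets.

    Illustrative only: shuffles the unique IDs with a fixed seed and cuts
    the shuffled list according to the given ratios.
    """
    unique_ids = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_ids)
    n_train = int(len(unique_ids) * ratios[0])
    n_val = int(len(unique_ids) * ratios[1])
    train = set(unique_ids[:n_train])
    val = set(unique_ids[n_train:n_train + n_val])
    test = set(unique_ids[n_train + n_val:])
    return train, val, test


# Toy data: 30 visits spread over 10 made-up patients.
samples = [{"patient_id": f"p{i % 10}", "visit_id": f"v{i}"} for i in range(30)]
train_ids, val_ids, test_ids = split_patient_ids(s["patient_id"] for s in samples)

# Each sample follows its patient, so no patient appears in two partitions.
train_samples = [s for s in samples if s["patient_id"] in train_ids]
val_samples = [s for s in samples if s["patient_id"] in val_ids]
test_samples = [s for s in samples if s["patient_id"] in test_ids]
print(len(train_samples), len(val_samples), len(test_samples))
```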

# 3. Initialize and Configure MoleRec Model

Use PyHealth's `MoleRec` class to set up the model with appropriate hyperparameters for drug recommendation.

In [None]:
model = MoleRec(
    sample_dataset,
    feature_keys=["conditions", "procedures"],
    label_key="medications",
    mode="multilabel",
)
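
With `mode="multilabel"`, each visit can have several correct medications at once, so the target for a visit is a multi-hot vector over the medication vocabulary rather than a single class index. The following sketch shows that encoding in plain Python. The helper names `build_vocab` and `multi_hot` and the medication codes are made up for illustration; PyHealth handles this step internally and may do it differently.

```python
def build_vocab(label_sets):
    """Map every distinct code to a column index (sorted for determinism)."""
    codes = sorted({code for labels in label_sets for code in labels})
    return {code: idx for idx, code in enumerate(codes)}


def multi_hot(labels, vocab):
    """Encode one visit's label set as a 0/1 vector over the vocabulary."""
    vec = [0] * len(vocab)
    for code in labels:
        vec[vocab[code]] = 1
    return vec


# Made-up medication codes for three visits.
visits = [["A01", "B02"], ["B02"], ["C03", "A01"]]
vocab = build_vocab(visits)
targets = [multi_hot(v, vocab) for v in visits]

for visit, target in zip(visits, targets):
    print(visit, "->", target)
```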

# 4. Train Model

Create a PyHealth `Trainer` and train the MoleRec model on the MIMIC-IV training split, monitoring PR-AUC on the validation split.

In [3]:
trainer = Trainer(
    model=model,
    metrics=["pr_auc_samples", "f1_samples", "jaccard_samples"],
)

NameError: name 'model' is not defined

In [None]:
trainer.train(
    train_dataloader=train_dataloader,
    val_dataloader=val_dataloader,
    epochs=3,
    monitor="pr_auc_samples",
)

# 5. Evaluate Results

Evaluate the trained model on the test split and print the resulting metrics.

In [None]:
results = trainer.evaluate(test_dataloader)

print("Test Results:")
print(results)

print("\nKey Metrics:")
print(f"PR-AUC: {results['pr_auc_samples']:.4f}")
print(f"F1 Score: {results['f1_samples']:.4f}")
print(f"Jaccard: {results['jaccard_samples']:.4f}")
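
To make the `_samples` suffix concrete, the sketch below computes sample-averaged Jaccard and F1 in plain Python: each score is computed per visit from the predicted and true medication sets, then averaged over visits. This is an independent illustration of the standard definitions, not the code PyHealth runs. The function names, the toy medication sets, and the choice to score an empty prediction against an empty target as 0.0 are assumptions.

```python
def jaccard_samples(y_true, y_pred):
    """Mean over samples of |true & pred| / |true | pred|."""
    scores = []
    for true, pred in zip(y_true, y_pred):
        true, pred = set(true), set(pred)
        union = true | pred
        # Assumption: an empty union scores 0.0 instead of raising.
        scores.append(len(true & pred) / len(union) if union else 0.0)
    return sum(scores) / len(scores)


def f1_samples(y_true, y_pred):
    """Mean over samples of 2 * |true & pred| / (|true| + |pred|)."""
    scores = []
    for true, pred in zip(y_true, y_pred):
        true, pred = set(true), set(pred)
        denom = len(true) + len(pred)
        # Assumption: two empty sets score 0.0 instead of raising.
        scores.append(2 * len(true & pred) / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with made-up medication codes for two visits.
y_true = [{"A", "B"}, {"C"}]
y_pred = [{"A"}, {"C", "D"}]

print(f"Jaccard: {jaccard_samples(y_true, y_pred):.4f}")
print(f"F1 Score: {f1_samples(y_true, y_pred):.4f}")
```

In this toy case each visit has one correct medication out of a union of two, so the Jaccard score is 0.5 per visit, while F1 is 2/3 per visit because it weights the overlap more heavily.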