
### **Preparation**
- Install the PyHealth alpha version.

In [None]:
!pip install pyhealth



### **Instruction on [pyhealth.models](https://pyhealth.readthedocs.io/en/latest/api/models.html)**
- **[README]**: This package provides common deep learning models (e.g., RNN, CNN, Transformer) and healthcare-specific deep learning models (e.g., RETAIN, SafeDrug, GAMENet). Most models can be applied to any healthcare prediction task. The exceptions are a few special models (e.g., GAMENet, SafeDrug, MICRON) that are designed only for the drug recommendation task. Each deep learning model can be used in two ways:
  - Model, such as RNN, CNN, Transformer, or RETAIN, **initialized from our dataset object**.
  - ModelLayer, such as RNNLayer, CNNLayer, TransformerLayer, or RETAINLayer, **initialized from auxiliary, layer-specific arguments** instead of a dataset object.

- **[Arguments for Model]**:
  Each Model takes the following arguments:
    - `dataset`: the [pyhealth.dataset](https://pyhealth.readthedocs.io/en/latest/api/datasets.html) object with a task applied (via `set_task`). All model-specific processing is based on this object and handled inside the Model class.
    - `feature_keys`: a list of keys in each data sample (e.g., `conditions`, `procedures`) whose values are used as input features.
    - `label_key`: the key of the prediction target. Currently only `label`, defined in the task function, is supported.
    - `mode`: the task type, one of `multiclass`, `multilabel`, or `binary`.
    - `embedding_dim`: the dimension of the code embeddings (set to 128 in the examples below).
    
- **[Arguments for the ModelLayer]**: 
Alternatively, users who do not want to initialize a Model from a [pyhealth.dataset](https://pyhealth.readthedocs.io/en/latest/api/datasets.html) object can call the corresponding ModelLayer directly and prepare its inputs themselves. The arguments differ from layer to layer (refer to [pyhealth.models](https://pyhealth.readthedocs.io/en/latest/api/models.html)). For example, RNNLayer takes the following arguments:
  - `input_size`: input size of the RNN (the feature dimension at each step)
  - `hidden_size`: hidden size of the RNN
  - `rnn_type`: type of RNN cell, e.g., `GRU` or `LSTM`
  - `num_layers`: number of stacked RNN layers
  - `dropout`: dropout rate
  - `bidirectional`: whether to use a bidirectional RNN
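The `mode` argument determines how a model's output logits are turned into probabilities. The plain-Python sketch below shows the standard convention: a sigmoid for `binary` and `multilabel`, and a softmax for `multiclass`. We assume PyHealth follows this convention internally; the helper name `logits_to_probs` is hypothetical and not part of the library.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Logistic function: maps one logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_probs(logits: List[float], mode: str) -> List[float]:
    """Hypothetical helper: convert one sample's logits to probabilities."""
    if mode == "binary":
        # a single logit gives the probability of the positive class
        if len(logits) != 1:
            raise ValueError("binary mode expects exactly one logit")
        return [sigmoid(logits[0])]
    if mode == "multilabel":
        # each label is an independent yes/no decision
        return [sigmoid(z) for z in logits]
    if mode == "multiclass":
        # exactly one class is correct, so probabilities sum to 1
        return softmax(logits)
    raise ValueError(f"unknown mode: {mode!r}")


print(logits_to_probs([0.0], "binary"))          # [0.5]
print(logits_to_probs([0.0, 0.0], "multiclass"))  # [0.5, 0.5]
```

In `multilabel` mode the probabilities need not sum to 1, because each label is predicted independently.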


### **Step 1 & 2: Prepare datasets and task function**
- We use the **synthetic MIMIC-III dataset** for the **readmission prediction** task. For more details, refer to [Tutorial 1](https://colab.research.google.com/drive/18kbzEQAj1FMs_J9rTGX8eCoxnWdx4Ltn?usp=sharing) and [Tutorial 2](https://colab.research.google.com/drive/1r7MYQR_5yCJGpK_9I9-A10HmpupZuIN-?usp=sharing).
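To make the task step more concrete, here is a hypothetical, simplified sketch of what a readmission task function does: it turns one patient's time-ordered visits into samples and labels a visit `1` when the next visit happens within a time window. The function name, the input dict layout, and the 15-day window are assumptions for illustration. The actual `readmission_prediction_mimic3_fn` may differ (for example, in how it filters visits with missing codes), so check the PyHealth source for details.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def readmission_samples(
    patient_id: str, visits: List[Dict], time_window_days: int = 15
) -> List[Dict]:
    """Hypothetical, simplified readmission labelling for one patient.

    `visits` must be sorted by "encounter_time" (a datetime); each visit dict
    also holds "visit_id", "conditions", "procedures", and "drugs" code lists.
    """
    samples = []
    # pair each visit with the one after it; the last visit has no follow-up
    for visit, next_visit in zip(visits[:-1], visits[1:]):
        gap = next_visit["encounter_time"] - visit["encounter_time"]
        samples.append({
            "visit_id": visit["visit_id"],
            "patient_id": patient_id,
            # one inner list per visit, matching the nested sample format
            "conditions": [visit["conditions"]],
            "procedures": [visit["procedures"]],
            "drugs": [visit["drugs"]],
            # 1 = next visit within the window, 0 = otherwise
            "label": int(gap < timedelta(days=time_window_days)),
        })
    return samples


# made-up visits, for illustration only
visits = [
    {"visit_id": "v1", "encounter_time": datetime(2020, 1, 1),
     "conditions": ["4280"], "procedures": ["3931"], "drugs": ["B01AB01"]},
    {"visit_id": "v2", "encounter_time": datetime(2020, 1, 10),
     "conditions": ["5990"], "procedures": ["0040"], "drugs": ["N06DA02"]},
    {"visit_id": "v3", "encounter_time": datetime(2020, 6, 1),
     "conditions": ["2851"], "procedures": ["7769"], "drugs": ["J01FA09"]},
]
for s in readmission_samples("p1", visits):
    print(s["visit_id"], s["label"])
# v1 1  (next visit 9 days later)
# v2 0  (next visit about 4.5 months later)
```

Wrapping each feature in an outer list (one inner list per visit) mirrors the nested-list format of the sample printed further down.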

In [None]:
from pyhealth.datasets import MIMIC3Dataset
from pyhealth.tasks import readmission_prediction_mimic3_fn

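dataset = MIMIC3Dataset(
    root="https://storage.googleapis.com/pyhealth/Synthetic_MIMIC-III/",
    tables=["DIAGNOSES_ICD", "PROCEDURES_ICD", "PRESCRIPTIONS"],
    # map NDC drug codes to ATC codes
    code_mapping={"NDC": "ATC"},
    # dev=True loads only a small subset of the data, for quick experiments
    dev=True,
)

# apply the task function; the result is a task-specific sample dataset,
# which the cells below keep referring to as `dataset`
dataset = dataset.set_task(readmission_prediction_mimic3_fn)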

Generating samples for readmission_prediction_mimic3_fn: 100%|██████████| 1000/1000 [00:00<00:00, 157722.11it/s]


In [None]:
# check the format of a single sample (here, the one at index 2)
dataset.samples[2]

{'visit_id': '100183',
 'patient_id': '175',
 'conditions': [['5990',
   '4280',
   '2851',
   '4240',
   '2749',
   '9982',
   'E8499',
   '42831',
   '34600']],
 'procedures': [['0040', '3931', '7769']],
 'drugs': [['N06DA02',
   'V06DC01',
   'B01AB01',
   'A06AA02',
   'R03AC02',
   'H03AA01',
   'J01FA09']],
 'label': 0}
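Because `feature_keys` must name keys in each sample, it helps to inspect a sample programmatically. The hypothetical helper below (not a PyHealth function) uses a simple heuristic: feature keys hold a list of per-visit code lists, while the IDs are plain strings and `label` is an integer. The sample dict is copied from the output above.

```python
from typing import Any, Dict, List

# the sample printed above (dataset.samples[2])
sample = {
    "visit_id": "100183",
    "patient_id": "175",
    "conditions": [["5990", "4280", "2851", "4240", "2749", "9982", "E8499", "42831", "34600"]],
    "procedures": [["0040", "3931", "7769"]],
    "drugs": [["N06DA02", "V06DC01", "B01AB01", "A06AA02", "R03AC02", "H03AA01", "J01FA09"]],
    "label": 0,
}


def candidate_feature_keys(s: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: keys whose values are lists of per-visit code lists."""
    return [
        key
        for key, value in s.items()
        if isinstance(value, list) and all(isinstance(v, list) for v in value)
    ]


print(candidate_feature_keys(sample))  # ['conditions', 'procedures', 'drugs']
print(sample["label"])                 # 0
```

Any subset of the returned keys, such as `["conditions", "procedures"]` in the models below, can be passed as `feature_keys`.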

### **Step 3 (Example): Using RETAIN or RETAINLayer**
- Option 1: we choose to initialize the **pyhealth.models.RETAIN** model.
- Option 2: we choose to customize a new model with our **pyhealth.models.RETAINLayer**.


In [None]:
# option 1

from pyhealth.models import RETAIN

device = "cpu"

model = RETAIN(
    # argument 1: call the dataset
    dataset=dataset,
    # argument 2: use a subset of the keys in each data sample as features
    # (see which keys are available for `feature_keys` and `label_key` in dataset.samples[0])
    feature_keys=["conditions", "procedures"],
    # argument 3: use `label` as the prediction target (label_key)
    label_key="label",
    # argument 4: set the embedding dimension
    embedding_dim=128,
    # argument 5: task type, one of "multiclass", "multilabel", or "binary"
    mode="binary",
)
model.to(device)

RETAIN(
  (embeddings): ModuleDict(
    (conditions): Embedding(303, 128, padding_idx=0)
    (procedures): Embedding(101, 128, padding_idx=0)
  )
  (linear_layers): ModuleDict()
  (retain): ModuleDict(
    (conditions): RETAINLayer(
      (dropout_layer): Dropout(p=0.5, inplace=False)
      (alpha_gru): GRU(128, 128, batch_first=True)
      (beta_gru): GRU(128, 128, batch_first=True)
      (alpha_li): Linear(in_features=128, out_features=1, bias=True)
      (beta_li): Linear(in_features=128, out_features=128, bias=True)
    )
    (procedures): RETAINLayer(
      (dropout_layer): Dropout(p=0.5, inplace=False)
      (alpha_gru): GRU(128, 128, batch_first=True)
      (beta_gru): GRU(128, 128, batch_first=True)
      (alpha_li): Linear(in_features=128, out_features=1, bias=True)
      (beta_li): Linear(in_features=128, out_features=128, bias=True)
    )
  )
  (fc): Linear(in_features=256, out_features=1, bias=True)
)
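In the printed model, `Embedding(303, 128, padding_idx=0)` means the `conditions` codes are mapped to 303 integer indices with index 0 reserved for padding, and `fc` takes 256 inputs, which is consistent with concatenating the two per-feature representations (2 feature keys × 128 embedding dimensions). The sketch below is a hypothetical, simplified version of the code-to-index step: it builds a vocabulary that reserves index 0 for padding and index 1 for unknown codes, then pads variable-length visits into a rectangular matrix. PyHealth uses its own tokenizer, whose special tokens and ordering may differ.

```python
from typing import Dict, List

PAD, UNK = "<pad>", "<unk>"


def build_vocab(visits: List[List[str]]) -> Dict[str, int]:
    """Map each distinct code to an index; 0 is padding, 1 is unknown."""
    vocab = {PAD: 0, UNK: 1}
    for codes in visits:
        for code in codes:
            if code not in vocab:
                vocab[code] = len(vocab)
    return vocab


def encode_and_pad(visits: List[List[str]], vocab: Dict[str, int]) -> List[List[int]]:
    """Convert visits to index lists and right-pad them to equal length with 0."""
    width = max(len(v) for v in visits)
    return [
        [vocab.get(c, vocab[UNK]) for c in v] + [vocab[PAD]] * (width - len(v))
        for v in visits
    ]


visits = [["4280", "2851"], ["5990"], ["4280", "5990", "9982"]]
vocab = build_vocab(visits)
print(vocab)  # {'<pad>': 0, '<unk>': 1, '4280': 2, '2851': 3, '5990': 4, '9982': 5}
print(encode_and_pad(visits, vocab))  # [[2, 3, 0], [4, 0, 0], [2, 4, 5]]
```

An embedding table for this vocabulary needs `len(vocab)` rows; `padding_idx=0` tells PyTorch's `nn.Embedding` to initialize that row to zeros and exclude it from gradient updates.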

In [None]:
# option 2: build a new model around RETAINLayer

import torch.nn as nn
from pyhealth.models import RETAINLayer

class NewModel(nn.Module):
    def __init__(
        self,
        input_size: int = 64,
        dropout: float = 0.5,
    ):
        super().__init__()

        # TODO: replace this placeholder with your own module 1, e.g. an
        # embedding layer producing inputs of shape [batch, visits, input_size]
        self.module1 = nn.Identity()

        # initialize the RETAINLayer (feature size = input_size)
        self.retain = RETAINLayer(input_size, dropout=dropout)

        # TODO: replace this placeholder with your own module 2, e.g. a classifier head
        self.module2 = nn.Identity()

    def forward(self, x):
        x = self.module1(x)
        # call the RETAINLayer, which pools the visit sequence into one vector per patient
        x = self.retain(x)
        x = self.module2(x)
        return x

model = NewModel()
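The printed RETAINLayer has an `alpha_li` head (128 to 1) and a `beta_li` head (128 to 128): for each visit, RETAIN computes a scalar visit-level score and a vector of variable-level scores, then combines them with the visit embeddings into one context vector, c = Σ_t α_t · tanh(b_t) ⊙ v_t with α = softmax(e). The pure-Python sketch below shows only that final combination, assuming the GRU outputs have already been projected by the two linear heads. It ignores masking, dropout, and the GRU pass itself (which, in the original RETAIN paper, runs in reverse time order). Function and argument names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def retain_context(
    visit_embs: List[Vector], alpha_scores: List[float], beta_logits: List[Vector]
) -> Vector:
    """Combine RETAIN's two attention levels into one context vector.

    visit_embs[t]   : embedding v_t of visit t
    alpha_scores[t] : scalar visit-level score e_t (the alpha_li output)
    beta_logits[t]  : vector of variable-level scores b_t (the beta_li output)
    """
    alphas = softmax(alpha_scores)  # visit-level weights, sum to 1
    dim = len(visit_embs[0])
    context = [0.0] * dim
    for a, b, v in zip(alphas, beta_logits, visit_embs):
        for i in range(dim):
            # tanh keeps each variable-level weight in [-1, 1]
            context[i] += a * math.tanh(b[i]) * v[i]
    return context


# two visits with equal visit-level scores and near-saturated beta weights
print(retain_context([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0], [[20.0, 20.0], [20.0, 20.0]]))
# approximately [2.0, 3.0]
```

With equal visit scores each visit gets weight 0.5, and with large beta logits tanh is close to 1, so the context is roughly the average of the visit embeddings.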

### **Step 4 (Example): Using Transformer or TransformerLayer**
- Option 1: we choose to initialize the **pyhealth.models.Transformer** model.
- Option 2: we choose to customize a new model with our **pyhealth.models.TransformerLayer**.


In [None]:
# option 1

from pyhealth.models import Transformer

device = "cpu"

model = Transformer(
    # argument 1: call the dataset
    dataset=dataset,
    # argument 2: use a subset of the keys in each data sample as features
    # (see which keys are available for `feature_keys` and `label_key` in dataset.samples[0])
    feature_keys=["conditions", "procedures"],
    # argument 3: use `label` as the prediction target (label_key)
    label_key="label",
    # argument 4: set the embedding dimension
    embedding_dim=128,
    # argument 5: task type, one of "multiclass", "multilabel", or "binary"
    mode="binary",
)
model.to(device)

Transformer(
  (embeddings): ModuleDict(
    (conditions): Embedding(303, 128, padding_idx=0)
    (procedures): Embedding(101, 128, padding_idx=0)
  )
  (linear_layers): ModuleDict()
  (transformer): ModuleDict(
    (conditions): TransformerLayer(
      (transformer): ModuleList(
        (0): TransformerBlock(
          (attention): MultiHeadedAttention(
            (linear_layers): ModuleList(
              (0): Linear(in_features=128, out_features=128, bias=False)
              (1): Linear(in_features=128, out_features=128, bias=False)
              (2): Linear(in_features=128, out_features=128, bias=False)
            )
            (output_linear): Linear(in_features=128, out_features=128, bias=False)
            (attention): Attention()
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (feed_forward): PositionwiseFeedForward(
            (w_1): Linear(in_features=128, out_features=512, bias=True)
            (w_2): Linear(in_features=512, out_features=128, 
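Each `MultiHeadedAttention` block above projects its input into queries, keys, and values with the three 128-to-128 `linear_layers`, applies an `Attention()` module, and combines the heads with `output_linear`. Assuming `Attention()` implements standard scaled dot-product attention, softmax(QKᵀ / √d_k)·V, a single-head version without masking or dropout can be sketched in plain Python as follows; the function name is hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head attention softmax(q k^T / sqrt(d_k)) v, no mask or dropout."""
    d_k = len(k[0])
    out = []
    for q_row in q:
        # similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in k]
        weights = softmax(scores)
        # weighted average of the value rows
        out.append([sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(len(v[0]))])
    return out


keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0]]
print(scaled_dot_product_attention([[0.0, 0.0]], keys, values))  # [[2.0, 3.0]]
```

A query that is orthogonal to every key (here, the zero vector) attends uniformly, so the output is the mean of the value rows; a query aligned with one key concentrates its weight on that key's value row.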

In [None]:
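# option 2: build a new model around TransformerLayer

import torch.nn as nn
from pyhealth.models import TransformerLayer

class NewModel(nn.Module):
    def __init__(
        self,
        input_size: int = 64,
        heads: int = 1,
        num_layers: int = 1,
        dropout: float = 0.5,
    ):
        super().__init__()

        # you can replace this placeholder with your own module
        self.module1 = nn.Identity()

        # initialize the TransformerLayer; pass dropout by keyword, since the
        # second positional argument is the number of attention heads
        self.transformer = TransformerLayer(
            input_size, heads=heads, dropout=dropout, num_layers=num_layers
        )

        # you can replace this placeholder with your own module
        self.module2 = nn.Identity()

    def forward(self, x):
        x = self.module1(x)
        # TransformerLayer returns (per-position embeddings, first-position embedding);
        # keep the first-position embedding as the sequence representation
        _, x = self.transformer(x)
        x = self.module2(x)
        return x

model = NewModel()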
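When building a custom model around TransformerLayer, the number of attention heads has to divide the feature size, because standard multi-head attention splits each d_model-dimensional vector into `heads` slices of size d_model / heads. This plain-Python sketch illustrates the split and the constraint. `split_heads` is a hypothetical helper, and we assume PyHealth's multi-head attention follows this standard scheme; as far as we know TransformerLayer exposes the head count as a `heads` argument, but check your installed version.

```python
from typing import List


def split_heads(x: List[float], heads: int) -> List[List[float]]:
    """Split one d_model-sized vector into `heads` slices of size d_model // heads."""
    d_model = len(x)
    if heads <= 0 or d_model % heads != 0:
        raise ValueError(f"d_model={d_model} is not divisible by heads={heads}")
    d_k = d_model // heads  # per-head dimension
    return [x[h * d_k:(h + 1) * d_k] for h in range(heads)]


# with the default input_size=64, any head count that divides 64 works
valid_heads = [h for h in range(1, 9) if 64 % h == 0]
print(valid_heads)  # [1, 2, 4, 8]
print(split_heads([1.0, 2.0, 3.0, 4.0], heads=2))  # [[1.0, 2.0], [3.0, 4.0]]
```

Each head then attends over its own slice, and the per-head outputs are concatenated back to d_model before the output projection.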

If you find this tutorial useful, please give us a star ⭐ (and fork or watch the repository) at https://github.com/sunlabuiuc/PyHealth.

Thanks very much for your support!