
Conversation

@jhnwu3 jhnwu3 commented Jan 26, 2026

This pull request introduces a new multimodal RNN model to the pyhealth library, enabling the handling of both sequential and non-sequential input features for clinical prediction tasks. It also provides a comprehensive example script for using the new model on the MIMIC-IV dataset for in-hospital mortality prediction. Additionally, the documentation and model API are updated to reflect these changes.

Major changes include:

New Model: MultimodalRNN

  • Added the MultimodalRNN class to pyhealth.models.rnn, which automatically distinguishes between sequential and non-sequential features and processes them appropriately (sequential features via RNN layers, non-sequential features via direct embedding and pooling). The model concatenates all feature representations for the final prediction (a sketch follows this list).
  • Exported MultimodalRNN in the pyhealth.models package init file for public use.
  • Updated the API documentation to include MultimodalRNN.
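A minimal sketch of the forward logic described in the first bullet above, under assumed feature names, shapes, and layer choices; this is illustrative only, not the merged pyhealth implementation:

```python
import torch
import torch.nn as nn

class MultimodalRNNSketch(nn.Module):
    """Illustrative only: sequential features go through an RNN, non-sequential
    features are embedded and mean-pooled, then all representations are
    concatenated for the final prediction."""

    def __init__(self, seq_features, static_features, embedding_dim, hidden_dim, num_classes):
        super().__init__()
        self.seq_features = seq_features        # e.g. ["diagnoses", "procedures"]
        self.static_features = static_features  # e.g. ["age_group", "admission_type"]
        self.rnns = nn.ModuleDict(
            {f: nn.GRU(embedding_dim, hidden_dim, batch_first=True) for f in seq_features}
        )
        total_dim = hidden_dim * len(seq_features) + embedding_dim * len(static_features)
        self.fc = nn.Linear(total_dim, num_classes)

    def forward(self, embedded):
        # embedded[f]: (batch, seq_len, embedding_dim) for sequential features,
        #              (batch, n_tokens, embedding_dim) for non-sequential ones
        reps = []
        for f in self.seq_features:
            _, h_n = self.rnns[f](embedded[f])    # last hidden state summarizes the sequence
            reps.append(h_n[-1])                  # (batch, hidden_dim)
        for f in self.static_features:
            reps.append(embedded[f].mean(dim=1))  # pool token embeddings directly
        return self.fc(torch.cat(reps, dim=-1))   # (batch, num_classes)
```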

Example Usage

  • Added a detailed example script mortality_mimic4_multimodal_rnn.py demonstrating how to use the new MultimodalRNN model for mortality prediction with mixed feature types on the MIMIC-IV dataset. The script covers data loading, task setup, model training, evaluation, and sample predictions.
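A rough outline of how such a script is typically structured in pyhealth; this follows the pre-existing 1.x workflow (dataset, set_task, dataloaders, Trainer), so the task function and the MultimodalRNN constructor arguments below are assumptions and may differ from the actual mortality_mimic4_multimodal_rnn.py:

```python
from pyhealth.datasets import MIMIC4Dataset, split_by_patient, get_dataloader
from pyhealth.models import MultimodalRNN
from pyhealth.tasks import mortality_prediction_mimic4_fn  # task name from older releases
from pyhealth.trainer import Trainer

dataset = MIMIC4Dataset(
    root="/path/to/mimiciv/hosp",
    tables=["diagnoses_icd", "procedures_icd", "prescriptions"],
)
sample_dataset = dataset.set_task(mortality_prediction_mimic4_fn)

train_ds, val_ds, test_ds = split_by_patient(sample_dataset, [0.8, 0.1, 0.1])
train_loader = get_dataloader(train_ds, batch_size=32, shuffle=True)
val_loader = get_dataloader(val_ds, batch_size=32, shuffle=False)
test_loader = get_dataloader(test_ds, batch_size=32, shuffle=False)

# Assumed constructor, mirroring the existing RNN model's interface.
model = MultimodalRNN(dataset=sample_dataset)

trainer = Trainer(model=model)
trainer.train(
    train_dataloader=train_loader,
    val_dataloader=val_loader,
    epochs=10,
    monitor="roc_auc",
)
print(trainer.evaluate(test_loader))
```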

Improvements and Bug Fixes

  • Improved masking in RNN forward passes by taking the absolute value before summing, ensuring correct mask computation even when an embedding's values sum to zero because of cancellation between positive and negative entries (a small illustration follows this list).
  • Refactored imports and type annotations in pyhealth.models.rnn to support the new model and feature classification logic.
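A small illustration of the edge case the masking change addresses; the values are chosen purely for demonstration:

```python
import torch

# One valid token whose embedding values cancel out, and one padding token.
x = torch.tensor([[[1.0, -1.0],    # valid token, but its values sum to 0
                   [0.0,  0.0]]])  # padding token

old_mask = (x.sum(dim=-1) != 0).int()             # tensor([[0, 0]]) -- wrongly drops the valid token
new_mask = (torch.abs(x).sum(dim=-1) != 0).int()  # tensor([[1, 0]]) -- keeps it
```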

Comment on lines 250 to 252
# Use abs() before sum to catch edge cases where embeddings sum to 0
# despite being valid values (e.g., [1.0, -1.0])
mask = (torch.abs(x).sum(dim=-1) != 0).int()
Collaborator


I think [0.0, 0.0] could still be a valid embedding even for a non-padding token, though it's very unlikely. But this can be considered a temporary fix for now, I think.

We may want to add a TODO here.

Collaborator Author


Yeah, I think we may need to change our EmbeddingModel so it can return a pad/mask tensor itself, built from our processor vocabulary, and then have that returned mask be an option used here.
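One possible shape for that idea, as a hypothetical sketch; the class name, pad-index handling, and vocabulary wiring here are assumptions, not the current EmbeddingModel API:

```python
import torch
import torch.nn as nn

class EmbeddingWithMask(nn.Module):
    """Hypothetical sketch: build the mask from token ids and the processor
    vocabulary's pad index, instead of inferring it from embedding values."""

    def __init__(self, vocab_size: int, embedding_dim: int, pad_idx: int = 0):
        super().__init__()
        self.pad_idx = pad_idx
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=pad_idx)

    def forward(self, token_ids: torch.Tensor):
        # token_ids: (batch, seq_len) integer ids from the processor vocabulary
        mask = (token_ids != self.pad_idx).int()  # exact mask, independent of embedding values
        return self.embedding(token_ids), mask
```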

Collaborator Author


Because the more I think about it, I don't think the mask here really does very much lol. It just assumes that a 0 embedding from the EmbeddingModel somehow corresponds to properly zeroed padding positions in the sequence embeddings, etc.

@jhnwu3 jhnwu3 merged commit 7dfe6e4 into master Jan 27, 2026
1 check passed
@jhnwu3 jhnwu3 deleted the add/multimodal_RNN_clean branch January 27, 2026 17:03