
Conversation

@jhnwu3 jhnwu3 commented Jan 26, 2026

This pull request introduces a new multimodal RNN model to the pyhealth library, enabling the handling of both sequential and non-sequential input features for clinical prediction tasks. It also provides a comprehensive example script for using the new model on the MIMIC-IV dataset for in-hospital mortality prediction. Additionally, the documentation and model API are updated to reflect these changes.

Major changes include:

New Model: MultimodalRNN

  • Added the MultimodalRNN class to pyhealth.models.rnn, which automatically distinguishes between sequential and non-sequential features and processes them appropriately (sequential features via RNN layers, non-sequential features via direct embedding and pooling). The model concatenates all feature representations for the final prediction (a sketch follows this list).
  • Exported MultimodalRNN in the pyhealth.models package init file for public use.
  • Updated the API documentation to include MultimodalRNN.
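A minimal sketch of the forward logic described in the first bullet above, under assumed feature names, shapes, and layer choices; this is illustrative only, not the merged pyhealth implementation:

```python
import torch
import torch.nn as nn

class MultimodalRNNSketch(nn.Module):
    """Illustrative only: sequential features go through an RNN, non-sequential
    features are embedded and mean-pooled, then all representations are
    concatenated for the final prediction."""

    def __init__(self, seq_features, static_features, embedding_dim, hidden_dim, num_classes):
        super().__init__()
        self.seq_features = seq_features        # e.g. ["diagnoses", "procedures"]
        self.static_features = static_features  # e.g. ["age_group", "admission_type"]
        self.rnns = nn.ModuleDict(
            {f: nn.GRU(embedding_dim, hidden_dim, batch_first=True) for f in seq_features}
        )
        total_dim = hidden_dim * len(seq_features) + embedding_dim * len(static_features)
        self.fc = nn.Linear(total_dim, num_classes)

    def forward(self, embedded):
        # embedded[f]: (batch, seq_len, embedding_dim) for sequential features,
        #              (batch, n_tokens, embedding_dim) for non-sequential ones
        reps = []
        for f in self.seq_features:
            _, h_n = self.rnns[f](embedded[f])    # last hidden state summarizes the sequence
            reps.append(h_n[-1])                  # (batch, hidden_dim)
        for f in self.static_features:
            reps.append(embedded[f].mean(dim=1))  # pool token embeddings directly
        return self.fc(torch.cat(reps, dim=-1))   # (batch, num_classes)
```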

Example Usage

  • Added a detailed example script mortality_mimic4_multimodal_rnn.py demonstrating how to use the new MultimodalRNN model for mortality prediction with mixed feature types on the MIMIC-IV dataset. The script covers data loading, task setup, model training, evaluation, and sample predictions.
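A rough outline of how such a script is typically structured in pyhealth; this follows the pre-existing 1.x workflow (dataset, set_task, dataloaders, Trainer), so the task function and the MultimodalRNN constructor arguments below are assumptions and may differ from the actual mortality_mimic4_multimodal_rnn.py:

```python
from pyhealth.datasets import MIMIC4Dataset, split_by_patient, get_dataloader
from pyhealth.models import MultimodalRNN
from pyhealth.tasks import mortality_prediction_mimic4_fn  # task name from older releases
from pyhealth.trainer import Trainer

dataset = MIMIC4Dataset(
    root="/path/to/mimiciv/hosp",
    tables=["diagnoses_icd", "procedures_icd", "prescriptions"],
)
sample_dataset = dataset.set_task(mortality_prediction_mimic4_fn)

train_ds, val_ds, test_ds = split_by_patient(sample_dataset, [0.8, 0.1, 0.1])
train_loader = get_dataloader(train_ds, batch_size=32, shuffle=True)
val_loader = get_dataloader(val_ds, batch_size=32, shuffle=False)
test_loader = get_dataloader(test_ds, batch_size=32, shuffle=False)

# Assumed constructor, mirroring the existing RNN model's interface.
model = MultimodalRNN(dataset=sample_dataset)

trainer = Trainer(model=model)
trainer.train(
    train_dataloader=train_loader,
    val_dataloader=val_loader,
    epochs=10,
    monitor="roc_auc",
)
print(trainer.evaluate(test_loader))
```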

Improvements and Bug Fixes

  • Improved masking in RNN forward passes by taking the absolute value before summing, ensuring correct mask computation even when an embedding's values sum to zero because of cancellation between positive and negative entries (a small illustration follows this list).
  • Refactored imports and type annotations in pyhealth.models.rnn to support the new model and feature classification logic.
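A small illustration of the edge case the masking change addresses; the values are chosen purely for demonstration:

```python
import torch

# One valid token whose embedding values cancel out, and one padding token.
x = torch.tensor([[[1.0, -1.0],    # valid token, but its values sum to 0
                   [0.0,  0.0]]])  # padding token

old_mask = (x.sum(dim=-1) != 0).int()             # tensor([[0, 0]]) -- wrongly drops the valid token
new_mask = (torch.abs(x).sum(dim=-1) != 0).int()  # tensor([[1, 0]]) -- keeps it
```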

Comment on lines 250 to 252
# Use abs() before sum to catch edge cases where embeddings sum to 0
# despite being valid values (e.g., [1.0, -1.0])
mask = (torch.abs(x).sum(dim=-1) != 0).int()
Collaborator


I think [0.0, 0.0] could still be a valid embedding even for a non-padding token, though it's very unlikely. But this can be considered a temporary fix for now, I think.

We may want to add a TODO here.

Collaborator Author


Yeah, I think we may need to change our EmbeddingModel so it can return a pad/mask tensor itself, built from our processor vocabulary, and then have that returned mask be an option used here.
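One possible shape for that idea, as a hypothetical sketch; the class name, pad-index handling, and vocabulary wiring here are assumptions, not the current EmbeddingModel API:

```python
import torch
import torch.nn as nn

class EmbeddingWithMask(nn.Module):
    """Hypothetical sketch: build the mask from token ids and the processor
    vocabulary's pad index, instead of inferring it from embedding values."""

    def __init__(self, vocab_size: int, embedding_dim: int, pad_idx: int = 0):
        super().__init__()
        self.pad_idx = pad_idx
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=pad_idx)

    def forward(self, token_ids: torch.Tensor):
        # token_ids: (batch, seq_len) integer ids from the processor vocabulary
        mask = (token_ids != self.pad_idx).int()  # exact mask, independent of embedding values
        return self.embedding(token_ids), mask
```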

Collaborator Author


Because the more I think about it, I don't think the mask here really does very much lol. It just assumes that a 0 embedding from the EmbeddingModel somehow corresponds to properly zeroed padding positions in the sequence embeddings, etc.

@jhnwu3 jhnwu3 merged commit 7dfe6e4 into master Jan 27, 2026
1 check passed
@jhnwu3 jhnwu3 deleted the add/multimodal_RNN_clean branch January 27, 2026 17:03