Add TupleTimeTextProcessor for Temporal Text Data #829

Merged
jhnwu3 merged 27 commits into sunlabuiuc:master from Rian354:master on Feb 9, 2026

Rian354 (Contributor) commented Feb 8, 2026

Add TupleTimeTextProcessor for Temporal Text Data

Summary/TLDR

Adds TupleTimeTextProcessor, a processor for clinical text paired with temporal information (time differences between entries). The processor attaches a type tag to its output so that multimodal pipelines can route each sample to the matching encoder.

Key Features

  • Temporal text handling: Processes (List[str], List[float]) tuples
  • Automatic modality routing: type_tag enables automatic encoder selection
  • Clean interface: Returns (texts, time_tensor, modality_tag)
  • Registered processor: Available via string key "tuple_time_text"
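
As an illustration of how a downstream pipeline might use the tag for routing, here is a minimal sketch. It rests on assumptions: the `ENCODERS` registry, `route`, and both toy encoder functions are hypothetical names invented for this example and are not part of PyHealth, and the time values are kept as a plain list rather than a tensor so the sketch has no dependencies.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Encoded = List[Tuple[int, float]]


def encode_note(texts: List[str], times: Sequence[float]) -> Encoded:
    """Hypothetical encoder: (character count, time offset) per entry."""
    return [(len(t), float(dt)) for t, dt in zip(texts, times)]


def encode_report(texts: List[str], times: Sequence[float]) -> Encoded:
    """Hypothetical encoder: (word count, time offset) per entry."""
    return [(len(t.split()), float(dt)) for t, dt in zip(texts, times)]


# Hypothetical registry mapping a modality tag to an encoder.
ENCODERS: Dict[str, Callable[[List[str], Sequence[float]], Encoded]] = {
    "note": encode_note,
    "report": encode_report,
}


def route(sample: Tuple[List[str], Sequence[float], str]) -> Encoded:
    """Dispatch a (texts, times, tag) triple to the encoder registered for its tag."""
    texts, times, tag = sample
    try:
        encoder = ENCODERS[tag]
    except KeyError:
        raise ValueError(f"no encoder registered for tag {tag!r}") from None
    return encoder(texts, times)


routed = route((["Admission note", "Progress note"], [0.0, 24.0], "note"))
```

Because the tag travels with the data, the dispatch step needs no knowledge of which dataset field produced the sample.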

I/O

Input:  Tuple[List[str], List[float]]
        - List[str]: Clinical text entries
        - List[float]: Time differences between entries

Output: Tuple[List[str], torch.Tensor, str]
        - List[str]: Same text entries (unmodified)
        - torch.Tensor: 1D float tensor of time differences
        - str: Type tag for modality routing (default: "note")
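
The input contract above implies that the two lists are parallel, one time difference per text entry. The sketch below shows a check a caller might run before handing data to the processor. `check_time_text_pair` is a hypothetical helper, and this PR does not state whether the processor performs such validation itself.

```python
from typing import List, Sequence, Tuple


def check_time_text_pair(
    value: Tuple[Sequence[str], Sequence[float]]
) -> Tuple[List[str], List[float]]:
    """Validate a (texts, time_diffs) pair and normalize the times to floats."""
    texts, time_diffs = value
    if len(texts) != len(time_diffs):
        raise ValueError(
            f"expected one time difference per text, got "
            f"{len(texts)} texts and {len(time_diffs)} time differences"
        )
    if not all(isinstance(t, str) for t in texts):
        raise TypeError("every text entry must be a str")
    return list(texts), [float(dt) for dt in time_diffs]


texts, diffs = check_time_text_pair((["Admission note", "Progress note"], [0, 24]))
```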

Files Added

  • pyhealth/processors/tuple_time_text_processor.py (107 loc)
  • tests/test_tuple_time_text_processor.py (1 test)
  • docs/api/processors/pyhealth.processors.TupleTimeTextProcessor.rst (docs)

Files Modified

  • pyhealth/processors/__init__.py (export added)
  • docs/api/processors.rst (documentation index)
  • examples/text_embedding_tutorial.ipynb (added example section)

Example Usage

from pyhealth.processors import TupleTimeTextProcessor

# Initialize processor with modality tag
processor = TupleTimeTextProcessor(type_tag="clinical_note")

# Process temporal text data
texts = ["Admission note", "Progress note", "Discharge summary"]
time_diffs = [0.0, 24.0, 72.0]  # hours since admission

texts_out, time_tensor, tag = processor.process((texts, time_diffs))
# texts_out: ["Admission note", "Progress note", "Discharge summary"]
# time_tensor: tensor([0., 24., 72.])
# tag: "clinical_note"

Testing

pytest tests/test_tuple_time_text_processor.py -v
# 1 passed

Rian354 and others added 27 commits December 8, 2025 03:08
- Created TupleTimeTextProcessor for (text, time_diff) tuples
- Handles temporal clinical text with automatic modality routing
- Comprehensive test suite (16 tests, all passing)
- Added processor documentation in docs/api/processors/
- Updated tutorial notebook with multimodal fusion examples
- Registered processor with 'tuple_time_text' string key

jhnwu3 (Collaborator) left a comment

Also pretty much lgtm, thanks!

jhnwu3 merged commit 91a0716 into sunlabuiuc:master on Feb 9, 2026
1 check passed
2 participants