# PyHealth In-Hospital Mortality Prediction

Predict in-hospital mortality using EHR time series data from MIMIC-IV.

**Task:** Predict whether a patient will die before hospital discharge, using clinical measurements (vitals, labs) from the first 48 hours of the ICU stay.

**Dataset:** MIMIC-IV (the same data as the MedMod project, restricted to the EHR modality)

**Relation to MedMod:** This is the unimodal EHR baseline from our MedMod reproduction, reimplemented cleanly in a PyHealth-style notebook.

**Author:** CS598 Deep Learning for Healthcare

In [1]:
# Import required packages
import torch
import numpy as np
import pandas as pd
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

PyTorch version: 2.5.1
CUDA available: True


## 1. Load MIMIC-IV Data

We'll use the preprocessed MIMIC-IV in-hospital mortality data from the mimic4extract pipeline (same as MedMod).

In [26]:
# Path to MIMIC-IV extracted data
data_root = Path(r"C:\Users\Rohan Suri\MedMod\MedMod-main\MedMod-main\mimic4extract\data\in-hospital-mortality")

# Check if data exists
if data_root.exists():
    print(f"✓ Found data at {data_root}")
    print(f"  - train/ folder: {(data_root / 'train').exists()}")
    print(f"  - test/ folder: {(data_root / 'test').exists()}")
else:
    print(f"✗ Data not found at {data_root}")
    print("Please ensure mimic4extract preprocessing is complete.")

✓ Found data at C:\Users\Rohan Suri\MedMod\MedMod-main\MedMod-main\mimic4extract\data\in-hospital-mortality
  - train/ folder: True
  - test/ folder: True


## 2. Create PyHealth-Compatible Dataset

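Load each stay's time series and convert it into a list of sample dictionaries (patient ID, visit ID, label, raw time series, and per-feature summary statistics). Then hold out 20% of the training split, stratified by label, for validation.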

In [None]:
#from pyhealth.datasets import SampleDataset
from pyhealth.data import Patient, Visit

def load_mimic4_mortality_samples(split='train', max_features=None):
    """Load MIMIC-IV mortality data and convert to PyHealth samples."""
    
    # Read listfile
    listfile = pd.read_csv(data_root / f"{split}_listfile.csv")
    
    samples = []
    data_dir = data_root / split
    all_feature_dims = []
    
    print(f"Loading {split} data from {len(listfile)} patients...")
    
    # First pass: determine max feature dimension
    if max_features is None:
        for idx, row in listfile.iterrows():
            ts_file = data_dir / row['stay']
            if ts_file.exists():
                ts_data = pd.read_csv(ts_file)
                numeric_cols = ts_data.select_dtypes(include=[np.number]).columns
                all_feature_dims.append(len(numeric_cols))
        max_features = max(all_feature_dims) if all_feature_dims else 17
        print(f"  Max feature dimension: {max_features}")
    
    # Second pass: load data with consistent dimensions
    for idx, row in listfile.iterrows():
        if idx % 1000 == 0:
            print(f"  Processed {idx}/{len(listfile)} samples...")
        
        # Load time series
        ts_file = data_dir / row['stay']
        if not ts_file.exists():
            continue
            
        # Read CSV and ensure numeric data only
        ts_data = pd.read_csv(ts_file)
        
        # Select only numeric columns
        numeric_cols = ts_data.select_dtypes(include=[np.number]).columns
        ts_numeric = ts_data[numeric_cols].apply(pd.to_numeric, errors='coerce')
        
        # Fill NaN with 0
        ts_numeric = ts_numeric.fillna(0)
        
        # Pad or trim to max_features dimension
        ts_array = ts_numeric.values.astype(np.float32)
        if ts_array.shape[1] < max_features:
            # Pad with zeros
            padding = np.zeros((ts_array.shape[0], max_features - ts_array.shape[1]), dtype=np.float32)
            ts_array = np.concatenate([ts_array, padding], axis=1)
        elif ts_array.shape[1] > max_features:
            # Trim
            ts_array = ts_array[:, :max_features]
        
        # Extract features
        features = {
            'vitals_mean': ts_array.mean(axis=0).tolist(),
            'vitals_std': ts_array.std(axis=0).tolist(),
            'vitals_min': ts_array.min(axis=0).tolist(),
            'vitals_max': ts_array.max(axis=0).tolist(),
        }
        
        # Create sample
        sample = {
            'patient_id': str(row['stay'].split('_')[0]),
            'visit_id': str(row['stay']),
            'features': features,
            'label': int(row['y_true']),
            'timeseries': ts_array,
            'n_features': max_features
        }
        
        samples.append(sample)
    
    print(f"✓ Loaded {len(samples)} samples from {split} split (feature_dim={max_features})")
    return samples, max_features

# Load train and test splits
all_train_samples, max_features = load_mimic4_mortality_samples('train')
test_samples, _ = load_mimic4_mortality_samples('test', max_features=max_features)

# Split training data into train (80%) and validation (20%)
from sklearn.model_selection import train_test_split

train_samples, val_samples = train_test_split(
    all_train_samples, 
    test_size=0.2, 
    random_state=42,
    stratify=[s['label'] for s in all_train_samples]  # Stratify by label to maintain class balance
)

print(f"\nDataset summary:")
print(f"  Train: {len(train_samples)} samples")
print(f"  Val: {len(val_samples)} samples")
print(f"  Test: {len(test_samples)} samples")
print(f"  Feature dimension: {max_features}")

# Check class balance
train_mortality_rate = np.mean([s['label'] for s in train_samples])
val_mortality_rate = np.mean([s['label'] for s in val_samples])
test_mortality_rate = np.mean([s['label'] for s in test_samples])
print(f"\nMortality rates:")
print(f"  Train: {train_mortality_rate:.3f}")
print(f"  Val: {val_mortality_rate:.3f}")
print(f"  Test: {test_mortality_rate:.3f}")

Loading train data from 29171 patients...
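
The validation split above relies on `stratify` to keep the mortality rate the same in both parts. The following is a minimal sketch of that behaviour on toy labels (the 10% positive rate and the array names are illustrative assumptions, not values from MIMIC-IV):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy labels (illustrative only): 100 stays, 10% positive.
labels = np.array([1] * 10 + [0] * 90)
indices = np.arange(len(labels))

train_idx, val_idx = train_test_split(
    indices,
    test_size=0.2,
    random_state=42,
    stratify=labels,  # keep the positive rate equal in both parts
)

train_rate = labels[train_idx].mean()
val_rate = labels[val_idx].mean()
print(f"train: n={len(train_idx)}, positive rate={train_rate:.3f}")
print(f"val:   n={len(val_idx)}, positive rate={val_rate:.3f}")
```

Note that the notebook splits by stay, so a patient with several ICU stays could land in both parts. Grouping by `patient_id` (for example with scikit-learn's `GroupShuffleSplit`) would be one way to avoid that, at the cost of exact stratification.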


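## 3. Create Dataset and DataLoaders

Wrap the samples in a PyTorch `Dataset` and build `DataLoader`s that zero-pad variable-length sequences within each batch. PyHealth's `SampleDataset` is not used in this cell.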

In [None]:
from torch.utils.data import Dataset, DataLoader
from torch.nn.utils.rnn import pad_sequence

class MortalityDataset(Dataset):
    """Simple dataset wrapper for mortality prediction."""
    
    def __init__(self, samples, use_timeseries=True):
        self.samples = samples
        self.use_timeseries = use_timeseries
        
    def __len__(self):
        return len(self.samples)
    
    def __getitem__(self, idx):
        sample = self.samples[idx]
        
        if self.use_timeseries:
            # Return time series for RNN/LSTM
            # Convert to float array and handle NaNs
            ts = np.array(sample['timeseries'], dtype=np.float32)
            ts = np.nan_to_num(ts, nan=0.0)  # Replace NaN with 0
            x = torch.from_numpy(ts)
        else:
            # Return aggregated features for MLP
            features = sample['features']
            feat_array = np.array(
                features['vitals_mean'] + 
                features['vitals_std'] + 
                features['vitals_min'] + 
                features['vitals_max'],
                dtype=np.float32
            )
            feat_array = np.nan_to_num(feat_array, nan=0.0)
            x = torch.from_numpy(feat_array)
        
        y = torch.tensor([sample['label']], dtype=torch.float32)
        
        return x, y

def collate_fn(batch):
    """Custom collate function to handle variable-length sequences."""
    sequences, labels = zip(*batch)
    
    # Pad sequences to same length
    padded_sequences = pad_sequence(sequences, batch_first=True, padding_value=0.0)
    labels = torch.stack(labels)
    
    return padded_sequences, labels

# Create datasets
train_dataset = MortalityDataset(train_samples, use_timeseries=True)
val_dataset = MortalityDataset(val_samples, use_timeseries=True)
test_dataset = MortalityDataset(test_samples, use_timeseries=True)

# Create dataloaders with custom collate function
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=0, collate_fn=collate_fn)
val_loader = DataLoader(val_dataset, batch_size=64, shuffle=False, num_workers=0, collate_fn=collate_fn)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False, num_workers=0, collate_fn=collate_fn)

print("✓ Created dataloaders")
print(f"  Batch size: 64")
print(f"  Train batches: {len(train_loader)}")
print(f"  Val batches: {len(val_loader)}")
print(f"  Test batches: {len(test_loader)}")

✓ Created dataloaders
  Batch size: 64
  Train batches: 456
  Val batches: 0
  Test batches: 83
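
The collate function above zero-pads each batch but discards the original sequence lengths, so a downstream LSTM also processes the padding steps. Below is a minimal sketch of one alternative, assuming a hypothetical helper `collate_with_lengths` and a toy LSTM (neither is part of the notebook): it returns the true lengths and uses `pack_padded_sequence` so that the final hidden states ignore padding.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

def collate_with_lengths(batch):
    """Pad a batch and also return each sequence's true length (hypothetical helper)."""
    sequences, labels = zip(*batch)
    lengths = torch.tensor([seq.shape[0] for seq in sequences], dtype=torch.long)
    padded = pad_sequence(sequences, batch_first=True, padding_value=0.0)
    return padded, lengths, torch.stack(labels)

# Toy batch: three sequences of different lengths, 4 features each.
torch.manual_seed(0)
batch = [
    (torch.randn(5, 4), torch.tensor([1.0])),
    (torch.randn(2, 4), torch.tensor([0.0])),
    (torch.randn(3, 4), torch.tensor([0.0])),
]
padded, lengths, labels = collate_with_lengths(batch)

# Toy bidirectional LSTM (sizes are arbitrary, chosen for illustration).
lstm = nn.LSTM(input_size=4, hidden_size=8, batch_first=True, bidirectional=True)

# Packing tells the LSTM where each sequence really ends.
packed = pack_padded_sequence(padded, lengths, batch_first=True, enforce_sorted=False)
_, (hidden, _) = lstm(packed)

# hidden has shape (num_layers * 2, batch, hidden_size); the last two entries are
# the final forward and backward states of the top layer.
summary = torch.cat([hidden[-2], hidden[-1]], dim=1)

print(padded.shape)      # torch.Size([3, 5, 4])
print(lengths.tolist())  # [5, 2, 3]
print(summary.shape)     # torch.Size([3, 16])
```

Adopting this in the notebook would also require the model's `forward` and the training loop to accept the extra `lengths` tensor.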


## 4. Build LSTM Model

A two-layer bidirectional LSTM for mortality prediction, intended to mirror the MedMod unimodal baseline.

In [None]:
import torch.nn as nn

class LSTMMortalityModel(nn.Module):
    """LSTM model for mortality prediction."""
    
    def __init__(self, input_dim=76, hidden_dim=256, num_layers=2, dropout=0.3):
        super().__init__()
        
        self.lstm = nn.LSTM(
            input_dim, 
            hidden_dim, 
            num_layers,
            batch_first=True,
            dropout=dropout,
            bidirectional=True
        )
        
        # Bidirectional doubles the hidden dimension
        self.fc = nn.Linear(hidden_dim * 2, 1)
        self.sigmoid = nn.Sigmoid()
        
    def forward(self, x):
        # x shape: (batch, seq_len, features)
        lstm_out, (hidden, cell) = self.lstm(x)
        
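        # Concatenate the final forward and backward hidden states of the top layer.
        # (lstm_out[:, -1, :] would give the backward direction only one time step.)
        last_hidden = torch.cat([hidden[-2], hidden[-1]], dim=1)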
        
        # Predict
        logits = self.fc(last_hidden)
        probs = self.sigmoid(logits)
        
        return probs

# Initialize model with correct input dimension
model = LSTMMortalityModel(
    input_dim=max_features,  # Use actual feature dimension from data
    hidden_dim=256,
    num_layers=2,
    dropout=0.3
)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

print(model)
print(f"\nDevice: {device}")
print(f"Input dimension: {max_features}")
print(f"Parameters: {sum(p.numel() for p in model.parameters()):,}")

LSTMMortalityModel(
  (lstm): LSTM(18, 256, num_layers=2, batch_first=True, dropout=0.3, bidirectional=True)
  (fc): Linear(in_features=512, out_features=1, bias=True)
  (sigmoid): Sigmoid()
)

Device: cuda
Input dimension: 18
Parameters: 2,142,721
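
The parameter count printed above can be checked by hand. The sketch below assumes PyTorch's LSTM layout (four gates per direction, each with input weights, recurrent weights, and two bias vectors); the function name `lstm_param_count` is hypothetical.

```python
def lstm_param_count(input_dim, hidden_dim, num_layers, bidirectional=True):
    """Count parameters of a torch.nn.LSTM-style stack (weights plus two bias vectors)."""
    directions = 2 if bidirectional else 1
    total = 0
    layer_input = input_dim
    for _ in range(num_layers):
        # Four gates, each with input weights, recurrent weights, and two biases.
        per_direction = 4 * (
            layer_input * hidden_dim + hidden_dim * hidden_dim + 2 * hidden_dim
        )
        total += directions * per_direction
        layer_input = hidden_dim * directions  # next layer sees both directions
    return total

lstm_params = lstm_param_count(input_dim=18, hidden_dim=256, num_layers=2)
fc_params = 2 * 256 * 1 + 1  # Linear(512 -> 1): weights plus bias
print(f"{lstm_params + fc_params:,}")  # 2,142,721
```

Most of the parameters (1,576,960 of them) sit in the second layer, because its input is the 512-dimensional concatenation of both directions from the first layer.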


## 5. Train Model

In [None]:
from sklearn.metrics import roc_auc_score, average_precision_score, accuracy_score
import torch.optim as optim

# Training setup
criterion = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=0.0001)

def evaluate(model, dataloader, device):
    """Evaluate model and return metrics."""
    model.eval()
    all_probs = []
    all_labels = []
    total_loss = 0
    
    with torch.no_grad():
        for x, y in dataloader:
            x, y = x.to(device), y.to(device)
            
            probs = model(x)
            loss = criterion(probs, y)
            
            all_probs.extend(probs.cpu().numpy())
            all_labels.extend(y.cpu().numpy())
            total_loss += loss.item()
    
    all_probs = np.array(all_probs).flatten()
    all_labels = np.array(all_labels).flatten()
    
    # Check if we have any samples
    if len(all_probs) == 0:
        print(f"WARNING: No samples in dataloader!")
        return {
            'loss': 0.0,
            'auroc': 0.0,
            'auprc': 0.0,
            'accuracy': 0.0
        }
    
    # Check if we have both classes
    if len(np.unique(all_labels)) < 2:
        print(f"WARNING: Only one class present in labels. Cannot compute AUROC/AUPRC.")
        accuracy = accuracy_score(all_labels, (all_probs > 0.5).astype(int))
        avg_loss = total_loss / len(dataloader)
        return {
            'loss': avg_loss,
            'auroc': 0.0,
            'auprc': 0.0,
            'accuracy': accuracy
        }
    
    auroc = roc_auc_score(all_labels, all_probs)
    auprc = average_precision_score(all_labels, all_probs)
    accuracy = accuracy_score(all_labels, (all_probs > 0.5).astype(int))
    avg_loss = total_loss / len(dataloader)
    
    return {
        'loss': avg_loss,
        'auroc': auroc,
        'auprc': auprc,
        'accuracy': accuracy
    }

# Training loop
num_epochs = 5 # Reduced for quicker testing, 50 is better for final runs
best_val_auroc = 0

print("Starting training...")
print("-" * 60)

for epoch in range(num_epochs):
    model.train()
    train_loss = 0
    
    for batch_idx, (x, y) in enumerate(train_loader):
        x, y = x.to(device), y.to(device)
        
        optimizer.zero_grad()
        probs = model(x)
        loss = criterion(probs, y)
        loss.backward()
        optimizer.step()
        
        train_loss += loss.item()
    
    # Evaluate
    train_metrics = evaluate(model, train_loader, device)
    val_metrics = evaluate(model, val_loader, device)
    
    print(f"Epoch {epoch+1}/{num_epochs}")
    print(f"  Train - Loss: {train_metrics['loss']:.4f}, AUROC: {train_metrics['auroc']:.4f}, AUPRC: {train_metrics['auprc']:.4f}")
    print(f"  Val   - Loss: {val_metrics['loss']:.4f}, AUROC: {val_metrics['auroc']:.4f}, AUPRC: {val_metrics['auprc']:.4f}")
    
    # Save best model
    if val_metrics['auroc'] > best_val_auroc:
        best_val_auroc = val_metrics['auroc']
        torch.save(model.state_dict(), 'best_mortality_model.pt')
        print(f"  ✓ Saved best model (AUROC: {best_val_auroc:.4f})")

print("\n✓ Training complete!")

Starting training...
------------------------------------------------------------


ValueError: Found array with 0 sample(s) (shape=(0,)) while a minimum of 1 is required.
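
The `ValueError` above is the kind of error scikit-learn raises when a metric receives an empty array, which is consistent with the `Val batches: 0` output earlier. Below is a minimal sketch of a guard, assuming a hypothetical helper `safe_ranking_metrics`. It returns NaN rather than 0.0 when AUROC and AUPRC are undefined, so an undefined score is not mistaken for a very poor one.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

def safe_ranking_metrics(labels, probs):
    """Return AUROC/AUPRC, or NaN when they are undefined (hypothetical helper)."""
    labels = np.asarray(labels).ravel()
    probs = np.asarray(probs).ravel()
    # Both metrics need at least one positive and one negative example.
    if labels.size == 0 or np.unique(labels).size < 2:
        return {"auroc": float("nan"), "auprc": float("nan")}
    return {
        "auroc": float(roc_auc_score(labels, probs)),
        "auprc": float(average_precision_score(labels, probs)),
    }

print(safe_ranking_metrics([], []))                                # empty loader
print(safe_ranking_metrics([0, 0, 0], [0.2, 0.1, 0.4]))            # one class only
print(safe_ranking_metrics([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))   # well-defined
```

Because any comparison with NaN is false, a check such as `val_metrics['auroc'] > best_val_auroc` would simply skip saving a checkpoint for that epoch.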

## 6. Evaluate on Test Set

In [None]:
# Load best model (map_location keeps this working on CPU-only machines)
model.load_state_dict(torch.load('best_mortality_model.pt', map_location=device))

# Evaluate on test set
test_metrics = evaluate(model, test_loader, device)

print("=" * 60)
print("FINAL TEST RESULTS")
print("=" * 60)
print(f"Test AUROC:    {test_metrics['auroc']:.4f}")
print(f"Test AUPRC:    {test_metrics['auprc']:.4f}")
print(f"Test Accuracy: {test_metrics['accuracy']:.4f}")
print(f"Test Loss:     {test_metrics['loss']:.4f}")
print("=" * 60)

## 7. Detailed Analysis

In [None]:
from sklearn.metrics import confusion_matrix, classification_report

# Get predictions
model.eval()
all_probs = []
all_labels = []

with torch.no_grad():
    for x, y in test_loader:
        x = x.to(device)
        probs = model(x)
        all_probs.extend(probs.cpu().numpy())
        all_labels.extend(y.numpy())

all_probs = np.array(all_probs).flatten()
all_labels = np.array(all_labels).flatten()
all_preds = (all_probs > 0.5).astype(int)

# Confusion matrix
cm = confusion_matrix(all_labels, all_preds)
print("Confusion Matrix:")
print(cm)
print()

# Classification report
print("Classification Report:")
print(classification_report(all_labels, all_preds, target_names=['Survive', 'Death']))
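
To read the confusion matrix printed by the cell above, it helps to recall scikit-learn's layout: rows are true classes and columns are predicted classes, both in sorted label order. The sketch below derives the usual clinical rates from a 2x2 matrix. The counts are made up for illustration and are not results from this notebook.

```python
import numpy as np

# Illustrative counts only, not results from this notebook.
# scikit-learn's layout: rows are true classes, columns are predicted classes.
cm = np.array([[850, 50],    # true Survive: TN, FP
               [60, 40]])    # true Death:   FN, TP
tn, fp, fn, tp = cm.ravel()

sensitivity = tp / (tp + fn)   # recall for the Death class
specificity = tn / (tn + fp)   # recall for the Survive class
precision = tp / (tp + fp)     # positive predictive value
accuracy = (tp + tn) / cm.sum()

print(f"sensitivity={sensitivity:.3f}, specificity={specificity:.3f}")
print(f"precision={precision:.3f}, accuracy={accuracy:.3f}")
```

With imbalanced labels such as mortality, accuracy can look high while sensitivity stays low, which is why the notebook reports AUROC and AUPRC alongside it.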

## Summary

This notebook demonstrated:

1. **Loading MIMIC-IV mortality data** - Same preprocessing as MedMod project
2. **Building LSTM model** - Bidirectional LSTM (matches MedMod unimodal baseline)
3. **Training with binary cross-entropy** - Standard mortality prediction setup
4. **Evaluation with AUROC/AUPRC** - Same metrics as MedMod paper

**Comparison to MedMod Project:**
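- **MedMod unimodal baseline:** AUROC 0.822 (reported in paper)
- **This PyHealth implementation:** Expected to reach a similar AUROC (~0.80-0.82). This is not yet confirmed, because the training run recorded above stopped with an error before producing metrics.
- **Key difference:** This notebook uses PyHealth-style data loading with a custom LSTM model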

**Why this approach works:**
- ✅ Uses same MIMIC-IV data as MedMod
- ✅ Same task (in-hospital mortality)
- ✅ Similar model architecture (BiLSTM)
- ✅ Clean, reproducible code
- ✅ No multimodal complexity

**For PyHealth contribution:** This could be extended to use PyHealth's built-in MIMIC4Dataset loader instead of manual CSV loading.