@Mdnaimulislam Mdnaimulislam commented Jun 19, 2025

Adds a tutorial for cardiac hemodynamics assessment using PyKale with multimodal low-cost CXR and ECG modalities.

The notebook is structured as follows:

  • Introduction and Objective
  • Setup (Warning Suppression, Required Packages, and Helper Functions)
  • Pretrain Data Loading and Preprocessing
  • Model Definition
  • Pretraining
  • Finetune Data Loading and Preprocessing
  • Finetuning and Evaluation
  • Multimodal Interpretation

Descriptions are added for each section.

@Mdnaimulislam Mdnaimulislam requested a review from shuo-zhou June 19, 2025 11:07
@Mdnaimulislam Mdnaimulislam self-assigned this Jun 19, 2025
@Mdnaimulislam Mdnaimulislam added the enhancement New feature or request label Jun 19, 2025
@shuo-zhou shuo-zhou requested a review from Copilot June 23, 2025 08:19
Copilot AI left a comment

Pull Request Overview

Adds a new tutorial for cardiac hemodynamics assessment using multimodal CXR and ECG data, including utilities, configs, and an interpretation script.

  • Introduces remap_model_parameters.py to align pretrained checkpoint keys with renamed parameters
  • Adds separate pretraining and finetuning configuration modules plus corresponding experiment YAMLs
  • Implements interpret.py for integrated‐gradients attribution on ECG and CXR, and updates the tutorial Table of Contents

Reviewed Changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tutorials/cardiac-hemodynamics-assesment/remap_model_parameters.py | Utility to remap pretrained model parameter names |
| tutorials/cardiac-hemodynamics-assesment/pretraining_config.py | Default configuration for pretraining |
| tutorials/cardiac-hemodynamics-assesment/interpret.py | Script for multimodal ECG+CXR attribution |
| tutorials/cardiac-hemodynamics-assesment/finetune_config.py | Default configuration for finetuning |
| tutorials/cardiac-hemodynamics-assesment/experiments/pretraining_base.yml | Base YAML for pretraining experiments |
| tutorials/cardiac-hemodynamics-assesment/experiments/finetune_base.yml | Base YAML for finetuning experiments |
| _toc.yml | Added cardiac-hemodynamics-assesment notebook entry |

Comments suppressed due to low confidence (2)

_toc.yml:26

  • The folder name 'cardiac-hemodynamics-assesment' is misspelled; consider renaming it to 'cardiac-hemodynamics-assessment' for consistency.
  - file: tutorials/cardiac-hemodynamics-assesment/notebook

tutorials/cardiac-hemodynamics-assesment/remap_model_parameters.py:4

  • Add unit tests for remap_state_dict_keys to verify that each mapping rule correctly renames all expected keys.
def remap_state_dict_keys(state_dict):
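
A minimal sketch of what such a test could look like follows. The mapping rules in remap_model_parameters.py are not shown in this conversation, so `RENAME_RULES` and the stand-in body of `remap_state_dict_keys` are hypothetical; only the function name comes from the PR.

```python
import unittest

# Hypothetical prefix rules. The real rules live in remap_model_parameters.py
# and are not shown in this conversation.
RENAME_RULES = {
    "image_encoder.": "cxr_encoder.",
    "signal_encoder.": "ecg_encoder.",
}


def remap_state_dict_keys(state_dict):
    """Stand-in for the PR's function: rename keys by prefix, keep values."""
    remapped = {}
    for key, value in state_dict.items():
        for old_prefix, new_prefix in RENAME_RULES.items():
            if key.startswith(old_prefix):
                key = new_prefix + key[len(old_prefix):]
                break
        remapped[key] = value
    return remapped


class RemapStateDictKeysTest(unittest.TestCase):
    def test_each_rule_renames_its_keys(self):
        # One sub-test per rule, so a failing rule is reported by name.
        for old_prefix, new_prefix in RENAME_RULES.items():
            with self.subTest(rule=old_prefix):
                result = remap_state_dict_keys({old_prefix + "fc.weight": 1})
                self.assertEqual(result, {new_prefix + "fc.weight": 1})

    def test_unmatched_keys_pass_through(self):
        state = {"head.bias": 0}
        self.assertEqual(remap_state_dict_keys(state), state)

    def test_no_old_prefix_survives(self):
        state = {prefix + "layer.bias": 0 for prefix in RENAME_RULES}
        for key in remap_state_dict_keys(state):
            self.assertFalse(key.startswith(tuple(RENAME_RULES)))
```

Saved as a test module, this can be run with `python -m unittest`. Iterating over the rule table keeps the test in step with the mapping when rules are added or renamed.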

# --- Prediction ---
last_fold_model.eval()
with torch.no_grad():
    logits = last_fold_model(xray_image, ecg_waveform)
Copilot AI Jun 23, 2025

The model forward pass uses the raw ecg_waveform, but IntegratedGradients is applied to ecg_smoothed_tensor. This mismatch can produce incorrect attributions; use the same input tensor in both the prediction and attribution steps.

Suggested change
- logits = last_fold_model(xray_image, ecg_waveform)
+ logits = last_fold_model(xray_image, ecg_smoothed_tensor)

Contributor Author

This forward pass is only for prediction; the smoothing is applied to improve the visualization. So I am rejecting this Copilot suggestion.
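
The trade-off in this thread can be illustrated with a toy example. This is a sketch under stated assumptions, not the tutorial's code: `model`, `grad`, `integrated_gradients`, and `moving_average` are hypothetical stand-ins written in plain Python, and Captum is not used. It relies on the completeness property of integrated gradients: the attributions sum to the model output at the attributed input minus the output at the baseline.

```python
def model(x, w):
    """Toy differentiable 'model': weighted sum of squares (hypothetical)."""
    return sum(wi * xi * xi for wi, xi in zip(w, x))


def grad(x, w):
    """Analytic gradient of the toy model with respect to x."""
    return [2.0 * wi * xi for wi, xi in zip(w, x)]


def integrated_gradients(x, w, steps=50):
    """Midpoint Riemann approximation of integrated gradients, zero baseline."""
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [alpha * xi for xi in x]  # point on the baseline-to-input path
        for i, g in enumerate(grad(point, w)):
            totals[i] += g / steps
    # Scale the averaged gradients by (input - baseline); the baseline is zero.
    return [xi * t for xi, t in zip(x, totals)]


def moving_average(x, width=3):
    """Centered moving average with shrinking windows at the edges."""
    half = width // 2
    out = []
    for i in range(len(x)):
        window = x[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


weights = [0.5, 1.0, 2.0, 1.0, 0.5]
raw = [0.0, 1.0, 4.0, 1.0, 0.0]       # stand-in for the raw ECG input
smoothed = moving_average(raw)        # stand-in for the smoothed tensor

attr_raw = integrated_gradients(raw, weights)
attr_smoothed = integrated_gradients(smoothed, weights)

# Each attribution vector sums to the model output at the input it was
# computed on, so the two sums differ whenever the two outputs differ.
gap_raw = abs(sum(attr_raw) - model(raw, weights))
gap_smoothed = abs(sum(attr_smoothed) - model(smoothed, weights))
```

In this toy case, attributions computed on the smoothed input account for the model output at the smoothed input, which differs from the output at the raw input. Whether that difference matters depends on whether the attribution map is read as an explanation of the reported prediction or only as a visual aid.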

Comment on lines +81 to +93
full_time = np.arange(0, full_length) / sampling_rate / lead_number
important_indices_full = np.where(
    norm_attributions_ecg[:full_length] >= ecg_threshold
)[0]

zoom_start = int(zoom_range[0] * 6000)
zoom_end = int(zoom_range[1] * 6000)
zoom_time = np.arange(zoom_start, zoom_end) / sampling_rate / lead_number
segment_ecg_waveform = ecg_waveform_np[zoom_start:zoom_end]
segment_attributions = norm_attributions_ecg[zoom_start:zoom_end]
important_indices_zoom = np.where(segment_attributions >= ecg_threshold)[0]
zoom_start_sec = zoom_start / sampling_rate / lead_number
zoom_end_sec = zoom_end / sampling_rate / lead_number
Copilot AI Jun 23, 2025

Dividing time indices by lead_number skews the time axis. Time should be computed as np.arange(full_length) / sampling_rate without dividing by the number of leads.

Suggested change
- full_time = np.arange(0, full_length) / sampling_rate / lead_number
- important_indices_full = np.where(
-     norm_attributions_ecg[:full_length] >= ecg_threshold
- )[0]
- zoom_start = int(zoom_range[0] * 6000)
- zoom_end = int(zoom_range[1] * 6000)
- zoom_time = np.arange(zoom_start, zoom_end) / sampling_rate / lead_number
- segment_ecg_waveform = ecg_waveform_np[zoom_start:zoom_end]
- segment_attributions = norm_attributions_ecg[zoom_start:zoom_end]
- important_indices_zoom = np.where(segment_attributions >= ecg_threshold)[0]
- zoom_start_sec = zoom_start / sampling_rate / lead_number
- zoom_end_sec = zoom_end / sampling_rate / lead_number
+ full_time = np.arange(0, full_length) / sampling_rate
+ important_indices_full = np.where(
+     norm_attributions_ecg[:full_length] >= ecg_threshold
+ )[0]
+ zoom_start = int(zoom_range[0] * 6000)
+ zoom_end = int(zoom_range[1] * 6000)
+ zoom_time = np.arange(zoom_start, zoom_end) / sampling_rate
+ segment_ecg_waveform = ecg_waveform_np[zoom_start:zoom_end]
+ segment_attributions = norm_attributions_ecg[zoom_start:zoom_end]
+ important_indices_zoom = np.where(segment_attributions >= ecg_threshold)[0]
+ zoom_start_sec = zoom_start / sampling_rate
+ zoom_end_sec = zoom_end / sampling_rate

Contributor Author

Rejecting this suggestion as well: we divide by the number of leads to get the values in seconds.
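
The arithmetic behind this reply can be sketched as follows. It assumes the ECG is stored as one flat array with all leads interleaved per time step, so that one second spans `sampling_rate * lead_number` entries. Neither that layout nor the sampling rate is stated in this conversation: 500 Hz is an assumption chosen because 500 × 12 matches the constant 6000 in the quoted code, and 12 is the `LEAD_NUMBER` value from the YAML excerpt quoted later in the conversation. The helper names are hypothetical.

```python
SAMPLING_RATE = 500  # Hz; assumed, not stated in the conversation
LEAD_NUMBER = 12     # from LEAD_NUMBER: 12 in the YAML excerpt


def seconds_to_flat_index(seconds, sampling_rate=SAMPLING_RATE, lead_number=LEAD_NUMBER):
    """Flat index at which a given time starts, assuming interleaved leads."""
    return int(seconds * sampling_rate * lead_number)


def flat_index_to_seconds(index, sampling_rate=SAMPLING_RATE, lead_number=LEAD_NUMBER):
    """Time in seconds of a flat index, assuming interleaved leads."""
    return index / sampling_rate / lead_number


zoom_range = (3, 3.5)  # ZOOM_RANGE from the YAML excerpt
zoom_start = seconds_to_flat_index(zoom_range[0])
zoom_end = seconds_to_flat_index(zoom_range[1])

# Dividing by the lead count recovers the requested zoom window in seconds.
with_lead_division = flat_index_to_seconds(zoom_start)
# Dropping the division treats every flat entry as its own time step.
without_lead_division = zoom_start / SAMPLING_RATE
```

Under these assumptions the division by `lead_number` maps the zoom indices back to the requested window, while omitting it stretches the axis by a factor of 12. If the array instead held a single lead, the formula without the division would be the appropriate one.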

Comment on lines +25 to +42
# Gather all batches (as in your code)
batches = list(last_val_loader)
all_xray_images, all_ecg_waveforms, all_labels = [
    torch.cat(items) for items in zip(*batches)
]

# --- Select Sample ---
xray_image = (
    all_xray_images[sample_idx]
    .unsqueeze(0)
    .to(next(last_fold_model.parameters()).device)
)
ecg_waveform = (
    all_ecg_waveforms[sample_idx]
    .unsqueeze(0)
    .to(next(last_fold_model.parameters()).device)
)
label = all_labels[sample_idx].item()
Copilot AI Jun 23, 2025

Concatenating all validation batches into memory may be expensive for large datasets. Consider indexing directly into the DataLoader's dataset or loading only the required sample to reduce memory usage.

Suggested change
- # Gather all batches (as in your code)
- batches = list(last_val_loader)
- all_xray_images, all_ecg_waveforms, all_labels = [
-     torch.cat(items) for items in zip(*batches)
- ]
- # --- Select Sample ---
- xray_image = (
-     all_xray_images[sample_idx]
-     .unsqueeze(0)
-     .to(next(last_fold_model.parameters()).device)
- )
- ecg_waveform = (
-     all_ecg_waveforms[sample_idx]
-     .unsqueeze(0)
-     .to(next(last_fold_model.parameters()).device)
- )
- label = all_labels[sample_idx].item()
+ # --- Select Sample ---
+ xray_image, ecg_waveform, label = last_val_loader.dataset[sample_idx]
+ xray_image = (
+     xray_image.unsqueeze(0)
+     .to(next(last_fold_model.parameters()).device)
+ )
+ ecg_waveform = (
+     ecg_waveform.unsqueeze(0)
+     .to(next(last_fold_model.parameters()).device)
+ )
+ label = label.item()

Contributor Author

No need for now. Rejecting it.
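
For readers weighing the two approaches, the sketch below shows when they select the same sample. It uses plain Python lists in place of the real dataset and DataLoader, so `dataset` and `iterate_batches` are hypothetical stand-ins, and it assumes the validation loader yields batches in dataset order (no shuffling).

```python
def iterate_batches(dataset, batch_size):
    """Yield consecutive batches in dataset order, like an unshuffled loader."""
    for start in range(0, len(dataset), batch_size):
        yield dataset[start:start + batch_size]


# Hypothetical dataset of (xray, ecg, label) triples.
dataset = [(f"xray_{i}", f"ecg_{i}", i % 2) for i in range(10)]
sample_idx = 7

# Approach in the PR: materialise every batch, then index into the result.
all_samples = [sample for batch in iterate_batches(dataset, 4) for sample in batch]
from_batches = all_samples[sample_idx]

# Copilot's alternative: index the dataset directly, loading one sample only.
direct = dataset[sample_idx]
```

With batches in dataset order, both approaches return the same triple; with a shuffled loader they generally would not. Note also that the suggestion calls `label.item()`, which only works if the dataset returns the label as a tensor rather than a plain Python number.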

Member

@shuo-zhou shuo-zhou left a comment

Please address my and Copilot's comments.

from scipy.ndimage import binary_dilation


def multimodal_ecg_cxr_attribution(
Member

Add a docstring. Consider how to integrate this function into kale in the future.

Contributor Author

Fixed.

Member

Can finetune_config.py and pretraining_config.py be merged into one? You can keep the two files as they are now and seek feedback from other team members and Pete/Kelly later.

Contributor Author

In my opinion, keeping them separate is better. Pretraining and finetuning are separate operations, and it will be easier for participants to distinguish the arguments for each.

EPOCHS: 10
LR: 0.001
HIDDEN_DIM: 128
NUM_CLASSES: 2
Member

These entries are not needed if they do not differ from the default values in *_config.py. Please check all of them.

Contributor Author

Keeping LR and EPOCHS so that participants can experiment with these parameters and see the difference in performance.
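
The check requested above could be automated along the following lines. This is a sketch only: the `defaults` values are hypothetical placeholders for whatever *_config.py actually defines, the configuration is assumed to be flat, and `redundant_overrides` is an illustrative helper, not part of the PR.

```python
def redundant_overrides(defaults, overrides):
    """Return the override keys whose values equal the corresponding defaults."""
    return sorted(
        key for key, value in overrides.items()
        if key in defaults and defaults[key] == value
    )


# Hypothetical defaults standing in for the values in *_config.py.
defaults = {"EPOCHS": 10, "LR": 0.001, "HIDDEN_DIM": 256, "NUM_CLASSES": 2}

# Values quoted from the YAML excerpt under review.
yaml_overrides = {"EPOCHS": 10, "LR": 0.001, "HIDDEN_DIM": 128, "NUM_CLASSES": 2}

# Keys kept on purpose so participants can experiment with them.
keep_for_participants = {"EPOCHS", "LR"}

redundant = redundant_overrides(defaults, yaml_overrides)
removable = [key for key in redundant if key not in keep_for_participants]
```

With these placeholder defaults, `redundant` lists every YAML key that repeats a default, and `removable` excludes the two keys the author chose to keep visible.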

Member

Fix the sys.stdout AttributeError and remove all empty cells

ZOOM_RANGE: [3, 3.5]
ECG_THRESHOLD: 0.7
CXR_THRESHOLD: 0.7
LEAD_NUMBER: 12
Member

What is the difference between NUM_LEADS and LEAD_NUMBER? Would LEAD_INDEX be a better name?

Contributor Author

Removed LEAD_NUMBER.

@Mdnaimulislam Mdnaimulislam enabled auto-merge (squash) June 24, 2025 10:52
Contributor

@wenruifan wenruifan left a comment


LGTM

@Mdnaimulislam Mdnaimulislam merged commit 8af0b79 into main Jun 24, 2025
1 check passed
Labels

enhancement New feature or request

4 participants