Add Heart tutorial #5
Conversation
Pull Request Overview
Adds a new tutorial for cardiac hemodynamics assessment using multimodal CXR and ECG data, including utilities, configs, and an interpretation script.
- Introduces `remap_model_parameters.py` to align pretrained checkpoint keys with renamed parameters
- Adds separate pretraining and finetuning configuration modules plus corresponding experiment YAMLs
- Implements `interpret.py` for integrated-gradients attribution on ECG and CXR, and updates the tutorial Table of Contents
Reviewed Changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tutorials/cardiac-hemodynamics-assesment/remap_model_parameters.py | Utility to remap pretrained model parameter names |
| tutorials/cardiac-hemodynamics-assesment/pretraining_config.py | Default configuration for pretraining |
| tutorials/cardiac-hemodynamics-assesment/interpret.py | Script for multimodal ECG+CXR attribution |
| tutorials/cardiac-hemodynamics-assesment/finetune_config.py | Default configuration for finetuning |
| tutorials/cardiac-hemodynamics-assesment/experiments/pretraining_base.yml | Base YAML for pretraining experiments |
| tutorials/cardiac-hemodynamics-assesment/experiments/finetune_base.yml | Base YAML for finetuning experiments |
| _toc.yml | Added cardiac-hemodynamics-assesment notebook entry |
Comments suppressed due to low confidence (2)
_toc.yml:26
- The folder name 'cardiac-hemodynamics-assesment' is misspelled; consider renaming it to 'cardiac-hemodynamics-assessment' for consistency.
```yaml
- file: tutorials/cardiac-hemodynamics-assesment/notebook
```
tutorials/cardiac-hemodynamics-assesment/remap_model_parameters.py:4
- Add unit tests for `remap_state_dict_keys` to verify that each mapping rule correctly renames all expected keys.

```python
def remap_state_dict_keys(state_dict):
```
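Such a test could look like the sketch below. The prefix rule used here (`encoder.` renamed to `backbone.`) and the toy stand-in implementation are assumptions for illustration only; the real mapping rules live in `remap_model_parameters.py` and the test should be adapted to them.

```python
def remap_state_dict_keys(state_dict, prefix_map=None):
    """Toy stand-in: rename keys whose prefix appears in prefix_map.

    The real remap_model_parameters.py implementation should be tested
    with its actual mapping rules; this is only a shape for the test.
    """
    if prefix_map is None:
        prefix_map = {"encoder.": "backbone."}  # assumed rule
    remapped = {}
    for key, value in state_dict.items():
        for old, new in prefix_map.items():
            if key.startswith(old):
                key = new + key[len(old):]
                break
        remapped[key] = value
    return remapped


def test_remap_state_dict_keys():
    state = {"encoder.conv1.weight": 1, "head.fc.bias": 2}
    out = remap_state_dict_keys(state)
    assert "backbone.conv1.weight" in out  # prefix rule applied
    assert "head.fc.bias" in out           # unmatched keys untouched
    assert len(out) == len(state)          # no keys lost or duplicated
```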
```python
# --- Prediction ---
last_fold_model.eval()
with torch.no_grad():
    logits = last_fold_model(xray_image, ecg_waveform)
```
Copilot
AI
Jun 23, 2025
The model forward pass uses the raw ecg_waveform, but IntegratedGradients is applied to ecg_smoothed_tensor. This mismatch can produce incorrect attributions; use the same input tensor in both the prediction and attribution steps.
```diff
- logits = last_fold_model(xray_image, ecg_waveform)
+ logits = last_fold_model(xray_image, ecg_smoothed_tensor)
```
This is for prediction; the smoothing is only there to make the visualization better. So, rejecting this Copilot suggestion.
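A minimal sketch of the pattern the author describes: the raw waveform feeds both the model and the attribution, while a smoothed copy is kept purely for plotting. The moving-average window size and signal length here are assumed values, not the tutorial's real ones.

```python
import numpy as np

rng = np.random.default_rng(0)
ecg_waveform = rng.standard_normal(6000)  # raw signal: model + IG input

# Display-only copy: a simple moving average (window size is assumed).
window = 15
kernel = np.ones(window) / window
ecg_smoothed = np.convolve(ecg_waveform, kernel, mode="same")

# The smoothed trace is less jagged, but it must not replace the raw
# waveform in the prediction or attribution steps.
assert ecg_smoothed.shape == ecg_waveform.shape
assert ecg_smoothed.var() < ecg_waveform.var()
```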
```python
full_time = np.arange(0, full_length) / sampling_rate / lead_number
important_indices_full = np.where(
    norm_attributions_ecg[:full_length] >= ecg_threshold
)[0]

zoom_start = int(zoom_range[0] * 6000)
zoom_end = int(zoom_range[1] * 6000)
zoom_time = np.arange(zoom_start, zoom_end) / sampling_rate / lead_number
segment_ecg_waveform = ecg_waveform_np[zoom_start:zoom_end]
segment_attributions = norm_attributions_ecg[zoom_start:zoom_end]
important_indices_zoom = np.where(segment_attributions >= ecg_threshold)[0]
zoom_start_sec = zoom_start / sampling_rate / lead_number
zoom_end_sec = zoom_end / sampling_rate / lead_number
```
Copilot
AI
Jun 23, 2025
Dividing time indices by lead_number skews the time axis. Time should be computed as np.arange(full_length) / sampling_rate without dividing by the number of leads.
```diff
- full_time = np.arange(0, full_length) / sampling_rate / lead_number
+ full_time = np.arange(0, full_length) / sampling_rate
  important_indices_full = np.where(
      norm_attributions_ecg[:full_length] >= ecg_threshold
  )[0]
  zoom_start = int(zoom_range[0] * 6000)
  zoom_end = int(zoom_range[1] * 6000)
- zoom_time = np.arange(zoom_start, zoom_end) / sampling_rate / lead_number
+ zoom_time = np.arange(zoom_start, zoom_end) / sampling_rate
  segment_ecg_waveform = ecg_waveform_np[zoom_start:zoom_end]
  segment_attributions = norm_attributions_ecg[zoom_start:zoom_end]
  important_indices_zoom = np.where(segment_attributions >= ecg_threshold)[0]
- zoom_start_sec = zoom_start / sampling_rate / lead_number
- zoom_end_sec = zoom_end / sampling_rate / lead_number
+ zoom_start_sec = zoom_start / sampling_rate
+ zoom_end_sec = zoom_end / sampling_rate
```
Again, rejecting this suggestion, as we divide by the lead number to get the values in seconds.
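A numeric sanity check of the convention defended above: if all 12 leads are concatenated into one long array, dividing a sample index by both `sampling_rate` and `lead_number` recovers the per-lead time in seconds. The sampling rate of 500 Hz and the 10-second record length are assumed values for illustration.

```python
import numpy as np

sampling_rate = 500   # Hz (assumed)
lead_number = 12      # leads concatenated end to end
seconds_per_lead = 10 # assumed record length per lead

# Total concatenated length: 12 leads x 10 s x 500 Hz = 60000 samples.
full_length = sampling_rate * seconds_per_lead * lead_number
full_time = np.arange(full_length) / sampling_rate / lead_number

# The time axis then spans roughly 0..10 s, i.e. one lead's duration.
assert full_length == 60000
assert abs(full_time[-1] - seconds_per_lead) < 0.01
```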
```python
# Gather all batches (as in your code)
batches = list(last_val_loader)
all_xray_images, all_ecg_waveforms, all_labels = [
    torch.cat(items) for items in zip(*batches)
]

# --- Select Sample ---
xray_image = (
    all_xray_images[sample_idx]
    .unsqueeze(0)
    .to(next(last_fold_model.parameters()).device)
)
ecg_waveform = (
    all_ecg_waveforms[sample_idx]
    .unsqueeze(0)
    .to(next(last_fold_model.parameters()).device)
)
label = all_labels[sample_idx].item()
```
Copilot
AI
Jun 23, 2025
Concatenating all validation batches into memory may be expensive for large datasets. Consider indexing directly into the DataLoader's dataset or loading only the required sample to reduce memory usage.
```diff
- # Gather all batches (as in your code)
- batches = list(last_val_loader)
- all_xray_images, all_ecg_waveforms, all_labels = [
-     torch.cat(items) for items in zip(*batches)
- ]
  # --- Select Sample ---
+ xray_image, ecg_waveform, label = last_val_loader.dataset[sample_idx]
  xray_image = (
-     all_xray_images[sample_idx]
-     .unsqueeze(0)
+     xray_image.unsqueeze(0)
      .to(next(last_fold_model.parameters()).device)
  )
  ecg_waveform = (
-     all_ecg_waveforms[sample_idx]
-     .unsqueeze(0)
+     ecg_waveform.unsqueeze(0)
      .to(next(last_fold_model.parameters()).device)
  )
- label = all_labels[sample_idx].item()
+ label = label.item()
```
No need for now. Rejecting it.
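For reference, the trade-off both sides are discussing can be seen on a toy dataset: concatenating every batch materializes the whole validation set in memory, while indexing the loader's underlying map-style dataset touches only one sample. With an unshuffled loader the two routes pick the same item. The shapes below are illustrative, not the tutorial's real ones.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins for the validation data (shapes are assumptions).
dataset = TensorDataset(
    torch.randn(100, 1, 8, 8),    # stand-in "X-ray" tensors
    torch.randn(100, 12, 500),    # stand-in "ECG" tensors
    torch.randint(0, 2, (100,)),  # labels
)
loader = DataLoader(dataset, batch_size=16, shuffle=False)
sample_idx = 3

# Memory-hungry route: gather everything, then pick one sample.
batches = list(loader)
all_x, all_e, all_y = [torch.cat(items) for items in zip(*batches)]

# Lean route: index the underlying dataset directly.
x, e, y = loader.dataset[sample_idx]

# Both routes yield the same sample when the loader is not shuffled.
assert torch.equal(all_x[sample_idx], x)
assert torch.equal(all_e[sample_idx], e)
assert int(all_y[sample_idx]) == int(y)
```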
shuo-zhou
left a comment
Please address my and Copilot's comments.
```python
from scipy.ndimage import binary_dilation


def multimodal_ecg_cxr_attribution(
```
Add a docstring. Consider how to integrate this function into kale in the future.
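A minimal docstring sketch for the function; the parameter names and return value here are assumptions for illustration and should be aligned with the actual signature in `interpret.py`.

```python
def multimodal_ecg_cxr_attribution(model, xray_image, ecg_waveform, target=None):
    """Compute integrated-gradients attributions for a CXR+ECG model.

    Hypothetical signature for illustration only.

    Args:
        model: Trained multimodal network taking (xray_image, ecg_waveform).
        xray_image: Batched CXR tensor, e.g. shape (1, C, H, W).
        ecg_waveform: Batched ECG tensor, e.g. shape (1, leads, samples).
        target: Optional class index to attribute; defaults to the
            predicted class.

    Returns:
        Tuple of (cxr_attributions, ecg_attributions).
    """
    ...
```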
fixed.
Can finetune_config.py and pretraining_config.py be merged into one? You can keep the two files as they are now and seek feedback from other team members and Pete/Kelly later.
In my opinion, separate is better. As they are separate operations, it will be easier for the participants to distinguish the args for pre-training and fine-tuning.
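One way to keep the two modules separate while still sharing common structure is sketched below; all class names, field names, and default values here are assumptions for illustration, not the tutorial's actual configs.

```python
from dataclasses import dataclass


@dataclass
class BaseConfig:
    # Fields shared by both stages (values assumed).
    hidden_dim: int = 128
    num_classes: int = 2


@dataclass
class PretrainingConfig(BaseConfig):
    # Stage-specific defaults (assumed).
    epochs: int = 50
    lr: float = 1e-3


@dataclass
class FinetuneConfig(BaseConfig):
    epochs: int = 10
    lr: float = 1e-4


# Shared structure, distinct stage-specific values:
assert PretrainingConfig().hidden_dim == FinetuneConfig().hidden_dim
assert PretrainingConfig().lr != FinetuneConfig().lr
```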
```yaml
EPOCHS: 10
LR: 0.001
HIDDEN_DIM: 128
NUM_CLASSES: 2
```
No need for these if they do not differ from the default values in *_config.py; please check all of them.
Keeping LR and EPOCHS so the participants can play with these parameters and see the performance difference.
Fix the sys.stdout AttributeError and remove all empty cells
```yaml
ZOOM_RANGE: [3, 3.5]
ECG_THRESHOLD: 0.7
CXR_THRESHOLD: 0.7
LEAD_NUMBER: 12
```
What is the difference between NUM_LEADS and LEAD_NUMBER? Would LEAD_INDEX be better?
removed LEAD_NUMBER.
# Conflicts:
#   _toc.yml
wenruifan
left a comment
LGTM
Adds a tutorial for cardiac hemodynamics assessment using PyKale with multimodal low-cost CXR and ECG modalities.
Notebook structure consisting of: