
TIME: Tumor-grade IDH-mutation Multimodal Ensemble Model

English | 한국어



Overview

TIME (Tumor-grade IDH-mutation Multimodal Ensemble) is a deep learning framework for predicting:

  1. IDH Mutation Status: Binary classification (Wild-type vs Mutant)
  2. WHO Grade: Multi-class classification (Grade 2, 3, 4)

The framework consists of two complementary models:

  • Model A (AM Model): Aggregated MRI Model - processes T1, T1Gd, T2, FLAIR sequences
  • Model B (PM Model): Patient Microstructure Model - processes DTI maps with clinical features

These models can be used independently or combined in an ensemble for improved performance.


Model Architecture

TIME model overview (figure: imgs/overview_time.png)

Model A: Aggregated MRI Model (AM)

  • Input: 4-channel MRI (T1, T1Gd, T2, FLAIR) as 3D volumes
  • Encoder: 3D CNN (ResNet, DenseNet, SegResNet, etc.)
  • Output: IDH logits + Grade logits + Feature vector (for fusion)

Model B: Patient Microstructure Model (PM)

Model B processes scalar features instead of 3D images:

  • Input 1 - Metadata:

    • Age (continuous, normalized)
    • Gender (binary: 0=F, 1=M)
    • 1p/19q Codeletion (binary: 0=No, 1=Yes, -1=Unknown)
  • Input 2 - DTI Scalar Metrics:

    • FA (Fractional Anisotropy)
    • AD (Axial Diffusivity)
    • RD (Radial Diffusivity)
    • MD (Mean Diffusivity)
  • Architecture: MLP encoders + Feature Fusion

  • Output: IDH logits + Grade logits + Feature vector (for ML fusion)
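
Before reaching the MLP encoders, the scalar inputs above need a simple numeric encoding. The sketch below is an illustration only (the helper name and the normalization constants are assumptions, not code from this repository):

```python
def encode_metadata(age, gender, codeletion, age_mean=55.0, age_std=15.0):
    """Illustrative encoding of Model B's scalar metadata inputs.

    age: z-normalized with placeholder cohort statistics (age_mean/age_std
    are assumed values, not taken from this repository).
    gender: 0 = F, 1 = M.
    codeletion: 0 = No, 1 = Yes, -1 = Unknown (kept as a distinct value so
    the encoder can learn a "missing" representation).
    """
    return [
        (age - age_mean) / age_std,
        float(gender),
        float(codeletion),
    ]
```

The same pattern extends to the DTI scalars (FA, AD, RD, MD), which are already continuous and only need normalization.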

Feature Extraction for ML Fusion

Both models support feature extraction for fusion with external machine-learning models:

# Model A: Extract features from MRI
features_a = model_a.get_features(mri_tensor)  # [B, feat_dim]

# Model B: Extract features from metadata + DTI
features_b = model_b.get_features(age, gender, codeletion, fa, ad, rd, md)  # [B, feat_dim]

# Fuse with an external (black-box) ML model
fused = fusion_module(features_b, ml_predictions)

Ensemble

  • Methods: Weighted Average, Stacking, Voting, Feature Fusion
  • ML Fusion: MetadataMLFusion class for combining with external ML models
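
As an illustration of the simplest of these methods, a weighted average combines the per-class logits of the two models before softmax. The helper below is a minimal sketch (names and default weights are hypothetical, not the repository's implementation):

```python
def weighted_average(logits_a, logits_b, w_a=0.5, w_b=0.5):
    """Weighted average of per-class logits from Model A and Model B.

    Weights are normalized so they need not sum to 1. The result can be
    passed to softmax/argmax exactly like a single model's logits.
    """
    assert len(logits_a) == len(logits_b), "class counts must match"
    total = w_a + w_b
    return [(w_a * a + w_b * b) / total for a, b in zip(logits_a, logits_b)]
```

Stacking and feature fusion differ in that they learn the combination (a meta-classifier over predictions, or a head over concatenated feature vectors) rather than fixing it.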

Architecture Variants (Ablation Study)

  • baseline: Standard MLP/CNN encoder
  • transformer: Encoder + Transformer blocks
  • mamba: Encoder + State-Space Model (Mamba) blocks
  • transformer_mamba: Hybrid approach

Project Structure

TIME/
├── README.md                    # English documentation
├── README_KR.md                 # Korean documentation
├── requirements.txt             # Python dependencies
├── .gitignore                   # Git ignore rules
│
├── config.py                    # Global configuration
├── train.py                     # Training script
├── evaluate.py                  # Evaluation script
├── visualize_slices.py          # MRI slice visualization
│
├── data/                        # Data loading modules
│   ├── __init__.py
│   ├── io.py                    # CSV parsing, data loading
│   ├── transforms.py            # MONAI transforms
│   └── loaders.py               # DataLoader creation
│
├── models/                      # Model definitions
│   ├── __init__.py
│   ├── model_a.py               # Model A (AM Model)
│   ├── model_b.py               # Model B (PM Model)
│   ├── ensemble.py              # Ensemble model
│   ├── common.py                # Shared components
│   ├── get_backbone.py          # Model factory
│   └── backbones/               # Backbone networks
│       └── __init__.py
│
├── engine/                      # Training engine
│   ├── __init__.py
│   ├── trainer.py               # Training/evaluation loops
│   ├── losses.py                # Loss functions
│   └── metrics.py               # Evaluation metrics
│
├── utils/                       # Utilities
│   ├── __init__.py
│   ├── seed.py                  # Reproducibility
│   ├── logger.py                # Logging
│   └── visualization.py         # Plotting functions
│
├── experiments/                 # Experiment outputs (gitignored)
│   └── .gitkeep
│
├── datasets/                    # Dataset folder (gitignored)
│   └── .gitkeep
│
├── imgs/                        # Documentation images
│   └── overview_time.png
│
└── scripts/                     # Shell scripts

Installation

Clone Repository

This is a private repository. You need a GitHub Personal Access Token to clone.

1. Generate GitHub Personal Access Token

  1. Go to GitHub → Settings → Developer settings → Personal access tokens → Tokens (classic)
  2. Click "Generate new token (classic)"
  3. Set expiration and select repo scope
  4. Copy the generated token (shown only once!)

2. Clone with Token

# Format: git clone https://<USERNAME>:<TOKEN>@github.com/khkim1729/TIME.git

# Example:
git clone https://gildong:ghp_haedalissupergoodenough@github.com/khkim1729/TIME.git

# Navigate to project
cd TIME

Virtual Environment Setup

# Create virtual environment
python -m venv .venv

# Activate (Linux/Mac)
source .venv/bin/activate

# Activate (Windows)
.venv\Scripts\activate

Install Dependencies

# Upgrade pip
pip install --upgrade pip

# Install PyTorch (adjust CUDA version as needed)
# For CUDA 11.8:
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

# For CUDA 12.1:
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# Install other dependencies
pip install -r requirements.txt

Verify Installation

python -c "import torch; import monai; print(f'PyTorch: {torch.__version__}'); print(f'MONAI: {monai.__version__}'); print(f'CUDA: {torch.cuda.is_available()}')"

Dataset Setup

Dataset Structure

The datasets should be organized as follows:

TIME/datasets/
├── EGD/                              # European Glioma Dataset
│   ├── Amodel.csv                    # Metadata CSV
│   └── EGD 001-342/                  # Patient data
│       ├── EGD 000-099/
│       │   └── MR_EGD-0080/
│       │       ├── 1_T1/
│       │       │   └── NIFTI/
│       │       │       └── T1.nii.gz
│       │       ├── 2_T1GD/
│       │       │   └── NIFTI/
│       │       ├── 3_T2/
│       │       │   └── NIFTI/
│       │       ├── 4_FLAIR/
│       │       │   └── NIFTI/
│       │       └── mask/
│       │           └── tumor_mask.nii.gz
│       └── ...
├── EGD_stripped/                     # Skull-stripped version
│   └── ...
└── UPENN/                            # UPenn GBM dataset
    └── ...

CSV Format

The CSV file should contain the following columns:

Model A (MRI) - Required columns:

| Column | Description | Example |
|---------|-------------|---------|
| subject | Patient ID | EGD-0080 |
| gender | M/F | M |
| t1 | Path to T1 | EGD 001-342/.../T1.nii.gz |
| t1gd | Path to T1Gd | EGD 001-342/.../T1GD.nii.gz |
| t2 | Path to T2 | EGD 001-342/.../T2.nii.gz |
| flair | Path to FLAIR | EGD 001-342/.../FLAIR.nii.gz |
| mask | Path to mask | EGD 001-342/.../mask.nii.gz |
| idh | IDH status | 0 (WT), 1 (Mut), -1 (Unknown) |
| grade | WHO Grade | 2, 3, or 4 |

Model B (Metadata + DTI) - Additional columns:

| Column | Description | Example |
|---------|-------------|---------|
| age | Patient age | 55 |
| codeletion_1p19q | 1p/19q codeletion | 0, 1, or -1 (Unknown) |
| fa | Fractional Anisotropy (scalar) | 0.35 |
| ad | Axial Diffusivity (mm²/s) | 0.0012 |
| rd | Radial Diffusivity (mm²/s) | 0.0008 |
| md | Mean Diffusivity (mm²/s) | 0.0010 |
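
These label files can be read with the standard csv module. The sketch below is a hypothetical helper (not code from this repository) showing one way to parse the required columns and drop rows whose IDH status is unknown (-1):

```python
import csv
import io


def load_labels(csv_text):
    """Parse label rows from CSV text using the column names documented above.

    Rows with idh == -1 (unknown) are skipped, since they cannot be used
    for supervised IDH training.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        idh = int(row["idh"])
        if idh == -1:  # unknown IDH status: exclude from training labels
            continue
        rows.append({"subject": row["subject"], "idh": idh, "grade": int(row["grade"])})
    return rows
```

The actual loader in data/io.py also resolves the image paths; this sketch covers only the label columns.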

Syncing Dataset from Remote Server

Use rsync to transfer datasets from another server:

# Sync entire datasets folder
rsync -avzP user@remote_server:/path/to/datasets/ ./datasets/

# Sync specific dataset
rsync -avzP user@remote_server:/path/to/datasets/EGD/ ./datasets/EGD/

# With SSH key
rsync -avzP -e "ssh -i ~/.ssh/id_rsa" user@remote_server:/path/to/datasets/ ./datasets/

Download Preprocessed Slice Data

For faster setup, you can download slice data that has already been extracted and preprocessed:

Prerequisites

Install gdown to download from Google Drive:

# Activate your virtual environment first
source .venv/bin/activate

# Install gdown
pip install gdown

Download and Extract

# Download preprocessed slices (slices_time.tar.gz)
gdown 1d4AWAgSpSZ67O8omzF1iWys78rVkCmZH

# Extract the tar.gz file
tar -xzvf slices_time.tar.gz

# This will create two directories:
# - slices_out/         # Main preprocessed slices
# - slices_out_final/   # Final processed slices

# Verify extraction
ls -la slices_out/ slices_out_final/

# Clean up the tar file (optional)
rm slices_time.tar.gz

What's Included

The preprocessed data contains:

  • slices_out/: Extracted 2.5D slices from MRI volumes
  • slices_out_final/: Final processed slices ready for training

This preprocessed data allows you to skip the time-consuming slice extraction step and start training immediately.


Usage

Training

Basic Training (Model A)

python train.py \
    --dataset EGD \
    --model_type A \
    --backbone resnet3d18 \
    --variant baseline \
    --epochs 100 \
    --batch_size 2 \
    --lr 1e-4

Training with Different Backbones

# ResNet-50
python train.py --dataset EGD --backbone resnet3d50 --epochs 100

# DenseNet-121
python train.py --dataset EGD --backbone densenet3d121 --epochs 100

# SegResNet
python train.py --dataset EGD --backbone segresnet --epochs 100

Ablation Study: Architecture Variants

# Baseline
python train.py --dataset EGD --backbone resnet3d18 --variant baseline

# With Transformer blocks
python train.py --dataset EGD --backbone resnet3d18 --variant transformer

# With Mamba (SSM) blocks
python train.py --dataset EGD --backbone resnet3d18 --variant mamba

# Hybrid (Transformer + Mamba)
python train.py --dataset EGD --backbone resnet3d18 --variant transformer_mamba

Custom CSV and Data Path

python train.py \
    --csv /path/to/custom.csv \
    --data_root /path/to/data \
    --model_type A \
    --backbone resnet3d18

Evaluation

Evaluate Single Checkpoint

python evaluate.py --ckpt ./experiments/EGD/model_A/resnet3d18/baseline/default/checkpoints/best.pt

Evaluate Multiple Checkpoints

python evaluate.py --ckpt_dir ./experiments/ --output_dir ./evaluation_results

With Training Curves

python evaluate.py --ckpt ./path/to/checkpoint.pt --plot_curves --output_dir ./results

Visualization

Visualize MRI Slices

python visualize_slices.py \
    --csv ./datasets/EGD/Amodel.csv \
    --data_root ./datasets/EGD \
    --output_dir ./visualizations \
    --num_samples 10 \
    --show_mask

Auto Training Scheduler

For running multiple experiments automatically across multiple GPUs, use the auto training scheduler:

Test Scheduler (Recommended First)

Test the scheduler with short experiments (3 epochs each):

# Activate virtual environment
source .venv/bin/activate

# Run test scheduler
python test_scheduler.py

Full Auto Training

Run the complete training scheduler (225 experiments, 150 epochs each):

# Method 1: Easy way
./run_auto_training.sh

# Method 2: Direct execution
source .venv/bin/activate
python auto_train_scheduler.py

Scheduler Features

  • Multi-GPU Support: Uses 3 GPUs (GPU 1, 2, 3)
  • Parallel Execution: 2 training processes per GPU (6 total)
  • Auto Queue Management: When one experiment finishes, the next one starts automatically
  • Comprehensive Coverage: Tests all combinations of:
    • Backbones: resnet18, resnet50, densenet121, unet_encoder, highresnet
    • Model Types: A, B, ensemble
    • Fusion Methods: concat, add, gated
    • Ensemble Methods: stacking, voting, feature_fusion
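
The queue logic can be pictured as a round-robin assignment of experiments to GPU slots (3 GPUs × 2 slots, as described above). The sketch below is a simplified illustration of that layout, not the scheduler's actual code:

```python
from itertools import cycle


def assign_slots(experiments, gpus=(1, 2, 3), per_gpu=2):
    """Assign experiments round-robin to (gpu, slot) pairs.

    With the defaults this yields six slots: (1,0), (1,1), (2,0), (2,1),
    (3,0), (3,1), matching the 2-processes-per-GPU layout described above.
    In the real scheduler a slot is freed when its process exits and the
    next queued experiment takes it over.
    """
    slots = [(g, s) for g in gpus for s in range(per_gpu)]
    return list(zip(experiments, cycle(slots)))
```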

Monitor Training

# View real-time logs
tail -f scheduler_logs/slot_a_*.log

# Check all running experiments
ls -la scheduler_logs/

# Stop scheduler
# Press Ctrl+C (all running experiments will be cleaned up automatically)

Expected Output

The scheduler will automatically generate TSV files with detailed results that can be imported directly into Google Sheets for analysis.


Experiments

Experiment Output Structure

Each experiment creates a structured output directory:

experiments/
└── EGD/                           # Dataset
    └── model_A/                   # Model type
        └── resnet3d18/            # Backbone
            └── baseline/          # Variant
                └── exp_20240101/  # Experiment name
                    ├── checkpoints/
                    │   ├── best.pt
                    │   └── epoch_050.pt
                    ├── logs/
                    │   └── training_log.json
                    └── visualizations/
                        └── training_curves.png

Ablation Study

Run systematic ablation study:

# All backbone + variant combinations
for backbone in resnet3d18 resnet3d50 densenet3d121 segresnet; do
    for variant in baseline transformer mamba transformer_mamba; do
        python train.py \
            --dataset EGD \
            --backbone $backbone \
            --variant $variant \
            --experiment_name "ablation_${backbone}_${variant}"
    done
done

Leaderboard

Training automatically updates experiments/leaderboard.json:

[
    {
        "timestamp": "2024-01-01T12:00:00",
        "dataset": "EGD",
        "backbone": "resnet3d18",
        "variant": "mamba",
        "best_score": 0.8542,
        "best_epoch": 67
    },
    ...
]

View leaderboard:

cat experiments/leaderboard.json | python -m json.tool
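
Since the leaderboard is a plain JSON list, ranking entries programmatically takes a few lines. The helper below is a sketch, assuming the field names shown in the example above:

```python
import json


def top_entries(entries, k=3):
    """Return the k leaderboard entries with the highest best_score.

    `entries` is the parsed list from experiments/leaderboard.json.
    """
    return sorted(entries, key=lambda e: e["best_score"], reverse=True)[:k]


# Typical use:
#   with open("experiments/leaderboard.json") as f:
#       print(top_entries(json.load(f)))
```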

Configuration

Key configuration options in config.py:

| Parameter | Default | Description |
|-----------|---------|-------------|
| model_type | "A" | Model type: A, B, or ensemble |
| backbone_name | "resnet3d18" | Backbone architecture |
| architecture_variant | "baseline" | baseline / transformer / mamba / transformer_mamba |
| max_epochs | 100 | Training epochs |
| batch_size | 2 | Patients per batch |
| lr | 1e-4 | Learning rate |
| roi_size | (128, 128, 128) | Patch size |
| num_samples | 4 | Patches per patient |
| w_idh | 1.0 | IDH loss weight |
| w_grade | 1.0 | Grade loss weight |
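
These defaults map naturally onto a small dataclass. The sketch below simply mirrors the table; treat it as an illustration of config.py's documented parameters, not the file's actual contents:

```python
from dataclasses import dataclass


@dataclass
class Config:
    """Defaults as documented in config.py (illustrative sketch)."""
    model_type: str = "A"                       # A, B, or ensemble
    backbone_name: str = "resnet3d18"           # backbone architecture
    architecture_variant: str = "baseline"      # baseline/transformer/mamba/transformer_mamba
    max_epochs: int = 100
    batch_size: int = 2                         # patients per batch
    lr: float = 1e-4
    roi_size: tuple = (128, 128, 128)           # patch size
    num_samples: int = 4                        # patches per patient
    w_idh: float = 1.0                          # IDH loss weight
    w_grade: float = 1.0                        # Grade loss weight
```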

Results

Results will be published upon paper acceptance.

| Model | Dataset | AUC (IDH) | Accuracy (Grade) |
|-------|---------|-----------|------------------|
| Model A (ResNet18) | EGD | - | - |
| Model A (Mamba) | EGD | - | - |
| Ensemble | EGD | - | - |

Setting Up on New Server

Complete setup guide for a new server:

# 1. Clone repository
git clone https://<USERNAME>:<TOKEN>@github.com/khkim1729/TIME.git
cd TIME

# 2. Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate

# 3. Install PyTorch (check CUDA version with: nvidia-smi)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

# 4. Install dependencies
pip install -r requirements.txt

# 5. Sync datasets
rsync -avzP user@source_server:/path/to/datasets/ ./datasets/

# 6. Verify setup
python -c "import torch; print(f'CUDA: {torch.cuda.is_available()}')"

# 7. Run training
python train.py --dataset EGD --backbone resnet3d18 --epochs 10

Git Workflow

Push Changes

# Check status
git status

# Add files
git add .

# Commit
git commit -m "Add new feature"

# Push (with token in remote URL)
git push origin main

# Or set up credential helper
git config --global credential.helper store
git push origin main  # Enter token when prompted

Pull Latest Changes

git pull origin main

Citation

If you use this code in your research, please cite:

@article{time2024,
  title={TIME: Tumor-grade IDH-mutation Multimodal Ensemble Model},
  author={},
  journal={},
  year={2024}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.


Contact

For questions or issues, please open a GitHub issue or contact the authors.
