
TIME: Tumor-grade IDH-mutation Multimodal Ensemble Model

English | 한국어



Overview

TIME (Tumor-grade IDH-mutation Multimodal Ensemble) is a deep learning framework for predicting:

  1. IDH Mutation Status: Binary classification (Wild-type vs Mutant)
  2. WHO Grade: Multi-class classification (Grade 2, 3, 4)

The framework consists of two complementary models:

  • Model A (AM Model): Aggregated MRI Model - processes T1, T1Gd, T2, FLAIR sequences
  • Model B (PM Model): Patient Microstructure Model - processes DTI maps with clinical features

These models can be used independently or combined in an ensemble for improved performance.


Model Architecture

TIME model overview (figure: imgs/overview_time.png)

Model A: Aggregated MRI Model (AM)

  • Input: 4-channel MRI (T1, T1Gd, T2, FLAIR) as 3D volumes
  • Encoder: 3D CNN (ResNet, DenseNet, SegResNet, etc.)
  • Output: IDH logits + Grade logits + Feature vector (for fusion)

Model B: Patient Microstructure Model (PM)

Model B processes scalar features instead of 3D images:

  • Input 1 - Metadata:

    • Age (continuous, normalized)
    • Gender (binary: 0=F, 1=M)
    • 1p/19q Codeletion (binary: 0=No, 1=Yes, -1=Unknown)
  • Input 2 - DTI Scalar Metrics:

    • FA (Fractional Anisotropy)
    • AD (Axial Diffusivity)
    • RD (Radial Diffusivity)
    • MD (Mean Diffusivity)
  • Architecture: MLP encoders + Feature Fusion

  • Output: IDH logits + Grade logits + Feature vector (for ML fusion)
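
Before reaching the MLP encoders, the scalar inputs above need a simple numeric encoding. The sketch below is an illustration only (the helper name and the normalization constants are assumptions, not code from this repository):

```python
def encode_metadata(age, gender, codeletion, age_mean=55.0, age_std=15.0):
    """Illustrative encoding of Model B's scalar metadata inputs.

    age: z-normalized with placeholder cohort statistics (age_mean/age_std
    are assumed values, not taken from this repository).
    gender: 0 = F, 1 = M.
    codeletion: 0 = No, 1 = Yes, -1 = Unknown (kept as a distinct value so
    the encoder can learn a "missing" representation).
    """
    return [
        (age - age_mean) / age_std,
        float(gender),
        float(codeletion),
    ]
```

The same pattern extends to the DTI scalars (FA, AD, RD, MD), which are already continuous and only need normalization.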

Feature Extraction for ML Fusion

Both models support feature extraction for fusion with external machine-learning models:

# Model A: Extract features from MRI
features_a = model_a.get_features(mri_tensor)  # [B, feat_dim]

# Model B: Extract features from metadata + DTI
features_b = model_b.get_features(age, gender, codeletion, fa, ad, rd, md)  # [B, feat_dim]

# Fuse with an external (black-box) ML model
fused = fusion_module(features_b, ml_predictions)

Ensemble

  • Methods: Weighted Average, Stacking, Voting, Feature Fusion
  • ML Fusion: MetadataMLFusion class for combining with external ML models
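
As an illustration of the simplest of these methods, a weighted average combines the per-class logits of the two models before softmax. The helper below is a minimal sketch (names and default weights are hypothetical, not the repository's implementation):

```python
def weighted_average(logits_a, logits_b, w_a=0.5, w_b=0.5):
    """Weighted average of per-class logits from Model A and Model B.

    Weights are normalized so they need not sum to 1. The result can be
    passed to softmax/argmax exactly like a single model's logits.
    """
    assert len(logits_a) == len(logits_b), "class counts must match"
    total = w_a + w_b
    return [(w_a * a + w_b * b) / total for a, b in zip(logits_a, logits_b)]
```

Stacking and feature fusion differ in that they learn the combination (a meta-classifier over predictions, or a head over concatenated feature vectors) rather than fixing it.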

Architecture Variants (Ablation Study)

  • baseline: Standard MLP/CNN encoder
  • transformer: Encoder + Transformer blocks
  • mamba: Encoder + State-Space Model (Mamba) blocks
  • transformer_mamba: Hybrid approach

Project Structure

TIME/
├── README.md                    # English documentation
├── README_KR.md                 # Korean documentation
├── requirements.txt             # Python dependencies
├── .gitignore                   # Git ignore rules
│
├── config.py                    # Global configuration
├── train.py                     # Training script
├── evaluate.py                  # Evaluation script
├── visualize_slices.py          # MRI slice visualization
│
├── data/                        # Data loading modules
│   ├── __init__.py
│   ├── io.py                    # CSV parsing, data loading
│   ├── transforms.py            # MONAI transforms
│   └── loaders.py               # DataLoader creation
│
├── models/                      # Model definitions
│   ├── __init__.py
│   ├── model_a.py               # Model A (AM Model)
│   ├── model_b.py               # Model B (PM Model)
│   ├── ensemble.py              # Ensemble model
│   ├── common.py                # Shared components
│   ├── get_backbone.py          # Model factory
│   └── backbones/               # Backbone networks
│       └── __init__.py
│
├── engine/                      # Training engine
│   ├── __init__.py
│   ├── trainer.py               # Training/evaluation loops
│   ├── losses.py                # Loss functions
│   └── metrics.py               # Evaluation metrics
│
├── utils/                       # Utilities
│   ├── __init__.py
│   ├── seed.py                  # Reproducibility
│   ├── logger.py                # Logging
│   └── visualization.py         # Plotting functions
│
├── experiments/                 # Experiment outputs (gitignored)
│   └── .gitkeep
│
├── datasets/                    # Dataset folder (gitignored)
│   └── .gitkeep
│
├── imgs/                        # Documentation images
│   └── overview_time.png
│
└── scripts/                     # Shell scripts

Installation

Clone Repository

This is a private repository. You need a GitHub Personal Access Token to clone.

1. Generate GitHub Personal Access Token

  1. Go to GitHub → Settings → Developer settings → Personal access tokens → Tokens (classic)
  2. Click "Generate new token (classic)"
  3. Set expiration and select repo scope
  4. Copy the generated token (shown only once!)

2. Clone with Token

# Format: git clone https://<USERNAME>:<TOKEN>@github.com/khkim1729/TIME.git

# Example:
git clone https://gildong:ghp_haedalissupergoodenough@github.com/khkim1729/TIME.git

# Navigate to project
cd TIME

Virtual Environment Setup

# Create virtual environment
python -m venv .venv

# Activate (Linux/Mac)
source .venv/bin/activate

# Activate (Windows)
.venv\Scripts\activate

Install Dependencies

# Upgrade pip
pip install --upgrade pip

# Install PyTorch (adjust CUDA version as needed)
# For CUDA 11.8:
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

# For CUDA 12.1:
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# Install other dependencies
pip install -r requirements.txt

Verify Installation

python -c "import torch; import monai; print(f'PyTorch: {torch.__version__}'); print(f'MONAI: {monai.__version__}'); print(f'CUDA: {torch.cuda.is_available()}')"

Dataset Setup

Dataset Structure

The datasets should be organized as follows:

TIME/datasets/
├── EGD/                              # European Glioma Dataset
│   ├── Amodel.csv                    # Metadata CSV
│   └── EGD 001-342/                  # Patient data
│       ├── EGD 000-099/
│       │   └── MR_EGD-0080/
│       │       ├── 1_T1/
│       │       │   └── NIFTI/
│       │       │       └── T1.nii.gz
│       │       ├── 2_T1GD/
│       │       │   └── NIFTI/
│       │       ├── 3_T2/
│       │       │   └── NIFTI/
│       │       ├── 4_FLAIR/
│       │       │   └── NIFTI/
│       │       └── mask/
│       │           └── tumor_mask.nii.gz
│       └── ...
├── EGD_stripped/                     # Skull-stripped version
│   └── ...
└── UPENN/                            # UPenn GBM dataset
    └── ...

CSV Format

The CSV file should contain the following columns:

Model A (MRI) - Required columns:

| Column | Description | Example |
|---------|-------------|---------|
| subject | Patient ID | EGD-0080 |
| gender | M/F | M |
| t1 | Path to T1 | EGD 001-342/.../T1.nii.gz |
| t1gd | Path to T1Gd | EGD 001-342/.../T1GD.nii.gz |
| t2 | Path to T2 | EGD 001-342/.../T2.nii.gz |
| flair | Path to FLAIR | EGD 001-342/.../FLAIR.nii.gz |
| mask | Path to mask | EGD 001-342/.../mask.nii.gz |
| idh | IDH status | 0 (WT), 1 (Mut), -1 (Unknown) |
| grade | WHO Grade | 2, 3, or 4 |

Model B (Metadata + DTI) - Additional columns:

| Column | Description | Example |
|---------|-------------|---------|
| age | Patient age | 55 |
| codeletion_1p19q | 1p/19q codeletion | 0, 1, or -1 (Unknown) |
| fa | Fractional Anisotropy (scalar) | 0.35 |
| ad | Axial Diffusivity (mm²/s) | 0.0012 |
| rd | Radial Diffusivity (mm²/s) | 0.0008 |
| md | Mean Diffusivity (mm²/s) | 0.0010 |
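
These label files can be read with the standard csv module. The sketch below is a hypothetical helper (not code from this repository) showing one way to parse the required columns and drop rows whose IDH status is unknown (-1):

```python
import csv
import io


def load_labels(csv_text):
    """Parse label rows from CSV text using the column names documented above.

    Rows with idh == -1 (unknown) are skipped, since they cannot be used
    for supervised IDH training.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        idh = int(row["idh"])
        if idh == -1:  # unknown IDH status: exclude from training labels
            continue
        rows.append({"subject": row["subject"], "idh": idh, "grade": int(row["grade"])})
    return rows
```

The actual loader in data/io.py also resolves the image paths; this sketch covers only the label columns.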

Syncing Dataset from Remote Server

Use rsync to transfer datasets from another server:

# Sync entire datasets folder
rsync -avzP user@remote_server:/path/to/datasets/ ./datasets/

# Sync specific dataset
rsync -avzP user@remote_server:/path/to/datasets/EGD/ ./datasets/EGD/

# With SSH key
rsync -avzP -e "ssh -i ~/.ssh/id_rsa" user@remote_server:/path/to/datasets/ ./datasets/

Download Preprocessed Slice Data

For faster setup, you can download slice data that has already been extracted and preprocessed:

Prerequisites

Install gdown to download from Google Drive:

# Activate your virtual environment first
source .venv/bin/activate

# Install gdown
pip install gdown

Download and Extract

# Download preprocessed slices (slices_time.tar.gz)
gdown 1d4AWAgSpSZ67O8omzF1iWys78rVkCmZH

# Extract the tar.gz file
tar -xzvf slices_time.tar.gz

# This will create two directories:
# - slices_out/         # Main preprocessed slices
# - slices_out_final/   # Final processed slices

# Verify extraction
ls -la slices_out/ slices_out_final/

# Clean up the tar file (optional)
rm slices_time.tar.gz

What's Included

The preprocessed data contains:

  • slices_out/: Extracted 2.5D slices from MRI volumes
  • slices_out_final/: Final processed slices ready for training

This preprocessed data allows you to skip the time-consuming slice extraction step and start training immediately.


Usage

Training

Basic Training (Model A)

python train.py \
    --dataset EGD \
    --model_type A \
    --backbone resnet3d18 \
    --variant baseline \
    --epochs 100 \
    --batch_size 2 \
    --lr 1e-4

Training with Different Backbones

# ResNet-50
python train.py --dataset EGD --backbone resnet3d50 --epochs 100

# DenseNet-121
python train.py --dataset EGD --backbone densenet3d121 --epochs 100

# SegResNet
python train.py --dataset EGD --backbone segresnet --epochs 100

Ablation Study: Architecture Variants

# Baseline
python train.py --dataset EGD --backbone resnet3d18 --variant baseline

# With Transformer blocks
python train.py --dataset EGD --backbone resnet3d18 --variant transformer

# With Mamba (SSM) blocks
python train.py --dataset EGD --backbone resnet3d18 --variant mamba

# Hybrid (Transformer + Mamba)
python train.py --dataset EGD --backbone resnet3d18 --variant transformer_mamba

Custom CSV and Data Path

python train.py \
    --csv /path/to/custom.csv \
    --data_root /path/to/data \
    --model_type A \
    --backbone resnet3d18

Evaluation

Evaluate Single Checkpoint

python evaluate.py --ckpt ./experiments/EGD/model_A/resnet3d18/baseline/default/checkpoints/best.pt

Evaluate Multiple Checkpoints

python evaluate.py --ckpt_dir ./experiments/ --output_dir ./evaluation_results

With Training Curves

python evaluate.py --ckpt ./path/to/checkpoint.pt --plot_curves --output_dir ./results

Visualization

Visualize MRI Slices

python visualize_slices.py \
    --csv ./datasets/EGD/Amodel.csv \
    --data_root ./datasets/EGD \
    --output_dir ./visualizations \
    --num_samples 10 \
    --show_mask

Auto Training Scheduler

For running multiple experiments automatically across multiple GPUs, use the auto training scheduler:

Test Scheduler (Recommended First)

Test the scheduler with short experiments (3 epochs each):

# Activate virtual environment
source .venv/bin/activate

# Run test scheduler
python test_scheduler.py

Full Auto Training

Run the complete training scheduler (225 experiments, 150 epochs each):

# Method 1: Easy way
./run_auto_training.sh

# Method 2: Direct execution
source .venv/bin/activate
python auto_train_scheduler.py

Scheduler Features

  • Multi-GPU Support: Uses 3 GPUs (GPU 1, 2, 3)
  • Parallel Execution: 2 training processes per GPU (6 total)
  • Auto Queue Management: When one experiment finishes, the next one starts automatically
  • Comprehensive Coverage: Tests all combinations of:
    • Backbones: resnet18, resnet50, densenet121, unet_encoder, highresnet
    • Model Types: A, B, ensemble
    • Fusion Methods: concat, add, gated
    • Ensemble Methods: stacking, voting, feature_fusion
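
The queue logic can be pictured as a round-robin assignment of experiments to GPU slots (3 GPUs × 2 slots, as described above). The sketch below is a simplified illustration of that layout, not the scheduler's actual code:

```python
from itertools import cycle


def assign_slots(experiments, gpus=(1, 2, 3), per_gpu=2):
    """Assign experiments round-robin to (gpu, slot) pairs.

    With the defaults this yields six slots: (1,0), (1,1), (2,0), (2,1),
    (3,0), (3,1), matching the 2-processes-per-GPU layout described above.
    In the real scheduler a slot is freed when its process exits and the
    next queued experiment takes it over.
    """
    slots = [(g, s) for g in gpus for s in range(per_gpu)]
    return list(zip(experiments, cycle(slots)))
```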

Monitor Training

# View real-time logs
tail -f scheduler_logs/slot_a_*.log

# Check all running experiments
ls -la scheduler_logs/

# Stop scheduler
# Press Ctrl+C (all running experiments will be cleaned up automatically)

Expected Output

The scheduler will automatically generate TSV files with detailed results that can be imported directly into Google Sheets for analysis.


Experiments

Experiment Output Structure

Each experiment creates a structured output directory:

experiments/
└── EGD/                           # Dataset
    └── model_A/                   # Model type
        └── resnet3d18/            # Backbone
            └── baseline/          # Variant
                └── exp_20240101/  # Experiment name
                    ├── checkpoints/
                    │   ├── best.pt
                    │   └── epoch_050.pt
                    ├── logs/
                    │   └── training_log.json
                    └── visualizations/
                        └── training_curves.png

Ablation Study

Run systematic ablation study:

# All backbone + variant combinations
for backbone in resnet3d18 resnet3d50 densenet3d121 segresnet; do
    for variant in baseline transformer mamba transformer_mamba; do
        python train.py \
            --dataset EGD \
            --backbone $backbone \
            --variant $variant \
            --experiment_name "ablation_${backbone}_${variant}"
    done
done

Leaderboard

Training automatically updates experiments/leaderboard.json:

[
    {
        "timestamp": "2024-01-01T12:00:00",
        "dataset": "EGD",
        "backbone": "resnet3d18",
        "variant": "mamba",
        "best_score": 0.8542,
        "best_epoch": 67
    },
    ...
]

View leaderboard:

cat experiments/leaderboard.json | python -m json.tool
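
Since the leaderboard is a plain JSON list, ranking entries programmatically takes a few lines. The helper below is a sketch, assuming the field names shown in the example above:

```python
import json


def top_entries(entries, k=3):
    """Return the k leaderboard entries with the highest best_score.

    `entries` is the parsed list from experiments/leaderboard.json.
    """
    return sorted(entries, key=lambda e: e["best_score"], reverse=True)[:k]


# Typical use:
#   with open("experiments/leaderboard.json") as f:
#       print(top_entries(json.load(f)))
```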

Configuration

Key configuration options in config.py:

| Parameter | Default | Description |
|-----------|---------|-------------|
| model_type | "A" | Model type: A, B, or ensemble |
| backbone_name | "resnet3d18" | Backbone architecture |
| architecture_variant | "baseline" | baseline / transformer / mamba / transformer_mamba |
| max_epochs | 100 | Training epochs |
| batch_size | 2 | Patients per batch |
| lr | 1e-4 | Learning rate |
| roi_size | (128, 128, 128) | Patch size |
| num_samples | 4 | Patches per patient |
| w_idh | 1.0 | IDH loss weight |
| w_grade | 1.0 | Grade loss weight |
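
These defaults map naturally onto a small dataclass. The sketch below simply mirrors the table; treat it as an illustration of config.py's documented parameters, not the file's actual contents:

```python
from dataclasses import dataclass


@dataclass
class Config:
    """Defaults as documented in config.py (illustrative sketch)."""
    model_type: str = "A"                       # A, B, or ensemble
    backbone_name: str = "resnet3d18"           # backbone architecture
    architecture_variant: str = "baseline"      # baseline/transformer/mamba/transformer_mamba
    max_epochs: int = 100
    batch_size: int = 2                         # patients per batch
    lr: float = 1e-4
    roi_size: tuple = (128, 128, 128)           # patch size
    num_samples: int = 4                        # patches per patient
    w_idh: float = 1.0                          # IDH loss weight
    w_grade: float = 1.0                        # Grade loss weight
```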

Results

Results will be published upon paper acceptance.

| Model | Dataset | AUC (IDH) | Accuracy (Grade) |
|-------|---------|-----------|------------------|
| Model A (ResNet18) | EGD | - | - |
| Model A (Mamba) | EGD | - | - |
| Ensemble | EGD | - | - |

Setting Up on New Server

Complete setup guide for a new server:

# 1. Clone repository
git clone https://<USERNAME>:<TOKEN>@github.com/khkim1729/TIME.git
cd TIME

# 2. Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate

# 3. Install PyTorch (check CUDA version with: nvidia-smi)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

# 4. Install dependencies
pip install -r requirements.txt

# 5. Sync datasets
rsync -avzP user@source_server:/path/to/datasets/ ./datasets/

# 6. Verify setup
python -c "import torch; print(f'CUDA: {torch.cuda.is_available()}')"

# 7. Run training
python train.py --dataset EGD --backbone resnet3d18 --epochs 10

Git Workflow

Push Changes

# Check status
git status

# Add files
git add .

# Commit
git commit -m "Add new feature"

# Push (with token in remote URL)
git push origin main

# Or set up credential helper
git config --global credential.helper store
git push origin main  # Enter token when prompted

Pull Latest Changes

git pull origin main

Citation

If you use this code in your research, please cite:

@article{time2024,
  title={TIME: Tumor-grade IDH-mutation Multimodal Ensemble Model},
  author={},
  journal={},
  year={2024}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.


Contact

For questions or issues, please open a GitHub issue or contact the authors.
