# fMRI-to-Image Reconstruction Evaluation

Comprehensive evaluation suite for fMRI reconstruction quality using multiple metrics.

## Available Metrics

### **Working Metrics**
- **AlexNet-2/5**: Feature similarity analysis
- **InceptionV3**: High-level feature similarity
- **EfficientNet-B1**: Distance metrics
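When ground truth is available, the EfficientNet-B1 and SwAV distances are commonly reported as the mean correlation distance (1 - Pearson r) between the features of each reconstruction and those of its own ground-truth image, so lower is better. The sketch below assumes that convention and works on precomputed (N, D) feature arrays. `mean_correlation_distance` is a hypothetical helper, not part of the project's `EfficientNetDistance` or `SwAVDistance` classes, which may compute the distance differently.

```python
import numpy as np


def mean_correlation_distance(recon_feats: np.ndarray, gt_feats: np.ndarray) -> float:
    """Average of (1 - Pearson r) over paired rows: 0 = identical, 2 = perfectly anti-correlated.

    recon_feats, gt_feats: arrays of shape (N, D); row i of each belongs to image i.
    """
    # Center every feature vector so the row-wise dot product becomes a covariance
    r = recon_feats - recon_feats.mean(axis=1, keepdims=True)
    g = gt_feats - gt_feats.mean(axis=1, keepdims=True)
    # Pearson r per pair = covariance / (norm_r * norm_g), computed row by row
    pearson = np.sum(r * g, axis=1) / (np.linalg.norm(r, axis=1) * np.linalg.norm(g, axis=1))
    return float(np.mean(1.0 - pearson))


# Usage with random stand-in features (replace with real EfficientNet/SwAV features)
rng = np.random.default_rng(0)
gt = rng.normal(size=(4, 16))
print(mean_correlation_distance(gt, gt))   # ~0.0: identical features
print(mean_correlation_distance(-gt, gt))  # ~2.0: sign-flipped features
```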

### **Requires Ground Truth**
- **SSIM**: Structural similarity
- **PixCorr**: Pixel correlation
- **Two-way identification**: AlexNet, CLIP, InceptionV3
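Two-way identification asks, for each reconstruction, whether its features correlate more with its own ground-truth image than with another image's; the score is the fraction of those pairwise comparisons won, so chance level is 0.5 (50%). The sketch below follows that common formulation on precomputed (N, D) feature arrays from any backbone (AlexNet-2/5, CLIP, InceptionV3). `two_way_identification` is a hypothetical helper; the project's `AlexNetMetrics` and `CLIPMetrics` classes may differ in details such as the feature layer used or whether the result is scaled to a percentage.

```python
import numpy as np


def two_way_identification(recon_feats: np.ndarray, gt_feats: np.ndarray) -> float:
    """Fraction of comparisons where recon i correlates more with gt i than with gt j (j != i)."""
    n = recon_feats.shape[0]
    # Center and unit-normalize each row so that r @ g.T yields Pearson correlations
    r = recon_feats - recon_feats.mean(axis=1, keepdims=True)
    g = gt_feats - gt_feats.mean(axis=1, keepdims=True)
    r = r / np.linalg.norm(r, axis=1, keepdims=True)
    g = g / np.linalg.norm(g, axis=1, keepdims=True)
    corr = r @ g.T                      # corr[i, j] = Pearson r(recon_i, gt_j)
    congruent = np.diag(corr)[:, None]  # correlation with the matching ground truth
    # Count wrong ground truths that correlate less than the correct one.
    # The diagonal compares a value with itself (False), so it is never counted.
    wins = (corr < congruent).sum(axis=1)
    return float(np.mean(wins / (n - 1)))


rng = np.random.default_rng(0)
gt = rng.normal(size=(10, 64))
print(two_way_identification(gt, gt))                         # 1.0: perfect match
print(two_way_identification(rng.normal(size=(10, 64)), gt))  # unrelated features: 0.5 in expectation
```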

### **Requires Additional Data**
- **Brain correlations**: GNet model + fMRI data
- **Retrieval metrics**: Candidate datasets
- **Image quality**: IS, FID
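Both image-quality scores work on InceptionV3 outputs. IS uses only the class probabilities of the reconstructions, IS = exp(E_x[KL(p(y|x) || p(y))]), while FID compares the mean and covariance of pooled features of the reconstructions against a reference set of real images, FID = ||mu_1 - mu_2||^2 + Tr(S_1 + S_2 - 2 (S_1 S_2)^(1/2)). The NumPy sketch below uses hypothetical helpers `inception_score` (single split, an assumption) and `frechet_distance` on precomputed probabilities and features. One caveat follows directly from the formula: with N images the sample covariance has rank at most N - 1, so with 50 images and the 2048-dimensional pool features that standard FID uses, the covariance terms are poorly estimated and FID is biased upward; such values are not comparable to FIDs computed on large image sets.

```python
import numpy as np


def inception_score(probs: np.ndarray, eps: float = 1e-12) -> float:
    """IS from an (N, C) array of softmax class probabilities (single split)."""
    p_y = probs.mean(axis=0, keepdims=True)  # marginal class distribution p(y)
    kl = np.sum(probs * (np.log(probs + eps) - np.log(p_y + eps)), axis=1)
    return float(np.exp(kl.mean()))


def _sqrt_psd(mat: np.ndarray) -> np.ndarray:
    """Symmetric square root of a positive semi-definite matrix via eigendecomposition."""
    w, v = np.linalg.eigh(mat)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """FID between two (N, D) feature sets (D >= 2), modelling each as a Gaussian."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((S_a S_b)^(1/2)) equals the sum of square roots of the eigenvalues of
    # S_a^(1/2) S_b S_a^(1/2), which is symmetric PSD and numerically stable.
    sqrt_a = _sqrt_psd(cov_a)
    m = sqrt_a @ cov_b @ sqrt_a
    eig = np.linalg.eigvalsh((m + m.T) / 2.0)
    tr_covmean = np.sum(np.sqrt(np.clip(eig, 0.0, None)))
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_covmean)


rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))
print(frechet_distance(real, real))        # ~0.0
print(frechet_distance(real + 1.0, real))  # ~8.0: mean shift of 1 in each of 8 dims
print(inception_score(np.full((10, 5), 0.2)))  # 1.0: every image has the same prediction
```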


## Configuration

**Modify these variables to customize your evaluation:**


In [1]:
# ===== EVALUATION CONFIGURATION =====
# Modify these variables to customize your evaluation

# Paths
PROJECT_ROOT = '/home/kevin/Documents/ACV/Project/advanced-cv-project'
RECON_DIR = f'{PROJECT_ROOT}/outputs_all/subj01/cp_4096_v1/300000/0.2/ground_truth'
GT_DIR = f'{PROJECT_ROOT}/outputs_all/subj01/cp_4096_v1/300000/0.2/with_z'  # Ground truth images
FMRI_DIR = f'{PROJECT_ROOT}/MindEyeV2'  # fMRI data (if available)

# Evaluation parameters
NUM_IMAGES_TO_EVALUATE = 50  # Number of images to evaluate
BATCH_SIZE = 8  # Batch size for processing (reduce if GPU memory issues)
DEVICE = 'cuda'  # 'cuda' or 'cpu'

# Data availability flags
HAS_GROUND_TRUTH = True  # Set to True if you have ground truth images
HAS_FMRI_DATA = True     # Set to True if you have fMRI data
HAS_GNET_MODEL = False    # Set to True if you have GNet model

print(f"Reconstruction directory: {RECON_DIR}")
print(f"Ground truth directory: {GT_DIR}")
print(f"fMRI data directory: {FMRI_DIR}")
print(f"Images to evaluate: {NUM_IMAGES_TO_EVALUATE}")
print(f"Batch size: {BATCH_SIZE}")
print(f"Device: {DEVICE}")
print(f"Has ground truth: {HAS_GROUND_TRUTH}")
print(f"Has fMRI data: {HAS_FMRI_DATA}")
print(f"Has GNet model: {HAS_GNET_MODEL}")


Reconstruction directory: /home/kevin/Documents/ACV/Project/advanced-cv-project/outputs_all/subj01/cp_4096_v1/300000/0.2/ground_truth
Ground truth directory: /home/kevin/Documents/ACV/Project/advanced-cv-project/outputs_all/subj01/cp_4096_v1/300000/0.2/with_z
fMRI data directory: /home/kevin/Documents/ACV/Project/advanced-cv-project/MindEyeV2
Images to evaluate: 50
Batch size: 8
Device: cuda
Has ground truth: True
Has fMRI data: True
Has GNet model: False
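Because every later cell depends on these paths, a quick pre-flight check can catch missing or empty folders, or `RECON_DIR` and `GT_DIR` pointing at the same place, before the slower model-loading cells run. This is a minimal sketch; `check_eval_config` and the `IMAGE_EXTS` set are hypothetical and not part of the `evaluation` package.

```python
from pathlib import Path

# Assumed image extensions; adjust to whatever the reconstruction pipeline writes
IMAGE_EXTS = {".png", ".jpg", ".jpeg"}


def check_eval_config(recon_dir, gt_dir, num_images, has_ground_truth=True):
    """Return a list of problems with the evaluation paths (empty list = looks usable)."""
    problems = []
    dirs = {"RECON_DIR": Path(recon_dir)}
    if has_ground_truth:
        dirs["GT_DIR"] = Path(gt_dir)

    counts = {}
    for name, path in dirs.items():
        if not path.is_dir():
            problems.append(f"{name} is not a directory: {path}")
            continue
        counts[name] = sum(1 for p in path.iterdir() if p.suffix.lower() in IMAGE_EXTS)
        if counts[name] == 0:
            problems.append(f"{name} contains no images: {path}")

    if has_ground_truth and dirs["RECON_DIR"].resolve() == dirs["GT_DIR"].resolve():
        problems.append("RECON_DIR and GT_DIR point to the same folder")
    # Each evaluated image needs its own ground-truth file (reconstructions may have several)
    if counts.get("GT_DIR", num_images) < num_images:
        problems.append(f"GT_DIR has {counts['GT_DIR']} images, fewer than {num_images} requested")
    return problems


# Usage in the notebook (names from the configuration cell):
# for problem in check_eval_config(RECON_DIR, GT_DIR, NUM_IMAGES_TO_EVALUATE, HAS_GROUND_TRUTH):
#     print("Config problem:", problem)
```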


## Setup and Imports


In [2]:
# Import required libraries
import os
import sys
import torch
import numpy as np
import pandas as pd
from PIL import Image
from pathlib import Path
from tqdm import tqdm
import warnings
warnings.filterwarnings('ignore')

# Add project to path
sys.path.insert(0, PROJECT_ROOT)

# Check GPU availability
device = torch.device(DEVICE if torch.cuda.is_available() and DEVICE == 'cuda' else 'cpu')
print(f"Using device: {device}")
if device.type == 'cuda':
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")


Using device: cuda
GPU: NVIDIA GeForce RTX 5070 Ti
Memory: 16.60 GB


In [3]:
# Import evaluation metrics
from evaluation.orchestrator import EvaluationOrchestrator
from evaluation.low_level import SSIM, PixCorr
from evaluation.high_level import AlexNetMetrics, CLIPMetrics, InceptionMetrics
from evaluation.distance import EfficientNetDistance, SwAVDistance
from evaluation.brain_correlation import BrainCorrelationMetrics
from evaluation.retrieval import ImageRetrieval, BrainRetrieval
from evaluation.image_quality import InceptionScore, FID

print("All evaluation metrics imported successfully!")
print("Available metrics:")
print("  - Low-level: SSIM, PixCorr")
print("  - High-level: AlexNet, CLIP, InceptionV3")
print("  - Distance: EfficientNet, SwAV")
print("  - Brain correlation: GNet-based correlations")
print("  - Retrieval: Image and Brain retrieval")
print("  - Image quality: IS, FID")


All evaluation metrics imported successfully!
Available metrics:
  - Low-level: SSIM, PixCorr
  - High-level: AlexNet, CLIP, InceptionV3
  - Distance: EfficientNet, SwAV
  - Brain correlation: GNet-based correlations
  - Retrieval: Image and Brain retrieval
  - Image quality: IS, FID


## Data Loading and Validation


In [None]:
# ===== EVALUATION SETUP =====
import json
from datetime import datetime
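The cells that build `recon_batch` and `gt_batch` are not shown in this notebook. The logged shapes (`[50, 1, 3, 224, 224]` for ground truth and `[50, 3, 224, 224]` after best-reconstruction selection) suggest images resized to 224 × 224 and stacked channel-first. A minimal loader under those assumptions might look like the sketch below; `load_image_folder`, the extension set, and pairing reconstructions with ground truth by sorted filename are all assumptions, not the project's actual loader.

```python
from pathlib import Path

import numpy as np
from PIL import Image

IMAGE_EXTS = {".png", ".jpg", ".jpeg"}  # assumed file types


def load_image_folder(folder, num_images=None, size=224):
    """Load images (sorted by filename) into a float32 array of shape (N, 3, size, size) in [0, 1]."""
    paths = sorted(p for p in Path(folder).iterdir() if p.suffix.lower() in IMAGE_EXTS)
    if num_images is not None:
        paths = paths[:num_images]
    if not paths:
        raise FileNotFoundError(f"No images found in {folder}")

    batch = np.empty((len(paths), 3, size, size), dtype=np.float32)
    for i, path in enumerate(paths):
        with Image.open(path) as img:
            # convert() loads pixel data into memory, so rgb stays valid after the file closes
            rgb = img.convert("RGB").resize((size, size))
        # HWC uint8 -> CHW float in [0, 1]
        batch[i] = np.asarray(rgb, dtype=np.float32).transpose(2, 0, 1) / 255.0
    return batch


# Usage in the notebook (convert to torch tensors for the metric classes):
# gt_batch = torch.from_numpy(load_image_folder(GT_DIR, NUM_IMAGES_TO_EVALUATE)).to(device)
```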

## Initialize Evaluation System

In [None]:
# Initialize evaluation orchestrator
evaluator = EvaluationOrchestrator(device=device)
print("Evaluation orchestrator initialized!")

# Initialize individual metrics for testing
print("\nInitializing individual metrics...")

# High-level metrics
alexnet_metrics = AlexNetMetrics(device=device)
inception_metrics = InceptionMetrics(device=device)
print("AlexNet and InceptionV3 metrics ready")

# Distance metrics
effnet_dist = EfficientNetDistance(device=device)
print("EfficientNet distance ready")

# Image quality metrics
is_metric = InceptionScore(device=device)
fid_metric = FID(device=device)
print("Image quality metrics (IS, FID) ready")

# Low-level metrics (if ground truth available)
if HAS_GROUND_TRUTH:
    ssim_metric = SSIM(device=device)
    pixcorr_metric = PixCorr(device=device)
    print("Low-level metrics (SSIM, PixCorr) ready")

# Brain correlation metrics (if fMRI data available)
if HAS_FMRI_DATA:
    brain_corr = BrainCorrelationMetrics(device=device)
    print("Brain correlation metrics ready")

print("\nAll available metrics initialized successfully!")


Evaluation orchestrator initialized!

Initializing individual metrics...
AlexNet and InceptionV3 metrics ready
EfficientNet distance ready
Image quality metrics (IS, FID) ready
Low-level metrics (SSIM, PixCorr) ready
Brain correlation metrics ready

All available metrics initialized successfully!


## Run Evaluation

**Output of the ground-truth loading and best-reconstruction evaluation run:**


Found 50 ground truth images


Loading ground truth: 100%|██████████| 50/50 [00:00<00:00, 98.48it/s]


Ground truth batch shape: torch.Size([50, 1, 3, 224, 224])


Starting comprehensive evaluation with best reconstruction selection...
Selecting best reconstructions using SSIM...


Evaluating reconstructions: 100%|██████████| 50/50 [00:00<00:00, 429.37it/s]

Best reconstruction selection complete!
Best reconstruction shape: torch.Size([50, 3, 224, 224])
Best indices: [3 0 4 0 4 0 0 0 2 3 3 0 0 0 3 4 4 4 2 2 1 2 4 1 2 2 2 4 2 3 1 0 2 3 0 1 4
 3 1 0 0 4 3 4 4 1 3 4 3 2]
Score statistics:
  Mean best score: 0.0846
  Score range: 0.0057 to 0.3839

Running evaluation on best reconstructions...
Starting comprehensive evaluation...
Computing low-level metrics...
Computing high-level metrics...
Computing distance metrics...
Computing image quality metrics...

EVALUATION RESULTS (Best Reconstructions)

EVALUATION SUMMARY

LOW LEVEL:
------------------------------
  SSIM: 0.0846 ± 0.0829
  PixCorr: 0.1277 ± 0.1757

HIGH LEVEL:
------------------------------
  AlexNet-2: 4.0000
  AlexNet-5: 14.0000
  CLIP: 4.0000
  InceptionV3: 8.0000

DISTANCE:
------------------------------
  EffNet-B: 0.8467
  SwAV: 0.4670

IMAGE QUALITY:
------------------------------
  IS: 8.4494
  FID: 364.8029


Best Reconstruction Analysis:
  Images evaluated: 50
  Reconstruction diversity: 5 different outputs selected
  Selection distribution: [12  6 10 10 12]
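The log above selects, for each of the 50 images, the best of 5 candidate reconstructions by SSIM against the ground truth and reports how often each candidate index won (`np.bincount` over the printed `Best indices` reproduces `[12  6 10 10 12]`). The core of that step reduces to an argmax over an (N, K) score matrix, sketched below with the hypothetical helper `select_best`; how the project computes the per-candidate scores is not shown. One consequence is visible in the results: the mean best score (0.0846) equals the reported SSIM, because the same metric and ground truth drive both selection and scoring, so that SSIM is a best-of-5 (oracle) figure and is optimistic relative to evaluating a single sample per image.

```python
import numpy as np


def select_best(scores):
    """Pick the highest-scoring candidate per image.

    scores: array-like of shape (N, K), higher is better (e.g. SSIM vs ground truth).
    Returns (best_idx, best_scores, distribution), where distribution[k] counts
    how often candidate k was selected. Ties go to the lowest index (np.argmax).
    """
    scores = np.asarray(scores, dtype=float)
    best_idx = scores.argmax(axis=1)
    best_scores = scores[np.arange(scores.shape[0]), best_idx]
    distribution = np.bincount(best_idx, minlength=scores.shape[1])
    return best_idx, best_scores, distribution


scores = [[0.10, 0.30, 0.20],
          [0.50, 0.10, 0.00],
          [0.20, 0.20, 0.90],
          [0.00, 0.40, 0.10]]
best_idx, best_scores, distribution = select_best(scores)
print(best_idx)      # [1 0 2 1]
print(distribution)  # [1 2 1]
# With candidates shaped (N, K, 3, H, W), the selected batch is
# candidates[np.arange(len(best_idx)), best_idx], with shape (N, 3, H, W)
```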


## Compute Metrics

**Run the evaluation based on available data:**


In [None]:
# Run comprehensive evaluation
if recon_batch is not None:
    print("Starting comprehensive evaluation...")
    
    # Prepare evaluation parameters
    eval_params = {
        'recon_batch': recon_batch,
        'gt_batch': gt_batch if HAS_GROUND_TRUTH else None,
        'actual_fmri': None,  # Add fMRI data if available
        'region_masks': None,  # Add brain region masks if available
        'real_images': gt_batch if HAS_GROUND_TRUTH else None
    }
    
    # Run evaluation
    results = evaluator.comprehensive_evaluation(**eval_params)
    
    # Print results
    evaluator.print_summary(results)
    
else:
    print("Cannot run evaluation - no reconstruction images loaded!")


Starting comprehensive evaluation with best reconstruction selection...
Selecting best reconstructions using SSIM...


Evaluating reconstructions: 100%|██████████| 50/50 [00:00<00:00, 430.36it/s]

Best reconstruction selection complete!
Best reconstruction shape: torch.Size([50, 3, 224, 224])
Best indices: [3 0 4 0 4 0 0 0 2 3 3 0 0 0 3 4 4 4 2 2 1 2 4 1 2 2 2 4 2 3 1 0 2 3 0 1 4
 3 1 0 0 4 3 4 4 1 3 4 3 2]
Score statistics:
  Mean best score: 0.0846
  Score range: 0.0057 to 0.3839

Running evaluation on best reconstructions...
Starting comprehensive evaluation...
Computing low-level metrics...
Computing high-level metrics...
Computing distance metrics...
Computing image quality metrics...

EVALUATION RESULTS (Best Reconstructions)

EVALUATION SUMMARY

LOW LEVEL:
------------------------------
  SSIM: 0.0846 ± 0.0829
  PixCorr: 0.1277 ± 0.1757

HIGH LEVEL:
------------------------------
  AlexNet-2: 4.0000
  AlexNet-5: 14.0000
  CLIP: 4.0000
  InceptionV3: 8.0000

DISTANCE:
------------------------------
  EffNet-B: 0.8467
  SwAV: 0.4542

IMAGE QUALITY:
------------------------------
  IS: 8.4494
  FID: 364.8029


Best Reconstruction Analysis:
  Images evaluated: 50
  Reconstruction diversity: 5 different outputs selected
  Selection distribution: [12  6 10 10 12]
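PixCorr is typically the Pearson correlation between the flattened pixel values of each reconstruction and its ground-truth image, averaged over images; the `mean ± std` form in the summary above matches that per-image view. A minimal NumPy sketch, assuming both batches already have the same shape (for example (N, 3, 224, 224)) and using the population standard deviation (ddof=0), either of which may differ from the project's `PixCorr` class; `pixcorr` is a hypothetical name.

```python
import numpy as np


def pixcorr(recon: np.ndarray, gt: np.ndarray):
    """Per-image Pearson r between flattened recon and gt pixels; returns (mean, std, per_image)."""
    r = recon.reshape(len(recon), -1).astype(np.float64)  # astype copies, so in-place ops are safe
    g = gt.reshape(len(gt), -1).astype(np.float64)
    r -= r.mean(axis=1, keepdims=True)
    g -= g.mean(axis=1, keepdims=True)
    per_image = (r * g).sum(axis=1) / (np.linalg.norm(r, axis=1) * np.linalg.norm(g, axis=1))
    return float(per_image.mean()), float(per_image.std()), per_image


rng = np.random.default_rng(0)
gt = rng.random((4, 3, 8, 8))
mean, std, _ = pixcorr(2.0 * gt + 0.5, gt)  # affine brightness/contrast change keeps r = 1
print(round(mean, 6), round(std, 6))        # 1.0 0.0
# With the notebook's tensors (hypothetical usage; squeeze(1) drops the [N, 1, ...] axis):
# pixcorr(recon_batch.cpu().numpy(), gt_batch.squeeze(1).cpu().numpy())
```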


## Individual Metric Testing

**Test specific metrics individually:**


In [None]:
# Test individual metrics (if reconstruction data available)
if recon_batch is not None:
    print("Testing individual metrics...")
    
    # Test AlexNet features
    print("\nTesting AlexNet features...")
    alexnet2_feat = alexnet_metrics.extract_features(recon_batch, layer=2)
    alexnet5_feat = alexnet_metrics.extract_features(recon_batch, layer=5)
    print(f"AlexNet-2 features shape: {alexnet2_feat.shape}")
    print(f"AlexNet-5 features shape: {alexnet5_feat.shape}")
    
    # Test InceptionV3 features
    print("\nTesting InceptionV3 features...")
    inception_feat = inception_metrics.extract_features(recon_batch)
    print(f"InceptionV3 features shape: {inception_feat.shape}")
    
    # Test EfficientNet features
    print("\nTesting EfficientNet features...")
    effnet_feat = effnet_dist.extract_features(recon_batch)
    print(f"EfficientNet features shape: {effnet_feat.shape}")
    
    # Compute mean pairwise cosine similarity among the reconstructions.
    # This measures output diversity (lower = more diverse), not reconstruction quality.
    print("\nComputing self-similarity metrics...")
    
    def compute_self_similarity(features, name):
        features_norm = torch.nn.functional.normalize(features, dim=1)
        similarities = torch.mm(features_norm, features_norm.T)
        # Exclude the diagonal (each image's similarity with itself is always 1)
        mask = torch.eye(similarities.shape[0], dtype=torch.bool, device=similarities.device)
        similarities_no_diag = similarities[~mask]
        mean_sim = similarities_no_diag.mean().item()
        std_sim = similarities_no_diag.std().item()
        print(f"  {name}: Mean={mean_sim:.4f}, Std={std_sim:.4f}")
        return mean_sim, std_sim
    
    alexnet2_sim = compute_self_similarity(alexnet2_feat, "AlexNet-2")
    alexnet5_sim = compute_self_similarity(alexnet5_feat, "AlexNet-5")
    inception_sim = compute_self_similarity(inception_feat, "InceptionV3")
    effnet_sim = compute_self_similarity(effnet_feat, "EfficientNet")
    
    print("\nIndividual metric testing completed!")
    
else:
    print("Cannot test individual metrics - no reconstruction data available!")


Testing individual metrics...

Testing AlexNet features...
AlexNet-2 features shape: torch.Size([50, 139968])
AlexNet-5 features shape: torch.Size([50, 43264])

Testing InceptionV3 features...
InceptionV3 features shape: torch.Size([50, 1000])

Testing EfficientNet features...
EfficientNet features shape: torch.Size([50, 1280])

Computing self-similarity metrics...
  AlexNet-2: Mean=0.3289, Std=0.0948
  AlexNet-5: Mean=0.1593, Std=0.0870
  InceptionV3: Mean=0.0704, Std=0.1242
  EfficientNet: Mean=0.1738, Std=0.0994

Individual metric testing completed!


## Results Summary

**Current evaluation status:**


In [None]:
# Create results summary
print("EVALUATION SUMMARY")
print("=" * 50)

print("\nData Status:")
print(f"  Reconstruction images: {recon_batch is not None}")
print(f"  Ground truth images: {HAS_GROUND_TRUTH and gt_batch is not None}")
print(f"  fMRI data: {HAS_FMRI_DATA}")
print(f"  GNet model: {HAS_GNET_MODEL}")

print("\nEvaluation Parameters:")
print(f"  Images evaluated: {NUM_IMAGES_TO_EVALUATE}")
print(f"  Batch size: {BATCH_SIZE}")
print(f"  Device: {device}")

print("\nAvailable Metrics:")
print("  Working (no GT needed): AlexNet-2/5, InceptionV3, EfficientNet")
if HAS_GROUND_TRUTH:
    print("  Enabled (GT available): SSIM, PixCorr, Two-way identification")
else:
    print("  Disabled (needs GT): SSIM, PixCorr, Two-way identification")

if HAS_FMRI_DATA and HAS_GNET_MODEL:
    print("  Enabled (fMRI + GNet available): Brain correlations")
else:
    print("  Disabled (needs fMRI + GNet): Brain correlations")

# Collect next steps first so the printed list is numbered consecutively
next_steps = []
if not HAS_GROUND_TRUTH:
    next_steps.append("Add ground truth images to enable SSIM, PixCorr, and two-way identification")
if not HAS_FMRI_DATA:
    next_steps.append("Add fMRI data to enable brain correlation metrics")
if not HAS_GNET_MODEL:
    next_steps.append("Add GNet model to enable brain correlation predictions")
next_steps.append("Increase NUM_IMAGES_TO_EVALUATE for a more comprehensive evaluation")

print("\nNext Steps:")
for i, step in enumerate(next_steps, start=1):
    print(f"  {i}. {step}")


EVALUATION SUMMARY

Data Status:
  Reconstruction images: True
  Ground truth images: True
  fMRI data: True
  GNet model: False

Evaluation Parameters:
  Images evaluated: 50
  Batch size: 8
  Device: cuda

Available Metrics:
  Working (no GT needed): AlexNet-2/5, InceptionV3, EfficientNet
  Enabled (GT available): SSIM, PixCorr, Two-way identification
  Disabled (needs fMRI + GNet): Brain correlations

Next Steps:
  1. Add GNet model to enable brain correlation predictions
  2. Increase NUM_IMAGES_TO_EVALUATE for a more comprehensive evaluation


## Evaluation Checklist

### **COMPLETED**
- **AlexNet-2**: ✓ Working (feature similarity, two-way identification)
- **AlexNet-5**: ✓ Working (feature similarity, two-way identification)
- **InceptionV3**: ✓ Working (feature similarity, two-way identification)
- **EfficientNet-B1**: ✓ Working (distance metrics)
- **CLIP**: ✓ Working (two-way identification)
- **SwAV**: ✓ Working (distance metrics)
- **SSIM**: ✓ Working (ground truth images loaded)
- **PixCorr**: ✓ Working (ground truth images loaded)
- **IS (Inception Score)**: ✓ Working
- **FID (Fréchet Inception Distance)**: ✓ Working (ground truth images used as the real-image set)

### **STILL NEEDED**

#### **Brain Correlation Scores**
- [ ] **GNet model**: Need to load GNet model
- [ ] **V1-V4 correlations**: Need fMRI data + GNet
- [ ] **Higher Visual cortex correlation**: Need fMRI data + GNet
- [ ] **Whole Visual Cortex correlation**: Need fMRI data + GNet

#### **Retrieval Metrics**
- [ ] **Image Retrieval**: Need 300 candidate images
- [ ] **Brain Retrieval**: Need 300 candidate fMRI scans

### **Key Missing Components:**
1. **GNet model** (for brain correlations)
2. **fMRI data passed to the evaluation** (`HAS_FMRI_DATA = True`, but `actual_fmri` and `region_masks` are still `None` in the evaluation call)
3. **Retrieval datasets** (300 images + 300 fMRI scans)

**Current Status**: All image-based metrics (low-level, high-level, distance, and image quality) produce values in the run above. Remaining blockers: the GNet model, passing fMRI data into the evaluation, and the retrieval candidate sets.
