<b>EDA - Exploratory Data Analysis</b>

This notebook performs exploratory data analysis on the HNTS-MRG head-and-neck tumor MRI segmentation dataset.
It covers:

- Dataset sizes (train/test)

- Number of slices per volume

- Voxel spacing (if available)

- Intensity distributions

- Class mask counts and tumor-pixel ratios

- Example slice visualizations
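The project's `gather_stats` helper is not shown here, so its exact columns are unknown. As a rough illustration only, the per-case quantities listed above could be computed from one image volume and its integer label mask as follows. `case_stats` is a hypothetical helper, and it assumes that slices lie along the last array axis and that label 0 is background.

```python
import numpy as np


def case_stats(image: np.ndarray, mask: np.ndarray, spacing=None) -> dict:
    """Summarise one volume and its integer label mask.

    Hypothetical helper: it mirrors the quantities listed above, not the
    project's actual gather_stats implementation.
    """
    if image.shape != mask.shape:
        raise ValueError("image and mask must have the same shape")
    labels, counts = np.unique(mask, return_counts=True)
    stats = {
        "shape": image.shape,
        "n_slices": image.shape[-1],  # assumes slices lie along the last axis
        "spacing": spacing,  # voxel size (e.g. in mm) if the caller knows it
        "intensity_mean": float(image.mean()),
        "intensity_std": float(image.std()),
        "intensity_p1": float(np.percentile(image, 1)),
        "intensity_p99": float(np.percentile(image, 99)),
        "tumor_ratio": float((mask > 0).mean()),  # fraction of non-background voxels
    }
    # one voxel-count column per label value present in this case
    for lab, cnt in zip(labels, counts):
        stats[f"count_label_{int(lab)}"] = int(cnt)
    return stats


# Tiny synthetic example: 24 voxels, 2 with label 1 and 1 with label 2
img = np.arange(24, dtype=float).reshape(2, 3, 4)
msk = np.zeros((2, 3, 4), dtype=int)
msk[0, 0, :2] = 1
msk[1, 2, 3] = 2
s = case_stats(img, msk)
print(s["n_slices"], s["tumor_ratio"], s["count_label_0"])
```

Collecting one such dict per case into `pandas.DataFrame(list_of_dicts)` gives a table that `describe()` and per-column histograms can work on directly.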

In [1]:
### IMPORTS ###
# notebooks/eda.ipynb, first cell:
import sys, os
import matplotlib.pyplot as plt
# point at the project’s src/ folder
sys.path.insert(0, os.path.abspath(os.path.join("..", "src")))

# now import from the utils package
from utils.eda import gather_stats, plot_distributions

# and bring in your MONAI loader
from dataloader.dataloader import get_mri_dataloader


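The import cell resolves `src/` as `../src` relative to the current working directory, so it only works when the kernel is started inside `notebooks/`. Re-running the cell also adds a duplicate `sys.path` entry each time. The following is a minimal sketch of a more tolerant alternative. `add_src_to_path` is a hypothetical helper, and it assumes the repository root is the first ancestor directory that contains a `src/` folder.

```python
import sys
from pathlib import Path
from typing import Optional


def add_src_to_path(start: Optional[Path] = None) -> Path:
    """Walk up from `start` (default: cwd) to the first directory that has a
    `src/` sub-folder, and put that `src/` at the front of sys.path once.

    Hypothetical helper; assumes the repo root is the nearest such ancestor.
    """
    here = (start or Path.cwd()).resolve()
    for candidate in (here, *here.parents):
        src = candidate / "src"
        if src.is_dir():
            if str(src) not in sys.path:  # avoid duplicates on re-run
                sys.path.insert(0, str(src))
            return src
    raise FileNotFoundError(f"no src/ folder found above {here}")
```

Calling `add_src_to_path()` before `from utils.eda import ...` behaves the same whether the notebook is opened from `notebooks/` or from the repository root.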


In [None]:
import numpy as np
from IPython.display import display

DATA_DIR = "/datasets/tdt4265/mic/open/HNTS-MRG"

# 1) Build a per-case stats DataFrame
df = gather_stats(DATA_DIR)
print("Per-case summary (first 5 rows):")
display(df.head())

print("Descriptive statistics:")
display(df.describe())

# 2) Plot histograms of the key columns
plot_distributions(df)

# 3) Show the middle slice of a few training volumes, with the mask overlaid
train_loader, _ = get_mri_dataloader(
    DATA_DIR, subset="train", batch_size=1, validation_fraction=0.0
)
for i, batch in zip(range(3), train_loader):  # first 3 batches only
    img = batch["image"][0].cpu().numpy().squeeze()  # drop channel dim -> 3D volume
    lbl = batch["label"][0].cpu().numpy().squeeze()
    mid = img.shape[-1] // 2  # middle index along the axis indexed by [..., mid]

    fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(12, 4))
    ax1.imshow(img[..., mid], cmap="gray")
    ax1.set_title(f"Raw {i}, slice {mid}")
    # fixed vmin/vmax so label k always gets tab10 colour k (assumes < 10 labels)
    ax2.imshow(lbl[..., mid], cmap="tab10", vmin=0, vmax=9, interpolation="nearest")
    ax2.set_title("Mask")
    ax3.imshow(img[..., mid], cmap="gray")
    # mask out background (label 0) so only labelled pixels are tinted
    ax3.imshow(np.ma.masked_equal(lbl[..., mid], 0), cmap="tab10",
               vmin=0, vmax=9, alpha=0.5, interpolation="nearest")
    ax3.set_title("Overlay")
    for ax in (ax1, ax2, ax3):
        ax.axis("off")
    plt.show()


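The visualisation loop always shows the middle slice. For small lesions, that slice may contain no labelled pixels at all. One option, sketched here under the same last-axis assumption, is to show the slice with the largest labelled area instead. `most_labelled_slice` is a hypothetical helper and is not part of the project's `utils` package.

```python
import numpy as np


def most_labelled_slice(mask: np.ndarray, axis: int = -1) -> int:
    """Index of the slice along `axis` with the most non-zero mask voxels.

    Falls back to the middle slice when the mask is empty.
    """
    axis = axis % mask.ndim
    other_axes = tuple(a for a in range(mask.ndim) if a != axis)
    per_slice = (mask > 0).sum(axis=other_axes)  # labelled voxels per slice
    if per_slice.max() == 0:
        return mask.shape[axis] // 2
    return int(np.argmax(per_slice))


# Synthetic check: slice 3 has 5 labelled voxels, slice 1 has 2
demo = np.zeros((4, 4, 5), dtype=int)
demo[:2, :1, 1] = 1
demo[:, :1, 3] = 2
demo[0, 1, 3] = 1
print(most_labelled_slice(demo))
```

Inside the loop, `mid = most_labelled_slice(lbl)` would replace `mid = img.shape[-1] // 2`. The rest of the plotting code stays unchanged.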