# PANGAEA Model Testing Notebook

Use this notebook to evaluate a trained Pangaea model. To change its behavior, edit the corresponding .yaml file or the cells of the notebook. By default the notebook reads the config saved with the checkpoint (`configs/config.yaml` inside the checkpoint directory); the original config templates live under `ilab-pangaea-bench/configs`. The "Global Variables" section collects the settings you are most likely to change.

## Building the environment

1) Find the `environment.yml` file in the repo and run the command `conda env create -f environment.yml`. Make sure you are in the `ilab-pangaea-bench` directory that you cloned!
2) In the jupyterhub session, click on the top right area that says `Python [...]`, and select `conda env: conda-pangaea-bench`.
3) You can now run the notebook as needed!
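To confirm that the notebook kernel is actually running inside the new environment, a quick check like the one below can help. It is a minimal sketch that assumes the conda environment's directory name contains "pangaea" (conda places environments under `.../envs/<name>`, and `sys.prefix` points there); the helper names are hypothetical, so adjust the expected string to the name in your `environment.yml`.

```python
import sys
from pathlib import Path


def active_env_name() -> str:
    """Return the directory name of the interpreter prefix running this kernel.

    For a conda environment this is usually the environment's name,
    e.g. ".../envs/<name>" -> "<name>".
    """
    return Path(sys.prefix).name


def kernel_matches(expected: str) -> bool:
    """True if the running kernel's prefix directory name contains `expected`."""
    return expected in active_env_name()


print(f"Kernel prefix: {sys.prefix}")
# "pangaea" is an assumed substring of the environment name
print("Pangaea environment active:", kernel_matches("pangaea"))
```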

## Before you run
This notebook is designed to be run on a model that was trained using this repo, either using the training notebook or the CLI. 

## Setup

Import python packages, create and configure local directories, configure GPU acceleration, build logger, and set some global variables. 

**Note: if you want to change the functionality of this notebook, the Global Variables subsection is a good place to start.**

### Imports and clone repository

In [4]:
import os
from pathlib import Path
import pprint

import torch
from torch.nn.parallel import DistributedDataParallel
from hydra.utils import instantiate
from hydra import initialize, compose
from hydra import initialize_config_dir
from omegaconf import DictConfig, OmegaConf
from torch.utils.data import DataLoader
import torch.distributed as dist
from torch.utils.data.distributed import DistributedSampler
from torchmetrics.classification import JaccardIndex

import datetime
from tqdm import tqdm
import torch.nn.functional as F
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize
import pandas as pd
import sys
import subprocess
import warnings

In [5]:
warnings.filterwarnings('ignore')
repo_name = "ilab-pangaea-bench"
if not os.path.exists(repo_name):
    subprocess.run(["git", "clone", "https://github.com/nasa-nccs-hpda/ilab-pangaea-bench.git"], 
                   check=True, stdout=subprocess.DEVNULL)
else:
    subprocess.run(["git", "-C", repo_name, "pull"], check=True, stdout=subprocess.DEVNULL)

subprocess.run([sys.executable, "-m", "pip", "install", "-e", "./ilab-pangaea-bench"], 
               check=True, stdout=subprocess.DEVNULL)
sys.path.append("ilab-pangaea-bench")

Cloning into 'ilab-pangaea-bench'...
Updating files: 100% (257/257), done.
[0m[33m  DEPRECATION: Legacy editable install of pangaea==1.0.0 from file:///panfs/ccds02/nobackup/people/ajkerr1/EO_FM/ilab-pangaea-bench/notebooks/ilab-pangaea-bench (setup.py develop) is deprecated. pip 25.3 will enforce this behaviour change. A possible replacement is to add a pyproject.toml or enable --use-pep517, and use setuptools >= 64. If the resulting installation is not behaving as expected, try using --config-settings editable_mode=compat. Please consult the setuptools documentation for more information. Discussion can be found at https://github.com/pypa/pip/issues/11457[0m[33m
[0m

In [6]:
from pangaea.decoders.base import Decoder
from pangaea.encoders.base import Encoder
from pangaea.utils.logger import init_logger
from pangaea.datasets.base import GeoFMDataset
from pangaea.utils.collate_fn import get_collate_fn
from pangaea.utils.eval_utils import (config_cuda, load_apply_ckpt, test_loop,
                                      plot_results_heatmap, plot_results_scatter)

### Create and configure directories
These determine where our outputs (logs, plots, etc.) are written. A folder named with the current date and time is created in the current working directory.

In [7]:
def make_experiment_dirs(datetime_str, base_dir="."):
    exp_dir = Path(base_dir) / datetime_str
    exp_dir.mkdir(exist_ok=True)
    logger_path = exp_dir / "test.log"

    return exp_dir, logger_path

In [8]:
datetime_str = datetime.datetime.now().strftime("%Y-%m-%d-%H:%M")
exp_dir, logger_path = make_experiment_dirs(datetime_str, ".")
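
The folder name uses a `%Y-%m-%d-%H:%M` timestamp, so two runs started in the same minute share one folder (`exist_ok=True` allows this). Colons are valid in Linux and macOS file names but not on Windows. The sketch below is a standalone copy of `make_experiment_dirs` run against a temporary directory to show what it produces; the only change, marked in a comment, is `parents=True`, which is our own addition and only matters if `base_dir` does not exist yet.

```python
import datetime
import tempfile
from pathlib import Path


def make_experiment_dirs(datetime_str, base_dir="."):
    exp_dir = Path(base_dir) / datetime_str
    # parents=True (our addition) also creates base_dir if it is missing
    exp_dir.mkdir(parents=True, exist_ok=True)
    logger_path = exp_dir / "test.log"
    return exp_dir, logger_path


with tempfile.TemporaryDirectory() as tmp:
    stamp = datetime.datetime(2025, 9, 4, 10, 18).strftime("%Y-%m-%d-%H:%M")
    exp_dir, logger_path = make_experiment_dirs(stamp, base_dir=Path(tmp) / "runs")
    print(exp_dir.name)      # 2025-09-04-10:18
    print(logger_path.name)  # test.log
    print(exp_dir.is_dir())  # True
```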

### Global Variables
These variables affect the rest of the notebook; change them to suit your task. For example, if you wish to run this notebook with your own .yaml file, change the `config_path` variable below.

In [9]:
# Where to create the working directory for this notebook
working_dir = exp_dir  # Default is to use exp_dir for everything

# Plotting path information
heatmap_plot_filename = "heatmap"
scatter_plot_filename = "scatter"

# How much info the logger will display; 0 means no logging, 1 means minimal logging, 2 means extra logging detail
logger_verbosity = 0

# Directory containing the trained model checkpoint we will load
ckpt_path = "/explore/nobackup/people/mfrost2/lscratch/EO_FM/training_dir/"
ckpt_dir = Path(ckpt_path, "20250822_151213_a5bd3f_dofa_landsat_seg_upernet_landsatnlcd_7band")

# Directory holding the test data. Leave as the empty string, "", to evaluate on the
# data path saved in the checkpoint's config (i.e. the data used to train the model)
test_data_dir = ("/explore/nobackup/projects/ilab/data/"
                 "foundation_model_comparison/landsat_nlcd/Landsat_NLCD_50_agg9")

# Where we will load our hydra config from
config_path = str(ckpt_dir / "configs" / "config.yaml")
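
Because every later cell depends on these paths, it can save time to check them before running anything expensive. The helper below is a hypothetical sketch, not part of Pangaea; it treats an empty `test_data_dir` as "reuse the training data path", matching the comment above.

```python
import tempfile
from pathlib import Path


def check_notebook_inputs(ckpt_dir, config_path, test_data_dir=""):
    """Return a list of human-readable problems with the global variables.

    An empty list means every path that should exist does exist.
    test_data_dir is optional: "" means "reuse the training data path".
    """
    problems = []
    if not Path(ckpt_dir).is_dir():
        problems.append(f"checkpoint directory not found: {ckpt_dir}")
    if not Path(config_path).is_file():
        problems.append(f"config file not found: {config_path}")
    if test_data_dir and not Path(test_data_dir).is_dir():
        problems.append(f"test data directory not found: {test_data_dir}")
    return problems


# Demo against a throwaway folder laid out like a checkpoint directory
with tempfile.TemporaryDirectory() as tmp:
    demo_cfg = Path(tmp) / "configs" / "config.yaml"
    demo_cfg.parent.mkdir()
    demo_cfg.write_text("train: false\n")
    demo_problems = check_notebook_inputs(tmp, demo_cfg)
print(demo_problems)  # []

# In the notebook:
# for problem in check_notebook_inputs(ckpt_dir, config_path, test_data_dir):
#     print("WARNING:", problem)
```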

### Configure Hydra
This loads the Hydra configuration that was saved with the checkpoint, so the test runs with the same Pangaea .yaml settings that were used for training.

In [10]:
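# Init Hydra from the config saved with the checkpoint; config_dir must be the
# directory that contains config.yaml, not the file itself
with initialize_config_dir(config_dir=str(Path(config_path).parent), version_base=None):
    cfg = OmegaConf.load(config_path)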

print("Config loaded successfully!")

Config loaded successfully!


### Configure CUDA for GPU acceleration
This selects the compute device and detects whether we are running on a single GPU or on several GPUs (a distributed run).

In [11]:
device, local_rank, world_size = config_cuda(cfg, backend="nccl")

Single GPU training detected, skipping distributed init.
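
We have not inspected how `config_cuda` decides between single-GPU and distributed mode. A common convention, and the one its message suggests, is to read the environment variables that `torchrun` sets for each process (`LOCAL_RANK` and `WORLD_SIZE`); a plain notebook kernel sets neither. The helper below is a hypothetical sketch of that convention, and it uses new variable names so it does not overwrite the notebook's `local_rank` and `world_size`.

```python
import os


def read_launch_env(env=None):
    """Return (local_rank, world_size, distributed) from launcher env vars.

    torchrun sets LOCAL_RANK and WORLD_SIZE for every process it starts;
    when they are absent we fall back to a single process on rank 0.
    """
    env = os.environ if env is None else env
    local_rank = int(env.get("LOCAL_RANK", 0))
    world_size = int(env.get("WORLD_SIZE", 1))
    return local_rank, world_size, world_size > 1


sketch_rank, sketch_world, sketch_distributed = read_launch_env()
print(f"local_rank={sketch_rank}, world_size={sketch_world}, distributed={sketch_distributed}")
print(f"device string: cuda:{sketch_rank}")
```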


### Build logger

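The logger records progress and debugging information to `test.log` in the experiment folder; `logger_verbosity` controls how much extra detail `build_logger` writes.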

In [12]:
def build_logger(cfg, logger_path, exp_dir, device, rank=local_rank):
    logger = init_logger(logger_path, rank=rank)
    if (logger_verbosity > 0):
        logger.info("============ Initialized logger ============")
        if (logger_verbosity > 1):
            logger.info(pprint.pformat(OmegaConf.to_container(cfg), compact=True).strip("{}"))
        logger.info("The experiment is stored in %s\n" % exp_dir)
    return logger

In [13]:
logger = build_logger(cfg, logger_path, exp_dir, device, local_rank)
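
`build_logger` reads the global `logger_verbosity` to decide what to write. The sketch below reproduces the same three levels with the standard `logging` module so they are easy to compare; the function name and the returned message count are our own additions for illustration. Passing the verbosity as an argument, rather than reading a global, also makes the function easier to reuse outside the notebook.

```python
import logging


def log_run_info(logger, verbosity, exp_dir, cfg_dict):
    """Mirror the verbosity levels used by build_logger.

    0: log nothing, 1: header and experiment directory, 2: also the full config.
    Returns the number of messages emitted.
    """
    emitted = 0
    if verbosity > 0:
        logger.info("============ Initialized logger ============")
        emitted += 1
        if verbosity > 1:
            logger.info("config: %s", cfg_dict)
            emitted += 1
        logger.info("The experiment is stored in %s", exp_dir)
        emitted += 1
    return emitted


demo_logger = logging.getLogger("verbosity-demo")
demo_logger.setLevel(logging.INFO)
if not demo_logger.handlers:
    demo_logger.addHandler(logging.StreamHandler())
demo_logger.propagate = False  # keep demo messages away from handlers on the root logger

counts = [log_run_info(demo_logger, level, "2025-09-04-10:18", {"train": False}) for level in (0, 1, 2)]
print(counts)  # [0, 2, 3]
```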

### Editing config to use test dataset

In [14]:
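# Only override the data path when a separate test directory was given;
# an empty test_data_dir keeps the root_path saved with the checkpoint
if test_data_dir:
    cfg["dataset"]["root_path"] = test_data_dir
cfg.train = False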

## Build Model

### Initialize encoder/decoder before loading checkpoint

In [15]:
def build_encoder(cfg, logger):
    encoder: Encoder = instantiate(cfg.encoder)
    encoder.load_encoder_weights(logger)
    logger.info(f"Built {encoder.model_name}, using weights from: {cfg.encoder.get('encoder_weights')}")
    return encoder

def build_decoder(cfg, encoder, device, logger):
    decoder: Decoder = instantiate(
        cfg.decoder,
        encoder=encoder,
    )
    decoder.to(device)
    decoder_name = cfg.decoder._target_.split(".")[-1]
    logger.info(f"Built {decoder_name} decoder.")

    return decoder

In [16]:
# Model operations in Pangaea require a logger
encoder = build_encoder(cfg, logger)
# Pangaea wraps encoder and decoder into one Decoder type
encoder_decoder = build_decoder(cfg, encoder, device, logger)

# Final model is wrapped for distributed GPU use only if we have more than 1 GPU;
# otherwise the encoder-decoder is used directly
model = encoder_decoder
if (world_size > 1):
    model = DistributedDataParallel(
        encoder_decoder,
        device_ids=[local_rank],
        output_device=local_rank,
        find_unused_parameters=cfg.finetune,
    )

Downloading ./pretrained_models/DOFA_ViT_base_e100.pth: 100%|█████████▉| 427M/427M [00:09<00:00, 49.6Mb/s]   
INFO - 09/04/25 10:18:16 - 0:00:17 - Built dofa_encoder, using weights from: None
INFO - 09/04/25 10:18:16 - 0:00:17 - Built SegUPerNet decoder.
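
Both builders rely on Hydra's `instantiate`, which reads the dotted class path in a config's `_target_` key, imports it, and calls it with the remaining keys plus any keyword overrides (such as `encoder=encoder` in `build_decoder`). The function below is a deliberately simplified stand-in to show the idea; it is not Hydra's implementation, which also handles nested configs, `_recursive_`, `_partial_`, and interpolation. We use `datetime.timedelta` as a harmless target in place of a Pangaea class.

```python
import importlib


def instantiate_target(node, **overrides):
    """Simplified stand-in for hydra.utils.instantiate on a flat config dict.

    Imports the object named by "_target_" and calls it with the remaining
    keys and any keyword overrides.
    """
    node = dict(node)  # copy so the caller's config is not modified
    module_path, _, attr = node.pop("_target_").rpartition(".")
    target = getattr(importlib.import_module(module_path), attr)
    return target(**node, **overrides)


# The same string split that build_decoder uses to get a display name
target_str = "datetime.timedelta"
print(target_str.split(".")[-1])  # timedelta
print(instantiate_target({"_target_": "datetime.timedelta", "hours": 1}, minutes=30))  # 1:30:00
```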


### Load model checkpoint from path

In [17]:
model = load_apply_ckpt(ckpt_dir, device, encoder_decoder, logger, logger_verbosity)
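
Checkpoints saved from a model wrapped in `DistributedDataParallel` store their weights under keys prefixed with `module.`. If loading a multi-GPU checkpoint into the unwrapped `encoder_decoder` ever reports missing or unexpected keys, stripping that prefix is a common fix. We have not checked whether `load_apply_ckpt` already handles this; the helper below is hypothetical and works on any mapping, such as a `state_dict`.

```python
from collections import OrderedDict


def strip_ddp_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading DDP prefix removed from each key.

    Keys without the prefix are kept unchanged, so the function is also safe
    on checkpoints saved from an unwrapped model.
    """
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


demo = strip_ddp_prefix({"module.encoder.weight": 0.1, "decoder.bias": 0.2})
print(list(demo))  # ['encoder.weight', 'decoder.bias']

# Hypothetical usage with a raw checkpoint (the "model" key is an assumption):
# state = torch.load(path_to_ckpt, map_location=device)["model"]
# encoder_decoder.load_state_dict(strip_ddp_prefix(state))
```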

## Load test data, run through model

### Get Dataloader
From our config file, we can build a dataset which is then served in batches by the PyTorch DataLoader.

In [18]:
def get_dataloader(cfg, encoder, logger):
    # Preprocessor is required by dataset class
    test_preprocessor = instantiate(
        cfg.preprocessing.test,
        dataset_cfg=cfg.dataset,
        encoder_cfg=cfg.encoder,
        _recursive_=False,
    )

    # Create dataset
    raw_test_dataset = instantiate(cfg.dataset, split="test")
    test_dataset = GeoFMDataset(raw_test_dataset, test_preprocessor)

    # Create batches by modality using collate function
    modalities = list(encoder.input_bands.keys())
    collate_fn = get_collate_fn(modalities)

    # Dataloader from dataset
    test_loader = DataLoader(
        test_dataset,
        batch_size=32,  # Change this to a larger size if desired
        num_workers=4,  # Change this to a larger number if desired
        pin_memory=True,
        collate_fn=collate_fn
    )

    if (logger_verbosity > 1):
        logger.info(f"Built dataloader from dataset: {cfg.dataset.dataset_name}")
        logger.info(f"Dataset gathered files from: {cfg.dataset.root_path}")

    return test_loader

In [19]:
test_loader = get_dataloader(cfg, encoder, logger)

Found 50 paired image-label files in /explore/nobackup/projects/ilab/data/foundation_model_comparison/landsat_nlcd/Landsat_NLCD_50_agg9/images
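
With 50 test files and `batch_size=32`, the loader yields two batches (the second holds the remaining 18 samples), which matches the `0/2` progress bar in the test loop. The sketch below shows that arithmetic and, separately, what a collate function that keeps modalities apart might do. The sample layout `{"image": {<modality>: data}, "target": data}` is an assumption for illustration, and `collate_by_modality` is a hypothetical helper, not Pangaea's `get_collate_fn`; a real collate function would also stack tensors rather than build lists.

```python
import math


def num_batches(n_samples, batch_size, drop_last=False):
    """Number of batches a DataLoader yields for a map-style dataset."""
    return n_samples // batch_size if drop_last else math.ceil(n_samples / batch_size)


def collate_by_modality(samples, modalities):
    """Group a list of samples into one batch, keeping each modality separate."""
    return {
        "image": {m: [s["image"][m] for s in samples] for m in modalities},
        "target": [s["target"] for s in samples],
    }


print(num_batches(50, 32))  # 2
batch = collate_by_modality(
    [{"image": {"optical": "img0"}, "target": "mask0"},
     {"image": {"optical": "img1"}, "target": "mask1"}],
    modalities=["optical"],
)
print(batch["image"]["optical"])  # ['img0', 'img1']
```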


### Perform forward pass of model on test dataset
Using the test dataloader, we can get model predictions on test inputs.

In [20]:
logger.info("Beginning test loop...")
test_dict = test_loop(cfg, model, device, test_loader, logger)
logger.info("Testing complete.")

INFO - 09/04/25 10:18:19 - 0:00:20 - Beginning test loop...


task: segmentation


Test loop:   0%|          | 0/2 [00:00<?, ?it/s]

preds, targets devices: (device(type='cuda', index=0), device(type='cuda', index=0))


Test loop: 100%|██████████| 2/2 [00:22<00:00, 11.13s/it]

preds, targets devices: (device(type='cuda', index=0), device(type='cuda', index=0))





ValueError: Default process group has not been initialized, please make sure to call init_process_group.
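
This error means something inside `test_loop` calls a `torch.distributed` collective even though no process group exists in this single-GPU session. One possible workaround, sketched below under that assumption, is to create a one-process default group before re-running the cell; with a world size of 1, a reduction such as `all_reduce` leaves values as they were. The helper names are hypothetical. The backend choice is also an assumption: `nccl` needs the reduced tensors on the GPU, so switch to `gloo` if the collective in `test_loop` works on CPU tensors.

```python
import os


def single_process_dist_env(port=29500):
    """Environment variables read by torch.distributed's default env:// init method."""
    return {"MASTER_ADDR": "127.0.0.1", "MASTER_PORT": str(port), "RANK": "0", "WORLD_SIZE": "1"}


def ensure_default_process_group(port=29500, backend=None):
    """Initialize a one-process default group if none exists yet.

    Returns True if a default process group is available afterwards.
    """
    import torch  # imported here so single_process_dist_env works without torch
    import torch.distributed as dist

    if not dist.is_available():
        return False
    if dist.is_initialized():
        return True
    for key, value in single_process_dist_env(port).items():
        os.environ.setdefault(key, value)
    if backend is None:
        backend = "nccl" if torch.cuda.is_available() else "gloo"
    dist.init_process_group(backend=backend, rank=0, world_size=1)
    return True


# Hypothetical usage before re-running the test loop cell:
# ensure_default_process_group()
# test_dict = test_loop(cfg, model, device, test_loader, logger)
```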

## Visualize model performance on test set

### Plot targets vs predictions heatmap
This plots the first 5 targets and predictions alongside one another.

In [None]:
logger.info("Creating heatmap and saving to png...")

plot_results_heatmap(
    targets=test_dict["targets"],
    preds=test_dict["preds"],
    save_dir=working_dir,
    png_prefix=heatmap_plot_filename,
)
plt.show()
plt.close()

### Create scatter plot
This plot shows every prediction against its target. The closer the points lie to the diagonal, where prediction equals target, the more similar predictions and targets are (and the better the model is performing).

In [None]:
logger.info("Creating scatter plot and saving to png...")

fig, val_df = plot_results_scatter(
    targets=test_dict["targets"],
    predictions=test_dict["preds"],
    save_dir=working_dir,
    png_prefix=scatter_plot_filename
)
plt.show()
plt.close()
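
Because the task here is segmentation, per-class intersection over union (IoU, the Jaccard index) puts a number on how closely predictions match targets. `JaccardIndex` from torchmetrics, imported at the top of the notebook, computes this on tensors, although its handling of classes that never appear may differ from the choice made here. The plain-Python sketch below only spells out the formula on flat label lists; it assumes you have flattened `test_dict["targets"]` and `test_dict["preds"]` into integer class labels, since the exact layout of `test_dict` is not shown in this notebook.

```python
from collections import Counter


def per_class_iou(targets, preds, num_classes, ignore_index=None):
    """Intersection over union for each class from flat label sequences.

    IoU_c = |target == c AND pred == c| / |target == c OR pred == c|.
    Classes absent from both targets and preds get None instead of 0/0.
    """
    inter, union = Counter(), Counter()
    for t, p in zip(targets, preds):
        if ignore_index is not None and t == ignore_index:
            continue  # skip pixels marked as "no label"
        if t == p:
            inter[t] += 1
            union[t] += 1
        else:
            union[t] += 1
            union[p] += 1
    return [inter[c] / union[c] if union[c] else None for c in range(num_classes)]


def mean_iou(ious):
    """Average over classes that appear in targets or preds."""
    present = [v for v in ious if v is not None]
    return sum(present) / len(present) if present else float("nan")


ious = per_class_iou(targets=[0, 0, 1, 1, 2], preds=[0, 1, 1, 1, 2], num_classes=3)
print(ious)  # [0.5, 0.6666666666666666, 1.0]
print(mean_iou(ious))
```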