# LCComparison2026 Quick Start

This notebook walks through the basic setup and configuration of the LCComparison2026 framework.

## Overview

LCComparison2026 compares foundation models (Prithvi, SatLas, SSL4EO) against existing LCAnalysis2026 segmentation models for land cover classification using a **linear probing** approach:

1. Extract embeddings from pretrained satellite image models
2. Train lightweight classifiers on the embeddings
3. Compare classification quality across models
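
The three steps above can be illustrated with a toy example. This is a minimal sketch of linear probing, not the project's implementation: `frozen_encoder` is a hypothetical stand-in for a pretrained model, the pixels are synthetic, and the probe is a binary logistic regression trained with plain gradient descent.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def frozen_encoder(pixel):
    """Hypothetical stand-in for a pretrained model: a fixed mapping that is never updated."""
    red, nir = pixel
    return [nir - red, nir + red]  # two-dimensional "embedding"


def make_pixel(label):
    # Synthetic data: class 1 has higher near-infrared than red, class 0 the opposite.
    base = (0.2, 0.6) if label == 1 else (0.6, 0.2)
    return tuple(v + rng.uniform(-0.1, 0.1) for v in base)


labels = [i % 2 for i in range(200)]
embeddings = [frozen_encoder(make_pixel(y)) for y in labels]  # step 1: extract

# Step 2: train only a linear layer (weights + bias) on the embeddings.
w, b, lr = [0.0, 0.0], 0.0, 0.5
train_x, train_y = embeddings[:150], labels[:150]
for _ in range(200):
    grad_w, grad_b = [0.0, 0.0], 0.0
    for x, y in zip(train_x, train_y):
        p = 1.0 / (1.0 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))
        grad_w[0] += (p - y) * x[0]
        grad_w[1] += (p - y) * x[1]
        grad_b += p - y
    n = len(train_x)
    w = [w[0] - lr * grad_w[0] / n, w[1] - lr * grad_w[1] / n]
    b -= lr * grad_b / n

# Step 3: score the probe on held-out embeddings.
test_x, test_y = embeddings[150:], labels[150:]
preds = [int(w[0] * x[0] + w[1] * x[1] + b > 0) for x in test_x]
accuracy = sum(p == y for p, y in zip(preds, test_y)) / len(test_y)
print(f'held-out accuracy: {accuracy:.2f}')
```

Because the encoder is never updated, the probe's accuracy reflects how linearly separable the embeddings already are, which is what makes the comparison across models meaningful.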

## 1. Configuration

The project uses OmegaConf for configuration management. The cell below loads the main config file and validates it.
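
The cell below relies on `validate_config` returning a list of human-readable issues, where an empty list means the config is valid. As a rough sketch of that pattern, and not the actual `src.config_schema` code, a validator over a plain dictionary might look like this. The specific checks are assumptions chosen for illustration.

```python
def validate_config_sketch(cfg: dict) -> list[str]:
    """Return a list of problems; an empty list means the config passed."""
    issues = []
    # Assumed rule: both top-level sections must be present.
    for section in ('existing_models', 'foundation_models'):
        if section not in cfg:
            issues.append(f'missing section: {section}')
    # Assumed rule: every model declares a positive integer embedding size.
    for name, model in cfg.get('foundation_models', {}).items():
        dim = model.get('embedding_dim')
        if not isinstance(dim, int) or dim <= 0:
            issues.append(f'{name}: embedding_dim must be a positive integer')
    return issues


good = {'existing_models': {}, 'foundation_models': {'demo': {'embedding_dim': 16}}}
bad = {'foundation_models': {'demo': {'embedding_dim': 0}}}
print(validate_config_sketch(good))
print(validate_config_sketch(bad))
```

Collecting every issue before reporting, rather than raising on the first one, lets a user fix a config file in a single pass.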

In [None]:
import sys
sys.path.insert(0, '..')

from src.config_schema import CLASS_SCHEMA, CLASS_COLORS, load_config, validate_config

# Load the main config
config = load_config('../config/config.yaml')

# Validate it
issues = validate_config(config)
if issues:
    print('Issues found:')
    for issue in issues:
        print(f'  - {issue}')
else:
    print('Configuration is valid.')

## 2. Class Schema

The project uses a 7-class land cover schema. LCAnalysis2026's 8 classes are mapped to these 7.

In [None]:
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

print('7-Class Schema:')
print(f"{'Class':<10} {'Index':<8} {'Color'}")
print('-' * 30)
for name, idx in CLASS_SCHEMA.items():
    print(f'{name:<10} {idx:<8} {CLASS_COLORS[name]}')

# Visualize the color palette
fig, ax = plt.subplots(1, 1, figsize=(8, 1.5))
patches = []
for name, color in CLASS_COLORS.items():
    patches.append(mpatches.Patch(color=color, label=f'{name} ({CLASS_SCHEMA[name]})'))
ax.legend(handles=patches, loc='center', ncol=7, fontsize=9)
ax.axis('off')
plt.title('Land Cover Class Colors')
plt.tight_layout()
plt.show()

## 3. Discover Existing Models

Scan the LCAnalysis2026 experiments directory to find existing trained models.
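
To give a sense of what such a scan involves, here is a minimal sketch that is not the real `discover_experiments`. It assumes each experiment is a subdirectory and that a hypothetical `metrics.json` file marks it as complete; the returned keys mirror the ones the cell below reads.

```python
import json
import tempfile
from pathlib import Path


def discover_experiments_sketch(experiments_root: Path) -> dict:
    """Scan one directory level; an experiment is 'complete' if metrics.json exists."""
    experiments = []
    for exp_dir in sorted(p for p in experiments_root.iterdir() if p.is_dir()):
        metrics_file = exp_dir / 'metrics.json'  # assumed file name
        metrics = json.loads(metrics_file.read_text()) if metrics_file.exists() else None
        experiments.append({'name': exp_dir.name, 'metrics': metrics})
    complete = [e for e in experiments if e['metrics'] is not None]
    return {'num_experiments': len(experiments), 'complete_experiments': complete}


# Demo on a throwaway directory tree with made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'run_a').mkdir()
    (root / 'run_a' / 'metrics.json').write_text(json.dumps({'test/iou': 0.5}))
    (root / 'run_b').mkdir()  # no metrics yet, so not counted as complete
    found = discover_experiments_sketch(root)

print(found['num_experiments'], [e['name'] for e in found['complete_experiments']])
```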

In [None]:
from src.integration.existing_model_integration import discover_experiments

results = discover_experiments(
    base_path=config.existing_models.get('base_path', '/media/ken/data/LCAnalysis2026'),
    experiments_dir=config.existing_models.get('experiments_dir', 'experiments'),
)

print(f"Discovered {results['num_experiments']} experiments")
print(f"Complete (with metrics): {len(results['complete_experiments'])}")
print()

for exp in results['complete_experiments']:
    metrics = exp.get('metrics', {})
    iou = metrics.get('test/iou', 'N/A')
    f1 = metrics.get('test/f1', 'N/A')
    if isinstance(iou, float):
        iou = f'{iou:.4f}'
    if isinstance(f1, float):
        f1 = f'{f1:.4f}'
    print(f"  {exp['name']}: model={exp['model_name']}, IoU={iou}, F1={f1}")

## 4. Class Mapping

LCAnalysis2026 uses 8 classes that map to our 7-class schema.
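
Mapping 8 classes onto 7 means at least two source classes share a target. This notebook does not state which ones, so the table in the sketch below is invented purely to show how an index mapping is applied to a label mask.

```python
# Hypothetical 8 -> 7 index mapping: source classes 6 and 7 both land in target class 6.
INDEX_MAPPING = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 6}


def remap_labels(mask, index_mapping):
    """Apply the mapping to every pixel of a 2-D label mask (a list of rows)."""
    return [[index_mapping[value] for value in row] for row in mask]


source_mask = [[0, 7, 6],
               [5, 7, 1]]
target_mask = remap_labels(source_mask, INDEX_MAPPING)
print(target_mask)  # [[0, 6, 6], [5, 6, 1]]
```

With NumPy arrays the same idea is usually written as a lookup table indexed by the mask, which avoids the Python-level loop.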

In [None]:
mapping = results['class_mapping']

print('LCAnalysis2026 -> LCComparison2026 class mapping:')
print(f"{'Source (8-class)':<20} {'Target (7-class)':<20}")
print('-' * 40)
for src_name, dst_name in mapping['name_mapping'].items():
    # Look up the target index in the 7-class schema ('?' if the name is unknown)
    dst_idx = CLASS_SCHEMA.get(dst_name, '?')
    print(f'{src_name:<20} {dst_name} ({dst_idx})')

## 5. Foundation Models

Three foundation models are supported for embedding extraction.
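
The cell below reads five keys from each entry under `foundation_models`: `model_id`, `input_bands`, `input_size`, `embedding_dim`, and `enabled`. The following sketch shows an entry with that shape. The model names and every value are placeholders, not the project's real settings.

```python
# Placeholder entries: the names and all values are invented for illustration.
foundation_models = {
    'demo_model': {
        'model_id': 'example/demo-encoder',
        'input_bands': ['B02', 'B03', 'B04', 'B08'],
        'input_size': 224,
        'embedding_dim': 128,
        'enabled': True,
    },
    'disabled_model': {'model_id': 'example/other', 'enabled': False},
}

# Select only the models that are switched on, defaulting to off when the key is absent.
enabled = [name for name, cfg in foundation_models.items() if cfg.get('enabled', False)]
print(enabled)  # ['demo_model']
```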

In [None]:
models = config.get('foundation_models', {})
print(f"{'Model':<10} {'Architecture':<18} {'Bands':<8} {'Input':<10} {'Emb Dim':<9} {'Enabled'}")
print('-' * 70)
for name, cfg in models.items():
    bands = len(cfg.get('input_bands', []))
    size = cfg.get('input_size', '?')
    dim = cfg.get('embedding_dim', '?')
    enabled = 'yes' if cfg.get('enabled', False) else 'no'
    # Truncate long model IDs so the table columns stay aligned
    model_id = str(cfg.get('model_id', '?'))[:16]
    input_dims = f'{size}x{size}'
    print(f'{name:<10} {model_id:<18} {bands:<8} {input_dims:<10} {dim!s:<9} {enabled}')

## 6. Tile Management

The tile manager tracks processing status across the pipeline.
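
Status tracking of this kind usually comes down to a persisted mapping from tile ID to a status string. The sketch below shows that idea in isolation. `TileIndexSketch` and the status names are hypothetical and do not reflect the real `TileManager` API beyond the `get_progress` call used in the cell below.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


class TileIndexSketch:
    """Toy status index: maps tile IDs to a status string and persists to JSON."""

    def __init__(self, path):
        self.path = Path(path)
        # Load an existing index, or start empty if the file is not there yet.
        self.tiles = json.loads(self.path.read_text()) if self.path.exists() else {}

    def set_status(self, tile_id, status):
        self.tiles[tile_id] = status
        self.path.write_text(json.dumps(self.tiles, indent=2))  # save on every change

    def get_progress(self):
        """Count how many tiles are in each status."""
        return dict(Counter(self.tiles.values()))


with tempfile.TemporaryDirectory() as tmp:
    index = TileIndexSketch(Path(tmp) / 'tile_index.json')
    for tile_id in ('t0', 't1', 't2'):
        index.set_status(tile_id, 'pending')
    index.set_status('t0', 'embedded')  # hypothetical status name
    progress = index.get_progress()
    reloaded = TileIndexSketch(index.path).get_progress()  # state survives a restart

print(progress)
```

Writing the index to disk after each change is what allows a long pipeline run to resume where it stopped.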

In [None]:
from pathlib import Path
from src.data.tile_manager import TileManager

# Create a tile manager (or load existing)
index_path = Path('../data/checkpoints/tile_index.json')

if index_path.exists():
    tm = TileManager(index_path)
    progress = tm.get_progress()
    print('Tile Processing Status:')
    for status_name, count in progress.items():
        print(f'  {status_name}: {count}')
else:
    print('No tile index found. Run `python -m src.pipeline init` first.')
    print()
    # Demo: create a sample grid in a throwaway directory that is removed afterwards
    import tempfile
    with tempfile.TemporaryDirectory() as tmp_dir:
        demo_tm = TileManager(Path(tmp_dir) / 'tile_index.json')
        tiles = demo_tm.create_grid(
            bbox={'west': -122.5, 'south': 47.0, 'east': -122.0, 'north': 47.5},
            resolution=10.0,
            tile_size=256,
        )
        print(f'Demo grid: {len(tiles)} tiles created')
        print(f'Progress: {demo_tm.get_progress()}')

## Next Steps

With the project configured, the typical workflow is:

1. **Export or import imagery** -> `export-imagery` or place tiles in `data/tiles/`
2. **Prepare labels** -> `generate-labels` or `import-labels`
3. **Download model weights** -> `download-models`
4. **Preprocess tiles** -> `preprocess --model prithvi`
5. **Generate embeddings** -> `generate-embeddings --model prithvi`
6. **Train classifiers** -> `train-classifier --model prithvi`
7. **Predict and mosaic** -> `predict` then `mosaic-tiles`
8. **Assess and compare** -> `assess-accuracy`, `compare-models`
9. **Ensemble** -> `ensemble`, `fuse-predictions`
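
Steps 4 to 6 repeat once per foundation model, so they are natural to script. The sketch below only builds and prints the command lines as a dry run. It assumes every step is a subcommand of the same `python -m src.pipeline` entry point used for `init`, which this notebook does not confirm.

```python
import shlex

# Assumption: every step is a subcommand of the same entry point as `init`.
ENTRY_POINT = ['python', '-m', 'src.pipeline']
PER_MODEL_STEPS = ['preprocess', 'generate-embeddings', 'train-classifier']


def build_commands(models):
    """Return the argv list for each per-model step, in pipeline order."""
    return [ENTRY_POINT + [step, '--model', model]
            for model in models
            for step in PER_MODEL_STEPS]


commands = build_commands(['prithvi'])
for argv in commands:
    print(shlex.join(argv))  # dry run: print instead of executing
```

To execute the steps rather than print them, each `argv` list could be passed to `subprocess.run(argv, check=True)` so that a failing step stops the sequence.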

See `02_model_comparison.ipynb` for a detailed comparison workflow.