# Configuration
 
This tutorial covers the configuration system for the displacement calibration workflow.

**Philosophy**: The config system uses Pydantic for validation at serialization boundaries. This means:
- Type-safe configurations with automatic validation
- Clear error messages when something is wrong
- YAML files that map directly to Python objects
- No scattered validation checks throughout the codebase

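As a rough illustration of what "validation at the boundary" means, the sketch below uses a plain standard-library dataclass rather than Pydantic. `WorkerSketch` and `from_mapping` are hypothetical names and are not part of `cal_disp`. The point is only that raw input is checked once, when it is converted into a typed object, so later code can trust the fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerSketch:
    """Hypothetical stand-in for a validated settings object."""

    n_workers: int
    threads_per_worker: int

    @classmethod
    def from_mapping(cls, raw: dict) -> "WorkerSketch":
        """Validate untrusted input once, at the boundary."""
        errors = []
        for name in ("n_workers", "threads_per_worker"):
            value = raw.get(name)
            # bool is a subclass of int, so reject it explicitly.
            if not isinstance(value, int) or isinstance(value, bool):
                errors.append(f"{name}: expected an integer, got {value!r}")
            elif value < 1:
                errors.append(f"{name}: must be at least 1, got {value}")
        if errors:
            raise ValueError("; ".join(errors))
        return cls(raw["n_workers"], raw["threads_per_worker"])


good = WorkerSketch.from_mapping({"n_workers": 4, "threads_per_worker": 2})
print(good)

try:
    WorkerSketch.from_mapping({"n_workers": "not a number", "threads_per_worker": 2})
except ValueError as exc:
    print(f"Rejected at the boundary: {exc}")
```

Pydantic derives this kind of check from the type annotations, which is why the hand-written loop above is not needed in the real config classes.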
## Quick Start: Minimal Configuration

Let's start with the simplest possible config:

In [1]:
from pathlib import Path

from cal_disp.config.workflow import CalibrationWorkflow

# Create minimal config - just directories
config = CalibrationWorkflow.create_minimal()
print(config.summary())

Calibration Workflow Configuration

Directories:
  Work directory:   /u/aurora-r0/govorcin/01_OPERA/CAL/cal-disp/work
  Output directory: /u/aurora-r0/govorcin/01_OPERA/CAL/cal-disp/output
  Log file:         /u/aurora-r0/govorcin/01_OPERA/CAL/cal-disp/work/cal_disp.log
  Keep relative:    False

Input Files:
  Not configured!

Worker Settings:
  Workers:          4
  Threads/worker:   2
  Total threads:    8
  Block shape:      (128, 128)

Workflow Status: NOT READY
Errors:
  - input_options must be provided
  - dynamic_ancillary_options must be provided


**Note**: This config isn't ready to run yet. As the errors above show, it still needs `input_options` and `dynamic_ancillary_options`.

## Step 1: Adding Required Input Files

Every workflow needs:
1. A displacement file (DISP-S1 product)
2. A calibration reference grid
3. Dynamic ancillary files (DEM, LOS, masks, etc.)

In [None]:
from cal_disp.config import DynamicAncillaryFileGroup, InputFileGroup

# Set up input files
config.input_options = InputFileGroup(
    disp_file=Path("data/OPERA_L3_DISP-S1_T001-000001-IFG_20240101_20240113_v1.0.nc"),
    unr_grid_latlon_file=Path("data/grid_latlon_lookup_v0.2.txt"),
    unr_timeseries_dir=Path("data/unr_grid/"),
    frame_id=1,
    unr_grid_version="0.2",
    unr_grid_type='constant',
)

# Set up dynamic ancillaries
config.dynamic_ancillary_options = DynamicAncillaryFileGroup(
    algorithm_parameters_file=Path("configs/algorithm_parameters.yaml"),
    static_dem_file=Path("data/dem.tif"),
    static_los_file=Path("data/line_of_sight_enu.tif"),
)

# Check if ready to run
status = config.validate_ready_to_run()
print(f"Ready to run: {status['ready']}")
if status['errors']:
    print("Errors:")
    for error in status['errors']:
        print(f"  - {error}")

Ready to run: True

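The status dictionary used above carries a `ready` flag and an `errors` list. The sketch below shows one way such a check could be assembled. `check_ready` is a hypothetical helper, not the actual `validate_ready_to_run()` implementation, and it borrows its two error strings from the summary output shown earlier.

```python
from typing import Optional


def check_ready(
    input_options: Optional[dict],
    dynamic_ancillary_options: Optional[dict],
) -> dict:
    """Collect every problem instead of stopping at the first one."""
    errors = []
    if input_options is None:
        errors.append("input_options must be provided")
    if dynamic_ancillary_options is None:
        errors.append("dynamic_ancillary_options must be provided")
    # The config is ready only when no errors were collected.
    return {"ready": not errors, "errors": errors}


print(check_ready(None, None))
print(check_ready({"frame_id": 1}, {"static_dem_file": "data/dem.tif"}))
```

Returning all errors at once lets a user fix the whole config in one pass rather than rerunning after each failure.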

## Step 2: Working with YAML Files

Configs are typically stored as YAML files. Here's the round-trip:

In [3]:
# Save to YAML
yaml_path = Path("workflow_config.yaml")
config.to_yaml(yaml_path)
print(f"Saved to {yaml_path}")

# View the YAML
print("\nYAML contents:")
print(yaml_path.read_text())

Saved to workflow_config.yaml

YAML contents:
# Configuration for required input files. Must be provided before running workflow.
#   Type: None | None.
input_options:
  disp_file: data/OPERA_L3_DISP-S1_T001-000001-IFG_20240101_20240113_v1.0.nc
  calibration_reference_latlon_file: data/unr3/grid_latlon_lookup_v0.2.txt
  calibration_reference_grid_dir: data/unr3
  frame_id: 8882
# Directory for intermediate processing files. Created if it doesn't exist.
#   Type: string.
work_directory: /u/aurora-r0/govorcin/01_OPERA/CAL/cal-disp/work
# Directory for final output files. Created if it doesn't exist.
#   Type: string.
output_directory: /u/aurora-r0/govorcin/01_OPERA/CAL/cal-disp/output
# If False, resolve all relative paths to absolute paths. If True, keep paths as provided.
#   Type: boolean.
keep_paths_relative: false
# Dynamic ancillary files (dem, los, masks, troposphere, etc.).
#   Type: None | None.
dynamic_ancillary_options:
  algorithm_parameters_file: configs/algorithm_parameters

In [4]:
# Load from YAML
loaded_config = CalibrationWorkflow.from_yaml(yaml_path)
print(loaded_config.summary())

Calibration Workflow Configuration

Directories:
  Work directory:   /u/aurora-r0/govorcin/01_OPERA/CAL/cal-disp/work
  Output directory: /u/aurora-r0/govorcin/01_OPERA/CAL/cal-disp/output
  Log file:         /u/aurora-r0/govorcin/01_OPERA/CAL/cal-disp/work/cal_disp.log
  Keep relative:    False

Input Files:
  DISP file:        data/OPERA_L3_DISP-S1_T001-000001-IFG_20240101_20240113_v1.0.nc
  Frame ID:         8882
  UNR lookup:       data/unr3/grid_latlon_lookup_v0.2.txt
  UNR grid dir:     data/unr3

Worker Settings:
  Workers:          4
  Threads/worker:   2
  Total threads:    8
  Block shape:      (128, 128)

Dynamic Ancillary Files:
  algorithm_parameters_file: configs/algorithm_parameters.yaml
  los_file: data/static_input/OPERA_L3_DISP-S1-STATIC_F08882_20140403_S1A_v1.0_line_of_sight_enu.tif
  dem_file: data/static_input/OPERA_L3_DISP-S1-STATIC_F08882_20140403_S1A_v1.0_dem.tif

Workflow Status: READY

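The property worth checking after any save and load is that the loaded object equals the original. The sketch below demonstrates that round-trip with JSON from the standard library standing in for YAML. `InputSketch`, `save`, and `load` are hypothetical names, and the internals of `to_yaml()` and `from_yaml()` are not shown in this tutorial.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class InputSketch:
    """Hypothetical stand-in for a group of input options."""

    disp_file: Path
    frame_id: int


def save(cfg: InputSketch, path: Path) -> None:
    # Path objects are not JSON-serializable, so store them as strings.
    payload = {
        key: str(value) if isinstance(value, Path) else value
        for key, value in asdict(cfg).items()
    }
    path.write_text(json.dumps(payload, indent=2))


def load(path: Path) -> InputSketch:
    raw = json.loads(path.read_text())
    # Convert back to the typed form while loading.
    return InputSketch(disp_file=Path(raw["disp_file"]), frame_id=int(raw["frame_id"]))


original = InputSketch(disp_file=Path("data/disp.nc"), frame_id=1)
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "config.json"
    save(original, target)
    restored = load(target)

print(restored == original)
```

If the comparison fails after a round-trip, a field is being dropped or changing type during serialization.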


## Step 3: Worker Configuration

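Worker settings control parallelism (`n_workers`, `threads_per_worker`) and the processing block size (`block_shape`), which affects memory usage: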

In [5]:
from cal_disp.config import WorkerSettings

# Default settings (auto-detect CPU count)
config.worker_settings = WorkerSettings.create_standard()
print(f"Workers: {config.worker_settings.n_workers}")
print(f"Threads per worker: {config.worker_settings.threads_per_worker}")
print(f"Total threads: {config.worker_settings.total_threads}")

# Custom settings for a specific machine
config.worker_settings = WorkerSettings(
    n_workers=4,
    threads_per_worker=2,
    block_shape=(512, 512),
)
print(f"\nCustom: {config.worker_settings.total_threads} total threads")

Workers: 4
Threads per worker: 2
Total threads: 8

Custom: 8 total threads

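The total thread count in the output above is the product of workers and threads per worker (4 × 2 = 8). The sketch below repeats that arithmetic and adds a rough memory estimate for one block. The 4 bytes per value (float32) is an assumption, the helper names are hypothetical, and the data type and number of blocks held in memory by the real workflow are not stated here.

```python
def total_threads(n_workers: int, threads_per_worker: int) -> int:
    """Total threads across all workers."""
    return n_workers * threads_per_worker


def block_memory_mib(block_shape: tuple, bytes_per_value: int = 4) -> float:
    """Memory of one block in MiB, assuming `bytes_per_value` bytes per pixel."""
    rows, cols = block_shape
    return rows * cols * bytes_per_value / 2**20


threads = total_threads(4, 2)
per_block = block_memory_mib((512, 512))
print(f"{threads} threads, {per_block:.2f} MiB per block")
print(f"Roughly {threads * per_block:.0f} MiB if every thread holds one block")
```

Under the same assumption, the default `(128, 128)` block is 16 times smaller than a `(512, 512)` block, so larger blocks trade memory for fewer, bigger chunks of work.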

## Step 4: File Validation

Check which files exist before running:

In [7]:
# Check all input files
file_status = config.validate_input_files_exist()

print("File validation results:")
for filename, info in file_status.items():
    status = "✓" if info['exists'] else "✗"
    print(f"  {status} {filename}: {info['exists']}")

# Get just the missing files
missing = config.get_missing_files()
if missing:
    print(f"\nMissing files: {', '.join(missing)}")
else:
    print("\nAll files exist!")

File validation results:
  ✗ disp_file: False
  ✓ calibration_reference_latlon_file: True
  ✓ calibration_reference_grid_dir: True
  ✓ algorithm_parameters_file: True
  ✓ los_file: True
  ✓ dem_file: True

Missing files: disp_file

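A check like this can be built on `pathlib` alone. The following sketch uses a hypothetical `check_files` helper and temporary placeholder files. It only mirrors the shape of the result used above (a label mapped to a dictionary with an `exists` key) and is not the library's implementation.

```python
import tempfile
from pathlib import Path


def check_files(files: dict) -> dict:
    """Map each label to {'path': ..., 'exists': ...} without raising."""
    return {
        label: {"path": Path(p), "exists": Path(p).exists()}
        for label, p in files.items()
    }


with tempfile.TemporaryDirectory() as tmp:
    dem = Path(tmp) / "dem.tif"
    dem.touch()  # create an empty placeholder file
    status = check_files({"dem_file": dem, "disp_file": Path(tmp) / "missing.nc"})

missing = [label for label, info in status.items() if not info["exists"]]
print(f"Missing files: {', '.join(missing)}")
```

Reporting existence as data rather than raising on the first missing file makes it possible to list every missing input in one message.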

## Step 5: Static Ancillary Files (Optional)

Static ancillary files cover algorithm overrides, custom databases, and similar inputs:

In [None]:
from cal_disp.config import StaticAncillaryFileGroup

config.static_ancillary_options = StaticAncillaryFileGroup(
    algorithm_parameters_overrides_json=Path("data/custom_params.json"),
    # Add other static files as needed
)

print(config.summary())

## Step 6: Directory Management

Create output directories and set up logging:

In [9]:
import logging

# Create directories
config.create_directories()
print(f"Work directory: {config.work_directory}")
print(f"Output directory: {config.output_directory}")
print(f"Log file: {config.log_file}")

# Set up logging
logger = config.setup_logging(level=logging.INFO)
logger.info("Workflow initialized")

2026-01-05 14:10:57,841 - cal_disp - INFO - Workflow initialized


Work directory: /u/aurora-r0/govorcin/01_OPERA/CAL/cal-disp/work
Output directory: /u/aurora-r0/govorcin/01_OPERA/CAL/cal-disp/output
Log file: /u/aurora-r0/govorcin/01_OPERA/CAL/cal-disp/work/cal_disp.log

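For readers who want to see the moving parts, here is a standard-library sketch of directory creation plus file logging. The logger name `config_sketch` and the file name `sketch.log` are made up, and the format string is chosen to resemble the log line shown above. What `create_directories()` and `setup_logging()` do internally is an assumption.

```python
import logging
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    work_dir = Path(tmp) / "processing" / "work"
    # parents=True creates intermediate directories; exist_ok=True makes reruns safe.
    work_dir.mkdir(parents=True, exist_ok=True)
    log_file = work_dir / "sketch.log"

    logger = logging.getLogger("config_sketch")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_file)
    handler.setFormatter(
        logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
    )
    logger.addHandler(handler)

    logger.info("Workflow initialized")

    # Close the handler so the file is flushed before reading it back.
    handler.close()
    logger.removeHandler(handler)
    log_text = log_file.read_text()

print(log_text.strip())
```

Creating the work directory before attaching the file handler matters, because `logging.FileHandler` fails if the parent directory does not exist.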

## Complete Example: From Scratch

Putting it all together:

In [None]:
# Create a complete config in one go
complete_config = CalibrationWorkflow(
    work_directory=Path("./processing/work"),
    output_directory=Path("./processing/output"),
    input_options=InputFileGroup(
        disp_file=Path("data/disp.nc"),
        unr_grid_latlon_file=Path('latlon.txt'),
        unr_timeseries_dir=Path("data/"),
        frame_id=1,
        unr_grid_version="0.2",
        unr_grid_type="constant",
    ),
    dynamic_ancillary_options=DynamicAncillaryFileGroup(
        algorithm_parameters_file=Path('algorithm.yaml'),
        static_dem_file=Path("data/dem.tif"),
        static_los_file=Path("data/los_enu.tif"),
    ),
    worker_settings=WorkerSettings(
        n_workers=4,
        threads_per_worker=2,
    ),
    keep_paths_relative=True,  # Keep relative for portability
)

# Save and verify
complete_config.to_yaml("production_config.yaml")
print(complete_config.summary())

## Path Resolution

By default, relative paths are resolved to absolute paths. Set `keep_paths_relative=True` to keep them as provided:

In [None]:
# Absolute paths (default)
abs_config = CalibrationWorkflow(
    work_directory=Path("./work"),
    output_directory=Path("./output"),
    keep_paths_relative=False,
)
print(f"Work dir: {abs_config.work_directory}")

# Relative paths (for portability)
rel_config = CalibrationWorkflow(
    work_directory=Path("./work"),
    output_directory=Path("./output"),
    keep_paths_relative=True,
)
print(f"Work dir: {rel_config.work_directory}")

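The difference between the two modes can be reproduced with `pathlib` alone. In this sketch, `normalize` is a hypothetical helper that mimics the documented behavior of `keep_paths_relative`. It is not the library's code.

```python
from pathlib import Path


def normalize(path: Path, keep_relative: bool) -> Path:
    """Return the path unchanged, or resolve it against the current directory."""
    return path if keep_relative else path.resolve()


relative = normalize(Path("./work"), keep_relative=True)
absolute = normalize(Path("./work"), keep_relative=False)

print(relative, relative.is_absolute())
print(absolute.is_absolute())
```

A resolved path depends on the directory the code runs from, which is why a config saved with absolute paths is tied to one machine while relative paths stay portable.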
## Error Handling

The config system validates at creation time, so an invalid value raises an error as soon as the object is constructed:

In [None]:
from pydantic import ValidationError

try:
    # This will fail - invalid type for n_workers
    bad_config = CalibrationWorkflow(
        worker_settings=WorkerSettings(n_workers="not a number")
    )
except ValidationError as e:
    # Pydantic reports which field failed and why
    print(f"Validation error: {e}")

In [None]:
# Check readiness before running
incomplete_config = CalibrationWorkflow.create_minimal()
status = incomplete_config.validate_ready_to_run()

if not status['ready']:
    print("Not ready to run!")
    print("Errors:")
    for error in status['errors']:
        print(f"  - {error}")
else:
    print("Ready to run!")

## Command-Line Usage

Typical workflow from the command line:

```bash
# Create a template config
python -c "from cal_disp.config import CalibrationWorkflow; \
           CalibrationWorkflow.create_example().to_yaml('config.yaml')"

# Edit config.yaml with your paths
vim config.yaml

# Run the workflow
cal-disp run config.yaml
```

## Tips and Best Practices

1. **Start with `create_minimal()` or `create_example()`** - don't build configs from scratch
2. **Use relative paths** when configs need to be portable across machines
3. **Always call `validate_ready_to_run()`** before starting a long job
4. **Set up logging early** with `setup_logging()` to catch issues
5. **Version control your YAML configs** - they're small and readable
6. **Use `summary()` to sanity-check** your configuration before running

## Common Patterns

### Pattern 1: Load, Modify, Save

```python
from cal_disp.config import CalibrationWorkflow

# Load existing config
config = CalibrationWorkflow.from_yaml("config.yaml")

# Modify one thing
config.worker_settings.n_workers = 8

# Save as new config
config.to_yaml("config_8workers.yaml")
```

### Pattern 2: Batch Processing

```python
from pathlib import Path

from cal_disp.config import CalibrationWorkflow

# Create configs for multiple frames
base_config = CalibrationWorkflow.from_yaml("base_config.yaml")

for frame_id in [1, 2, 3, 4]:
    config = base_config.model_copy(deep=True)
    config.input_options.frame_id = frame_id
    config.input_options.disp_file = Path(f"data/frame_{frame_id}.nc")
    config.output_directory = Path(f"output/frame_{frame_id}")
    config.to_yaml(f"config_frame_{frame_id}.yaml")
```
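The `deep=True` argument matters because each config holds nested objects such as `input_options`. The sketch below shows the underlying behavior with the standard `copy` module and two hypothetical dataclasses. It illustrates shallow versus deep copying in general rather than reproducing Pydantic's `model_copy`.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Inputs:
    frame_id: int = 0


@dataclass
class Workflow:
    input_options: Inputs = field(default_factory=Inputs)


base = Workflow()

shallow = copy.copy(base)
shallow.input_options.frame_id = 1  # also changes base: the nested object is shared
print(base.input_options.frame_id)

base = Workflow()
deep = copy.deepcopy(base)
deep.input_options.frame_id = 2  # base is untouched
print(base.input_options.frame_id)
```

With a shallow copy, every frame's config would end up pointing at the same nested object, so later iterations would overwrite earlier ones in memory.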

### Pattern 3: Conditional Configuration

```python
import os

from cal_disp.config import WorkerSettings

# Different settings for local vs HPC (`config` is an existing CalibrationWorkflow)
if os.getenv("SLURM_JOB_ID"):
    # On HPC cluster
    config.worker_settings = WorkerSettings(
        n_workers=int(os.getenv("SLURM_CPUS_PER_TASK", "16")),
        threads_per_worker=1,
    )
else:
    # Local machine
    config.worker_settings = WorkerSettings.create_standard()
```
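To make this kind of branching testable without a cluster, the environment can be passed in as a plain mapping. The sketch below does that with a hypothetical `pick_workers` helper that returns a dictionary rather than a `WorkerSettings` object. The fallback of 16 is taken from the snippet above, and the local branch is an assumption, not what `create_standard()` does.

```python
import os
from typing import Mapping, Optional


def pick_workers(env: Mapping[str, str], local_default: Optional[int] = None) -> dict:
    """Choose worker settings from a mapping of environment variables."""
    if env.get("SLURM_JOB_ID"):
        # Inside a Slurm job: one thread per worker, one worker per allocated CPU.
        return {
            "n_workers": int(env.get("SLURM_CPUS_PER_TASK", "16")),
            "threads_per_worker": 1,
        }
    # Local machine: fall back to the detected CPU count (may be None on some platforms).
    n_workers = local_default or os.cpu_count() or 1
    return {"n_workers": n_workers, "threads_per_worker": 1}


print(pick_workers({"SLURM_JOB_ID": "12345", "SLURM_CPUS_PER_TASK": "32"}))
print(pick_workers({}, local_default=4))
print(pick_workers(os.environ))
```

Slurm only sets `SLURM_CPUS_PER_TASK` when `--cpus-per-task` is requested, so the fallback value is used more often than one might expect.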

## Next Steps

- See `docs/tutorials/workflow.ipynb` for running the full calibration workflow
- Check the [API docs](https://cal-disp.readthedocs.io/) for all available options
- Look at `examples/` directory for real-world configurations