# ModelSpec Example

This notebook shows how `ModelSpec` works with `models.yml` to curate and access model attributes.


## Overview

The `ModelSpec` class defines the complete specification for an ocean model configuration, including:

- **Templates**: Jinja2 template locations for compile-time and run-time configuration files
- **Settings**: Default settings and configuration files
- **Code**: Repository specifications for ROMS, MARBL, and associated code
- **Inputs**: Default specifications for grid, initial conditions, and forcing data
- **Datasets**: List of required source datasets

Model specifications are stored in `models.yml` and loaded using `load_models_yaml()`.


## Setup

Import the necessary modules and enable autoreload for development.


In [1]:
%load_ext autoreload
%autoreload 2

from pathlib import Path
from cson_forge import models, config


## Load ModelSpec

Load a model specification from `models.yml` using `load_models_yaml()`. This function takes the path to the YAML file and the model name.


In [2]:
# Load a model specification
model_name = "cson_roms-marbl_v0.1"
model_spec = models.load_models_yaml(config.paths.models_yaml, model_name)

print(f"Loaded ModelSpec: {model_spec.name}")
print(f"Type: {type(model_spec)}")


Loaded ModelSpec: cson_roms-marbl_v0.1
Type: <class 'cson_forge.models.ModelSpec'>


## Inspect ModelSpec Structure

The `ModelSpec` is a Pydantic model with several main components. Let's explore each one:


In [3]:
# View all ModelSpec attributes
print("ModelSpec attributes:")
# Use the class to access model_fields (not the instance) to avoid deprecation warning
for attr in model_spec.__class__.model_fields.keys():
    value = getattr(model_spec, attr)
    if isinstance(value, (list, dict)) and len(str(value)) > 100:
        print(f"  - {attr}: {type(value).__name__} (length: {len(value)})")
    else:
        print(f"  - {attr}: {value}")


ModelSpec attributes:
  - name: cson_roms-marbl_v0.1
  - templates: compile_time=CodeRepository(documentation='', locked=False, location='/Users/mclong/codes/cson-forge/cson_forge/model-configs/cson_roms-marbl_v0.1/templates/compile-time', commit='', branch='main', filter=PathFilter(directory='', files=['bgc.opt.j2', 'blk_frc.opt.j2', 'cdr_frc.opt.j2', 'cppdefs.opt.j2', 'diagnostics.opt.j2', 'ocean_vars.opt.j2', 'param.opt.j2', 'river_frc.opt.j2', 'surf_flux.opt.j2', 'tides.opt.j2', 'tracers.opt.j2', 'Makefile'])) run_time=CodeRepository(documentation='', locked=False, location='/Users/mclong/codes/cson-forge/cson_forge/model-configs/cson_roms-marbl_v0.1/templates/run-time', commit='', branch='main', filter=PathFilter(directory='', files=['roms.in.j2', 'marbl_in', 'marbl_tracer_output_list', 'marbl_diagnostic_output_list']))
  - settings: properties=PropertiesSpec(n_tracers=34) compile_time=SettingsStage(settings_dict={'bgc': {'wrt_his': True, 'output_period_his': 86400, 'nrpf_his': 7,



:::{important}
`datasets` is a derived property: the list of all source datasets used to configure this model.
:::

## Templates Specification

The `templates` field defines where Jinja2 templates are located for compile-time and run-time configuration files.


In [4]:
if model_spec.templates:
    print("Templates Specification:")
    print(f"  Compile-time location: {model_spec.templates.compile_time.location}")
    print(f"  Compile-time files: {model_spec.templates.compile_time.filter.files}")
    print(f"\n  Run-time location: {model_spec.templates.run_time.location}")
    print(f"  Run-time files: {model_spec.templates.run_time.filter.files}")
else:
    print("No templates specification")


Templates Specification:
  Compile-time location: /Users/mclong/codes/cson-forge/cson_forge/model-configs/cson_roms-marbl_v0.1/templates/compile-time
  Compile-time files: ['bgc.opt.j2', 'blk_frc.opt.j2', 'cdr_frc.opt.j2', 'cppdefs.opt.j2', 'diagnostics.opt.j2', 'ocean_vars.opt.j2', 'param.opt.j2', 'river_frc.opt.j2', 'surf_flux.opt.j2', 'tides.opt.j2', 'tracers.opt.j2', 'Makefile']

  Run-time location: /Users/mclong/codes/cson-forge/cson_forge/model-configs/cson_roms-marbl_v0.1/templates/run-time
  Run-time files: ['roms.in.j2', 'marbl_in', 'marbl_tracer_output_list', 'marbl_diagnostic_output_list']


:::{important}
Template files with a `.j2` extension are rendered by `settings.render_roms_settings`.

The template keys in these files must match the dictionaries in the `_default_config_yaml` files.

:::
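
The matching requirement above can be checked mechanically. The following is a minimal sketch, not the logic in `settings.render_roms_settings`: it uses only the standard library and a regular expression that recognizes plain `{{ name.attr }}` placeholders, whereas real Jinja2 templates may also contain filters, loops, and conditionals (the `jinja2.meta.find_undeclared_variables` helper handles those more faithfully). The function name `missing_template_keys` and the template text are hypothetical; the `bgc` entry mirrors a key visible in the settings output above.

```python
import re

# Matches the first identifier inside a `{{ ... }}` placeholder.
# Assumption: templates use simple `{{ section.key }}` references only.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)")


def missing_template_keys(template_text, defaults):
    """Return top-level placeholder names that `defaults` does not define."""
    used = set(PLACEHOLDER.findall(template_text))
    return sorted(used - set(defaults))


# Hypothetical template text and defaults, for illustration only.
template_text = "nrpf_his = {{ bgc.nrpf_his }}\nntides = {{ tides.ntides }}\n"
defaults = {"bgc": {"nrpf_his": 7}}

# `tides` is referenced by the template but absent from the defaults.
print(missing_template_keys(template_text, defaults))
```

Running a check like this before rendering would surface a mismatch as a short list of names rather than as a rendering error deep inside a template.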

## Settings Specification

The `settings` field defines default settings and configuration file locations for compile-time and run-time stages.


In [5]:
if model_spec.settings:
    print("Settings Specification:")
    print(f"  Properties: {model_spec.settings.properties}")
    print(f"  Compile-time defaults: {model_spec.settings.compile_time._default_config_yaml}")
    print(f"  Run-time defaults: {model_spec.settings.run_time._default_config_yaml}")
else:
    print("No settings specification")


Settings Specification:
  Properties: n_tracers=34
  Compile-time defaults: /Users/mclong/codes/cson-forge/cson_forge/model-configs/cson_roms-marbl_v0.1/templates/compile-time-defaults.yml
  Run-time defaults: /Users/mclong/codes/cson-forge/cson_forge/model-configs/cson_roms-marbl_v0.1/templates/run-time-defaults.yml


## Code Repository Specification

The `code` field defines the code repositories (ROMS, MARBL) and their locations, branches, and commits.


In [6]:
print("Code Repository Specification:")
print(f"  ROMS location: {model_spec.code.roms.location}")
print(f"  ROMS branch: {model_spec.code.roms.branch}")

if model_spec.code.marbl:
    print(f"  MARBL location: {model_spec.code.marbl.location}")
    print(f"  MARBL commit: {model_spec.code.marbl.commit}")
else:
    print("  MARBL: Not specified")


Code Repository Specification:
  ROMS location: https://github.com/CWorthy-ocean/ucla-roms.git
  ROMS branch: main
  MARBL location: https://github.com/marbl-ecosys/MARBL.git
  MARBL commit: marbl0.45.0
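
To illustrate how the `location`, `branch`, and `commit` fields could be turned into a reproducible checkout, here is a small sketch. It is an assumption about how such fields might be consumed, not a description of what `cson_forge` does: the helper `checkout_commands` is hypothetical, as is the rule that a non-empty `commit` takes precedence over `branch`. A `SimpleNamespace` stands in for the real repository object.

```python
from types import SimpleNamespace


def checkout_commands(repo, dest):
    """Build git commands that fetch `repo` and pin it to a commit or branch."""
    ref = repo.commit or repo.branch  # assumption: a commit takes precedence
    commands = [["git", "clone", repo.location, dest]]
    if ref:
        commands.append(["git", "-C", dest, "checkout", ref])
    return commands


# Stand-in for `model_spec.code.marbl`, using the values printed above.
# The empty branch is a placeholder; the notebook does not show that field.
marbl = SimpleNamespace(
    location="https://github.com/marbl-ecosys/MARBL.git",
    commit="marbl0.45.0",
    branch="",
)

for command in checkout_commands(marbl, "MARBL"):
    print(" ".join(command))
```

Returning the commands as lists rather than running them keeps the sketch free of side effects; they could be passed to `subprocess.run` if execution is wanted.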


## Inputs Specification

The `inputs` field defines default specifications for grid, initial conditions, and forcing data. These serve as defaults when generating inputs.


In [7]:
print("Inputs Specification:")
print("\nGrid:")
print(f"  Topography source: {model_spec.inputs.grid.topography_source}")

print("\nInitial Conditions:")
print(f"  Source: {model_spec.inputs.initial_conditions.source}")
if model_spec.inputs.initial_conditions.bgc_source:
    print(f"  BGC source: {model_spec.inputs.initial_conditions.bgc_source}")

print("\nForcing:")
if model_spec.inputs.forcing:
    if model_spec.inputs.forcing.surface:
        print(f"  Surface forcing ({len(model_spec.inputs.forcing.surface)} sources):")
        for i, surf in enumerate(model_spec.inputs.forcing.surface, 1):
            print(f"    {i}. {surf.source.name} ({surf.type})")
    
    if model_spec.inputs.forcing.boundary:
        print(f"  Boundary forcing ({len(model_spec.inputs.forcing.boundary)} sources):")
        for i, bnd in enumerate(model_spec.inputs.forcing.boundary, 1):
            print(f"    {i}. {bnd.source.name} ({bnd.type})")
    
    if model_spec.inputs.forcing.tidal:
        print(f"  Tidal forcing ({len(model_spec.inputs.forcing.tidal)} sources):")
        for i, tide in enumerate(model_spec.inputs.forcing.tidal, 1):
            print(f"    {i}. {tide.source.name} (ntides: {tide.ntides})")
    
    if model_spec.inputs.forcing.river:
        print(f"  River forcing ({len(model_spec.inputs.forcing.river)} sources):")
        for i, riv in enumerate(model_spec.inputs.forcing.river, 1):
            print(f"    {i}. {riv.source.name} (include_bgc: {riv.include_bgc})")


Inputs Specification:

Grid:
  Topography source: ETOPO5

Initial Conditions:
  Source: name='GLORYS' climatology=False
  BGC source: name='UNIFIED' climatology=True

Forcing:
  Surface forcing (2 sources):
    1. ERA5 (physics)
    2. UNIFIED (bgc)
  Boundary forcing (2 sources):
    1. GLORYS (physics)
    2. UNIFIED (bgc)
  Tidal forcing (1 sources):
    1. TPXO (ntides: 15)
  River forcing (1 sources):
    1. DAI (include_bgc: True)
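
The four forcing categories in the cell above follow the same pattern, so the summary can also be driven by a small lookup table. This is a sketch under stated assumptions: `SimpleNamespace` objects stand in for the real forcing entries, populated with the values from the output above, and `DETAIL_FIELD` and `summarize_forcing` are hypothetical names introduced here.

```python
from types import SimpleNamespace as NS

# Stand-in for `model_spec.inputs.forcing`, populated from the output above.
forcing = NS(
    surface=[NS(source=NS(name="ERA5"), type="physics"),
             NS(source=NS(name="UNIFIED"), type="bgc")],
    boundary=[NS(source=NS(name="GLORYS"), type="physics"),
              NS(source=NS(name="UNIFIED"), type="bgc")],
    tidal=[NS(source=NS(name="TPXO"), ntides=15)],
    river=[NS(source=NS(name="DAI"), include_bgc=True)],
)

# Forcing category -> attribute to show next to each source name.
DETAIL_FIELD = {
    "surface": "type",
    "boundary": "type",
    "tidal": "ntides",
    "river": "include_bgc",
}


def summarize_forcing(forcing):
    """Return one summary line per forcing entry, skipping empty categories."""
    lines = []
    for category, detail in DETAIL_FIELD.items():
        for entry in getattr(forcing, category, None) or []:
            value = getattr(entry, detail)
            lines.append(f"{category}: {entry.source.name} ({detail}={value})")
    return lines


print("\n".join(summarize_forcing(forcing)))
```

Adding a new forcing category then means adding one entry to the table instead of another copy of the loop.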


## Required Datasets

The `datasets` field lists all source datasets required by this model configuration. These are derived from the inputs specification.


In [8]:
print("Required Datasets:")
print(f"  Total: {len(model_spec.datasets)}")
for i, dataset in enumerate(model_spec.datasets, 1):
    print(f"    {i}. {dataset}")


Required Datasets:
  Total: 4
    1. ERA5
    2. GLORYS_REGIONAL
    3. TPXO
    4. UNIFIED_BGC
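
As a rough illustration of what "derived from the inputs specification" can mean, the sketch below computes a dataset list as a read-only property instead of storing it. It is not the `cson_forge` implementation: it uses standard-library dataclasses in place of Pydantic, the `Source` and `Inputs` classes are hypothetical, and it simply collects unique source names. The real property evidently applies further naming rules (for example, `GLORYS` is reported as `GLORYS_REGIONAL` above), which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    name: str


@dataclass
class Inputs:
    initial_conditions: list = field(default_factory=list)
    surface: list = field(default_factory=list)
    boundary: list = field(default_factory=list)

    @property
    def datasets(self):
        """Unique source names across all input groups, in sorted order."""
        groups = (self.initial_conditions, self.surface, self.boundary)
        return sorted({source.name for group in groups for source in group})


inputs = Inputs(
    initial_conditions=[Source("GLORYS"), Source("UNIFIED")],
    surface=[Source("ERA5"), Source("UNIFIED")],
    boundary=[Source("GLORYS"), Source("UNIFIED")],
)

# Recomputed from the inputs on every access, so it cannot drift out of sync.
print(inputs.datasets)
```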


## Accessing Nested Fields

You can access nested fields using dot notation. Here are some examples:


In [9]:
# Examples of accessing nested fields
print("Example field access:")
print(f"  ROMS repository URL: {model_spec.code.roms.location}")
print(f"  Number of tracers: {model_spec.settings.properties.n_tracers}")
print(f"  First surface forcing source: {model_spec.inputs.forcing.surface[0].source.name}")
print(f"  Grid topography source: {model_spec.inputs.grid.topography_source}")


Example field access:
  ROMS repository URL: https://github.com/CWorthy-ocean/ucla-roms.git
  Number of tracers: 34
  First surface forcing source: ERA5
  Grid topography source: ETOPO5
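
Dot notation raises `AttributeError` when an optional component such as `code.marbl` is `None`, which is why earlier cells guard with `if`. A small helper can make such lookups tolerant. This is a sketch with a hypothetical helper name (`dig`) and a `SimpleNamespace` stand-in for the spec; it handles attribute paths only, not list indices such as `surface[0]`.

```python
from functools import reduce
from types import SimpleNamespace


def dig(obj, path, default=None):
    """Follow a dotted attribute path; return `default` if any step is missing or None."""
    def step(current, name):
        return None if current is None else getattr(current, name, None)

    result = reduce(step, path.split("."), obj)
    return default if result is None else result


# Stand-in spec in which MARBL is deliberately left unconfigured.
spec = SimpleNamespace(
    code=SimpleNamespace(
        roms=SimpleNamespace(location="https://github.com/CWorthy-ocean/ucla-roms.git"),
        marbl=None,
    )
)

print(dig(spec, "code.roms.location"))
print(dig(spec, "code.marbl.commit", default="not specified"))
```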


## ModelSpec as Dictionary

You can convert the `ModelSpec` to a dictionary for inspection or serialization:


In [10]:
# Convert to dictionary (Pydantic model_dump)
model_dict = model_spec.model_dump()

print("ModelSpec as dictionary (top-level keys):")
for key in model_dict.keys():
    print(f"  - {key}")

# model_dump_json() serializes directly to a JSON string; no `import json` is needed
# json_str = model_spec.model_dump_json(indent=2)


ModelSpec as dictionary (top-level keys):
  - name
  - templates
  - settings
  - code
  - inputs
  - datasets
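
Once the spec is a plain dictionary, ordinary Python tools apply to it. As one example of inspection, the sketch below flattens a nested dictionary into dotted paths. The `flatten` helper is hypothetical, and `model_dict` here is a small hand-written excerpt shaped like the `model_dump()` result, using values shown earlier in this notebook, not the full dump.

```python
def flatten(mapping, prefix=""):
    """Yield (dotted_path, value) pairs for every leaf of a nested dict."""
    for key, value in mapping.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            yield from flatten(value, path)
        else:
            yield path, value


# Hand-written excerpt shaped like `model_spec.model_dump()`.
model_dict = {
    "name": "cson_roms-marbl_v0.1",
    "code": {
        "roms": {
            "location": "https://github.com/CWorthy-ocean/ucla-roms.git",
            "branch": "main",
        }
    },
    "settings": {"properties": {"n_tracers": 34}},
}

for path, value in flatten(model_dict):
    print(f"{path} = {value}")
```

The dotted paths line up with the attribute access used on the `ModelSpec` itself, which makes the flattened view convenient for comparing two configurations.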


## Summary

The `ModelSpec` provides a structured, validated representation of model configurations:

- **Type-safe**: Pydantic models provide validation and type checking
- **Accessible**: Use dot notation to access nested fields
- **Serializable**: Convert to dict/JSON for storage or inspection
- **Complete**: Contains all information needed to configure and build a model

This specification is used by `CstarSpecBuilder` to configure model builds and input generation.
