# Blueprint Catalog API

The `catalog` module provides programmatic access to blueprint information stored in the blueprints directory. You can use it to discover, query, and load the blueprint data needed to instantiate `OcnModel` objects.


## Overview

The `BlueprintCatalog` class provides methods to:

- Discover blueprint files in the blueprints directory (pattern: `B_*.yml`)
- Load individual blueprint YAML files
- Extract grid parameters from grid YAML files (`_grid.yml`)
- Load all blueprints into a pandas DataFrame with extracted model/grid names, dates, partitioning, and paths
- Filter blueprints by stage (preconfig, postconfig, build, run)
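
The discovery and stage-filtering steps listed above can be sketched with the standard library alone. This is a minimal illustration, not the actual `BlueprintCatalog` implementation: `find_blueprints_sketch` is a hypothetical helper, and it assumes that the stage appears as the filename suffix (`B_<name>_<stage>.yml`) and that blueprints may sit in subdirectories of the root.

```python
from pathlib import Path
import tempfile

# Stage names taken from the list above; the helper itself is hypothetical.
STAGES = ("preconfig", "postconfig", "build", "run")


def find_blueprints_sketch(root, stage=None):
    """Return sorted B_*.yml paths under root, optionally limited to one stage."""
    if stage is not None and stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    # Assumption: the stage is encoded as a filename suffix, e.g. B_<name>_postconfig.yml
    pattern = "B_*.yml" if stage is None else f"B_*_{stage}.yml"
    return sorted(Path(root).rglob(pattern))


# Demonstrate on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    bp_dir = Path(tmp) / "demo-blueprint"
    bp_dir.mkdir()
    for name in ("B_demo_preconfig.yml", "B_demo_postconfig.yml", "_grid.yml"):
        (bp_dir / name).touch()

    all_names = [p.name for p in find_blueprints_sketch(tmp)]
    postconfig_names = [p.name for p in find_blueprints_sketch(tmp, stage="postconfig")]

print(all_names)         # _grid.yml is not matched by the B_*.yml pattern
print(postconfig_names)
```

Rejecting unknown stage names early is a design choice of this sketch; the real method may handle them differently.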


## Basic Usage

The module provides a convenience instance, `blueprint` (accessed as `catalog.blueprint`), that you can use directly:


In [3]:
%load_ext autoreload
%autoreload 2



In [4]:
from cson_forge import catalog

### Finding Blueprint Files

You can find all blueprint files in the blueprints directory:


In [5]:
# Find all blueprint files (defaults to all stages)
blueprint_files = catalog.blueprint.find_blueprint_files()
print(f"Found {len(blueprint_files)} blueprint files:")
for bp_file in blueprint_files[:10]:  # Show first 10
    print(f"  - {bp_file.name}")

# You can also filter by stage
postconfig_files = catalog.blueprint.find_blueprint_files(stage="postconfig")
print(f"\nFound {len(postconfig_files)} postconfig blueprint files")


Found 13 blueprint files:
  - B_cson_roms-marbl_v0.1_ccs-12km_build.yml
  - B_cson_roms-marbl_v0.1_ccs-12km_postconfig.yml
  - B_cson_roms-marbl_v0.1_ccs-12km_preconfig.yml
  - B_cson_roms-marbl_v0.1_gulf-guinea-toy_build.yml
  - B_cson_roms-marbl_v0.1_gulf-guinea-toy_postconfig.yml
  - B_cson_roms-marbl_v0.1_gulf-guinea-toy_preconfig.yml
  - B_cson_roms-marbl_v0.1_hvalfjörður-0_preconfig.yml
  - B_cson_roms-marbl_v0.1_test-tiny_build.yml
  - B_cson_roms-marbl_v0.1_test-tiny_postconfig.yml
  - B_cson_roms-marbl_v0.1_test-tiny_preconfig.yml

Found 3 postconfig blueprint files


### Loading a Single Blueprint

You can load and inspect a single blueprint file:


In [6]:
# Load a single blueprint
if blueprint_files:
    bp_data = catalog.blueprint.load_blueprint(blueprint_files[0])
    blueprint_name = bp_data.get('name', '')
    # Note: _extract_model_and_grid_name is a private helper (leading underscore) and may change
    model_name, grid_name = catalog.blueprint._extract_model_and_grid_name(blueprint_name)
    partitioning = bp_data.get('partitioning', {})
    
    print(f"Blueprint name: {blueprint_name}")
    print(f"Model name: {model_name}")
    print(f"Grid name: {grid_name}")
    print(f"Description: {bp_data.get('description')}")
    print(f"Start time: {bp_data.get('valid_start_date')}")
    print(f"End time: {bp_data.get('valid_end_date')}")
    if isinstance(partitioning, dict):
        print(f"Processors: {partitioning.get('n_procs_x')} x {partitioning.get('n_procs_y')}")


Blueprint name: cson_roms-marbl_v0.1_ccs-12km
Model name: cson_roms-marbl_v0.1
Grid name: ccs-12km
Description: California Current System
Start time: 2024-01-01T00:00:00
End time: 2024-01-02T00:00:00
Processors: 16 x 20
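
The output above suggests that a blueprint name has the form `<model_name>_<grid_name>`. One way such a split could work is sketched below. `split_blueprint_name` is a hypothetical stand-in, not the library's private helper, and it assumes the grid name is everything after the last underscore, so it would fail for grid names that themselves contain underscores.

```python
def split_blueprint_name(blueprint_name):
    """Split '<model>_<grid>' on the last underscore (hypothetical helper)."""
    model_name, sep, grid_name = blueprint_name.rpartition("_")
    if not sep or not model_name or not grid_name:
        raise ValueError(f"cannot split blueprint name: {blueprint_name!r}")
    return model_name, grid_name


model_name, grid_name = split_blueprint_name("cson_roms-marbl_v0.1_ccs-12km")
print(model_name)  # cson_roms-marbl_v0.1
print(grid_name)   # ccs-12km
```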


### Loading Grid Parameters

You can extract grid keyword arguments from a grid YAML file:


In [7]:
# Load grid kwargs from the _grid.yml file next to a blueprint
# Grid YAML files are typically in the same directory as the blueprint
if blueprint_files:
    bp_file = blueprint_files[0]

    # Look for _grid.yml in the same directory as the blueprint
    grid_yaml_path = bp_file.parent / "_grid.yml"
    
    if grid_yaml_path.exists():
        try:
            grid_kwargs = catalog.blueprint.load_grid_kwargs(grid_yaml_path)
            print("Grid parameters:")
            for key, value in grid_kwargs.items():
                print(f"  {key}: {value}")
        except Exception as e:
            print(f"Could not load grid kwargs: {e}")
    else:
        print(f"Grid YAML file not found at {grid_yaml_path}")
        print("Grid parameters may be available in the DataFrame after calling load()")


Grid parameters:
  nx: 224
  ny: 440
  size_x: 2688
  size_y: 5280
  center_lon: -134.5
  center_lat: 39.6
  rot: 33.3
  N: 100
  theta_s: 6.0
  theta_b: 6.0
  hc: 250
  topography_source: {'name': 'ETOPO5'}
  mask_shapefile: None
  hmin: 5.0
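
As a quick sanity check on these values, the nominal horizontal grid spacing can be derived from the domain size and the number of grid points. The sketch below assumes that `size_x` and `size_y` are domain extents in kilometres and that spacing is simply extent divided by point count; neither is stated in this notebook, although the result is consistent with the grid name `ccs-12km`.

```python
# Values copied from the grid parameters printed above
grid_kwargs = {"nx": 224, "ny": 440, "size_x": 2688, "size_y": 5280}

# Assumption: size_x / size_y are domain extents in km
dx = grid_kwargs["size_x"] / grid_kwargs["nx"]
dy = grid_kwargs["size_y"] / grid_kwargs["ny"]

print(f"Nominal spacing: {dx} x {dy}")  # Nominal spacing: 12.0 x 12.0
```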


## Loading All Blueprints into a DataFrame

The `load()` method returns a pandas DataFrame with one row per blueprint, holding the names, dates, partitioning, grid parameters, and paths needed to instantiate `OcnModel` objects:


In [8]:
# Load all blueprints into a DataFrame
# Defaults to 'postconfig' stage which has the most complete data
df = catalog.blueprint.load(stage="postconfig")

print(f"Loaded {len(df)} blueprints")
print(f"\nDataFrame columns: {list(df.columns)}")
print(f"\nDataFrame shape: {df.shape}")

# Display the DataFrame (excluding the dict-valued grid_kwargs column for readability)
df.drop(columns=['grid_kwargs'])

Loaded 3 blueprints

DataFrame columns: ['model_name', 'grid_name', 'blueprint_name', 'description', 'start_time', 'end_time', 'np_eta', 'np_xi', 'grid_kwargs', 'blueprint_path', 'grid_yaml_path', 'input_data_dir', 'stage']

DataFrame shape: (3, 13)


| | model_name | grid_name | blueprint_name | description | start_time | end_time | np_eta | np_xi | blueprint_path | grid_yaml_path | input_data_dir | stage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | cson_roms-marbl_v0.1 | ccs-12km | cson_roms-marbl_v0.1_ccs-12km | California Current System | 2024-01-01T00:00:00 | 2024-01-02T00:00:00 | 20 | 16 | /Users/mclong/codes/cson-forge/cson_forge/blue... | /Users/mclong/codes/cson-forge/cson_forge/blue... | /Users/mclong/cson-forge-data/input-data/cson_... | postconfig |
| 1 | cson_roms-marbl_v0.1 | gulf-guinea-toy | cson_roms-marbl_v0.1_gulf-guinea-toy | Gulf of Guinea (toy) | 2012-01-01T00:00:00 | 2012-01-02T00:00:00 | 5 | 2 | /Users/mclong/codes/cson-forge/cson_forge/blue... | /Users/mclong/codes/cson-forge/cson_forge/blue... | /Users/mclong/cson-forge-data/input-data/cson_... | postconfig |
| 2 | cson_roms-marbl_v0.1 | test-tiny | cson_roms-marbl_v0.1_test-tiny | Test tiny | 2012-01-01T00:00:00 | 2012-01-02T00:00:00 | 1 | 2 | /Users/mclong/codes/cson-forge/cson_forge/blue... | /Users/mclong/codes/cson-forge/cson_forge/blue... | /Users/mclong/cson-forge-data/input-data/cson_... | postconfig |
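
The partitioning columns can be combined to compare how many processes each configuration uses. The sketch below rebuilds only the relevant columns from the output above as a stand-in DataFrame, and assumes that the total process count is the product `np_eta * np_xi`. The `n_ranks` column name is made up for illustration.

```python
import pandas as pd

# Stand-in for the relevant columns of the DataFrame shown above
demo = pd.DataFrame(
    {
        "grid_name": ["ccs-12km", "gulf-guinea-toy", "test-tiny"],
        "np_eta": [20, 5, 1],
        "np_xi": [16, 2, 2],
    }
)

# Assumption: total process count is the product of the two partition counts
demo["n_ranks"] = demo["np_eta"] * demo["np_xi"]
print(demo.sort_values("n_ranks", ascending=False).to_string(index=False))
```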


### Inspecting the DataFrame

Let's look at the structure of the DataFrame:


In [9]:
# Display basic information about the DataFrame
if not df.empty:
    print("DataFrame info:")
    df.info()  # info() prints its summary directly and returns None
    
    print("\nFirst few rows:")
    # Display non-dict columns for readability
    display_cols = [col for col in df.columns if col not in ['grid_kwargs']]
    display(df[display_cols].head())


DataFrame info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   model_name      3 non-null      object
 1   grid_name       3 non-null      object
 2   blueprint_name  3 non-null      object
 3   description     3 non-null      object
 4   start_time      3 non-null      object
 5   end_time        3 non-null      object
 6   np_eta          3 non-null      int64 
 7   np_xi           3 non-null      int64 
 8   grid_kwargs     3 non-null      object
 9   blueprint_path  3 non-null      object
 10  grid_yaml_path  3 non-null      object
 11  input_data_dir  3 non-null      object
 12  stage           3 non-null      object
dtypes: int64(2), object(11)
memory usage: 444.0+ bytes

First few rows:


| | model_name | grid_name | blueprint_name | description | start_time | end_time | np_eta | np_xi | blueprint_path | grid_yaml_path | input_data_dir | stage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | cson_roms-marbl_v0.1 | ccs-12km | cson_roms-marbl_v0.1_ccs-12km | California Current System | 2024-01-01T00:00:00 | 2024-01-02T00:00:00 | 20 | 16 | /Users/mclong/codes/cson-forge/cson_forge/blue... | /Users/mclong/codes/cson-forge/cson_forge/blue... | /Users/mclong/cson-forge-data/input-data/cson_... | postconfig |
| 1 | cson_roms-marbl_v0.1 | gulf-guinea-toy | cson_roms-marbl_v0.1_gulf-guinea-toy | Gulf of Guinea (toy) | 2012-01-01T00:00:00 | 2012-01-02T00:00:00 | 5 | 2 | /Users/mclong/codes/cson-forge/cson_forge/blue... | /Users/mclong/codes/cson-forge/cson_forge/blue... | /Users/mclong/cson-forge-data/input-data/cson_... | postconfig |
| 2 | cson_roms-marbl_v0.1 | test-tiny | cson_roms-marbl_v0.1_test-tiny | Test tiny | 2012-01-01T00:00:00 | 2012-01-02T00:00:00 | 1 | 2 | /Users/mclong/codes/cson-forge/cson_forge/blue... | /Users/mclong/codes/cson-forge/cson_forge/blue... | /Users/mclong/cson-forge-data/input-data/cson_... | postconfig |


### Viewing Grid Parameters

The `grid_kwargs` column contains dictionaries with grid parameters:


In [10]:
# Display grid kwargs for the first blueprint
if not df.empty:
    first_row = df.iloc[0]
    print(f"Grid kwargs for {first_row['grid_name']}:")
    grid_kwargs = first_row['grid_kwargs']
    if isinstance(grid_kwargs, dict):
        for key, value in grid_kwargs.items():
            print(f"  {key}: {value}")


Grid kwargs for ccs-12km:
  nx: 224
  ny: 440
  size_x: 2688
  size_y: 5280
  center_lon: -134.5
  center_lat: 39.6
  rot: 33.3
  N: 100
  theta_s: 6.0
  theta_b: 6.0
  hc: 250
  topography_source: {'name': 'ETOPO5'}
  mask_shapefile: None
  hmin: 5.0


### Querying the DataFrame

You can query the DataFrame to find specific blueprints:


In [11]:
# Query by model name
if not df.empty:
    model_name = df['model_name'].iloc[0] if 'model_name' in df.columns else None
    if model_name:
        model_blueprints = df[df['model_name'] == model_name]
        print(f"Found {len(model_blueprints)} blueprints for model '{model_name}':")
        print(model_blueprints[['grid_name', 'start_time', 'end_time']].to_string())

# Query by grid name
if not df.empty and 'grid_name' in df.columns:
    grid_name = df['grid_name'].iloc[0]
    grid_blueprints = df[df['grid_name'] == grid_name]
    print(f"\nFound {len(grid_blueprints)} blueprints for grid '{grid_name}':")
    print(grid_blueprints[['model_name', 'start_time', 'end_time']].to_string())


Found 3 blueprints for model 'cson_roms-marbl_v0.1':
         grid_name           start_time             end_time
0         ccs-12km  2024-01-01T00:00:00  2024-01-02T00:00:00
1  gulf-guinea-toy  2012-01-01T00:00:00  2012-01-02T00:00:00
2        test-tiny  2012-01-01T00:00:00  2012-01-02T00:00:00

Found 1 blueprints for grid 'ccs-12km':
             model_name           start_time             end_time
0  cson_roms-marbl_v0.1  2024-01-01T00:00:00  2024-01-02T00:00:00
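
The `start_time` and `end_time` columns have dtype `object` and appear to hold ISO-format strings, so date-based queries are easier after converting them with `pd.to_datetime`. The sketch below uses a stand-in DataFrame built from the rows printed above and selects the blueprints whose time range covers a given instant. The `target` value is an arbitrary example.

```python
import pandas as pd

# Stand-in for the relevant columns of the DataFrame shown above
demo = pd.DataFrame(
    {
        "grid_name": ["ccs-12km", "gulf-guinea-toy", "test-tiny"],
        "start_time": ["2024-01-01T00:00:00", "2012-01-01T00:00:00", "2012-01-01T00:00:00"],
        "end_time": ["2024-01-02T00:00:00", "2012-01-02T00:00:00", "2012-01-02T00:00:00"],
    }
)

# Convert the ISO strings to real timestamps so comparisons are chronological
demo["start_time"] = pd.to_datetime(demo["start_time"])
demo["end_time"] = pd.to_datetime(demo["end_time"])

# Select blueprints whose [start_time, end_time] interval contains the target
target = pd.Timestamp("2012-01-01T12:00:00")
covering = demo[(demo["start_time"] <= target) & (target <= demo["end_time"])]
print(covering["grid_name"].tolist())  # ['gulf-guinea-toy', 'test-tiny']
```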


## Instantiating OcnModel from DataFrame

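The DataFrame contains the data needed to instantiate `OcnModel` objects. Loading a builder directly from an existing blueprint file is not yet implemented (see the TODO in the cell below), so this example only prints the fields you would pass along: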


In [12]:
# Example: Inspecting the blueprint data needed to create a CstarSpecBuilder
from cson_forge import CstarSpecBuilder

# Print the fields a builder would be created from
if not df.empty:
    row = df.iloc[0]
    
    # Note: To recreate a builder from a blueprint, you would need to either:
    # 1. Load the blueprint file using CstarSpecBuilder.from_blueprint() (if such a method exists), or
    # 2. Manually extract data from the DataFrame and create a new builder
    # The blueprint_path column contains the path to the blueprint file
    
    print(f"Blueprint path: {row['blueprint_path']}")
    print(f"Model: {row['model_name']}, Grid: {row['grid_name']}")
    print(f"Time range: {row['start_time']} to {row['end_time']}")
    
    # TODO: Add method to CstarSpecBuilder to load from existing blueprint file
    

Blueprint path: /Users/mclong/codes/cson-forge/cson_forge/blueprints/cson_roms-marbl_v0.1_ccs-12km/B_cson_roms-marbl_v0.1_ccs-12km_postconfig.yml
Model: cson_roms-marbl_v0.1, Grid: ccs-12km
Time range: 2024-01-01T00:00:00 to 2024-01-02T00:00:00


## Summary

The `catalog.blueprint` instance provides:

1. **Discovery**: Find all blueprint files in the blueprints directory (pattern: `B_*.yml`)
2. **Loading**: Load individual blueprints or all blueprints at once (with optional stage filtering)
3. **Data Extraction**: Extract model/grid names, dates, partitioning, and grid parameters
4. **DataFrame Interface**: Get all blueprint data in a pandas DataFrame for easy querying
5. **Grid Parameters**: Load grid keyword arguments from `_grid.yml` files when available

This makes it easy to:
- Query existing blueprints by model, grid, or stage
- Compare configurations across different domains
- Access blueprint metadata (names, dates, partitioning, paths)
- Build analysis workflows that work with multiple domains
- Programmatically work with stored blueprint configurations
