# Inspect Oceanum OCC CMIP6 Hydrodynamic Datasources

This notebook introduces how to search for and inspect hydrodynamic model outputs from
Oceanum's Our Changing Coast (OCC) climate projections, which are available through Datamesh.

## What You'll Learn

By the end of this tutorial, you will understand how to:
- Connect to the Datamesh API using the Oceanum Python client
- Search for hydrodynamic datasets using different strategies (keywords, tags, filters)
- Understand the different types of model data available (parameters, spectra, statistics)
- Inspect dataset metadata to understand data structure and content
- Navigate the CMIP6 climate projection datasets from Oceanum

## Background: Oceanum's OCC CMIP6 Hydrodynamic Model Data

Oceanum provides high-resolution ocean model outputs based on CMIP6 climate projections.
The data includes:

- **Model**: SCHISM (Semi-implicit Cross-scale Hydroscience Integrated System Model)
- **Wind forcing**: Global and NIWA-downscaled (CCAM) wind forcing data from the CMIP6 ACCESS-CM2 and EC-Earth3 global climate models
- **Time Periods**: Historical (1985-2015) and future projections (2015-2100) under different emission scenarios
- **Shared Socioeconomic Pathways**: SSP245, SSP370
- **Data Types**:
  - **Parameters**: Three-hourly, integrated wave parameters over the full computational grid (height, period, direction, etc.)
  - **Spectra**: Three-hourly, frequency-direction 2D wave energy spectra at a large subset of grid points
  - **Gridstats**: Statistical summaries and derived metrics

### Required Python Libraries

- [oceanum](https://oceanum-python.readthedocs.io/en/latest/): Python client for accessing Datamesh


In [2]:
import textwrap
from oceanum.datamesh import Connector

import warnings
warnings.filterwarnings("ignore")


## 1. Setting Up the Connection

The [Connector](https://oceanum-python.readthedocs.io/en/latest/classes/datamesh/oceanum.datamesh.Connector.html) class provides access to methods for querying data and metadata from Datamesh.

**Authentication**: You can provide your Datamesh token directly or set it as an environment variable `DATAMESH_TOKEN`.
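
The two authentication options above can be combined in a small helper. This is only a sketch: `resolve_token` is a hypothetical function written for this notebook, not part of the `oceanum` package. It prefers an explicitly supplied token and falls back to the `DATAMESH_TOKEN` environment variable.

```python
import os


def resolve_token(explicit_token=None, env_var="DATAMESH_TOKEN"):
    """Return the explicit token if given, otherwise the value of env_var.

    Hypothetical helper for illustration; not part of the oceanum package.
    """
    token = explicit_token or os.environ.get(env_var)
    if not token:
        raise RuntimeError(
            f"No Datamesh token found: pass one explicitly or set {env_var}."
        )
    return token


# Intended use (not executed here, it needs network access and a valid token):
# from oceanum.datamesh import Connector
# conn = Connector(token=resolve_token())
```

Failing early with a clear message is usually easier to debug than an authentication error raised later by the first catalog request.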


In [3]:
conn = Connector(token=None)

## 2. Searching for Datasets

The `get_catalog` method allows you to search for datasets using keywords that match against dataset names, descriptions, and tags.


### 2.1 Basic Keyword Search

Let's start with a broad search to see all available OCC CMIP6 hydrodynamic datasources:


In [4]:
query = "occ cmip6 current"
cat = conn.get_catalog(query)

# List the matching datasources. Each entry is a Datasource object carrying its metadata.

print(f"Found {len(list(cat))} datasets matching '{query}'")

list(cat)


Found 6 datasets matching 'occ cmip6 current'


[
         Calypso CMIP6 SCHISM ACCESS-CM2 nz historical hydro parameters [calypso_cmip6_schism_nz_access_cm2_historical_r1i1p1f1_grid]
             Extent: (163.73491587, -49.24794654, 182.32942388, -32.00917599)
             Timerange: 1984-11-01 00:00:00+00:00 to 2015-01-01 00:00:00+00:00
         ,
 
         Calypso CMIP6 SCHISM ACCESS-CM2 nz ssp245 hydro parameters [calypso_cmip6_schism_nz_access_cm2_ssp245_r1i1p1f1_grid]
             Extent: (163.73491587, -49.24794654, 182.32942388, -32.00917599)
             Timerange: 2015-01-01 00:00:00+00:00 to 2025-09-15 23:35:27.470505+00:00
         ,
 
         Calypso CMIP6 SCHISM ACCESS-CM2 nz ssp370 hydro parameters [calypso_cmip6_schism_nz_access_cm2_ssp370_r1i1p1f1_grid]
             Extent: (163.73491587, -49.24794654, 182.32942388, -32.00917599)
             Timerange: 2015-01-01 00:00:00+00:00 to 2025-09-15 23:35:27.470505+00:00
         ,
 
         Calypso CMIP6 SCHISM EC-Earth3 nz historical hydro parameters [calypso_cmip6_sc
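
The default listing is verbose. As a rough sketch, each entry can be condensed to one line using the `id`, `tstart` and `tend` attributes that appear later in this notebook. The `summarise` helper is hypothetical, and the stand-in object below only mimics a `Datasource` so that the snippet runs without a Datamesh connection.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def summarise(datasources):
    """Return one 'id: start to end' line per datasource-like object."""
    return [
        f"{ds.id}: {ds.tstart:%Y-%m-%d} to {ds.tend:%Y-%m-%d}"
        for ds in datasources
    ]


# Stand-in for a real Datasource, using values shown in the listing above.
demo = [
    SimpleNamespace(
        id="calypso_cmip6_schism_nz_access_cm2_historical_r1i1p1f1_grid",
        tstart=datetime(1984, 11, 1, tzinfo=timezone.utc),
        tend=datetime(2015, 1, 1, tzinfo=timezone.utc),
    )
]

for line in summarise(demo):
    print(line)
```

With a live connection, `summarise(list(cat))` should give the same compact view, assuming every entry has both time bounds set.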

### 2.2 Refined Search by Model and Time Period

The search results show many datasets. Let's refine our search to focus on historical runs driven by the EC-Earth3 climate model:


In [5]:
cat = conn.get_catalog("occ cmip6 current historical EC-Earth3")

print(f"Found {len(list(cat))} historical EC-Earth3 datasets")

list(cat)


Found 1 historical EC-Earth3 datasets


[
         Calypso CMIP6 SCHISM EC-Earth3 nz historical hydro parameters [calypso_cmip6_schism_nz_ec_earth3_historical_r1i1p1f1_grid]
             Extent: (163.73491587, -49.24794654, 182.32942388, -32.00917599)
             Timerange: 1984-11-01 00:00:00+00:00 to 2015-01-01 00:00:00+00:00
         ]
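
The datasource IDs in these listings appear to follow a regular pattern. The sketch below splits an ID into its parts, assuming the layout `calypso_cmip6_schism_<domain>_<gcm>_<scenario>_<member>_<kind>`. That layout is inferred from the IDs shown in this notebook and is not a documented naming contract; `parse_datasource_id` is a hypothetical helper.

```python
def parse_datasource_id(datasource_id):
    """Split an OCC SCHISM datasource ID into labelled components.

    Assumes the naming layout inferred from the IDs in this notebook.
    """
    parts = datasource_id.split("_")
    if len(parts) < 8 or parts[:3] != ["calypso", "cmip6", "schism"]:
        raise ValueError(f"Unexpected datasource ID layout: {datasource_id}")
    return {
        "domain": parts[3],
        # The climate model name may itself contain underscores (access_cm2).
        "gcm": "_".join(parts[4:-3]),
        "scenario": parts[-3],
        "member": parts[-2],
        "kind": parts[-1],
    }


info = parse_datasource_id(
    "calypso_cmip6_schism_nz_ec_earth3_historical_r1i1p1f1_grid"
)
print(info)
```

Parsing from both ends of the ID keeps the helper working for multi-token model names, but it will break if the naming scheme changes, so the tags remain the more reliable way to filter.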

### 2.3 Tag-Based Search

For more precise searches, you can use tags with the format `tags:tag1&tag2&tag3`. This is particularly useful when you know exactly what type of data you need.
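
The tag query string described above can be assembled programmatically. `build_tag_query` is a hypothetical convenience function, not part of the `oceanum` client; it only builds the string that is then passed to `get_catalog`.

```python
def build_tag_query(*tags):
    """Join tags into the 'tags:tag1&tag2' search format."""
    cleaned = [t.strip() for t in tags if t and t.strip()]
    if not cleaned:
        raise ValueError("At least one tag is required")
    return "tags:" + "&".join(cleaned)


query = build_tag_query("occ", "cmip6", "schism", "nz")
print(query)  # tags:occ&cmip6&schism&nz

# Intended use with a live connection:
# cat = conn.get_catalog(query)
```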

**Example 1**: Search for all NZ SCHISM datasets


In [6]:
cat = conn.get_catalog("tags:occ&cmip6&schism&nz")

print(f"Found {len(list(cat))} SCHISM NZ parameter datasets")

list(cat)

Found 6 SCHISM NZ parameter datasets


[
         Calypso CMIP6 SCHISM ACCESS-CM2 nz historical hydro parameters [calypso_cmip6_schism_nz_access_cm2_historical_r1i1p1f1_grid]
             Extent: (163.73491587, -49.24794654, 182.32942388, -32.00917599)
             Timerange: 1984-11-01 00:00:00+00:00 to 2015-01-01 00:00:00+00:00
         ,
 
         Calypso CMIP6 SCHISM ACCESS-CM2 nz ssp245 hydro parameters [calypso_cmip6_schism_nz_access_cm2_ssp245_r1i1p1f1_grid]
             Extent: (163.73491587, -49.24794654, 182.32942388, -32.00917599)
             Timerange: 2015-01-01 00:00:00+00:00 to 2025-09-15 23:35:27.470505+00:00
         ,
 
         Calypso CMIP6 SCHISM ACCESS-CM2 nz ssp370 hydro parameters [calypso_cmip6_schism_nz_access_cm2_ssp370_r1i1p1f1_grid]
             Extent: (163.73491587, -49.24794654, 182.32942388, -32.00917599)
             Timerange: 2015-01-01 00:00:00+00:00 to 2025-09-15 23:35:27.470505+00:00
         ,
 
         Calypso CMIP6 SCHISM EC-Earth3 nz historical hydro parameters [calypso_cmip6_sc

**Example 2**: Search for all gridded hydro statistics datasets. These have not been published yet, so the search currently returns no results.


In [10]:
cat = conn.get_catalog("tags:occ&cmip6&schism&gridstats")

print(f"Found {len(list(cat))} gridded hydro statistics datasets")

list(cat)

Found 0 gridded hydro statistics datasets


[]
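
A search can legitimately return an empty list, as it does here, and indexing an empty result raises an `IndexError`. The sketch below shows one defensive pattern; `first_match` is a hypothetical helper that works on any iterable of results.

```python
def first_match(results, description="search"):
    """Return the first item of results, or None if there are no matches."""
    items = list(results)
    if not items:
        print(f"No datasets found for {description}")
        return None
    return items[0]


# Plain lists stand in for catalog results so the snippet runs offline.
print(first_match([], "tags:occ&cmip6&schism&gridstats"))
print(first_match(["dataset_a", "dataset_b"]))
```

Returning `None` keeps the notebook running; raising a custom error instead would be the better choice in a script that cannot continue without a dataset.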

## 4. Inspecting Dataset Metadata

Each dataset in the catalog is represented by a [Datasource](https://oceanum-python.readthedocs.io/en/latest/classes/datamesh/oceanum.datamesh.Datasource.html#oceanum.datamesh.Datasource) object containing comprehensive metadata.


### 4.1 Getting a Specific Dataset

You can retrieve a specific dataset either from search results:


In [11]:
# Re-run a search that returns results (the gridstats search above was empty),
# then take the first dataset.

cat = conn.get_catalog("tags:occ&cmip6&schism&nz")

ds = list(cat)[0]

print("Dataset from search results:")

print(ds)

Or get a dataset directly by its ID. In this example, we'll examine in detail the output from a **SCHISM gridded hydro parameters** dataset.


In [12]:
ds = conn.get_datasource("calypso_cmip6_schism_nz_access_cm2_historical_r1i1p1f1_grid")

print("Dataset retrieved by ID:")

print(ds)

Dataset retrieved by ID:

        Calypso CMIP6 SCHISM ACCESS-CM2 nz historical hydro parameters [calypso_cmip6_schism_nz_access_cm2_historical_r1i1p1f1_grid]
            Extent: (163.73491587, -49.24794654, 182.32942388, -32.00917599)
            Timerange: 1984-11-01 00:00:00+00:00 to 2015-01-01 00:00:00+00:00
            10 attributes
            27 variables
        


### 4.2 Essential Dataset Attributes

Let's explore the key metadata attributes that help you understand and work with the dataset:


**Dataset ID** (unique identifier):


In [13]:
print("Dataset ID:", ds.id)

Dataset ID: calypso_cmip6_schism_nz_access_cm2_historical_r1i1p1f1_grid


**Human-readable name**:


In [14]:
print("Dataset Name:", ds.name)

Dataset Name: Calypso CMIP6 SCHISM ACCESS-CM2 nz historical hydro parameters


**Detailed description**:


In [15]:
print("Description:")

print(textwrap.fill(ds.description, width=120))

Description:
NZ sea surface elevation and depth-average current from Calypso SCHISM driven by ACCESS-CM2 under CMIP6 historical
simulations (r1i1p1f1 ensemble member). This dataset provides comprehensive parameters for climate research and
oceanographic applications as part of the Our Changing Coast project.


**Available variables** (model output fields and grid definition):


In [16]:
print("Available Variables:")

variables = list(ds.variables.keys())

print(f"Total: {len(variables)} variables")

print("Variables:", variables)

Available Variables:
Total: 27 variables
Variables: ['Cs', 'dahv', 'elev', 'zcor', 'depth', 'sigma_h_c', 'wetdry_elem', 'wetdry_node', 'wetdry_side', 'SCHISM_hgrid', 'minimum_depth', 'sigma_theta_b', 'sigma_theta_f', 'dry_value_flag', 'sigma_maxdepth', 'ele_bottom_index', 'edge_bottom_index', 'node_bottom_index', 'SCHISM_hgrid_edge_x', 'SCHISM_hgrid_edge_y', 'SCHISM_hgrid_face_x', 'SCHISM_hgrid_face_y', 'SCHISM_hgrid_node_x', 'SCHISM_hgrid_node_y', 'coordinate_system_flag', 'SCHISM_hgrid_edge_nodes', 'SCHISM_hgrid_face_nodes']
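
Many of the 27 variables describe the unstructured mesh rather than physical fields. As a sketch, the names can be split by prefix, assuming that everything starting with `SCHISM_hgrid` is mesh definition. That grouping is a reading of the names above, not a documented classification, and `split_mesh_variables` is a hypothetical helper.

```python
def split_mesh_variables(names, mesh_prefix="SCHISM_hgrid"):
    """Separate mesh-definition variables from the remaining variables."""
    mesh = [n for n in names if n.startswith(mesh_prefix)]
    other = [n for n in names if not n.startswith(mesh_prefix)]
    return mesh, other


# Variable names copied from the output above.
variables = [
    'Cs', 'dahv', 'elev', 'zcor', 'depth', 'sigma_h_c', 'wetdry_elem',
    'wetdry_node', 'wetdry_side', 'SCHISM_hgrid', 'minimum_depth',
    'sigma_theta_b', 'sigma_theta_f', 'dry_value_flag', 'sigma_maxdepth',
    'ele_bottom_index', 'edge_bottom_index', 'node_bottom_index',
    'SCHISM_hgrid_edge_x', 'SCHISM_hgrid_edge_y', 'SCHISM_hgrid_face_x',
    'SCHISM_hgrid_face_y', 'SCHISM_hgrid_node_x', 'SCHISM_hgrid_node_y',
    'coordinate_system_flag', 'SCHISM_hgrid_edge_nodes',
    'SCHISM_hgrid_face_nodes',
]

mesh, other = split_mesh_variables(variables)
print(f"{len(mesh)} mesh variables, {len(other)} other variables")
```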


### 4.3 Temporal Coverage


In [17]:
print("Time Coverage:")

print(f"Start: {ds.tstart}")

print(f"End: {ds.tend}")

print(f"Duration: {(ds.tend - ds.tstart).days / 365.25:.1f} years")

Time Coverage:
Start: 1984-11-01 00:00:00+00:00
End: 2015-01-01 00:00:00+00:00
Duration: 30.2 years


### 4.4 Spatial Coverage


In [18]:
print("Spatial Extent (West, South, East, North):")

bounds = ds.geom.bounds

print(f"Longitude: {bounds[0]}° to {bounds[2]}°")

print(f"Latitude: {bounds[1]}° to {bounds[3]}°")

Spatial Extent (West, South, East, North):
Longitude: 163.73491587° to 182.32942388°
Latitude: -49.24794654° to -32.00917599°
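
The eastern bound exceeds 180°, which suggests the extent is expressed in a 0 to 360° longitude convention across the antimeridian. Under that assumption, a point given in the -180 to 180° convention needs wrapping before it is compared with the bounds. `in_extent` is a hypothetical helper for illustration.

```python
def in_extent(lon, lat, bounds):
    """Check whether a point falls inside (west, south, east, north) bounds.

    Assumes the bounds use 0-360 degree longitudes with west < east.
    """
    west, south, east, north = bounds
    lon360 = lon % 360.0  # wrap e.g. -178 to 182
    return west <= lon360 <= east and south <= lat <= north


# Bounds copied from the output above.
nz_bounds = (163.73491587, -49.24794654, 182.32942388, -32.00917599)

print(in_extent(174.78, -41.29, nz_bounds))  # a point near Wellington
print(in_extent(-178.0, -40.0, nz_bounds))   # east of the antimeridian
print(in_extent(150.0, -40.0, nz_bounds))    # west of the domain
```

This is a bounding-box test only; a point inside the box may still fall outside the model mesh itself.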


### 4.5 Coordinate System


In [19]:
print("Coordinate Mapping:")

for key, value in ds.coordinates.items():
    print(f"  {key} → {value}")

Coordinate Mapping:
  t → time
  x → longitude
  y → latitude


## 5. Detailed Data Schema Inspection

The data schema provides complete information about the dataset structure, including dimensions, coordinates, and variable attributes:


In [22]:
schema = ds.dataschema.model_dump()

print("Dataset Dimensions:")

for dim, size in schema['dims'].items():
    print(f"  {dim}: {size}")

print(f"\nTotal data points: {schema['dims']['time'] * schema['dims']['nSCHISM_hgrid_node']}")


Dataset Dimensions:
  one: 1
  two: 2
  time: 264433
  sigma: 2
  nSCHISM_hgrid_edge: 229403
  nSCHISM_hgrid_face: 151682
  nSCHISM_hgrid_node: 77703
  nSCHISM_vgrid_layers: 2
  nMaxSCHISM_hgrid_face_nodes: 4

Total data points: 20547237399
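
The dimension sizes also allow a rough estimate of the time step and of the memory needed for one node-based variable. The sketch below uses the numbers printed above and makes two assumptions that the metadata does not confirm: the time steps are evenly spaced, and values are stored as 4-byte floats.

```python
from datetime import datetime, timezone

# Values copied from the metadata shown earlier in this notebook.
tstart = datetime(1984, 11, 1, tzinfo=timezone.utc)
tend = datetime(2015, 1, 1, tzinfo=timezone.utc)
n_time = 264433
n_node = 77703

# Assumption: evenly spaced time steps between tstart and tend.
step = (tend - tstart) / (n_time - 1)

# Assumption: 4 bytes per value (float32), uncompressed.
bytes_per_value = 4
size_gib = n_time * n_node * bytes_per_value / 1024**3

print(f"Inferred time step: {step}")
print(f"One time x node variable: about {size_gib:.1f} GiB uncompressed")
```

Under these assumptions the step works out to one hour, and a single full-domain variable is far too large to download in one request, which is why subsetting in space and time matters.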


### 5.1 Variable Details

Let's examine a few key hydrodynamic variables in detail:


In [23]:
# Key parameters to highlight
key_variables = ['elev', 'dahv']

print("Key Variables:")
print("-" * 80)

for var in key_variables:
    if var in schema['data_vars']:
        var_info = schema['data_vars'][var]
        print(f"\n{var.upper()}:")
        # print(f"  Name: {var_info['attrs']['long_name']}")
        # print(f"  Units: {var_info['attrs']['units']}")
        print(f"  Range: {var_info['attrs']['valid_min']} - {var_info['attrs']['valid_max']}")
        if 'standard_name' in var_info['attrs']:
            print(f"  CF Standard Name: {var_info['attrs']['standard_name']}")


Key Variables:
--------------------------------------------------------------------------------

ELEV:
  Range: -10 - 10

DAHV:
  Range: -10 - 10
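
Only the valid range is printed above, which suggests that other attributes may be missing for these variables. A defensive sketch using `dict.get` avoids a `KeyError` when an attribute is absent. `describe_variable` is a hypothetical helper, and the sample dictionary only imitates the structure of an entry in `schema['data_vars']`.

```python
def describe_variable(name, var_info):
    """Format variable metadata, tolerating missing attributes."""
    attrs = var_info.get("attrs", {})
    lines = [
        f"{name.upper()}:",
        f"  Name: {attrs.get('long_name', 'n/a')}",
        f"  Units: {attrs.get('units', 'n/a')}",
    ]
    if "valid_min" in attrs and "valid_max" in attrs:
        lines.append(f"  Range: {attrs['valid_min']} to {attrs['valid_max']}")
    if "standard_name" in attrs:
        lines.append(f"  CF Standard Name: {attrs['standard_name']}")
    return "\n".join(lines)


# Sample entry that imitates the range shown above; not real metadata.
sample = {"attrs": {"valid_min": -10, "valid_max": 10}}
print(describe_variable("elev", sample))
```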


## 6. Summary and Next Steps

### What We've Covered

1. **Connection Setup**: How to connect to Datamesh using the Oceanum Python client
2. **Search Strategies**: Different approaches to find relevant hydro datasets
3. **Dataset Types**: Understanding parameters and gridstats datasets
4. **Metadata Inspection**: Exploring dataset attributes, coverage, and structure
5. **Variable Understanding**: Categorizing and interpreting hydrodynamic variables

### Recommended Next Steps

1. **Data Access**: Learn how to download and work with the data using `conn.query()`
2. **Spatial/Temporal Filtering**: Practice subsetting data for specific regions and time periods
3. **Data Analysis**: Explore hydro climate patterns, trends, and statistics
4. **Visualization**: Create maps and time series plots of hydro parameters

### Key Takeaways for New Users

- Start with **parameter datasets** (`_grid`) for general hydrodynamic analysis
- Use **tag-based searches** for precise dataset discovery
- Always inspect **metadata** before downloading large datasets
- Consider **spatial and temporal coverage** when selecting datasets
- Understand the difference between **historical** and **projection** scenarios

### Getting Help

- [Oceanum Python Documentation](https://oceanum-python.readthedocs.io/en/latest/)
- [Datamesh API Reference](https://oceanum-python.readthedocs.io/en/latest/classes/datamesh/)
- Contact Oceanum support for specific questions about datasets
