# Standard Name Convention

The "Standard Name Convention" is one realization of a convention promoted by the toolbox. It is based on the idea that every dataset must have a physical unit (or none if it is dimensionless) and that datasets must be identifiable via an identifier attribute rather than the dataset name itself.

The key standard attributes are:

 - `standard_name`: A human- and machine-readable dataset identifier based on construction rules and listed in a "Standard Name Table",
 - `standard_name_table`: A list of standard names, each with its base unit (SI) and a comprehensive description. It also records how a `standard_name` can be transformed into a new `standard_name`,
 - `units`: The unit attribute of a dataset. It need not be an SI unit, but it must be convertible to the SI unit registered for the `standard_name` in the standard name table,
 - `long_name`: An alternative name if no `standard_name` is applicable.
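
The rule for `units` can be illustrated with a minimal sketch in plain Python. It is not the toolbox's implementation: the lookup tables `TO_SI` and `TABLE_UNITS` and the helpers `units_match` and `to_si` are hypothetical names introduced only for this illustration, and a real implementation would rely on a units library rather than a fixed table.

```python
# Hypothetical lookup: unit string -> (registered SI unit, factor to convert into it).
# A real implementation would use a units library instead of a fixed table.
TO_SI = {
    "m/s": ("m/s", 1.0),
    "km/s": ("m/s", 1000.0),
    "mm/s": ("m/s", 0.001),
    "Pa": ("Pa", 1.0),
    "kPa": ("Pa", 1000.0),
}

# Hypothetical excerpt of a standard name table: name -> registered SI unit.
TABLE_UNITS = {"x_velocity": "m/s", "static_pressure": "Pa"}


def units_match(standard_name: str, units: str) -> bool:
    """Return True if `units` converts to the SI unit registered for the name."""
    if standard_name not in TABLE_UNITS or units not in TO_SI:
        return False
    si_unit, _factor = TO_SI[units]
    return si_unit == TABLE_UNITS[standard_name]


def to_si(value: float, units: str) -> float:
    """Convert a value into the registered SI unit using the fixed factor table."""
    _si_unit, factor = TO_SI[units]
    return value * factor


print(units_match("x_velocity", "km/s"))  # km/s is convertible to m/s
print(units_match("x_velocity", "kPa"))   # a pressure unit is not
print(to_si(3.0, "km/s"))
```

The point of the sketch is that the check compares dimensions, not spelling: `km/s` is accepted for a velocity because it reduces to `m/s`, while `kPa` is rejected.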

This concept was first introduced by the Climate and Forecast community and is known as the [CF-convention](http://cfconventions.org/). The `h5RDMtoolbox` adopts the concept and implements a general version of it, so that users can define their own discipline- or problem-specific standard name convention.

Main benefits of the convention are:
- achieving self-describing files, which are human- and machine-interpretable,
- validating correctness of dataset identifiers (`standard_name`) and their units,
- allowing unit-aware processing of data.

This chapter walks you through the concept and shows how to apply it.

In [None]:
import h5rdmtoolbox as h5tbx
import warnings
warnings.filterwarnings('ignore')

from h5rdmtoolbox.convention.standard_names.table import StandardNameTable

## Standard Name Tables

### Example 1: cf-convention

A standard name table should be defined in a document (typically XML or YAML). The corresponding object can then be initialized by the respective constructor method (`from_yaml`, `from_web`, ...).

For reading the original CF-convention table, do the following:

In [None]:
cf = StandardNameTable.from_web("https://cfconventions.org/Data/cf-standard-names/79/src/cf-standard-name-table.xml",
                               known_hash='4c29b5ad70f6416ad2c35981ca0f9cdebf8aab901de5b7e826a940cf06f9bae4')
cf

The standard names are items of the table object:

In [None]:
cf['x_wind']

In [None]:
cf['x_wind'].units

In [None]:
cf['x_wind'].description
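
To make the structure of such a table document concrete, the following sketch parses a small hand-written excerpt in the CF XML layout using only the standard library. It is an illustration, not the toolbox's reader: `XML_EXCERPT` and `parse_table` are hypothetical names, and the descriptions are placeholders rather than the official CF texts.

```python
import xml.etree.ElementTree as ET

# Hand-written excerpt in the CF table layout; descriptions are placeholders.
XML_EXCERPT = """
<standard_name_table>
  <entry id="x_wind">
    <canonical_units>m s-1</canonical_units>
    <description>Placeholder description of the x-component of wind.</description>
  </entry>
  <entry id="air_temperature">
    <canonical_units>K</canonical_units>
    <description>Placeholder description of air temperature.</description>
  </entry>
</standard_name_table>
"""


def parse_table(xml_text: str) -> dict:
    """Map each entry id to its canonical units and description."""
    root = ET.fromstring(xml_text.strip())
    table = {}
    for entry in root.iter("entry"):
        table[entry.get("id")] = {
            "units": entry.findtext("canonical_units", default=""),
            "description": entry.findtext("description", default=""),
        }
    return table


table = parse_table(XML_EXCERPT)
print(table["x_wind"]["units"])
print(sorted(table))
```

Each `entry` carries the standard name as its `id`, which is why the names can be used as keys when indexing the table object.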

### Example 2: User-defined table

Initializing standard name tables from a web resource should be the standard process, because a project or community might have defined the table and published it under a DOI.

The `h5rdmtoolbox` especially supports tables that are published on [Zenodo](https://zenodo.org/):

In [None]:
snt = StandardNameTable.from_zenodo(10428795)
snt

Here are the standard names of the table:

In [None]:
snt.names

In a notebook, we can also get a nice overview of the table by calling `dump()`:

In [None]:
snt.dump()

### Transformation of base standard names
Not all allowed standard names need to be included in the table. Additional names can be derived from the listed ones by so-called transformations.
There are two ways to transform a standard name:

 1. Using affixes: Adding a prefix or a suffix
 2. Applying a mathematical operation to the name

#### 1. Adding affixes

Note that 'x_velocity' is not part of the table:

In [None]:
'x_velocity' in snt

... but 'velocity' is, and it is a vector. The vector property tells us whether we can add a "vector component name" as a prefix, e.g. "x" or "y":

In [None]:
snt['velocity'].is_vector()

Which vector components exist is defined in the table:

In [None]:
snt.affixes['component'].values

Thus, when indexing with "x_velocity", the table checks whether the prefix is valid and, if so, returns the new (transformed) standard name:

In [None]:
snt['x_velocity']
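
The lookup logic described above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the toolbox's code: `BASE_NAMES`, `COMPONENTS`, `resolve` and `is_valid` are hypothetical, and only a single component prefix is handled.

```python
# Hypothetical excerpt of a table: base names with a flag marking vectors.
BASE_NAMES = {
    "velocity": {"units": "m/s", "is_vector": True},
    "static_pressure": {"units": "Pa", "is_vector": False},
}
# Hypothetical values of the "component" prefix.
COMPONENTS = {"x", "y", "z"}


def resolve(name: str) -> dict:
    """Look up `name` directly, else try to split off a component prefix."""
    if name in BASE_NAMES:
        return {"name": name, **BASE_NAMES[name]}
    prefix, _, base = name.partition("_")
    entry = BASE_NAMES.get(base)
    if entry is None or prefix not in COMPONENTS:
        raise KeyError(f"{name!r} is neither listed nor a valid transformation")
    if not entry["is_vector"]:
        raise KeyError(f"{base!r} is not a vector, so no component prefix is allowed")
    # A single component of a vector is itself a scalar with the same units.
    return {"name": name, "units": entry["units"], "is_vector": False}


def is_valid(name: str) -> bool:
    """Return True if `name` is listed or can be derived by a component prefix."""
    try:
        resolve(name)
    except KeyError:
        return False
    return True


print(resolve("x_velocity"))
print(is_valid("x_static_pressure"))  # prefix on a scalar is rejected
print(is_valid("q_velocity"))         # unknown component is rejected
```

The sketch shows why the vector flag matters: the prefix is accepted only if the base name exists, is a vector, and the prefix is one of the registered components.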

#### 2. Applying a mathematical operation

During data processing, datasets are often transformed by mathematical functions, such as taking the square or the derivative of one quantity with respect to (wrt) another one. Some mathematical operations like these are supported in the current version, e.g.:

In [None]:
snt['derivative_of_x_velocity_wrt_x_coordinate']

In [None]:
snt['square_of_static_pressure']

In [None]:
snt['arithmetic_mean_of_static_pressure']
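
One way to picture how such names can be recognized is pattern matching on the name itself. The sketch below is an assumption-laden illustration, not the toolbox's implementation: the regular expressions in `PATTERNS` are modelled only on the three example names above, and `parse_operation` and `derived_units` are hypothetical helpers that build the resulting unit as a plain string.

```python
import re

# Hypothetical patterns, modelled on the three example names used above.
PATTERNS = {
    "derivative": re.compile(r"^derivative_of_(.+)_wrt_(.+)$"),
    "square": re.compile(r"^square_of_(.+)$"),
    "arithmetic_mean": re.compile(r"^arithmetic_mean_of_(.+)$"),
}


def parse_operation(name: str):
    """Return (operation, operands) or None if no pattern matches."""
    for operation, pattern in PATTERNS.items():
        match = pattern.match(name)
        if match:
            return operation, match.groups()
    return None


def derived_units(operation: str, operand_units: tuple) -> str:
    """Compose the unit string of the transformed name from its operands."""
    if operation == "derivative":
        return f"({operand_units[0]})/({operand_units[1]})"
    if operation == "square":
        return f"({operand_units[0]})**2"
    return operand_units[0]  # a mean keeps the units of its operand


print(parse_operation("derivative_of_x_velocity_wrt_x_coordinate"))
print(parse_operation("square_of_static_pressure"))
print(derived_units("derivative", ("m/s", "m")))
```

The operands returned by `parse_operation` are themselves standard names, so each of them would still have to be validated against the table (including affix transformations) before the derived name is accepted.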

## Usage with HDF5 files

Let's apply the convention to HDF5 files. For convenience, we take the existing tutorial convention and remove some standard attributes in order to limit the example to the attributes relevant to the standard name convention:

In [None]:
zenodo_cv = h5tbx.convention.from_zenodo('https://zenodo.org/record/8357399')
sn_cv = zenodo_cv.pop('contact', 'comment', 'references', 'data_type')
sn_cv.name = 'standard name convention'
sn_cv.register()

h5tbx.use(sn_cv)
sn_cv

To find out which standard names are available, we create a file and retrieve the attribute `standard_name_table`. The convention sets it by default, so it is available without explicitly setting it:

In [None]:
with h5tbx.File() as h5:
    snt = h5.standard_name_table

print('The available (base) standard names are: ', snt.names)

One possible dataset based on the standard name table could be "x_velocity". This is possible because *component* is available in the list of **affixes**. Based on the transformation pattern, it is clear that "component" is a **prefix**. "x" is among the available components, so "x_velocity" is a valid transformed standard name from the given table:

In [None]:
print('Available affixes: ', snt.affixes.keys())

print('\nValues for the component prefix:')
snt.affixes['component']

Let's access the name from the table. It exists and the description is adjusted, too:

In [None]:
snt['x_velocity']

Creating an x-velocity dataset:

In [None]:
with h5tbx.File() as h5:
    h5.create_dataset('u', data=[1,2,3], standard_name='x_velocity', units='km/s')
    h5.dump()

## Usage with HDF5 files (update)

In [None]:
from ontolutils import SSNO

In [None]:
with h5tbx.File(mode='w') as h5:
    ds = h5.create_dataset('u', data=3)
    ds.attrs['standard_name', SSNO.hasStandardName] = 'x_velocity'
    ds.rdf.object['standard_name'] = SSNO.StandardName  # https://matthiasprobst.github.io/ssno#StandardName
    
    ds = h5.create_dataset('v', data=3)
    ds.attrs['standard_name', SSNO.hasStandardName] = 'y_velocity'
    ds.rdf.object['standard_name'] = SSNO.StandardName  # https://matthiasprobst.github.io/ssno#StandardName
    h5.dump(collapsed=False)

hdf_filename = h5.hdf_filename