# Standard Name Convention

The "Standard Name Convention" is one realization of a convention promoted by the toolbox. The key standard attributes are:

 - `standard_name`: A human- and machine-readable standardized name (identifier) of a dataset, based on construction rules and documented in a "Standard name table",
 - `standard_name_table`: The list of all `standard_name` entries together with their base unit (SI) and a comprehensive description. It also includes information about how a `standard_name` can be transformed into a new `standard_name`,
 - `units`: The unit attribute of a dataset. It need not be an SI unit, but it must be convertible to the SI unit registered in the Standard name table,
 - `long_name`: An alternative name if no `standard_name` is applicable.

## Concept and its implications
The concept behind it is quite simple:
- The standard name is a human-readable and interpretable identifier of the dataset. This means that the content of the dataset can be interpreted from the `standard_name` alone.
- A further description can be found in the so-called "Standard name table". It lists all allowed standard names together with their description *and* physical unit. The latter must be provided with the dataset, too.
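
As a rough illustration of the two points above, a standard name table can be pictured as a lookup from identifier to unit and description. The following is only a minimal sketch in plain Python: the class `ToyStandardName`, the dictionary `TOY_TABLE` and the function `describe` are hypothetical names invented for this example and are not part of the toolbox.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyStandardName:
    """Hypothetical record: one row of a standard name table."""
    name: str
    units: str  # registered SI unit
    description: str


# Hand-written toy table (assumption), loosely modelled on a small
# fluid-dynamics table.
TOY_TABLE = {
    sn.name: sn
    for sn in (
        ToyStandardName("static_pressure", "Pa",
                        "Force per unit area exerted by a fluid."),
        ToyStandardName("time", "s",
                        "Relative time since start of data acquisition."),
    )
}


def describe(standard_name: str) -> str:
    """Interpret a dataset from its identifier alone."""
    entry = TOY_TABLE[standard_name]  # KeyError: the name is not allowed
    return f"{entry.name} [{entry.units}]: {entry.description}"


print(describe("static_pressure"))
```

A name that is missing from the table raises a `KeyError` here, which mirrors the idea that only registered identifiers are allowed.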

This concept was first introduced by the climate and forecast community as the [CF-convention](http://cfconventions.org/). The `h5RDMtoolbox` adopts the concept and implements a generalized version of it, so users can define their own discipline- or problem-specific standard name convention.

Main benefits of the convention are:
- Human and machine interpretation of dataset content,
- validation of dataset identifiers (`standard_name`) and their units,
- unit-aware processing of data.

This chapter walks you through the concept and shows how to apply it.

In [1]:
import h5rdmtoolbox as h5tbx
import warnings
warnings.filterwarnings('ignore')

from h5rdmtoolbox.conventions.standard_names.table import StandardNameTable

## Standard Name Tables

## Example 1: cf-convention

A standard name table should be defined in a document (typically XML or YAML). The corresponding object can then be initialized with the respective constructor method (`from_yaml`, `from_web`, ...).
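
To give an idea of what such a document contains, the sketch below parses a tiny, hand-written XML snippet that imitates the layout of the CF table (`entry` elements with an `id` attribute, `canonical_units` and `description`). It uses only the Python standard library. `XML_SNIPPET` and `parse_table` are hypothetical names for this illustration, and the code is not how the toolbox reads tables internally.

```python
import xml.etree.ElementTree as ET

# Hand-written snippet imitating the CF XML layout (assumption, not the
# real table content).
XML_SNIPPET = """
<standard_name_table>
  <entry id="x_wind">
    <canonical_units>m s-1</canonical_units>
    <description>Wind component along the grid x-axis.</description>
  </entry>
  <entry id="air_pressure">
    <canonical_units>Pa</canonical_units>
    <description>Pressure exerted by the air.</description>
  </entry>
</standard_name_table>
"""


def parse_table(xml_text: str) -> dict:
    """Map each standard name to its units and description."""
    root = ET.fromstring(xml_text.strip())
    table = {}
    for entry in root.iter("entry"):
        table[entry.get("id")] = {
            "units": entry.findtext("canonical_units", default=""),
            "description": entry.findtext("description", default=""),
        }
    return table


table = parse_table(XML_SNIPPET)
print(sorted(table))
print(table["x_wind"]["units"])
```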

For reading the original CF-convention table, do the following:

In [2]:
cf = StandardNameTable.from_web("https://cfconventions.org/Data/cf-standard-names/79/src/cf-standard-name-table.xml")
cf

The standard names are items of the table object:

In [3]:
cf['x_wind']

In [4]:
cf['x_wind'].units

In [5]:
cf['x_wind'].description

'"x" indicates a vector component along the grid x-axis, positive with increasing x. Wind is defined as a two-dimensional (horizontal) air velocity vector, with no vertical component. (Vertical motion in the atmosphere has the standard name upward_air_velocity.).'

## Example 2: User defined table

Initializing standard name tables from a web resource should be the standard process, because a project or community may have defined a table and published it under a DOI.

In particular, the `h5RDMtoolbox` supports tables that are published on [Zenodo](https://zenodo.org/):

In [6]:
snt = StandardNameTable.from_zenodo(8276716)
snt

Here are the standard names of the table:

In [7]:
snt.names

['coordinate', 'static_pressure', 'time', 'velocity']

In a notebook, we can also get a nice overview of the table by calling `dump()`:

In [8]:
snt.dump()

| standard_name | units | description | vector |
|---|---|---|---|
| coordinate | m | Coordinate refers to the spatial coordinate. Coordinate is a vector quantity. | True |
| static_pressure | Pa | Static pressure refers to the force per unit area exerted by a fluid. Pressure is a scalar quantity. | |
| time | s | Time refers tothe relative time since start of data aquisition. | |
| velocity | m/s | Velocity refers to the change of position over time. Velocity is a vector quantity. | True |


### Transformation of base standard names
Not every allowed standard name needs to be listed in the table. Further names can be derived from the listed ones by so-called transformations.
There are two ways to transform a standard name:

 1. Using affixes: adding a prefix or a suffix
 2. Applying a mathematical operation to the name

#### 1. Adding affixes

Note that 'x_velocity' is not part of the table:

In [9]:
'x_velocity' in snt

False

... but 'velocity' is, and it is a vector. The vector property tells us whether we can add a "vector component name" as a prefix, e.g. "x" or "y":

In [10]:
snt['velocity'].is_vector()

True

Which vector components exist is defined in the table:

In [11]:
snt.affixes['component'].values

{'x': 'X indicates the x-axis component of the vector.',
 'y': 'Y indicates the y-axis component of the vector.',
 'z': 'Z indicates the z-axis component of the vector.'}

Thus, when indexing with "x_velocity", the table checks whether the prefix is valid and, if so, returns the new (transformed) standard name:

In [12]:
snt['x_velocity']
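
The lookup logic described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `BASE_NAMES`, `COMPONENTS` and `resolve` are hypothetical names, the data is copied by hand from the table shown earlier, and it is assumed that a vector component carries the same unit as its vector. The toolbox's actual implementation may differ.

```python
# Toy data mirroring the tutorial table (copied by hand).
BASE_NAMES = {
    "coordinate": {"units": "m", "vector": True},
    "static_pressure": {"units": "Pa", "vector": False},
    "time": {"units": "s", "vector": False},
    "velocity": {"units": "m/s", "vector": True},
}
COMPONENTS = {"x", "y", "z"}


def resolve(name: str) -> dict:
    """Return an entry for `name`, allowing a component prefix on vectors."""
    if name in BASE_NAMES:
        return dict(BASE_NAMES[name], name=name)
    # Split off the part before the first underscore as candidate prefix.
    prefix, sep, base = name.partition("_")
    is_vector = BASE_NAMES.get(base, {}).get("vector", False)
    if sep and prefix in COMPONENTS and is_vector:
        # Assumption: a component is a scalar with the unit of its vector.
        return {"name": name, "units": BASE_NAMES[base]["units"], "vector": False}
    raise KeyError(f"{name!r} is not a valid standard name")


print(resolve("x_velocity"))
```

In this sketch, `resolve("x_static_pressure")` raises a `KeyError`, because static pressure is a scalar and therefore accepts no component prefix.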

#### 2. Applying a mathematical operation

During data processing, datasets are often transformed by mathematical functions, such as taking the square or the derivative of one quantity with respect to (wrt) another. Some mathematical operations like these are supported in the current version, e.g.:

In [13]:
snt['derivative_of_x_velocity_wrt_x_coordinate']

In [14]:
snt['square_of_static_pressure']

In [15]:
snt['arithmetic_mean_of_static_pressure']
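
One way to picture how such names could be interpreted is pattern matching on the name, combined with a rule for the resulting unit. The sketch below is a simplified illustration and not the toolbox's implementation: `BASE_UNITS`, `VECTORS`, `COMPONENTS` and `units_of` are hypothetical, only the three operations shown above are covered, and the resulting units are kept as unsimplified strings.

```python
import re

# Toy data copied by hand from the tutorial table.
BASE_UNITS = {"coordinate": "m", "static_pressure": "Pa", "time": "s", "velocity": "m/s"}
VECTORS = {"coordinate", "velocity"}
COMPONENTS = ("x", "y", "z")


def units_of(name: str) -> str:
    """Derive a (symbolic, unsimplified) unit string for a standard name."""
    if name in BASE_UNITS:
        return BASE_UNITS[name]
    match = re.fullmatch(r"derivative_of_(.+)_wrt_(.+)", name)
    if match:
        # Unit of a derivative: numerator unit divided by denominator unit.
        return f"({units_of(match.group(1))})/({units_of(match.group(2))})"
    match = re.fullmatch(r"square_of_(.+)", name)
    if match:
        return f"({units_of(match.group(1))})**2"
    match = re.fullmatch(r"arithmetic_mean_of_(.+)", name)
    if match:
        # Averaging does not change the unit.
        return units_of(match.group(1))
    prefix, _, base = name.partition("_")
    if prefix in COMPONENTS and base in VECTORS:
        return BASE_UNITS[base]
    raise KeyError(f"{name!r} cannot be derived from the toy table")


print(units_of("derivative_of_x_velocity_wrt_x_coordinate"))
print(units_of("square_of_static_pressure"))
print(units_of("arithmetic_mean_of_static_pressure"))
```

The function calls itself on the operands, so an operation may wrap an affixed name such as `x_velocity`.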

## Usage with HDF5 files

Let's apply the convention to HDF5 files. We lazily reuse the existing tutorial convention and remove some standard attributes in order to limit the example to the attributes relevant to the standard name convention:

In [16]:
zenodo_cv = h5tbx.conventions.from_zenodo('https://zenodo.org/record/8301535')
sn_cv = zenodo_cv.pop('contact', 'comment', 'references', 'data_type')
sn_cv.name = 'standard name convention'
sn_cv.register()

h5tbx.use(sn_cv)
sn_cv

Convention("standard name convention")
contact: https://orcid.org/0000-0001-8729-0482
  File.__init__():
    * standard_name_table (default=<h5rdmtoolbox.conventions.consts.DefaultValue object at 0x0000018A020F88B0>):
		The standard name table of the convention.
  Group.create_dataset():
    * units:
		The physical unit of the dataset. If dimensionless, the unit is ''.
    * standard_name:
		Standard name of the dataset. If not set, the long_name attribute must be given.
    * long_name:
		An comprehensive description of the dataset. If not set, the standard_name attribute must be given.
    * referencesdataset (default=None):
		Web resources servering as references for the dataset.
  Group.create_group(): (Nothing registered)
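
The summary above states that a dataset needs `units` and that `standard_name` and `long_name` are alternatives to each other. A minimal sketch of such a rule check in plain Python could look as follows. The function `check_dataset_attrs` is hypothetical, and treating `units` as mandatory is an assumption read from the summary, not a statement about the toolbox's internals.

```python
def check_dataset_attrs(attrs: dict) -> list:
    """Return a list of rule violations; an empty list means 'valid'."""
    problems = []
    # Assumption: units must always be given ('' for dimensionless data).
    if "units" not in attrs:
        problems.append("units is required (use '' if dimensionless)")
    # standard_name and long_name are alternatives; one of them is needed.
    if not attrs.get("standard_name") and not attrs.get("long_name"):
        problems.append("either standard_name or long_name must be given")
    return problems


print(check_dataset_attrs({"units": "m/s", "standard_name": "x_velocity"}))
print(check_dataset_attrs({"units": "", "long_name": "image counter"}))
print(check_dataset_attrs({"standard_name": "x_velocity"}))
```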

To find out about the available standard names, we create a file and retrieve the attribute `standard_name_table`. Based on the convention, it is set by default, so it is available without explicitly setting it:

In [17]:
with h5tbx.File() as h5:
    snt = h5.standard_name_table

print('The available (base) standard names are: ', snt.names)

The available (base) standard names are:  ['coordinate', 'static_pressure', 'time', 'velocity']


One possible dataset name based on the standard name table is "x_velocity". This is possible because *component* is available in the list of **affixes**. The transformation pattern shows that "component" is a **prefix**. Since "x" is among the available components, "x_velocity" is a valid standard name derived from the given table:

In [18]:
print('Available affixes: ', snt.affixes.keys())

print('\nValues for the component prefix:')
snt.affixes['component']

Available affixes:  dict_keys(['device', 'location', 'reference_frame', 'component'])

Values for the component prefix:


<Affix: name="component", description="Components are prefixes to the standard_name, e.g. x_velocity." transformation_pattern=^(.*)_(.*)$, values=['x', 'y', 'z']>

Let's access the name from the table. It exists and the description is adjusted, too:

In [19]:
snt['x_velocity']

Creating an x-velocity dataset:

In [20]:
with h5tbx.File() as h5:
    h5.create_dataset('u', data=[1,2,3], standard_name='x_velocity', units='km/s')
    h5.dump()
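
Note that the dataset above uses `km/s`, although the table registers `m/s` for velocity. This is allowed because `km/s` is convertible to the registered SI unit. The idea of such a check can be sketched in plain Python as follows. The conversion table `TO_SI` and the function `check_units` are hypothetical and hand-written for this illustration; a real implementation would rely on a proper units library.

```python
# Hypothetical conversion table: unit -> (SI unit, factor to multiply by).
TO_SI = {
    "m/s": ("m/s", 1.0),
    "km/s": ("m/s", 1000.0),
    "mm/s": ("m/s", 0.001),
    "Pa": ("Pa", 1.0),
    "kPa": ("Pa", 1000.0),
    "s": ("s", 1.0),
}


def check_units(dataset_units: str, registered_si_units: str) -> float:
    """Return the factor that converts dataset values to the registered unit.

    Raises ValueError if the unit is unknown or not convertible.
    """
    if dataset_units not in TO_SI:
        raise ValueError(f"unknown unit {dataset_units!r}")
    si_unit, factor = TO_SI[dataset_units]
    if si_unit != registered_si_units:
        raise ValueError(
            f"{dataset_units!r} is not convertible to {registered_si_units!r}"
        )
    return factor


factor = check_units("km/s", "m/s")
values_si = [v * factor for v in [1, 2, 3]]
print(values_si)  # [1000.0, 2000.0, 3000.0]
```

With this toy table, passing `units='Pa'` for a velocity would be rejected, since `Pa` cannot be converted to `m/s`.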