# Standard Name Convention

Standard name conventions define how metadata is named and which syntax is accepted. The intended result is efficient, automated and unambiguous data exploration and processing.

At the most basic level, a name identifier should be defined as an attribute of every dataset in an HDF file. A popular choice is `standard_name`, as used by the climate and forecast (CF) community. Among other rules, standard names must be lower case and must not contain spaces. How names are constructed is defined in online documentation, and naming tables (standard name tables) list the standard names currently accepted by the community. This package adopts the concept by introducing standardized name tables (class `StandardNameTable`), which allow flexible usage of such name definitions.

In [1]:
import h5rdmtoolbox as h5tbx
from h5rdmtoolbox import use

use('cflike')

2023-03-01_14:19:11,807 INFO     [__init__.py:72] Switched to "cflike"


Whenever a dataset is written and the parameter `standard_name` is set, the name is verified against the standard name convention/table associated with the wrapper class. If the constant `STRICT` is set to `True` (default), the name is looked up in the table and, if it is not found, the dataset cannot be written. To allow standard names that fulfill the spelling requirements but are not yet listed in the table, set `STRICT` to `False`:

In [2]:
h5tbx.conventions.cflike.standard_name.STRICT = False
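
How the `STRICT` switch changes the outcome can be sketched in plain Python. This is an illustration under assumptions, not the toolbox code: the function `verify_standard_name`, the regular expression for the spelling rule and the plain `dict` standing in for a table are all hypothetical.

```python
import re

# Assumed spelling rule (lower case, no spaces); the toolbox's real rule may differ.
_SPELLING = re.compile(r"[a-z][a-z0-9_]*")


def verify_standard_name(name: str, table: dict, strict: bool = True) -> bool:
    """Accept `name` if it is spelled correctly and, in strict mode, listed in `table`."""
    if _SPELLING.fullmatch(name) is None:
        return False  # wrong spelling is rejected in both modes
    if strict:
        return name in table  # strict: the name must be listed
    return True  # non-strict: correct spelling is enough


table = {"x_velocity": {"canonical_units": "m/s"}}
print(verify_standard_name("y_velocity", table, strict=True))   # False: not listed
print(verify_standard_name("y_velocity", table, strict=False))  # True: spelling is fine
```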

## Initialize a Standard Name Convention
A standardized name table is an XML document that contains (at least) a description and a canonical unit for each standardized name. We'll build one from scratch first and then look at tables that already exist:

Create a `StandardNameTable` (from the sub-package `conventions.cflike`) and provide a `name`, `table`, `version_number`, `contact` and `institution`:

In [3]:
from h5rdmtoolbox.conventions.cflike import StandardNameTable

In [4]:
sc = StandardNameTable(
    name='Test_SNC',
    table={},
    version_number=1,
    contact='contact@python.com',
    institution='my_institution'
)
sc

Test_SNC (version number: 1)

We have built an empty convention (no table content). Let's add content. We can do this by creating a dictionary first...

In [5]:
tabledict = {'x_velocity': dict(canonical_units='m/s', description='velocity is a vector quantity.')}
tabledict

{'x_velocity': {'canonical_units': 'm/s',
  'description': 'velocity is a vector quantity.'}}

... and add it to the object by calling `update()`:

In [6]:
sc.update(tabledict)
sc.dump()

Unnamed: 0,canonical_units,description
x_velocity,m/s,velocity is a vector quantity.


New entries are added with `set`, and existing entries are changed with `modify`:

In [7]:
sc.set('time', canonical_units='s', description='physical time')
sc.modify('x_velocity', canonical_units='m/s', description='velocity is a vector quantity. x indicates the component in x-axis direction')
sc.set('y_velocity', canonical_units='m/s', description='velocity is a vector quantity. y indicates the component in y-axis direction')
sc.set('z_velocity', canonical_units='m/s', description='velocity is a vector quantity. z indicates the component in z-axis direction')
sc.sdump()

Test_SNC (version: 1)
+------------+-------------------+------------------------------------------------------------------------------+
|            | canonical_units   | description                                                                  |
|------------+-------------------+------------------------------------------------------------------------------|
| time       | s                 | physical time                                                                |
| x_velocity | m/s               | velocity is a vector quantity. x indicates the component in x-axis direction |
| y_velocity | m/s               | velocity is a vector quantity. y indicates the component in y-axis direction |
| z_velocity | m/s               | velocity is a vector quantity. z indicates the component in z-axis direction |
+------------+-------------------+------------------------------------------------------------------------------+
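
The difference between `set` and `modify` can be pictured with a small dictionary-backed class. This is a sketch only: `MiniTable` is hypothetical, and the assumption that `set` refuses to overwrite an existing entry while `modify` requires one is an illustrative choice that may not match the toolbox exactly.

```python
class MiniTable:
    """Hypothetical stand-in for a standard name table (not the toolbox class)."""

    def __init__(self):
        self._entries = {}

    def set(self, name, canonical_units, description):
        # Assumed behaviour: adding a name twice is an error.
        if name in self._entries:
            raise KeyError(f"{name!r} already exists; use modify() instead")
        self._entries[name] = {"canonical_units": canonical_units, "description": description}

    def modify(self, name, **changes):
        # Assumed behaviour: only existing entries can be changed.
        if name not in self._entries:
            raise KeyError(f"{name!r} does not exist; use set() instead")
        self._entries[name].update(changes)

    def __getitem__(self, name):
        return self._entries[name]


mini = MiniTable()
mini.set("time", canonical_units="s", description="physical time")
mini.modify("time", description="elapsed physical time")
print(mini["time"]["description"])  # elapsed physical time
```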


## Saving Standard Name Table

Standard name tables can be saved as XML documents or YAML files:

In [8]:
xml_filename = h5tbx.generate_temporary_filename(suffix='.xml')
sc.to_xml(xml_filename)

yml_filename = h5tbx.generate_temporary_filename(suffix='.yml')
sc.to_yaml(yml_filename)
pass
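
To give an idea of what such a file may contain, the sketch below writes a table to XML and reads it back using only the Python standard library. The element layout (`entry` elements with an `id` attribute and `canonical_units`/`description` children) is modelled on the CF standard name table. The helper names `table_to_xml` and `table_from_xml` are hypothetical, and the files written by `to_xml()` may be structured differently.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def table_to_xml(table: dict, filename: str, version_number: int) -> None:
    """Write a {name: {canonical_units, description}} mapping as CF-like XML."""
    root = ET.Element("standard_name_table")
    ET.SubElement(root, "version_number").text = str(version_number)
    for name, meta in table.items():
        entry = ET.SubElement(root, "entry", id=name)
        ET.SubElement(entry, "canonical_units").text = meta["canonical_units"]
        ET.SubElement(entry, "description").text = meta["description"]
    ET.ElementTree(root).write(filename, encoding="utf-8", xml_declaration=True)


def table_from_xml(filename: str) -> dict:
    """Read the mapping back from a file written by table_to_xml()."""
    root = ET.parse(filename).getroot()
    return {
        entry.get("id"): {
            "canonical_units": entry.findtext("canonical_units"),
            "description": entry.findtext("description"),
        }
        for entry in root.iter("entry")
    }


original = {"time": {"canonical_units": "s", "description": "physical time"}}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "table.xml")
    table_to_xml(original, path, version_number=1)
    restored = table_from_xml(path)
print(restored == original)  # True
```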

For later use from anywhere, the table can be registered with the toolbox by calling `register()`. This saves the convention as a YAML file in the user directory for standard name data and uses its `versionname` to identify it:

In [9]:
sc.register(overwrite=True)
StandardNameTable.print_registered()

 > fluid-v1
 > piv-v1
 > Test-v1
 > Test_SNC-v1
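
Conceptually, registering means writing the table to a well-known directory under its version name and looking it up there later. The sketch below mimics this with the standard library. It uses JSON files and a temporary directory so that it runs anywhere, whereas the toolbox writes YAML files to its user directory. The function names and the `name-vN` pattern (inferred from the list above) are assumptions.

```python
import json
import tempfile
from pathlib import Path


def versionname(name: str, version_number: int) -> str:
    """Build an identifier such as 'Test_SNC-v1' (pattern assumed from the listing)."""
    return f"{name}-v{version_number}"


def register(registry: Path, name, version_number, table, overwrite=False) -> Path:
    """Store `table` in the registry directory under its version name."""
    target = registry / f"{versionname(name, version_number)}.json"
    if target.exists() and not overwrite:
        raise FileExistsError(f"{target.name} is already registered")
    target.write_text(json.dumps(table))
    return target


def list_registered(registry: Path) -> list:
    """Return the version names of all tables found in the registry directory."""
    return sorted(p.stem for p in registry.glob("*.json"))


with tempfile.TemporaryDirectory() as tmp:
    registry = Path(tmp)
    register(registry, "Test_SNC", 1, {"time": {"canonical_units": "s"}})
    # Registering the same version again only works with overwrite=True.
    register(registry, "Test_SNC", 1, {"time": {"canonical_units": "s"}}, overwrite=True)
    names = list_registered(registry)
print(names)  # ['Test_SNC-v1']
```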


The registered tables can also be listed from the command line:

In [10]:
! h5tbx standard_name --list-registered

 > fluid-v1
 > piv-v1
 > Test-v1
 > Test_SNC-v1


## Load Standard Name Convention from file

If you have standard name tables at hand, just load them. They must be provided as XML or YAML files:

In [11]:
sc_test = StandardNameTable.from_yaml(yml_filename)
print(sc_test.versionname)
sc_test.dump()

Test_SNC-v1


Unnamed: 0,canonical_units,description
time,s,physical time
x_velocity,m/s,velocity is a vector quantity. x indicates the component in x-axis direction
y_velocity,m/s,velocity is a vector quantity. y indicates the component in y-axis direction
z_velocity,m/s,velocity is a vector quantity. z indicates the component in z-axis direction


In [12]:
sc_test['x_velocity'].snt

Test_SNC (version number: 1)

Load a registered Standard Name Table from the toolbox:

In [13]:
StandardNameTable.load_registered('Test_SNC-v1').dump()

Unnamed: 0,canonical_units,description
time,s,physical time
x_velocity,m/s,velocity is a vector quantity. x indicates the component in x-axis direction
y_velocity,m/s,velocity is a vector quantity. y indicates the component in y-axis direction
z_velocity,m/s,velocity is a vector quantity. z indicates the component in z-axis direction


## Load from web
Ideally, a community has already defined a naming convention, just like the CF conventions from which the concept is adopted. Let's import version 79 of their XML document:

In [14]:
cf = StandardNameTable.from_web(url='https://cfconventions.org/Data/cf-standard-names/79/src/cf-standard-name-table.xml')
cf

standard_name_table (version number: 79)

In [15]:
cf.versionname

'standard_name_table-v79'

In [16]:
cf.dump(max_rows=4)

Unnamed: 0,canonical_units,grib,amip,description
acoustic_signal_roundtrip_travel_time_in_sea_water,s,,,"The quantity with standard name acoustic_signal_roundtrip_travel_time_in_sea_water is the time taken for an acoustic signal to propagate from the emitting instrument to a reflecting surface and back again to the instrument. In the case of an instrument based on the sea floor and measuring the roundtrip time to the sea surface, the data are commonly used as a measure of ocean heat content."
aerodynamic_particle_diameter,m,,,The diameter of a spherical particle with density 1000 kg m-3 having the same aerodynamic properties as the particles in question.
...,...,...,...,...
y_wind_gust,m s-1,,,"""y"" indicates a vector component along the grid y-axis, positive with increasing y. Wind is defined as a two-dimensional (horizontal) air velocity vector, with no vertical component. (Vertical motion in the atmosphere has the standard name upward_air_velocity.) A gust is a sudden brief period of high wind speed. In an observed time series of wind speed, the gust wind speed can be indicated by a cell_methods of maximum for the time-interval. In an atmospheric model which has a parametrised calculation of gustiness, the gust wind speed may be separately diagnosed from the wind speed."
zenith_angle,degree,,,Zenith angle is the angle to the local vertical; a value of zero is directly overhead.


## Perform checks
A naming convention can be used to test whether new standard names comply with it:

In [17]:
cf.check_name('zenith_angle', strict=True)

True

In [18]:
cf['x_wind_gust'].canonical_units

'm/s'

In [19]:
try:
    cf.check_units('x_wind_gust', units='m/s')
except h5tbx.errors.StandardNameError as e:
    print(e)

In [20]:
try:
    cf.check_units('zenith_angle', units='K')
except h5tbx.errors.StandardNameError as e:
    print(e)
cf.check_units('zenith_angle', units='degree')

Unit of standard name "zenith_angle" not as expected: "K" != "degree"


True
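
The table lists `m s-1` for `y_wind_gust`, while `canonical_units` for `x_wind_gust` is returned as `m/s`, so unit strings are evidently normalised somewhere. How the toolbox does this is not shown here. The sketch below illustrates one simple way to compare such spellings by reducing them to exponent maps; `parse_units` and `units_match` are hypothetical helpers that only handle products of symbols with integer exponents and a single `/`.

```python
import re

# A token is a unit symbol with an optional integer exponent, e.g. "s-1" or "m2".
_TOKEN = re.compile(r"([A-Za-z]+)(-?\d+)?")


def parse_units(units: str) -> dict:
    """Turn 'm s-1' or 'm/s' into {'m': 1, 's': -1}."""
    numerator, _, denominator = units.partition("/")
    exponents = {}
    for part, sign in ((numerator, 1), (denominator, -1)):
        for token in part.split():
            match = _TOKEN.fullmatch(token)
            if match is None:
                raise ValueError(f"cannot parse unit token {token!r}")
            symbol, power = match.group(1), int(match.group(2) or 1)
            exponents[symbol] = exponents.get(symbol, 0) + sign * power
    # Drop symbols that cancel out, e.g. 'm/m'.
    return {symbol: power for symbol, power in exponents.items() if power != 0}


def units_match(actual: str, expected: str) -> bool:
    """Compare two unit strings independently of their spelling."""
    return parse_units(actual) == parse_units(expected)


print(units_match("m/s", "m s-1"))   # True
print(units_match("K", "degree"))    # False
```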

Perform a check on a file:

In [21]:
with h5tbx.H5File() as h5:
    h5.create_dataset('zenith angle 1', shape=(3,), units='K', standard_name='zenith_angle')
    h5.create_dataset('zenith angle 2', shape=(3,), units='degree', standard_name='zenith_angle')

In [22]:
cf.check_file(h5.hdf_filename, raise_error=False)

2023-03-01_14:19:15,427 ERROR    [standard_name.py:666]  > ds: /zenith angle 1: Unit of standard name "zenith_angle" not as expected: "K" != "degree"
2023-03-01_14:19:15,427 ERROR    [standard_name.py:666]  > ds: /zenith angle 1: Unit of standard name "zenith_angle" not as expected: "K" != "degree"
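
A file check boils down to visiting every dataset and comparing its attributes with the table. The sketch below shows that loop with a plain dictionary standing in for the HDF5 file, so it runs without any HDF5 library. `check_datasets` and the data layout are hypothetical, and `raise_error=False` is assumed to mean "report all problems instead of stopping at the first one".

```python
def check_datasets(datasets: dict, table: dict, raise_error: bool = True) -> list:
    """Compare each dataset's units with the canonical units of its standard name."""
    problems = []
    for path, attrs in datasets.items():
        name = attrs["standard_name"]
        units = attrs["units"]
        expected = table[name]["canonical_units"]
        if units != expected:
            message = (
                f'ds: {path}: Unit of standard name "{name}" '
                f'not as expected: "{units}" != "{expected}"'
            )
            if raise_error:
                raise ValueError(message)
            problems.append(message)  # collect and keep going
    return problems


# Stand-in for the two datasets created above.
datasets = {
    "/zenith angle 1": {"units": "K", "standard_name": "zenith_angle"},
    "/zenith angle 2": {"units": "degree", "standard_name": "zenith_angle"},
}
table = {"zenith_angle": {"canonical_units": "degree"}}

for problem in check_datasets(datasets, table, raise_error=False):
    print(problem)
```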


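The same check can be run from the command line. The table is registered first so that it can be referenced by its version name: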

In [23]:
cf.register(overwrite=True)

In [24]:
! h5tbx standard_name -f {h5.hdf_filename} -t {cf.versionname}

 > Checking file "C:\Users\da4323\AppData\Local\h5rdmtoolbox\h5rdmtoolbox\tmp\tmp1210\tmp2.hdf" with standard name table "standard_name_table-v79"


2023-03-01_14:19:30,384 ERROR    [standard_name.py:666]  > ds: /zenith angle 1: Unit of standard name "zenith_angle" not as expected: "K" != "degree"
2023-03-01_14:19:30,384 ERROR    [standard_name.py:666]  > ds: /zenith angle 1: Unit of standard name "zenith_angle" not as expected: "K" != "degree"
