Naming conventions with :mod:`xoa.cf`
Warning
Have a quick loop to the appendix :ref:`appendix.cf.default` before going any further!
This module is an application and extension of a subset of the CF conventions in application to naming conventions. It has two intents:
- Searching :class:`xarray.DataArray` variables or coordinates with another
:class:`xarray.DataArray` or a :class:`xarray.Dataset`,
by scanning name and attributes like
units
andstandard_name
. - Formatting :class:`xarray.DataArray` variables or coordinates with
unique
name
, andstandard_name
,long_name
andunits
attributes, with support of staggered grid location syntax.
The module offers capabilities for the user to extend and specialize default behaviors for user's special datasets.
Note
This module shares common feature with the excellent and long awaited xf_xarray package and started a long time before with the Vacumm package. The most notable differences differences include:
- The current module is also designed for data variables, not only coordinates.
- It search for items not only using standard_names but also specialized names.
- It is not only available as accessors, but also as independant objects that can be configured for each type of dataset or in contexts by the user.
Scanning and formatting actions are based on specifications, and this module natively includes a :ref:`default configuration <appendix.cf.default>` for various oceanographic, sea surface and atmospheric variables and coordinates. A distinction is made between data variables (:ref:`data_vars <appendix.cf.data_vars>`) and coordinates (:ref:`coords <appendix.cf.coords>`), like in :mod:`xarray`.
Getting the current specifications for data variables and coordinates with the :func:`~xoa.cf.get_cf_specs` function:
.. ipython:: python from xoa import cf cfspecs = cf.get_cf_specs() cfspecs["data_vars"] is cfspecs.data_vars cfspecs["data_vars"] is cfspecs.coords cfspecs.data_vars.names[:3] cfspecs.coords.names[:3]
See the appendix :ref:`appendix.cf.default` for the list of available default specifications.
An instance of the :class:`CFSpecs` has other configuration :ref:`sections <appendix.cf.sections>` than :cfgmsec:`data_vars <[data_vars]>`, :cfgmsec:`coords <[coords]>` and :cfgmsec:`dims <[dims]>`: :cfgmsec:`registration <[registration]>`, :cfgmsec:`sglocator <[sglocator]>`, :cfgmsec:`vertical <[vertical]>` and :cfgmsec:`accessors <[accessors]>`.
Here is the example of the :cfdatavar:`sst` data variable:
.. ipython:: python from pprint import pprint pprint(cfspecs.data_vars['sst'])
Description of specification keys:
Key | Type | Description |
name |
str | Specialized name for decoding and encoding which is empty by default |
alt_names |
list(str) | Alternate names for decoding |
standard_name |
list(str) | "standard_name" attributes |
long_name |
list(str) | "long_name" attributes |
units |
list(str) | "units" attributes |
domain |
choice | Domain of application, within {'generic', 'atmos', 'ocean', 'surface'} |
search_order |
str | Search order within properties as combination of letters: [n]name, [s]tandard_name, [u]nits |
cmap |
str | Colormap specification |
inherit |
str | Inherit specifications from another data variable |
select |
eval | Item selection evaluated and applied to the array |
squeeze |
list(str) | List of dimensions that must be squeezed out |
Note
The standard_name
, long_name
and units
specifications are
internally stored as a dict in the attrs
key.
Get the specialized name and the attributes only:
.. ipython:: python cfspecs.data_vars.get_name("sst") cfspecs.data_vars.get_attrs("sst")
Here is the example of the :cfdatavar:`lon` coordinate:
.. ipython:: python from pprint import pprint pprint(cfspecs.coords['lon'])
Description of specification keys:
Key | Type | Description |
name |
str | Specialized name for decoding and encoding which is empty by default |
alt_names |
list(str) | Alternate names for decoding |
standard_name |
list(str) | "standard_name" attributes |
long_name |
list(str) | "long_name" attributes |
units |
list(str) | "units" attributes |
axis |
str | "axis" attribute like X, Y, Z, T or F |
search_order |
str | Search order within properties as combination of letters: [n]name, [s]tandard_name, [u]nits |
inherit |
str | Inherit specifications from another data variable or coordinate |
Note
Like for data variables, the standard_name
, long_name
, units
and axis
specifications are internally stored as a dict in the attrs
key.
Get the specialized name and the attributes only:
.. ipython:: python cfspecs.coords.get_name("lon") cfspecs.coords.get_attrs("lon")
Key | Type | Description |
x |
list(str) | Possible names of X-type dimensions |
y |
list(str) | Possible names of Y-type dimensions |
z |
list(str) | Possible names of Z-type vertical dimensions |
t |
list(str) | Possible names of time dimensions |
f |
list(str) | Possible names of forecast dimensions |
This section of the specs defines the names that the dimensions
of type x
, y
, z
, t
and f
(forecast) can take.
It is filled automatically by default with possible names (name
and alt_names
)
of the coordinates that have their axis
defined.
The user can also add more spcific names.
Theses list of names are used when searching for dimensions that are not coordinates: since they don't have attribute, their type can only be guessed from their name.
Warning
It is recommended to not fill the dims section, but fill the coordinate section instead.
Here is the default content:
.. ipython:: python cfspecs.dims # or cfspecs["dims"]
.. ipython:: python cfspecs["register"] cfspecs["sglocator"] cfspecs["accessors"] cfspecs["vertical"]
Searching within a :class:`~xarray.Dataset` or :class:`~xarray.DataArray`
Let's define a minimal dataset:
.. ipython:: python @suppress import xarray as xr, numpy as np nx = 3 lon = xr.DataArray(np.arange(3, dtype='d'), dims='mylon', attrs={'standard_name': 'longitude'}) temp = xr.DataArray(np.arange(20, 23, dtype='d'), dims='mylon', coords={'mylon': lon}, attrs={'standard_name': 'sea_water_temperature'}) sal = xr.DataArray(np.arange(33, 36, dtype='d'), dims='mylon', coords={'mylon': lon}, attrs={'standard_name': 'sea_water_salinity'}) ds = xr.Dataset({'mytemp': temp, 'mysal': sal})
All these arrays are CF compliant according to their
standard_name
attribute, despite their name is not really explicit.
Check if they match known or explicit CF items:
.. ipython:: python cfspecs.coords.match(lon, "lon") # explicit cfspecs.coords.match(lon, "lat") # explicit cfspecs.coords.match(lon) # any known cfspecs.data_vars.match(temp) # any known cfspecs.data_vars.match(sal) # any known
Search for known CF items:
.. ipython:: python mytemp = cfspecs.search(ds, "temp") mylon = cfspecs.search(mytemp, "lon")
Datasets are searched for data variables ("data_vars") and data variables are searched for coordinates ("coords"). You can also search for coordinates in datasets, for instance like this:
.. ipython:: python cfspecs.coords.search(ds, "lon")
.. seealso:: - CF items: :cfcoord:`lon` :cfcoord:`lat` :cfdatavar:`temp` :cfdatavar:`sal` - Methods: :meth:`xoa.cf.CFCoordSpecs.match` :meth:`xoa.cf.CFVarSpecs.match` :meth:`xoa.cf.CFSpecs.search` :meth:`xoa.cf.CFCoordSpecs.search` :meth:`xoa.cf.CFVarSpecs.search`
The staggered-grid locator :class:`~xoa.cf.SGLocator`
A :class:`~xoa.cf.CFSpecs` instance comes with :class:`~xoa.cf.SGLocator` that is accessible through the :attr:`~xoa.cf.CFSpecs.sglocator` attribute. A :class:`~xoa.cf.SGLocator` helps parsing and formatting staggered grid location from :attr:`name`, :attr:`standard_name` and :attr:`long_name` data array attributes. It is configured at the :cfgmsec:`sglocator <[sglocator]>` section in which you can specify the format of the name (:cfgmopt:`name_format <[sglocator] name_format>` and the allowed location names :cfgmopt:`allowed_locations <[sglocator] allowed_locations>`.
.. ipython:: python sglocator = cfspecs.sglocator sglocator.formats sglocator.format_attr("name", "lon", "rho") sglocator.format_attr("standard_name", "lon", "rho") sglocator.format_attr("long_name", "lon", "rho") sglocator.parse_attr("name", "lon_rho")
Formatting means changing or setting names and attributes. It is possible to format, or even auto-format data variables and coordinates.
During an auto-formatting, each array is matched against CF specs,
and the array is formatted when a matching is successfull.
If the array contains coordinates, the same process is applied on them,
as soon as the format_coords
keyword is True
.
If the name
key is set in the matching item specs, its value can be used
to name the variable or coordinate array, else the generic name is used
like "lon"
.
Explicit formatting:
.. ipython:: python cfspecs.format_coord(lon, "lon") cfspecs.format_data_var(temp, "temp")
Auto-formatting:
.. ipython:: python ds2 = cfspecs.auto_format(ds) ds2.temp ds2.lon
It can be applied to a data array or a full dataset.
.. seealso:: :meth:`xoa.cf.CFSpecs.format_coord` :meth:`xoa.cf.CFSpecs.format_data_var` :meth:`xoa.cf.CFSpecs.auto_format`
By default, formatting renames known arrays to their generic name,
like "temp" in the example above. We speak here of encoding.
If the specialize
keyword is set to True
, arrays are
renamed with their specialized name if set in the specs with the
:cfgmopt:`name <[data_vars] [__many__] name>` option.
We speak here of decoding.
Two shortcut methods exists for these tasks:
- Decoding: :meth:`~xoa.cf.CFSpecs.decode`
- Encoding: :meth:`~xoa.cf.CFSpecs.encode`
Chaining the two methods should lead to the initial dataset or data array. See the last section of this page for an exemple: :ref:`uses.cf.croco`.
.. seealso:: :meth:`xoa.cf.CFSpecs.decode` :meth:`xoa.cf.CFSpecs.encode`
Accessors for :class:`xarray.Dataset` and :class:`xarray.DataArray` can be registered with the :func:`xoa.cf.register_cf_accessors`:
.. ipython:: python import xoa xoa.register_accessors(xcf=True)
The accessor is named here :class:`xcf <xoa.accessors.CFDatasetAccessor>` to not conflict with the :class:`cf` accessor of cf-xarray.
Note
All xoa accessors can be be registered with :func:`xoa.register_accessors`. Note also that all functionalities of the :class:`xcf <xoa.accessors.CFDatasetAccessor>` accessor are also available with the more global :class:`xoa <xoa.accessors.XoaDatasetAccessor>` accessor.
These accessors make it easy to use some of the :class:`xoa.cf.CFSpecs` capabilities. Here are examples of use:
.. ipython:: python :okwarning: temp temp.xcf.get("lon") # access by .get ds.xcf.get("temp") # access by .get ds.xcf.lon # access by attribute ds.xcf.coords.lon # specific search = ds.cf.coords.get("lon") ds.xcf.temp # access by attribute ds.xcf["temp"].name # access by item ds.xcf.data_vars.temp.name # specific search = ds.cf.coords.get("temp") ds.xcf.data_vars.bathy is None # returns None when not found ds.xcf.temp.xcf.lon.name # chaining ds.xcf.temp.xcf.name # CF name, not real name ds.xcf.temp.xcf.attrs # attributes, merged with CF attrs ds.xcf.temp.xcf.standard_name # single attribute ds.mytemp.xcf.auto_format() # or ds.temp.xcf() ds.xcf.auto_format() # or ds.xcf()
As you can see, accessing an accessor attribute or item make an implicit call to :class:`~xoa.cf.DataArrayCFAccessor.get`. The root accessor :attr:`cf` give access to two sub-accessors, :attr:`~xoa.cf.DatasetCFAccessor.data_vars` and :attr:`~xoa.cf.DatasetCFAccessor.coords`, for being able to specialize the searches.
.. seealso:: :class:`xoa.cf.DataArrayCFAccessor` :class:`xoa.cf.DatasetCFAccessor`
The :mod:`xoa.cf` module has internal defaults as shown in appendix :ref:`appendix.cf.default`.
You can extend these defaults with a user file, whose location is printable with the following command, at the line containing "user CF specs file":
.. command-output:: xoa info paths
The current specs can be updated with different methods.
From a well structured dictionary:
.. ipython:: python cfspecs.load_cfg({"data_vars": {"banana": {"standard_name": "banana"}}}) cfspecs.data_vars["banana"]
From a configuration file: instead of the dictionary as an argument to :meth:`~xoa.cf.CFSpecs.load_cfg` method, you can give either a file name or a multi-line string with the same content as the file. Following the previous example:
[data_vars]
[[banana]]
standard_name: banana
If you only want to update a :attr:`~xoa.cf.CFSpecs.category`, you can use such method (here :meth:`~xoa.cf.CFVarSpecs.set_specs`):
.. ipython:: python cfspecs.data_vars.set_specs("banana", name="bonono") cfspecs.data_vars["banana"]["name"]
Alternatively, a :class:`xoa.cf.CFSpecs` instance can be loaded with the :meth:`~xoa.cf.CfSpecs.load_cfg` method, as explained below.
To create new specs, you must instantiate the :class:`xoa.cf.CFSpecs` class, with an input type as those presented above:
- A config file name.
- A Multi-line string in the format of a config file.
- A dictionary.
- A :class:`configobj.ConfigObj` instance.
- Another :class:`~xoa.cf.CFSpecs` instance.
- A list of them, with the having priority over the lasts.
The initialization also accepts two options:
default
: wether to load or not the default internal config.user
: wether to load or not the user config file.
An config created from default and user configs:
.. ipython:: python banana_specs = {"data_vars": {"banana": {"attrs": {"standard_name": "banana"}}}} mycfspecs = cf.CFSpecs(banana_specs) mycfspecs["data_vars"]["sst"]["attrs"]["standard_name"] mycfspecs["data_vars"]["banana"]["attrs"]["standard_name"]
An config created from scratch:
.. ipython:: python mycfspecs = cf.CFSpecs(banana_specs, default=False, user=False) mycfspecs.pprint(depth=2)
An config created from two other configs:
.. ipython:: python cfspecs_banana = cf.CFSpecs(banana_specs, default=False, user=False) apple_specs = {"data_vars": {"apple": {"attrs": {"long_name": "Big apple"}}}} cfspecs_apple = cf.CFSpecs(apple_specs, default=False, user=False) cfspecs_fruits = cf.CFSpecs((cfspecs_apple, cfspecs_banana), default=False, user=False) cfspecs_fruits.data_vars.names
As shown before, the currents CF specs are accessible with the :func:`xoa.cf.get_cf_specs` function. You can replace them with the :class:`xoa.cf.set_cf_specs` class, to be used as a fonction.
.. ipython:: python cfspecs_old = cf.get_cf_specs() cf.set_cf_specs(cfspecs_banana) cf.get_cf_specs() is cfspecs_banana cf.set_cf_specs(cfspecs_old) cf.get_cf_specs() is cfspecs_old
In case of a temporary change, you can used :class:`~xoa.cf.set_cf_specs` in a context statement:
.. ipython:: python with cf.set_cf_specs(cfspecs_banana) as myspecs: print('inside', cf.get_cf_specs() is cfspecs_banana) print('inside', myspecs is cf.get_cf_specs()) print('outside', cf.get_cf_specs() is cfspecs_old)
For convience, you can set specs directly with a dictionary:
.. ipython:: python with cf.set_cf_specs({"data_vars": {"apple": {}}}) as myspecs: print("apple" in cf.get_cf_specs()) print("apple" in cf.get_cf_specs())
Application with an accessor usage:
.. ipython:: python data = xr.DataArray([5], attrs={'standard_name': 'sea_surface_banana'}) ds = xr.Dataset({'toto': data}) mycfspecs = cf.CFSpecs({"data_vars": {"ssb": {"standard_name": "sea_surface_banana"}}}) with cf.set_cf_specs(mycfspecs): print(ds.xcf.get("ssb"))
It is possible to register specialized :class:`~xoa.cf.CFSpecs` instances with :func:`~xoa.cf.register_cf_specs` for future access.
Here we register new specs with a internal registration name "mycroco"
:
.. ipython:: python content = { "register": { "name": "mycroco" }, "data_vars": { "temp": { "name": "supertemp" } }, "coords": { "lon": { "name": "mylon" } } } mycfspecs = cf.CFSpecs(content) cf.register_cf_specs(mycfspecs)
We can now access with it the :func:`~xoa.cf.get_cf_specs` function:
.. ipython:: python these_cfspecs = cf.get_cf_specs('mycroco') these_cfspecs is mycfspecs
If you set the :attr:`cfspecs` attribute or encoding of a dataset to the name of a registered :class:`~xoa.cf.CFSpecs` instance, you can get it automatically with the :func:`~xoa.cf.get_cf_specs` or :func:`~xoa.cf.infer_cf_specs` functions.
Let's register another :class:`~xoa.cf.CFSpecs` instance:
.. ipython:: python content = { "register": { "name": "myhycom" }, "data_vars": { "sal": { "name": "supersal" } }, } mycfspecs2 = cf.CFSpecs(content) cf.register_cf_specs(mycfspecs2)
Let's create a dataset:
.. ipython:: python ds = xr.Dataset({'supertemp': ("mylon", [0, 2])}, coords={"mylon": [10, 20]})
Now find the best registered specs instance which has either the name
myhycom
or mycroco
:
.. ipython:: python cf_specs_auto = cf.infer_cf_specs(ds) print(cf_specs_auto.name) ds_decoded = cf_specs_auto.decode(ds) ds_decoded cf_specs_auto.encode(ds)
It is mycroco
as expected.
All xoa routines that needs to access specific coordinates or variables try to infer the approriate specs, which default to the current specs. When the :attr:`cfspecs` attribute or encoding is set, :meth:`~xoa.cf.get_cf_specs` uses it to search within registered specs.
.. ipython:: python ds.encoding.update(cfspecs="mycroco") cfspecs = cf.get_cf_specs(ds) cfspecs.encode(ds)
The :attr:`cfspecs` encoding is set at the dataset level, not at the data array level:
.. ipython:: python cf.get_cf_specs(ds.supertemp) is cfspecs
To propagate to all the data arrays, use :func:`~xoa.cf.assign_cf_specs`:
.. ipython:: python cf.assign_cf_specs(ds, "mycroco") ds.mylon.encoding cf.get_cf_specs(ds.supertemp) is cfspecs
Here are the specs for Croco:
.. literalinclude:: ../xoa/_samples/croco.cfg :language: ini
Register them:
.. ipython:: python @suppress import xoa, xoa.cf xoa.cf.register_cf_specs(xoa.get_data_sample("croco.cfg")) xoa.cf.is_registered_cf_specs("croco")
Register the :class:`xoa <~xoa.accessors.XoaDatasetAccessor>` accessor:
.. ipython:: python xoa.register_accessors()
Now let's open a Croco sample as a xarray dataset:
.. ipython:: python ds = xoa.open_data_sample("croco.south-africa.meridional.nc") ds
Let's decode it:
.. ipython:: python dsd = ds.xoa.decode() dsd
Let's re-encode it!
.. ipython:: python dse = dsd.xoa.encode() dse
Et voilà !
At Shom, the Hycom model outputs are splitted into separate files, one for each variable.
Conflicts may occur when using variables that are not at the same staggered grid location,
since all variables are stored with dimensions (y, x)
and lon
and lat
coordinates, all with the same name.
To avoid this problem, the horizontal dimensions and coordinates of
staggered variables are renamed to indicate their location.
Here are the specs to take care of the staggered grid indicators in the names:
.. literalinclude:: ../xoa/_samples/hycom.cfg :language: ini
Note the add_coords_loc
sections.
Location is not added to the u
and v
variables
but to their dimensions and coordinates.
Register them:
.. ipython:: python @suppress import xoa, xoa.cf xoa.cf.register_cf_specs(xoa.get_data_sample("hycom.cfg")) xoa.cf.is_registered_cf_specs("hycom")
Overview of the U dataset:
.. ipython:: python dsu = xoa.open_data_sample("hycom.gdp.u.nc") dsu
Decoding:
.. ipython:: python dsu = dsu.xoa.decode() dsu
Now we can read, decode and merge all files without any conflict:
.. ipython:: python dsv = xoa.open_data_sample("hycom.gdp.u.nc").xoa.decode() dsh = xoa.open_data_sample("hycom.gdp.h.nc").xoa.decode() ds = xr.merge([dsu, dsv, dsh]) ds