CaSPAr file naming convention and variables

Julie edited this page Oct 2, 2023 · 9 revisions

Please find the French version of this page here.

File naming convention

All downloaded files follow a common naming convention.

Deterministic products

For deterministic products the naming convention is YYYYMMDDHH.nc (e.g. for RDPS one file is 2017100200.nc), where YYYYMMDDHH specifies the issue date of the forecast. The time axis in the netCDF file holds the forecast horizons (in hours).

Ensemble products

The naming convention for ensemble products is the same, except that each ensemble member is stored in its own file. The convention is YYYYMMDDHH_EEE.nc (e.g. one member of CaLDAS is named 2017100200_002.nc), where the 002 indicates that it is the second ensemble member. The ensemble number 000 is special: it is the reference ensemble member. Again, YYYYMMDDHH indicates the issue time of the forecast. If you request an ensemble product in CaSPAr, you will receive all members of the ensemble; single members cannot be requested on their own. (If you really think about it, why would you want an ensemble if you didn't want all of the members?)
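The two naming conventions above can be unpacked programmatically. The following is a minimal sketch, not part of CaSPAr itself; the function name `parse_caspar_filename` is hypothetical, and the issue time is treated as UTC, as stated for all CaSPAr dates and times.

```python
import re
from datetime import datetime, timezone

# YYYYMMDDHH.nc for deterministic products, YYYYMMDDHH_EEE.nc for ensemble members.
_FILENAME = re.compile(r"^(\d{10})(?:_(\d{3}))?\.nc$")

def parse_caspar_filename(filename):
    """Return (issue_time, member) for a CaSPAr file name.

    member is None for deterministic products and an int (0 = reference
    member) for ensemble products.
    """
    match = _FILENAME.match(filename)
    if match is None:
        raise ValueError(f"not a CaSPAr file name: {filename!r}")
    stamp, member = match.groups()
    issue_time = datetime.strptime(stamp, "%Y%m%d%H").replace(tzinfo=timezone.utc)
    return issue_time, (int(member) if member is not None else None)
```

For example, `parse_caspar_filename("2017100200_002.nc")` yields the issue time 2017-10-02 00:00 UTC and member number 2.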

Product variable data

General information

Time step availability

All dates and times are in UTC.

Each CaSPAr data product can have several variables associated with it. Each of these variables is used in the ECCC modelling process, and some are only available at specific forecast horizons. Most variables are available at all forecast horizons. Some variables are used to initialize the forecast and are only available at the very first time step, usually T=0. In other cases a variable may only be available at specific forecast horizons, such as every 4 or every 6 hours.

A netCDF file stores data in a (lat,lon,time) array, where time is the forecast horizon. Unfortunately, the axis variables in the netCDF file must have consistent dimensions, so every variable shares the same time axis. If a requested variable is not available at all horizons, a missing value is automatically inserted at the horizons where it is absent. The details are somewhat irrelevant for our purposes here, but the netCDF API compresses the missing values efficiently, so the difference in file size is negligible.
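When working with a variable that is only present at some horizons, it can help to list the horizons that actually carry data. The sketch below assumes the values for one grid cell have already been read into plain Python lists and that the missing value is known (in netCDF it is normally given by the variable's `_FillValue` attribute); the function name and the sample fill value are hypothetical. Some netCDF libraries return masked arrays instead, in which case this step is not needed.

```python
import math

def valid_horizons(horizons, values, fill_value):
    """Return the forecast horizons at which a variable actually has data."""
    kept = []
    for horizon, value in zip(horizons, values):
        is_nan = isinstance(value, float) and math.isnan(value)
        if value != fill_value and not is_nan:
            kept.append(horizon)
    return kept

# A variable available only every 6 hours, with -9999.0 as a sample fill value.
horizons = [0, 1, 2, 3, 4, 5, 6]
values = [1.2, -9999.0, -9999.0, -9999.0, -9999.0, -9999.0, 1.5]
available = valid_horizons(horizons, values, fill_value=-9999.0)
```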

Variable naming convention

A simple variable naming convention is applied, i.e. PPPP_T_VV_LLLLL where:

  • PPPP - product name. If you need the product name, read the global netCDF attribute called product instead of extracting it from the variable name, since the length of PPPP is variable and it may include underscores.
  • T - type of product. This is P for prediction (forecast) or A for analysis.
  • VV - an internal variable name used by ECCC that CaSPAr has kept for consistency. In the netCDF file the long_name will also be available for further description of the variable.
  • LLLLL - product level indicator.

The product level indicator LLLLL can have several meanings. For atmospheric variables it is the percentage of the atmosphere based on pressure elevation. Divide the number by 100 to get the percentage value: 10000 is the bottom of the atmosphere, 0 is the top of the atmosphere, 09950 is 99.5%, etc. For CaSPAr, as the name implies, the focus is surface predictions, so we only archive atmospheric variables near the surface. The level 0 can also indicate some surface variables. In CaSPAr this has been replaced with SFC for convenience. Other variables have integer numbers for levels, which we again replace for users' convenience. For soil information the level is usually 10cm, meaning a soil layer from 0-10cm depth, or Profile, which is 0-[2]m or the full depth of the modelled soil profile. Note that the Profile layer overlaps with the 0-10cm data. In other cases the integers represent types of land cover. For example, in RDPS 1=Vegetated Land, 2=Glaciers, 3=Open Water. Again, the integers have been replaced in the variable names.
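As an illustration of the convention described above, a variable name can be split from the right so that underscores inside the product name do not matter. This is a sketch under assumptions: it assumes the ECCC name and the level label themselves contain no underscores, and the names `parse_variable_name`, `level_percent` and the sample variable names are hypothetical. The global product attribute remains the recommended source for the product name.

```python
from typing import NamedTuple, Optional

class VariableName(NamedTuple):
    product: str    # PPPP, may contain underscores
    kind: str       # "P" (prediction) or "A" (analysis)
    eccc_name: str  # VV, internal ECCC variable name
    level: str      # LLLLL, e.g. "09950", "SFC", "10cm", "Profile"

def parse_variable_name(name: str) -> VariableName:
    """Split PPPP_T_VV_LLLLL, working from the right-hand side."""
    parts = name.rsplit("_", 3)
    if len(parts) != 4 or parts[1] not in ("A", "P"):
        raise ValueError(f"unexpected variable name: {name!r}")
    return VariableName(*parts)

def level_percent(level: str) -> Optional[float]:
    """Return the atmospheric level as a percentage, or None for labels such as SFC."""
    if len(level) == 5 and level.isdigit():
        return int(level) / 100
    return None
```

With these helpers, a name such as `RDPS_P_TT_09950` splits into product `RDPS`, type `P`, ECCC name `TT` and level `09950`, which corresponds to 99.5%.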

Variable levels

Variable names follow the convention <Product>_<Type:A=Analysis,P=Prediction>_<ECCC name>_<Level/Tile/Category>. Variables with level 10000 are at surface level. The height [m] of variables with level 0XXXX needs to be inferred from the corresponding fields of geopotential height, as the difference GZ_0XXXX - GZ_10000.
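The height inference described above is a cell-by-cell difference of two geopotential height fields. A minimal sketch follows, assuming both fields have been read into flat lists of equal length; the function name and the `metres_per_unit` parameter are hypothetical. The scale factor is only a placeholder: check the units attribute of the GZ variables in your file before interpreting the result in metres.

```python
def height_above_surface(gz_level, gz_surface, metres_per_unit=1.0):
    """Return GZ_0XXXX - GZ_10000 for each grid cell, scaled to metres.

    gz_level   -- geopotential height at the level of interest (GZ_0XXXX)
    gz_surface -- geopotential height at the surface (GZ_10000)
    metres_per_unit -- assumed conversion factor from the stored unit to metres
    """
    if len(gz_level) != len(gz_surface):
        raise ValueError("both fields must cover the same grid cells")
    return [(upper - lower) * metres_per_unit
            for upper, lower in zip(gz_level, gz_surface)]
```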

Wind speed and direction

In a weather prediction model wind is treated as a two-component vector. People are used to hearing wind reported by its direction on a compass rose (0-360 degrees) and by its speed. Output from the NWP model is instead based on the u-component vector, which runs parallel to the x-axis, and the v-component vector, which runs parallel to the y-axis. In both cases the vectors and angles are measured from meridians of longitude. Just like in map projections, a climate model grid must be 'wrapped' around a geoid Earth, which leads to distortion. It is therefore common for the lat-lon grid of a climate model to be rotated and warped to minimize the distortion over the area of interest. The WRF model uses this approach.

In CaSPAr, the UU and VV components you request are the raw output from the climate model, which follows the CaSPAr philosophy of not modifying data provided to users. This gives the user ultimate control over the types of transformation that are applied. However, in the case of the u and v wind components, CaSPAr also provides corrected values based on an unrotated grid. These are the variables UUC and VVC, which should be selected unless you, as a user, wish to do the correction yourself. CaSPAr also converts the u- and v-components into wind speed (UVC) and wind direction (WDC).
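For readers who want to derive speed and direction themselves, the conversion from components can be sketched as follows. This is not CaSPAr's own code. It assumes u is the eastward and v the northward component on an unrotated grid (as with UUC and VVC), and it uses the common meteorological convention that direction is where the wind blows from, in degrees clockwise from north; whether WDC follows exactly this convention is an assumption to verify against the data.

```python
import math

def wind_speed_direction(u, v):
    """Return (speed, direction_deg) from eastward u and northward v components.

    direction_deg is the direction the wind blows FROM, clockwise from north,
    in the range [0, 360). It carries no meaning when the speed is zero.
    """
    speed = math.hypot(u, v)
    direction_deg = math.degrees(math.atan2(-u, -v)) % 360.0
    return speed, direction_deg
```

The speed is returned in the same unit as the input components. For example, a wind with u = 0 and v = -10 blows towards the south, so its direction is 0 degrees (a northerly wind).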

Specific conventions

Some variables are labelled as surface level in their variable names (i.e., showing SFC; ECCC IP1=12000) even though they are actually at higher levels. Those variables are:

  • variables related to wind (UU, VV, UV, WD, UUC, VVC, UVC, WDC), which are at 10m rather than SFC
  • temperature (TT), which is at 1.5m rather than SFC
  • dew point (TD), which is at 1.5m rather than SFC