# Dataset structure

SPEED consists of collocations of GPM PMW sensor observations with *reference precipitation estimates* from multiple *reference data sources*. All collocations are provided on two grids: the native grid of the respective GPM PMW sensor and a regular lat/lon grid with a resolution of 0.036$^\circ$. These two types of collocations are referred to as *native* and *gridded*, respectively.
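
To give a feel for the resolution of the regular grid, the following sketch converts the 0.036$^\circ$ spacing into approximate distances. It assumes a spherical Earth with a mean radius of 6371 km; the function name ``grid_spacing_km`` is purely illustrative and not part of any SPEED tooling.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation (assumption)
RESOLUTION_DEG = 0.036    # resolution of the regular lat/lon grid


def grid_spacing_km(latitude_deg: float) -> tuple[float, float]:
    """Return the approximate (north-south, east-west) cell size in km."""
    km_per_degree = math.pi * EARTH_RADIUS_KM / 180.0
    north_south = RESOLUTION_DEG * km_per_degree
    # Meridians converge towards the poles, so east-west extent shrinks with cos(lat)
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


for lat in (0.0, 30.0, 60.0):
    ns, ew = grid_spacing_km(lat)
    print(f"lat {lat:4.1f}: {ns:.2f} km (N-S) x {ew:.2f} km (E-W)")
```

Under these assumptions a grid cell spans roughly 4 km in the north-south direction everywhere, while its east-west extent shrinks to about 2 km at 60 degrees latitude.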
 

## Organization

For a given source of reference data, denoted ``<reference_data>``, the data is organized into folders as shown below.

````
<reference_data>
    ├── <sensor_1>
    │   ├── native
    │   │   ├── <reference_data>_<sensor_1>_YYYYMMDDHHMMSS.nc
    │   │   └── ...
    │   └── gridded
    │       ├── <reference_data>_<sensor_1>_YYYYMMDDHHMMSS.nc
    │       └── ...
    └── <sensor_2>
        ├── native
        │   ├── <reference_data>_<sensor_2>_YYYYMMDDHHMMSS.nc
        │   └── ...
        └── gridded
            ├── <reference_data>_<sensor_2>_YYYYMMDDHHMMSS.nc
            └── ...
````

At the highest level, the data is separated by reference data source. The collocations for every reference data source are then organized into one folder per sensor they are derived from (``<sensor_1>`` and ``<sensor_2>`` in the example). Each sensor folder contains a ``native`` sub-folder holding the collocations on the sensor's native grid and a ``gridded`` sub-folder holding the gridded collocations.
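
The folder tree above can be navigated with a few lines of standard-library Python. The following is a minimal sketch, not part of any official SPEED tooling: the names ``CollocationFile``, ``parse_collocation_path`` and ``find_collocations`` as well as the placeholder folder names ``refdata`` and ``sensor_a`` are hypothetical. It assumes that the time stamp is the last underscore-separated token of the file name and follows the ``YYYYMMDDHHMMSS`` pattern.

```python
from datetime import datetime
from pathlib import Path
from typing import NamedTuple


class CollocationFile(NamedTuple):
    reference: str   # name of the reference data source
    sensor: str      # name of the sensor
    grid: str        # "native" or "gridded"
    time: datetime   # time stamp encoded in the file name
    path: Path


def parse_collocation_path(path: Path) -> CollocationFile:
    """Extract metadata from <reference_data>/<sensor>/<grid>/<file>.nc."""
    # The time stamp is taken from the end of the file name.
    _, stamp = path.stem.rsplit("_", 1)
    time = datetime.strptime(stamp, "%Y%m%d%H%M%S")
    # Reference, sensor and grid are read from the folder names, which avoids
    # ambiguity if a reference or sensor name itself contains underscores.
    grid = path.parent.name
    sensor = path.parent.parent.name
    reference = path.parent.parent.parent.name
    return CollocationFile(reference, sensor, grid, time, path)


def find_collocations(root, reference: str, sensor: str, grid: str) -> list[CollocationFile]:
    """List all collocation files for one reference/sensor/grid, sorted by time."""
    folder = Path(root) / reference / sensor / grid
    files = [parse_collocation_path(p) for p in folder.glob("*.nc")]
    return sorted(files, key=lambda f: f.time)


# Parsing works on the path alone; the file does not need to exist.
example = Path("refdata/sensor_a/native/refdata_sensor_a_20200101123000.nc")
info = parse_collocation_path(example)
print(info.reference, info.sensor, info.grid, info.time.isoformat())
```

Calling ``find_collocations(root, "refdata", "sensor_a", "gridded")`` on a local copy of the data would then return the gridded files of that sensor in chronological order.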

# File content

The file structures of the native and gridded data differ slightly, but both use the same variable names. Native-grid files contain both *input* and *reference* data in separate groups, whereas for the gridded data the reference data is provided in a separate file.

## Variable names

### Input data

The input data files all contain the following variables:

#### Observations

| Variable name               | Explanation                                             | Unit         |
|-----------------------------|---------------------------------------------------------|--------------|
| ``tbs_mw``                  | Microwave brightness temperatures                       | K            |
| ``tbs_ir``                  | 11 $\mu$m brightness temperatures                       | K            |

#### Ancillary data

| Variable name               | Explanation                                             | Unit         |
|-----------------------------|---------------------------------------------------------|--------------|
| ``earth_incidence_angle``   | Earth incidence angle                                   | Degree       |
| ``wet_bulb_temperature``    | Wet-bulb temperature                                    | K            |
| ``lapse_rate``              | Lapse rate                                              | K / km       |
| ``total_column_water_vapor``| Total-column water vapor                                | kg / m$^2$   |
| ``surface_temperature``     | Surface temperature                                     | K            |
| ``two_meter_temperature``   | Two-meter temperature                                   | K            |
| ``convective_precipitation``| ERA5 convective precipitation                           | mm / h       |
| ``moisture_convergence``    | ERA5 moisture convergence                               | kg / m$^2$   |
| ``leaf_area_index``         | Leaf-area index                                         | m$^2$ / m$^2$|
| ``snow_depth``              | Snow depth                                              | mm           |
| ``orographic_wind``         | ERA5 orographic wind                                    | m / s        |
| ``10m_wind``                | ERA5 10-m wind                                          | m / s        |
| ``mountain_type``           | Mountain type                                           | ---          |
| ``land_fraction``           | Land fraction                                           | %            |
| ``ice_fraction``            | Ice fraction                                            | %            |
| ``l1c_quality_flag``        | GPM L1C quality flag                                    | ---          |
| ``sunglint_angle``          | Sunglint angle                                          | Degree       |
| ``surface_type``            | CSU surface type                                        | ---          |
| ``airlifting_index``        | Airlifting index                                        | ---          |

### Geolocation and time

| Variable name               | Explanation                                             | Unit         |
|-----------------------------|---------------------------------------------------------|--------------|
| ``latitude``*               | Latitude                                                | Degree N     |
| ``longitude``*              | Longitude                                               | Degree E     |
| ``scan_time``*              | Time stamp marking the start of the scan line           | ---          |

## Native grids


## Input data

## Reference data

| Variable name               | Explanation                                             | Unit         |
|-----------------------------|---------------------------------------------------------|--------------|
| ``surface_precip``          | Ground-truth surface precipitation                      | mm/h         | 
| ``surface_precip_cmb``      | Surface precip from GPM CMB                             | mm/h         |
| ``surface_precip_mirs``     | Surface precip from MIRS                                | mm/h         |

> **Note**: Not all reference data variables are present in all files. The ``surface_precip_cmb`` and ``surface_precip_mirs``
> fields, for example, are only present in reference data derived from GPM CMB. The ``precip_type`` and ``radar_quality_index``
> fields, on the other hand, are only present in files derived from MRMS.
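
Because the set of reference variables depends on the reference data source, code that reads the reference data should check which fields are present rather than assume all of them. The sketch below illustrates one way to do this for any mapping-like object, such as a plain ``dict`` of arrays. The helper ``select_reference_fields`` is hypothetical, and treating ``surface_precip`` as always present is an assumption made for this example.

```python
# Assumption: ``surface_precip`` is expected in every reference data file.
REQUIRED_FIELDS = ("surface_precip",)

# Optional fields and the reference source that the note above associates them with.
OPTIONAL_FIELDS = {
    "surface_precip_cmb": "GPM CMB",
    "surface_precip_mirs": "GPM CMB",
    "precip_type": "MRMS",
    "radar_quality_index": "MRMS",
}


def select_reference_fields(dataset):
    """Return the required fields plus whichever optional fields are present."""
    missing = [name for name in REQUIRED_FIELDS if name not in dataset]
    if missing:
        raise KeyError(f"Missing required reference fields: {missing}")
    names = list(REQUIRED_FIELDS)
    names += [name for name in OPTIONAL_FIELDS if name in dataset]
    return {name: dataset[name] for name in names}


# Stand-in for reference data that only provides MRMS-specific fields.
example = {"surface_precip": [0.0, 1.2], "precip_type": [0, 1]}
print(list(select_reference_fields(example)))  # ['surface_precip', 'precip_type']
```

Downstream code can then loop over the returned fields without special-casing the reference data source.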