# Input and target data

The SPR dataset consists of multi-sensor satellite observations and retrieval ancillary data that are matched with corresponding precipitation estimates from ground-based precipitation radars. In total, four sources of retrieval input data are available in each dataset: PMW observations from GMI or ATMS (contained in the ``gmi`` or ``atms`` folders), ancillary data (contained in the ``ancillary`` folders), GOES-16 ABI geostationary observations (contained in the ``geo`` folders), and geostationary IR-window channel observations (contained in the ``geo_ir`` folders).

The file structure differs between the spatial and the tabular format. Data in tabular format is combined into a single file, whereas data in spatial format is split into one file per scene, each indexed by the median scan time of that scene.
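
Because each spatial-format scene is stored in its own file, inputs and targets belonging to the same scene have to be paired through that time index. The sketch below shows one way this could be done. It assumes hypothetical file names of the form ``<source>_<YYYYMMDDhhmmss>.nc``; the actual naming scheme is not specified here, so both the pattern and the helper ``group_by_scene`` are illustrative only.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

# ASSUMPTION: file names end in "_<YYYYMMDDhhmmss>.nc" (hypothetical pattern).
TIMESTAMP = re.compile(r"_(\d{14})\.nc$")


def group_by_scene(paths):
    """Group relative file paths by the time stamp found in their names."""
    scenes = defaultdict(dict)
    for path in paths:
        match = TIMESTAMP.search(path)
        if match is None:
            continue  # Skip files that do not follow the assumed pattern.
        # The parent folder name identifies the data source (gmi, geo, target, ...).
        source = PurePosixPath(path).parent.name
        scenes[match.group(1)][source] = path
    return dict(scenes)


# Made-up example paths; in practice they would come from list_files(...).
paths = [
    "spr/gmi/training/gridded/spatial/gmi/gmi_20200101120730.nc",
    "spr/gmi/training/gridded/spatial/target/target_20200101120730.nc",
    "spr/gmi/training/gridded/spatial/gmi/gmi_20200102031500.nc",
]
scenes = group_by_scene(paths)
for time_stamp, sources in scenes.items():
    print(time_stamp, sorted(sources))
```

Scenes for which a source is missing simply lack the corresponding key, which makes incomplete collocations easy to filter out.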

## GMI and ATMS observations

The GMI and ATMS files contain the variables ``observations`` and ``earth_incidence_angle``, which hold the PMW observations and the corresponding earth-incidence angles.

### Spatial format

In [1]:
import xarray as xr
from satrain.data import list_files
from satrain.config import get_data_path
files = list_files("spr/gmi/training/gridded/spatial/gmi")
satrain_path = get_data_path()
xr.load_dataset(satrain_path / files[0])

### Tabular format

In [2]:
import xarray as xr
from satrain.data import list_files
from satrain.config import get_data_path
files = list_files("spr/gmi/training/gridded/tabular/gmi")
satrain_path = get_data_path()
xr.load_dataset(satrain_path / files[0])

## Ancillary data

The ancillary data contains environmental data that is designed to complement the satellite observations and improve the retrieval accuracy.

Currently, the following variables are available:

 1. Wet-bulb temperature
 2. Lapse rate
 3. Total-column water vapor
 4. Moisture convergence
 5. Leaf area index
 6. Snow depth
 7. Orographic wind
 8. 10-meter wind
 9. Land fraction
 10. Ice fraction
 11. PMW L1C quality flag
 12. Sunglint angle
 13. GPROF Airlifting index


### Spatial format

In [3]:
import xarray as xr
from satrain.data import list_files
from satrain.config import get_data_path
files = list_files("spr/gmi/training/gridded/spatial/ancillary")
satrain_path = get_data_path()
xr.load_dataset(satrain_path / files[0])

### Tabular format

In [4]:
import xarray as xr
from satrain.data import list_files
from satrain.config import get_data_path
files = list_files("spr/gmi/training/gridded/tabular/ancillary")
satrain_path = get_data_path()
xr.load_dataset(satrain_path / files[0])

## Geostationary observations 

The geostationary observations contain observations from the 16 channels of the GOES ABI at multiple time steps: they are matched to the four full 15-minute time steps closest to the median overpass time of each collocation scene.

### Spatial format

In [5]:
import xarray as xr
from satrain.data import list_files
from satrain.config import get_data_path
files = list_files("spr/gmi/training/gridded/spatial/geo")
satrain_path = get_data_path()
xr.load_dataset(satrain_path / files[0])

### Tabular format

In [6]:
import xarray as xr
from satrain.data import list_files
from satrain.config import get_data_path
files = list_files("spr/gmi/training/on_swath/tabular/geo")
satrain_path = get_data_path()
xr.load_dataset(satrain_path / files[0])

## Geostationary IR-window observations 

The IR-window files contain only observations from IR-window channels around $11\ \mu\mathrm{m}$. However, they cover a longer period: the files contain observations from 16 full 30-minute time steps centered on the median time of each overpass.
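
Both geostationary inputs are sampled at regular time steps around the median overpass time. The following is a minimal sketch of how such time steps could be derived. It assumes that half of the steps lie at or before the median time and half after it, which is one interpretation of "closest" and "centered"; the helper ``surrounding_time_steps`` is hypothetical and not part of ``satrain``.

```python
from datetime import datetime, timedelta


def surrounding_time_steps(median_time, step_minutes, n_steps):
    """Return n_steps full time steps split evenly around median_time."""
    step = timedelta(minutes=step_minutes)
    # Round down to the last full time step at or before the median time.
    midnight = median_time.replace(hour=0, minute=0, second=0, microsecond=0)
    last_step = midnight + ((median_time - midnight) // step) * step
    # ASSUMPTION: half of the steps are at or before the median time,
    # the other half after it.
    first_offset = -(n_steps // 2 - 1)
    return [last_step + (first_offset + i) * step for i in range(n_steps)]


median_time = datetime(2020, 1, 1, 12, 7, 30)  # Made-up overpass time.
abi_steps = surrounding_time_steps(median_time, step_minutes=15, n_steps=4)
ir_steps = surrounding_time_steps(median_time, step_minutes=30, n_steps=16)
print([t.strftime("%H:%M") for t in abi_steps])
print(ir_steps[0].strftime("%H:%M"), ir_steps[-1].strftime("%H:%M"))
```

For this example, the four 15-minute steps span 11:45 to 12:30, while the 16 half-hourly steps span 08:30 to 16:00.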

### Spatial format

In [7]:
import xarray as xr
from satrain.data import list_files
from satrain.config import get_data_path
files = list_files("spr/gmi/training/gridded/spatial/geo_ir")
satrain_path = get_data_path()
xr.load_dataset(satrain_path / files[0])

### Tabular format

In [8]:
import xarray as xr
from satrain.data import list_files
from satrain.config import get_data_path
files = list_files("spr/gmi/training/on_swath/tabular/geo_ir")
satrain_path = get_data_path()
xr.load_dataset(satrain_path / files[0])

## Target data

Finally, the target files contain the precipitation estimates as well as the radar-quality index (RQI), the gauge-correction factor (GCF), and the precipitation type fraction.

### Spatial format

In [9]:
import xarray as xr
from satrain.data import list_files
from satrain.config import get_data_path
files = list_files("spr/gmi/training/on_swath/spatial/target")
satrain_path = get_data_path()
xr.load_dataset(satrain_path / files[0])

### Tabular format

In [10]:
import xarray as xr
from satrain.data import list_files
from satrain.config import get_data_path
files = list_files("spr/gmi/training/on_swath/tabular/target")
satrain_path = get_data_path()
xr.load_dataset(satrain_path / files[0])