# Working with multi-variable assets

In addition to catalogs of data assets (files) in time-series (single-variable)
format, intake-esm supports catalogs with data assets in time-slice (history)
format and/or files with multiple variables. For intake-esm to work properly
with multi-variable assets,

- the `variable_column` of the catalog must contain iterables (list, tuple, set)
  of values.
- the user must specify a dictionary of functions for converting values in
  certain columns into iterables. This is done via the `csv_kwargs` argument.

In the example below, we are going to use the following catalog to
demonstrate how to work with multi-variable assets:


In [23]:
# Look at the catalog on disk
!cat multi-variable-catalog.csv

experiment,case,component,stream,variable,member_id,path,time_range
CTRL,b.e11.B1850C5CN.f09_g16.005,ocn,pop.h,"['SHF', 'REGION_MASK', 'ANGLE', 'DXU', 'KMT', 'NO2', 'O2']",5,../../../tests/sample_data/cesm-multi-variables/b.e11.B1850C5CN.f09_g16.005.pop.h.SHF-NO2-O2.050001-050012.nc,050001-050012
CTRL,b.e11.B1850C5CN.f09_g16.005,ocn,pop.h,"['SHF', 'REGION_MASK', 'ANGLE', 'DXU', 'KMT', 'NO2', 'O2']",5,../../../tests/sample_data/cesm-multi-variables/b.e11.B1850C5CN.f09_g16.005.pop.h.SHF-NO2-O2.050101-050112.nc,050101-050112
CTRL,b.e11.B1850C5CN.f09_g16.005,ocn,pop.h,"['SHF', 'REGION_MASK', 'ANGLE', 'DXU', 'KMT', 'NO2', 'PO4']",5,../../../tests/sample_data/cesm-multi-variables/b.e11.B1850C5CN.f09_g16.005.pop.h.SHF-NO2-PO4.050001-050012.nc,050001-050012
CTRL,b.e11.B1850C5CN.f09_g16.005,ocn,pop.h,"['SHF', 'REGION_MASK', 'ANGLE', 'DXU', 'KMT', 'NO2', 'PO4']",5,../../../tests/sample_data/cesm-multi-variables/b.e11.B1850C5CN.f09_g16.005.pop.h.SHF-NO2-PO4.050101-050112.nc,050101-050112
CTRL,b.e

As you can see, the variable column contains a list of variables, and this list
was serialized as a string:
`"['SHF', 'REGION_MASK', 'ANGLE', 'DXU', 'KMT', 'NO2', 'O2']"`.
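To make the role of the converter functions concrete, here is a minimal sketch that uses only the standard library. It is not how intake-esm reads the catalog. The shortened column set and the `file-a.nc` / `file-b.nc` paths are placeholders invented for this illustration. It shows a dictionary that maps a column name to a parsing function, with `ast.literal_eval` turning the serialized string back into a Python list:

```python
import ast
import csv
import io

# Two shortened rows in the same layout as the catalog above.
# The paths are placeholders invented for this sketch.
CATALOG_TEXT = """experiment,variable,path
CTRL,"['SHF', 'NO2', 'O2']",file-a.nc
CTRL,"['SHF', 'NO2', 'PO4']",file-b.nc
"""

# Map a column name to the function that parses that column's raw string.
converters = {"variable": ast.literal_eval}

rows = []
for raw in csv.DictReader(io.StringIO(CATALOG_TEXT)):
    # Columns without a converter are kept as plain strings.
    row = {name: converters.get(name, str)(value) for name, value in raw.items()}
    rows.append(row)

print(type(rows[0]["variable"]).__name__, rows[0]["variable"])
# list ['SHF', 'NO2', 'O2']
```

`ast.literal_eval` only evaluates Python literals (strings, numbers, lists, tuples, and so on), which makes it a safer choice than `eval` for parsing values read from a file.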


## Loading a catalog

To load a catalog with multi-variable files, we must pass additional
information to `open_esm_datastore` via the `csv_kwargs` argument. Here we
specify a dictionary of functions (`converters`) that convert the values in the
`variable` column into iterables. We use the `literal_eval` function from the
standard library's `ast` module:


In [28]:
import ast

import intake

In [29]:
col = intake.open_esm_datastore(
    "multi-variable-collection.json",
    csv_kwargs={"converters": {"variable": ast.literal_eval}},
)
col

| | unique |
|---|---|
| experiment | 1 |
| case | 1 |
| component | 1 |
| stream | 1 |
| variable | 10 |
| member_id | 1 |
| path | 5 |
| time_range | 2 |


In [30]:
col.df.head()

| | experiment | case | component | stream | variable | member_id | path | time_range |
|---|---|---|---|---|---|---|---|---|
| 0 | CTRL | b.e11.B1850C5CN.f09_g16.005 | ocn | pop.h | (SHF, REGION_MASK, ANGLE, DXU, KMT, NO2, O2) | 5 | ../../../tests/sample_data/cesm-multi-variable... | 050001-050012 |
| 1 | CTRL | b.e11.B1850C5CN.f09_g16.005 | ocn | pop.h | (SHF, REGION_MASK, ANGLE, DXU, KMT, NO2, O2) | 5 | ../../../tests/sample_data/cesm-multi-variable... | 050101-050112 |
| 2 | CTRL | b.e11.B1850C5CN.f09_g16.005 | ocn | pop.h | (SHF, REGION_MASK, ANGLE, DXU, KMT, NO2, PO4) | 5 | ../../../tests/sample_data/cesm-multi-variable... | 050001-050012 |
| 3 | CTRL | b.e11.B1850C5CN.f09_g16.005 | ocn | pop.h | (SHF, REGION_MASK, ANGLE, DXU, KMT, NO2, PO4) | 5 | ../../../tests/sample_data/cesm-multi-variable... | 050101-050112 |
| 4 | CTRL | b.e11.B1850C5CN.f09_g16.005 | ocn | pop.h | (SHF, REGION_MASK, ANGLE, DXU, KMT, TEMP, SiO3) | 5 | ../../../tests/sample_data/cesm-multi-variable... | 050001-050012 |


The in-memory representation of the catalog stores each `variable` entry as a
tuple of values. To confirm that intake-esm has registered this catalog as
containing multi-variable assets, we can check the `._multiple_variable_assets`
property:


In [32]:
col._multiple_variable_assets

True

## Searching

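The search functionality works in the same way as for single-variable catalogs.
A row is returned when its `variable` tuple contains any of the requested
values: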


In [37]:
col_subset = col.search(variable=["O2", "SiO3"])
col_subset.df

| | experiment | case | component | stream | variable | member_id | path | time_range |
|---|---|---|---|---|---|---|---|---|
| 0 | CTRL | b.e11.B1850C5CN.f09_g16.005 | ocn | pop.h | (SHF, REGION_MASK, ANGLE, DXU, KMT, NO2, O2) | 5 | ../../../tests/sample_data/cesm-multi-variable... | 050001-050012 |
| 1 | CTRL | b.e11.B1850C5CN.f09_g16.005 | ocn | pop.h | (SHF, REGION_MASK, ANGLE, DXU, KMT, NO2, O2) | 5 | ../../../tests/sample_data/cesm-multi-variable... | 050101-050112 |
| 2 | CTRL | b.e11.B1850C5CN.f09_g16.005 | ocn | pop.h | (SHF, REGION_MASK, ANGLE, DXU, KMT, TEMP, SiO3) | 5 | ../../../tests/sample_data/cesm-multi-variable... | 050001-050012 |
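The result above is consistent with a simple membership test: a row is kept when its tuple of variables shares at least one entry with the requested list. The sketch below reproduces that selection in plain Python for the five rows shown by `col.df.head()`. It only illustrates the observed behaviour and is not intake-esm's implementation; the `matches` helper is a hypothetical name introduced here.

```python
# Stand-in for the first five catalog rows: (variable tuple, time_range).
COMMON = ("SHF", "REGION_MASK", "ANGLE", "DXU", "KMT")
rows = [
    (COMMON + ("NO2", "O2"), "050001-050012"),
    (COMMON + ("NO2", "O2"), "050101-050112"),
    (COMMON + ("NO2", "PO4"), "050001-050012"),
    (COMMON + ("NO2", "PO4"), "050101-050112"),
    (COMMON + ("TEMP", "SiO3"), "050001-050012"),
]


def matches(variables, requested):
    """Return True if the row lists at least one of the requested variables."""
    # Hypothetical helper for this sketch, not part of intake-esm.
    return not set(requested).isdisjoint(variables)


requested = ["O2", "SiO3"]
subset = [row for row in rows if matches(row[0], requested)]

for variables, time_range in subset:
    print(variables[-1], time_range)
# O2 050001-050012
# O2 050101-050112
# SiO3 050001-050012
```

The two rows that carry `PO4` but neither `O2` nor `SiO3` are dropped, which matches the three rows in `col_subset.df`.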


## Loading assets into xarray datasets

Loading data assets into xarray datasets works in the same way too:


In [38]:
col_subset.to_dataset_dict(cdf_kwargs={})


--> The keys in the returned dictionary of datasets are constructed as follows:
	'component.experiment.stream'


{'ocn.CTRL.pop.h': <xarray.Dataset>
 Dimensions:    (member_id: 1, nlat: 2, nlon: 2, time: 24)
 Coordinates:
   * time       (time) object 0500-02-01 00:00:00 ... 0502-02-01 00:00:00
     TLAT       (nlat, nlon) float64 dask.array<chunksize=(2, 2), meta=np.ndarray>
     TLONG      (nlat, nlon) float64 dask.array<chunksize=(2, 2), meta=np.ndarray>
     ULAT       (nlat, nlon) float64 dask.array<chunksize=(2, 2), meta=np.ndarray>
     ULONG      (nlat, nlon) float64 dask.array<chunksize=(2, 2), meta=np.ndarray>
   * member_id  (member_id) int64 5
 Dimensions without coordinates: nlat, nlon
 Data variables:
     O2         (member_id, time, nlat, nlon) float32 dask.array<chunksize=(1, 12, 2, 2), meta=np.ndarray>
     SiO3       (member_id, time, nlat, nlon) float32 dask.array<chunksize=(1, 24, 2, 2), meta=np.ndarray>
 Attributes:
     cell_methods:              cell_methods = time: mean ==> the variable val...
     NCO:                       4.3.4
     source:                    CCSM POP2