## Building a CESM Collection Catalog


Building a CESM collection catalog follows the same steps that are used when building a CMIP collection catalog:

- Define a collection catalog in a YAML file or a nested dictionary.
- Pass this collection definition to ``intake.open_esm_metadatastore()``.
- Use the built collection catalog.

For demonstration purposes, we are going to use data from the CESM-LE project.

In [1]:
import intake



In [2]:
cdefinition = {
    'name': 'cesm1-le_test',
    'collection_type': 'cesm',
    'include_cache_dir': True,
    'data_sources': {
        # Pre-industrial control experiment
        'CTRL': {
            'locations': [{'name': 'SAMPLE-DATA',
                           'loc_type': 'posix',
                           'direct_access': True,
                           'urlpath': '../../../tests/sample_data/cesm-le'}],
            'component_attrs': {'ocn': {'grid': 'POP_gx1v6'}},
            'case_members': [{'case': 'b.e11.B1850C5CN.f09_g16.005',
                              'sequence_order': 0,
                              'ensemble': 0,
                              'has_ocean_bgc': True,
                              'year_offset': 1448}],
        },
        # 20th-century experiment
        '20C': {
            'locations': [{'name': 'SAMPLE-DATA',
                           'loc_type': 'posix',
                           'direct_access': True,
                           'urlpath': '../../../tests/sample_data/cesm-le'}],
            'component_attrs': {'ocn': {'grid': 'POP_gx1v6'}},
            'case_members': [{'case': 'b.e11.B20TRC5CNBDRD.f09_g16.001',
                              'sequence_order': 0,
                              'ensemble': 1,
                              'has_ocean_bgc': True}],
        },
    },
}
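
To make the nesting of the definition easier to see, the sketch below walks a trimmed copy of the dictionary and lists each experiment with its case members. The helper `list_case_members` is hypothetical and is not part of intake-esm; it only illustrates how `data_sources` maps experiments to `case_members`.

```python
# Trimmed copy of the definition above; only the fields used here are kept.
definition = {
    'name': 'cesm1-le_test',
    'data_sources': {
        'CTRL': {'case_members': [{'case': 'b.e11.B1850C5CN.f09_g16.005', 'ensemble': 0}]},
        '20C': {'case_members': [{'case': 'b.e11.B20TRC5CNBDRD.f09_g16.001', 'ensemble': 1}]},
    },
}


def list_case_members(definition):
    """Return (experiment, case, ensemble) tuples found in a collection definition."""
    members = []
    for experiment, source in definition['data_sources'].items():
        for member in source.get('case_members', []):
            members.append((experiment, member['case'], member['ensemble']))
    return members


members = list_case_members(definition)
for experiment, case, ensemble in members:
    print(f'{experiment}: case={case} ensemble={ensemble}')
```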

### Building the Collection

The build method loops over all the experiments and each of the ensemble members therein.
It attempts to parse each file name; when parsing fails, the file is skipped with a warning.
If HPSS access is not available (for example, from compute nodes on Cheyenne),
that resource is omitted from the catalog.
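
The parse-or-skip behaviour described above can be illustrated with a small sketch. This is not the parser that intake-esm uses; the function `parse_filename`, the assumed layout `<case>.<stream>.<variable>.<date_range>.nc`, and the example file names are all assumptions made for illustration.

```python
import re
import warnings

# Pattern for the part of a file name that follows the case name:
# <stream>.<variable>.<date_range>.nc (assumed layout, for illustration only).
_SUFFIX = re.compile(r'^(?P<stream>.+)\.(?P<variable>[^.]+)\.(?P<date_range>\d+-\d+)\.nc$')


def parse_filename(basename, case):
    """Split a file name into catalog fields, or warn and return None."""
    prefix = case + '.'
    if not basename.startswith(prefix):
        warnings.warn(f'skipping {basename}: does not belong to case {case}')
        return None
    match = _SUFFIX.match(basename[len(prefix):])
    if match is None:
        warnings.warn(f'skipping {basename}: could not parse file name')
        return None
    return {'case': case, **match.groupdict()}


case = 'b.e11.B20TRC5CNBDRD.f09_g16.001'

# Hypothetical file name assembled from a case, stream, variable and date range.
parsed = parse_filename(f'{case}.pop.h.SHF.185001-200512.nc', case)

# A file that does not follow the layout is skipped with a warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    skipped = parse_filename(f'{case}.README.txt', case)
```

Anchoring on the known case name first keeps the pattern simple, because both the case name and the stream name contain dots of their own.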

In [3]:
col = intake.open_esm_metadatastore(
    collection_input_definition=cdefinition,
    overwrite_existing=True,
)

Working on experiment: CTRL
Getting file listing : SAMPLE-DATA:posix:../../../tests/sample_data/cesm-le
Building file database : SAMPLE-DATA:posix:../../../tests/sample_data/cesm-le
Getting file listing : CACHE:posix:/var/folders/z7/sdhzbbr96bv2wjrsb92qsm3dwz5p3x/T//.intake_esm/data_cache
Building file database : CACHE:posix:/var/folders/z7/sdhzbbr96bv2wjrsb92qsm3dwz5p3x/T//.intake_esm/data_cache
Working on experiment: 20C
Getting file listing : SAMPLE-DATA:posix:../../../tests/sample_data/cesm-le
Building file database : SAMPLE-DATA:posix:../../../tests/sample_data/cesm-le
Getting file listing : CACHE:posix:/var/folders/z7/sdhzbbr96bv2wjrsb92qsm3dwz5p3x/T//.intake_esm/data_cache
Building file database : CACHE:posix:/var/folders/z7/sdhzbbr96bv2wjrsb92qsm3dwz5p3x/T//.intake_esm/data_cache
None


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 38 entries, 0 to 37
Data columns (total 18 columns):
resource            38 non-null object
resource_type       38 non-null object
direct_access       38 non-null object
experiment          38 non-null object
case                38 non-null object
component           38 non-null object
stream              38 non-null object
variable            38 non-null object
date_range          38 non-null object
ensemble            38 non-null object
file_fullpath       38 non-null object
file_basename       38 non-null object
file_dirname        38 non-null object
ctrl_branch_year    0 non-null object
year_offset         36 non-null object
sequence_order      38 non-null object
has_ocean_bgc       38 non-null object
grid                38 non-null object
dtypes: object(18)
memory usage: 5.4+ KB
Persisting cesm1-le_test at : /Users/abanihi/.intake_esm/collections/cesm/cesm1-le_test.cesm.csv


### Using the Built Collection

In [4]:
col.df.head()

| | resource | resource_type | direct_access | experiment | case | component | stream | variable | date_range | ensemble | file_fullpath | file_basename | file_dirname | ctrl_branch_year | year_offset | sequence_order | has_ocean_bgc | grid |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | SAMPLE-DATA:posix:../../../tests/sample_data/c... | posix | True | 20C | b.e11.B20TRC5CNBDRD.f09_g16.001 | ocn | pop.h | SHF | 185001-200512 | 1 | ../../../tests/sample_data/cesm-le/b.e11.B20TR... | b.e11.B20TRC5CNBDRD.f09_g16.001.pop.h.SHF.1850... | ../../../tests/sample_data/cesm-le/ | | | 0 | True | POP_gx1v6 |
| 1 | SAMPLE-DATA:posix:../../../tests/sample_data/c... | posix | True | 20C | b.e11.B20TRC5CNBDRD.f09_g16.001 | ocn | pop.h | STF_O2 | 185001-200512 | 1 | ../../../tests/sample_data/cesm-le/b.e11.B20TR... | b.e11.B20TRC5CNBDRD.f09_g16.001.pop.h.STF_O2.1... | ../../../tests/sample_data/cesm-le/ | | | 0 | True | POP_gx1v6 |
| 2 | SAMPLE-DATA:posix:../../../tests/sample_data/c... | posix | True | CTRL | b.e11.B1850C5CN.f09_g16.005 | ocn | pop.h | STF_O2 | 190001-199912 | 0 | ../../../tests/sample_data/cesm-le/b.e11.B1850... | b.e11.B1850C5CN.f09_g16.005.pop.h.STF_O2.19000... | ../../../tests/sample_data/cesm-le/ | | 1448.0 | 0 | True | POP_gx1v6 |
| 3 | SAMPLE-DATA:posix:../../../tests/sample_data/c... | posix | True | CTRL | b.e11.B1850C5CN.f09_g16.005 | ocn | pop.h | SHF | 090001-099912 | 0 | ../../../tests/sample_data/cesm-le/b.e11.B1850... | b.e11.B1850C5CN.f09_g16.005.pop.h.SHF.090001-0... | ../../../tests/sample_data/cesm-le/ | | 1448.0 | 0 | True | POP_gx1v6 |
| 4 | SAMPLE-DATA:posix:../../../tests/sample_data/c... | posix | True | CTRL | b.e11.B1850C5CN.f09_g16.005 | ocn | pop.h | STF_O2 | 090001-099912 | 0 | ../../../tests/sample_data/cesm-le/b.e11.B1850... | b.e11.B1850C5CN.f09_g16.005.pop.h.STF_O2.09000... | ../../../tests/sample_data/cesm-le/ | | 1448.0 | 0 | True | POP_gx1v6 |


Now you can query the collection catalog and load datasets of interest into xarray objects.


In [5]:
cat = col.search(
    variable=['STF_O2', 'SHF'],
    ensemble=[1, 3, 9],
    experiment=['20C', 'RCP85'],
    direct_access=True,
)

In [6]:
print(cat.yaml(True))

plugins:
  source:
  - module: intake_esm.cesm
sources:
  cesm1-le_test_7c700660-ffcc-4ce1-a24f-a5bf45924d32:
    args:
      collection_name: cesm1-le_test
      query:
        case: null
        component: null
        ctrl_branch_year: null
        date_range: null
        direct_access: true
        ensemble:
        - 1
        - 3
        - 9
        experiment:
        - 20C
        - RCP85
        file_basename: null
        file_dirname: null
        file_fullpath: null
        grid: null
        has_ocean_bgc: null
        resource: null
        resource_type: null
        sequence_order: null
        stream: null
        variable:
        - STF_O2
        - SHF
        year_offset: null
    description: Catalog entry from cesm1-le_test collection
    driver: cesm
    metadata:
      cache: {}
      catalog_dir: ''
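
The query in the YAML above leaves most fields as `null` and gives lists for the others. One plausible reading, sketched below in plain Python, is that a `null` field is unconstrained, a list matches any of its values, and a scalar must match exactly. This is an assumption about the semantics, not intake-esm's implementation, and the `matches` helper is hypothetical.

```python
# A few catalog rows, reduced to the columns used in the query above.
rows = [
    {'experiment': '20C', 'variable': 'SHF', 'ensemble': 1, 'direct_access': True},
    {'experiment': '20C', 'variable': 'STF_O2', 'ensemble': 1, 'direct_access': True},
    {'experiment': 'CTRL', 'variable': 'STF_O2', 'ensemble': 0, 'direct_access': True},
    {'experiment': 'CTRL', 'variable': 'SHF', 'ensemble': 0, 'direct_access': True},
]


def matches(row, query):
    """Return True if the row satisfies every constrained field in the query."""
    for field, wanted in query.items():
        if wanted is None:  # null in the YAML: the field is unconstrained
            continue
        allowed = wanted if isinstance(wanted, (list, tuple, set)) else [wanted]
        if row[field] not in allowed:
            return False
    return True


query = {
    'variable': ['STF_O2', 'SHF'],
    'ensemble': [1, 3, 9],
    'experiment': ['20C', 'RCP85'],
    'direct_access': True,
    'grid': None,
}
selected = [row for row in rows if matches(row, query)]
for row in selected:
    print(row['experiment'], row['variable'], row['ensemble'])
```

Under this reading, values that are absent from the catalog (such as ensemble members 3 and 9, or the `RCP85` experiment) simply contribute no rows rather than causing an error.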



**NOTE**: The ``to_xarray()`` method returns a dictionary of ``xarray`` datasets. The keys in this dictionary are constructed as follows:

- For CESM data, ``key=<stream>.<component>``
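
As a small illustration of the key rule, the sketch below groups catalog rows by `<stream>.<component>`. It uses plain dictionaries rather than the real catalog, and only mirrors the naming convention stated above; how intake-esm groups and merges files internally is not shown here.

```python
from collections import defaultdict

# Catalog rows reduced to the columns that matter for the key
# (values taken from the table above).
rows = [
    {'stream': 'pop.h', 'component': 'ocn', 'variable': 'SHF'},
    {'stream': 'pop.h', 'component': 'ocn', 'variable': 'STF_O2'},
]

# Collect the variables that would end up under each dataset key.
groups = defaultdict(list)
for row in rows:
    key = f"{row['stream']}.{row['component']}"
    groups[key].append(row['variable'])

print(dict(groups))
```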

In [7]:
ds = cat.to_xarray(decode_times=False, chunks={'time': 100})
ds





{'pop.h.ocn': <xarray.Dataset>
 Dimensions:               (d2: 2, lat_aux_grid: 395, moc_comp: 3, moc_z: 61, nchar: 256, nlat: 2, nlon: 2, time: 1872, transport_comp: 5, transport_reg: 2, z_t: 60, z_t_150m: 15, z_w: 60, z_w_bot: 60, z_w_top: 60)
 Coordinates:
     ANGLE                 (nlat, nlon) float64 dask.array<shape=(2, 2), chunksize=(2, 2)>
     ANGLET                (nlat, nlon) float64 dask.array<shape=(2, 2), chunksize=(2, 2)>
     DXT                   (nlat, nlon) float64 dask.array<shape=(2, 2), chunksize=(2, 2)>
     DXU                   (nlat, nlon) float64 dask.array<shape=(2, 2), chunksize=(2, 2)>
     DYT                   (nlat, nlon) float64 dask.array<shape=(2, 2), chunksize=(2, 2)>
     DYU                   (nlat, nlon) float64 dask.array<shape=(2, 2), chunksize=(2, 2)>
     HT                    (nlat, nlon) float64 dask.array<shape=(2, 2), chunksize=(2, 2)>
     HTE                   (nlat, nlon) float64 dask.array<shape=(2, 2), chunksize=(2, 2)>
     HTN    

In [8]:
%load_ext watermark

In [9]:
%watermark --iversion -g  -m -v -u -d

intake 0.4.1
last updated: 2019-04-25 

CPython 3.6.7
IPython 7.1.1

compiler   : GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)
system     : Darwin
release    : 17.7.0
machine    : x86_64
processor  : i386
CPU cores  : 8
interpreter: 64bit
Git hash   : 149e935866c2744d86be503a2289bd48562c1d20
