# Use ecgtools to create an intake-esm catalog for the timeseries files

Before running this notebook, I requested a single node with 36 cores, which is what makes the `njobs=36` setting possible (all 36 cores are used to parallelize the catalog build).
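Before choosing a value for `njobs`, it can help to confirm how many cores the session actually has. The following is a minimal sketch using only the standard library; the helper name `available_cores` is hypothetical and is not part of ecgtools.

```python
import os


def available_cores() -> int:
    """Return the number of CPU cores this process is allowed to use."""
    # On Linux, the scheduler affinity reflects limits imposed on the process,
    # which can be smaller than the machine-wide core count.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # Elsewhere, fall back to the machine-wide count (which may be None).
    return os.cpu_count() or 1


njobs = available_cores()
print(f"njobs={njobs}")
```

On a full 36-core node this would be expected to report 36, though that depends on how the job was requested.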

In this case, we need the `Builder` object and the `parse_cesm_timeseries` parser.

In [3]:
from ecgtools import Builder
from ecgtools.parsers.cesm import parse_cesm_timeseries

## Set up the catalog

Set up the builder. In this case, we looked at the directory structure ahead of time and determined that `depth=4` was appropriate.

In [46]:
b = Builder(
    "/glade/campaign/cesm/development/bgcwg/projects/hi-res_JRA/cases/",
    depth=4,
    parsing_func=parse_cesm_timeseries,
    exclude_patterns=["*/hist/*", "*/rest/*"],
    njobs=36,
)
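One way to look at a directory structure ahead of time is to list the directories a fixed number of levels below the root. The sketch below does this with `pathlib` on a throwaway tree; the helper `dirs_at_depth` and the `case_a/output/...` layout are hypothetical, and the code only illustrates the inspection step, not how `Builder` interprets `depth` internally.

```python
import tempfile
from pathlib import Path


def dirs_at_depth(root: Path, depth: int) -> list:
    """List the directories exactly `depth` levels below `root`."""
    pattern = "/".join(["*"] * depth)
    return sorted(p for p in root.glob(pattern) if p.is_dir())


# Build a small temporary tree that mimics a case directory (hypothetical layout).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "case_a" / "output" / "ocn" / "proc").mkdir(parents=True)
    (root / "case_a" / "output" / "ice" / "proc").mkdir(parents=True)
    found = [p.relative_to(root).as_posix() for p in dirs_at_depth(root, 4)]

print(found)
```

Trying a few values of `depth` against the real root path shows at which level the data directories start to appear.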

## Build the catalog

In [47]:
b.build()

[Parallel(n_jobs=36)]: Using backend LokyBackend with 36 concurrent workers.
[Parallel(n_jobs=36)]: Done   3 out of   8 | elapsed:    0.4s remaining:    0.6s
[Parallel(n_jobs=36)]: Done   5 out of   8 | elapsed:    0.5s remaining:    0.3s
[Parallel(n_jobs=36)]: Done   8 out of   8 | elapsed:    1.9s finished
[Parallel(n_jobs=36)]: Using backend LokyBackend with 36 concurrent workers.
[Parallel(n_jobs=36)]: Done  90 tasks      | elapsed:    1.2s
[Parallel(n_jobs=36)]: Done 216 tasks      | elapsed:    1.9s
[Parallel(n_jobs=36)]: Done 378 tasks      | elapsed:    2.8s
[Parallel(n_jobs=36)]: Done 576 tasks      | elapsed:    3.8s
[Parallel(n_jobs=36)]: Done 810 tasks      | elapsed:    5.1s
[Parallel(n_jobs=36)]: Done 1080 tasks      | elapsed:    6.5s
[Parallel(n_jobs=36)]: Done 1386 tasks      | elapsed:    8.2s
[Parallel(n_jobs=36)]: Done 1728 tasks      | elapsed:   10.0s
[Parallel(n_jobs=36)]: Done 2106 tasks      | elapsed:   11.5s
[Parallel(n_jobs=36)]: Done 2520 tasks      | elaps

Builder(root_path=PosixPath('/glade/campaign/cesm/development/bgcwg/projects/hi-res_JRA/cases'), extension='.nc', depth=4, exclude_patterns=['*/hist/*', '*/rest/*'], parsing_func=<function parse_cesm_timeseries at 0x2b8ab803db00>, njobs=36)

## Take a quick look at the catalog

In [67]:
b.df

| | component | stream | case | member_id | variable | start_time | end_time | time_range | long_name | units | vertical_levels | frequency | path |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ocn | pop.h | g.e22.G1850ECO_JRA_HR.TL319_t13.003 | 3.0 | cocco_C_lim_Cweight_avg_100m | 0003-01 | 0003-12 | 000301-000312 | Coccolithophores C Limitation, carbon biomass ... | 1 | 1.0 | month_1 | /glade/campaign/cesm/development/bgcwg/project... |
| 1 | ocn | pop.h | g.e22.G1850ECO_JRA_HR.TL319_t13.003 | 3.0 | sp_N_lim_Cweight_avg_100m | 0002-01 | 0002-12 | 000201-000212 | Small Phyto N Limitation, carbon biomass weigh... | 1 | 1.0 | month_1 | /glade/campaign/cesm/development/bgcwg/project... |
| 2 | ocn | pop.h | g.e22.G1850ECO_JRA_HR.TL319_t13.003 | 3.0 | DOP_diaz_uptake | 0002-01 | 0002-12 | 000201-000212 | Diazotroph DOP Uptake | mmol/m^3/s | 1.0 | month_1 | /glade/campaign/cesm/development/bgcwg/project... |
| 3 | ocn | pop.h | g.e22.G1850ECO_JRA_HR.TL319_t13.003 | 3.0 | POC_PROD | 0004-01 | 0004-12 | 000401-000412 | POC Production | mmol/m^3/s | 1.0 | month_1 | /glade/campaign/cesm/development/bgcwg/project... |
| 4 | ocn | pop.h | g.e22.G1850ECO_JRA_HR.TL319_t13.003 | 3.0 | POC_FLUX_100m | 0003-01 | 0003-12 | 000301-000312 | POC Flux at 100m | mmol/m^3 cm/s | 1.0 | month_1 | /glade/campaign/cesm/development/bgcwg/project... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 17981 | ice | cice.h1 | g.e22.G1850ECO_JRA_HR.TL319_t13.004 | 4.0 | dvidtd_d | 0029-01-01 | 0029-12-31 | 00290101-00291231 | volume tendency dynamics | cm/day | 1.0 | day_1 | /glade/campaign/cesm/development/bgcwg/project... |
| 17982 | ice | cice.h1 | g.e22.G1850ECO_JRA_HR.TL319_t13.004 | 4.0 | hi_d | 0007-01-01 | 0007-12-31 | 00070101-00071231 | grid cell mean ice thickness | m | 1.0 | day_1 | /glade/campaign/cesm/development/bgcwg/project... |
| 17983 | ice | cice.h1 | g.e22.G1850ECO_JRA_HR.TL319_t13.004 | 4.0 | hi_d | 0034-01-01 | 0034-12-31 | 00340101-00341231 | grid cell mean ice thickness | m | 1.0 | day_1 | /glade/campaign/cesm/development/bgcwg/project... |
| 17984 | ice | cice.h1 | g.e22.G1850ECO_JRA_HR.TL319_t13.004 | 4.0 | dvidtt_d | 0034-01-01 | 0034-12-31 | 00340101-00341231 | volume tendency thermo | cm/day | 1.0 | day_1 | /glade/campaign/cesm/development/bgcwg/project... |
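A quick look is often followed by a rough tally of what the catalog contains. The sketch below counts entries per component and stream and collects the distinct variables per component, using only the standard library on a tiny stand-in CSV. The sample rows reuse a few values from the table above but are otherwise illustrative, not the real catalog.

```python
import csv
import io
from collections import Counter

# A tiny stand-in for the catalog, restricted to three of its columns.
sample_csv = """component,stream,variable
ocn,pop.h,POC_PROD
ocn,pop.h,POC_FLUX_100m
ice,cice.h1,hi_d
ice,cice.h1,dvidtd_d
ice,cice.h1,hi_d
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Number of catalog entries (one per file) for each component and stream.
entries = Counter((r["component"], r["stream"]) for r in rows)

# Distinct variables available for each component.
variables = {}
for r in rows:
    variables.setdefault(r["component"], set()).add(r["variable"])

for key, count in sorted(entries.items()):
    print(key, count)
```

With the builder in memory, the pandas equivalent would be something like `b.df.groupby(['component', 'stream']).size()`, since `b.df` is displayed as a DataFrame.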


There are only a few invalid assets - we can ignore these!

In [69]:
b.invalid_assets.INVALID_ASSET.values[0]

PosixPath('/glade/campaign/cesm/development/bgcwg/projects/hi-res_JRA/cases/g.e22.G1850ECO_JRA_HR.TL319_t13.003/output/ocn/proc/za.old/za_g.e22.G1850ECO_JRA_HR.TL319_t13.003.pop.h.0004-07.nc')
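The example above sits in a `za.old` directory, which suggests leftover files rather than real timeseries output. Before ignoring invalid assets, a quick check is to group them by parent directory. The sketch below uses hypothetical paths; in the notebook the list would come from `b.invalid_assets.INVALID_ASSET`.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical invalid-asset paths, standing in for the builder's real list.
invalid = [
    PurePosixPath("/data/cases/case_a/output/ocn/proc/za.old/za_0004-07.nc"),
    PurePosixPath("/data/cases/case_a/output/ocn/proc/za.old/za_0004-08.nc"),
    PurePosixPath("/data/cases/case_b/output/ice/proc/tmp/partial.nc"),
]

# Count invalid files by the name of the directory that contains them.
by_dir = Counter(p.parent.name for p in invalid)

for name, count in by_dir.most_common():
    print(f"{name}: {count}")
```

If every invalid asset falls in a directory that is not meant to be cataloged, ignoring them is easy to justify.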

## Save the Catalog

In [59]:
b.save(
    '/glade/work/mgrover/intake-esm-catalogs/hires-marbl.csv',
    path_column_name='path',
    variable_column_name='variable',
    data_format='netcdf',
    groupby_attrs=[
        'component',
        'stream',
        'case',
    ],
    aggregations=[
        {'type': 'union', 'attribute_name': 'variable'},
        {
            'type': 'join_existing',
            'attribute_name': 'time_range',
            'options': {'dim': 'time', 'coords': 'minimal', 'compat': 'override'},
        },
        {
            'type': 'join_new',
            'attribute_name': 'member_id',
            'options': {'coords': 'minimal', 'compat': 'override'},
        },
    ],
)

Saved catalog location: /glade/work/mgrover/intake-esm-catalogs/hires-marbl.json and /glade/work/mgrover/intake-esm-catalogs/hires-marbl.csv


## Use the Catalog

In [60]:
import intake

In [61]:
col = intake.open_esm_datastore('/glade/work/mgrover/intake-esm-catalogs/hires-marbl.json')

In [62]:
dsets = col.search(variable='TEMP').to_dataset_dict()


--> The keys in the returned dictionary of datasets are constructed as follows:
	'component.stream.case'
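The key pattern follows the `groupby_attrs` given when the catalog was saved: the values of those columns are joined with dots. A minimal sketch of that construction, using one illustrative catalog row (this mirrors the key format shown here, not the intake-esm source code):

```python
# The same attributes that were passed as groupby_attrs when saving.
groupby_attrs = ["component", "stream", "case"]

# One illustrative catalog row, restricted to the relevant columns.
row = {
    "component": "ocn",
    "stream": "pop.h",
    "case": "g.e22.G1850ECO_JRA_HR.TL319_t13.003",
    "variable": "TEMP",
}

# Join the grouping values with dots to form the dataset key.
key = ".".join(row[attr] for attr in groupby_attrs)
print(key)
```

Because the stream and case names contain dots themselves, the key cannot be split back into its three parts by a simple `split('.')`.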


In [63]:
dsets

{'ocn.pop.h.g.e22.G1850ECO_JRA_HR.TL319_t13.003': <xarray.Dataset>
 Dimensions:                 (d2: 2, member_id: 1, nlat: 2400, nlon: 3600, time: 48, z_t: 62, z_t_150m: 15, z_w: 62, z_w_bot: 62, z_w_top: 62)
 Coordinates:
   * z_t                     (z_t) float32 500.0 1.5e+03 ... 5.625e+05 5.875e+05
   * z_t_150m                (z_t_150m) float32 500.0 1.5e+03 ... 1.45e+04
   * z_w                     (z_w) float32 0.0 1e+03 2e+03 ... 5.5e+05 5.75e+05
   * z_w_top                 (z_w_top) float32 0.0 1e+03 ... 5.5e+05 5.75e+05
   * z_w_bot                 (z_w_bot) float32 1e+03 2e+03 ... 5.75e+05 6e+05
     ULONG                   (nlat, nlon) float64 dask.array<chunksize=(2400, 3600), meta=np.ndarray>
     ULAT                    (nlat, nlon) float64 dask.array<chunksize=(2400, 3600), meta=np.ndarray>
     TLONG                   (nlat, nlon) float64 dask.array<chunksize=(2400, 3600), meta=np.ndarray>
     TLAT                    (nlat, nlon) float64 dask.array<chunksize=(2400, 

In [65]:
ds = dsets['ocn.pop.h.g.e22.G1850ECO_JRA_HR.TL319_t13.004'].isel(time=0, z_t=0)

In [66]:
ds

[Output: xarray HTML summary of `ds`. Most variables are single-chunk dask arrays of shape (2400, 3600), float64, 65.92 MiB each. The remainder are a few small arrays (a float32 scalar, a (62,) float32 array, a (2,) object array) and one float32 array of shape (1, 2400, 3600), 32.96 MiB.]
