# Getting Started

This notebook demonstrates the data catalog functionality provided by `intake-esm`. Let's begin by importing `intake`:

In [1]:
import intake

## Open a Collection


To use `intake-esm`, we can instantiate the `esm_metadatastore` catalog class in two ways:
- With the name and the type of the collection we want to use.
- With a collection input YAML file.

Because the class is exposed at the top level of the package (in `__init__.py`) and the package name starts with `intake_`, Intake scans the package when it is imported. The plugin then appears automatically in the Intake registry of known plugins, and an associated `intake.open_esm_metadatastore` function is created at import time.

In [2]:
intake.registry

{'yaml_file_cat': intake.catalog.local.YAMLFileCatalog,
 'yaml_files_cat': intake.catalog.local.YAMLFilesCatalog,
 'remote-xarray': intake_xarray.xarray_container.RemoteXarray,
 'xarray_image': intake_xarray.image.ImageSource,
 'netcdf': intake_xarray.netcdf.NetCDFSource,
 'opendap': intake_xarray.opendap.OpenDapSource,
 'rasterio': intake_xarray.raster.RasterIOSource,
 'zarr': intake_xarray.xzarr.ZarrSource,
 'esm_metadatastore': intake_esm.core.ESMMetadataStoreCatalog,
 'csv': intake.source.csv.CSVSource,
 'textfiles': intake.source.textfiles.TextFilesSource,
 'catalog': intake.catalog.base.Catalog,
 'intake_remote': intake.catalog.base.RemoteCatalog,
 'numpy': intake.source.npy.NPySource}
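
The prefix-based scan described above can be illustrated with the standard library alone. This is a simplified sketch, not Intake's own discovery code: the helper name `find_prefixed_modules` and the module `intake_demo` are hypothetical, and the sketch only lists matching module names without importing them or registering any driver.

```python
import pkgutil
import tempfile
from pathlib import Path


def find_prefixed_modules(prefix="intake_", path=None):
    """Return the sorted names of top-level modules that start with `prefix`.

    `path` is a list of directories to scan; None means scan sys.path.
    """
    return sorted(
        module.name
        for module in pkgutil.iter_modules(path)
        if module.name.startswith(prefix)
    )


# Demonstrate on a throwaway directory holding one hypothetical plugin module.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "intake_demo.py").write_text("# hypothetical plugin module\n")
    Path(tmp, "unrelated.py").write_text("# ignored: wrong prefix\n")
    found = find_prefixed_modules(path=[tmp])

print(found)  # ['intake_demo']
```

Calling `find_prefixed_modules()` with no arguments lists the `intake_`-prefixed packages visible in the current environment, which is a quick way to see which packages a scan of this kind would pick up.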

In [3]:
collection = intake.open_esm_metadatastore(collection_name='cesm_dple', collection_type="cesm")

In [4]:
collection.df.head()

resource,experiment,case,component,stream,variable,date_range,ensemble,files,files_basename,files_dirname,ctrl_branch_year,year_offset,sequence_order,has_ocean_bgc,grid
GLADE:posix:/glade/p/cgd/oce/projects/DPLE_O2/sigma_coord/CESM-DPLE_POPCICEhindcast,g.e11_LENS.GECOIAF.T62_g16.009_sigma_coord,g.e11_LENS.GECOIAF.T62_g16.009,ocn,pop.h.sigma,O2,024901-031612,0,/glade/p/cgd/oce/projects/DPLE_O2/sigma_coord/...,g.e11_LENS.GECOIAF.T62_g16.009.pop.h.sigma.O2....,/glade/p/cgd/oce/projects/DPLE_O2/sigma_coord/...,,1699,0,,POP_gx1v6
GLADE:posix:/glade/p/cgd/oce/projects/DPLE_O2/sigma_coord/CESM-DPLE_POPCICEhindcast,g.e11_LENS.GECOIAF.T62_g16.009_sigma_coord,g.e11_LENS.GECOIAF.T62_g16.009,ocn,pop.h.sigma,NO3,024901-031612,0,/glade/p/cgd/oce/projects/DPLE_O2/sigma_coord/...,g.e11_LENS.GECOIAF.T62_g16.009.pop.h.sigma.NO3...,/glade/p/cgd/oce/projects/DPLE_O2/sigma_coord/...,,1699,0,,POP_gx1v6
GLADE:posix:/glade/p/cgd/oce/projects/DPLE_O2/sigma_coord/CESM-DPLE_POPCICEhindcast,g.e11_LENS.GECOIAF.T62_g16.009_sigma_coord,g.e11_LENS.GECOIAF.T62_g16.009,ocn,pop.h.sigma,SALT,024901-031612,0,/glade/p/cgd/oce/projects/DPLE_O2/sigma_coord/...,g.e11_LENS.GECOIAF.T62_g16.009.pop.h.sigma.SAL...,/glade/p/cgd/oce/projects/DPLE_O2/sigma_coord/...,,1699,0,,POP_gx1v6
GLADE:posix:/glade/p/cgd/oce/projects/DPLE_O2/sigma_coord/CESM-DPLE_POPCICEhindcast,g.e11_LENS.GECOIAF.T62_g16.009_sigma_coord,g.e11_LENS.GECOIAF.T62_g16.009,ocn,pop.h.sigma,TEMP,024901-031612,0,/glade/p/cgd/oce/projects/DPLE_O2/sigma_coord/...,g.e11_LENS.GECOIAF.T62_g16.009.pop.h.sigma.TEM...,/glade/p/cgd/oce/projects/DPLE_O2/sigma_coord/...,,1699,0,,POP_gx1v6
GLADE:posix:/glade/p/cesm/community/CESM-DPLE/CESM-DPLE_POPCICEhindcast,g.e11_LENS.GECOIAF.T62_g16.009,g.e11_LENS.GECOIAF.T62_g16.009,ocn,pop.h,ADVT,024901-031612,0,/glade/p/cesm/community/CESM-DPLE/CESM-DPLE_PO...,g.e11_LENS.GECOIAF.T62_g16.009.pop.h.ADVT.0249...,/glade/p/cesm/community/CESM-DPLE/CESM-DPLE_PO...,,1699,0,,POP_gx1v6


In [5]:
collection.df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 583 entries, GLADE:posix:/glade/p/cgd/oce/projects/DPLE_O2/sigma_coord/CESM-DPLE_POPCICEhindcast to GLADE:posix:/glade/p/cesm/community/CESM-DPLE/CESM-DPLE_POPCICEhindcast
Data columns (total 15 columns):
experiment          583 non-null object
case                583 non-null object
component           583 non-null object
stream              583 non-null object
variable            583 non-null object
date_range          583 non-null object
ensemble            583 non-null int64
files               583 non-null object
files_basename      583 non-null object
files_dirname       583 non-null object
ctrl_branch_year    0 non-null float64
year_offset         583 non-null int64
sequence_order      583 non-null int64
has_ocean_bgc       0 non-null float64
grid                325 non-null object
dtypes: float64(2), int64(3), object(10)
memory usage: 72.9+ KB


In [6]:
collection.df.head()

resource,experiment,case,component,stream,variable,date_range,ensemble,files,files_basename,files_dirname,ctrl_branch_year,year_offset,sequence_order,has_ocean_bgc,grid
GLADE:posix:/glade/p/cgd/oce/projects/DPLE_O2/sigma_coord/CESM-DPLE_POPCICEhindcast,g.e11_LENS.GECOIAF.T62_g16.009_sigma_coord,g.e11_LENS.GECOIAF.T62_g16.009,ocn,pop.h.sigma,O2,024901-031612,0,/glade/p/cgd/oce/projects/DPLE_O2/sigma_coord/...,g.e11_LENS.GECOIAF.T62_g16.009.pop.h.sigma.O2....,/glade/p/cgd/oce/projects/DPLE_O2/sigma_coord/...,,1699,0,,POP_gx1v6
GLADE:posix:/glade/p/cgd/oce/projects/DPLE_O2/sigma_coord/CESM-DPLE_POPCICEhindcast,g.e11_LENS.GECOIAF.T62_g16.009_sigma_coord,g.e11_LENS.GECOIAF.T62_g16.009,ocn,pop.h.sigma,NO3,024901-031612,0,/glade/p/cgd/oce/projects/DPLE_O2/sigma_coord/...,g.e11_LENS.GECOIAF.T62_g16.009.pop.h.sigma.NO3...,/glade/p/cgd/oce/projects/DPLE_O2/sigma_coord/...,,1699,0,,POP_gx1v6
GLADE:posix:/glade/p/cgd/oce/projects/DPLE_O2/sigma_coord/CESM-DPLE_POPCICEhindcast,g.e11_LENS.GECOIAF.T62_g16.009_sigma_coord,g.e11_LENS.GECOIAF.T62_g16.009,ocn,pop.h.sigma,SALT,024901-031612,0,/glade/p/cgd/oce/projects/DPLE_O2/sigma_coord/...,g.e11_LENS.GECOIAF.T62_g16.009.pop.h.sigma.SAL...,/glade/p/cgd/oce/projects/DPLE_O2/sigma_coord/...,,1699,0,,POP_gx1v6
GLADE:posix:/glade/p/cgd/oce/projects/DPLE_O2/sigma_coord/CESM-DPLE_POPCICEhindcast,g.e11_LENS.GECOIAF.T62_g16.009_sigma_coord,g.e11_LENS.GECOIAF.T62_g16.009,ocn,pop.h.sigma,TEMP,024901-031612,0,/glade/p/cgd/oce/projects/DPLE_O2/sigma_coord/...,g.e11_LENS.GECOIAF.T62_g16.009.pop.h.sigma.TEM...,/glade/p/cgd/oce/projects/DPLE_O2/sigma_coord/...,,1699,0,,POP_gx1v6
GLADE:posix:/glade/p/cesm/community/CESM-DPLE/CESM-DPLE_POPCICEhindcast,g.e11_LENS.GECOIAF.T62_g16.009,g.e11_LENS.GECOIAF.T62_g16.009,ocn,pop.h,ADVT,024901-031612,0,/glade/p/cesm/community/CESM-DPLE/CESM-DPLE_PO...,g.e11_LENS.GECOIAF.T62_g16.009.pop.h.ADVT.0249...,/glade/p/cesm/community/CESM-DPLE/CESM-DPLE_PO...,,1699,0,,POP_gx1v6


## Search for Entries Matching a Query

`intake-esm` supports querying a collection through the `search` method. The query is passed as keyword arguments, and the method returns the subset of the collection whose entries match it. As the examples below show, a keyword can be given a single value or a list of values.

In [7]:
cat = collection.search(case="g.e11_LENS.GECOIAF.T62_g16.009", component='ocn', variable=["O2"])

In [8]:
print(cat.yaml(True))

plugins:
  source:
  - module: intake_esm.cesm
sources:
  cesm_dple_1571d4f5-70a4-42f6-b952-10faf9ba32a5:
    args:
      chunks:
        time: 1
      collection_name: cesm_dple
      collection_type: cesm
      concat_dim: time
      decode_coords: false
      decode_times: false
      engine: netcdf4
      query:
        case: g.e11_LENS.GECOIAF.T62_g16.009
        component: ocn
        ctrl_branch_year: null
        date_range: null
        ensemble: null
        experiment: null
        files: null
        files_basename: null
        files_dirname: null
        grid: null
        has_ocean_bgc: null
        sequence_order: null
        stream: null
        variable:
        - O2
        year_offset: null
    description: Catalog entry from cesm_dple collection
    driver: cesm
    metadata:
      cache: {}
      catalog_dir: ''



In [9]:
cat.results.info()

<class 'pandas.core.frame.DataFrame'>
Index: 2 entries, GLADE:posix:/glade/p/cesm/community/CESM-DPLE/CESM-DPLE_POPCICEhindcast to GLADE:posix:/glade/p/cgd/oce/projects/DPLE_O2/sigma_coord/CESM-DPLE_POPCICEhindcast
Data columns (total 15 columns):
experiment          2 non-null object
case                2 non-null object
component           2 non-null object
stream              2 non-null object
variable            2 non-null object
date_range          2 non-null object
ensemble            2 non-null int64
files               2 non-null object
files_basename      2 non-null object
files_dirname       2 non-null object
ctrl_branch_year    0 non-null float64
year_offset         2 non-null int64
sequence_order      2 non-null int64
has_ocean_bgc       0 non-null float64
grid                2 non-null object
dtypes: float64(2), int64(3), object(10)
memory usage: 256.0+ bytes


In [10]:
cat.results

resource,experiment,case,component,stream,variable,date_range,ensemble,files,files_basename,files_dirname,ctrl_branch_year,year_offset,sequence_order,has_ocean_bgc,grid
GLADE:posix:/glade/p/cesm/community/CESM-DPLE/CESM-DPLE_POPCICEhindcast,g.e11_LENS.GECOIAF.T62_g16.009,g.e11_LENS.GECOIAF.T62_g16.009,ocn,pop.h,O2,024901-031612,0,/glade/p/cesm/community/CESM-DPLE/CESM-DPLE_PO...,g.e11_LENS.GECOIAF.T62_g16.009.pop.h.O2.024901...,/glade/p/cesm/community/CESM-DPLE/CESM-DPLE_PO...,,1699,0,,POP_gx1v6
GLADE:posix:/glade/p/cgd/oce/projects/DPLE_O2/sigma_coord/CESM-DPLE_POPCICEhindcast,g.e11_LENS.GECOIAF.T62_g16.009_sigma_coord,g.e11_LENS.GECOIAF.T62_g16.009,ocn,pop.h.sigma,O2,024901-031612,0,/glade/p/cgd/oce/projects/DPLE_O2/sigma_coord/...,g.e11_LENS.GECOIAF.T62_g16.009.pop.h.sigma.O2....,/glade/p/cgd/oce/projects/DPLE_O2/sigma_coord/...,,1699,0,,POP_gx1v6
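
To make the matching behaviour concrete, here is a minimal pure-Python sketch of a keyword query over catalog rows. It is not the `intake-esm` implementation. It assumes exact-match semantics, with all keywords required to match, a list matching any of its items, and `None` meaning "no constraint" (mirroring the `null` fields in the YAML above). The function and the reduced `rows` list are illustrative; the row values are taken from the tables shown in this notebook.

```python
def search(rows, **query):
    """Return the rows whose fields match every keyword in `query`.

    A scalar value must equal the field; a list or tuple matches if the
    field equals any of its items. Keywords set to None are ignored.
    """
    def matches(row):
        for field, wanted in query.items():
            if wanted is None:
                continue  # unconstrained field
            allowed = wanted if isinstance(wanted, (list, tuple)) else [wanted]
            if row.get(field) not in allowed:
                return False
        return True

    return [row for row in rows if matches(row)]


# Catalog rows reduced to four of the columns shown above.
CASE = "g.e11_LENS.GECOIAF.T62_g16.009"
rows = [
    {"case": CASE, "component": "ocn", "stream": "pop.h.sigma", "variable": "O2"},
    {"case": CASE, "component": "ocn", "stream": "pop.h.sigma", "variable": "NO3"},
    {"case": CASE, "component": "ocn", "stream": "pop.h.sigma", "variable": "SALT"},
    {"case": CASE, "component": "ocn", "stream": "pop.h.sigma", "variable": "TEMP"},
    {"case": CASE, "component": "ocn", "stream": "pop.h", "variable": "ADVT"},
    {"case": CASE, "component": "ocn", "stream": "pop.h", "variable": "O2"},
]

hits = search(rows, case=CASE, component="ocn", variable=["O2"])
print([row["stream"] for row in hits])  # ['pop.h.sigma', 'pop.h']
```

Under these assumptions the query selects the two `O2` rows, one per stream, which is consistent with the two entries reported by `cat.results` above.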


In [11]:
cat2 = collection.search(component=["ocn"], variable=["TEMP", "ADVT"])

In [12]:
print(cat2.yaml(True))

plugins:
  source:
  - module: intake_esm.cesm
sources:
  cesm_dple_12ac81c4-2516-44a2-a538-a434c1d8cf42:
    args:
      chunks:
        time: 1
      collection_name: cesm_dple
      collection_type: cesm
      concat_dim: time
      decode_coords: false
      decode_times: false
      engine: netcdf4
      query:
        case: null
        component:
        - ocn
        ctrl_branch_year: null
        date_range: null
        ensemble: null
        experiment: null
        files: null
        files_basename: null
        files_dirname: null
        grid: null
        has_ocean_bgc: null
        sequence_order: null
        stream: null
        variable:
        - TEMP
        - ADVT
        year_offset: null
    description: Catalog entry from cesm_dple collection
    driver: cesm
    metadata:
      cache: {}
      catalog_dir: ''



As the user queries the collection, `intake-esm` builds a dictionary of catalog entries, one for each search executed so far:

In [13]:
collection._entries

{'cesm_dple_1571d4f5-70a4-42f6-b952-10faf9ba32a5': <Catalog Entry: cesm_dple_1571d4f5-70a4-42f6-b952-10faf9ba32a5>,
 'cesm_dple_12ac81c4-2516-44a2-a538-a434c1d8cf42': <Catalog Entry: cesm_dple_12ac81c4-2516-44a2-a538-a434c1d8cf42>}
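
The keys above look like the collection name followed by a random UUID. The bookkeeping can be sketched as follows; the class `SearchLog` is a hypothetical stand-in rather than part of `intake-esm`, and the use of `uuid.uuid4()` is an assumption inferred from the key format, not something the output confirms.

```python
import uuid


class SearchLog:
    """Hypothetical stand-in for the collection's `_entries` bookkeeping."""

    def __init__(self, collection_name):
        self.collection_name = collection_name
        self.entries = {}  # entry name -> query that produced it

    def record(self, **query):
        # Each search gets its own key: the collection name plus a random UUID.
        name = f"{self.collection_name}_{uuid.uuid4()}"
        self.entries[name] = query
        return name


log = SearchLog("cesm_dple")
first = log.record(case="g.e11_LENS.GECOIAF.T62_g16.009", component="ocn", variable=["O2"])
second = log.record(component=["ocn"], variable=["TEMP", "ADVT"])

# Entries accumulate in the order the searches were run.
for name, query in log.entries.items():
    print(name, query)
```

Because every call generates a fresh UUID, repeating an identical query in this sketch would add a new entry rather than reuse an existing one.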

In [14]:
for entry in collection._entries.values():
    print(entry.yaml(True))

plugins:
  source:
  - module: intake_esm.cesm
sources:
  cesm_dple_1571d4f5-70a4-42f6-b952-10faf9ba32a5:
    args:
      chunks:
        time: 1
      collection_name: cesm_dple
      collection_type: cesm
      concat_dim: time
      decode_coords: false
      decode_times: false
      engine: netcdf4
      query:
        case: g.e11_LENS.GECOIAF.T62_g16.009
        component: ocn
        ctrl_branch_year: null
        date_range: null
        ensemble: null
        experiment: null
        files: null
        files_basename: null
        files_dirname: null
        grid: null
        has_ocean_bgc: null
        sequence_order: null
        stream: null
        variable:
        - O2
        year_offset: null
    description: Catalog entry from cesm_dple collection
    driver: cesm
    metadata:
      cache: {}
      catalog_dir: ''

plugins:
  source:
  - module: intake_esm.cesm
sources:
  cesm_dple_12ac81c4-2516-44a2-a538-a434c1d8cf42:
    args:
      chunks:
        time: 1
      

In [15]:
%load_ext watermark

In [16]:
%watermark --iversion -g -h -m -v -u -d

intake 0.4.1
last updated: 2019-02-17 

CPython 3.6.7
IPython 7.1.1

compiler   : GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)
system     : Darwin
release    : 17.7.0
machine    : x86_64
processor  : i386
CPU cores  : 8
interpreter: 64bit
host name  : cisl-sublimity
Git hash   : 16c4342b21d322b0646861c0c7fa358ac63fe922
