# Overview

## Introduction

This page describes the technical design of Intake-esm, the motivation behind the project, and the components of the package.

## Why Intake-esm?

Projects such as [CMIP](https://www.wcrp-climate.org/wgcm-cmip) and the [CESM Large Ensemble](http://www.cesm.ucar.edu/projects/community-projects/LENS/) produce huge amounts of data stored across many NetCDF files. Finding, investigating, and loading these files into data array containers such as `xarray` can be difficult because the number of files a user is interested in can be large. `Intake-esm` was written to make it easy to find, investigate, load, and disseminate the earth system data holdings produced by these projects.

`Intake-esm` solves a set of problems:

- It eliminates the need for the user to know the specific location (file path) of a data set of interest.
- It lets the user write a simple specification that defines data sources and builds collection catalogs for them.
- It loads data sets into data array containers such as `xarray`, and gets out of your way.
- It supports reproducibility and data provenance.
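
The first point, finding data by its attributes instead of by file path, can be illustrated with a minimal sketch in plain Python. This is not the `Intake-esm` API: the `RECORDS` table, the `search` helper, the metadata keys, and the file paths are all hypothetical and exist only to show the idea of a catalog that maps descriptive metadata to file locations.

```python
# Hypothetical stand-in for a collection catalog: each record pairs
# descriptive metadata with the path of the file it describes.
RECORDS = [
    {"experiment": "historical", "variable": "tas", "member": 1,
     "path": "/data/historical/tas_001.nc"},
    {"experiment": "historical", "variable": "pr", "member": 1,
     "path": "/data/historical/pr_001.nc"},
    {"experiment": "rcp85", "variable": "tas", "member": 1,
     "path": "/data/rcp85/tas_001.nc"},
    {"experiment": "rcp85", "variable": "tas", "member": 2,
     "path": "/data/rcp85/tas_002.nc"},
]


def search(records, **query):
    """Return the records whose metadata match every key/value in query."""
    return [
        record for record in records
        if all(record.get(key) == value for key, value in query.items())
    ]


# The user asks for data by what it is, not by where it lives.
matches = search(RECORDS, experiment="rcp85", variable="tas")
paths = [record["path"] for record in matches]
print(paths)
```

In this toy version the paths are still visible in the result, but the user never has to type or remember them; the query is expressed entirely in terms of metadata.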


Intake-esm supports data holdings from the following projects:

- [CMIP](./cmip5.ipynb): Coupled Model Intercomparison Project ([phase 5](https://esgf-node.llnl.gov/projects/cmip5/) and [phase 6](https://esgf-node.llnl.gov/projects/cmip6/))
- [CESM](./cesm.ipynb): [Community Earth System Model Large Ensemble (LENS), and Decadal Prediction Large Ensemble (DPLE)](http://www.cesm.ucar.edu/projects/community-projects/)
- [MPI-GE](./mpige.ipynb): [The Max Planck Institute for Meteorology (MPI-M) Grand Ensemble (MPI-GE)](https://www.mpimet.mpg.de/en/grand-ensemble/)
- [GMET](./gmet.ipynb): [The Gridded Meteorological Ensemble Tool data](https://ncar.github.io/hydrology/models/GMET)
- [ERA5](./era5.ipynb): [ECMWF ERA5 Reanalysis dataset stored on NCAR's GLADE](https://rda.ucar.edu/datasets/ds630.0/#!description) in `/glade/collections/rda/data/ds630.0`

## Concepts

`Intake-esm` extends functionality provided by `intake`. `Intake-esm` is built out of four core concepts:


- **Collection**: An object that represents a reference to a data holding, such as a collection of CESM Large Ensemble model output.

- **Data Source**: An object that represents a reference to a data source. Data source objects have methods for loading the data into `xarray` containers, namely `xarray` datasets.

- **Catalog**: A collection of catalog entries, each of which defines a data source. As in `intake`, catalog objects can be created from local YAML definitions by a driver that knows how to query a data collection.

- **Catalog Entry**: A named data source. The catalog entry includes metadata about the source, as well as the name of the driver and arguments. Arguments can be parameterized, allowing one entry to return different subsets of data depending on the user request.
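
How a catalog, its entries, and the data sources they produce relate to one another can be sketched with a few plain Python classes. This is an illustrative toy, not the `Intake-esm` or `intake` implementation: the class names mirror the concepts above, but every attribute, method, driver name, and path here is an assumption made for the example, and the collection concept is left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class DataSource:
    """Toy data source: holds a driver name and fully resolved arguments.

    A real data source would also know how to load its data into an
    xarray dataset; that part is omitted here.
    """
    driver: str
    args: dict


@dataclass
class CatalogEntry:
    """Toy catalog entry: a named data source with parameterized arguments."""
    name: str
    driver: str
    args: dict  # string values may contain "{placeholder}" fields
    metadata: dict = field(default_factory=dict)

    def get(self, **user_parameters):
        """Fill in the placeholders and return a concrete DataSource."""
        resolved = {
            key: value.format(**user_parameters) if isinstance(value, str) else value
            for key, value in self.args.items()
        }
        return DataSource(self.driver, resolved)


@dataclass
class Catalog:
    """Toy catalog: a mapping from entry names to catalog entries."""
    entries: dict = field(default_factory=dict)

    def add(self, entry):
        self.entries[entry.name] = entry

    def __getitem__(self, name):
        return self.entries[name]


catalog = Catalog()
catalog.add(
    CatalogEntry(
        name="surface_temperature",
        driver="netcdf",  # hypothetical driver name
        args={"urlpath": "/data/{experiment}/tas.nc"},  # hypothetical path
        metadata={"units": "K"},
    )
)

# One entry, two different subsets, selected by the user's request.
historical = catalog["surface_temperature"].get(experiment="historical")
future = catalog["surface_temperature"].get(experiment="rcp85")
print(historical.args["urlpath"])
print(future.args["urlpath"])
```

The point of the design is that the entry stores a template, while the user's request supplies the values, so a single entry can stand in for a whole family of related data sources.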