# Defining ESGF Use Cases

## Motivation
It is important to consider the different use cases scientists might have when using data from the Earth System Grid Federation (ESGF). There are three primary workflows:
- Global/regional statistics
- Timeseries for some given location
- Aggregated spatial averages/temporal means

For this illustration, we will use ILAMB (land model diagnostic suite) output as an example.

### Global or Regional Statistics

The first case is one where the user works with either a single vertical level (e.g., the surface of the land, ocean, or atmosphere) or an average over several vertical levels (e.g., vertically integrated BGC variables in the ocean). Scientists may be interested in bias, general trends, or general performance statistics compared to some baseline.

These datasets are **aggregated over some temporal range**. From the data query side, the user would require:
- All time steps
- The entire spatial domain (or a region, if a regional mask such as North America is of interest)

**Sample Query Syntax**

```python
subset = catalog.search(variable='tas', frequency='monthly', experiment_id=['ssp370', 'ssp585'])

temporal_average = subset.mean(dim='time')
```

This leaves the dataset with its spatial dimensions and, possibly, a height/depth dimension.
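To make the effect on dimensions concrete, here is a minimal sketch using xarray on a small synthetic field. The grid, the random values, and the variable names are assumptions for illustration only; they do not come from an ESGF catalog.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a monthly `tas` field: 24 months on a 4 x 8 grid.
time = pd.date_range("2000-01-01", periods=24, freq="MS")
lat = np.linspace(-67.5, 67.5, 4)
lon = np.linspace(0.0, 315.0, 8)
rng = np.random.default_rng(0)

tas = xr.DataArray(
    280.0 + rng.standard_normal((time.size, lat.size, lon.size)),
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="tas",
)
ds = tas.to_dataset()

# Averaging over time removes the time dimension and keeps the spatial ones.
temporal_average = ds.mean(dim="time")
print(dict(temporal_average.sizes))
```

Every time step is read to produce the result, which is why this workflow needs the full temporal extent even though the output is a single map.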


![Global mean plot map](images/global-mean-map.png)

### Timeseries for some location

Another case might be interest in a spatial subset (a single location, or set of locations) from the models. **This is a case where the user does not require the entire global dataset, a possible opportunity for server-side subsetting**. For example, someone may be interested in a timeseries of projected temperature or rainfall values over the grid cell closest to Chicago. Or, they may be interested in a set of locations (Chicago, New York, Los Angeles).

These datasets are **subset to a single location or set of locations**. From the data query side, the user would require:
- All time steps
- A single location, or set of locations
- A single level, or set of vertical levels

It would be inefficient for the user to **download the entire global dataset for each model** if they only need data for a single point.

The user may want to calculate an average from this timeseries as well.

```python
subset = catalog.search(variable='tas', frequency='monthly', experiment_id=['ssp370', 'ssp585'])

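# Select the grid cell nearest Chicago; an exact lat/lon match would rarely exist on a model grid.
# Assumes the grid's longitudes run from -180 to 180; on a 0-360 grid use lon=272.37 instead.
timeseries = subset.sel(lat=41.87, lon=-87.63, method='nearest')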
```
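For the multi-location case, one possible approach is xarray's vectorized (pointwise) indexing, sketched below on a synthetic grid. The 2-degree grid, the 0-360 longitude convention, and the city coordinates are assumptions chosen for illustration; real model grids should be checked before converting longitudes.

```python
import numpy as np
import xarray as xr

# Synthetic 2-degree global grid with 0-360 longitudes (an assumption).
lat = np.arange(-89.0, 90.0, 2.0)
lon = np.arange(0.0, 360.0, 2.0)
time = np.arange(12)
tas = xr.DataArray(
    np.zeros((time.size, lat.size, lon.size)),
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="tas",
)

# Approximate city coordinates (degrees north, degrees east in -180..180).
cities = {
    "Chicago": (41.87, -87.63),
    "New York": (40.71, -74.01),
    "Los Angeles": (34.05, -118.24),
}
names = list(cities)

# Indexers that share a "city" dimension select one grid cell per city,
# rather than the outer product of all latitudes and longitudes.
lats = xr.DataArray([cities[n][0] for n in names], dims="city", coords={"city": names})
lons = xr.DataArray([cities[n][1] % 360 for n in names], dims="city", coords={"city": names})

timeseries = tas.sel(lat=lats, lon=lons, method="nearest")
print(timeseries.dims)
```

The result holds one timeseries per city, which is a small fraction of the global field and illustrates how little data this workflow actually needs.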

![Single Location timeseries](images/sample-location-timeseries.png)

### Aggregated Spatial Averages/Integrals 
The last case here is a combination of the first two. One may be interested in **global or regional statistics** over some time period, or over the entire time period of interest. For example, one might want to determine global or regional carbon sinks or sources. Another common one is global average temperature.

These datasets are **aggregated over some spatial range**. From the data query side, the user would require:
- All time steps
- The entire spatial domain (or a region, if a regional mask such as North America is of interest)

**Sample Query Syntax**
```python
subset = catalog.search(variable='tas', frequency='monthly', experiment_id=['ssp370', 'ssp585'])

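# Area-weight by cos(latitude) so high-latitude cells do not dominate the average.
# Assumes a regular lat/lon grid and that numpy is imported as np.
weights = np.cos(np.deg2rad(subset.lat))
global_average = subset.weighted(weights).mean(dim=['lat', 'lon'])  # optionally .mean(dim='time')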
```
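The difference between a weighted and an unweighted spatial mean can be seen in a short sketch on synthetic data. The grid and the idealized temperature profile are assumptions; cosine-of-latitude weights are a common approximation for cell area on a regular lat/lon grid, and a model's own cell-area field may be preferable where available.

```python
import numpy as np
import xarray as xr

# Synthetic regular 2.5-degree grid with an idealized field that is
# warmer in the tropics than at the poles (values are illustrative).
time = np.arange(6)
lat = np.arange(-88.75, 90.0, 2.5)
lon = np.arange(1.25, 360.0, 2.5)
lat_profile = 250.0 + 50.0 * np.cos(np.deg2rad(lat))
data = np.broadcast_to(
    lat_profile[None, :, None], (time.size, lat.size, lon.size)
).copy()

tas = xr.DataArray(
    data,
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="tas",
)

# Cells shrink toward the poles, so weight each one by cos(latitude).
weights = np.cos(np.deg2rad(tas.lat))
weights.name = "weights"

global_average = tas.weighted(weights).mean(dim=["lat", "lon"])
unweighted_average = tas.mean(dim=["lat", "lon"])

print(float(global_average[0]), float(unweighted_average[0]))
```

Because the unweighted mean over-counts the small polar cells, it comes out colder than the area-weighted mean for this field.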

![Global integral of carbon over land](images/global-integral-timeseries.png)

---

## Sample Demo with an Intake-ESM Catalog

Let's demo some of these illustrations with Intake-ESM.

For this use-case illustration, we will use:
- CESM2 (NCAR's flagship climate model) and E3SM (DOE's flagship climate model)
- Rainfall (`pr`) and surface temperature (`tas`) data
- Monthly frequency data


```python
import intake
from distributed import Client, LocalCluster
import dask
from cmip6_preprocessing.preprocessing import combined_preprocessing
```

```python
cluster = LocalCluster(n_workers=20)
client = Client(cluster)
client
```

The client summary reports a `distributed.LocalCluster` with 20 workers, 80 total threads (4 per worker), and 503.60 GiB of total memory (25.18 GiB per worker), with the dashboard at http://127.0.0.1:8787/status.


```python
catalog = intake.open_esm_datastore("anl-cmip6.json")
```

```python
catalog_subset = catalog.search(variable_id=['tas', 'pr'],
                                experiment_id=['historical'],
                                source_id=['CESM2', 'E3SM-1-1']
                                )
```

```python
catalog_subset
```

| | unique |
|---|---|
| activity_id | 1 |
| institution_id | 2 |
| source_id | 2 |
| experiment_id | 1 |
| member_id | 11 |
| table_id | 2 |
| variable_id | 2 |
| grid_label | 2 |
| dcpp_init_year | 0 |
| version | 7 |


```python
catalog_subset.df
```

| | activity_id | institution_id | source_id | experiment_id | member_id | table_id | variable_id | grid_label | dcpp_init_year | version | time_range | path |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | CMIP | E3SM-Project | E3SM-1-1 | historical | r1i1p1f1 | Amon | pr | gr | | v20191211 | 185001-185912 | /eagle/projects/ESGF2/esg_dataroot/css03_data/... |
| 1 | CMIP | E3SM-Project | E3SM-1-1 | historical | r1i1p1f1 | Amon | pr | gr | | v20191211 | 186001-186912 | /eagle/projects/ESGF2/esg_dataroot/css03_data/... |
| 2 | CMIP | E3SM-Project | E3SM-1-1 | historical | r1i1p1f1 | Amon | pr | gr | | v20191211 | 187001-187912 | /eagle/projects/ESGF2/esg_dataroot/css03_data/... |
| 3 | CMIP | E3SM-Project | E3SM-1-1 | historical | r1i1p1f1 | Amon | pr | gr | | v20191211 | 188001-188912 | /eagle/projects/ESGF2/esg_dataroot/css03_data/... |
| 4 | CMIP | E3SM-Project | E3SM-1-1 | historical | r1i1p1f1 | Amon | pr | gr | | v20191211 | 189001-189912 | /eagle/projects/ESGF2/esg_dataroot/css03_data/... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 440 | CMIP | NCAR | CESM2 | historical | r9i1p1f1 | day | tas | gn | | v20190311 | 19700101-19791231 | /eagle/projects/ESGF2/esg_dataroot/css03_data/... |
| 441 | CMIP | NCAR | CESM2 | historical | r9i1p1f1 | day | tas | gn | | v20190311 | 19800101-19891231 | /eagle/projects/ESGF2/esg_dataroot/css03_data/... |
| 442 | CMIP | NCAR | CESM2 | historical | r9i1p1f1 | day | tas | gn | | v20190311 | 19900101-19991231 | /eagle/projects/ESGF2/esg_dataroot/css03_data/... |
| 443 | CMIP | NCAR | CESM2 | historical | r9i1p1f1 | day | tas | gn | | v20190311 | 20000101-20091231 | /eagle/projects/ESGF2/esg_dataroot/css03_data/... |


```python
with dask.config.set(**{'array.slicing.split_large_chunks': False}):
    dsets = catalog_subset.to_dataset_dict(preprocess=combined_preprocessing)
```