## Section 3.4 - Climate Model Output

Up until now, this chapter has focused on acquiring and analyzing climate data from observations and from reanalysis, which assimilates observational data into a weather forecast model. These sources of data are only half of the story. Since this guide is about climate change projections, we must also learn how to download and process output from climate model simulations.

Output from climate models submitted to the CMIP projects is publicly available for download from the [Earth System Grid Federation (ESGF)](https://esgf.llnl.gov/). The ESGF is a peer-to-peer network with nodes hosted by several research centres around the world. The archive of data hosted on a given node can be searched and navigated in a browser. The website for the node hosted at Lawrence Livermore National Laboratory in the USA can be found [at this link](https://esgf-node.llnl.gov/projects/esgf-llnl/), and links for all of the nodes can be found [here](https://esgf.llnl.gov/nodes.html). Via the browser interface, you can fill a data cart, much as in an online store, and generate a `wget` script that downloads the selected data when executed. However, this process can be cumbersome if you are interested in large amounts of data or wish to create an automated workflow. For advanced users, the ESGF offers a RESTful API; using it directly is outside the scope of this guide. Instead, we will use the Python package [esgf-pyclient](https://esgf-pyclient.readthedocs.io/en/latest/index.html) (also called `pyesgf`) as an interface to the API. With `pyesgf`, one can either get an OPeNDAP URL to access the data directly, or get a URL from which the data can be downloaded using the Python `requests` library.

Complete documentation for the ESGF, including all options for accessing the data, can be found [here](https://esgf.github.io/esgf-user-support/index.html).


As with the reanalysis and gridded observations, climate model output is typically very large in volume. For this reason, the examples in this notebook use monthly means rather than the daily means or 6-hourly instantaneous output that climate impact analysis typically requires.

### Searching For Data
To search the ESGF archive we must first establish a connection to an ESGF node using the `pyesgf.search.SearchConnection` class. To do this, we pass the URL for the node to `SearchConnection`. Since each node may only host a subset of the complete CMIP archive, the argument `distrib = True` tells the API to search across all nodes, not just the one we connected to.

In [1]:
# import the required packages
from pyesgf.search import SearchConnection
import os
import requests
from tqdm import tqdm

import xarray as xr
import pandas as pd

In [2]:
# establish a connection to the LLNL node
conn = SearchConnection('https://esgf-node.llnl.gov/esg-search', distrib=True)

After connecting to a node we need to decide which data we'd like to access. The search function takes a number of different search parameters, listed and described in the table below. Unfortunately, the names for some of the search parameters are different for the CMIP5 and CMIP6 archives, so both are included in the table.

|                   Search Criterion                   |    CMIP5 Name    |         CMIP6 Name        |
|:----------------------------------------------------:|:----------------:|:-------------------------:|
|             Project (e.g. CMIP6, CORDEX)             |     `project`    |         `project`         |
|                      Model Name                      |      `model`     |        `source_id`        |
|                    Time Frequency                    | `time_frequency` | `table_id` or `frequency` |
|                   Forcing Scenario                   |   `experiment`   |      `experiment_id`      |
|                       Variable                       |    `variable`    |       `variable_id`       |
|                    Ensemble Member                   |    `ensemble`    |        `member_id`        |
|    Climate domain (e.g. atmosphere, ocean, land)     |      `realm`     |          `realm`          |
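
Because the keyword names differ between the two archives, it can help to keep the mapping in one place. The following is a minimal sketch, not part of `pyesgf`: the generic keys (`model`, `scenario`, and so on) and the helper `facets_for` are hypothetical names introduced here for illustration, and the mapping is simply transcribed from the table above (using `frequency` rather than `table_id` for CMIP6).

```python
# Generic criterion -> search keyword for each project, transcribed from the
# table above. The generic keys on the left are illustrative names only.
FACET_NAMES = {
    "model":     {"CMIP5": "model",          "CMIP6": "source_id"},
    "frequency": {"CMIP5": "time_frequency", "CMIP6": "frequency"},
    "scenario":  {"CMIP5": "experiment",     "CMIP6": "experiment_id"},
    "variable":  {"CMIP5": "variable",       "CMIP6": "variable_id"},
    "member":    {"CMIP5": "ensemble",       "CMIP6": "member_id"},
    "realm":     {"CMIP5": "realm",          "CMIP6": "realm"},
}


def facets_for(project, **criteria):
    """Translate generic criteria into the keyword names used by `project`."""
    if project not in ("CMIP5", "CMIP6"):
        raise ValueError(f"no facet mapping defined for project {project!r}")
    kwargs = {"project": project}
    for name, value in criteria.items():
        if name not in FACET_NAMES:
            raise KeyError(f"unknown search criterion {name!r}")
        kwargs[FACET_NAMES[name][project]] = value
    return kwargs


# The same request expressed for each archive (CanESM2 is the Canadian
# model that contributed to CMIP5).
cmip6_kwargs = facets_for("CMIP6", model="CanESM5", scenario="historical",
                          frequency="mon", variable="tas")
cmip5_kwargs = facets_for("CMIP5", model="CanESM2", scenario="historical",
                          frequency="mon", variable="tas")

print(cmip6_kwargs)
print(cmip5_kwargs)
```

The resulting dictionary could then be unpacked into a search call, for example `conn.new_context(**cmip6_kwargs)`.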


To include multiple values for a search parameter (e.g. if you want data for both the historical and SSP5-8.5 forcing scenarios), separate the values with commas (e.g. `experiment_id = 'historical,ssp585'`). Let's do exactly this, searching for `tas` (surface air temperature) data at monthly frequency from the model `CanESM5`, the Canadian model contributing to CMIP6.

In [10]:
query = conn.new_context(latest = True, # search for most recent versions of the file
                         facets = 'project,experiment_id,source_id,frequency,member_id,variable_id',
                         project="CMIP6",
                         experiment_id='historical,ssp585',
                         source_id = "CanESM5",
                         frequency = 'mon', # monthly
                         member_id="r1i1p1f1",
                         variable_id = "tas")
results = query.search()

len(results)

12

In [11]:
# each search result is a dataset; collect the name and access URLs
# of every file that each dataset contains
files = []
for i in range(len(results)):
    hit = results[i].file_context().search()
    files += [{'filename': f.filename,
               'download_url': f.download_url,
               'opendap_url': f.opendap_url} for f in hit]


-------------------------------------------------------------------------------

This behavior is kept for backward-compatibility, but ESGF indexes might not
successfully perform a distributed search when this option is used, so some
results may be missing.  For full results, it is recommended to pass a list of
facets of interest when instantiating a context object.  For example,

      ctx = conn.new_context(facets='project,experiment_id')

Only the facets that you specify will be present in the facets_counts dictionary.

or explicitly use  conn.new_context(facets='*')

-------------------------------------------------------------------------------


In [12]:
files = pd.DataFrame.from_dict(files)
files.head()

|   | filename | download_url | opendap_url |
|:-:|:---------|:-------------|:------------|
| 0 | `tas_Amon_CanESM5_ssp585_r1i1p1f1_gn_201501-210...` | `http://crd-esgf-drc.ec.gc.ca/thredds/fileServe...` | `http://crd-esgf-drc.ec.gc.ca/thredds/dodsC/esg...` |
| 1 | `tas_Amon_CanESM5_ssp585_r1i1p1f1_gn_210101-218...` | `http://crd-esgf-drc.ec.gc.ca/thredds/fileServe...` | `http://crd-esgf-drc.ec.gc.ca/thredds/dodsC/esg...` |
| 2 | `tas_Amon_CanESM5_ssp585_r1i1p1f1_gn_218101-230...` | `http://crd-esgf-drc.ec.gc.ca/thredds/fileServe...` | `http://crd-esgf-drc.ec.gc.ca/thredds/dodsC/esg...` |
| 3 | `tas_Amon_CanESM5_historical_r1i1p1f1_gn_185001...` | `http://aims3.llnl.gov/thredds/fileServer/css03...` | `http://aims3.llnl.gov/thredds/dodsC/css03_data...` |
| 4 | `tas_Amon_CanESM5_ssp585_r1i1p1f1_gn_201501-210...` | `http://aims3.llnl.gov/thredds/fileServer/css03...` | `http://aims3.llnl.gov/thredds/dodsC/css03_data...` |
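
Because the search is distributed across nodes, the same file can be listed more than once; rows 0 and 4 above appear to be the same file served from two different hosts. The sketch below shows one way to read the fields encoded in a CMIP6 file name and to group replicas by host. It uses only the standard library and operates on hypothetical records: the model name `ExampleModel`, the `example.org` hosts, and the helper names are placeholders introduced here, not real ESGF entries or `pyesgf` functions.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Field order of a CMIP6 file name; an optional time range follows
# (files for time-invariant fields have none).
FIELDS = ("variable_id", "table_id", "source_id",
          "experiment_id", "member_id", "grid_label")


def parse_cmip6_filename(filename):
    """Split a CMIP6-style file name into its underscore-separated fields."""
    stem = filename[:-3] if filename.endswith(".nc") else filename
    parts = stem.split("_")
    if len(parts) < len(FIELDS):
        raise ValueError(f"unexpected file name: {filename!r}")
    info = dict(zip(FIELDS, parts))
    info["time_range"] = parts[6] if len(parts) > 6 else None
    return info


def hosts_by_filename(records):
    """Map each file name to the list of hosts that serve a copy of it."""
    hosts = defaultdict(list)
    for rec in records:
        host = urlparse(rec["download_url"]).netloc
        if host not in hosts[rec["filename"]]:
            hosts[rec["filename"]].append(host)
    return dict(hosts)


# Hypothetical records with the same keys as the entries collected above.
SSP = "tas_Amon_ExampleModel_ssp585_r1i1p1f1_gn_201501-210012.nc"
HIST = "tas_Amon_ExampleModel_historical_r1i1p1f1_gn_185001-201412.nc"
records = [
    {"filename": SSP, "download_url": f"http://node-a.example.org/files/{SSP}"},
    {"filename": HIST, "download_url": f"http://node-b.example.org/files/{HIST}"},
    {"filename": SSP, "download_url": f"http://node-b.example.org/files/{SSP}"},
]

replicas = hosts_by_filename(records)

# Keep only the first record seen for each file name.
first_seen = {}
for rec in records:
    first_seen.setdefault(rec["filename"], rec)
unique = list(first_seen.values())

print(parse_cmip6_filename(SSP))
print(replicas)
print(len(unique), "unique files")
```

With the `files` DataFrame built above, `files.drop_duplicates(subset='filename')` would perform the same de-duplication, keeping the first listed copy of each file.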
