# Search and Access CMIP5 data from NCI

In this notebook, we demonstrate how to use Clef and Xarray to search for and access CMIP5 data at NCI. Run this notebook on the VDI so that the data can be read directly from the file system.

The following material uses the Coupled Model Intercomparison Project Phase 5 (CMIP5) collections. The CMIP5 terms of use are found [here](https://cmip.llnl.gov/cmip5/terms.html). For more information on the collection, please click [here](https://geonetwork.nci.org.au/geonetwork/srv/eng/catalog.search#/metadata/f3525_9322_8600_7716).

## Set up environment

In [1]:
!module use /g/data3/hh5/public/modules
!module load conda/analysis3

/bin/sh: module: command not found
/bin/sh: module: command not found


We use Clef to find the paths of CMIP5 data. Once the `conda/analysis3` environment is loaded, Clef is available for use.

Clef searches the Earth System Grid Federation (ESGF) datasets stored at the Australian National Computational Infrastructure (NCI), covering both data published on the NCI ESGF node and files that are locally replicated from other ESGF nodes.

Currently it searches for the following datasets:

- CMIP5 raijin projects: rr3, where NCI is the primary publisher and al33 for replicas
- CMIP6 raijin projects: oi10 for replicas

For detailed information about using 'Clef', check out this [webpage](https://clef.readthedocs.io/en/latest/gettingstarted.html).

## Import Python packages

In [2]:
import xarray as xr
%matplotlib inline



## Use Clef cmip5 to search data

First, check the help information to see the search options available in `clef cmip5`.

In [54]:
!clef cmip5 --help

Usage: clef cmip5 [OPTIONS] [QUERY]...

  Search ESGF and local database for CMIP5 files

  Constraints can be specified multiple times, in which case they are
  combined    using OR: -v tas -v tasmin will return anything matching
  variable = 'tas' or variable = 'tasmin'. The --latest flag will check ESGF
  for the latest version available, this is the default behaviour

Options:
  -e, --experiment x              CMIP5 experiment: piControl, rcp85, amip ...
  --experiment_family [Atmos-only|Control|Decadal|ESM|Historical|Idealized|Paleo|RCP]
                                  CMIP5 experiment family: Decadal, RCP ...
  -m, --model x                   CMIP5 model acronym: ACCESS1.3, MIROC5 ...
  -t, --table, --mip [Amon|Omon|OImon|LImon|Lmon|6hrPlev|6hrLev|3hr|Oclim|Oyr|aero|cfOff|cfSites|cfMon|cfDay|cf3hr|day|fx|grids]
  -v, --variable x                Variable name as shown in filanames: tas,
                                  pr, sic ...
  -en, --ensemble, --member TE

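Next, we search for available temperature and precipitation data by passing search constraints to Clef. The following command uses CMIP6-style options (`--activity`, `--source_id`, `--grid`), which do not appear among the CMIP5 options listed above: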

In [None]:
!clef cmip5  --activity ScenarioMIP  --source_id  CNRM-CM6-1 --table Amon  --variable tas  --variable pr   --grid gr 

<div class="alert alert-info">
<b>NOTE: </b>Because CMIP5 and CMIP6 use different DRS (Data Reference Syntax) trees, the Clef search syntax differs slightly between the two datasets. Search options must be strictly consistent with the corresponding DRS tree, and they are case sensitive.
</div>

For CMIP5, use the CMIP5 options instead:

In [None]:
!clef cmip5  --experiment rcp26  --ensemble r1i1p1 --table Amon   --variable tas --variable pr
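When the same search is repeated for several experiments or variables, it can help to assemble the command from Python values. The following is a minimal sketch, not part of Clef itself: `build_clef_cmip5_command` is a hypothetical helper that only uses the `--experiment`, `--ensemble`, `--table` and `--variable` options shown in the help output above.

```python
import shlex


def build_clef_cmip5_command(experiment=None, ensemble=None, table=None, variables=()):
    """Return the argument list for a `clef cmip5` search (hypothetical helper)."""
    args = ["clef", "cmip5"]
    if experiment:
        args += ["--experiment", experiment]
    if ensemble:
        args += ["--ensemble", ensemble]
    if table:
        args += ["--table", table]
    # Repeating --variable combines the constraints with OR, as the help text explains.
    for variable in variables:
        args += ["--variable", variable]
    return args


cmd = build_clef_cmip5_command(
    experiment="rcp26", ensemble="r1i1p1", table="Amon", variables=["tas", "pr"]
)
# Print a shell-ready command line that can be pasted after `!` in a notebook cell.
print(shlex.join(cmd))
```

The printed command line is the same search as the cell above, so the helper changes nothing about how Clef is called.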

We can then set the values of the CMIP5 attributes according to the Clef search results.

CMIP5 data are organised according to their global attributes. We can access different data by changing the attributes in the directory path below:
**/g/data/rr3//product/institute/model/experiment/frequency/realm/table/ensemble/version/variable**
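The directory layout above can be sketched as a small function that joins the attributes in DRS order. This is an illustrative sketch only: `build_drs_dir` and `DRS_ORDER` are hypothetical names, and the root directory and attribute values are the ones used for the MIROC-ESM example later in this notebook.

```python
import posixpath

# Order of the attributes in the CMIP5 directory tree, as listed above.
DRS_ORDER = (
    "product", "institute", "model", "experiment", "frequency",
    "realm", "table", "ensemble", "version", "variable",
)


def build_drs_dir(root, **attrs):
    """Join the DRS attributes under `root` in directory order (hypothetical helper)."""
    missing = [key for key in DRS_ORDER if key not in attrs]
    if missing:
        raise KeyError(f"missing DRS attributes: {missing}")
    return posixpath.join(root, *(attrs[key] for key in DRS_ORDER))


tas_dir = build_drs_dir(
    "/g/data1b/al33/replicas/CMIP5",
    product="combined", institute="MIROC", model="MIROC-ESM",
    experiment="rcp26", frequency="mon", realm="atmos", table="Amon",
    ensemble="r1i1p1", version="v20120710", variable="tas",
)
print(tas_dir)
```

Changing a single keyword, for example `variable="pr"`, gives the directory for another dataset in the same tree.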

There are four Representative Concentration Pathways (RCPs) in CMIP5. These are greenhouse gas concentration (not emissions) trajectories adopted by the IPCC for its Fifth Assessment Report (AR5) in 2014. They supersede the Special Report on Emissions Scenarios (SRES) projections published in 2000. For more information, see [here](https://sedac.ciesin.columbia.edu/ddc/ar5_scenario_process/RCPs.html).
 

Below, we set these attributes to get the future projection data under the rcp26 scenario, using member 'r1i1p1' of the 'MIROC-ESM' model simulations as an example.

## Show data file names

In [None]:
!ls /g/data1b/al33/replicas/CMIP5/combined/MIROC/MIROC-ESM/rcp26/mon/atmos/Amon/r1i1p1/v20120710/tas/

In [None]:
!ls /g/data1b/al33/replicas/CMIP5/combined/MIROC/MIROC-ESM/rcp26/mon/atmos/Amon/r1i1p1/v20120710/pr/
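The file names in these directories are expected to follow the pattern `variable_table_model_experiment_ensemble_period.nc`, which is the same pattern used to build the file path in the next section. As a minimal sketch under that assumption, the hypothetical helper `parse_cmip5_filename` splits such a name back into its attributes:

```python
def parse_cmip5_filename(filename):
    """Split a CMIP5 file name into its attributes (hypothetical helper).

    Assumes the pattern variable_table_model_experiment_ensemble_period.nc,
    where period looks like 200601-210012.
    """
    stem = filename[:-3] if filename.endswith(".nc") else filename
    parts = stem.split("_")
    if len(parts) != 6:
        raise ValueError(f"unexpected CMIP5 file name: {filename!r}")
    keys = ("variable", "table", "model", "experiment", "ensemble", "period")
    info = dict(zip(keys, parts))
    # Split the period into its first and last time stamps.
    info["start"], info["end"] = info.pop("period").split("-")
    return info


info = parse_cmip5_filename("tas_Amon_MIROC-ESM_rcp26_r1i1p1_200601-210012.nc")
print(info["model"], info["start"], info["end"])
```

Files without a period component, such as time-invariant fields, do not match this pattern and raise a `ValueError` in this sketch.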

## Use Xarray to open data

#### temperature

In [None]:
# DRS attributes taken from the Clef search results
cmip5Dir = '/g/data1b/al33/replicas/CMIP5'
product = 'combined'
institute = 'MIROC'
model = 'MIROC-ESM'
experiment = 'rcp26'
frequency = 'mon'
realm = 'atmos'
table = 'Amon'
ensemble = 'r1i1p1'
version = 'v20120710'
variable = 'tas'
period = '200601-210012'

# File name pattern: variable_table_model_experiment_ensemble_period.nc
filename = f'{variable}_{table}_{model}_{experiment}_{ensemble}_{period}.nc'
path = '/'.join([cmip5Dir, product, institute, model, experiment, frequency,
                 realm, table, ensemble, version, variable, filename])
ds = xr.open_dataset(path)
tas = ds.tas
tas

#### precipitation

In [None]:
# Same DRS attributes as for temperature; only the variable changes
cmip5Dir = '/g/data1b/al33/replicas/CMIP5'
product = 'combined'
institute = 'MIROC'
model = 'MIROC-ESM'
experiment = 'rcp26'
frequency = 'mon'
realm = 'atmos'
table = 'Amon'
ensemble = 'r1i1p1'
version = 'v20120710'
variable = 'pr'
period = '200601-210012'

# File name pattern: variable_table_model_experiment_ensemble_period.nc
filename = f'{variable}_{table}_{model}_{experiment}_{ensemble}_{period}.nc'
path = '/'.join([cmip5Dir, product, institute, model, experiment, frequency,
                 realm, table, ensemble, version, variable, filename])
ds = xr.open_dataset(path)
pr = ds.pr
pr
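The cells above open one file for a known period. A variable directory may hold several files covering different periods, in which case hard-coding `period` is not enough. The sketch below is one possible approach, assuming the same file-name pattern: the hypothetical helpers `list_variable_files` and `open_variable` collect every matching file and pass the list to `xr.open_mfdataset`, which also needs dask to be installed.

```python
import glob
import os


def list_variable_files(var_dir, variable, table, model, experiment, ensemble):
    """Return the sorted NetCDF files in `var_dir` that match the CMIP5 name pattern."""
    pattern = f"{variable}_{table}_{model}_{experiment}_{ensemble}_*.nc"
    return sorted(glob.glob(os.path.join(var_dir, pattern)))


def open_variable(var_dir, variable, table, model, experiment, ensemble):
    """Open all matching files as a single dataset (requires xarray and dask)."""
    import xarray as xr  # imported here so the file listing works without xarray

    files = list_variable_files(var_dir, variable, table, model, experiment, ensemble)
    if not files:
        raise FileNotFoundError(f"no files for {variable!r} found in {var_dir}")
    # combine="by_coords" concatenates the files along their shared coordinates.
    return xr.open_mfdataset(files, combine="by_coords")


# Example call on the VDI, using the directory listed earlier in this notebook:
# ds = open_variable(
#     "/g/data1b/al33/replicas/CMIP5/combined/MIROC/MIROC-ESM/rcp26/mon/atmos/Amon/r1i1p1/v20120710/tas",
#     "tas", "Amon", "MIROC-ESM", "rcp26", "r1i1p1",
# )
```

Raising `FileNotFoundError` early gives a clearer message than the error xarray reports for an empty file list.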