# Search and Access CMIP5 data from NCI

In this notebook, we demonstrate how to use Clef and Xarray to search for and access CMIP5 data at NCI. Run this notebook on the NCI VDI so that the data can be read directly from the file system.

The following material uses the Coupled Model Intercomparison Project Phase 5 (CMIP5) collections. The CMIP5 terms of use are found [here](https://cmip.llnl.gov/cmip5/terms.html). For more information on the collection, please click [here](https://geonetwork.nci.org.au/geonetwork/srv/eng/catalog.search#/metadata/f3525_9322_8600_7716).

## Set up environment

In [None]:
!module use /g/data3/hh5/public/modules
!module load conda/analysis3-unstable

We use Clef to find the paths of the CMIP5 data. Clef is available in the analysis3-unstable environment.

Clef searches the Earth System Grid Federation (ESGF) datasets stored at the Australian National Computational Infrastructure (NCI). This covers both data published on the NCI ESGF node and files replicated locally from other ESGF nodes.

Currently it searches for the following datasets:

- CMIP5 raijin projects: rr3, where NCI is the primary publisher, and al33 for replicas
- CMIP6 raijin projects: oi10 for replicas

For more detailed information about using Clef, see the [getting started guide](https://clef.readthedocs.io/en/latest/gettingstarted.html).

## Import python packages

In [2]:
import xarray as xr
%matplotlib inline

## Use clef cmip5 to search for data

First, display the help text to see the search options that clef cmip5 accepts.

In [None]:
!clef cmip5 --help

Next, we search for the available near-surface air temperature (tas) and precipitation (pr) data by specifying the experiment, ensemble member, table and variables:

In [None]:
!clef cmip5 --experiment rcp26 --ensemble r1i1p1 --table Amon --variable tas --variable pr
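When the same search needs to be repeated for several experiments or variables, it can help to assemble the command in Python. The following is a minimal sketch, not part of Clef itself: the helper `build_clef_cmip5_command` is a hypothetical name, and it only uses the options already shown in the cell above.

```python
import shlex


def build_clef_cmip5_command(constraints):
    """Build the argument list for a `clef cmip5` search.

    `constraints` maps an option name (without the leading dashes) to a
    list of values. An option is repeated once per value, as is done for
    --variable in the search above. This helper is hypothetical.
    """
    args = ["clef", "cmip5"]
    for option, values in constraints.items():
        for value in values:
            args += [f"--{option}", value]
    return args


command = build_clef_cmip5_command({
    "experiment": ["rcp26"],
    "ensemble": ["r1i1p1"],
    "table": ["Amon"],
    "variable": ["tas", "pr"],
})

# Show the command as it would be typed in a shell.
command_line = shlex.join(command)
print(command_line)
```

The resulting list could then be passed to `subprocess.run(command)` in an environment where clef is installed.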

We can then set the values of the CMIP5 attributes according to the Clef search results.

CMIP5 data are organised into directories according to their global attributes. We can access different data by changing the attributes in the directory path below:
**/g/data1b/al33/replicas/CMIP5/product/institute/model/experiment/frequency/realm/table/ensemble/version/variable**
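As an illustration of how this template maps onto a concrete directory, here is a minimal sketch that joins the attributes in the order listed above. The helper `cmip5_directory` and the constant `DRS_ORDER` are hypothetical names introduced for this example, and the attribute values are those of the MIROC-ESM rcp26 example used in this notebook.

```python
from pathlib import PurePosixPath

# Order of the attributes, copied from the directory template above.
DRS_ORDER = ("product", "institute", "model", "experiment", "frequency",
             "realm", "table", "ensemble", "version", "variable")


def cmip5_directory(root, **attrs):
    """Join the DRS attributes, in template order, under `root`."""
    missing = [key for key in DRS_ORDER if key not in attrs]
    if missing:
        raise ValueError(f"missing DRS attributes: {missing}")
    return PurePosixPath(root).joinpath(*(attrs[key] for key in DRS_ORDER))


directory = cmip5_directory(
    "/g/data1b/al33/replicas/CMIP5",
    product="combined", institute="MIROC", model="MIROC-ESM",
    experiment="rcp26", frequency="mon", realm="atmos", table="Amon",
    ensemble="r1i1p1", version="v20120710", variable="tas",
)
print(directory)
```

Raising an error on a missing attribute avoids silently building a path that is one level too short.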

There are four Representative Concentration Pathways (RCPs) in CMIP5. They are greenhouse gas concentration (not emissions) trajectories adopted by the IPCC for its Fifth Assessment Report (AR5) in 2014. They supersede the Special Report on Emissions Scenarios (SRES) projections published in 2000. For more information, see [here](https://sedac.ciesin.columbia.edu/ddc/ar5_scenario_process/RCPs.html).

Below, we set these attributes to get the future projection data under the rcp26 scenario, using member 'r1i1p1' of the 'MIROC-ESM' model simulations as an example.

<div class="alert alert-info">
<b>NOTE: </b>Because the DRS (Data Reference Syntax) differs between CMIP5 and CMIP6, the clef search syntax differs slightly between the two datasets. Search arguments must be strictly consistent with the relevant DRS tree, and they are case sensitive.
</div>

The following request is wrong because it passes CMIP6 arguments (activity, source_id, grid) to clef cmip5:

In [None]:
!clef cmip5  --activity ScenarioMIP  --source_id  CNRM-CM6-1 --table Amon  --variable tas  --variable pr   --grid gr 

## Show data file names

In [6]:
!ls /g/data1b/al33/replicas/CMIP5/combined/MIROC/MIROC-ESM/rcp26/mon/atmos/Amon/r1i1p1/v20120710/tas/

tas_Amon_MIROC-ESM_rcp26_r1i1p1_200601-210012.nc


In [7]:
!ls /g/data1b/al33/replicas/CMIP5/combined/MIROC/MIROC-ESM/rcp26/mon/atmos/Amon/r1i1p1/v20120710/pr/

pr_Amon_MIROC-ESM_rcp26_r1i1p1_200601-210012.nc
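The two file names listed above follow the pattern variable_table_model_experiment_ensemble_period.nc. A minimal sketch of splitting such a name back into its parts is shown below. The helper `parse_cmip5_filename` is a hypothetical name, and the sketch assumes all six fields are present, as they are for these two files.

```python
from pathlib import PurePosixPath

# Field order as it appears in the two file names listed above.
FIELDS = ("variable", "table", "model", "experiment", "ensemble", "period")


def parse_cmip5_filename(filename):
    """Split a file name of the form shown above into its six fields."""
    stem = PurePosixPath(filename).stem  # drop the ".nc" suffix
    parts = stem.split("_")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return dict(zip(FIELDS, parts))


info = parse_cmip5_filename("tas_Amon_MIROC-ESM_rcp26_r1i1p1_200601-210012.nc")
print(info["model"], info["period"])
```

Splitting on the underscore works here because the model name uses a hyphen (MIROC-ESM) rather than an underscore.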


## Use Xarray to open data

#### temperature

In [8]:
cmip5Dir='/g/data1b/al33/replicas/CMIP5'
product='combined'
institute='MIROC'
model='MIROC-ESM'
experiment='rcp26'
frequency='mon'
realm='atmos'
table='Amon'
ensemble='r1i1p1'
version='v20120710'
variable='tas'
period='200601-210012'
# The directory follows the DRS tree; the file name repeats part of it.
dataDir=f'{cmip5Dir}/{product}/{institute}/{model}/{experiment}/{frequency}/{realm}/{table}/{ensemble}/{version}/{variable}'
fileName=f'{variable}_{table}_{model}_{experiment}_{ensemble}_{period}.nc'
ds=xr.open_dataset(f'{dataDir}/{fileName}')
tas=ds.tas
tas

<xarray.DataArray 'tas' (time: 1140, lat: 64, lon: 128)>
[9338880 values with dtype=float64]
Coordinates:
  * time     (time) datetime64[ns] 2006-01-16T12:00:00 2006-02-15 ...
  * lat      (lat) float64 -87.86 -85.1 -82.31 -79.53 -76.74 -73.95 -71.16 ...
  * lon      (lon) float64 0.0 2.812 5.625 8.438 11.25 14.06 16.88 19.69 ...
    height   float64 2.0
Attributes:
    standard_name:     air_temperature
    long_name:         Near-Surface Air Temperature
    units:             K
    original_name:     T2
    cell_methods:      time: mean
    cell_measures:     area: areacella
    history:           2011-09-13T04:34:56Z altered by CMOR: Treated scalar d...
    associated_files:  baseURL: http://cmip-pcmdi.llnl.gov/CMIP5/dataLocation...

#### precipitation

In [9]:
cmip5Dir='/g/data1b/al33/replicas/CMIP5'
product='combined'
institute='MIROC'
model='MIROC-ESM'
experiment='rcp26'
frequency='mon'
realm='atmos'
table='Amon'
ensemble='r1i1p1'
version='v20120710'
variable='pr'
period='200601-210012'
# The directory follows the DRS tree; the file name repeats part of it.
dataDir=f'{cmip5Dir}/{product}/{institute}/{model}/{experiment}/{frequency}/{realm}/{table}/{ensemble}/{version}/{variable}'
fileName=f'{variable}_{table}_{model}_{experiment}_{ensemble}_{period}.nc'
ds=xr.open_dataset(f'{dataDir}/{fileName}')
pr=ds.pr
pr

<xarray.DataArray 'pr' (time: 1140, lat: 64, lon: 128)>
[9338880 values with dtype=float64]
Coordinates:
  * time     (time) datetime64[ns] 2006-01-16T12:00:00 2006-02-15 ...
  * lat      (lat) float64 -87.86 -85.1 -82.31 -79.53 -76.74 -73.95 -71.16 ...
  * lon      (lon) float64 0.0 2.812 5.625 8.438 11.25 14.06 16.88 19.69 ...
Attributes:
    standard_name:     precipitation_flux
    long_name:         Precipitation
    comment:           at surface; includes both liquid and solid phases from...
    units:             kg m-2 s-1
    original_name:     PRCP
    original_units:    kg/m**2/s
    history:           2011-09-13T04:36:18Z altered by CMOR: Converted units ...
    cell_methods:      time: mean
    cell_measures:     area: areacella
    associated_files:  baseURL: http://cmip-pcmdi.llnl.gov/CMIP5/dataLocation...
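
A quick sanity check on the outputs above is to compare the reported shape with the period in the file name. The sketch below counts the months covered by a YYYYMM-YYYYMM period string. The helper `months_in_period` is a hypothetical name, and the grid size of 64 by 128 is taken from the printed dimensions.

```python
def months_in_period(period):
    """Count the months covered by a 'YYYYMM-YYYYMM' period, inclusive."""
    start, end = period.split("-")
    start_year, start_month = int(start[:4]), int(start[4:6])
    end_year, end_month = int(end[:4]), int(end[4:6])
    return (end_year - start_year) * 12 + (end_month - start_month) + 1


n_time = months_in_period("200601-210012")
n_values = n_time * 64 * 128  # time steps x lat x lon, from the output above

print(n_time)    # 1140, matching the time dimension
print(n_values)  # 9338880, matching the number of values reported
```

If the number of time steps in an opened file differs from this count, the file may cover a different period than its name suggests, or the data may not be monthly.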