# Search and Access CMIP6 data from NCI

In this notebook, we demonstrate how to use Clef and Xarray to search for and access CMIP6 data at NCI. Run this notebook on the VDI so that the data can be read directly from the file system.

The following material uses the Coupled Model Intercomparison Project Phase 6 (CMIP6) collection. The CMIP6 terms of use are found [here](https://pcmdi.llnl.gov/CMIP6/TermsOfUse/TermsOfUse6-1.html). For more information on the collection, please click [here](https://geonetwork.nci.org.au/geonetwork/srv/eng/catalog.search#/metadata/f6600_2266_8675_3563).

## Set up environment

In [None]:
!module use /g/data3/hh5/public/modules
!module load conda/analysis3

We use Clef to search for CMIP6 data paths. Clef is now loaded for our use.

Clef searches the Earth System Grid Federation datasets stored at the Australian National Computational Infrastructure, both data published on the NCI ESGF node as well as files that are locally replicated from other ESGF nodes.

Currently it searches for the following datasets:

- CMIP5 raijin projects: rr3, where NCI is the primary publisher and al33 for replicas
- CMIP6 raijin projects: oi10 for replicas

For detailed information about using 'Clef', check out this [webpage](https://clef.readthedocs.io/en/latest/gettingstarted.html).

## Import Python packages

In [46]:
import xarray as xr
%matplotlib inline

## Use Clef cmip6 to search data

First, check the help information to see the search options in clef cmip6.

In [49]:
!clef cmip6 --help

Usage: clef cmip6 [OPTIONS] [QUERY]...

  Search ESGF and local database for CMIP6 files

  Constraints can be specified multiple times, in which case they are
  combined    using OR: -v tas -v tasmin will return anything matching
  variable = 'tas' or variable = 'tasmin'. The --latest flag will check ESGF
  for the latest version available, this is the default behaviour

Options:
  -mip, --activity [AerChemMIP|C4MIP|CDRMIP|CFMIP|CMIP|CORDEX|DAMIP|DCPP|DynVarMIP|FAFMIP|GMMIP|GeoMIP|HighResMIP|ISMIP6|LS3MIP|LUMIP|OMIP|PAMIP|PMIP|RFMIP|SIMIP|ScenarioMIP|VIACSAB|VolMIP]
  -e, --experiment x              CMIP6 experiment, list of available depends
                                  on activity
  --source_type [AER|AGCM|AOGCM|BGC|CHEM|ISM|LAND|OGCM|RAD|SLAB]
  -t, --table x                   CMIP6 CMOR table: Amon, SIday, Oday ...
  -m, --model, --source_id x      CMIP6 model id: GFDL-AM4, CNRM-CM6-1 ...
  -v, --variable x                CMIP6 variable name as in filenames
 

Next, we search ScenarioMIP for available data. The example below searches for monthly near-surface air temperature (tas); precipitation (pr) can be searched in the same way.

In [50]:
!clef cmip6 --activity ScenarioMIP --experiment ssp126 --member r1i1p1f2 --table Amon --variable tas --grid gr

/g/data1b/oi10/replicas/CMIP6/ScenarioMIP/CNRM-CERFACS/CNRM-CM6-1/ssp126/r1i1p1f2/Amon/tas/gr/v20190219/
/g/data1b/oi10/replicas/CMIP6/ScenarioMIP/CNRM-CERFACS/CNRM-ESM2-1/ssp126/r1i1p1f2/Amon/tas/gr/v20190328/

Everything available on ESGF is also available locally
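
The help text above states that a constraint given more than once is combined using OR. A minimal sketch of building a single query for both temperature and precipitation follows. It assumes the `--member` and `--grid` options behave as in the search above, and `build_clef_query` is a hypothetical helper, not part of Clef.

```python
import shlex


def build_clef_query(variables, activity="ScenarioMIP", experiment="ssp126",
                     member="r1i1p1f2", table="Amon", grid="gr"):
    """Return the argument list for a `clef cmip6` search (hypothetical helper)."""
    args = ["clef", "cmip6",
            "--activity", activity,
            "--experiment", experiment,
            "--member", member,
            "--table", table,
            "--grid", grid]
    # Repeating --variable combines the values with OR, per the help text.
    for name in variables:
        args += ["--variable", name]
    return args


query = build_clef_query(["tas", "pr"])
# Print the equivalent shell command so it can be pasted after `!` in a cell.
print(shlex.join(query))
```

On a system where Clef is installed, the same list could be passed to `subprocess.run(query, capture_output=True, text=True)` to capture the returned paths.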


We can then set the values of the CMIP6 attributes according to the Clef search results.

CMIP6 data are organised according to their global attributes. Different data can be accessed by changing the attributes in the directory structure below:
**/g/data1b/oi10/replicas/CMIP6/activity_id/institution_id/source_id/experiment_id/member_id/table_id/variable/grid_label/version/**

For more information about the CMIP6 data reference syntax (DRS) tree, see this [documentation](https://docs.google.com/document/d/1h0r8RZr_f3-8egBMMh7aqLwy3snpD6_MrDz1q8n5XUk/edit).
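
Because each directory level corresponds to one attribute, a path returned by Clef can be split back into its attributes. The sketch below assumes the directory order shown above and the oi10 replica root; `parse_drs_path` and `DRS_FACETS` are hypothetical names introduced for illustration.

```python
# DRS attributes in the order they appear below the CMIP6 root directory.
DRS_FACETS = ["activity_id", "institution_id", "source_id", "experiment_id",
              "member_id", "table_id", "variable", "grid_label", "version"]


def parse_drs_path(path, root="/g/data1b/oi10/replicas/CMIP6"):
    """Split a Clef result directory into a dict of DRS attributes."""
    if not path.startswith(root):
        raise ValueError(f"{path!r} is not under {root!r}")
    # Drop the root and any leading or trailing slashes, then split per level.
    parts = path[len(root):].strip("/").split("/")
    if len(parts) != len(DRS_FACETS):
        raise ValueError(f"expected {len(DRS_FACETS)} levels, got {len(parts)}")
    return dict(zip(DRS_FACETS, parts))


facets = parse_drs_path(
    "/g/data1b/oi10/replicas/CMIP6/ScenarioMIP/CNRM-CERFACS/CNRM-CM6-1/"
    "ssp126/r1i1p1f2/Amon/tas/gr/v20190219/"
)
print(facets["source_id"], facets["version"])
```

The resulting dictionary holds the same values that are assigned by hand in the cells that open the data.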

The Tier 1 experiments of ScenarioMIP cover four future scenarios built on the Shared Socioeconomic Pathways (SSPs), which describe pathways of societal development. Of these four, ssp126 is the lowest-emission scenario.
For information about the ScenarioMIP scenarios, see https://www.geosci-model-dev.net/9/3461/2016/

Below, we set these attributes in order to get the future projection data under the ssp126 scenario, using member 'r1i1p1f2' of the CNRM-CM6-1 model simulations as an example.

## Use Xarray to open data

#### temperature

In [51]:
cmip6Dir='/g/data1b/oi10/replicas/CMIP6'
activity='ScenarioMIP'
institute='CNRM-CERFACS'
source='CNRM-CM6-1'
experiment='ssp126' 
member='r1i1p1f2'
table='Amon'
variable='tas'  
grid='gr'
version='v20190219'
period='201501-210012'
# The directory follows the DRS tree; the file name repeats most of the same attributes.
dataDir=f'{cmip6Dir}/{activity}/{institute}/{source}/{experiment}/{member}/{table}/{variable}/{grid}/{version}'
fileName=f'{variable}_{table}_{source}_{experiment}_{member}_{grid}_{period}.nc'
ds=xr.open_dataset(f'{dataDir}/{fileName}')
tas=ds.tas
tas

<xarray.DataArray 'tas' (time: 1032, lat: 128, lon: 256)>
[33816576 values with dtype=float32]
Coordinates:
  * lat      (lat) float64 -88.93 -87.54 -86.14 -84.74 ... 86.14 87.54 88.93
  * lon      (lon) float64 0.0 1.406 2.812 4.219 ... 354.4 355.8 357.2 358.6
    height   float64 ...
  * time     (time) datetime64[ns] 2015-01-16T12:00:00 ... 2100-12-16T12:00:00
Attributes:
    online_operation:    average
    cell_methods:        area: time: mean
    interval_operation:  900 s
    interval_write:      1 month
    standard_name:       air_temperature
    description:         Near-Surface Air Temperature
    long_name:           Near-Surface Air Temperature
    history:             none
    units:               K
    cell_measures:       area: areacella
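
As a quick sanity check, the number of monthly time steps implied by the `period` string can be compared with the length of the time dimension in the output above. This is a minimal sketch; `months_in_period` is a hypothetical helper that assumes the period has the form `YYYYMM-YYYYMM`.

```python
def months_in_period(period):
    """Count the months covered by a 'YYYYMM-YYYYMM' period string, inclusive."""
    start, end = period.split("-")
    start_year, start_month = int(start[:4]), int(start[4:6])
    end_year, end_month = int(end[:4]), int(end[4:6])
    return (end_year - start_year) * 12 + (end_month - start_month) + 1


n_months = months_in_period("201501-210012")
print(n_months)  # 1032, the same as the length of the time dimension above
```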

#### precipitation

In [None]:
cmip6Dir='/g/data1b/oi10/replicas/CMIP6'
activity='ScenarioMIP'
institute='CNRM-CERFACS'
source='CNRM-CM6-1'
experiment='ssp126' 
member='r1i1p1f2'
table='Amon'
variable='pr'  
grid='gr'
version='v20190219'
period='201501-210012'
# The directory follows the DRS tree; the file name repeats most of the same attributes.
dataDir=f'{cmip6Dir}/{activity}/{institute}/{source}/{experiment}/{member}/{table}/{variable}/{grid}/{version}'
fileName=f'{variable}_{table}_{source}_{experiment}_{member}_{grid}_{period}.nc'
ds=xr.open_dataset(f'{dataDir}/{fileName}')
pr=ds.pr
pr
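
The two cells above differ only in the variable name, so the path construction could be wrapped in a small function. The sketch below assumes that both variables share the same version and period, as in the cells above; `cmip6_file` is a hypothetical helper and its defaults are simply the values used in this notebook.

```python
def cmip6_file(variable, cmip6Dir="/g/data1b/oi10/replicas/CMIP6",
               activity="ScenarioMIP", institute="CNRM-CERFACS",
               source="CNRM-CM6-1", experiment="ssp126", member="r1i1p1f2",
               table="Amon", grid="gr", version="v20190219",
               period="201501-210012"):
    """Build the full path of one CMIP6 file from its attributes."""
    # Directory levels follow the DRS tree described earlier.
    data_dir = "/".join([cmip6Dir, activity, institute, source, experiment,
                         member, table, variable, grid, version])
    # The file name joins most of the same attributes with underscores.
    file_name = "_".join([variable, table, source, experiment,
                          member, grid, period]) + ".nc"
    return f"{data_dir}/{file_name}"


paths = {name: cmip6_file(name) for name in ("tas", "pr")}
for name, path in paths.items():
    print(name, path)
```

In the notebook, each path could then be passed to `xr.open_dataset`. If a variable was published with a different version or period, those arguments would need to be set from the Clef search results.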