# ARCDAP-3 ESMValTool Worksheet 

---
## Module 1+ 

---

In [None]:
# created by Gerald on 20 Jan 2020

# copy & paste the ESM_Worksheets folder into the ~/ESMValtool/ directory and open it there.

Welcome to Module 1+ of the ESMValTool Worksheet designed for the hands-on sessions during the ARCDAP-3 workshop. Module 1+ will cover CMORizing an observation/reanalysis dataset into a CF/CMOR-compliant file that can be read by ESMValTool. Please only start this worksheet after you've completed Module 1. 

Enter your details in the cell below:

In [None]:
# Name: 
# Organisation: 
# GCMs used: 

In [None]:
#imports 

import numpy as np
import scipy as sp
import xarray as xr

from IPython.display import Image, display

import fnmatch # find match

import glob
from pprint import pprint
from pathlib import Path

## Task 1+: CMORizing a Raw Observation file  
---

**Data and Scripts needed:** <br>
Raw Observation output: <br>
- 1x HadISST (*HadISST_sst.nc*) <br>
- Variables: ts/SST <br>
- Frequency: Monthly mean <br>

Script: */home/arcdap/miniconda3/envs/esmvaltool/lib/python3.7/site-packages/esmvaltool/cmorizers/obs/cmorize_obs_hadisst.ncl*

---

ESMValTool accepts input data from various models as well as observations and reanalysis data, provided that they adhere to the Climate and Forecast (CF) / Climate Model Output Rewriter (CMOR) format. Observational and reanalysis products in the standard CF/CMOR format used in CMIP and required by the ESMValTool are available via the [obs4mips](https://esgf-node.llnl.gov/projects/obs4mips/) and [ana4mips](https://esgf.nccs.nasa.gov/projects/ana4mips/) projects, respectively. 

To process datasets that are not available in these archives, the user can first obtain them from the respective sources and then reformat them to the CF/CMOR standard using the cmorizers included in the ESMValTool. The cmorizers are dataset-specific scripts that can be run once to generate a local pool of observational datasets for use with the ESMValTool. For more info see [Acquiring Input Data](https://esmvaltool.readthedocs.io/en/latest/getting_started/inputdata.html).

The ERA-Interim files that you downloaded have already been CMORized, hence they were stored in the *~/Obs* folder in Module 1, Task 0. Raw Observations (non-CMORized) like the *HadISST_sst.nc* file you downloaded are typically placed in a different folder (e.g. *~/RawObs*) to avoid confusion. This task will go through the process of CMORizing a raw observation file into one that can be read by ESMValTool. 

### T1+.1. Open and inspect the HadISST CMORizing script 

As mentioned above, CMORizers are dataset-specific scripts which typically provide information on what raw observation data files to download and how to name them, etc. It is important to understand what each script needs from the user (you) as this may differ between different datasets. 

In [None]:
!geany /home/arcdap/miniconda3/envs/esmvaltool/lib/python3.7/site-packages/esmvaltool/cmorizers/obs/cmorize_obs_hadisst.ncl

1. You should see the ```Downloading and processing instructions``` on lines 14-15. For this exercise you have already been provided with the unzipped .nc file, which you stored in the *~/RawObs/Tier2/HadISST* directory in Module 1. The file does not need to be renamed.

2. Scroll down and you'll see options to declare the period of data and the variables you wish to CMORize. On line 42, ```standard name``` refers to the variable names in the CF/CMOR convention used by CMIP models, while ```Name in raw data``` is simply what each variable is called in the raw dataset itself. 
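
Before editing the script, it can help to confirm that the raw file sits where the worksheet expects it. The following is a minimal sketch, assuming the *~/RawObs/Tier2/HadISST* layout from Module 1; `missing_raw_files` is a hypothetical helper written for this worksheet, not part of ESMValTool.

```python
from pathlib import Path


def missing_raw_files(raw_dir, expected):
    """Return the expected file names that are not present in raw_dir."""
    raw_dir = Path(raw_dir)
    return [name for name in expected if not (raw_dir / name).is_file()]


# Directory and file name as used in this worksheet
raw_dir = Path.home() / "RawObs" / "Tier2" / "HadISST"
missing = missing_raw_files(raw_dir, ["HadISST_sst.nc"])
print("Missing raw files:", missing or "none")
```

An empty result means the CMORizer should be able to find the sea surface temperature file.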

### T1+.2. Edit the HadISST CMORizer script

1. For this particular script, lines 43, 46, 49 and 52 are set by default so that the script searches for HadISST data files for both the ```sst``` and ```ice``` variables and CMORizes them. Since we do not have the *HadISST_ice.nc* dataset for this exercise, we have to edit the code so that it only searches for the *HadISST_sst.nc* file. To do so, remove ```"sic"``` from ```VAR``` in line 43, and likewise remove its corresponding arguments in lines 46, 49 and 52. Save the script when done.

*Alternatively you can comment out these lines in NCL (NCAR Command Language) using a semicolon ```;```.*

<img src='Images/HadISST_cmor.jpg'>
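
The entries on all four lines have to be removed together because the settings appear to work as parallel lists, where the same position in each list describes the same variable. The sketch below mimics this in Python. The values and the `MIP` name are illustrative assumptions, not copied from the script, so check them against your own copy; `drop_variable` is a hypothetical helper.

```python
# Illustrative values only; the real arrays live in cmorize_obs_hadisst.ncl
VAR = ["ts", "sic"]      # standard (CMOR) names
NAME = ["sst", "ice"]    # names in the raw data
MIP = ["Amon", "OImon"]  # assumed third parallel setting


def drop_variable(short_name, var, *parallel):
    """Remove short_name from var and the same position from each parallel list."""
    idx = var.index(short_name)
    return [[x for i, x in enumerate(lst) if i != idx] for lst in (var, *parallel)]


VAR, NAME, MIP = drop_variable("sic", VAR, NAME, MIP)
print(VAR, NAME, MIP)
```

If only one of the lists were shortened, the remaining positions would no longer line up, which is why the edit has to be made on every line.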

### T1+.3. CMORize the HadISST data

1. Run the CMORizer script for HadISST in the cell below: 

In [None]:
!  cmorize_obs -c config-user-example.yml -o HadISST

*Note how you only have to specify the name of the raw observation/reanalysis dataset in the second (-o) argument for the cmorize_obs programme.*

2. Navigate again to the directory containing the CMORizer output.

In [None]:
# Insert output directory
! ls 

3. The output for your CMORizer script will be stored in a sub-directory structured as *cmorize_obs _ [YYYYMMDD] _ [HHMMSS]*. Use the code below to inspect the output .nc files that have been CMORized. 

In [None]:
home = str(Path.home())  # Your home directory

# Insert the sub-folder containing the CMORizer output into the ''
# e.g. dirname1 = 'cmorize_obs_20200117_070100'
dirname1 = ''
# Full path of the folder containing the CMORized HadISST files
dirname = home + '/ESMValTool/esmvaltool_output/' + dirname1 + '/Tier2/HadISST/'

# List of CMORized .nc files in the dirname directory
listing = glob.glob(dirname + '*.nc')

pprint(listing)
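
Because the sub-directory names end in a date and time stamp, sorting them alphabetically also sorts them by time. The sketch below uses this to suggest the most recent run instead of typing the name by hand. It assumes the output root is *~/ESMValTool/esmvaltool_output*, as in the cell above; `latest_run_dir` is a hypothetical helper.

```python
from pathlib import Path


def latest_run_dir(output_root, prefix):
    """Return the newest '<prefix>_YYYYMMDD_HHMMSS' directory name, or None."""
    root = Path(output_root)
    if not root.is_dir():
        return None
    # Zero-padded timestamps sort chronologically when sorted as text
    runs = sorted(p.name for p in root.glob(prefix + "_*") if p.is_dir())
    return runs[-1] if runs else None


output_root = Path.home() / "ESMValTool" / "esmvaltool_output"
suggested = latest_run_dir(output_root, "cmorize_obs")
print("Most recent CMORizer run:", suggested)
```

The same helper should work for recipe runs by passing a different prefix, for example `"recipe_python"`.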

You'll see that the output files follow the naming convention ```OBS_[dataset]_[type]_[version]_[mip]_[short_name]_YYYYMM_YYYYMM.nc```, where type may be sat (satellite data), reanaly (reanalysis data), ground (ground observations), clim (derived climatologies), campaign (aircraft campaign). The file names may already be familiar from the CMORized ERA-Interim files that you downloaded (which have the ```OBS6``` tag instead). 
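
To make the naming convention concrete, here is a small sketch that splits a file name into its fields. It assumes that no field contains an underscore and accepts either `_` or `-` between the two dates. The example file name is made up for illustration, and `parse_obs_filename` is a hypothetical helper, not an ESMValTool function.

```python
import re

# One named group per field of the naming convention
OBS_NAME = re.compile(
    r"^(?P<project>[^_]+)_(?P<dataset>[^_]+)_(?P<type>[^_]+)_(?P<version>[^_]+)"
    r"_(?P<mip>[^_]+)_(?P<short_name>[^_]+)_(?P<start>\d{6})[-_](?P<end>\d{6})\.nc$"
)


def parse_obs_filename(filename):
    """Return the fields of a CMORized file name as a dict of strings."""
    match = OBS_NAME.match(filename)
    if match is None:
        raise ValueError(f"{filename!r} does not follow the naming convention")
    return match.groupdict()


example = "OBS_HadISST_reanaly_1_Amon_ts_197901_201412.nc"  # made-up name
fields = parse_obs_filename(example)
pairs = [f"{key}={value}" for key, value in fields.items()]
print(", ".join(pairs))
```

Running this on the names listed by the previous cell is a quick way to see which dataset, MIP table and variable each file holds.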

### T1+.4. Inspect the CMORized HadISST.nc files

1. Use the xarray library to open and inspect the 'ts' dataset. 

In [None]:
# Insert in the quotations the file path of the CMORized .nc file 
xr.open_dataset("")

2. Compare this against the raw dataset. Comment on some differences you observe between the raw and CMORized data files. 

In [None]:
# Insert in the quotations the file path of the raw .nc file 
xr.open_dataset("")
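
If the two printed summaries are hard to compare by eye, a helper like the one below can list the names that appear in only one of the files. This is a sketch under assumptions: `compare_structure` is a hypothetical helper that only relies on the `data_vars`, `dims` and `attrs` attributes of an xarray dataset, and the stand-in objects contain invented names so that the example runs without the data files. They are not the actual contents of the HadISST files.

```python
from pprint import pprint
from types import SimpleNamespace


def compare_structure(raw, cmor):
    """Report names present in only one of two dataset-like objects."""
    report = {}
    for key in ("data_vars", "dims", "attrs"):
        a, b = set(getattr(raw, key)), set(getattr(cmor, key))
        report[key] = {"raw_only": sorted(a - b), "cmor_only": sorted(b - a)}
    return report


# Invented stand-ins; with the real files, pass the results of
# xr.open_dataset(<raw path>) and xr.open_dataset(<CMORized path>) instead.
raw = SimpleNamespace(data_vars={"sst": None},
                      dims={"time": 2, "latitude": 3},
                      attrs={"Title": ""})
cmor = SimpleNamespace(data_vars={"ts": None},
                       dims={"time": 2, "lat": 3},
                       attrs={"title": ""})

pprint(compare_structure(raw, cmor))
```

The report only covers names; differences in units, coordinate values or time ranges still need to be read from the dataset summaries above.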

**Comments:**

### T1+.5. Plot the global mean SST from 1979 to 2014 from the HadISST dataset

1. You'll use the same *recipe_python.yml* that you used in Module 1. However, the first thing you need to do is to move your CMORized HadISST .nc files to the *~/Obs* directory. Remember that this is where you set the path for OBS files in the *config-user-example.yml* file, and that your CMORized output is currently in the *esmvaltool_output* directory. 

In [None]:
# Use this command to move the 'HadISST' folder from the cmorizer 
# output directory into the ~/Obs/Tier2 directory 
# Insert your output directory in between esmvaltool_output/  and  /Tier2
! mv esmvaltool_output/  /Tier2/HadISST /home/arcdap/Obs/Tier2

2. Check that the files have been successfully moved and are in the right directory. 

In [None]:
! ls ~/Obs/Tier2/HadISST
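
The move and the check can also be done from Python, which avoids overwriting an existing folder by accident. This is a minimal sketch, assuming the *Tier2/HadISST* layout used in this worksheet; `move_cmorized` is a hypothetical helper, and the run directory in the commented usage line is the example name from earlier, to be replaced with your own.

```python
import shutil
from pathlib import Path


def move_cmorized(run_dir, obs_root, tier="Tier2", dataset="HadISST"):
    """Move <run_dir>/<tier>/<dataset> into <obs_root>/<tier>; return the .nc names."""
    src = Path(run_dir) / tier / dataset
    dst = Path(obs_root) / tier / dataset
    if not src.is_dir():
        raise FileNotFoundError(f"No CMORizer output at {src}")
    if dst.exists():
        raise FileExistsError(f"{dst} already exists; not overwriting")
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dst))
    return sorted(p.name for p in dst.glob("*.nc"))


# Example usage (fill in your own run directory before uncommenting):
# run = Path.home() / "ESMValTool" / "esmvaltool_output" / "cmorize_obs_20200117_070100"
# print(move_cmorized(run, Path.home() / "Obs"))
```

The returned list plays the same role as the `ls` check above.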

Now modify the *recipe_python.yml* file from Module 1 to plot the mean ts from 1979 to 2014. 

3. First, edit the ```datasets``` section to include only the HadISST data. Observation/reanalysis datasets are specified slightly differently from models here because of the different "keys" used in the naming convention. For the HadISST data, the entry under ```datasets``` would be: 


```- {dataset: HadISST,  project: OBS, mip: Amon, type: reanaly,  version: 1,  start_year: 1979,  end_year: 2014,  tier: 2}```
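
The keys in this entry mirror the fields of the CMORized file names, which is how the recipe entry and the files in *~/Obs/Tier2/HadISST* relate to each other. The sketch below illustrates that relationship only; it is not how ESMValTool itself locates files. `obs_file_pattern` is a hypothetical helper and the file names are made up.

```python
import fnmatch

# The recipe entry from above, written as a Python dict
entry = {"dataset": "HadISST", "project": "OBS", "mip": "Amon", "type": "reanaly",
         "version": 1, "start_year": 1979, "end_year": 2014, "tier": 2}


def obs_file_pattern(entry, short_name):
    """Build a wildcard pattern for file names matching a dataset entry."""
    template = "{project}_{dataset}_{type}_{version}_{mip}_{short_name}_*.nc"
    return template.format(short_name=short_name, **entry)


pattern = obs_file_pattern(entry, "ts")
names = ["OBS_HadISST_reanaly_1_Amon_ts_197901_201412.nc",    # made-up names
         "OBS_HadISST_reanaly_1_OImon_sic_197901_201412.nc"]
matches = [n for n in names if fnmatch.fnmatchcase(n, pattern)]
print(pattern)
print(matches)
```

If no file matches the pattern for your own entry, a mismatch in `type`, `version` or `mip` is a likely place to look. The `tier` key corresponds to the *Tier2* folder rather than to a part of the file name.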

4. Next, modify the variables. Replace the lines 

```
ta:
    preprocessor: preprocessor1
pr:
```

with simply

``` ts:```

5. Run the recipe in the cell below! 

In [None]:
# Type in the esmvaltool recipe to run after the config-user-example
# file.  
! esmvaltool -c config-user-example.yml 

6. Once successful, navigate again to the relevant output directory and plot the mean sea surface temperature (ts) from 1979 to 2014 for the HadISST dataset. 

In [None]:
! ls esmvaltool_output/ 

In [None]:
# Insert the sub-folder containing the recipe output into the ''
# e.g. dirname2='recipe_python_20200117_070100'
dirname2 = ''
# Full path of the folder containing the plots produced by the recipe 
dirname3 = home + '/ESMValTool/esmvaltool_output/' + dirname2 + '/plots/diagnostic1/script1/'

# List of .png image files in the dirname3 directory
listing2 = glob.glob(dirname3 + '*.png')

pprint(listing2)

In [None]:
# Display the plot .png file 
display(Image(listing2[0]))

---

In [None]:
# end of file, Gerald, last edited 21/1/2020.