# Metadata-aware data analysis with cf-python and cf-plot

----

## Context and learning objectives

### What are the NCAS CF Data Tools and why do they all have 'cf' in the name?

The _NCAS CF Data Tools_ are a suite of complementary Python libraries which are designed to facilitate working with data for research in the earth sciences and aligned domains. The two that are of most relevance to the average user, and those wanting to process, analyse and visualise atmospheric data, are *cf-python* (https://ncas-cms.github.io/cf-python/) and *cf-plot* (https://ncas-cms.github.io/cf-plot/build/). We will be focusing on use of cf-python and cf-plot today.

The 'cf' in the names of the NCAS CF Data Tools refers to the _CF Conventions_, a metadata standard. The tools are built around this standard through their use of the CF Data Model, which, alongside performance, is considered a 'unique selling point' of the tools.


### What are the CF Conventions?

The _CF Conventions_, usually referred to in this way but also known by their full name, the **C**limate and **F**orecast (CF) metadata conventions, are a metadata standard that is becoming the de facto convention for describing geoscientific data, so that sharing and intercomparison are simpler. See https://cfconventions.org/ for more information.


### What are we going to learn in this session?

Our **learning aim** is to be able to use the NCAS CF Data Tools Python libraries, namely cf-python and cf-plot, to process, analyse and visualise netCDF and PP datasets. Along the way, we aim to appreciate the context and 'unique selling point' of the libraries: they are built on a Data Model for CF and so use the CF Conventions, a metadata standard for earth science data, to make it simpler to do what you want to do with the datasets.

We have **three distinct objectives**, matching the sections in this notebook. By the end of this lesson you should be familiar with, and have practised, using cf-python and cf-plot to do the following.

1. **From netCDF to field constructs and back**: read in netCDF files, create a new field construct by modification of data and metadata and then write out the new field to a new netCDF file.
2. **Basic data analysis, with plotting of results**: plot the data before and after applying statistical collapses.
3. **Regridding domains, with plotting of results**: plot the data before and after regridding across spherical and Cartesian coordinate systems.

<div class="alert alert-block alert-info">
<i>Note:</i> much of what you can do with cf-python you can do with the xarray library. Use whichever approach, the cf-python/cf-plot way or the xarray way, works best for you! However, we want to emphasise that the NCAS CF Data Tools are built around the CF Conventions whereas xarray is not, so cf-python and cf-plot offer better metadata awareness than xarray, which could be a core advantage of our approach for users in or from geoscience. (If you have suggestions for how we can improve cf-python and/or cf-plot for you or your work, please let us know through the Issue Trackers linked at the end of this Notebook.)
</div>

## 0. Setting up

**In this section we set up this Notebook, import the libraries and check the data we will work with, ready to use the libraries within this notebook.**

Run some setup for nicer output in this Jupyter Notebook (not required in interactive Python or a script):

In [2]:
%matplotlib inline

import warnings
warnings.filterwarnings("ignore")

Import cf-python and cf-plot:

In [3]:
import cfplot as cfp
import cf

Inspect the versions of cf-python and cf-plot and the version of the CF Conventions those are matched to:

<div class="alert alert-block alert-info">
<i>Note:</i> you can work with data compliant with any other version of the CF Conventions, or with little or no compliance. The CF Conventions version reported is the latest version whose features these versions of the tools understand.
</div>
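
The cells that report these versions are not shown above. As a minimal sketch, assuming both libraries were installed under their PyPI distribution names `cf-python` and `cf-plot`, the installed versions can be looked up with the standard library. If cf-python is importable, `cf.CF()` then reports the matched version of the CF Conventions. The helper name `installed_version` is our own, for illustration only.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Assumption: the libraries were installed under their PyPI distribution names
versions = {name: installed_version(name) for name in ("cf-python", "cf-plot")}
for name, version in versions.items():
    print(f"{name}: {version or 'not installed'}")

# Ask cf-python which CF Conventions version it is matched to, if available
try:
    import cf
except ImportError:
    cf_conventions = None
else:
    cf_conventions = cf.CF()
print("CF Conventions version:", cf_conventions)
```

In the notebook itself, where `cf` and `cfp` are already imported, inspecting `cf.__version__` and `cfp.__version__` should give the same information more directly.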

Finally, see what datasets we have to explore:

<div class="alert alert-block alert-info">
<i>Note:</i> in a Jupyter Notebook, '!' precedes a shell command, so this is a terminal command and not Python.
</div>

In [6]:
!ls ../../ncas_data

160by320griddata.nc			   precip_2010.nc
aaaaoa.pmh8dec.pp			   precip_DJF_means.nc
alpine_precip_DJF_means.nc		   qbo.nc
data1.nc				   regions.nc
data1-updated.nc			   rgp.nc
data2.nc				   sea_currents_backup.nc
data3.nc				   sea_currents.nc
data5.nc				   ta.nc
ggas2014121200_00-18.nc			   tripolar.nc
IPSL-CM5A-LR_r1i1p1_tas_n96_rcp45_mnth.nc  two_fields.nc
land.nc					   ua.nc
model_precip_DJF_means_low_res.nc	   u_n216.nc
model_precip_DJF_means.nc		   u_n96.nc
n2o_emissions.nc			   vaAMIPlcd_DJF.nc
POLCOMS_WAM_ZUV_01_16012006.nc		   va.nc
precip_1D_monthly.nc			   wapAMIPlcd_DJF.nc
precip_1D_yearly.nc


----

## 1. From netCDF to field constructs and back

### Read in netCDF files, create a new field construct by modification of data and metadata and then write out the new field to a new netCDF file

In [None]:
# Read a data file
field_list = cf.read('../../ncas_data/ua.nc')

In [None]:
field = field_list[0]

In [None]:
field

In [None]:
print(field)  # more detail

In [None]:
field.dump()   # maximal (metadata) detail!

In [None]:
squared_field = field * field

In [None]:
print(field.data)
print(squared_field.data)

In [None]:
print(field.units)
print(squared_field.units)

In [None]:
print(field.standard_name)
print(squared_field.standard_name)  # this will fail! (explanation to follow!)

In [None]:
squared_field.standard_name = 'square_of_eastward_wind'

In [None]:
print(field.standard_name)
print(squared_field.standard_name)  # this now does not fail, as we have re-assigned a standard name
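
The first attempt to print the squared field's standard name fails because, as we understand the cf-python documentation on field arithmetic, the `standard_name` and `long_name` properties are removed from a result whose units differ from those of both operands, since the old name would no longer describe the data. The sketch below is a toy model of that idea in plain Python, not cf-python code: `ToyField` and its naive unit handling are hypothetical, and it stores `None` where cf-python would instead raise an error on access.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyField:
    """Hypothetical stand-in for a field: values plus two metadata items."""

    values: List[float]
    units: str
    standard_name: Optional[str] = None

    def __mul__(self, other):
        # Multiply element-wise and combine the units naively as a string
        values = [a * b for a, b in zip(self.values, other.values)]
        units = f"({self.units}).({other.units})"
        # The units have changed, so the old name would be misleading: drop it
        return ToyField(values, units, standard_name=None)


wind = ToyField([2.0, -3.0], "m s-1", "eastward_wind")
squared = wind * wind
print(squared.units, squared.standard_name)

# Re-assign a name that describes the new quantity, as in the cells above
squared.standard_name = "square_of_eastward_wind"
print(squared.standard_name)
```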

We can write out field constructs into netCDF files in any combination we wish. Let's write the squared field to a netCDF file:

In [None]:
cf.write(squared_field, 'squared_e_wind.nc')

In [None]:
# Note that in IPython ! precedes a shell command
!ls

In [None]:
! ncdump -h squared_e_wind.nc

*Here we have read in a field construct from netCDF, created a new field based on the other field's data and metadata, modified the metadata of the new field, and then written it out to a netCDF file.*

----

## 2. Basic data analysis, with plotting of results

### Plot the data before and after applying statistical collapses

In [None]:
a = cf.read('../../ncas_data/qbo.nc')[0]

In [None]:
print(a)

In [None]:
b = a.collapse('maximum', axes='T')  # temporal maximum

In [None]:
print(b)

In [None]:
b_sub = b.subspace(X=30)
print(b_sub)

In [None]:
cfp.con(b_sub)

In [None]:
cfp.con(b.subspace(X=0))

In [None]:
c = a.collapse('mean', axes='X')  # zonal mean, i.e. mean over the X (longitude) axis

In [None]:
print(c)

In [None]:
c_sub = c.subspace(T=cf.dt('1979-01-16 09:00:00'))

cfp.con(c_sub)

*That was a demo of some very basic statistical collapsing and sub-spacing.*
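
To make the idea of a collapse concrete, here is a minimal sketch in plain Python with hypothetical toy numbers. It reduces a small time-by-longitude table along one axis at a time, keeping the collapsed axis at size one, which mirrors what the printed summaries of the collapsed fields show for the T and X axes. It is an illustration of the concept only, not of how cf-python implements `collapse`.

```python
from statistics import mean

# Hypothetical toy data: rows are time steps (T), columns are longitudes (X)
data = [
    [1.0, 4.0, 2.0],
    [3.0, 0.0, 5.0],
]

# Collapse over T: one maximum per longitude, leaving a size-1 time axis
max_over_t = [[max(column) for column in zip(*data)]]

# Collapse over X: one mean per time step, leaving a size-1 longitude axis
mean_over_x = [[mean(row)] for row in data]

print("shape after T collapse:", (len(max_over_t), len(max_over_t[0])))
print("shape after X collapse:", (len(mean_over_x), len(mean_over_x[0])))
print(max_over_t)
print(mean_over_x)
```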

----

## 3. Regridding domains, with plotting of results

### Plot the data before and after regridding across spherical and Cartesian coordinate systems

#### a) Regridding across spherical coordinate systems: patch recovery and conservative methods as examples

Read in two fields, ``f`` and ``g``, where ``f`` is gridded at about twice the resolution of ``g``:

In [None]:
# Read in a precipitation field and inspect it
f = cf.read('../../ncas_data/precip_2010.nc')[0]
print(f)

In [None]:
# Read in another, lower-resolution, precipitation field and inspect it
g = cf.read('../../ncas_data/model_precip_DJF_means_low_res.nc')[0]
print(g)

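Regrid the first field to the grid of the second using the `regrids` method of cf-python, which regrids between spherical coordinate systems. We try two methods, patch recovery and conservative, and check whether they give equal results.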

In [None]:
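# Regrid with two different methods so that we can compare them
h_1 = f.regrids(g, method='patch')
h_2 = f.regrids(g, method='conservative')
# Check whether the two methods give equal fields
h_1.equals(h_2)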

Now let's inspect what we have by plotting the field before and after regridding (strictly, these are two separate fields, since the original is kept):

In [None]:
# Take some subspaces first:
f_sub = f[0]
h_1_sub = h_1[0]
h_2_sub = h_2[0]


# Customising the plots to look nicer
cfp.mapset()
#cfp.mapset(proj='robin')
cfp.cscale('rh_19lev')

cfp.gopen(rows=1, columns=2)
cfp.gpos(1)
cfp.con(f_sub, blockfill=True, lines=False, colorbar_orientation='vertical',
        title='Precipitation field before regridding')
cfp.gpos(2)
cfp.con(h_1_sub, blockfill=True, lines=False, colorbar_orientation='vertical',
        title='...and after regridding with patch recovery')
cfp.gclose()

print("Comparing results from different regridding methods:")
cfp.con(h_2_sub - h_1_sub, lines=False)

As we expect, the regridded field resembles the original in its nature, but is at lower resolution due to its new grid.
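
One reason the two methods give different results is that conservative regridding preserves the integral of the field over the domain, which matters for quantities such as precipitation, whereas patch recovery is a higher-order interpolation that makes no such guarantee. The following is a minimal 1D sketch of first-order conservative remapping in plain Python. The function names and toy numbers are illustrative assumptions, and this is not how cf-python implements `regrids`, which works on the sphere and weights by cell area rather than cell width.

```python
def conservative_regrid_1d(src_edges, src_values, dst_edges):
    """First-order conservative remap of cell means between 1D grids.

    Assumes every destination cell is fully covered by source cells.
    """
    dst_values = []
    for lo, hi in zip(dst_edges[:-1], dst_edges[1:]):
        total = 0.0
        for s_lo, s_hi, value in zip(src_edges[:-1], src_edges[1:], src_values):
            # Length of the overlap between this source and destination cell
            overlap = max(0.0, min(hi, s_hi) - max(lo, s_lo))
            total += value * overlap
        # Destination cell mean is the overlap-weighted source mean
        dst_values.append(total / (hi - lo))
    return dst_values


def integral(edges, values):
    """Integrate cell means over the grid: sum of value times cell width."""
    return sum(v * (hi - lo) for v, lo, hi in zip(values, edges[:-1], edges[1:]))


# Hypothetical source grid at twice the resolution of the destination grid
src_edges = [0.0, 1.0, 2.0, 3.0, 4.0]
src_values = [1.0, 3.0, 2.0, 6.0]
dst_edges = [0.0, 2.0, 4.0]

dst_values = conservative_regrid_1d(src_edges, src_values, dst_edges)
print(dst_values)
print(integral(src_edges, src_values), integral(dst_edges, dst_values))
```

The two integrals printed at the end agree, which is the defining property of a conservative method.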

#### b) Regridding across Cartesian coordinate systems: time series as an example

The term 'regridding' brings to mind a multi-dimensional grid e.g. over the earth's surface, but a 'grid' is really just a set of points in a multi-dimensional space. In 1D, this is just a series of data points.

Cartesian regridding can be used for 1 to 3 dimensions, so we can use it to "regrid" such a series. Let's use a time series as an example.

Again, start by reading in some (different) precipitation fields, in this case ``i`` and ``j`` which form a pair of time series with different domains/grids i.e. numbers of time data points:

In [None]:
# Read in a precipitation field and inspect it
i = cf.read('../../ncas_data/precip_1D_yearly.nc')[0]
print(i)

In [None]:
# Read in a second precipitation time series, with monthly points, and inspect it
j = cf.read('../../ncas_data/precip_1D_monthly.nc')[0]
print(j)

Regrid linearly along the time axis 'T' and summarise the resulting field. This time, because we are working with Cartesian coordinates, we need to use the `regridc` method on the field acting as the source domain.

For diversity, we use a different regridding method. Let's use linear interpolation, by setting `method='linear'`:

In [None]:
k = i.regridc(j, axes='T', method='linear')
print(k)

Plot the time series before and after regridding:

In [None]:
cfp.gopen(rows=1, columns=2)
cfp.gpos(1)
cfp.lineplot(i, marker='o', color='red',
             title='Original time series... before regridding')
cfp.gpos(2)
cfp.lineplot(k, marker='o', color='blue', title='... and after regridding')
cfp.gclose()

In this case, we've seen that regridding can apply not just to multi-dimensional coordinates but to *data series* (which are *1D "grids"*).

As you can see, again the nature of the regridding output is preserved, but the granularity has changed, in this case becoming higher.
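
To show what linear regridding of a 1D series amounts to, here is a minimal sketch in plain Python that interpolates a yearly series onto monthly points. The function name, the toy values and the choice to raise an error outside the source range are all our own assumptions for illustration; this is not the implementation behind `regridc`.

```python
def interpolate_linear(src_x, src_y, dst_x):
    """Linearly interpolate (src_x, src_y) onto dst_x; src_x must be increasing."""
    result = []
    for x in dst_x:
        if not src_x[0] <= x <= src_x[-1]:
            raise ValueError(f"{x} lies outside the source coordinate range")
        # Find the source interval [src_x[k], src_x[k + 1]] containing x
        k = max(i for i in range(len(src_x) - 1) if src_x[i] <= x)
        fraction = (x - src_x[k]) / (src_x[k + 1] - src_x[k])
        result.append(src_y[k] + fraction * (src_y[k + 1] - src_y[k]))
    return result


# Hypothetical yearly series: time in years, one value per year
years = [0.0, 1.0, 2.0]
yearly = [10.0, 20.0, 14.0]

# Target grid: monthly points spanning the same two years
months = [m / 12 for m in range(25)]
monthly = interpolate_linear(years, yearly, months)

print(len(yearly), "points before,", len(monthly), "points after")
```

The interpolated series passes through the original yearly values and fills in evenly spaced values between them, which is the higher granularity seen in the second plot.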

----

## Where to find more information and resources on the NCAS CF Data Tools

Here are some links relating to the NCAS CF Data Tools and this training. The **first two are the official documentation pages**, which we advise you to consult first if you want to know more:

* **The cf-python documentation lives at https://ncas-cms.github.io/cf-python/.**
* **The cf-plot documentation lives at https://ncas-cms.github.io/cf-plot/build/.**
* This training, with further material, is hosted online and there are instructions for setting up the environment so you can work through it in your own time: https://github.com/NCAS-CMS/cf-tools-training.
* The cf-python code lives on GitHub at https://github.com/NCAS-CMS/cf-python. There is an Issue Tracker to report queries or questions at https://github.com/NCAS-CMS/cf-python/issues.
* The cf-plot code lives on GitHub at https://github.com/NCAS-CMS/cf-plot. There is an Issue Tracker to report queries or questions at https://github.com/NCAS-CMS/cf-plot/issues.
* There is a technical presentation about the NCAS CF Data Tools available from https://hps.vi4io.org/_media/events/2020/summer-school-cfnetcdf.pdf.
* The website of the CF Conventions can be found at https://cfconventions.org/.
* The landing page for training on the CF Conventions is found within the website above, at https://cfconventions.org/Training/.

If you have any queries after this course, please either use the Issue Trackers linked above or email Sadie at: sadie.bartholomew@ncas.ac.uk.

----