# P3. Reducing datasets by subspacing and collapsing

## Practical Notebook 3 (of 6) for *Intro to the NCAS CF Data Tools, cf-python and cf-plot*

**In this section we show how cf-python can tame multi-dimensional data by reducing its dimensions, either by selecting a subset of points along the axes or by collapsing axes with a statistic such as the mean or an extremum, to give a reduced form that can be analysed or plotted.**

***

<div class="alert alert-block alert-success">
<i>Practical instructions:</i> run the cell in this section to do the setup, which is not part of the practical proper.
</div>

## Setting up

**In this short prelude we set up this Notebook, import the libraries and check the data we will work with (exactly as in the first Notebook's setup, but in a single cell for quick execution).**

In [1]:
# Set up for inline plots - only needed inside a Notebook environment - and to ignore some repeating warnings
%matplotlib inline

import warnings
warnings.filterwarnings("ignore")

# Import the two CF Data Tools libraries and inspect the versions
import cfplot as cfp
import cf
print("--- Version report: ---")
print("cf-python version is:", cf.__version__)
print("cf-plot version is:", cfp.__version__)
print("CF Conventions version is:", cf.CF())

# See what datasets we have to explore within the data directory we use throughout this course
print("--- Datasets available from the path '../data': ---")
# Note that in a Jupyter Notebook, '!' precedes a shell command - so this is a command, not Python
!ls ../data

--- Version report: ---
cf-python version is: 3.18.1
cf-plot version is: 3.4.0
CF Conventions version is: 1.12
--- Datasets available from the path '../data': ---
160by320griddata.nc			   precip_2010.nc
aaaaoa.pmh8dec.pp			   precip_DJF_means.nc
alpine_precip_DJF_means.nc		   qbo.nc
data1.nc				   regions.nc
data1-updated.nc			   rgp.nc
data2.nc				   sea_currents_backup.nc
data3.nc				   sea_currents.nc
data5.nc				   ta.nc
ggas2014121200_00-18.nc			   tripolar.nc
IPSL-CM5A-LR_r1i1p1_tas_n96_rcp45_mnth.nc  two_fields.nc
land.nc					   ua.nc
model_precip_DJF_means_low_res.nc	   u_n216.nc
model_precip_DJF_means.nc		   u_n96.nc
n2o_emissions.nc			   vaAMIPlcd_DJF.nc
POLCOMS_WAM_ZUV_01_16012006.nc		   va.nc
precip_1D_monthly.nc			   wapAMIPlcd_DJF.nc
precip_1D_yearly.nc


***

<div class="alert alert-block alert-success">
<i>Practical instructions:</i> now we can start the practical. We will follow the same sectioning as in the teaching notebook, so please consult the notes in the matching section there for guidance. You can also consult the cf-python and cf-plot documentation linked above.
</div>

## 3. Reducing datasets by subspacing and collapsing

### a) Subspacing using metadata conditions

**3.a.1)** Read in the file `ggas2014121200_00-18.nc`, which is under `../data`, and save the corresponding field list to a variable called `fieldlist`. Inspect it with a medium level of detail.

**3.a.2)** Extract the field representing the `cloud_area_fraction` to a variable which we will call `cloud_field`. Inspect that with a medium level of detail too.

**3.a.3)** Save the data of the `cloud_field` to a new variable we will call `cloud_field_data`. `print` its `shape` attribute to confirm the shape of the data, and compare this to the inspection report from the previous cell to see the same information represented in different ways.

**3.a.4)** Make a subspace of the `cloud_field` from the cells above that selects the *first* time point. Note: date-time subspaces need an extra step, because a bare date-time string would otherwise be ambiguous. Wrap the quoted string in a call to `cf.dt()` to tell cf-python that you are providing a valid date-time, e.g. `field1.subspace(time=cf.dt("2020-01-01 12:15:00"))`.

Assign the field resulting from the subspace operation to a variable `cloud_field_subspace_1` and inspect it with medium detail.

*Extra task, for those who have studied section 4 before doing this practical: make a contour plot of this subspace of the field to see what it looks like.*

**3.a.5)** Make a subspace of the `cloud_field` from the cells above that selects the *last* point on the latitude axis.

Assign the field resulting from the subspace operation to a variable `cloud_field_subspace_2` and inspect it with medium detail.

*Extra task, for those who have studied section 4 before doing this practical: make a contour plot of this subspace of the field to see what it looks like.*

### b) Subspacing using indexing, including equivalency to the above

**3.b.1)** Take the cloud field from (3.a.2) which we have been subspacing in the previous cells and make a subspace which takes the first time point, leaving all other axes unchanged, but this time do it using indexing. Assign that to the variable name `cloud_field_collapse_1_by_index`.

Use the `equals` method of a field to check that the result is the same as that derived from the 'subspacing by metadata' approach in section (3.a.4).

**3.b.2)** Now make a subspace of the original `cloud_field` that selects the *last* point on the latitude axis, like before, leaving all other axes unchanged, but this time use subspacing by indexing. Assign that to the variable name `cloud_field_collapse_2_by_index`.

Use the `equals` method of a field to check that the result is the same as that derived from the 'subspacing by metadata' approach in section (3.a.5).

**3.b.3)** Using indexing, do both of the subspaces from the previous sub-questions in a single operation on the original cloud field. Assign that to the variable name `cloud_field_collapse_3_by_index`. Confirm that it gives the same result as performing the operation with indices in two steps, i.e. applying the indices from (3.b.2) to the output field of (3.b.1).

Extra: do the same operation using the 'subspace by metadata' approach, assigning the result to the variable name `cloud_field_collapse_3_by_subspace`, and use the `equals` method to show that the results are the same.

**3.b.4)** Do a single subspace on the original cloud field that takes the first 100 latitude and the first 200 longitude values. Use whichever method (subspacing by metadata, or indexing) you prefer, in order to do so. We won't use this again, so there is no need to assign it to a variable (but you may do so if you wish, in which case you'll need another line to inspect it).

### c) Statistical collapses

**3.c.1)** Take the original `cloud_field` from (3.a.2) and do a collapse over the time axis to reduce it down to the minimum value. Assign that to the variable name `cloud_field_collapse_1`.

**3.c.2)** Take the original `cloud_field` from (3.a.2) and do a collapse over the latitude axis to reduce it down to the mean value. Assign that to the variable name `cloud_field_collapse_2`.

**3.c.3)** Take the original `cloud_field` from (3.a.2) and do a collapse over the longitude axis to reduce it down to the maximum value. Assign that to the variable name `cloud_field_collapse_3`.

**3.c.4)** Finally, take the original `cloud_field` from (3.a.2) again and do a collapse over all horizontal space via the pair of horizontal spatial axes, latitude and longitude, to reduce them down to the standard deviation value. Assign that to the variable name `cloud_field_collapse_4`.

<div class="alert alert-block alert-success">
<i>Practical instructions:</i> this is the end of the section. Please check your work, review the material and then move on to Practical 4 (see the Notebook 'cf_data_tools_practical_04.ipynb').
</div>

***