In [None]:
%matplotlib inline

## Iris introduction course
# 3. Cube Control and Subsetting

**Learning outcome**: by the end of this section, you will be able to apply Iris functionality to take a useful subset of an Iris cube and to combine multiple Iris cubes into a new larger cube.

**Duration:** 1 hour

**Overview:**<br>
3.1 [Indexing and Slicing](#indexing)<br>
3.2 [Constraints and Extraction](#constrain_extract)<br>
3.3 [Merge](#merge)<br>
3.4 [Concatenate](#concatenate)<br>
3.5 [Summary of the Section](#summary)

## Setup

In [4]:
import iris
import numpy as np

In [5]:
print(iris.__version__)
print(np.__version__)

2.3.0dev0
1.15.4


----

## 3.1 Indexing and Slicing<a id='indexing'></a>

Cubes can be indexed in the same way as NumPy arrays:

In [6]:
fname = iris.sample_data_path('uk_hires.pp')
cube = iris.load_cube(fname, 'air_potential_temperature')
print(cube.summary(shorten=True))

air_potential_temperature / (K)     (time: 3; model_level_number: 7; grid_latitude: 204; grid_longitude: 187)


In [7]:
subcube = cube[..., ::2, 15:35, :10]
subcube.summary(shorten=True)

'air_potential_temperature / (K)     (time: 3; model_level_number: 4; grid_latitude: 20; grid_longitude: 10)'

Note: the result of indexing a cube is *always* a copy and never a *view* on the original data.

### Iteration<a id='iteration'></a>

We can loop through all desired subcubes in a larger cube using the cube methods ``slices`` and ``slices_over``.

In [8]:
fname = iris.sample_data_path('uk_hires.pp')
cube = iris.load_cube(fname,
                      iris.Constraint('air_potential_temperature',
                                      model_level_number=1))
print(cube.summary(True))

air_potential_temperature / (K)     (time: 3; grid_latitude: 204; grid_longitude: 187)


The **``slices``** method iterates over all the subcubes that span the dimensions described by the coordinates passed to it.

So in this example, each `grid_latitude` / `grid_longitude` slice of the cube is returned:

In [9]:
for subcube in cube.slices(['grid_latitude', 'grid_longitude']):
    print(subcube.summary(shorten=True))

air_potential_temperature / (K)     (grid_latitude: 204; grid_longitude: 187)
air_potential_temperature / (K)     (grid_latitude: 204; grid_longitude: 187)
air_potential_temperature / (K)     (grid_latitude: 204; grid_longitude: 187)


We can use **``slices_over``** to return one subcube for each coordinate value in a specified coordinate. This helps us when trying to retrieve all the slices along a given cube dimension.

For example, consider retrieving all the slices over the model level dimension (i.e. each model level in its own cube, with a scalar `model_level_number` coordinate). To achieve this using ``slices``, as in the example above, we would have to name all of the cube's dimensions _except_ the model level dimension.

Let's see how ``slices_over`` provides this functionality more directly:

In [10]:
fname = iris.sample_data_path('uk_hires.pp')
cube = iris.load_cube(fname, 'air_potential_temperature')
for subcube in cube.slices_over('model_level_number'):
    print(subcube.summary(shorten=True))

air_potential_temperature / (K)     (time: 3; grid_latitude: 204; grid_longitude: 187)
air_potential_temperature / (K)     (time: 3; grid_latitude: 204; grid_longitude: 187)
air_potential_temperature / (K)     (time: 3; grid_latitude: 204; grid_longitude: 187)
air_potential_temperature / (K)     (time: 3; grid_latitude: 204; grid_longitude: 187)
air_potential_temperature / (K)     (time: 3; grid_latitude: 204; grid_longitude: 187)
air_potential_temperature / (K)     (time: 3; grid_latitude: 204; grid_longitude: 187)
air_potential_temperature / (K)     (time: 3; grid_latitude: 204; grid_longitude: 187)


### Discussion: Indexing and slicing

* What are the similarities between indexing and slicing?
* What are the differences?
* Which cube slicing method would be easiest to use to return all subcubes along the realization dimension?
* Which cube slicing method would be easiest to use to return all horizontal 2D slices in a 4D cube?
* In what situations would indexing be the best way to subset a cube? What about slicing?

----

## 3.2 Constraints and Extraction<a id='constrain_extract'></a>

We've already seen the basic ``load`` function, but we can also control which cubes are actually loaded with *constraints*. The simplest constraint is just a string, which filters cubes based on their name:

In [13]:
fname = iris.sample_data_path('uk_hires.pp')
print(iris.load(fname, 'air_potential_temperature'))

0: air_potential_temperature / (K)     (time: 3; model_level_number: 7; grid_latitude: 204; grid_longitude: 187)


Iris's constraints mechanism provides a powerful way to filter a subset of data from a larger collection. We've already seen that constraints can be used at load time to return data of interest from a file, but we can also apply constraints to a single cube, or a list of cubes, using their respective ``extract`` methods:

In [14]:
cubes = iris.load(fname)
print(cubes.extract('air_potential_temperature'))

0: air_potential_temperature / (K)     (time: 3; model_level_number: 7; grid_latitude: 204; grid_longitude: 187)


The simplest constraint, namely a string that matches a cube's name, is conveniently converted into an actual ``iris.Constraint`` instance wherever needed. However, we could construct this constraint manually and compare with the previous result:

In [15]:
pot_temperature_constraint = iris.Constraint('air_potential_temperature')
print(cubes.extract(pot_temperature_constraint))

0: air_potential_temperature / (K)     (time: 3; model_level_number: 7; grid_latitude: 204; grid_longitude: 187)


The Constraint constructor also takes arbitrary keywords to constrain coordinate values. For example, to extract model level number 10 from the air potential temperature cube:

In [16]:
pot_temperature_constraint = iris.Constraint('air_potential_temperature',
                                             model_level_number=10)
print(cubes.extract(pot_temperature_constraint))

0: air_potential_temperature / (K)     (time: 3; grid_latitude: 204; grid_longitude: 187)


We can pass a list of possible values, and even combine two constraints with ``&``:

In [17]:
print(cubes.extract('air_potential_temperature' & 
                    iris.Constraint(model_level_number=[4, 10])))

0: air_potential_temperature / (K)     (time: 3; model_level_number: 2; grid_latitude: 204; grid_longitude: 187)


We can also pass an arbitrary function, which is called on each cell of a coordinate and should return True for the cells to keep. This is common for floating-point coordinates, where testing for exact equality is unreliable.

In [20]:
def less_than_10(cell):
    """Return True for values that are less than 10."""
    return cell < 10

print(cubes.extract(iris.Constraint('air_potential_temperature',
                                    model_level_number=less_than_10)))

0: air_potential_temperature / (K)     (time: 3; model_level_number: 3; grid_latitude: 204; grid_longitude: 187)


### Time Constraints<a id='time_constraints'></a>

It is common to want to build a constraint on time. This can be done by comparing cells whose points are datetimes.

There are a few different approaches to producing time constraints in Iris. Here we focus on just one.

This approach lets us access the individual components of each cell's datetime (such as its year, month or hour) and run comparisons on those:

In [23]:
time_constraint = iris.Constraint(time=lambda cell: cell.point.hour == 11)
print(cube.extract(time_constraint).summary(True))

air_potential_temperature / (K)     (model_level_number: 7; grid_latitude: 204; grid_longitude: 187)


### Exercise 2

Cell methods are a part of cube metadata that record statistical operations that have been applied to a cube. For example, "`mean: time (6hrs)`" tells us that the cube has had a time mean over a 6hr interval applied.

We can determine what, if any, cell methods a cube has with the attribute `cube.cell_methods`. The following function, then, tells us whether or not a cube has cell methods:

```python
def has_cell_methods(cube):
    return len(cube.cell_methods) > 0
```

1\. With the cubes loaded from ``[iris.sample_data_path('A1B_north_america.nc'), iris.sample_data_path('uk_hires.pp')]``, use the CubeList's **``extract``** method to keep only the cubes that have cell methods. (Hint: look at the ``iris.Constraint`` documentation for the **cube_func** keyword.) You should find that the 3 cubes are whittled down to just 1.

2\. Using the file found at ``iris.sample_data_path('A1B_north_america.nc')`` filter the cube, using constraints, such that only data between 1860 and 1980 remains (hint: This data has a 360-day calendar with yearly data from 1860 to 2100, so we will need to access the individual components of the cell point's datetime, to return a time dimension of length 120).

## 3.3 Merge<a id='merge'></a>

When Iris loads data it tries to reduce the number of cubes returned by collecting together multiple fields with
shared metadata into a single multidimensional cube. In Iris, this is known as merging.

In order to merge, cubes must be identical in everything except the value of a scalar coordinate. The values of that scalar coordinate then become a new data dimension.

The ``iris.load_raw`` function can be used as a diagnostic tool. It returns the individual "fields" that Iris finds in a given set of files, before any merging takes place:

In [None]:
fname = iris.sample_data_path('GloSea4', 'ensemble_008.pp')
raw_cubes = iris.load_raw(fname)

print(len(raw_cubes))

When we look in detail at these cubes, we find that they are identical in every coordinate except for the scalar forecast_period and time coordinates:

In [None]:
print(raw_cubes[0])
print('--' * 50)
print(raw_cubes[1])

Any CubeList can be merged with its ``merge`` method, and the CubeList returned by ``load_raw`` is no different.
The ``merge`` method *always* returns another CubeList, so below we unpack its single cube with ``merged_cube, = ...``:

In [None]:
merged_cube, = raw_cubes.merge()
print(merged_cube)

Looking in more detail, we can see that the time coordinate has become a new dimension, and that forecast_period has become an auxiliary coordinate spanning that same dimension:

In [None]:
print(merged_cube.coord('time'))
print(merged_cube.coord('forecast_period'))

### Identifying merge problems

Merge is deliberately strict about the uniformity of the cubes it is given, so that it does not make inappropriate assumptions about the incoming data.

For example, if we load the fields from two ensemble members from the GloSea4 model sample data, we see we have 12 fields before any merge takes place:

In [None]:
fname = iris.sample_data_path('GloSea4', 'ensemble_00[34].pp')
cubes = iris.load_raw(fname, 'surface_temperature')
print(len(cubes))

If we try to merge these 12 cubes, we get 2 cubes rather than one:

In [None]:
incomplete_cubes = cubes.merge(unique=False)
print(incomplete_cubes)

Looking at these two cubes in more detail, what is different between them? (Hint: one value changes, and another is missing entirely.)

In [None]:
print(incomplete_cubes[0])
print('--' * 50)
print(incomplete_cubes[1])

By adding the missing coordinate, we can trigger a merge of the 12 cubes into a single cube, as expected:

In [None]:
for cube in cubes:
    if not cube.coords('realization'):
        cube.add_aux_coord(iris.coords.DimCoord(np.int32(3),
                                                'realization'))

merged_cubes = cubes.merge()
print(merged_cubes)

Iris also helps to diagnose why a merge has failed. The ``merge_cube`` method of a CubeList expects the cubes in the list to merge into exactly one cube. If they do not, it raises an exception describing the problem. For instance:

```
   >>> cubes.merge_cube()
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     ...
   iris.exceptions.MergeError: failed to merge into a single cube.
     Coordinates in cube.aux_coords (scalar) differ: realization.
```

## 3.4 Concatenate<a id='concatenate'></a>

We have seen that merge combines a list of cubes with a common scalar coordinate to produce a single cube with a new dimension created from these scalar values.

But what happens if you try to combine cubes along a common dimension?

In [None]:
fname = iris.sample_data_path('A1B_north_america.nc')
cube = iris.load_cube(fname)

cube_1 = cube[:10]
cube_2 = cube[10:20]
cubes = iris.cube.CubeList([cube_1, cube_2])
print(cubes)

You might expect these cubes to merge; after all, they both came from the same original cube!

In [None]:
print(cubes.merge())

Merge cannot combine cubes along a dimension they already share (a non-scalar coordinate). Instead we must use concatenate, which joins ("concatenates") cubes along a common dimension coordinate to produce a single cube in which that dimension is extended:

In [None]:
print(cubes.concatenate())

As with merge, Iris helps to diagnose why a concatenation has failed. The ``concatenate_cube`` method of a CubeList expects the cubes in the list to concatenate into exactly one cube. If they do not, it raises a descriptive error. For instance:

```
    >>> cubes.concatenate_cube()
    Traceback (most recent call last):
      ...
    iris.exceptions.ConcatenateError: failed to concatenate into a single cube.
      Scalar coordinates differ: forecast_reference_time, height != forecast_reference_time
```

### Exercise 3: Solving merge problems

The following exercise is designed to give you experience of solving issues that prevent a merge from taking place.
The output from ``merge_cube`` is included to help you identify each problem. Once a fix has been found, ``raw_cubes.merge()`` should return a CubeList containing a single cube.

The first part has been completed for you below:

1\. Identify and resolve the issue preventing the merge of ``air_potential_temperature`` cubes from ``resources/merge_exercise.1.*.nc``.

    >>> raw_cubes = iris.load_raw('resources/merge_exercise.1.*.nc', 'air_potential_temperature')
    >>> raw_cubes.merge_cube()
    Traceback (most recent call last):
    ...
    iris.exceptions.MergeError: failed to merge into a single cube.
      cube.attributes keys differ: 'History'


In [None]:
raw_cubes = iris.load_raw('resources/merge_exercise.1.*.nc', 'air_potential_temperature')

# Print the attributes, clearly one is different.
for cube in raw_cubes:
    print(cube.attributes)

# Remove the history attribute from the first cube.
del raw_cubes[0].attributes['History']

# Check that this has meant that a merge now results in a single cube.
print('--' * 50)
print(raw_cubes.merge())

2\. Identify and resolve the issue preventing the merge of ``air_potential_temperature`` cubes from ``resources/merge_exercise.5.*.nc`` (hint: can these cubes be merged?).

    >>> raw_cubes = iris.load_raw('resources/merge_exercise.5.*.nc', 'air_potential_temperature')
    >>> raw_cubes.merge_cube()
    Traceback (most recent call last):
    ...
    iris.exceptions.MergeError: failed to merge into a single cube.
      Coordinates in cube.dim_coords differ: time.

## 3.5 Section Summary: Cube Control<a id='summary'></a>

In this section we learnt:
* cubes can be indexed like NumPy arrays to produce subcubes, and iterated over with the ``slices`` and ``slices_over`` methods
* constraints can be used to load only part of the data, or to extract part of cubes that have already been loaded
* constraints on the time coordinate can select data using the components of its dates and times
* merging combines cubes that differ only in a scalar coordinate into a larger cube with a new dimension
* concatenation joins cubes along an existing common dimension, extending that dimension
