In [None]:
%matplotlib inline

## Iris introduction course
# 5. Data Processing

**Learning Outcome**: by the end of this section, you will be able to apply arithmetic and statistical operations to cube data.

**Duration:** 1.5 hours

**Overview:**<br>
5.1 [Cube Arithmetic](#arithmetic)<br>
5.2 [Aggregation and Statistics](#agg_and_stats)<br>
5.3 [Exercise : statistics and visualisation](#ex_5)<br>
5.4 [Summary of the Section](#summary)

## Setup

In [None]:
import iris
import numpy as np

## 5.1 Cube Arithmetic<a id='arithmetic'></a>

Cubes support the basic mathematical operators, so you can add, subtract, multiply and divide cubes of compatible shape, and perform other mathematical operations on them:

In [None]:
a1b = iris.load_cube(iris.sample_data_path('A1B_north_america.nc'))
e1 = iris.load_cube(iris.sample_data_path('E1_north_america.nc'))

print(e1.summary(True))
print(a1b)

In [None]:
scenario_difference = a1b - e1
print(scenario_difference)

Notice that the resultant cube's name is now `unknown`, and that its `attributes` and `cell_methods` have disappeared; this is because these all differed between the two input cubes.

----

<div class="alert alert-block alert-warning">
    <b><font color='brown'>Exercise: </font></b>
    <p>Work out which aspects of the cube metadata have changed in the difference result, and print them out.</p>
    <p>What do you think is the purpose of these changes in the result ?</p>
    <p>What other metadata <i>is</i> preserved ?</p>
</div>

In [None]:
names, cubes = ('A1B', 'E1', 'difference'), (a1b, e1, scenario_difference)

print('Cube names:')
print('    A1b  = {}'.format(a1b.name()))
print('    E1   = {}'.format(e1.name()))
print('    diff = {}'.format(scenario_difference.name()))
print('result name was cleared')

print('')
print('Cube attribute names:')
print('    A1b  = {}'.format(a1b.attributes))
print('    E1   = {}'.format(e1.attributes))
print('    diff = {}'.format(scenario_difference.attributes))
print('result attributes were cleared')

print('')
print('Cube cell-methods:')
print('    A1b  = {}'.format(a1b.cell_methods))
print('    E1   = {}'.format(e1.cell_methods))
print('    diff = {}'.format(scenario_difference.cell_methods))
print('result cell methods were removed')

print('')
print('The name, cell-methods and attributes of the result cube are empty, '
      'because these generally will not be correct for the result cube. '
      'In this case, for example, the difference result is not an "air_temperature", '
      'nor is it a specific STASH code, nor is it a mean of 6-hourly values.')
print('')
print('The coordinates of the original data are preserved in the result cube.')

----

It is also possible to operate on cubes with numeric scalars, NumPy arrays and even cube coordinates.

<div class="alert alert-block alert-warning">
    <b><font color='brown'>Exercise: </font></b>
    <p>Can you multiply the 'e1' air temperature cube by its own latitude coordinate ?</p>
    <p>What are the units of the result ?</p>
</div>

In [None]:
product = e1 * e1.coord('latitude')
print(product)
print('')
print('New units are : "{}"'.format(product.units))

----

Although a cube's units can be freely set to any valid unit, the calculation of result units and compatibility checking is built into the arithmetic operations.

For example:

In [None]:
six_feet = iris.cube.Cube(6.0, units='feet')
twelve_days = iris.cube.Cube(12.0, units='days')
print(six_feet / twelve_days)

<div class="alert alert-block alert-warning">
    <b><font color='brown'>Exercise: </font></b>
    <p>What do you predict will result from adding together the 'six_feet' and 'twelve_days' cubes ?</p>
</div>

In [None]:
try:
    result = six_feet + twelve_days
except Exception as err:
    print('Exception occurred : {}'.format(err))

----
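
The unit behaviour seen above (division yields a compound unit, while adding feet to days fails) follows ordinary dimensional analysis. The sketch below is a toy illustration in plain Python, not how Iris or its units library is implemented: the `Quantity` class and its dimension names are hypothetical, invented for this example, and unit conversion factors (such as feet to metres) are ignored entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """Toy value-with-dimensions (hypothetical, for illustration only)."""
    value: float
    dims: tuple  # sorted tuple of (dimension_name, exponent) pairs

    @staticmethod
    def make(value, **dims):
        return Quantity(value, tuple(sorted(dims.items())))

    def __truediv__(self, other):
        # Division subtracts the exponents of the divisor's dimensions.
        combined = dict(self.dims)
        for name, power in other.dims:
            combined[name] = combined.get(name, 0) - power
        combined = {k: v for k, v in combined.items() if v != 0}
        return Quantity(self.value / other.value, tuple(sorted(combined.items())))

    def __add__(self, other):
        # Addition only makes sense between like dimensions.
        if self.dims != other.dims:
            raise ValueError(
                'incompatible dimensions: {} vs {}'.format(self.dims, other.dims))
        return Quantity(self.value + other.value, self.dims)


length = Quantity.make(6.0, length=1)
duration = Quantity.make(12.0, time=1)

# A length divided by a time has dimensions of length per time.
speed = length / duration
print(speed)

# Adding a length to a time is rejected.
try:
    length + duration
except ValueError as err:
    print('Exception occurred : {}'.format(err))
```

The point of the sketch is only that result units are derived from the operands, and that incompatible operands are refused rather than silently combined.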

Note that you can update the cube's data and metadata directly, for instance by assigning to `cube.data`, `cube.standard_name` or `cube.units`. When you do this, though, you need to be careful that the metadata is still an accurate description. By changing data or metadata explicitly, you are asserting that the result is correct.

<div class="alert alert-block alert-warning">
    <b><font color='brown'>Exercise: </font></b>
    <p>What happens if you change the name of `e1` to 'potential_temperature' ?</p>
    <p>What is the meaning of the resulting data cube ?</p>
    <p>What happens if you then set the units of this to a time period ?</p>
</div>

In [None]:
e1_copy = e1.copy()
e1_copy.rename('potential_temperature')
print(e1_copy.summary(shorten=True))
print("The cube name is simply changed to the requested name.")
print("NOTE: however, these aren't really potential temperatures at all; the values are absolute temperatures.")
print('')
e1_copy.units = 'days'
print(e1_copy.summary(shorten=True))
print("NOTE: 'potential_temperature' can't be measured in 'days' at all, but Iris takes no notice of this."
      "\n  - ** HANDLE WITH CARE !!** ")

----

Another function of cube arithmetic is to support 'broadcasting', in the NumPy sense: operations between data with different shapes.

In fact we already saw this above, with `product = e1 * e1.coord('latitude')`.

Broadcasting is simpler in Iris than in NumPy, because how the dimensions "line up" is determined by matching coordinates, rather than depending on the ordering of dimensions.

<div class="alert alert-block alert-warning">
    <b><font color='brown'>Exercise: </font></b>
    <p>The basic result values in the above example,
        <b><font face=courier color="black">product = e1 * e1.coord('latitude')</font></b>, 
        come from multiplying <font face=courier color="black">e1<b>.data</b></font> 
        times <font face=courier color="black">e1.coord('latitude')<b>.points</b></font>.</p>
    <p>What happens if you simply multiply those two arrays, and are the values the same ?</p>
</div>

In [None]:
# Remembering this calculation, from above ...
product = e1 * e1.coord('latitude')

# Equivalent plain array multiply does not work.
try:
    array = e1.data * e1.coord('latitude').points
except Exception as err:
    print('Simple array multiply gives error : {}'.format(err))

# To get it right in numpy, adjust shapes to align the correct dimensions.
lats_adjusted = e1.coord('latitude').points.reshape(1, 37, 1)
print('Corrected latitudes shape : {}'.format(lats_adjusted.shape))
array = e1.data * lats_adjusted
print('Same result ? {}'.format(np.all(array == product.data)))

An even simpler example of broadcasting is doing arithmetic between a cube and a scalar value.

<div class="alert alert-block alert-warning">
    <b><font color='brown'>Exercise: </font></b>
    <p>What happens if you add <b><font face='courier' color='black'>5.2</font></b> to the <b><font face='courier' color='black'>e1</font></b> cube ?</p>
    <p>What is the meaning of the result ?</p>
</div>

In [None]:
print('e1 is :\n  {}'.format(e1.summary(shorten=True)))
result = e1 + 5.2
print('e1 plus 5.2 is:\n  {}'.format(result.summary(shorten=True)))
print('These are all the temperature values, offset by another 5.2 (Kelvin).'
      '\n  (Note : standard name is removed).')

If the scalar is just a value, like this, then it is assumed to have the same units as the cube.

However, if the scalar is a cube or coordinate then it has its own units which take part in the calculation,
as seen above in the "feet per day" calculation.
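
To recap the broadcasting behaviour in this section: Iris lines up operands by matching coordinates, whereas NumPy relies on dimension order. A minimal NumPy-only sketch of that alignment step follows. Here `align_to_axis` is a hypothetical helper written for this illustration, and the array sizes are arbitrary stand-ins rather than the sample data.

```python
import numpy as np


def align_to_axis(values, ndim, axis):
    """Reshape 1-D `values` so it broadcasts along `axis` of an ndim-D array."""
    shape = [1] * ndim
    shape[axis] = values.size
    return values.reshape(shape)


# Stand-in for (time, latitude, longitude) data.
data = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
latitudes = np.array([10.0, 20.0, 30.0])

# Place the latitudes on axis 1, the dimension they describe.
lats_aligned = align_to_axis(latitudes, data.ndim, axis=1)
product = data * lats_aligned

print(lats_aligned.shape)  # (1, 3, 1)
print(product.shape)       # (2, 3, 4)
```

In plain NumPy the caller has to know that latitude is axis 1; in Iris that information comes from the cube's coordinates.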

## 5.2 Cube aggregation and statistics<a id='agg_and_stats'></a>

Many standard univariate aggregations exist in Iris. An aggregation collapses one or more dimensions of a cube into a single statistic, such as a mean or a minimum. Iris uses the term 'aggregators' for the statistical operations that can be used in this way.

A list of aggregators is available at http://scitools.org.uk/iris/docs/latest/iris/iris/analysis.html.

In [None]:
fname = iris.sample_data_path('uk_hires.pp')
cube = iris.load_cube(fname, 'air_potential_temperature')
print(cube.summary(True))

To take the vertical mean of this cube:

In [None]:
print(cube.collapsed('model_level_number', iris.analysis.MEAN))

NOTE: the printout shows that the result has a cell method of "mean: model_level_number".  Cell methods are a [CF metadata convention](http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/cf-conventions.html#cell-methods) which records that data are the results of statistical operations.
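
In terms of the underlying array, collapsing a dimension amounts to a reduction along one axis. The NumPy-only sketch below uses an arbitrary stand-in array (not the `uk_hires.pp` data, and with an assumed dimension order) to show the change of shape. Plain arrays cannot record coordinates or cell methods, which is the bookkeeping that Iris adds on top.

```python
import numpy as np

# Stand-in for 4-D data, assumed here to be ordered (time, level, y, x).
data = np.arange(3 * 4 * 2 * 2, dtype=float).reshape(3, 4, 2, 2)

# Averaging over axis 1 removes the "level" dimension.
vertical_mean = data.mean(axis=1)

print(data.shape)           # (3, 4, 2, 2)
print(vertical_mean.shape)  # (3, 2, 2)
```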

----

<div class="alert alert-block alert-warning">
    <b><font color='brown'>Exercise: </font></b>
    <p>How can you calculate all-time minimum temperatures for this data, and what is the form of the result ?</p>
</div>

In [None]:
# This solution uses the `e1` cube from section 5.1, with dimensions (time, latitude, longitude).
min_temps = e1.collapsed('time', iris.analysis.MIN)
print(min_temps)
print('')
print('The result is 2D, shape={}'.format(min_temps.shape))

----

In addition to "collapse", other types of statistical reductions are also possible.  These also use aggregators to define the statistic.  See the following documentation areas :

 * [Cube.collapsed](https://scitools.org.uk/iris/docs/latest/iris/iris/cube.html?highlight=collapsed#iris.cube.Cube.collapsed), as discussed above.
 * [Cube.rolling_window](https://scitools.org.uk/iris/docs/latest/iris/iris/cube.html?highlight=rolling#iris.cube.Cube.rolling_window).
 * [Cube.aggregated_by](https://scitools.org.uk/iris/docs/latest/iris/iris/cube.html?highlight=aggregated_by#iris.cube.Cube.aggregated_by), used with the [coord_categorisation module](https://scitools.org.uk/iris/docs/latest/iris/iris/coord_categorisation.html?highlight=categor#module-iris.coord_categorisation).  
This provides calculations of a "group-by-and-reduce" pattern  -- these are explained later in section 6, "Advanced Concepts".
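
As a rough picture of what the last two reductions compute, here is a NumPy-only sketch on a made-up 1-D series. It is an analogue of the rolling-window and group-by-and-reduce patterns, not the Iris implementation; the values and the season-like labels are invented for illustration.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Rolling window: mean over each run of 3 consecutive points.
# The result is shorter than the input by (window - 1) points.
window = 3
rolling_mean = np.convolve(values, np.ones(window) / window, mode='valid')
print(rolling_mean)

# Group-by-and-reduce: mean of the values sharing each category label.
labels = np.array(['djf', 'djf', 'mam', 'mam', 'mam', 'jja'])
categories, inverse = np.unique(labels, return_inverse=True)
sums = np.bincount(inverse, weights=values)
counts = np.bincount(inverse)
group_means = sums / counts
print(dict(zip(categories, group_means)))
```

In both cases a statistic (here the mean) is applied to subsets of points along one dimension, rather than to the whole dimension at once as in a collapse.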

## 5.3 : Section Review Exercise : Arithmetic and Statistics<a id='ex_5'></a>

Let's apply all that we've learned about data processing and visualisation in Iris. We will process and plot data to compare two possible future climate scenarios, called the A1B scenario and the E1 scenario.

1\. Load as cubes the datasets found at `iris.sample_data_path('E1_north_america.nc')` and `iris.sample_data_path('A1B_north_america.nc')`. Print the summary of each cube.

In [None]:
cube_e1 = iris.load_cube(iris.sample_data_path('E1_north_america.nc'))
cube_a1b = iris.load_cube(iris.sample_data_path('A1B_north_america.nc'))
print(cube_e1.summary())
print(cube_a1b.summary())

2a\. Plot the following data in a single figure with three maps in one row:

 * the air temperature in the E1 scenario for the year 2099, 
 * the air temperature in the A1B scenario for the year 2099, and
 * the difference between the two scenarios.

Think about the most appropriate matplotlib colormap(s) to use for each plot. Hint: the different matplotlib colormaps can be seen at https://matplotlib.org/1.5.3/examples/color/colormaps_reference.html. 

2b\. What information do your plots show? 

In [None]:
# First extract only one year from each.
def date_in_year_2099(time_cell):
    return time_cell.point.year == 2099

year_2099_constraint = iris.Constraint(time=date_in_year_2099)
e1_2099 = cube_e1.extract(year_2099_constraint)
a1b_2099 = cube_a1b.extract(year_2099_constraint)

# Print the results : only one timepoint, so they are 2-D.
print(e1_2099.summary(shorten=True))
print(a1b_2099.summary(shorten=True))

# Calculate the difference.
difference = a1b_2099 - e1_2099

# Plot them ...
import matplotlib.pyplot as plt
import iris.quickplot as qplt

plt.figure(figsize=(16,6))

originals_min = min(e1_2099.data.min(), a1b_2099.data.min())
originals_max = max(e1_2099.data.max(), a1b_2099.data.max())

# Display forecast temperatures on an absolute colour scale, with the same value range for both.
# The difference is rather hard to see, so specify a high-contrast colormap to help with that.
plt.subplot(1, 3, 1)
qplt.pcolormesh(e1_2099, vmin=originals_min, vmax=originals_max, cmap='brg')
plt.title('Scenario E1, year 2099')

plt.subplot(1, 3, 2)
qplt.pcolormesh(a1b_2099, vmin=originals_min, vmax=originals_max, cmap='brg')
plt.title('Scenario A1B, year 2099')

plt.subplot(1, 3, 3)
# Calculate symmetrical limits for a zero-centred value range.
diff_min, diff_max = np.min(difference.data), np.max(difference.data)
diff_maxscale = max(diff_max, -diff_min)
# Plot with a 'diverging' colormap, suitable to the zero-centred value range.
qplt.pcolormesh(difference, vmin=-diff_maxscale, vmax=diff_maxscale, cmap='bwr')
plt.title('Temperature difference A1B - E1,\n year 2099')

print('Note: the last plot is the difference.\n'
      'This uses a "diverging" colour scale to show values above and below zero.\n'
      'All the values are positive, so it actually only has shades of red.')


3\. Produce cubes that describe the area-averaged air temperature over time for each scenario. Calculate the model difference between these two cubes.

HINT: see the documentation on [iris.cube.Cube.collapsed](https://scitools.org.uk/iris/docs/latest/iris/iris/cube.html#iris.cube.Cube.collapsed)
and [iris.analysis.cartography.area_weights](https://scitools.org.uk/iris/docs/latest/iris/iris/analysis/cartography.html#iris.analysis.cartography.area_weights)
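
As background to the hint: on a regular latitude-longitude grid, cells near the poles cover less area than cells near the equator, so an unweighted mean over-represents high latitudes. The sketch below is a NumPy-only illustration of the idea, using the spherical-geometry result that a cell's area is proportional to its longitude width times the difference in sin(latitude) across it. The bounds and data are made up, and `cell_area_weights` is a hypothetical helper, not the Iris function.

```python
import numpy as np


def cell_area_weights(lat_bounds_deg, lon_bounds_deg):
    """Relative areas of lat-lon cells on a unit sphere, shape (nlat, nlon)."""
    lat = np.radians(lat_bounds_deg)   # shape (nlat, 2)
    lon = np.radians(lon_bounds_deg)   # shape (nlon, 2)
    lat_extent = np.abs(np.sin(lat[:, 1]) - np.sin(lat[:, 0]))
    lon_extent = np.abs(lon[:, 1] - lon[:, 0])
    return np.outer(lat_extent, lon_extent)


# Three latitude bands and two longitude sectors (made-up grid).
lat_bounds = np.array([[0.0, 30.0], [30.0, 60.0], [60.0, 90.0]])
lon_bounds = np.array([[0.0, 90.0], [90.0, 180.0]])
weights = cell_area_weights(lat_bounds, lon_bounds)

# Made-up temperatures that decrease towards the pole.
data = np.array([[300.0, 300.0], [280.0, 280.0], [250.0, 250.0]])

unweighted = data.mean()
weighted = np.average(data, weights=weights)
print(unweighted)
print(weighted)
```

The weighted mean is higher here because the warm low-latitude band covers the most area.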


In [None]:
import iris.analysis.cartography as icart
for cube in (cube_e1, cube_a1b):
    for axis in ('x', 'y'):
        coord = cube.coord(axis=axis)
        if not coord.has_bounds():
            coord.guess_bounds()

area_weights = icart.area_weights(cube_e1)

e1_areamean = cube_e1.collapsed(['latitude', 'longitude'], iris.analysis.MEAN, weights=area_weights)
a1b_areamean = cube_a1b.collapsed(['latitude', 'longitude'], iris.analysis.MEAN, weights=area_weights)
difference_areamean = a1b_areamean - e1_areamean
difference_areamean.rename('Temperature difference, A1B - E1')
print(difference_areamean)

4\. Produce a single line plot with the data from the two absolute temperature cubes you produced in part 3. Make sure you label the lines you plot. Also plot the difference "A1B - E1" from part 3 for comparison.

In [None]:
# Use iplt rather than qplt, to avoid overlapping titles.
import iris.plot as iplt

plt.figure(figsize=(12,12))
plt.subplot(211)
iplt.plot(e1_areamean, '+-', color='blue', label='E1')
iplt.plot(a1b_areamean, '+-', color='red', label='A1B')
plt.legend()

# Show the difference also.
plt.subplot(212)
iplt.plot(difference_areamean, '+-')
plt.title('Difference : scenarios A1B - E1')

----

## 5.4 Summary of Section: Data processing<a id='summary'></a>

In this section we learnt:
* cubes can be combined with arithmetic operators such as addition, just like NumPy arrays, and broadcasting is supported.
* coordinates can also be used in cube arithmetic.
* aggregators are provided to perform statistical aggregations of cube data.
* statistics can be calculated over selected dimensions, identified by coordinates.
