## Shapefile area averaging

### Introduction

This example demonstrates how to calculate area weights for the cells of a cube that overlap a given geometry read from a shapefile.

Area weighting allows us to account for data cells that only fall partially within a given geometry, by determining the percentage of the cell's area that falls within the geometry.

We can use that percentage to define each cell's contribution when we perform a mathematical operation on the values within the geometry (for example, by passing a function from `iris.analysis` to `cube.collapsed`).

Let's look at this pictorially:

![Illustration of a shapefile geometry with cube cells overlapping as indicated by the grid](files/img/area_weighting.png)

In the image, the black grid is the grid of data cells from the cube, the blue line is the boundary of the shapefile geometry, and the green filled area is the part of the grid that sits within the geometry. Numerous cells toward the centre of the grid sit entirely within the geometry, so the values of these cells contribute 100% to the mathematical operation being performed. The top centre cell, however, only overlaps the geometry by about 10%, so it contributes only 10% of its value, and so on.
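To make the idea concrete, here is a minimal pure-Python sketch of the calculation the picture describes. It is not how Iris computes its weights: the cells and the region are simplified to axis-aligned rectangles, and the helper names `overlap_fraction` and `weighted_mean`, together with all the numbers, are made up for illustration.

```python
def overlap_fraction(cell, region):
    """Fraction of a rectangular cell's area that lies inside a rectangular region.

    Both arguments are (x_min, y_min, x_max, y_max) tuples in the same planar units.
    """
    cx0, cy0, cx1, cy1 = cell
    rx0, ry0, rx1, ry1 = region
    # Width and height of the intersection, clamped to zero when there is no overlap.
    width = max(0.0, min(cx1, rx1) - max(cx0, rx0))
    height = max(0.0, min(cy1, ry1) - max(cy0, ry0))
    cell_area = (cx1 - cx0) * (cy1 - cy0)
    return (width * height) / cell_area


def weighted_mean(values, weights):
    """Mean of `values` where each value counts in proportion to its weight."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("no cell overlaps the region")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Three unit cells in a row, with made-up data values.
cells = [(0, 0, 1, 1), (1, 0, 2, 1), (2, 0, 3, 1)]
values = [10.0, 20.0, 30.0]
# A made-up region covering half of the first cell and all of the second.
region = (0.5, 0.0, 2.0, 1.0)

weights = [overlap_fraction(c, region) for c in cells]
mean = weighted_mean(values, weights)
print(weights, mean)
```

The third cell lies outside the region, so its weight is zero and its value does not influence the mean at all.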

An example of when this might be valuable is determining the total rainfall within a UK county over a given time period. This would be done by collapsing the cube over the time dimension and summing the values that lie within the geometry of the county we are interested in. As in the image above, not all cells will be fully contained within the geometry, so to determine the amount of rainfall within the county we need to weight each cell's total rainfall by the fraction of the cell's area that falls within the geometry.
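The two steps in the rainfall example can be sketched with plain Python lists. The rainfall amounts and coverage fractions below are invented, and the sketch assumes each value is a cell total that is spread evenly across the cell, so that scaling by the covered fraction is meaningful.

```python
# Hypothetical rainfall amounts: one row per time step, one column per grid cell.
rainfall = [
    [2.0, 4.0, 0.0],
    [1.0, 0.0, 3.0],
    [3.0, 2.0, 1.0],
]
# Hypothetical fraction of each cell's area that lies inside the county.
coverage = [1.0, 0.5, 0.1]

# Step 1: collapse the time dimension by summing each cell's column.
per_cell_total = [sum(column) for column in zip(*rainfall)]

# Step 2: scale each cell's total by its coverage and add up the contributions.
county_total = sum(total * frac for total, frac in zip(per_cell_total, coverage))
print(per_cell_total, county_total)
```

The order of the two steps does not matter here, because the coverage fractions are the same at every time step.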

### Worked Example

In this example we will determine the area-weighted average air temperature of the UK in the year 2098 for one of our future climate scenarios (A1B).

Let's start by importing the libraries we will need for this example, and checking the version of Iris being used.

In [1]:
import cartopy.io.shapereader as shpreader
import iris
import iris.analysis.geometry as iag

In [2]:
print('Iris version: {}'.format(iris.__version__))

Iris version: 3.10.0


Now we will load some data, namely the A1B scenario global air temperature file.

In [3]:
a1b_cube = iris.load_cube(iris.sample_data_path('A1B.2098.pp'))

lats = a1b_cube.coord('latitude')
lons = a1b_cube.coord('longitude')
if not lats.has_bounds():
    lats.guess_bounds()
if not lons.has_bounds():
    lons.guess_bounds()

Calculating the geometry weights requires known bounds for our grid cells, so if either the latitude or the longitude coordinate of our cube is not bounded, we use the `guess_bounds` method to set appropriate bounds on that coordinate.
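To give a feel for what guessing bounds involves, the sketch below places each cell edge midway between neighbouring points. It is a simplified illustration of the idea, not Iris's actual implementation of `guess_bounds`; the function name `guess_bounds_midpoint` and the sample latitudes are hypothetical.

```python
def guess_bounds_midpoint(points):
    """Return [lower, upper] bounds per point, with edges midway between neighbours."""
    if len(points) < 2:
        raise ValueError("need at least two points to infer a spacing")
    # Interior edges sit halfway between adjacent points.
    interior = [(a + b) / 2 for a, b in zip(points, points[1:])]
    # Outer edges are extrapolated using the spacing of the nearest pair of points.
    first = points[0] - (points[1] - points[0]) / 2
    last = points[-1] + (points[-1] - points[-2]) / 2
    edges = [first] + interior + [last]
    # Each cell is bounded by a consecutive pair of edges.
    return [[lo, hi] for lo, hi in zip(edges, edges[1:])]


bounds = guess_bounds_midpoint([50.0, 52.5, 55.0])
print(bounds)
```

Guessed bounds are only as good as the assumption that the points sit at the centre of their cells, so real bounds from the data source are preferable when they are available.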

Now we will import a shapefile to use, specifically the Natural Earth global country boundaries shapefile.

In [4]:
filename = shpreader.natural_earth(resolution='110m',
                                   category='cultural',
                                   name='admin_0_countries')
borders = shpreader.Reader(filename)
uk_geom, = [g for g in borders.records()
            if g.attributes['NAME_LONG'] == 'United Kingdom']

Here the shapefile is specified and read using the `cartopy.io.shapereader` module. The UK's record is found by iterating over all the records in the shapefile and choosing the one whose `NAME_LONG` attribute matches 'United Kingdom'. The trailing comma in `uk_geom, = ...` unpacks the single matching record and raises an error if there is not exactly one match.

This Natural Earth shapefile was chosen here for simplicity's sake, but any other shapefile can be accessed using the methods demonstrated here. It is worth noting that whilst `shpreader.natural_earth()` will download Natural Earth shapefiles as required, all it returns is a filepath to a shapefile.

A very similar process should work with any standard shapefile, because the reader object provides both a `geometries()` and a `records()` method. The former yields each geometry within the shapefile, while the latter yields each record within the shapefile.

A record is made up of a geometry (one of the geometries from the `geometries()` method) and an attributes dictionary. The keys within the attributes dictionary are dependent on the shapefile loaded, but are simple to locate:

In [5]:
print(uk_geom.attributes.keys())

dict_keys(['featurecla', 'scalerank', 'LABELRANK', 'SOVEREIGNT', 'SOV_A3', 'ADM0_DIF', 'LEVEL', 'TYPE', 'TLC', 'ADMIN', 'ADM0_A3', 'GEOU_DIF', 'GEOUNIT', 'GU_A3', 'SU_DIF', 'SUBUNIT', 'SU_A3', 'BRK_DIFF', 'NAME', 'NAME_LONG', 'BRK_A3', 'BRK_NAME', 'BRK_GROUP', 'ABBREV', 'POSTAL', 'FORMAL_EN', 'FORMAL_FR', 'NAME_CIAWF', 'NOTE_ADM0', 'NOTE_BRK', 'NAME_SORT', 'NAME_ALT', 'MAPCOLOR7', 'MAPCOLOR8', 'MAPCOLOR9', 'MAPCOLOR13', 'POP_EST', 'POP_RANK', 'POP_YEAR', 'GDP_MD', 'GDP_YEAR', 'ECONOMY', 'INCOME_GRP', 'FIPS_10', 'ISO_A2', 'ISO_A2_EH', 'ISO_A3', 'ISO_A3_EH', 'ISO_N3', 'ISO_N3_EH', 'UN_A3', 'WB_A2', 'WB_A3', 'WOE_ID', 'WOE_ID_EH', 'WOE_NOTE', 'ADM0_ISO', 'ADM0_DIFF', 'ADM0_TLC', 'ADM0_A3_US', 'ADM0_A3_FR', 'ADM0_A3_RU', 'ADM0_A3_ES', 'ADM0_A3_CN', 'ADM0_A3_TW', 'ADM0_A3_IN', 'ADM0_A3_NP', 'ADM0_A3_PK', 'ADM0_A3_DE', 'ADM0_A3_GB', 'ADM0_A3_BR', 'ADM0_A3_IL', 'ADM0_A3_PS', 'ADM0_A3_SA', 'ADM0_A3_EG', 'ADM0_A3_MA', 'ADM0_A3_PT', 'ADM0_A3_AR', 'ADM0_A3_JP', 'ADM0_A3_KO', 'ADM0_A3_VN', 'ADM0_A

These operations can be performed on any shapefile record.

Note, however, that the keys within an attributes dictionary are dependent on the shapefile, and that code that expects a certain key to appear within the attributes may fail if a different shapefile is processed.
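One way to guard against a missing key is to check for it explicitly and report the keys that are available. The sketch below uses a hypothetical `FakeRecord` class that mimics only the `attributes` dictionary of a shapefile record, and a hypothetical helper `select_record`; neither is part of cartopy.

```python
class FakeRecord:
    """Hypothetical stand-in exposing only an `attributes` dictionary."""

    def __init__(self, attributes):
        self.attributes = attributes


def select_record(records, key, value):
    """Return the single record whose attribute `key` equals `value`."""
    matches = []
    for record in records:
        if key not in record.attributes:
            # Fail early with a message that lists what the shapefile does provide.
            available = sorted(record.attributes)
            raise KeyError(f"attribute {key!r} not found; available keys: {available}")
        if record.attributes[key] == value:
            matches.append(record)
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one record with {key}={value!r}, found {len(matches)}"
        )
    return matches[0]


records = [
    FakeRecord({'NAME_LONG': 'France'}),
    FakeRecord({'NAME_LONG': 'United Kingdom'}),
]
uk = select_record(records, 'NAME_LONG', 'United Kingdom')
print(uk.attributes)
```

Because the helper only relies on the `attributes` dictionary, the same pattern should carry over to real records, where a misspelt or lower-cased key would otherwise surface as a bare `KeyError`.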

With the UK's record selected from the loaded shapefile, its geometry is ready to be passed to the function that calculates the area weights:

In [6]:
weights = iag.geometry_area_weights(a1b_cube, uk_geom.geometry)
cube_collapsed = a1b_cube.collapsed(['latitude', 'longitude'],
                                    iris.analysis.MEAN,
                                    weights=weights)
print('Average UK air temp in 2098: {:.3f}K'.format(cube_collapsed.data))

Average UK air temp in 2098: 288.180K




These steps calculate an array of area weights for the cube's cells based on the UK's geometry. The weights are passed to the `collapsed` call, which computes the area-weighted mean over both the latitude and longitude coordinates. Finally, the resultant cube's single data point (the average UK air temperature in 2098 according to the A1B scenario) is printed.

### Something to be aware of

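A note of caution to end on: the example shown here works in Euclidean space, so the curvature of the Earth is not accounted for in the weighted area calculations performed. Cells of equal size in degrees are therefore treated as having equal area, whatever their latitude.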
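The size of this effect can be estimated with a short sketch that compares a flat "square degrees" area with the area of the same longitude/latitude cell on a sphere. The function names and the two sample cells are hypothetical, and the sphere is an idealisation of the Earth.

```python
import math


def planar_area(lon0, lon1, lat0, lat1):
    """Cell 'area' in square degrees, treating longitude and latitude as flat x and y."""
    return (lon1 - lon0) * (lat1 - lat0)


def spherical_area(lon0, lon1, lat0, lat1, radius=1.0):
    """Area of a longitude/latitude cell on a sphere of the given radius."""
    dlon = math.radians(lon1 - lon0)
    dsin = math.sin(math.radians(lat1)) - math.sin(math.radians(lat0))
    return radius ** 2 * dlon * dsin


# Two one-degree cells at roughly the southern and northern latitudes of the UK.
south = (0.0, 1.0, 49.0, 50.0)
north = (0.0, 1.0, 59.0, 60.0)

planar_ratio = planar_area(*north) / planar_area(*south)
spherical_ratio = spherical_area(*north) / spherical_area(*south)
print(planar_ratio, spherical_ratio)
```

In the flat calculation the two cells have identical area, whereas on the sphere the northern cell is noticeably smaller, so a planar weighting gives northern cells more influence than their true area warrants.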