### Inspect performance of iris interpolation schemes

In the Aerocom IDL tools, collocation of model data with point observations is done using nearest neighbour interpolation. The pyaerocom `ModelData` class is based on (but not inherited from) the iris `Cube` class, which includes an interpolation method that takes one or multiple coordinates as input. The iris interpolation interface supports nearest neighbour and linear grid interpolation.

This notebook was developed as a result of earlier tests, which revealed that the `Cube` interpolation method can be dramatically slow, since it loads the whole grid into memory (even if only a single point is accessed).

In [1]:
import numpy as np
import pyaerocom
import time
import iris

def load_model_data():
    data = pyaerocom.ModelData()
    data._init_testdata_default()
    return data.grid

Let's start by running tests for extracting a time series at a single location using nearest neighbour interpolation. This is done in 3 ways:

1. Using the original Cube
2. Using a Cube that is cropped around the point of interest before interpolation
3. Extracting directly using the closest indices

We use the following coordinates:

In [2]:
#single coordinate
lon, lat = 10, 10

#### Case 1 - Iris interface original grid

Using the iris interpolation interface on the original data grid stored in the `Cube`.

In [3]:
%%time
cube = load_model_data()
s0_case1 = cube.interpolate(sample_points = [("longitude", lon), ("latitude", lat)], scheme=iris.analysis.Nearest()).data

Rolling longitudes to -180 -> 180 definition
CPU times: user 1.57 s, sys: 3.25 s, total: 4.82 s
Wall time: 15.6 s


This took quite a while. The reason is that the Interpolator instance loads the whole grid into memory before doing the interpolation, so for a single point (or only a few points) this does not make a lot of sense. However, now that everything is in memory, we can repeat the interpolation for many more points without reloading the data, which is fast:

In [4]:
# whole globe in 1degree resolution
more_lons = np.arange(-180, 180, 1)
more_lats = np.arange(-90, 90, 1)

In [5]:
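%time
# NOTE: `%time` is a line magic; with nothing after it on the same line it
# does not time the code below, so the microsecond timing printed here is not
# the cost of the interpolation. Use `%%time` to time the whole cell.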
sub = cube.interpolate(sample_points = [("longitude", more_lons), ("latitude", more_lats)], scheme=iris.analysis.Nearest())
print(sub.shape)

CPU times: user 2 µs, sys: 2 µs, total: 4 µs
Wall time: 7.15 µs
(365, 180, 360)


Now, this gives us an instance of the `Cube` class. Let's make sure that accessing the data does not take ages again:

In [6]:
%%time
data1 = sub.data

CPU times: user 10 µs, sys: 9 µs, total: 19 µs
Wall time: 22.9 µs
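The timings in this notebook come from IPython magics. `%%time` times a whole cell, while `%time` only times a statement written on the same line. Outside a notebook, a small wall-clock timer can serve the same purpose. The following is a minimal sketch using only the standard library; the `timed` helper and the `timings` dictionary are hypothetical names and not part of pyaerocom or iris.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, results=None):
    """Measure the wall time of the enclosed block (hypothetical helper)."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        # Optionally store the result so that cases can be compared later
        if results is not None:
            results[label] = elapsed
        print(f"{label}: {elapsed:.6f} s")

timings = {}
with timed("sum of squares", timings):
    total = sum(i * i for i in range(100_000))
```

In the notebook, the body of the `with` block would be the interpolation call and the access to `.data`, so that both are covered by the same measurement.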


#### Case 2 - Iris interface cropped grid

Using the iris interpolation interface after cropping the `Cube` within a suitable region-of-interest. For this, we first write a little helper function that determines a suitable crop interval.

In [7]:
def crop_around(lon, lat, stepdeg=2):
    lon_range = (lon-stepdeg, lon+stepdeg)
    lat_range = (lat-stepdeg, lat+stepdeg)
    return lon_range, lat_range

print(crop_around(lon, lat))

((8, 12), (8, 12))
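Note that `crop_around` can return latitudes outside the valid range for points close to the poles. A possible variant that clips the latitude interval is sketched below. The name `crop_around_clamped` is hypothetical, and the sketch assumes that longitude intervals crossing the date line are handled by the cropping call itself, so they are left unchanged here.

```python
def crop_around_clamped(lon, lat, stepdeg=2):
    """Return (lon_range, lat_range) around a point, keeping latitudes valid.

    Hypothetical variant of crop_around: the latitude interval is clipped to
    [-90, 90]; the longitude interval is returned unchanged.
    """
    lon_range = (lon - stepdeg, lon + stepdeg)
    lat_range = (max(lat - stepdeg, -90), min(lat + stepdeg, 90))
    return lon_range, lat_range

print(crop_around_clamped(10, 10))
print(crop_around_clamped(10, 89))
```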


In [8]:
%%time
cube = load_model_data()
# get crop ranges around the point of interest
lonr, latr = crop_around(lon, lat)
cropped = cube.intersection(longitude=lonr, latitude=latr)
s0_case2 = cropped.interpolate(sample_points=[("longitude", lon), ("latitude", lat)], scheme=iris.analysis.Nearest()).data

Rolling longitudes to -180 -> 180 definition
CPU times: user 181 ms, sys: 295 ms, total: 476 ms
Wall time: 4.14 s


Well, this was considerably faster, but it only got us one point (in about 4 s of wall time). I will spare you the loop over all points in the additional arrays. First, make sure that the extracted arrays of both cases are equal.

In [9]:
np.testing.assert_array_equal(s0_case1, s0_case2)

#### Case 3 - Finding and extracting closest point using numpy

Let's load the data again and extract the lon / lat coordinate arrays

In [10]:
cube = load_model_data()
lons, lats = cube.coord("longitude").points, cube.coord("latitude").points
print(cube.shape)

Rolling longitudes to -180 -> 180 definition
(365, 451, 900)


Write a helper function that finds the indices of the closest grid point:

In [11]:
def get_closest_index(lons, lats, lon, lat):
    return (np.argmin(np.abs(lons - lon)), np.argmin(np.abs(lats - lat)))
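One caveat of a plain absolute difference is that it ignores the periodicity of longitude: a query at 180 is treated as far away from a grid point at -180, although both describe the same meridian. A possible wrap-aware variant is sketched below. The function name `get_closest_index_periodic` is hypothetical, and the synthetic 0.4 degree grid is only an assumption chosen to match the axis lengths of the test data (451 x 900).

```python
import numpy as np

def get_closest_index_periodic(lons, lats, lon, lat):
    """Closest grid indices, treating longitude as periodic (hypothetical helper)."""
    # Map longitude differences into [-180, 180) before taking the absolute value
    dlon = (np.asarray(lons) - lon + 180.0) % 360.0 - 180.0
    idx_lon = int(np.argmin(np.abs(dlon)))
    idx_lat = int(np.argmin(np.abs(np.asarray(lats) - lat)))
    return idx_lon, idx_lat

# Synthetic grid (assumed spacing of 0.4 degrees)
grid_lons = -180.0 + 0.4 * np.arange(900)
grid_lats = np.linspace(-90, 90, 451)

# A query at +180 maps to the grid point at -180
print(get_closest_index_periodic(grid_lons, grid_lats, 180, 10))
print(get_closest_index_periodic(grid_lons, grid_lats, 10, 10))
```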

And extract time series for the first point:

In [12]:
%%time
idx_lon, idx_lat = get_closest_index(lons, lats, lon, lat)
s0_case3 = cube[:, idx_lat, idx_lon].data

CPU times: user 111 ms, sys: 51 ms, total: 162 ms
Wall time: 507 ms


Again, before applying this method to all coordinates in `more_lons` and `more_lats`, make sure the numbers are right.

In [13]:
np.testing.assert_array_equal(s0_case1, s0_case3)

Now for the whole thing:

In [14]:
%%time
data = np.empty((365, len(more_lats), len(more_lons)))
for i in range(len(more_lons)):
    for j in range(len(more_lats)):
        lon, lat = more_lons[i], more_lats[j]
        idx_lon, idx_lat = get_closest_index(lons, lats, lon, lat)
        data[:,j,i] = cube[:, idx_lat, idx_lon].data
        

CPU times: user 2h 33min 23s, sys: 1h 11min 10s, total: 3h 44min 33s
Wall time: 8h 30min 48s


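Now, this took about 8.5 hours of wall time, far longer than the roughly 16 s the iris interface needed to load the original grid and interpolate the first point. Note, however, that for a single point or only a few points this method outperforms the iris method by far (about 0.5 s compared to 15.6 s).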
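The long runtime is plausible from the single-point timing: 360 x 180 = 64800 extractions at roughly 0.5 s each add up to about 9 hours. If the array fits into memory, the Python loop can be avoided by computing all indices first and extracting everything with one indexing operation. The following is a minimal sketch on a small synthetic array standing in for the data of the `Cube`; the helper `closest_indices` and all variable names are hypothetical.

```python
import numpy as np

def closest_indices(grid, points):
    """Index of the nearest grid value for every entry in points (hypothetical helper)."""
    grid = np.asarray(grid)
    points = np.asarray(points)
    # Broadcast to a (len(points), len(grid)) distance matrix, then reduce
    return np.abs(grid[np.newaxis, :] - points[:, np.newaxis]).argmin(axis=1)

# Small synthetic stand-in for the model data with dimensions (time, lat, lon)
grid_lats = np.linspace(-90, 90, 19)    # 10 degree spacing
grid_lons = np.linspace(-180, 170, 36)  # 10 degree spacing
values = np.random.default_rng(0).random((5, grid_lats.size, grid_lons.size))

query_lats = np.array([-88.0, 1.0, 47.0])
query_lons = np.array([-179.0, 12.0, 33.0, 168.0])

idx_lat = closest_indices(grid_lats, query_lats)
idx_lon = closest_indices(grid_lons, query_lons)

# One indexing operation replaces the nested loop over all lon / lat pairs
sub = values[:, idx_lat[:, np.newaxis], idx_lon[np.newaxis, :]]
print(sub.shape)
```

The result has the dimensions (time, query latitude, query longitude), which is the same layout as the `data` array filled in the loop.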

### Comparing Case 1 and Case 3 based on realistic obsdata case

Now we compare the two cases (iris vs. the custom numpy method) using roughly the number of stations in the AERONET network, i.e. 400 data points arranged on a 20 x 20 grid.

In [15]:
little_less_lons = np.linspace(-180, 180, 20)
little_less_lats = np.linspace(-90, 90, 20)
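The two arrays span a regular 20 x 20 grid, so every longitude is combined with every latitude. Real stations are scattered coordinate pairs instead. With the index-based approach, such pairs can be extracted element by element, as in the following minimal sketch. It uses a small synthetic array in place of the model data, and the station coordinates are made-up values for illustration only.

```python
import numpy as np

# Synthetic (time, lat, lon) array standing in for the model data
grid_lats = np.linspace(-90, 90, 19)    # 10 degree spacing
grid_lons = np.linspace(-180, 170, 36)  # 10 degree spacing
values = np.arange(5 * 19 * 36, dtype=float).reshape(5, 19, 36)

# Hypothetical station coordinates, paired element by element
station_lons = np.array([10.0, -74.0, 140.0])
station_lats = np.array([60.0, 40.0, -34.0])

# Nearest grid index for every station along each axis
idx_lon = np.abs(grid_lons[np.newaxis, :] - station_lons[:, np.newaxis]).argmin(axis=1)
idx_lat = np.abs(grid_lats[np.newaxis, :] - station_lats[:, np.newaxis]).argmin(axis=1)

# Pairwise indexing: one time series per station, dimensions (time, station)
series = values[:, idx_lat, idx_lon]
print(series.shape)
```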

#### Case 1 (400 datapoints)

In [16]:
%%time
cube = load_model_data()
cube.interpolate(sample_points = [("longitude", little_less_lons), 
                                             ("latitude", little_less_lats)], scheme=iris.analysis.Nearest()).data

Rolling longitudes to -180 -> 180 definition
CPU times: user 1.28 s, sys: 3.67 s, total: 4.95 s
Wall time: 15 s


#### Case 3 (400 datapoints)

In [21]:
%%time
cube = load_model_data()
lons, lats = cube.coord("longitude").points, cube.coord("latitude").points

data = np.empty((365, len(little_less_lats), len(little_less_lons)))
for i in range(len(little_less_lons)):
    print(i)
    for j in range(len(little_less_lats)):
        lon, lat = little_less_lons[i], little_less_lats[j]
        idx_lon, idx_lat = get_closest_index(lons, lats, lon, lat)
        data[:,j,i] = cube[:, idx_lat, idx_lon].data

Rolling longitudes to -180 -> 180 definition
0
CPU times: user 114 ms, sys: 52.8 ms, total: 167 ms
Wall time: 390 ms
CPU times: user 146 ms, sys: 61.2 ms, total: 208 ms
Wall time: 425 ms
[... remaining output truncated in the source; the repeated timing lines report wall times between about 0.4 s and 3.4 s ...]

In [18]:
print(data.shape)

(365, 20, 20)
