# Resampling

All of the images that we have loaded so far have the same resolution (30 m). In the section above we saw that it is straightforward to align these datasets even though they cover slightly different areas. In some cases, though, the resolution of the images differs. This is the case for band 8 (the panchromatic band), which has a resolution of 15 m. This section demonstrates *aggregation* (down-sampling) and *interpolation* (up-sampling). In practice, aggregation is much more common.
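As a rough illustration of the two operations, here is a toy NumPy sketch that is not part of the Landsat workflow. The array `fine` and the factor of 2 are made up for the example. Aggregation combines neighbouring cells into one coarser cell, while interpolation fills a finer grid from coarser values.

```python
import numpy as np

# Hypothetical 4x4 grid standing in for a fine (e.g. 15 m) raster.
fine = np.arange(16, dtype=float).reshape(4, 4)

# Aggregation (down-sampling): average each 2x2 block into one coarse cell.
coarse = fine.reshape(2, 2, 2, 2).mean(axis=(1, 3))

# Interpolation (up-sampling), nearest-neighbour flavour: repeat each
# coarse cell twice along both axes to get back to a 4x4 grid.
upsampled = np.repeat(np.repeat(coarse, 2, axis=0), 2, axis=1)

print(coarse)
print(upsampled.shape)
```

Note that up-sampling the aggregated array does not recover the original values: the detail lost in aggregation is gone.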

In [None]:
import intake
import numpy as np
import xarray as xr

import hvplot.xarray
import holoviews as hv

import cartopy.crs as ccrs
import geoviews as gv

hv.extension('bokeh', width=80)

We'll be using `datashader` operations `rasterize` and `regrid` to handle our multidimensional regridding.

In [None]:
from holoviews.operation.datashader import regrid, rasterize
from datashader import transfer_functions as tf, reductions as rd

## Recap: Loading data

In [None]:
cat = intake.open_catalog('../catalog.yml')
l5_da = cat.l5().read_chunked()
l5_da

In [None]:
l8_da = cat.l8().read_chunked()
l8_da

We can use the EPSG value shown above under the ``crs`` key to create a cartopy coordinate reference system that we will use later in this notebook:

In [None]:
crs = ccrs.epsg(32611)

## Compute NDVI

Now we will calculate NDVI for each of these image sets.

In [None]:
NDVI_1988 = (l5_da.sel(band=5) - l5_da.sel(band=4)) / (l5_da.sel(band=5) + l5_da.sel(band=4))
NDVI_2017 = (l8_da.sel(band=5) - l8_da.sel(band=4)) / (l8_da.sel(band=5) + l8_da.sel(band=4))
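The expressions above are normalized differences of the form (NIR - red) / (NIR + red). As a minimal sketch, the same calculation can be wrapped in a helper that avoids dividing by zero. The function name `ndvi` and the sample reflectance values are assumptions made for illustration only.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized difference (nir - red) / (nir + red); NaN where the sum is 0."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    out = np.full(total.shape, np.nan)
    # Only divide where the denominator is non-zero; other cells stay NaN.
    np.divide(nir - red, total, out=out, where=total != 0)
    return out

# Made-up reflectance values, only to show the behaviour of the index.
example = ndvi([0.5, 0.2, 0.0], [0.1, 0.2, 0.0])
print(example)
```

The index always falls between -1 and 1, with higher values where the near-infrared signal dominates.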

## Aligning the data

In [None]:
NDVI_by_year = xr.concat([NDVI_1988, NDVI_2017], dim=xr.DataArray([1988, 2017], dims='year', name='year'))
NDVI_by_year

## Select region of interest

We'll use the area around the central point as the Region of Interest (ROI). In this case we'll use a 30 km box around the center point.

In [None]:
x_center, y_center = crs.transform_point(-118.7081, 38.6942, ccrs.PlateCarree())
buffer = 1.5e4

In [None]:
xmin = x_center - buffer
xmax = x_center + buffer
ymin = y_center - buffer
ymax = y_center + buffer

In [None]:
ROI = NDVI_by_year.sel(x=slice(xmin, xmax), y=slice(ymin, ymax))

In [None]:
%%opts Image [width=400 height=300] (cmap='viridis')
p = ROI.hvplot('x','y', col='year', crs=crs, shared_axes=True)

## Aggregation

We'll define a new resolution that is visibly different from 30m.

In [None]:
res = 1e3

Just to make things pretty, and as a sanity check, let's turn the colorbar back on for both plots and make the first plot slightly wider to account for the extra axis that it displays.

In [None]:
p[1988] = p[1988].options(width=370, height=300)
p[2017] = p[2017].options(width=310, height=300)

In [None]:
p_1000 = regrid(p, x_sampling=res, y_sampling=res)
p_1000

Notice how fast it was to generate these plots. Aggregation is by mean by default, but there are other ways to aggregate. Here are some:

In [None]:
regrid(p, x_sampling=res, y_sampling=res, aggregator=rd.std()).relabel('Aggregated by std')
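To see why the choice of aggregator matters, here is a small NumPy sketch of block aggregation with a pluggable reducer. It is a stand-in for the idea and not how `regrid` is implemented. The helper `block_reduce` and the synthetic `field` are hypothetical.

```python
import numpy as np

def block_reduce(values, factor, reducer=np.nanmean):
    """Reduce non-overlapping factor x factor blocks of a 2D array."""
    rows, cols = values.shape
    # Trim edges so both dimensions divide evenly by `factor`.
    rows -= rows % factor
    cols -= cols % factor
    blocks = values[:rows, :cols].reshape(rows // factor, factor,
                                          cols // factor, factor)
    return reducer(blocks, axis=(1, 3))

# Synthetic field: left half 0, right half 1, so there is one sharp edge.
field = np.zeros((6, 6))
field[:, 3:] = 1.0

by_mean = block_reduce(field, 3)
by_std = block_reduce(field, 2, reducer=np.nanstd)
print(by_mean)
print(by_std)
```

The mean keeps the two flat regions, while the standard deviation is non-zero only in the blocks that straddle the edge. That is why a std aggregation tends to highlight boundaries.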

In [None]:
# Exercise: Try to regrid using a different aggregator. Use tab completion to find other methods on `rd`.

In [None]:
# Exercise: Now using std try changing the resolution.

This view could certainly help us pick out the bounds of the lake at least in 2017.

### Similar workflow in `xarray`

We can accomplish something similar in `xarray` by grouping the values into bins based on the desired resolution and taking the mean of each bin.

In [None]:
res = 1e3

In [None]:
x = np.arange(ROI.x.min(), ROI.x.max(), res)
y = np.arange(ROI.y.min(), ROI.y.max(), res)

We'll use the left edge of each bin as its label for now.

In [None]:
da_1000 = (ROI
    .groupby_bins('x', x, labels=x[:-1]).mean(dim='x')
    .groupby_bins('y', y, labels=y[:-1]).mean(dim='y')
    .rename(x_bins='x',y_bins='y')
)
da_1000
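The binning idea can be sketched in one dimension with plain NumPy. The coordinates and values below are made up. Note that `np.digitize` uses left-closed bins here, so values that fall exactly on a bin edge may be assigned differently than by `groupby_bins`.

```python
import numpy as np

res = 1e3
# Hypothetical 30 m pixel centres covering 3 km, with made-up values.
x = np.arange(15.0, 3000.0, 30.0)
values = np.sin(x / 500.0)

# Bin edges every `res` metres; each pixel is assigned to the bin it falls in.
edges = np.arange(0.0, 3000.0 + res, res)
bin_index = np.digitize(x, edges) - 1

# Mean of the pixels in each bin, labelled by the bin's left edge.
binned_mean = np.array([values[bin_index == i].mean()
                        for i in range(len(edges) - 1)])
left_edges = edges[:-1]
print(left_edges)
print(binned_mean)
```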

### Compare

We can compare this to the results from datashader regridding by getting the data from `p_1000` and subtracting it from the nearest data in `da_1000`.

In [None]:
def get_data(p):
    df = p.dframe()
    pivotted = df.pivot(index='y', columns='x', values='value')
    stacked = pivotted.stack()
    return xr.DataArray.from_series(stacked)

In [None]:
(da_1000.sel(year=2017).reindex(get_data(p_1000[2017]).indexes, method='nearest') - get_data(p_1000[2017])).hvplot('x','y')

### Handling band with different resolution

First we need to load the band 8 data. We'll grab it straight from Google Cloud Storage:

In [None]:
from random import random
from time import sleep

def get_band_with_exponential_backoff(path, row, product_id, band, 
                                      maximum_backoff=32):
    """
    Google Cloud Storage recommends using exponential backoff 
    when accessing the API. 
    
    https://cloud.google.com/storage/docs/exponential-backoff
    """
    n = backoff = 0
    while backoff < maximum_backoff:
        try:
            return cat.google_landsat_band(path=path, row=row, 
                                           product_id=product_id,
                                           band=band).to_dask()
        except Exception:  # a bare `except:` would also swallow KeyboardInterrupt
            backoff = min(2**n + random(), maximum_backoff)
            sleep(backoff)
            n += 1
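The function above returns `None` once the backoff cap is reached without a successful call. As a hedged variant, here is a generic retry helper that re-raises the last error instead. The names `retry_with_backoff` and `flaky`, and the `sleep` parameter that makes the helper testable, are hypothetical and not part of the catalog API.

```python
import random
import time

def retry_with_backoff(func, max_attempts=6, maximum_backoff=32,
                       sleep=time.sleep):
    """Call `func` until it succeeds, waiting longer after each failure."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait 2**attempt seconds plus jitter, capped at maximum_backoff.
            sleep(min(2 ** attempt + random.random(), maximum_backoff))

# Usage with a hypothetical flaky function that fails twice, then succeeds.
calls = {"count": 0}

def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"

waits = []  # record the waits instead of actually sleeping
result = retry_with_backoff(flaky, sleep=waits.append)
print(result, len(waits))
```

Passing `sleep=waits.append` records the delays rather than pausing, which keeps the example fast. In real use the default `time.sleep` would apply.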

In [None]:
da_8 = get_band_with_exponential_backoff(product_id='LC08_L1TP_042033_20171022_20171107_01_T1', path=42, row=33, band=8)

Since the y coordinates are in decreasing order, the slice needs to run from max to min. Where coordinates are in increasing order (as for x), the slice runs from min to max.

In [None]:
ROI_8 = da_8.sel(x=slice(xmin, xmax), y=slice(ymax, ymin))
ROI_8 = ROI_8.drop('band').squeeze().persist()
ROI_8
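If you don't want to remember which axis runs in which direction, a small helper can choose the slice order from the coordinates themselves. This is a sketch with a hypothetical function name and made-up coordinate values.

```python
def ordered_slice(coords, low, high):
    """Return slice(low, high) for increasing coords, slice(high, low) otherwise."""
    increasing = coords[0] < coords[-1]
    return slice(low, high) if increasing else slice(high, low)

# Made-up coordinate values: x increases, y decreases (north to south).
x_coords = [0.0, 15.0, 30.0, 45.0]
y_coords = [45.0, 30.0, 15.0, 0.0]

print(ordered_slice(x_coords, 10, 40))
print(ordered_slice(y_coords, 10, 40))
```

With real data this could be used as `da_8.sel(y=ordered_slice(da_8.y.values, ymin, ymax))`, assuming the coordinates are monotonic.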

In [None]:
p_8 = ROI_8.hvplot('x', 'y', width=500, height=400)
p_8

Let's define a little helper function to determine the resolution of a plot:

In [None]:
def get_res(p, x='x', y='y'):
    df = p.dframe()
    pivotted = df.pivot(index=y, columns=x, values='value')
    stacked = pivotted.stack()
    da = xr.DataArray.from_series(stacked)
    print(f'{x} res:', np.unique(np.around(da[x].diff(x), 2)))
    print(f'{y} res:', np.unique(np.around(da[y].diff(y), 2)))

In [None]:
get_res(p_8)
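The core of `get_res` is just the spacing between consecutive coordinate values. When the coordinates are already available as arrays, the same check can be sketched without going through a dataframe. The helper `coord_resolution` and the sample coordinates are assumptions for illustration.

```python
import numpy as np

def coord_resolution(coords, decimals=2):
    """Unique absolute spacings between consecutive coordinate values."""
    steps = np.abs(np.diff(np.asarray(coords, dtype=float)))
    return np.unique(np.around(steps, decimals))

# Hypothetical pixel centres: x ascending at 15 m, y descending at 15 m.
x = np.arange(7.5, 100.0, 15.0)
y = np.arange(92.5, 0.0, -15.0)

print('x res:', coord_resolution(x))
print('y res:', coord_resolution(y))
```

A single value in the output means the grid is regular along that axis. More than one value points to gaps or mixed spacing.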

We can use `xarray` to merge this band with the rest of our data, and we will get the union of all the coordinates. In this case the shape expands from (1000, 1000) to (2000, 2000).

In [None]:
ds = xr.merge([{'NDVI': ROI, '2017_band_8': ROI_8}])
ds

All of our data are properly represented, but methods such as selecting the value nearest to a certain x, y might yield NaNs:

In [None]:
ds.sel(x=x_center, y=y_center, method='nearest').compute()
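The NaNs come from the coarse variable having no values at the extra coordinates introduced by the fine grid. Here is a one-dimensional NumPy sketch of that effect, using made-up coordinates in which every 30 m coordinate also appears on the 15 m grid:

```python
import numpy as np

# Hypothetical coordinates along one axis, in metres.
x_30 = np.array([0.0, 30.0, 60.0])        # 30 m grid
x_15 = np.arange(0.0, 90.0, 15.0)         # 15 m grid

# An outer merge keeps the union of both coordinate sets.
x_union = np.union1d(x_30, x_15)

# Place the 30 m values on the union grid; every other position stays NaN.
values_30 = np.array([0.1, 0.2, 0.3])
on_union = np.full(x_union.shape, np.nan)
on_union[np.searchsorted(x_union, x_30)] = values_30

# The coordinate nearest to x=44 is 45, where the 30 m variable has no data.
nearest = np.abs(x_union - 44.0).argmin()
print(x_union)
print(on_union)
print(x_union[nearest], on_union[nearest])
```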

We can regrid the band 8 to a 30m resolution or we can regrid the NDVI to a 15m resolution.

In [None]:
res = 30
p_8_30 = regrid(p_8, x_sampling=res, y_sampling=res, width=500, height=400, 
                x_range=(x_center-1e3, x_center+1e3), y_range=(y_center-1e3, y_center+1e3))
p_8_30 

In [None]:
get_res(p_8_30)

**NOTE:** `x_sampling` and `y_sampling` set the minimum allowable resolution, so the resolution of a given plot might not be exactly `x_sampling` and `y_sampling` unless it is sufficiently zoomed in.
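One way to picture this note is as a floor on the cell size. The sketch below is a simplified model under that assumption. It is not the actual `regrid` implementation, which may round pixel counts, and the function name `effective_sampling` is hypothetical.

```python
def effective_sampling(span, pixels, min_sampling):
    """Cell size when `span` is drawn across `pixels`, never finer than `min_sampling`."""
    return max(span / pixels, min_sampling)

# 2 km zoomed-in window drawn 500 pixels wide: 4 m per pixel would be finer
# than the 30 m floor, so the floor applies.
zoomed_in = effective_sampling(2e3, 500, 30)

# 30 km window drawn 500 pixels wide: 60 m per pixel is already coarser
# than the floor, so the plot resolution is 60 m rather than 30 m.
zoomed_out = effective_sampling(30e3, 500, 30)

print(zoomed_in, zoomed_out)
```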

## Interpolation

Now let's quickly take a look at up-sampling. For this we will use `regrid`, since up-sampling is not allowed in `rasterize`.

In [None]:
p_ndvi_15 = regrid(p, upsample=True, 
                   x_sampling=15, y_sampling=15, 
                   x_range=(x_center-1e3, x_center+1e3), y_range=(y_center-1e3, y_center+1e3))
p_ndvi_15

In [None]:
get_res(p_ndvi_15[1988])

This doesn't look any more resolved than 30 m, but that is because nearest-neighbor interpolation is used by default, so the grid cells look the same size. The finer resolution becomes more apparent when using linear interpolation.

In [None]:
p_ndvi_15 = regrid(p, interpolation='linear', upsample=True,
                   x_sampling=15, y_sampling=15, 
                   x_range=(x_center-1e3, x_center+1e3), y_range=(y_center-1e3, y_center+1e3))
p_ndvi_15.relabel('Using linear interpolation')

In [None]:
get_res(p_ndvi_15[1988])
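The difference between the two interpolation modes is easy to see in one dimension. The following NumPy sketch uses made-up values and is not the code path that `regrid` uses.

```python
import numpy as np

# Made-up values on a 30 m grid.
x_30 = np.array([0.0, 30.0, 60.0])
v_30 = np.array([0.0, 1.0, 0.0])

# Target 15 m grid spanning the same extent.
x_15 = np.arange(0.0, 61.0, 15.0)

# Nearest: each target takes the value of the closest source cell
# (ties go to the left neighbour in this sketch).
nearest_idx = np.abs(x_15[:, None] - x_30[None, :]).argmin(axis=1)
v_nearest = v_30[nearest_idx]

# Linear: targets between two source cells get a weighted average.
v_linear = np.interp(x_15, x_30, v_30)

print(v_nearest)
print(v_linear)
```

With nearest-neighbor the output only repeats existing values, so the cells look as blocky as before. Linear interpolation produces new in-between values, which is what makes the finer grid visible.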

### Similar workflow in `xarray`

`xarray` supports a number of interpolation methods for up-sampling data. Here is what it takes to re-scale the NDVI images from res=30 to res=15 to match the panchromatic band. The options are `nearest` and `linear`, with `linear` being the default.

**NOTE:** Interpolation is not supported on `dask` arrays, so we need to load that data into memory first. We'll use `.load()` for this.

In [None]:
ROI.load()

In [None]:
ndvi_15 = ROI.interp_like(ROI_8)
ndvi_15
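Conceptually, interpolating "like" another array means evaluating the source values at the target's coordinates. Here is a one-dimensional NumPy sketch with made-up numbers. Target points outside the source range are set to NaN, which, to our understanding, is also the default behaviour in `xarray`.

```python
import numpy as np

# Made-up 30 m source grid and a 15 m target grid that extends past it.
x_src = np.array([15.0, 45.0, 75.0])
v_src = np.array([0.2, 0.4, 0.8])
x_dst = np.arange(7.5, 90.0, 15.0)

# Linear interpolation at the target coordinates; NaN outside the source range.
v_dst = np.interp(x_dst, x_src, v_src, left=np.nan, right=np.nan)
print(v_dst)
```

The first and last target points lie outside the outermost source centres, so they come back as NaN rather than being extrapolated.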

In [None]:
# Exercise: use hvplot to make a plot of these computed values

### Next:

Now that you have learned how to regrid your data, you are likely ready to learn more about [Machine Learning](05_Machine_Learning.ipynb).