# Walker Lake

## The Disappearing Walker Lake
While the loss of the Aral Sea in Kazakhstan and Lake Urmia in Iran has received a lot of attention over the last few decades, this trend is a global phenomenon.  Recently a number of __[papers](https://earthobservatory.nasa.gov/IOTD/view.php?id=91921)__ have been published, including one focusing on the __[Decline of the world's saline lakes](https://www.nature.com/articles/ngeo3052)__.  Many of these lakes have lost the majority of their volume over the last century, including Walker Lake (Nevada, USA), which has lost 90 percent of its volume over the last 100 years.

The following example is intended to replicate the typical processing required in change detection studies similar to the __[Decline of the world's saline lakes](https://www.nature.com/articles/ngeo3052)__.

In [1]:
from pathlib import Path

import geoviews as gv
import holoviews as hv
import numpy as np
import rioxarray
import xarray as xr

import cartopy.crs as ccrs
from colorcet import coolwarm
from holoviews import opts
from holoviews.operation.datashader import rasterize
from IPython.display import display

hv.extension('bokeh')

In this example, we use [Dask](https://dask.org/) to
demonstrate how image processing can be distributed across workers,
either running locally or across a cluster. In the next cell, we
instantiate a Dask distributed Client, requesting eight
single-threaded workers and declaring a memory limit of 8 GB per
worker. You can experiment with different memory limits (e.g., 4 GB) and
different numbers of workers, but note that each worker should only use
one thread, as [Datashader](https://datashader.org) manages its own
parallelization using [Numba](https://numba.pydata.org/):

In [2]:
# arbitrarily choose a memory limit (8GB) to demonstrate the out of core
# processing infrastructure
from dask.distributed import Client
client = Client(memory_limit=8*1e9, n_workers=8, threads_per_worker=1)
# As Datashader uses parallel Numba for raster rendering, we need to use
# single threaded Dask workers on each CPU to avoid contention.
client

Connection method: Cluster object, Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status
Workers: 8, Total threads: 8, Total memory: 59.60 GiB
Status: running, Using processes: True
Per worker: Total threads: 1, Memory: 7.45 GiB
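
The client summary reports memory in binary units (GiB), while the limit was passed as a decimal byte count. The short sketch below reproduces the reported figures from the arguments used above; the variable names are illustrative only.

```python
# Convert the decimal byte limit passed to Client into the binary units
# (GiB) shown in the client summary.
GIB = 2**30  # bytes per gibibyte

memory_limit_bytes = 8 * 1e9   # same value as memory_limit in the cell above
n_workers = 8

per_worker_gib = memory_limit_bytes / GIB
total_gib = n_workers * per_worker_gib

print(f"per worker: {per_worker_gib:.2f} GiB")  # per worker: 7.45 GiB
print(f"total:      {total_gib:.2f} GiB")       # total:      59.60 GiB
```

This is why a limit of "8GB" shows up as 7.45 GiB per worker.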


### Landsat Image Data

To replicate this study, we first have to obtain the data from primary sources.  The conventional way to obtain Landsat image data is to download it through USGS's __[EarthExplorer](https://earthexplorer.usgs.gov/)__ or NASA's __[Giovanni](https://giovanni.gsfc.nasa.gov/giovanni/)__, but to facilitate the example, two images have been downloaded from EarthExplorer and cached.

The two images used by the original study are LT05_L1TP_042033_19881022_20161001_01_T1 and 
LC08_L1TP_042033_20171022_20171107_01_T1 from 1988/10/22 and 2017/10/22 respectively.  These images contain Landsat Surface Reflectance Level-2 Science Product images.

## Loading into xarray

In the next cells, we load the Landsat-5 and Landsat-8 files into xarray `DataArray` objects, reading them locally using `rioxarray`.

In [3]:
def read_landsat_files(pattern):
    data_dir = Path('data')
    data = {
        int(file.stem[-1]): rioxarray.open_rasterio(file, chunks={"x": 1200, "y": 1200}, masked=True)
        for file in sorted(data_dir.glob(pattern))
    }
    dataset = xr.concat(data.values(), dim="band")
    dataset = dataset.assign_coords({"band": list(data)})
    return dataset
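
The function above takes the band number from the last character of each file stem. The sketch below isolates that step, assuming file names of the form `<product id>_B<band>.tif`; the helper `band_from_path` and the example file names are hypothetical and only illustrate the parsing.

```python
from pathlib import Path

def band_from_path(path):
    """Return the band number encoded in the last character of the file stem."""
    return int(Path(path).stem[-1])

# Hypothetical file names following the "<product id>_B<band>" pattern.
names = [
    "LT05_L1TP_042033_19881022_20161001_01_T1_B4.tif",
    "LT05_L1TP_042033_19881022_20161001_01_T1_B5.tif",
]
bands = [band_from_path(n) for n in names]
print(bands)  # [4, 5]

# Caveat: only the final digit is read, so a two-digit band such as B10
# would be parsed as 0.
print(band_from_path("scene_B10.tif"))  # 0
```

Because only one digit is read, this approach is safe for single-digit bands such as the ones loaded here, but it would need adjusting before loading bands 10 or 11.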

In [4]:
landsat_5_img = read_landsat_files('LT05*')
landsat_5_img

| | Array | Chunk |
|---|---|---|
| Bytes | 439.80 MiB | 5.49 MiB |
| Shape | (2, 7241, 7961) | (1, 1200, 1200) |
| Dask graph | 98 chunks in 5 graph layers | |
| Data type | float32 numpy.ndarray | |
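
The figures in this summary follow directly from the array shape, the chunk shape passed to `open_rasterio`, and the 4-byte `float32` item size. A minimal sketch of that arithmetic, using only the values shown above:

```python
import math

shape = (2, 7241, 7961)    # (band, y, x) of the Landsat 5 array above
chunks = (1, 1200, 1200)   # one band per chunk, 1200 x 1200 pixels
itemsize = 4               # bytes per float32 value
MIB = 2**20

# Each dimension is split into ceil(size / chunk) pieces.
n_chunks = math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))
array_mib = math.prod(shape) * itemsize / MIB
chunk_mib = math.prod(chunks) * itemsize / MIB

print(n_chunks)             # 98
print(f"{array_mib:.2f}")   # 439.80
print(f"{chunk_mib:.2f}")   # 5.49
```

Smaller chunks give more tasks to distribute across workers at the cost of more scheduling overhead, so the chunk shape is a parameter worth experimenting with.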


In [5]:
landsat_8_img = read_landsat_files('LC08*')
landsat_8_img

| | Array | Chunk |
|---|---|---|
| Bytes | 473.84 MiB | 5.49 MiB |
| Shape | (2, 7941, 7821) | (1, 1200, 1200) |
| Dask graph | 98 chunks in 5 graph layers | |
| Data type | float32 numpy.ndarray | |


We create a cartopy coordinate reference system ([EPSG:32611](https://epsg.io/32611)) that we will be using later on in this notebook:

In [6]:
assert landsat_5_img.rio.crs == landsat_8_img.rio.crs
print(landsat_5_img.rio.crs)
crs = ccrs.epsg(landsat_5_img.rio.crs.to_epsg())

EPSG:32611
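
EPSG:32611 is WGS 84 / UTM zone 11N. As a rough sketch of where that code comes from, the function below derives the UTM EPSG code from a longitude and latitude. The helper name `utm_epsg` is hypothetical, and the sketch ignores the special-case zones around Norway and Svalbard.

```python
import math

def utm_epsg(lon, lat):
    """EPSG code of the WGS 84 / UTM zone containing (lon, lat)."""
    zone = math.floor((lon + 180) / 6) + 1   # 6-degree zones, numbered from 180W
    base = 32600 if lat >= 0 else 32700      # northern / southern hemisphere
    return base + zone

# A point inside the scene: longitude -118, latitude 39.
print(utm_epsg(-118, 39))  # 32611
```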


## Computing the NDVI (1988)

Now let us compute the [NDVI](https://en.wikipedia.org/wiki/Normalized_difference_vegetation_index) for the 1988 image.

In [7]:
ndvi5 = (landsat_5_img.sel(band=5) - landsat_5_img.sel(band=4))/(landsat_5_img.sel(band=5) + landsat_5_img.sel(band=4))
# persist returns a new collection held in worker memory, so keep the result
ndvi5 = client.persist(ndvi5)
ndvi5

| | Array | Chunk |
|---|---|---|
| Bytes | 219.90 MiB | 5.49 MiB |
| Shape | (7241, 7961) | (1200, 1200) |
| Dask graph | 49 chunks in 1 graph layer | |
| Data type | float32 numpy.ndarray | |
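
The cell above applies the normalized-difference form (a - b) / (a + b) to two bands. NDVI is conventionally defined as (NIR - Red) / (NIR + Red); which band numbers carry NIR and red differs between sensors (bands 5 and 4 on Landsat 8 OLI, bands 4 and 3 on Landsat 5 TM), so it is worth checking the band designations when adapting the cell to other data. The sketch below shows the same formula on plain NumPy arrays with a guard against division by zero; the helper `normalized_difference` is hypothetical and not part of the notebook.

```python
import numpy as np

def normalized_difference(a, b):
    """Return (a - b) / (a + b), with NaN where the denominator is zero."""
    a = np.asarray(a, dtype="float32")
    b = np.asarray(b, dtype="float32")
    total = a + b
    out = np.full(total.shape, np.nan, dtype="float32")
    # Only divide where the denominator is non-zero; other cells stay NaN.
    np.divide(a - b, total, out=out, where=total != 0)
    return out

a = np.array([0.5, 0.2, 0.0])   # toy reflectance values
b = np.array([0.1, 0.2, 0.0])
result = normalized_difference(a, b)
print(result)
```

The result always lies between -1 and 1 for non-negative inputs, which is why the difference plot later in the notebook can use a fixed range.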


## Computing the NDVI (2017)

Now we can do this for the Landsat 8 files for the 2017 image:

In [8]:
ndvi8 = (landsat_8_img.sel(band=5) - landsat_8_img.sel(band=4))/(landsat_8_img.sel(band=5) + landsat_8_img.sel(band=4))

## Resampling to same size

The two images share the same coordinate system but do not have exactly the same dimensions or coordinates. Previous versions of this notebook resampled the images to the same size, and optionally regridded them, using Datashader. In this version, we instead interpolate the Landsat-8 image onto the coordinates of the Landsat-5 one using `xarray`, an approach that provides a similar result.

In [9]:
ndvi8 = ndvi8.interp_like(ndvi5, method="nearest")
# persist returns a new collection held in worker memory, so keep the result
ndvi8 = client.persist(ndvi8)
ndvi8

| | Array | Chunk |
|---|---|---|
| Bytes | 219.90 MiB | 219.90 MiB |
| Shape | (7241, 7961) | (7241, 7961) |
| Dask graph | 1 chunks in 1 graph layer | |
| Data type | float32 numpy.ndarray | |
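
To make the idea of nearest-neighbour interpolation onto another grid concrete, here is a small NumPy sketch. It is not how `xarray` implements `interp_like`; the helpers `nearest_index` and `regrid_nearest` are hypothetical, and the brute-force distance matrix is only practical for toy sizes. Targets outside the source extent simply pick up the edge value in this sketch, whereas library routines may treat them differently.

```python
import numpy as np

def nearest_index(source_coords, target_coords):
    """Index of the nearest source coordinate for every target coordinate."""
    source = np.asarray(source_coords, dtype=float)
    target = np.asarray(target_coords, dtype=float)
    return np.abs(source[:, None] - target[None, :]).argmin(axis=0)

def regrid_nearest(values, src_y, src_x, dst_y, dst_x):
    """Resample a 2D array onto new y/x coordinates by nearest neighbour."""
    rows = nearest_index(src_y, dst_y)
    cols = nearest_index(src_x, dst_x)
    return values[np.ix_(rows, cols)]

src = np.arange(12.0).reshape(3, 4)
src_y, src_x = [0, 30, 60], [0, 30, 60, 90]   # source grid coordinates
dst_y, dst_x = [10, 50], [5, 40, 80]          # target grid coordinates
regridded = regrid_nearest(src, src_y, src_x, dst_y, dst_x)
print(regridded)
# [[ 0.  1.  3.]
#  [ 8.  9. 11.]]
```

Once both images sit on identical coordinates, pixel-wise operations such as the difference computed later in the notebook become well defined.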


## Viewing change via dropdown

Using [Datashader](https://datashader.org/) together with [GeoViews](https://geoviews.org/), we can now easily build an interactive visualization where we select between the 1988 and 2017 images. The use of Datashader allows these images to be dynamically updated according to zoom level (note: it can take Datashader a minute to 'warm up' before it becomes fully interactive). For more information on how the dropdown widget was created using `HoloMap`, please refer to the [HoloMap reference](https://holoviews.org/reference/containers/bokeh/HoloMap.html#bokeh-gallery-holomap).

In [10]:
opts.defaults(
    opts.Curve(width=600, tools=['hover']),
    opts.Image(cmap='viridis', width=450, height=450, tools=['hover'], colorbar=True))

In [11]:
hmap = hv.HoloMap({'1988':gv.Image(ndvi5, crs=crs, vdims=['ndvi'], rtol=10), 
                   '2017':gv.Image(ndvi8, crs=crs, vdims=['ndvi'], rtol=10)}, 
                  kdims=['Year']).redim(x='lon', y='lat') # Mapping 'x' and 'y' from rasterio to 'lon' and 'lat'

In [12]:
%%time
display(rasterize(hmap))

CPU times: user 4.31 s, sys: 599 ms, total: 4.91 s
Wall time: 17.6 s
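
The reason rasterizing helps is that a scene with thousands of pixels per side is reduced to roughly the number of pixels on screen before it is sent to the browser. The sketch below illustrates that idea with a simple block average in NumPy. It is only a conceptual illustration under that assumption, not Datashader's implementation, and `block_mean` is a hypothetical helper.

```python
import numpy as np

def block_mean(image, factor):
    """Downsample a 2D array by averaging non-overlapping factor x factor blocks."""
    h, w = image.shape
    h_trim, w_trim = h - h % factor, w - w % factor   # drop ragged edges
    blocks = image[:h_trim, :w_trim].reshape(
        h_trim // factor, factor, w_trim // factor, factor
    )
    return blocks.mean(axis=(1, 3))

full = np.arange(36.0).reshape(6, 6)   # stand-in for a large raster
small = block_mean(full, 3)            # 3 x 3 source pixels per output pixel
print(small.shape)  # (2, 2)
print(small)
# [[ 7. 10.]
#  [25. 28.]]
```

Zooming in corresponds to aggregating a smaller region of the source data into the same number of output pixels, which is why the image is recomputed whenever the viewport changes.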


## Computing statistics and projecting display

The rest of the notebook shows how statistical operations can reduce the dimensionality of the data, producing new features that could be used as part of an ML pipeline.

### The mean and sum over the two time points

The next plot (may take a minute to compute) shows the mean of the two NDVI images next to the sum of them:

In [13]:
mean_avg = hmap.collapse(dimensions=['Year'], function=np.mean)
mean_img = gv.Image(mean_avg.data, crs=crs, kdims=['lon', 'lat'], 
                    vdims=['ndvi']).relabel('Mean over Year')

summed = hmap.collapse(dimensions=['Year'], function=np.sum)
summed_image = gv.Image(summed.data, crs=crs, kdims=['lon', 'lat'], 
                        vdims=['ndvi']).relabel('Sum over Year')

In [14]:
%%time
display(rasterize(mean_img) + rasterize(summed_image))

CPU times: user 15.5 s, sys: 1.48 s, total: 17 s
Wall time: 44.5 s
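
Collapsing over `Year` amounts to reducing along the axis that stacks the two images. A toy NumPy sketch of the same reduction, with made-up values, shows what each output pixel contains:

```python
import numpy as np

# Two tiny stand-ins for the 1988 and 2017 NDVI images (values are made up).
ndvi_a = np.array([[0.2, 0.4], [np.nan, -0.1]])
ndvi_b = np.array([[0.6, 0.0], [0.3, -0.3]])

stack = np.stack([ndvi_a, ndvi_b])   # shape (2, 2, 2): (year, y, x)
mean_img = np.mean(stack, axis=0)    # per-pixel mean over the two years
sum_img = np.sum(stack, axis=0)      # per-pixel sum over the two years

print(mean_img)
print(sum_img)
```

With only two time points the sum is simply twice the mean, so the two plots differ in scale rather than in spatial pattern. In this NumPy sketch a NaN in either year propagates to the result.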


### Difference in NDVI between 1988 and 2017

The change in Walker Lake as viewed using the NDVI can be shown by subtracting the NDVI recorded in 2017 from the NDVI recorded in 1988:

In [15]:
diff = hmap['1988'].data - hmap['2017'].data
difference = gv.Image(diff, crs=crs, kdims=['lon', 'lat'], vdims=['ndvi'])
difference = difference.relabel('Difference in NDVI').redim(ndvi='delta_ndvi')

In [16]:
%%time
display(rasterize(difference).redim.range(delta_ndvi=(-1.0,1.0)).opts(cmap=coolwarm))

CPU times: user 1.87 s, sys: 105 ms, total: 1.98 s
Wall time: 3.63 s


You can see a large change (positive delta) in the areas where there is water, indicating a reduction in the size of the lake over this time period.

## Slicing across `lon` and `lat`

As a final example, we can use the `sample` method to slice across the difference in NDVI along (roughly) the midpoint of the latitude and the midpoint of the longitude. To do this, we define the following helper function to convert longitude/latitude into the projected coordinate values used by the `Dataset`:

In [17]:
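def from_lon_lat(x,y):
    # Transform geographic lon/lat (PlateCarree) into the projected CRS of the images
    return crs.transform_point(x,y, ccrs.PlateCarree())

# The returned pair is (x, y) in projected units: the first value indexes the
# 'lon' dimension and the second the 'lat' dimension.
lon_y, lat_x = from_lon_lat(-118, 39)  # Longitude of -118 and Latitude of 39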

In [18]:
%%time
display((difference.sample(lat=lat_x) + difference.sample(lon=lon_y)).cols(1))

CPU times: user 374 ms, sys: 12.5 ms, total: 386 ms
Wall time: 479 ms
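
Conceptually, each of these slices picks the single row or column of the grid closest to the requested coordinate. The NumPy sketch below illustrates that selection, assuming the slice is taken at the nearest grid line; `sample_nearest` and the coordinate values are hypothetical.

```python
import numpy as np

def sample_nearest(values, coords, value, axis):
    """Return the 1D slice of `values` whose coordinate is closest to `value`."""
    idx = int(np.abs(np.asarray(coords, dtype=float) - value).argmin())
    return np.take(values, idx, axis=axis)

grid = np.arange(12.0).reshape(3, 4)                      # (y, x)
y = np.array([4300000.0, 4310000.0, 4320000.0])           # hypothetical northings
x = np.array([320000.0, 330000.0, 340000.0, 350000.0])    # hypothetical eastings

row = sample_nearest(grid, y, 4312000.0, axis=0)   # slice at fixed y, varying x
col = sample_nearest(grid, x, 334000.0, axis=1)    # slice at fixed x, varying y
print(row)  # [4. 5. 6. 7.]
print(col)  # [1. 5. 9.]
```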
