# Alignment and Preprocessing
Once the data has been made available through `intake`, as described in the [Data_Ingestion_with_Intake](./02_Data_Ingestion_with_Intake.ipynb) user guide, the next step is to reshape and align it across data sources so that the machine learning pipeline can consume it. That pipeline is covered in the next user guide, [Machine_Learning](./04_Machine_Learning.ipynb).

In [None]:
import intake
import numpy as np
import xarray as xr


import cartopy.crs as ccrs

import hvplot.xarray
import holoviews as hv
hv.extension('bokeh', width=80)

## Recap: Loading data

In [None]:
cat = intake.open_catalog('../catalog.yml')
l5_da = cat.l5().read_chunked()
l5_da

In [None]:
l8_da = cat.l8().read_chunked()
l8_da

We can use the EPSG code shown above under the ``crs`` key to create a cartopy coordinate reference system, which we will use later in this notebook:

In [None]:
crs = ccrs.epsg(32611)

## Preprocessing
The first step in processing the data is to remove the missing values. Here each DataArray reports its missing-value marker in the `nodatavals` attribute, and we can use that value to set the missing pixels to `NaN`.

In [None]:
l5_da = l5_da.where(l5_da > l5_da.nodatavals[0])
l8_da = l8_da.where(l8_da > l8_da.nodatavals[0])
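
As a minimal sketch of what this masking step does, the snippet below applies the same comparison to a small synthetic array, using plain NumPy as a stand-in for the Landsat DataArrays. The array values and the `nodata` variable are made up for illustration; -9999 is assumed as the sentinel.

```python
import numpy as np

# Hypothetical 2x3 tile; -9999 marks missing pixels (made-up values).
nodata = -9999.0
tile = np.array([[120.0, nodata, 340.0],
                 [nodata, 560.0, 780.0]])

# Keep values above the sentinel and replace everything else with NaN,
# mirroring `da.where(da > nodata)` on a DataArray.
masked = np.where(tile > nodata, tile, np.nan)

# NaN-aware reductions now ignore the masked pixels.
print(np.nanmin(masked))            # smallest valid value
print(int(np.isnan(masked).sum()))  # number of masked pixels
```

Note that the `>` comparison also discards any value below the sentinel, so it assumes that valid data is always larger than -9999.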

We can confirm that no -9999 values remain in the data by calculating the minimum value of each DataArray:

In [None]:
l5_da.min().compute()

In [None]:
l8_da.min().compute()

**NOTE:** These operations take a noticeable amount of time because they force the data to actually be loaded.

## Compute NDVI

Now we will calculate NDVI for each of these image sets and persist the output in memory for speedy calculations later.

In [None]:
# Wrap the whole expression so that the NDVI result, not just the denominator, is persisted
NDVI_1988 = ((l5_da.sel(band=5) - l5_da.sel(band=4)) / (l5_da.sel(band=5) + l5_da.sel(band=4))).persist()
NDVI_1988.shape

In [None]:
# Wrap the whole expression so that the NDVI result, not just the denominator, is persisted
NDVI_2017 = ((l8_da.sel(band=5) - l8_da.sel(band=4)) / (l8_da.sel(band=5) + l8_da.sel(band=4))).persist()
NDVI_2017.shape
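
The normalized-difference arithmetic can be checked on a few hand-picked numbers. The sketch below uses plain NumPy with made-up band values; the helper name `normalized_difference` is hypothetical and not part of any library used in this notebook.

```python
import numpy as np

def normalized_difference(a, b):
    """Return (a - b) / (a + b) element-wise; NaN inputs propagate."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return (a - b) / (a + b)

# Made-up band values for three pixels; the last pixel is masked (NaN).
band_a = np.array([0.50, 0.30, np.nan])
band_b = np.array([0.10, 0.30, 0.20])

ndvi = normalized_difference(band_a, band_b)
print(ndvi)
```

For non-negative inputs that are not both zero, the result always falls between -1 and 1, which makes a quick sanity check on the real output.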

## Aligning the data

These two sets of Landsat bands cover roughly the same area but were taken in 1988 and 2017. Although they have the same resolution (30 m), they have different numbers of grid cells and were taken at slightly different angles, so their transforms differ.

In [None]:
# Compare by value; `is` would only test whether both names point to the same object
l8_da.transform == l5_da.transform
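
To see why two grids with the same 30 m cell size can still fail to line up, it helps to spell out what an affine transform does. The sketch below assumes the six-coefficient ordering `(a, b, c, d, e, f)` used by rasterio's `Affine`, where `x = a*col + b*row + c` and `y = d*col + e*row + f`. The coefficient values and the helper `pixel_to_xy` are made up for illustration and are not read from the Landsat files; for simplicity the two grids differ only in their origin.

```python
def pixel_to_xy(transform, col, row):
    """Map a (col, row) pixel index to the (x, y) of its upper-left corner."""
    a, b, c, d, e, f = transform
    return (a * col + b * row + c, d * col + e * row + f)

# Two hypothetical north-up grids: same 30 m cell size, different origins.
grid_1988 = (30.0, 0.0, 355000.0, 0.0, -30.0, 4300000.0)
grid_2017 = (30.0, 0.0, 355015.0, 0.0, -30.0, 4300010.0)

print(grid_1988 == grid_2017)          # value comparison of the tuples
print(pixel_to_xy(grid_1988, 10, 20))  # same pixel index ...
print(pixel_to_xy(grid_2017, 10, 20))  # ... different map coordinates
```

Because the same pixel index lands on different map coordinates in the two grids, cell-by-cell arithmetic is only meaningful once both arrays have been placed on a common grid.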

When these data are loaded into GeoViews, the transform is applied automatically, so the data can be overlaid or operations can be performed on matching grid cells.

In [None]:
NDVI_1988_p = NDVI_1988.hvplot(crs=crs, rasterize=True, width=500, height=500).relabel('1988')
NDVI_2017_p = NDVI_2017.hvplot(crs=crs, rasterize=True, width=500, height=500).relabel('2017')

NDVI_1988_p + NDVI_2017_p

See [Walker_Lake](../Walker_Lake.ipynb) for more work on calculating the difference between the water levels over time.