## Case Study: Using 'Local' functions to analyze urban development and the frequency of landcover change in South Jordan Utah.

**Modified from: https://planetarycomputer-staging.microsoft.com/dataset/naip#Example-Notebook**


## Accessing NAIP data with the Planetary Computer STAC API

### Environment setup

This notebook works with or without an API key, but you will be given more permissive access to the data with an API key.
The [Planetary Computer Hub](https://planetarycomputer.microsoft.com/compute) is pre-configured to use your API key.

In [None]:
from pystac_client import Client
import planetary_computer as pc

# Set the environment variable PC_SDK_SUBSCRIPTION_KEY, or set it here.
# The Hub sets PC_SDK_SUBSCRIPTION_KEY automatically.
# pc.settings.set_subscription_key(<YOUR API Key>)

### Choose our region and times of interest

This area covers part of the Simplot Family farm, one of the largest farms in the US, located outside of Boise, Idaho. Let's see whether we can spot any development in this area over the time spanned by our NAIP collection.

In [None]:
area_of_interest = {
    "type": "Polygon",
    "coordinates": [
        [
            [-116.05,43.03],
            [-115.97,43.03],
            [-115.97,43.07],
            [-116.05,43.07],
            [-116.05,43.03]            
        ]
    ],
}

### Search the collection and choose scenes to render

Use [pystac-client](https://github.com/stac-utils/pystac-client) to search for data from the [NAIP](http://aka.ms/ai4edata-naip) collection.  This collection includes data from 2010 to 2019, so we'll search for one image near the beginning of that range, one near the middle, and one near the end.

In [None]:
range1 = '2010-01-01/2013-01-01'
range2 = '2013-01-01/2017-01-01'
range3 = '2018-01-01/2020-01-01'

In [None]:
catalog = Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")

search1 = catalog.search(
    collections=['naip'], 
    intersects=area_of_interest,
    datetime = range1
)

search2 = catalog.search(
    collections=['naip'], 
    intersects=area_of_interest,
    datetime = range2
)

search3 = catalog.search(
    collections=['naip'], 
    intersects=area_of_interest,
    datetime = range3
)

print(f"{search1.matched()} Items found in the range 1")
print(f"{search2.matched()} Items found in the range 2")
print(f"{search3.matched()} Items found in the range 3")

As seen above, multiple items intersect our area of interest in each date range. The following code chooses the item with the most overlap:

In [None]:
from shapely.geometry import shape

area_shape = shape(area_of_interest)
target_area = area_shape.area

def area_of_overlap(item):
    # Reuse the precomputed AOI geometry instead of rebuilding it on every call
    overlap_area = shape(item.geometry).intersection(area_shape).area
    return overlap_area / target_area

item1 = sorted(search1.items(), key=area_of_overlap, reverse=True)[0]
item2 = sorted(search2.items(), key=area_of_overlap, reverse=True)[0]
item3 = sorted(search3.items(), key=area_of_overlap, reverse=True)[0]

### Render images

Each Item has an `href` field containing a URL to the underlying image. For NAIP, these URLs are publicly-accessible, but for some data sets, these URLs may point to private containers, so we demonstrate the use of the [planetary-computer](https://github.com/microsoft/planetary-computer-sdk-for-python) package's `pc.sign` method, which adds a [Shared Access Signature](https://docs.microsoft.com/en-us/azure/storage/common/storage-sas-overview) to the URL, after which it can be used by any tooling that expects a standard URL.

In [None]:
import rasterio
from rasterio import windows
from rasterio import features
from rasterio import warp

import numpy as np
from PIL import Image

red_band = 1
green_band = 2
blue_band = 3
nir_band = 4

def create_image(item):
    print(item.datetime)
    href = pc.sign(item.assets['image'].href)
    with rasterio.open(href) as ds:    
        aoi_bounds = features.bounds(area_of_interest)
        warped_aoi_bounds = warp.transform_bounds('epsg:4326', ds.crs, *aoi_bounds)
        aoi_window = windows.from_bounds(transform=ds.transform, *warped_aoi_bounds)
        band_data = ds.read(indexes=[nir_band, red_band], window=aoi_window)

    img = Image.fromarray(np.transpose(band_data, axes=[1, 2, 0]))
    w, h = img.size
    aspect = w / h

    # Downscale a bit for plotting
    target_w = 800
    target_h = int(target_w / aspect)

    return img.resize((target_w, target_h), Image.BILINEAR)

In [None]:
%%time

img1 = create_image(item1)
img2 = create_image(item2)
img3 = create_image(item3)

## Using xarray-spatial to gather statistical information

First we will convert the images to `xarray.DataArray` format and then merge them into a single dataset with `xarray.merge()` for easier viewing.

In [None]:
import xarray as xr
import numpy as np
from datashader.transfer_functions import shade, stack, Images
from datashader.colors import Elevation, Hot

In [None]:
%%time

# Convert to DataArrays of the same size
agg1 = xr.DataArray(data = img1, name = '2011-07-01', dims = ['x', 'y','band'])[:700, :800, :]
agg2 = xr.DataArray(data = img2, name = '2013-08-18', dims = ['x', 'y','band'])[:700, :800, :]
agg3 = xr.DataArray(data = img3, name = '2019-07-14', dims = ['x', 'y','band'])[:700, :800, :]

# Merge Arrays into single dataset
ds = xr.merge([agg1, agg2, agg3])

# Near-Infrared Band (index 0: `create_image` reads the bands as [nir_band, red_band])
Images(shade(ds['2011-07-01'][:,:,0]),
       shade(ds['2013-08-18'][:,:,0]),
       shade(ds['2019-07-14'][:,:,0]))

# Red Band (index 1)
Images(shade(ds['2011-07-01'][:,:,1]),
       shade(ds['2013-08-18'][:,:,1]),
       shade(ds['2019-07-14'][:,:,1]))

### Create NDVI Images
The Normalized Difference Vegetation Index (NDVI) is an indicator used to detect live green vegetation.
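NDVI is commonly defined as (NIR - Red) / (NIR + Red), so values near 1 suggest dense green vegetation and values near 0 or below suggest bare ground or water. The sketch below implements that formula in plain NumPy as an illustration only; `ndvi_numpy` is a hypothetical helper name, and returning NaN where the denominator is zero is an assumption made here, not necessarily what `xrspatial` does.

```python
import numpy as np

def ndvi_numpy(nir, red):
    """Illustrative NDVI: (NIR - Red) / (NIR + Red), NaN where NIR + Red == 0."""
    nir = np.asarray(nir, dtype="float64")
    red = np.asarray(red, dtype="float64")
    total = nir + red
    out = np.full(total.shape, np.nan)
    # Only divide where the denominator is non-zero; other cells stay NaN.
    np.divide(nir - red, total, out=out, where=total != 0)
    return out

# Hypothetical 2x2 band values
demo = ndvi_numpy([[200, 50], [0, 120]], [[50, 50], [0, 40]])
print(demo)
```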

In [None]:
%%time

from xrspatial.multispectral import ndvi

# Create arrays using the NIR band (0) and the red band (1)
ndvi_2011 = ndvi(nir_agg = ds['2011-07-01'][:, :, 0], red_agg = ds['2011-07-01'][:, :, 1], name = '2011-07-01')
ndvi_2013 = ndvi(nir_agg = ds['2013-08-18'][:, :, 0], red_agg = ds['2013-08-18'][:, :, 1], name = '2013-08-18')
ndvi_2019 = ndvi(nir_agg = ds['2019-07-14'][:, :, 0], red_agg = ds['2019-07-14'][:, :, 1], name = '2019-07-14')

# Merge Arrays into single dataset
ds_ndvi = xr.merge([ndvi_2011, ndvi_2013, ndvi_2019])

Images(shade(ndvi_2011, cmap = Hot),
       shade(ndvi_2013, cmap = Hot),
       shade(ndvi_2019, cmap = Hot))


Here we see that the two July images (2011 and 2019) appear to show more vegetation than the August 2013 image, and that July 2019 appears to have had a better growing season than July 2011. The density and shape of the high-NDVI areas suggest that we are looking at farmland.

## Mean & Median
Let's find the mean and median value of each cell across the three dates. These statistics are useful for estimating the typical value of a particular cell: the closer the mean or median is to the value we are looking for, the more likely we are to see that value in time periods outside of our study. While the two functions are similar, the median is less sensitive to outliers, which can prove useful when large amounts of wild vegetation are present.
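To make the difference concrete, here is a minimal NumPy sketch on made-up numbers (the `stack` values are hypothetical, not taken from the NAIP scenes). One unusually high value pulls the mean up but leaves the median unchanged.

```python
import numpy as np

# Hypothetical stack of three dates for a 1x2 raster (axis 0 is time).
stack = np.array([
    [[0.2, 0.5]],
    [[0.3, 0.5]],
    [[0.9, np.nan]],   # 0.9 is an outlier; NaN marks a missing cell
])

# The nan-aware reductions skip missing cells instead of propagating NaN.
cell_mean = np.nanmean(stack, axis=0)
cell_median = np.nanmedian(stack, axis=0)

print(cell_mean)    # first cell is pulled towards the outlier
print(cell_median)  # first cell stays at the middle value
```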

In [None]:
%%time

mean = ds_ndvi.to_array(dim = 'new').mean('new').rename('mean')
median = ds_ndvi.to_array(dim = 'new').median('new').rename('median')

Images(shade(mean, cmap = Hot),
       shade(median, cmap = Hot))

## Majority & Minority
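Let's find the value that occurs most often (the majority) and least often (the minority) on a cell-by-cell basis across the three dates. Different crops are likely to produce different NDVI values, so determining the most common values can help us distinguish crops. Crops are also likely to be highly clustered, resulting in more of that value within the image. Note that the active cell below computes the cell-wise maximum and minimum with `np.fmax` and `np.fmin`; the majority and minority calculation itself is kept in the commented-out cell that follows.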

In [None]:
%%time

def maximum(array1, array2, array3):
    a = np.fmax(array1.data, array2.data)
    b = np.fmax(a, array3.data)
    out = xr.DataArray(b, name = 'maximum')
    return out

def minimum(array1, array2, array3):
    a = np.fmin(array1.data, array2.data)
    b = np.fmin(a, array3.data)
    out = xr.DataArray(b, name = 'minimum')
    return out

min_agg = minimum(ndvi_2011, ndvi_2013, ndvi_2019)
max_agg = maximum(ndvi_2011, ndvi_2013, ndvi_2019)

Images(shade(min_agg, cmap = Hot),
       shade(max_agg, cmap = Hot))

In [None]:
# # First let's stack the data using `xarray.concat()`
# concat = xr.concat([ds_ndvi[i] for i in ds_ndvi], 'year')
# concat = concat.rename('2011-2019')
#
# # Then map every value to an integer index using `numpy.unique()`.
# # NaN cells would need to be masked out first.
# axis = 0
# u, indices = np.unique(concat.data, return_inverse=True)
# indices = indices.reshape(concat.data.shape)
#
# # Count how often each index occurs along the year axis (0).
# # Note: the result has one layer per unique value, which can be very large.
# counts = np.apply_along_axis(np.bincount, axis, indices, None, np.max(indices) + 1)
#
# # Then create an array showing the most common value on a cell-by-cell basis.
# majority = xr.DataArray(u[np.argmax(counts, axis = axis)])
#
# # Then create an array showing the least common value on a cell-by-cell basis,
# # ignoring values that never occur in a cell.
# minority = xr.DataArray(u[np.argmin(np.where(counts == 0, np.inf, counts), axis = axis)])
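Majority and minority are easiest to reason about on rasters with a small number of discrete classes. The following is a minimal sketch under that assumption, not the notebook's own method: `majority_minority` is a hypothetical helper, the inputs are assumed to be integer class rasters without NaN, and ties are resolved in favour of the smallest class value.

```python
import numpy as np

def majority_minority(stack):
    """Cell-wise most and least frequent value along axis 0 of a class stack."""
    stack = np.asarray(stack)
    classes = np.unique(stack)
    # counts[k, i, j] = how many layers hold classes[k] at cell (i, j)
    counts = np.stack([(stack == c).sum(axis=0) for c in classes])
    majority = classes[np.argmax(counts, axis=0)]
    # A class that never occurs in a cell must not win the minority vote.
    masked = np.where(counts == 0, np.iinfo(counts.dtype).max, counts)
    minority = classes[np.argmin(masked, axis=0)]
    return majority, minority

# Hypothetical 1x2 class rasters for three dates
stack = np.array([[[1, 2]], [[1, 3]], [[2, 3]]])
majority, minority = majority_minority(stack)
print(majority, minority)
```

Continuous NDVI values rarely repeat exactly, so in practice they would probably need to be binned or rounded into classes before a calculation like this is meaningful.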

## Maximum, Minimum and Range
Let's find the largest and smallest value of each cell over each time period, as well as the difference between them. These functions could be useful in distinguishing productive and barren land.

In [None]:
%%time

maximum = ds_ndvi.to_array(dim = 'new').max('new').rename('maximum')
minimum = ds_ndvi.to_array(dim = 'new').min('new').rename('minimum')
range_agg = maximum - minimum
range_agg = range_agg.rename('range')

Images(shade(maximum, cmap= Hot),
      shade(minimum, cmap = Hot),
      shade(range_agg, cmap = Hot))
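When a stack contains missing cells, the choice of reduction matters. The sketch below uses made-up values to contrast `np.fmax` and `np.fmin`, which ignore NaN when the other operand is valid, with `np.maximum`, which propagates it. The variable names are illustrative only.

```python
import numpy as np

# Hypothetical stack of three dates for a 1x2 raster; one cell is missing.
stack = np.array([
    [[0.2, np.nan]],
    [[0.7, 0.4]],
    [[0.5, 0.1]],
])

cell_max = np.fmax.reduce(stack, axis=0)          # ignores NaN
cell_min = np.fmin.reduce(stack, axis=0)          # ignores NaN
cell_range = cell_max - cell_min
strict_max = np.maximum.reduce(stack, axis=0)     # propagates NaN

print(cell_max, cell_min, cell_range, strict_max)
```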

## Standard Deviation & Sum
Let's find the standard deviation and sum of each cell across the three NDVI rasters.

In [None]:
%%time

standard_deviation = ds_ndvi.to_array(dim = 'new').std('new').rename('standard deviation')
sum_agg = ds_ndvi.to_array(dim = 'new').sum('new').rename('sum')

Images(shade(standard_deviation, cmap = Hot),
       shade(sum_agg, cmap = Hot))

## Variety
Let's find all of the unique values across the three dates and count how often each one occurs.

In [None]:
# First let's concatenate the ndvi arrays 
concat_ndvi = xr.concat([ds_ndvi[i] for i in ds_ndvi], 'year')
concat_ndvi = concat_ndvi.rename('2011-2019')

# Then find all unique values and count them
(unique, counts) = np.unique(concat_ndvi.data, return_counts=True)
frequencies = np.asarray((unique, counts)).T

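# Sort the (value, count) rows by count; a plain np.sort would sort within each row
frequencies[frequencies[:, 1].argsort()]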
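The cell above lists unique values for the stack as a whole. A per-cell variety, meaning the number of distinct values each cell takes across the rasters, can be sketched as follows. `variety` is a hypothetical helper, and the sketch assumes there are no NaN values (each NaN would count as distinct).

```python
import numpy as np

def variety(stack):
    """Number of distinct values per cell along axis 0 (assumes no NaN)."""
    ordered = np.sort(np.asarray(stack), axis=0)
    # After sorting, every change between neighbours marks a new distinct value.
    changes = (np.diff(ordered, axis=0) != 0).sum(axis=0)
    return changes + 1

# Hypothetical 1x3 rasters for three dates
stack = np.array([[[1, 1, 1]], [[1, 2, 1]], [[1, 3, 2]]])
per_cell = variety(stack)
print(per_cell)
```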

## Combine
Let's combine the rasters cell by cell and look for unique combinations of values. If our hypothesis is correct, cells planted with the same crop should produce the same NDVI values, and we should see some recurring combinations.

In [None]:
%%time

def combine_arrays(array1, array2, array3):
    unique_combos = {}   # combination -> id
    unique_values = {}   # id -> combination
    all_values = []
    value = 1

    # Iterate through each array simultaneously
    for a, b, c in np.nditer([array1.data, array2.data, array3.data]):
        combo = (a.item(), b.item(), c.item())
        if np.isnan(combo).any():           # skip nan
            all_values.append(np.nan)
            continue
        if combo not in unique_combos:      # assign a new id to each new combo
            unique_combos[combo] = value
            unique_values[value] = combo
            value += 1
        # reuse the id of combos that were already found
        all_values.append(unique_combos[combo])

    # create new array
    new_array = np.array(all_values)
    new_array = np.reshape(new_array, (-1, array1.shape[1]))

    out = xr.DataArray(
        data = new_array,
        attrs = dict(
            key = unique_values
        )
    )

    return out

combine_agg = combine_arrays(ndvi_2011, ndvi_2013, ndvi_2019)
shade(combine_agg, cmap = Hot)

Here each shade of the gradient represents one unique combination, numbered in the order in which it was first encountered. Speckles that share a color with an earlier region mark combinations that recur lower in the image.
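For larger rasters, a vectorized alternative to the Python loop might look like the sketch below. It is an assumption-laden illustration rather than a drop-in replacement: `combine` is a hypothetical helper, the inputs are assumed to contain no NaN, and ids follow the sorted order of the combinations instead of the order of first occurrence.

```python
import numpy as np

def combine(*arrays):
    """Assign one integer id per unique combination of cell values."""
    arrays = [np.asarray(a) for a in arrays]
    # One row per cell, one column per input raster
    combos = np.stack([a.ravel() for a in arrays], axis=1)
    key, inverse = np.unique(combos, axis=0, return_inverse=True)
    ids = inverse.reshape(arrays[0].shape) + 1   # ids start at 1
    return ids, key                              # key[id - 1] is the combination

# Hypothetical 2x2 rasters
a = np.array([[1, 1], [2, 2]])
b = np.array([[5, 5], [5, 6]])
ids, key = combine(a, b)
print(ids)
print(key)
```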

## Equal to Frequency

Let's count, for each cell, how many of the rasters are equal to a given value.

In [None]:
def equal_frequency(val_agg, agg_list):

    out = []
    in_aggs = [val_agg]
    for agg in agg_list:
        in_aggs.append(agg)

    # Iterate through each array simultaneously
    for v, a, b, c in np.nditer(in_aggs):
        count = 0
        if np.isnan((a, b, c)).any() == True:   # skip nan
            out.append(np.nan)
            continue
        if v == a:
            count += 1
        if v == b:
            count += 1
        if v == c:
            count += 1
        out.append(count)

    # create new array
    out = np.array(out)
    out = np.reshape(out, (-1, agg_list[0].shape[1]))
    out = xr.DataArray(out)
    return out

val_arr = np.zeros_like(ndvi_2011.data)
val_agg = xr.DataArray(val_arr)
agg_list = [ndvi_2011, ndvi_2011, ndvi_2019]
ef = equal_frequency(val_agg, agg_list)
shade(ef, cmap = Hot)

Here we see how many times a cell is equal to 0 among the three rasters. Black represents no occurrences, yellow represents one occurrence and white represents two. Note that in this example 'ndvi_2011' is listed twice: because few cells have the same value in all rasters, repeating a raster makes the change in frequency visible.
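The same count can be expressed without an explicit loop, and for any number of rasters. The sketch below is one possible generalization, assuming NumPy inputs of equal shape; `frequency` and its `compare` parameter are hypothetical names introduced here. Passing `np.greater` or `np.less` instead of `np.equal` gives the "greater than" and "less than" variants.

```python
import numpy as np

def frequency(value, rasters, compare=np.equal):
    """Count, per cell, how many rasters satisfy compare(raster, value)."""
    stack = np.stack([np.asarray(r, dtype="float64") for r in rasters])
    hits = compare(stack, value).sum(axis=0).astype("float64")
    # Match the loop version: a cell with any NaN input becomes NaN.
    hits[np.isnan(stack).any(axis=0)] = np.nan
    return hits

# Hypothetical 1x2 rasters
r1 = np.array([[0.0, 0.6]])
r2 = np.array([[0.0, 0.4]])
r3 = np.array([[0.7, np.nan]])

equal_zero = frequency(0.0, [r1, r2, r3])
above_half = frequency(0.5, [r1, r2, r3], compare=np.greater)
print(equal_zero, above_half)
```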

## Greater than Frequency

Let's count, for each cell, how many of the rasters are greater than a given value.

In [None]:
def greater_frequency(val_agg, agg_list):

    out = []
    in_aggs = [val_agg]
    for agg in agg_list:
        in_aggs.append(agg)

    # Iterate through each array simultaneously
    for v, a, b, c in np.nditer(in_aggs):
        count = 0
        if np.isnan((a, b, c)).any() == True:   # skip nan
            out.append(np.nan)
            continue
        if v < a:
            count += 1
        if v < b:
            count += 1
        if v < c:
            count += 1
        out.append(count)

    # create new array
    out = np.array(out)
    out = np.reshape(out, (-1, agg_list[0].shape[1]))
    out = xr.DataArray(out)
    return out

val_arr = np.full_like(ndvi_2011.data, 0.5)
val_agg = xr.DataArray(val_arr)
agg_list = [ndvi_2011, ndvi_2013, ndvi_2019]
gf = greater_frequency(val_agg, agg_list)
shade(gf, cmap = Hot)

Here we see how many times a cell is greater than 0.5 among the three rasters. Black represents no occurrences, orange represents one, yellow represents two and white represents three.

## Highest Position

Let's find the position of the raster with the maximum value.

In [None]:
def highest_array(agg_list):
    # Cell-wise maximum across every raster in the list
    highest = agg_list[0]
    for agg in agg_list[1:]:
        highest = np.maximum(highest, agg)

    out = []
    in_aggs = [highest]
    for agg in agg_list:
        in_aggs.append(agg)

    for h, a, b, c in np.nditer(in_aggs):
        if np.isnan((a, b, c)).any() == True:
            out.append(np.nan)
            continue
        if h == a:
            out.append(1)
        elif h == b:
            out.append(2)
        elif h == c:
            out.append(3)

    out = np.array(out)
    out = np.reshape(out, (-1, agg_list[0].shape[1]))
    out = xr.DataArray(out)
    return(out)

agg_list = [ndvi_2011, ndvi_2013, ndvi_2019]
ha = highest_array(agg_list)
shade(ha, cmap = Hot)


Here we see the position of the raster with the highest value in each cell: 1 for 'ndvi_2011', 2 for 'ndvi_2013' and 3 for 'ndvi_2019'. With the `Hot` colormap, darker colors correspond to lower positions and lighter colors to higher ones.
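Position lookups like this map naturally onto `np.argmax` and `np.argmin`. The following is a minimal sketch, assuming NumPy inputs of equal shape; `extreme_position` is a hypothetical helper. As in the loop version, ties go to the earliest raster in the list and cells with any NaN input become NaN.

```python
import numpy as np

def extreme_position(rasters, lowest=False):
    """1-based position of the raster holding the max (or min) of each cell."""
    stack = np.stack([np.asarray(r, dtype="float64") for r in rasters])
    valid = ~np.isnan(stack).any(axis=0)
    # Fill invalid cells with a placeholder so argmax/argmin never see NaN.
    filled = np.where(valid, stack, 0.0)
    idx = np.argmin(filled, axis=0) if lowest else np.argmax(filled, axis=0)
    return np.where(valid, idx + 1.0, np.nan)

# Hypothetical 1x3 rasters
r1 = np.array([[0.1, 0.9, np.nan]])
r2 = np.array([[0.5, 0.2, 0.3]])
r3 = np.array([[0.3, 0.9, 0.1]])

highest_pos = extreme_position([r1, r2, r3])
lowest_pos = extreme_position([r1, r2, r3], lowest=True)
print(highest_pos, lowest_pos)
```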

## Less than Frequency

Let's count, for each cell, how many of the rasters are less than a given value.

In [None]:
def lesser_frequency(val_agg, agg_list):

    out = []
    in_aggs = [val_agg]
    for agg in agg_list:
        in_aggs.append(agg)

    # Iterate through each array simultaneously
    for v, a, b, c in np.nditer(in_aggs):
        count = 0
        if np.isnan((a, b, c)).any() == True:   # skip nan
            out.append(np.nan)
            continue
        if v > a:
            count += 1
        if v > b:
            count += 1
        if v > c:
            count += 1
        out.append(count)

    # create new array
    out = np.array(out)
    out = np.reshape(out, (-1, agg_list[0].shape[1]))
    out = xr.DataArray(out)
    return out

val_arr = np.full_like(ndvi_2011.data, 0.5)
val_agg = xr.DataArray(val_arr)
agg_list = [ndvi_2011, ndvi_2013, ndvi_2019]
lf = lesser_frequency(val_agg, agg_list)
shade(lf, cmap = Hot)

Here we see how many times a cell is less than 0.5 among the three rasters. Black represents no occurrences, dark red represents one, red represents two and white represents three.

## Lowest Position

Let's find the position of the raster with the minimum value.

In [None]:
def lowest_array(agg_list):
    # Cell-wise minimum across every raster in the list
    lowest = agg_list[0]
    for agg in agg_list[1:]:
        lowest = np.minimum(lowest, agg)

    out = []
    in_aggs = [lowest]
    for agg in agg_list:
        in_aggs.append(agg)

    for l, a, b, c in np.nditer(in_aggs):
        if np.isnan((a, b, c)).any() == True:
            out.append(np.nan)
            continue
        if l == a:
            out.append(1)
        elif l == b:
            out.append(2)
        else:
            out.append(3)

    out = np.array(out)
    out = np.reshape(out, (-1, agg_list[0].shape[1]))
    out = xr.DataArray(out)
    return(out)

agg_list = [ndvi_2011, ndvi_2013, ndvi_2019]
la = lowest_array(agg_list)
shade(la, cmap = Hot)


Here we see the position of the raster with the lowest value in each cell: 1 for 'ndvi_2011', 2 for 'ndvi_2013' and 3 for 'ndvi_2019'. With the `Hot` colormap, darker colors correspond to lower positions and lighter colors to higher ones.

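## Popularity

Let's find, for each cell, the value at a given level of popularity, where level 1 is the value that occurs most often across the rasters.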


In [None]:
def popularity(pop_agg, agg_list):

    out = []
    in_aggs = [pop_agg]
    for agg in agg_list:
        in_aggs.append(agg)

    for p, a, b, c in np.nditer(in_aggs):
        if np.isnan((a, b, c)).any() == True:   # skip nan
            out.append(np.nan)
            continue

        inputs = np.array([a, b, c])

        count_a = np.count_nonzero(inputs == a)
        count_b = np.count_nonzero(inputs == b)
        count_c = np.count_nonzero(inputs == c)
        counts = np.array([count_a, count_b, count_c])

        countsI = counts.argsort()
        sorted_inputs = inputs[countsI][::-1]
        sorted_counts = counts[countsI][::-1]

        first = 0
        second = 0
        third = 0

        if sorted_counts[0] == 1:
            out.append(np.nan)
            continue
        elif sorted_counts[0] == 2:
            first = sorted_inputs[0]
            second = sorted_inputs[2]
            third = np.nan
        elif sorted_counts[0] == 3:
            first = sorted_inputs[0]
            second = sorted_inputs[1]
            third = sorted_inputs[2]

        if p == 1:
            out.append(first)
        elif p == 2:
            out.append(second)
        elif p == 3:
            out.append(third)

    # create new array
    out = np.array(out)
    out = np.reshape(out, (-1, agg_list[0].shape[1]))
    out = xr.DataArray(out)
    return out

val_arr = np.full_like(ndvi_2011.data, 2)
val_agg = xr.DataArray(val_arr)
agg_list = [ndvi_2011, ndvi_2013, ndvi_2019]
p = popularity(val_agg, agg_list)
shade(p, cmap = Hot)

Here we see the 2nd most popular value of each cell, because `val_arr` is filled with 2. Cells in which all three rasters hold different values have no most popular value and are set to NaN.

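## Rank

Let's find, for each cell, the value at a given rank, where rank 1 is the smallest value across the rasters and rank 3 is the largest.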

In [None]:
def rank(val_agg, agg_list):
    out = []
    in_aggs = [val_agg]
    for agg in agg_list:
        in_aggs.append(agg)

    # Iterate through each array simultaneously
    for v, a, b, c in np.nditer(in_aggs):
        if np.isnan((a, b, c)).any() == True:   # skip nan
            out.append(np.nan)
            continue

        sort = np.sort((a, b, c))[::-1]
        if v == 1:
            out.append(sort[2])
        elif v == 3:
            out.append(sort[0])
        else:
            out.append(sort[1])
    
    # create new array
    out = np.array(out)
    out = np.reshape(out, (-1, agg_list[0].shape[1]))
    out = xr.DataArray(out)
    return out

val_arr = np.full_like(ndvi_2011.data, 3)
val_agg = xr.DataArray(val_arr)
agg_list = [ndvi_2011, ndvi_2013, ndvi_2019]
r = rank(val_agg, agg_list)
shade(r, cmap = Hot)

Here we see the value ranked 3rd, that is the largest value, in each cell, because `val_arr` is filled with 3. Where all three rasters hold the same value, every rank returns that value.
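Ranking can also be sketched with a single sort along the stacking axis. This is an illustration under stated assumptions rather than a replacement for the function above: `rank_value` is a hypothetical helper, it takes one scalar rank instead of a raster of ranks, and rank 1 is taken to be the smallest value, as in the loop version.

```python
import numpy as np

def rank_value(position, rasters):
    """Value at the given 1-based rank per cell (rank 1 = smallest)."""
    stack = np.stack([np.asarray(r, dtype="float64") for r in rasters])
    if not 1 <= position <= stack.shape[0]:
        raise ValueError("position must be between 1 and the number of rasters")
    ordered = np.sort(stack, axis=0)          # ascending along the raster axis
    out = ordered[position - 1].copy()
    # Match the loop version: a cell with any NaN input becomes NaN.
    out[np.isnan(stack).any(axis=0)] = np.nan
    return out

# Hypothetical 1x3 rasters
r1 = np.array([[0.1, 0.9, np.nan]])
r2 = np.array([[0.5, 0.2, 0.3]])
r3 = np.array([[0.3, 0.9, 0.1]])

smallest = rank_value(1, [r1, r2, r3])
largest = rank_value(3, [r1, r2, r3])
print(smallest, largest)
```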