## Working with Landsat 8 and NDVI

In this exercise, we will analyze Landsat 8 data. The layer we will
use is an ingested subset of the Landsat on AWS dataset, limited to
scenes from 2016 over the continental US with 30% or less cloud cover.

There are 3 objectives in this exercise:

- __Objective 1__: Cloud mask and mosaic images for your county and view it on the map.
- __Objective 2__: Find the time in the layer that has the highest average NDVI.
- __Objective 3__: View the NDVI over the county for that date (where data is available).

In [None]:
import geopyspark as gps
from pyspark import SparkContext
import numpy as np
from datetime import datetime
from shapely.geometry import mapping, shape
import pyproj
from shapely.ops import transform
from functools import partial
import urllib.request, json
from geonotebook.wrappers import TMSRasterData
from PIL import Image
import pandas as pd
import matplotlib.pyplot as plt

### Setup: State data and Spark initialization

The next two cells fetch the state and county geometries and start the Spark context.

In [None]:
# Grab data for New Mexico
state_name, county_name = "NM", "Colfax"
def get_state_shapes(state, county):
    # Reproject shapes from lat/lng (EPSG:4326) to Web Mercator (EPSG:3857)
    project = partial(
        pyproj.transform,
        pyproj.Proj(init='epsg:4326'),
        pyproj.Proj(init='epsg:3857'))

    state_url = "https://raw.githubusercontent.com/johan/world.geo.json/master/countries/USA/{}.geo.json".format(state)
    county_url = "https://raw.githubusercontent.com/johan/world.geo.json/master/countries/USA/{}/{}.geo.json".format(state,county)
    read_json = lambda url: json.loads(urllib.request.urlopen(url).read().decode("utf-8"))
    state_ll = shape(read_json(state_url)['features'][0]['geometry'])
    state_wm = transform(project, state_ll)
    county_ll = shape(read_json(county_url)['features'][0]['geometry'])
    county_wm = transform(project, county_ll)
    return (state_ll, state_wm, county_ll, county_wm)

(state_ll, state_wm, county_ll, county_wm) = get_state_shapes(state_name, county_name) 
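
To make the `_ll` versus `_wm` distinction concrete, here is a minimal sketch of the forward spherical Mercator formula that EPSG:3857 is based on. The helper name `lnglat_to_web_mercator` and the sample coordinate are hypothetical and only for illustration; the notebook itself delegates this work to `pyproj`.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator (EPSG:3857)

def lnglat_to_web_mercator(lng, lat):
    """Forward spherical Mercator projection (hypothetical helper, for illustration)."""
    x = EARTH_RADIUS_M * math.radians(lng)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y

# Illustrative point, roughly in northeastern New Mexico
x, y = lnglat_to_web_mercator(-104.6, 36.6)
print(x, y)
```

The outputs are in meters, which is why the `_wm` geometries have coordinates in the millions while the `_ll` geometries stay within degrees.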

In [None]:
# Set up our spark context
conf = gps.geopyspark_conf(appName="Exercise 1") \
          .setMaster("local[*]") \
          .set(key='spark.ui.enabled', value='true') \
          .set(key="spark.driver.memory", value="8G") \
          .set("spark.hadoop.yarn.timeline-service.enabled", False)
sc = SparkContext(conf=conf)

### Setup: Band names and color ramp

The ingested layers contain the blue, green, red, near-infrared, and QA bands of Landsat 8 data.
This dict maps band names to band indices, for more readable code.

We also define a color ramp for viewing NDVI data.


In [None]:
bands = { "Blue": 0,
          "Green": 1,
          "Red": 2,
          "NIR": 3,
          "QA": 4 }
ndvi_breaks_dict = {
    0.05: 0xffffe5aa, 0.1: 0xf7fcb9ff, 0.2: 0xd9f0a3ff,
    0.3: 0xaddd8eff, 0.4: 0x78c679ff, 0.5: 0x41ab5dff,
    0.6: 0x238443ff, 0.7: 0x006837ff, 1.0: 0x004529ff,
}
ndvi_color_map = gps.ColorMap.from_break_map(ndvi_breaks_dict)
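
As a rough mental model of how a break map like this assigns colors, the sketch below looks up the color for a value in pure Python. It assumes a "less than or equal to" boundary rule (each value takes the color of the first break that is greater than or equal to it). That rule, the helper `color_for`, and the shortened break map are assumptions for illustration, not a statement of how `gps.ColorMap` is implemented.

```python
from bisect import bisect_left

# A shortened copy of the notebook's break map, for illustration only
breaks = {0.05: 0xffffe5aa, 0.3: 0xaddd8eff, 0.7: 0x006837ff, 1.0: 0x004529ff}

def color_for(value, break_map):
    """Return the color of the first break >= value, or None above the last break.

    Hypothetical helper that mimics an assumed 'less than or equal to' rule.
    """
    keys = sorted(break_map)
    idx = bisect_left(keys, value)
    return break_map[keys[idx]] if idx < len(keys) else None

print(hex(color_for(0.25, breaks)))
```

Each color is a 32-bit RGBA integer, so the last two hex digits are the alpha channel.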

## Objective 1: Cloud mask and mosaic images for your county and view it on the map.

Query the layer for your county during the summer months (June through August, months 6 - 8). Mosaic the images together using the functions defined below, then show the mosaicked layer on the map.
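
The `mask_clouds` function in the next cell shifts the QA band right by 14 bits and masks pixels where the result equals 3. The sketch below shows that bit arithmetic on a few made-up QA values. It assumes a 16-bit QA band whose top two bits encode cloud confidence; the sample values are illustrative, not real scene data.

```python
import numpy as np

# Illustrative 16-bit QA values (not real scene data):
# top two bits 11 -> 3, 01 -> 1, 00 -> 0
qa = np.array([0xC000, 0x4000, 0x0001], dtype=np.uint16)
cloud = np.right_shift(qa, 14)   # keep only the top two bits
is_cloud = cloud == 3            # both bits set

band = np.array([1200, 1300, 1400], dtype=np.uint16)
masked = band.copy()
masked[is_cloud] = 0             # 0 doubles as the no-data value, as in mask_clouds
print(cloud, masked)
```

Unlike this sketch, `mask_clouds` writes the zeros into the tile's band arrays in place rather than into a copy.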

In [None]:
def mask_clouds(tile):
    # Use the Landsat QA band to mask out clouds: shift away the lower 14 bits
    # and treat pixels whose two remaining (top) bits are both set as cloud.
    qa = tile.cells[bands["QA"]]
    #cloud = np.bitwise_and(qa, 0x4000)
    #cirrus = np.bitwise_and(qa, 0x2000)
    cloud = np.right_shift(qa, 14)
    result_bands = []
    for band in tile.cells[:-1]:
        band[cloud == 3] = 0
        result_bands.append(band)
    return gps.Tile.from_numpy_array(np.array(result_bands), no_data_value=0)

def mosaic(tiles):
    # Mosaic by taking the youngest (most recent) pixel.
    sorted_tiles = sorted(list(tiles), key=lambda x: x[0], reverse=True)
    result = sorted_tiles[0][1].cells.copy()
    no_data_value = sorted_tiles[0][1].no_data_value
    for _, tile_to_merge in sorted_tiles[1:]:
        cells_to_merge = tile_to_merge.cells
        left_merge_condition = result[0] == no_data_value
        right_merge_condition = cells_to_merge[0] != tile_to_merge.no_data_value
        
        # We want to merge in data that is not already set
        # in the result (where all pixels are set to the no_data_value),
        # and where the incoming pixel represents data
        # (where any pixel does not equal the no_data_value)
        for band_idx in range(1, result.shape[0] - 1):
            left_merge_condition = left_merge_condition & \
                                   (result[band_idx] == no_data_value)
            right_merge_condition = right_merge_condition | \
                                    (cells_to_merge[band_idx] != tile_to_merge.no_data_value)
            
        result_bands = []
        for band_idx in range(0, result.shape[0]):
            band = result[band_idx]
            np.copyto(band, 
                      cells_to_merge[band_idx], 
                      where=(left_merge_condition) & \
                            (right_merge_condition))
            result_bands.append(band)
        result = np.array(result_bands)    

    return gps.Tile.from_numpy_array(result, no_data_value=no_data_value)

def render_image(tile):
    cells = tile.cells
    # Color correct - use magic numbers
    magic_min, magic_max = 4000, 15176
    norm_range = magic_max - magic_min
    cells = cells.astype('int32')
    # Clamp cells
    cells[(cells != 0) & (cells < magic_min)] = magic_min
    cells[(cells != 0) & (cells > magic_max)] = magic_max
    colored = ((cells - magic_min) * 255) / norm_range
    (r, g, b) = (colored[2], colored[1], colored[0])
    alpha = np.full(r.shape, 255)
    alpha[(cells[0] == tile.no_data_value) & \
          (cells[1] == tile.no_data_value) & \
          (cells[2] == tile.no_data_value)] = 0
    rgba = np.dstack([r,g,b, alpha]).astype('uint8')
    #return Image.fromarray(colored[1], mode='P')
    return Image.fromarray(rgba, mode='RGBA')
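
The merge rule in `mosaic` can be easier to follow on a single band. The sketch below is a simplified, hypothetical version (`merge_newest_first` is not part of the notebook) that sorts tiles newest first and fills only the pixels that are still no-data. It assumes 0 is the no-data value, as in `mask_clouds`.

```python
import numpy as np

NO_DATA = 0

def merge_newest_first(tiles):
    """tiles: list of (timestamp, 2D array); hypothetical single-band simplification."""
    ordered = sorted(tiles, key=lambda t: t[0], reverse=True)
    result = ordered[0][1].copy()
    for _, older in ordered[1:]:
        # Fill only where the result has no data and the older tile does
        fill = (result == NO_DATA) & (older != NO_DATA)
        np.copyto(result, older, where=fill)
    return result

newer = np.array([[5, 0], [0, 0]])
older = np.array([[9, 7], [0, 8]])
merged = merge_newest_first([(1, older), (2, newer)])
print(merged)
```

The top-left pixel keeps the newer value, and the remaining pixels are filled from the older tile wherever it has data.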

## Objective 2: Find the time in the layer that has the highest average NDVI.

Compute the NDVI values over your county for the summer months (don't forget to convert the cell type to floating point first!). View that time series in a matplotlib graph. Then use the date with the highest average NDVI value to filter the layer into a spatial layer, and paint the NDVI values on the map.
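
Independently of GeoPySpark, picking the date with the highest average NDVI comes down to a mean per date followed by a max. The sketch below shows that step on synthetic arrays; the values and the `ndvi_by_date` dict are made up for illustration, with NaN standing in for no-data pixels.

```python
import numpy as np
from datetime import datetime

# Synthetic per-date NDVI rasters, made up purely for illustration
ndvi_by_date = {
    datetime(2016, 6, 10): np.array([[0.2, 0.3], [np.nan, 0.1]]),
    datetime(2016, 7, 12): np.array([[0.5, 0.6], [0.4, np.nan]]),
    datetime(2016, 8, 13): np.array([[0.3, np.nan], [0.2, 0.4]]),
}

# Mean NDVI per date, ignoring NaN (no-data) pixels
mean_by_date = {d: float(np.nanmean(a)) for d, a in ndvi_by_date.items()}
best_date = max(mean_by_date, key=mean_by_date.get)
print(best_date, mean_by_date[best_date])
```

A dict of this shape can also be wrapped in a `pd.Series` and plotted to view the time series.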

Remember that NDVI is:

![ndvi eq](files/ndvi.png)
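
In case the image does not render, the same definition written out, with NIR and Red denoting the near-infrared and red band values (this matches the `compute_ndvi` function at the end of the notebook):

$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}}$$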

## Objective 3: View the NDVI over the county for that date (where data is available).

## Extra Credit: Mosaic over the county, taking each pixel that has the higher NDVI

Rewrite the mosaic function to always take the pixel with the higher NDVI value, and display that mosaic on the map. Use the NumPy version of NDVI provided below.

In [None]:
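def compute_ndvi(cells):
    # Cast to float so the arithmetic below is not done on integer cells
    cells = cells.astype(float)
    red = cells[bands["Red"]]
    ir = cells[bands["NIR"]]
    # NDVI = (NIR - Red) / (NIR + Red); pixels where both bands are 0 yield NaN
    return (ir - red) / (ir + red)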
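
The cast to float in `compute_ndvi` matters more than it might seem. The sketch below assumes the bands are stored as unsigned 16-bit integers (an assumption about the layer's cell type) and uses made-up values to show what happens when red exceeds near infrared.

```python
import numpy as np

# Illustrative unsigned 16-bit values (not real scene data)
red = np.array([3000], dtype=np.uint16)
nir = np.array([2000], dtype=np.uint16)

# Unsigned subtraction wraps around instead of going negative
wrapped = nir - red

# Casting first gives the expected negative NDVI
red_f, nir_f = red.astype(float), nir.astype(float)
ndvi_float = (nir_f - red_f) / (nir_f + red_f)
print(wrapped, ndvi_float)
```

Without the cast, the wrapped difference would produce an NDVI far outside the valid range of -1 to 1.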