## Grand Canyon elevation classification using the NASADEM dataset

In this tutorial, you'll learn how to use different classification methods with [xarray-spatial](https://github.com/makepath/xarray-spatial) to classify the terrain elevation levels of the Grand Canyon.

Geospatial data [classification](http://wiki.gis.com/wiki/index.php/Classification) algorithms assign groups of data to categories, or classes, for further processing. For example, classification is used to group data points into classes that are drawn as differently colored areas on a choropleth map. [xarray-spatial](https://github.com/makepath/xarray-spatial) is a raster analysis tool that includes several classification methods.

This tutorial walks you through:
1. Loading and rendering the area of interest data using the Grand Canyon's latitude and longitude.
2. Classifying the data using xarray-spatial's [natural breaks](https://xarray-spatial.org/reference/_autosummary/xrspatial.classify.natural_breaks.html), [equal interval](https://xarray-spatial.org/reference/_autosummary/xrspatial.classify.equal_interval.html), [quantile](https://xarray-spatial.org/reference/_autosummary/xrspatial.classify.quantile.html), and [reclassify](https://xarray-spatial.org/reference/_autosummary/xrspatial.classify.reclassify.html) functions.


This tutorial uses the [NASADEM](https://github.com/microsoft/AIforEarthDatasets#nasadem) dataset from the [Microsoft Planetary Computer Data Catalog](https://planetarycomputer.microsoft.com/catalog). The area of interest roughly covers Grand Canyon National Park. NASADEM provides global topographic data at 1 arc-second (~30 m) horizontal resolution. It is derived primarily from data captured by the [Shuttle Radar Topography Mission](https://www2.jpl.nasa.gov/srtm/) (SRTM) and is stored on Azure Storage in [cloud-optimized GeoTIFF](https://www.cogeo.org/) format.


### 1. Load the area of interest data

To load NASADEM data for the Grand Canyon, use the approach described in [Accessing NASADEM data on Azure (NetCDF)](https://github.com/microsoft/AIforEarthDataSets/blob/main/data/nasadem-nc.ipynb).

First, set up the necessary constants and generate a list of all available GeoTIFF files:

In [None]:
import requests

nasadem_blob_root = "https://nasademeuwest.blob.core.windows.net/nasadem-cog/v001/"
nasadem_file_index_url = nasadem_blob_root + "index/nasadem_cog_list.txt"
nasadem_content_extension = ".tif"
nasadem_file_prefix = "NASADEM_HGT_"
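# Download the index once; each line of the response is one GeoTIFF filename.
response = requests.get(nasadem_file_index_url)
response.raise_for_status()  # fail early if the index cannot be downloaded
nasadem_file_list = response.text.splitlines()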

Next, define a function that selects a filename from the list generated in the previous step. This function accepts a `[latitude, longitude]` pair and returns the name of the file that covers these coordinates, or `None` if no such file exists:

In [None]:
import math


def get_nasadem_filename(coord):
    """
    Get the NASADEM filename for a specified latitude and longitude.
    """
    lat = coord[0]
    lon = coord[1]

    ns_token = "n" if lat >= 0 else "s"
    ew_token = "e" if lon >= 0 else "w"

    lat_index = abs(math.floor(lat))
    lon_index = abs(math.floor(lon))

    lat_string = ns_token + "{:02d}".format(lat_index)
    lon_string = ew_token + "{:03d}".format(lon_index)

    filename = nasadem_file_prefix + lat_string + lon_string + nasadem_content_extension

    if filename not in nasadem_file_list:
        print("Lat/lon {},{} not available".format(lat, lon))
        filename = None

    return filename
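
The naming logic used by `get_nasadem_filename()` can be checked without a network connection. The following is a minimal sketch that repeats only the string-building step; `nasadem_tile_name` is a hypothetical helper introduced for illustration, and it skips the lookup against the file index. Because `floor()` is applied before `abs()`, a western longitude such as -112.1 maps to `w113`, which appears to follow the SRTM convention of naming a tile after its south-west corner.

```python
import math


def nasadem_tile_name(lat, lon, prefix="NASADEM_HGT_", extension=".tif"):
    """Build the filename of the 1x1 degree tile that contains (lat, lon)."""
    ns_token = "n" if lat >= 0 else "s"
    ew_token = "e" if lon >= 0 else "w"

    # floor() first, then abs(): -112.1 becomes -113 and then 113.
    lat_index = abs(math.floor(lat))
    lon_index = abs(math.floor(lon))

    return f"{prefix}{ns_token}{lat_index:02d}{ew_token}{lon_index:03d}{extension}"


grand_canyon_tile = nasadem_tile_name(36.101690, -112.107676)
print(grand_canyon_tile)  # NASADEM_HGT_n36w113.tif
```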

Finally, use the function defined above to generate a URL pointing to the geodata for the Grand Canyon area:

In [None]:
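grand_canyon_coord = [36.101690, -112.107676]

filename = get_nasadem_filename(grand_canyon_coord)
if filename is None:
    raise ValueError("No NASADEM file covers the Grand Canyon coordinates")
url = nasadem_blob_root + filename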

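With the URL for the Grand Canyon data in hand, use xarray's [open_rasterio](http://xarray.pydata.org/en/stable/generated/xarray.open_rasterio.html) function to load the data into an array. Note that `xarray.open_rasterio` was deprecated and later removed from xarray; on recent versions, `rioxarray.open_rasterio` is the usual replacement.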

In [None]:
import xarray as xr

img_arr = xr.open_rasterio(url).squeeze().drop("band")

In [None]:
img_arr.plot.imshow(figsize=(15, 10));

### 2. Classify elevation data

#### Classify with `natural_breaks()`

Use the [natural breaks](https://xarray-spatial.org/reference/_autosummary/xrspatial.classify.natural_breaks.html) function to classify data with the [Jenks natural breaks classification](http://wiki.gis.com/wiki/index.php/Jenks_Natural_Breaks_Classification) method. This method groups values into classes that follow the "natural" clusters in the data: it minimizes each value's deviation from its own class mean while maximizing the deviation between the means of different classes. Because it relies on such clusters, it is generally not recommended for data with low variance.

In [None]:
from datashader.transfer_functions import shade
import matplotlib.pyplot as plt
from xrspatial.classify import natural_breaks

# Fit 15 classes on a sample of 20,000 values rather than on every pixel.
natural_breaks_agg = natural_breaks(img_arr, num_sample=20000, k=15)

shade(natural_breaks_agg, cmap=plt.get_cmap("terrain"), how="linear")

In the image above, the elevation levels of the Grand Canyon are now represented by a limited number of distinct colors, each covering a range of values in the classified data. For example, dark blue areas mark the lowest elevations at around 700 m, yellow areas mark elevations of around 1700 m, and white areas mark the highest elevations at around 2500 m.
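
To make the idea behind natural breaks concrete, here is a small plain-Python sketch. It is not xarray-spatial's implementation: it assumes one common formulation of the Jenks objective (minimizing the within-class sum of squared deviations) and simply tries every possible split, which is only practical for a handful of values. The function names and the sample elevations are hypothetical.

```python
from itertools import combinations


def within_class_ssd(classes):
    """Sum of squared deviations from each class mean, over all classes."""
    total = 0.0
    for values in classes:
        mean = sum(values) / len(values)
        total += sum((v - mean) ** 2 for v in values)
    return total


def brute_force_breaks(values, k):
    """Try every way to cut the sorted values into k contiguous classes."""
    data = sorted(values)
    best_classes, best_score = None, float("inf")
    for cuts in combinations(range(1, len(data)), k - 1):
        edges = (0, *cuts, len(data))
        classes = [data[a:b] for a, b in zip(edges, edges[1:])]
        score = within_class_ssd(classes)
        if score < best_score:
            best_classes, best_score = classes, score
    return best_classes


# Hypothetical elevation samples in metres that form three clusters.
elevations = [700, 720, 750, 1650, 1700, 1720, 2450, 2500]
classes = brute_force_breaks(elevations, k=3)
print(classes)  # [[700, 720, 750], [1650, 1700, 1720], [2450, 2500]]
```

The breaks fall in the gaps between the clusters. When the values are evenly spread and no such gaps exist, the choice of breaks becomes much less meaningful.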

#### Classify with `equal_interval()`

To classify data into sets based on intervals of equal width, use the [equal interval](https://xarray-spatial.org/reference/_autosummary/xrspatial.classify.equal_interval.html) function. The [equal interval classification](http://wiki.gis.com/wiki/index.php/Equal_Interval_classification) is useful in cases where you want to emphasize the amount of an attribute value relative to the other values.

In [None]:
from xrspatial.classify import equal_interval

equal_interval_agg = equal_interval(img_arr, k=15)

shade(equal_interval_agg, cmap=plt.get_cmap("terrain"), how="linear")
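
The arithmetic behind equal-interval classification is simple enough to sketch in plain Python. The helper names below are hypothetical, the sample elevations are made up, and the sketch assumes that each bin includes its upper edge; it illustrates the method rather than reproducing xarray-spatial's code.

```python
from bisect import bisect_left


def equal_interval_edges(values, k):
    """Return k upper bin edges that split [min, max] into equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * (i + 1) for i in range(k)]


def assign_classes(values, edges):
    """Label each value with the index of the first edge that is >= the value."""
    last = len(edges) - 1
    # min() guards against floating-point round-off on the top edge.
    return [min(bisect_left(edges, v), last) for v in values]


# Hypothetical elevation samples in metres.
elevations = [700, 900, 1100, 1600, 2500]
edges = equal_interval_edges(elevations, k=3)
labels = assign_classes(elevations, edges)
print(edges)   # [1300.0, 1900.0, 2500.0]
print(labels)  # [0, 0, 0, 1, 2]
```

Every class spans 600 m here, but the classes hold three, one, and one values respectively: equal widths do not imply equal counts.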

#### Classify with `quantile()`

To classify data into quantile groups of equal size, use the [quantile](https://xarray-spatial.org/reference/_autosummary/xrspatial.classify.quantile.html) function. With [quantile classification](http://wiki.gis.com/wiki/index.php/Quantile), each class contains the same number of data points, so each class is equally represented on the map. However, the resulting intervals have uneven widths, which can over-weight outliers and cause other distortions.

In [None]:
from xrspatial.classify import quantile

quantile_agg = quantile(img_arr, k=15)

shade(quantile_agg, cmap=plt.get_cmap("terrain"), how="linear")
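
The trade-off of quantile classification, equal counts at the cost of uneven interval widths, shows up even on a tiny example. The sketch below uses only the Python standard library and made-up elevation samples; it is an illustration of the method, not xarray-spatial's implementation.

```python
from bisect import bisect_left
from collections import Counter
from statistics import quantiles

# Hypothetical elevation samples in metres: six low values and two outliers.
elevations = [700, 710, 720, 730, 740, 750, 1700, 2500]
k = 4

# k - 1 cut points that divide the sorted data into k groups of equal size.
cut_points = quantiles(elevations, n=k, method="inclusive")

# A value's class is the number of cut points that lie below it.
labels = [bisect_left(cut_points, v) for v in elevations]
counts = Counter(labels)

print(labels)                  # [0, 0, 1, 1, 2, 2, 3, 3]
print(sorted(counts.items()))  # [(0, 2), (1, 2), (2, 2), (3, 2)]
```

Each class holds two values, yet the first three classes together cover only 700 m to 750 m, while the last class stretches from 1700 m to 2500 m.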

#### Use custom bins with `reclassify()`

To classify data with your own arbitrary bins, use the [reclassify](https://xarray-spatial.org/reference/_autosummary/xrspatial.classify.reclassify.html) function. This function is helpful for highlighting specific sections of your data. The following example maps every bin with an upper edge of 2500 m or less to 0, so that only the highest elevations remain visible:

In [None]:
from xrspatial.classify import reclassify

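# Upper bin edges every 350 m: 150, 500, ..., 5050.
bins = range(150, 5200, 350)
# Keep the edge value for bins above 2500 m and map all lower bins to 0.
new_vals = [val if val > 2500 else 0 for val in bins]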

reclass_agg = reclassify(
    agg=img_arr,
    bins=bins,
    new_values=new_vals,
)

shade(reclass_agg, cmap=plt.get_cmap("terrain"), how="linear")
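
To see which elevations survive this mapping, the bin lookup can be sketched in plain Python. The sketch assumes that a value falls into the first bin whose upper edge is greater than or equal to it; `reclassify_value` is a hypothetical helper for illustration and not part of xarray-spatial.

```python
from bisect import bisect_left

bins = list(range(150, 5200, 350))
new_vals = [val if val > 2500 else 0 for val in bins]


def reclassify_value(value, bins, new_values):
    """Return the new value of the first bin whose upper edge is >= value."""
    index = bisect_left(bins, value)
    # Values above the last edge fall outside every bin.
    return new_values[index] if index < len(bins) else None


# Hypothetical elevation samples in metres.
samples = [700, 1700, 2200, 2300, 2600, 2700]
reclassified = [reclassify_value(v, bins, new_vals) for v in samples]
print(reclassified)  # [0, 0, 0, 2600, 2600, 2950]
```

Under this assumption, the first non-zero bin covers elevations above 2250 m up to 2600 m, so the effective cutoff is set by the bin edges rather than by the 2500 m threshold itself. Choosing bins with an edge at exactly 2500 would make the two coincide.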

## Next steps: classify different datasets

The [Microsoft Planetary Computer Data Catalog](https://planetarycomputer.microsoft.com/catalog) includes petabytes of environmental monitoring data. All datasets are available in consistent, analysis-ready formats. You can access them through APIs as well as directly via [Azure Storage](https://docs.microsoft.com/en-us/azure/storage/).

Try using [xarray-spatial's](https://xarray-spatial.org/index.html) classification methods with these datasets:

<div style="width: 100%; overflow: hidden;">
     <div style="width: 50%; float: left;"> 
  
  <center><img src="https://ai4edatasetspublicassets.blob.core.windows.net/assets/pc_thumbnails/additional_datasets/RE3CbUs.jpg" /></center>
<br>
<center><font size="5">Daymet</font></center>
<center><font size="2">Gridded temperature data across North America</font></center>
<center><a href="http://aka.ms/ai4edata-daymet" target="_blank">Get Daymet temperature data</a></center>
  </div>
     <div style="margin-left: 50%;"> 
  <center><img src="https://ai4edatasetspublicassets.blob.core.windows.net/assets/pc_thumbnails/additional_datasets/gbif.jpg" /></center>
<br>
<center><font size="5">GBIF</font></center>
<center><font size="2">Species occurrences shared through the Global Biodiversity Information Facility</font></center>
<center><a href="http://aka.ms/ai4edata-gbif/" target="_blank">Get GBIF occurrence data</a></center>
  </div>
</div>