# Data access with STAC

This section introduces [STAC](https://stacspec.org/), the SpatioTemporal Asset Catalog. STAC provides a standardized way to structure metadata about spatiotemporal data. The STAC community is building APIs and tools on top of this structure to make working with spatiotemporal data easier.

This notebook gives a brief introduction to the concepts defined by STAC, then moves on to using the STAC API to easily find and load data.

## Introduction to STAC

Users of STAC will interact most often with **Collections** and **Items** (there's also **Catalogs**, which group together collections). A Collection is just a collection of items, plus some additional metadata like the license and summaries of what's available on each item. You can view the collections available on the Planetary Computer at https://planetarycomputer.microsoft.com/catalog. There's also [STAC Index](https://stacindex.org/), which maintains a list of public catalogs.
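To make the relationship between Collections and Items concrete, here is a minimal sketch of what the two objects look like as plain JSON-style dictionaries. Every value (ids, coordinates, dates, license) is a made-up placeholder, and real Planetary Computer objects carry many more fields.

```python
import json

# Hypothetical, heavily trimmed Collection. All values are placeholders.
collection = {
    "type": "Collection",
    "stac_version": "1.0.0",
    "id": "example-collection",
    "description": "An illustrative collection.",
    "license": "proprietary",
    "extent": {
        "spatial": {"bbox": [[-180.0, -90.0, 180.0, 90.0]]},
        "temporal": {"interval": [["2020-01-01T00:00:00Z", None]]},
    },
    "links": [],
}

# Hypothetical Item belonging to that Collection.
# A STAC Item is a GeoJSON Feature with a few extra required fields.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-item",
    "collection": "example-collection",  # points back to the Collection id
    "geometry": {"type": "Point", "coordinates": [-58.5, -34.5]},
    "bbox": [-58.5, -34.5, -58.5, -34.5],
    "properties": {"datetime": "2020-06-01T12:00:00Z"},
    "assets": {},
    "links": [],
}

print(json.dumps(item, indent=2))
```

The Item refers to its Collection by id, which is how a catalog ties the two together.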

Let's load up the collection for Sentinel-2 Level 2-A, and compare it to the [HTML version](https://planetarycomputer.microsoft.com/dataset/sentinel-2-l2a) on the Planetary Computer website (which is generated from the STAC metadata).

In [None]:
import pystac

sentinel2 = pystac.read_file(
    "https://planetarycomputer.microsoft.com/api/stac/v1/collections/sentinel-2-l2a/"
)
sentinel2

In [None]:
print(sentinel2.description)

In [None]:
sentinel2.extent.spatial.to_dict()

In [None]:
sentinel2.extent.temporal.to_dict()

Now let's take a look at a specific item. In this case, we'll load an item that covers the area just north of Buenos Aires: https://planetarycomputer.microsoft.com/api/stac/v1/collections/sentinel-2-l2a/items/S2A_MSIL2A_20210811T135121_R024_T21HUC_20210812T044104

In [None]:
item = pystac.read_file(
    "https://planetarycomputer.microsoft.com/api/stac/v1/collections"
    "/sentinel-2-l2a/items/"
    "S2A_MSIL2A_20210811T135121_R024_T21HUC_20210812T044104"
)
item

The STAC item has a whole bunch of metadata about the Sentinel-2 scene.

### Exercise: What time was the scene captured?

In [None]:
%load solutions/stac-item-datetime.py
# What time was the scene captured?

### Exercise: What is the bounding box for the item?

In [None]:
%load solutions/stac-item-bbox.py
# What is the bounding box for the item?

The core STAC specification covers things that are common to pretty much any geospatial dataset. STAC is built to be extensible, to facilitate cataloging metadata that's specific to certain datasets or certain kinds of datasets.

For example, the `proj` extension allows you to get projection-related information, like the EPSG code:

In [None]:
from pystac.extensions.projection import ProjectionExtension

ProjectionExtension.ext(item).epsg

Or the [geo `transform`](https://github.com/stac-extensions/projection#projtransform):

In [None]:
ProjectionExtension.ext(item.assets["B03"]).transform
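Extension fields are stored alongside the core fields, with a prefix such as `proj:` naming the extension. The following is a small sketch that groups property names by that prefix. The `properties` dictionary and the `fields_by_extension` helper are hypothetical and are not part of pystac.

```python
from collections import defaultdict

# Hypothetical item properties. The values are placeholders.
properties = {
    "datetime": "2020-06-01T12:00:00Z",
    "proj:epsg": 32721,
    "proj:shape": [10980, 10980],
}


def fields_by_extension(props):
    """Group property names by extension prefix ("core" when there is none)."""
    groups = defaultdict(list)
    for name in props:
        prefix, sep, _ = name.partition(":")
        groups[prefix if sep else "core"].append(name)
    return dict(groups)


grouped = fields_by_extension(properties)
print(grouped)
```

Classes like `ProjectionExtension` give typed access to these same prefixed fields.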

### Exercise: How cloudy is the item?

Use `pystac.extensions.eo.EOExtension` to get the cloud cover. See https://pystac.readthedocs.io/en/latest/api.html?highlight=EOExtension#eoextension for help.

In [None]:
%load solutions/stac-item-cloud-cover.py
# How cloudy is the item? Use pystac.extensions.eo.EOExtension

You can view the full list of STAC extensions at <https://stac-extensions.github.io/>

### Assets

STAC is a *metadata* standard. It doesn't really deal with data files directly. Instead, it links to the data files under the "assets" property.

In [None]:
item.assets

Browsing through that list, most of the assets link to Cloud Optimized GeoTIFF (COG) files in Azure Blob Storage. There are a few other metadata-related assets, and some "preview"-related assets.
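One way to separate the data assets from the rest is to look at each asset's media type. The sketch below does this on a hypothetical, hand-written asset dictionary; the hrefs are placeholders, and the media type string is the one conventionally used for Cloud Optimized GeoTIFFs.

```python
COG_TYPE = "image/tiff; application=geotiff; profile=cloud-optimized"

# Hypothetical asset entries. The hrefs are placeholders, not real URLs.
assets = {
    "B02": {"href": "https://example.com/B02.tif", "type": COG_TYPE},
    "B03": {"href": "https://example.com/B03.tif", "type": COG_TYPE},
    "tilejson": {
        "href": "https://example.com/tilejson.json",
        "type": "application/json",
    },
}

# Keep only the keys whose asset is a Cloud Optimized GeoTIFF.
cog_keys = sorted(
    key for key, asset in assets.items() if asset.get("type") == COG_TYPE
)
print(cog_keys)
```

With a real `pystac.Item`, the same idea applies to `item.assets`, where each asset exposes its media type as `media_type`.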

We can use the "tilejson" URL, combined with `ipyleaflet` and the Item's geometry to quickly visualize the data.

In [None]:
from ipyleaflet import Map, TileLayer, GeoJSON, FullScreenControl
import shapely.geometry
import requests

center = shapely.geometry.shape(item.geometry).centroid.bounds[:2][::-1]

m = Map(center=center, zoom=12)
layer = TileLayer(
    url=requests.get(item.assets["tilejson"].href).json()["tiles"][0],
)
m.add_layer(layer)
m.add_control(FullScreenControl())

m.scroll_wheel_zoom = True
m

### Signing for the Planetary Computer

If you try to open any of the data assets (e.g. "B02" for the blue band) you'll get a 404 error.

In [None]:
import requests

r = requests.get(item.assets["B02"].href)
r.status_code

So while the STAC *metadata* is all public, *data* from the Planetary Computer is typically in private Blob Storage containers. You can still access it anonymously by *signing* the item / asset / href.

In [None]:
import rioxarray
import planetary_computer

ds = rioxarray.open_rasterio(planetary_computer.sign(item.assets["B02"].href))
ds

You can sign ItemCollections, Items, Assets, or URLs. Once a URL has been signed, it can be opened with any of your favorite tools (rioxarray, rasterio / GDAL, QGIS, etc.).
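Conceptually, signing appends a short-lived access token to the asset URL as query parameters. The sketch below shows only that URL manipulation. The `append_token` helper, the URL, and the token are all hypothetical placeholders; the real `planetary_computer.sign` also obtains the token for you, which is not shown here.

```python
from urllib.parse import urlsplit, urlunsplit


def append_token(href: str, token: str) -> str:
    """Append a query-string token to a URL, keeping any existing query."""
    parts = urlsplit(href)
    query = f"{parts.query}&{token}" if parts.query else token
    return urlunsplit(parts._replace(query=query))


# Placeholder URL and token, for illustration only.
href = "https://example.blob.core.windows.net/container/B02.tif"
token = "se=2030-01-01T00%3A00%3A00Z&sp=rl&sig=PLACEHOLDER"

signed = append_token(href, token)
print(signed)
```

Because the token expires, signed URLs are best generated shortly before the data is read rather than stored for later.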

## Querying the STAC API

In the examples we've seen so far, we've just been given a STAC item. How do you find the items you want in the first place? That's where a STAC **API** comes in.

A STAC API is some web service that accepts queries and returns STAC objects. The ability to handle queries is what differentiates a STAC API from a *static* STAC catalog, where items are just present on some file system.

![image.png](ms-stac.png)

Visit https://planetarycomputer.microsoft.com/api/stac/v1/docs for documentation on querying the STAC API directly with HTTP requests. Rather than writing those requests by hand, we'll use [pystac-client](https://pystac-client.readthedocs.io/en/latest/).
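For a sense of what happens on the wire, an item search is roughly a JSON document sent to the API's `/search` endpoint. The sketch below only builds that document and does not send it. The `limit` value is an arbitrary choice, and the timestamps are written out in full because the raw API expects RFC 3339 datetimes (pystac-client expands shorter dates for you).

```python
import json

# Search parameters as they might appear in the body of a request to /search.
search_body = {
    "collections": ["sentinel-2-l2a"],
    "bbox": [-58.92, -34.81, -58.26, -34.18],  # west, south, east, north
    "datetime": "2020-01-01T00:00:00Z/2020-12-31T23:59:59Z",
    "limit": 100,  # page size; an arbitrary choice for this sketch
}

payload = json.dumps(search_body)
print(payload)
```

The API responds with a GeoJSON FeatureCollection of matching Items, split across pages when there are many results.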

In [None]:
import pystac_client

bbox = [-58.92, -34.81, -58.26, -34.18]
date_range = "2020-01-01/2020-12-31"

catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1"
)
search = catalog.search(
    collections=["sentinel-2-l2a"],
    bbox=bbox,
    datetime=date_range,
)

In [None]:
%%time
items = list(search.get_all_items())

In [None]:
import shapely.geometry

item = items[0]
center = shapely.geometry.shape(item.geometry).centroid.bounds[:2][::-1]

m = Map(center=center, zoom=8)
layer = TileLayer(
    url=requests.get(item.assets["tilejson"].href).json()["tiles"][0],
)
m.add_layer(layer)
m.add_control(FullScreenControl())

m.scroll_wheel_zoom = True
m

Oops, that image is pretty cloudy. Let's check its cloud cover.

In [None]:
from pystac.extensions.eo import EOExtension

EOExtension.ext(item).cloud_cover

Fortunately, with the STAC API, it's possible to filter out images that are cloudier than some threshold.

In [None]:
import pystac_client

bbox = [-58.92, -34.81, -58.26, -34.18]
date_range = "2020-01-01/2020-12-31"

catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1"
)
low_cloud_search = catalog.search(
    collections=["sentinel-2-l2a"],
    bbox=bbox,
    datetime=date_range,
    query={"eo:cloud_cover": {"lt": 10}},  # Just return items with few clouds
)

In [None]:
%time items = list(low_cloud_search.get_all_items())

In [None]:
len(items)
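The `query` parameter runs this filter on the server, so cloudy items are never downloaded. For intuition, here is a sketch of the equivalent client-side filter over plain item dictionaries. The items and the `cloud_cover_below` helper are hypothetical, and treating a missing value as fully cloudy is a choice made for this sketch.

```python
def cloud_cover_below(items, threshold):
    """Keep items whose eo:cloud_cover is strictly below the threshold."""
    return [
        item
        for item in items
        # Items without the field are treated as 100% cloudy (an assumption).
        if item["properties"].get("eo:cloud_cover", 100.0) < threshold
    ]


# Hypothetical items with placeholder ids and cloud cover percentages.
example_items = [
    {"id": "a", "properties": {"eo:cloud_cover": 3.2}},
    {"id": "b", "properties": {"eo:cloud_cover": 47.0}},
    {"id": "c", "properties": {}},
]

clear_ids = [item["id"] for item in cloud_cover_below(example_items, 10)]
print(clear_ids)
```

Filtering on the server is usually preferable, since it reduces both the response size and the number of pages to fetch.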

Let's visualize the first of those items, along with our bounding box.

In [None]:
item = items[0]
center = shapely.geometry.shape(item.geometry).centroid.bounds[:2][::-1]

m = Map(center=center, zoom=9)
layer = TileLayer(
    url=requests.get(item.assets["tilejson"].href).json()["tiles"][0],
)
m.add_layer(layer)
m.add_control(FullScreenControl())
m.add_layer(GeoJSON(data=shapely.geometry.mapping(shapely.geometry.box(*bbox))))
m.scroll_wheel_zoom = True
m

Notice that there are still a few clouds near the top of the image, but overall there aren't too many.

Also notice that our bounding box isn't completely covered by the one Sentinel-2 scene. It seems like we'll need to combine multiple scenes to cover it. We'll do that next.
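A quick way to check coverage without a map is to compare bounding boxes. The sketch below tests whether one box fully contains another. The `contains` helper and the scene bounding box are hypothetical, and the check ignores boxes that cross the antimeridian.

```python
def contains(outer, inner):
    """True if bbox `outer` fully contains bbox `inner`.

    Both boxes are [west, south, east, north] in degrees.
    """
    return (
        outer[0] <= inner[0]
        and outer[1] <= inner[1]
        and outer[2] >= inner[2]
        and outer[3] >= inner[3]
    )


aoi = [-58.92, -34.81, -58.26, -34.18]
scene_bbox = [-59.2, -35.3, -58.0, -34.3]  # hypothetical scene extent

covered = contains(scene_bbox, aoi)
print(covered)
```

A scene's actual footprint can be smaller than its bounding box, so passing this test does not guarantee full coverage; comparing against the item geometry is more precise.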

## Loading STAC Items into xarray

Many remote sensing datasets are provided as collections of COGs. Each COG covers some area and was captured at a specific time in a specific wavelength band. It's very convenient to work with this as a 4-dimensional DataArray indexed by `(time, band, y, x)`. The `stackstac` library is a great way to build this type of DataArray.

In [None]:
signed_items = [planetary_computer.sign(item).to_dict() for item in items]

In [None]:
%%time
import stackstac

ds = stackstac.stack(signed_items, assets=["B02", "B03", "B04", "B08"])
ds

With `stackstac` we can go from a collection of STAC items to a DataArray in a single function call.

But look at the size of that DataArray! We aren't quite ready to tackle it yet, so let's narrow things down a bit. We will

* Crop the data down to our bounding box
* Downsample the data to 90m resolution (instead of the native 10m)

In [None]:
ds = stackstac.stack(
    signed_items, assets=["B02", "B03", "B04", "B08"], bounds_latlon=bbox, resolution=90
).where(lambda x: x > 0)  # filter out nodata
ds
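To see why the coarser resolution helps so much, here is the arithmetic as a short sketch. It assumes a full Sentinel-2 tile of 10980 x 10980 pixels at 10m; the cropped array will be smaller still.

```python
import math

native_res = 10  # metres per pixel
target_res = 90  # metres per pixel

# Each coarse pixel replaces a factor x factor block of native pixels.
factor = target_res // native_res
pixel_reduction = factor**2

# Assumed shape of one full tile at native resolution.
native_shape = (10980, 10980)
coarse_shape = tuple(math.ceil(n / factor) for n in native_shape)

print(factor, pixel_reduction, coarse_shape)
```

Cropping to the bounding box reduces the size further, because pixels outside the area of interest are never read.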

Let's spin up a local "cluster" so that we can watch the computation progress.

In [None]:
from distributed import Client

client = Client()
client

In [None]:
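# AOI spans two scene boundaries, so each time step covers only part of it.
# Forward-fill along time to fill pixels that a given scene doesn't cover.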
ds = ds.compute().ffill(dim="time")
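Forward-filling along `time` replaces each missing value with the most recent valid value at the same pixel. The sketch below shows the idea for a single pixel's time series in plain Python. The `forward_fill` helper and the values are hypothetical; xarray's `ffill` applies the same rule to every pixel.

```python
import math


def forward_fill(values):
    """Replace each NaN with the most recent non-NaN value before it."""
    filled = []
    last = math.nan  # nothing to fill from until a valid value is seen
    for value in values:
        if not math.isnan(value):
            last = value
        filled.append(last)
    return filled


nan = math.nan
series = [nan, 0.2, nan, nan, 0.5, nan]  # hypothetical values for one pixel
filled = forward_fill(series)
print(filled)
```

Leading gaps stay missing, because there is no earlier observation to copy from.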

So we just

1. Queried the millions of Sentinel-2 Level 2-A scenes for just ones matching our requirements (location, time, cloudiness) using the STAC API
2. Assembled the matching scenes into a 4-d DataArray using stackstac
3. Loaded those into memory in parallel using Dask

Now we're ready to do some analytics. Let's compute NDVI.

In [None]:
blue = ds.sel(band="B02")
green = ds.sel(band="B03")
red = ds.sel(band="B04")
nir = ds.sel(band="B08")

ndvi = (nir - red) / (nir + red)
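NDVI is the normalized difference between the near-infrared and red bands, so it always falls between -1 and 1. Here is a scalar sketch of the same formula. The reflectance values are made up, and returning NaN for a zero denominator is a choice made for this sketch.

```python
import math


def ndvi_value(nir, red):
    """Normalized difference of two band values for a single pixel."""
    total = nir + red
    if total == 0:
        return math.nan  # undefined when both bands are zero
    return (nir - red) / total


# Hypothetical reflectances: vegetation reflects strongly in the near-infrared.
vegetated = ndvi_value(0.5, 0.1)
bare = ndvi_value(0.3, 0.3)
print(vegetated, bare)
```

Dense, healthy vegetation pushes the value toward 1, while bare ground sits near 0 and water tends to be negative.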

We can plot a single scene:

In [None]:
ndvi[1].plot.imshow(figsize=(12, 9), cmap="YlGn");

Or the timeseries (averaged over latitude and longitude):

In [None]:
ts = ndvi.mean(dim=["y", "x"])
ts.plot()

In [None]:
low = ts.argmin().item()

ndvi.isel(time=low).plot.imshow(figsize=(12, 9), cmap="YlGn");

In [None]:
peak = ndvi.max(dim="time")
change = peak - ndvi.mean(dim="time")  # per pixel: peak NDVI minus mean NDVI over time

In [None]:
change.plot.imshow(figsize=(12, 9), cmap="RdYlGn", vmin=-0.65, vmax=0.65);

### Exercise: Plot the GNDVI

In [None]:
%load solutions/gndi.py

stackstac has a handy function for adding a DataArray to a map, which is nice for larger datasets:

In [None]:
m = stackstac.show(change, cmap="RdYlGn", range=(-0.65, 0.65))
m.add_control(FullScreenControl())
m.scroll_wheel_zoom = True
m