# STAC Intro
This notebook provides a simple example of how to create <b>SpatioTemporal Asset Catalog (STAC)</b> objects and use a STAC server to search a catalog.

## Requirements
To set up your environment for this notebook, use the conda environment file `environment.yml` in the notebooks directory.

You will also need Docker and docker-compose installed to run the STAC server Docker image. For information on how to set up Docker Compose, see https://docs.docker.com/compose/install/compose-desktop/

## STAC Objects
STAC uses JSON to describe geospatial information. For more information about STAC, see https://stacspec.org/en/about/. To learn more about the STAC specification, visit https://stacspec.org/en/about/stac-spec/
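Here is what that JSON looks like in practice. The following is a minimal sketch of a STAC 1.0.0 Item written as a plain Python dict, following the required fields listed in the Item specification. The id, coordinates, datetime, and asset href are placeholders, not values from this notebook's data.

```python
import json

# A hand-written STAC 1.0.0 Item. The id, geometry, datetime, and asset href
# are hypothetical placeholders, not data from the makepath-srtm bucket.
item = {
    "type": "Feature",  # STAC Items are GeoJSON Features
    "stac_version": "1.0.0",
    "id": "example-item",
    "geometry": {
        "type": "Polygon",
        # a closed ring: the first and last positions are identical
        "coordinates": [[[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]]],
    },
    "bbox": [0.0, 0.0, 1.0, 1.0],  # [min lon, min lat, max lon, max lat]
    "properties": {"datetime": "2022-07-28T00:00:00Z"},
    "links": [],
    "assets": {
        "data": {
            "href": "https://example.com/example.tif",
            "type": "image/tiff; application=geotiff",
        },
    },
}

print(json.dumps(item, indent=2))
```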



## PySTAC
In this notebook, we will be utilizing <b>PySTAC</b> (https://github.com/stac-utils/pystac) to create and inspect STAC objects.  As an example, we will use data from an S3 bucket to build a set of STAC items and create a catalog using PySTAC.

We will be creating a <b>collection</b> object to contain the SRTM data hosted on S3. A collection is very similar to a catalog, but it also carries metadata (such as spatial and temporal extent and license) describing a group of items. For more information on collections, see https://github.com/radiantearth/stac-spec/blob/master/collection-spec/collection-spec.md.

To do this, we will set up an S3 connection to the <b>makepath-srtm</b> bucket. We only need the bounds and coordinate reference system (CRS) of each file we add to the collection, so we don't need to download the files. Instead, rioxarray opens each raster on S3 and reads just its metadata. From the bounds we create a Polygon geometry, and both the bounds and the geometry are used to create an <b>item</b> for the raster. We then add the <b>asset</b> (the S3 URI of the raster) to the item, and add the item to the collection. We repeat this for every raster in the S3 bucket under our specified prefix.

In [1]:
import boto3
from datetime import datetime
import json
import os
import pystac
import rioxarray
import s3fs
from shapely.geometry import Polygon, mapping
import xarray


def get_bbox(raster_file):
    """Return the bounding box for the raster"""
    with rioxarray.open_rasterio(raster_file) as ds:
        return ds.rio.bounds()
    
def get_geometry(bbox):
    """Create a Polygon from the bounding box of a raster"""
    return mapping(Polygon([
            [bbox[0], bbox[1]],
            [bbox[0], bbox[3]],
            [bbox[2], bbox[3]],
            [bbox[2], bbox[1]]
        ]))

def get_crs(raster_file):
    """Return the coordinate reference system (crs) for the raster"""
    with rioxarray.open_rasterio(raster_file) as ds:
        return ds.rio.crs

# set up the connection to S3. Make sure you have AWS credentials configured!
s3_bucket_name = "makepath-srtm"
s3_prefix = "srtm-data/"
bucket = boto3.resource("s3").Bucket(s3_bucket_name)
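
The `get_geometry` helper above lists the corners clockwise and relies on shapely to close the ring. RFC 7946 (GeoJSON) recommends counterclockwise exterior rings, although parsers should not reject clockwise ones. The sketch below builds the same footprint with only the standard library. `bounds_to_geojson_polygon` is a hypothetical helper name. It assumes the bounds arrive in the `(minx, miny, maxx, maxy)` order that `rio.bounds()` returns. With shapely, `mapping(box(*bbox))` from `shapely.geometry` gives a counterclockwise rectangle as well.

```python
def bounds_to_geojson_polygon(bounds):
    """Build a closed GeoJSON Polygon from (minx, miny, maxx, maxy) bounds.

    The exterior ring runs counterclockwise, as RFC 7946 recommends,
    and repeats the first vertex at the end to close the ring.
    """
    minx, miny, maxx, maxy = bounds
    ring = [
        [minx, miny],  # lower left
        [maxx, miny],  # lower right
        [maxx, maxy],  # upper right
        [minx, maxy],  # upper left
        [minx, miny],  # back to the start to close the ring
    ]
    return {"type": "Polygon", "coordinates": [ring]}


# bounds of srtm-data/n39_w113_1arc_v2.tif, as printed later in this notebook
footprint = bounds_to_geojson_polygon(
    (-113.00013888888888, 38.9998611111111, -111.99986111111112, 40.000138888888884)
)
print(footprint["coordinates"][0])
```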

Now that we have defined some utility functions and set up our S3 connection, it is time to create the collection and start adding items and assets.

In [2]:
# create the collection which will contain all our STAC items
collection = pystac.Collection(
    id='makepath-srtm',
    description='makepath SRTM data.',
    extent=pystac.Extent(
        pystac.SpatialExtent([0, 0, 0, 0]),
        pystac.TemporalExtent([[datetime.utcnow(), datetime.utcnow()]]),
    )
)

fs = s3fs.S3FileSystem(anon=False)

# add each .tif file under the prefix to the collection as an item with a raster asset
for obj in bucket.objects.filter(Delimiter="/", Prefix=s3_prefix):
    file_name = obj.key
    if file_name.lower().endswith(".tif"):
        print(file_name)
        s3_path = f"s3://{s3_bucket_name}/{file_name}"
        # rioxarray opens the raster lazily, so only its metadata (bounds, CRS)
        # is read from S3; the pixel data is not downloaded
        bbox = get_bbox(s3_path)
        # when creating the item, the id can not contain any slashes
        # as this will cause issues accessing the item via the REST API
        item = pystac.Item(
            id=file_name.split(".")[0].replace("/", "-"),
            geometry=get_geometry(bbox),
            bbox=list(bbox),
            datetime=datetime.utcnow(),
            properties={
                "crs": str(get_crs(s3_path)),
            },
        )
        # add the asset too
        item.add_asset(
            key="raster-file",
            asset=pystac.Asset(
                href=s3_path, 
                media_type=pystac.MediaType.GEOTIFF,
            ),
        )
        collection.add_item(item)

srtm-data/n39_w113_1arc_v2.tif
srtm-data/n39_w114_1arc_v2.tif
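
The item id above is derived with `file_name.split(".")[0]`. That cuts the key at its first dot anywhere, so a hypothetical key such as `srtm-data/tile.v2.tif` would lose `.v2`. Here is a sketch of an alternative that uses `pathlib` to strip only the final suffix. `item_id_from_key` is a hypothetical helper name.

```python
from pathlib import PurePosixPath


def item_id_from_key(key: str) -> str:
    """Derive a slash-free STAC item id from an S3 object key.

    Only the last suffix (e.g. ".tif") is removed, so dots elsewhere in
    the key survive; slashes become dashes so the id is URL-path safe.
    """
    stem_path = PurePosixPath(key).with_suffix("")
    return str(stem_path).replace("/", "-")


print(item_id_from_key("srtm-data/n39_w113_1arc_v2.tif"))
```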


In [3]:
# we can check the collection by describing it
collection.describe()

* <Collection id=makepath-srtm>
  * <Item id=srtm-data-n39_w113_1arc_v2>
  * <Item id=srtm-data-n39_w114_1arc_v2>
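
The collection was created with a placeholder extent (`[0, 0, 0, 0]` and the current time), and nothing in the loop updates it. PySTAC's `Collection.update_extent_from_items()` recomputes the extent from the items. The spatial part of that calculation is the union of the item bounding boxes, sketched below in plain Python. `union_bbox` is a hypothetical helper, and it does not handle boxes that cross the antimeridian.

```python
from typing import Iterable, List, Sequence


def union_bbox(bboxes: Iterable[Sequence[float]]) -> List[float]:
    """Smallest 2D bbox [minx, miny, maxx, maxy] that contains every input bbox."""
    boxes = list(bboxes)
    if not boxes:
        raise ValueError("need at least one bbox")
    return [
        min(b[0] for b in boxes),  # westernmost edge
        min(b[1] for b in boxes),  # southernmost edge
        max(b[2] for b in boxes),  # easternmost edge
        max(b[3] for b in boxes),  # northernmost edge
    ]
```

Assuming the collection built above, the result could be applied with `collection.extent.spatial = pystac.SpatialExtent([union_bbox(i.bbox for i in collection.get_items())])`.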


In [4]:
# we can inspect what an item looks like by getting one and printing the json dump
item = next(iter(collection.get_items()))
print(json.dumps(item.to_dict(), indent=4))

{
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "srtm-data-n39_w113_1arc_v2",
    "properties": {
        "crs": "EPSG:4326",
        "datetime": "2022-07-28T15:19:23.411121Z"
    },
    "geometry": {
        "type": "Polygon",
        "coordinates": [
            [
                [
                    -113.00013888888888,
                    38.9998611111111
                ],
                [
                    -113.00013888888888,
                    40.000138888888884
                ],
                [
                    -111.99986111111112,
                    40.000138888888884
                ],
                [
                    -111.99986111111112,
                    38.9998611111111
                ],
                [
                    -113.00013888888888,
                    38.9998611111111
                ]
            ]
        ]
    },
    "links": [
        {
            "rel": "root",
            "href": null,
            "type": "applicati

### Normalizing Item HREFs
Now that we have a collection set up with a set of items and assets, we need to ensure that the <b>HREFs</b> within the collection and items are correct. PySTAC provides a utility method, `normalize_hrefs`, for this. Typically it is used to set where the catalog, collection, and item JSON files will be saved on disk. However, since our data is hosted in an S3 bucket, we will use it to set the references to reflect the remote storage location.

By default, PySTAC places each item's JSON file in its own subdirectory. Since all of our rasters are under a single S3 bucket prefix, we want the item JSON files in one folder (here, the bucket root) rather than in per-item subdirectories. To do this, we use a <b>TemplateLayoutStrategy</b> with an empty item template (as shown below). We also want the catalog HREFs to be absolute, so we set the collection's catalog type (remember, collections are just catalogs!) to <b>ABSOLUTE_PUBLISHED</b>. Once this is done, the item HREFs are set up correctly as S3 URIs. The asset HREFs were already absolute S3 URIs and are unchanged.
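The sketch below makes the effect of the layout strategy concrete. It models where an item's JSON file ends up under a per-item-subdirectory layout versus an empty item template. It is a simplified illustration, not PySTAC's implementation. The function names are hypothetical, and the root `s3://makepath-srtm` is taken from the cell below.

```python
import posixpath


def per_item_dir_href(root_dir: str, item_id: str) -> str:
    """Simplified default layout: <root>/<item id>/<item id>.json."""
    return posixpath.join(root_dir, item_id, f"{item_id}.json")


def flat_item_href(root_dir: str, item_id: str) -> str:
    """Simplified empty-template layout: <root>/<item id>.json."""
    return posixpath.join(root_dir, f"{item_id}.json")


root = "s3://makepath-srtm"
print(per_item_dir_href(root, "srtm-data-n39_w113_1arc_v2"))
print(flat_item_href(root, "srtm-data-n39_w113_1arc_v2"))
```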

In [5]:
# normalize all hrefs to the s3 bucket uri
from pystac.layout import TemplateLayoutStrategy

collection.catalog_type = pystac.CatalogType.ABSOLUTE_PUBLISHED
# an empty item template places item JSON files directly under the root href
strategy = TemplateLayoutStrategy(item_template="")
collection.normalize_hrefs(f"s3://{s3_bucket_name}", strategy=strategy)
item = next(iter(collection.get_items()))
print(json.dumps(item.to_dict(), indent=4))

{
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "srtm-data-n39_w113_1arc_v2",
    "properties": {
        "crs": "EPSG:4326",
        "datetime": "2022-07-28T15:19:23.411121Z"
    },
    "geometry": {
        "type": "Polygon",
        "coordinates": [
            [
                [
                    -113.00013888888888,
                    38.9998611111111
                ],
                [
                    -113.00013888888888,
                    40.000138888888884
                ],
                [
                    -111.99986111111112,
                    40.000138888888884
                ],
                [
                    -111.99986111111112,
                    38.9998611111111
                ],
                [
                    -113.00013888888888,
                    38.9998611111111
                ]
            ]
        ]
    },
    "links": [
        {
            "rel": "root",
            "href": "s3://makepath-srtm/collection.json"

## STAC-FastAPI
In this notebook, we will be using <b>STAC-FastAPI</b> (https://github.com/stac-utils/stac-fastapi) to host an OpenAPI-compliant STAC server based on FastAPI. This provides a set of RESTful endpoints that we can use to host STAC objects. We can also use the API to search for STAC items using metadata such as a bounding box.

### Setup
We will clone the stac-fastapi GitHub repo and run the server locally using the Docker setup in the repo. <b><u>In a new terminal (outside the notebook)</u></b>, use the following commands to clone the repo to any location, then build and run the stac-fastapi project:

`git clone git@github.com:stac-utils/stac-fastapi.git`<br>
`cd stac-fastapi`<br>
`make image`<br>
`make docker-run-pgstac`<br>

With the Docker containers running, stac-fastapi is now accessible at http://localhost:8080.
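The containers can take a little while to start. Below is a minimal sketch that polls the API landing page until it answers with JSON, using only the standard library. `wait_for_stac_api` is a hypothetical helper, and the default URL assumes the port above.

```python
import json
import time
import urllib.request


def wait_for_stac_api(base_url: str = "http://localhost:8080",
                      timeout: float = 60.0,
                      interval: float = 2.0) -> dict:
    """Poll the API landing page until it returns JSON, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(base_url, timeout=interval) as resp:
                # the landing page of a STAC API is itself a JSON catalog
                return json.load(resp)
        except (OSError, ValueError):
            # OSError covers refused connections, HTTP errors and socket timeouts;
            # ValueError covers a body that is not valid JSON yet
            if time.monotonic() >= deadline:
                raise TimeoutError(f"no STAC API response from {base_url} within {timeout} s")
            time.sleep(interval)
```

Once the server is up, `wait_for_stac_api()["title"]` should report the API's title.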

### Optional Database Inspection
This step is <b><u>entirely optional</u></b>, but anyone interested in inspecting the data can connect to the PostGIS backend database using psql:
`psql -h 127.0.0.1 -p 5439 -U username -d postgis` with password: `password`.

Once connected to the database, the schema must be set to `pgstac` (`set schema 'pgstac';`).

## STAC Server Data Ingestion
Now that we have created a collection and started a server, we need the STAC server to ingest the new data. To do this, we will write a simple Python script that uses the STAC-FastAPI REST endpoints.

In [10]:
from urllib.parse import urljoin

import requests


app_host = "http://localhost:8080"


def post_or_put(url: str, data: dict):
    """POST data to url; if the object already exists (409), retry as a PUT."""
    r = requests.post(url, json=data)
    if r.status_code == 409:
        # Exists, so update
        r = requests.put(url, json=data)
        # Unchanged may throw a 404
        if r.status_code != 404:
            r.raise_for_status()
    else:
        r.raise_for_status()


collection_dict = collection.to_dict()
post_or_put(urljoin(app_host, "/collections"), collection_dict)

for item in collection.get_items():
    item_dict = item.to_dict()
    # drop the stac_extensions list before posting; pop() avoids a KeyError if it is absent
    item_dict.pop("stac_extensions", None)
    post_or_put(urljoin(app_host, f"collections/{collection_dict['id']}/items"), item_dict)
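
The `post_or_put` helper retries a rejected POST as a PUT to the same URL. For items, however, the STAC API Transaction extension defines updates at the item's own URL, `PUT /collections/{collectionId}/items/{itemId}`, so a PUT to the items endpoint may not update anything. The following is a hedged alternative for items. It assumes your stac-fastapi version exposes that item-level endpoint. `upsert_item` is a hypothetical name, and `session` can be any object with requests-style `post`/`put` methods, such as `requests.Session()`.

```python
from urllib.parse import urljoin


def upsert_item(session, app_host: str, collection_id: str, item_dict: dict) -> int:
    """Create an item, or replace it if it already exists; return the final HTTP status.

    Assumes the server implements the Transaction extension's
    PUT /collections/{collectionId}/items/{itemId} endpoint.
    """
    items_url = urljoin(app_host, f"/collections/{collection_id}/items")
    r = session.post(items_url, json=item_dict, timeout=30)
    if r.status_code == 409:
        # the item already exists: replace it at its own URL
        r = session.put(f"{items_url}/{item_dict['id']}", json=item_dict, timeout=30)
    r.raise_for_status()
    return r.status_code
```

With `requests` installed, the loop above could call `upsert_item(session, app_host, collection.id, item_dict)` inside a `with requests.Session() as session:` block.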

## PySTAC-Client
In this notebook, we will be utilizing <b>PySTAC-Client</b> (https://github.com/stac-utils/pystac-client) to access catalogs via the STAC server's RESTful API.  Now that we have some data uploaded to our local STAC server running in the docker container, we can use the PySTAC-Client to access it and execute searches.

In [6]:
from pystac_client import Client
api = Client.open('http://localhost:8080')
api.title

'stac-fastapi'

In [7]:
print([x for x in api.get_all_collections()])
print([x for x in api.get_all_items()])

[<CollectionClient id=makepath-srtm>]
[<Item id=srtm-data-n39_w114_1arc_v2>, <Item id=srtm-data-n39_w113_1arc_v2>]


In [8]:
results = api.search(max_items=5, bbox=[
    -113.00013888888888,
    38.9998611111111,
    -111.99986111111112,
    40.000138888888884]
)
for x in results.items():
    print(x.assets)

{'raster-file': <Asset href=s3://makepath-srtm/srtm-data/n39_w114_1arc_v2.tif>}
{'raster-file': <Asset href=s3://makepath-srtm/srtm-data/n39_w113_1arc_v2.tif>}
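
The search returned both items, even though the query bbox is only the footprint of `n39_w113`. A STAC API search matches items whose geometry intersects the query bbox. Because the footprints here are rectangles, a bbox overlap test shows the idea. The sketch assumes the `n39_w114` tile extends half a pixel past -113° in the same way `n39_w113` extends past its own edges (per the bounds printed earlier), so the two footprints share a thin sliver. That tile's bounds are not printed in this notebook. `bboxes_intersect` is a hypothetical helper.

```python
def bboxes_intersect(a, b) -> bool:
    """True if two 2D bboxes [minx, miny, maxx, maxy] overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


search_bbox = [-113.00013888888888, 38.9998611111111, -111.99986111111112, 40.000138888888884]
# assumed bounds of the neighbouring n39_w114 tile (not printed in this notebook)
neighbour_bbox = [-114.00013888888888, 38.9998611111111, -112.99986111111112, 40.000138888888884]
print(bboxes_intersect(search_bbox, neighbour_bbox))
```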


## STAC Extensions
Extensions add new fields or semantics to objects.  For a list of STAC extensions, see https://stac-extensions.github.io/.

For a simple example of how extensions can be used, see https://pystac.readthedocs.io/en/stable/quickstart.html#STAC-Extensions
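
Here is one example of an extension. The `crs` property used in this notebook is ad hoc, while the Projection extension defines a standard field for the same information. The sketch below works on a plain item dict and assumes Projection extension v1.0.0, which defines `proj:epsg`. Newer versions of the extension replace it with `proj:code`, so check the version you target. PySTAC also offers `pystac.extensions.projection.ProjectionExtension` for this. Note that the ingestion cell above removes `stac_extensions`, which would drop this declaration. `add_projection_epsg` is a hypothetical helper.

```python
PROJECTION_SCHEMA = "https://stac-extensions.github.io/projection/v1.0.0/schema.json"


def add_projection_epsg(item_dict: dict, epsg: int) -> dict:
    """Return a copy of an item dict that records its CRS with the Projection extension."""
    out = dict(item_dict)
    # copy nested containers so the input dict is left untouched
    out["properties"] = dict(item_dict.get("properties", {}))
    out["properties"]["proj:epsg"] = epsg
    extensions = list(item_dict.get("stac_extensions", []))
    if PROJECTION_SCHEMA not in extensions:
        # declare the extension so validators know which schema to apply
        extensions.append(PROJECTION_SCHEMA)
    out["stac_extensions"] = extensions
    return out
```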


## Additional Resources
- List of tools and resources: https://stacspec.org/en/about/tools-resources/
- STAC utils github: https://github.com/stac-utils
- STAC spec github: https://github.com/radiantearth/stac-spec
- STAC best practices: https://github.com/radiantearth/stac-spec/blob/master/best-practices.md
- PySTAC docs: https://pystac.readthedocs.io/en/stable/api/pystac.html#module-pystac