# STAC specification

The **SpatioTemporal Asset Catalog (STAC)** is an emerging open standard that aims to increase the interoperability of geospatial data, particularly satellite imagery.
[Many major data archives](https://stacspec.org/en/about/datasets/) now follow the STAC specification.

In this lesson we'll be working with the [Microsoft Planetary Computer (MPC)](https://planetarycomputer.microsoft.com) STAC API.

## MPC Catalog 
First, load the necessary packages:

In [2]:
import numpy as np
import matplotlib.pyplot as plt
import geopandas as gpd
import rioxarray as rioxr
from shapely.geometry import Polygon

from pystac_client import Client  # To access STAC catalogs

import planetary_computer  # API for the Planetary Computer

from IPython.display import Image  # To nicely display images

### Access
We use the `Client.open()` method from the `pystac_client` package to access the catalog:

In [3]:
# Access MPC catalog
catalog = Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)

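The `modifier` parameter is needed to access the data in the MPC catalog: `planetary_computer.sign_inplace` appends a short-lived access token (a SAS token) to the asset URLs of every item the client returns, which the MPC storage requires before it will serve the data files.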

### Catalog Exploration


In [4]:
# Explore catalog metadata
print('Title: ', catalog.title)
print('Description: ', catalog.description)

Title:  Microsoft Planetary Computer STAC API
Description:  Searchable spatiotemporal metadata describing Earth science datasets hosted by the Microsoft Planetary Computer


We can access its collections by using the `get_collections` method:

In [5]:
catalog.get_collections()

<generator object Client.get_collections at 0x1505bbab0>

The output is a **generator**:

- a special kind of **lazy object** in Python that you can iterate over like a list
- the items in a generator don't exist in memory until you explicitly iterate over them or convert them to a list
- this allows for more efficient memory use
- once the generator has been iterated over completely, it cannot be reused unless it is recreated.
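The lazy, single-use behaviour described above can be seen with a plain Python generator. A minimal sketch: `fake_collections` is a hypothetical stand-in that yields a few collection IDs, not part of `pystac_client`.

```python
def fake_collections():
    """Hypothetical stand-in for catalog.get_collections(): yields IDs one at a time."""
    for collection_id in ["naip", "gridmet", "fia"]:
        print(f"producing {collection_id}")  # runs only when the next item is requested
        yield collection_id


gen = fake_collections()  # nothing is produced yet: the function body has not started
first = next(gen)         # prints "producing naip" and returns "naip"
rest = list(gen)          # consumes the remaining items
again = list(gen)         # the generator is now exhausted, so this is an empty list
print(first, rest, again)

# To iterate again, the generator has to be recreated
fresh = list(fake_collections())
```

Because a generator has no length and cannot be indexed, `catalog.get_collections()` is typically converted with `list()` before calling `len()` or selecting items by position.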

In [6]:
# Get collections and print their names
collections = list(catalog.get_collections())  # Turn generator into a list

print('Number of collections: ', len(collections))

print('Collection IDs (first 10):')
for collection in collections[:10]:
    print('-', collection.id)

Number of collections:  126
Collection IDs (first 10):
- daymet-annual-pr
- daymet-daily-hi
- 3dep-seamless
- 3dep-lidar-dsm
- fia
- gridmet
- daymet-annual-na
- daymet-monthly-na
- daymet-annual-hi
- daymet-monthly-hi


## Collection

Select a single collection using the `get_child()` catalog method, passing the collection ID as the parameter:

In [7]:
# Access NAIP collection
naip_collection = catalog.get_child('naip')
naip_collection

## Catalog search

We can narrow down the search within the `catalog` by specifying a time range, an area of interest, and the collection ID.

Two simple ways to define the area of interest (AOI):
- as a GeoJSON-type dictionary with the coordinates of the AOI bounding box
- as a list `[xmin, ymin, xmax, ymax]` with the coordinates of the lower-left and upper-right corners of the bounding box.
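Both AOI forms can describe the same rectangle. A minimal sketch, assuming longitude/latitude coordinates, that builds the GeoJSON form from the list form (`bbox_to_geojson` and the `aoi` values are hypothetical, chosen for illustration). In `pystac_client`, the list form is normally passed to `catalog.search()` through the `bbox` parameter and the GeoJSON form through `intersects`.

```python
def bbox_to_geojson(bbox):
    """Build a GeoJSON Polygon dict from a [xmin, ymin, xmax, ymax] list (hypothetical helper)."""
    xmin, ymin, xmax, ymax = bbox
    ring = [
        [xmin, ymin],  # lower-left
        [xmax, ymin],  # lower-right
        [xmax, ymax],  # upper-right
        [xmin, ymax],  # upper-left
        [xmin, ymin],  # GeoJSON rings must be closed: repeat the first point
    ]
    return {"type": "Polygon", "coordinates": [ring]}


# Hypothetical AOI near Santa Barbara, as [xmin, ymin, xmax, ymax] in lon/lat
aoi = [-119.71, 34.42, -119.70, 34.43]
aoi_geojson = bbox_to_geojson(aoi)
print(aoi_geojson)
```

Walking counterclockwise from the lower-left corner matches the ring orientation that RFC 7946 recommends for exterior rings.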

Goal: retrieve NAIP scenes of Santa Barbara from 2018 to 2023.

In [9]:
# NCEAS bounding box (as a GeoJSON)
bbox = {
    "type": "Polygon",
    "coordinates": [
        [
            [-119.70608227128903, 34.426300194372274],
            [-119.70608227128903, 34.42041139020533],
            [-119.6967885126002, 34.42041139020533],
            [-119.6967885126002, 34.426300194372274],
            [-119.70608227128903, 34.426300194372274]
        ]
    ],
}

# Temporal range of interest
time_range = "2018-01-01/2023-01-01"

# Catalog search
search = catalog.search(
    collections=['naip'],
    intersects=bbox,
    datetime=time_range,
)

search

<pystac_client.item_search.ItemSearch at 0x15157f5d0>
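Going the other way, the NCEAS polygon used in the search can be reduced to the list form by taking the minimum and maximum longitude and latitude of its ring. A minimal sketch in plain Python; because this polygon is a rectangle aligned with the longitude/latitude axes, passing the resulting list as `bbox=` should cover the same area as `intersects=bbox`, but that equivalence is an assumption and is not run in this lesson.

```python
# Same NCEAS polygon as in the search cell above
nceas_aoi = {
    "type": "Polygon",
    "coordinates": [
        [
            [-119.70608227128903, 34.426300194372274],
            [-119.70608227128903, 34.42041139020533],
            [-119.6967885126002, 34.42041139020533],
            [-119.6967885126002, 34.426300194372274],
            [-119.70608227128903, 34.426300194372274],
        ]
    ],
}

# The exterior ring is the first (and here only) ring of the polygon
ring = nceas_aoi["coordinates"][0]
lons = [lon for lon, lat in ring]
lats = [lat for lon, lat in ring]

# List form: [xmin, ymin, xmax, ymax]
nceas_bbox = [min(lons), min(lats), max(lons), max(lats)]
print(nceas_bbox)

# Assumed equivalent search (not run here): pass the list to `bbox` instead of `intersects`
# search = catalog.search(collections=["naip"], bbox=nceas_bbox, datetime="2018-01-01/2023-01-01")
```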

To get the items found in the search, use the `item_collection()` method:

In [10]:
items = search.item_collection()
len(items)

3

In [11]:
items

## Item

In [12]:
# Get the first item in the catalog search
item = items[0]
type(item)

pystac.item.Item

The STAC item is the core object in a STAC catalog.

The item **does not contain the data itself**: it has properties (metadata) and assets (links to the actual data).
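To make the properties/assets split concrete, here is a trimmed STAC Item written as a plain dictionary. This is a sketch only: the top-level field names (`type`, `stac_version`, `id`, `geometry`, `bbox`, `properties`, `links`, `assets`) follow the STAC Item specification, but the ID, coordinates, asset key `image`, and URL are hypothetical.

```python
# A trimmed, hypothetical STAC Item as a plain dictionary (values and URL are made up)
item_dict = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-item",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[-119.72, 34.40], [-119.71, 34.40], [-119.71, 34.41],
                         [-119.72, 34.41], [-119.72, 34.40]]],
    },
    "bbox": [-119.72, 34.40, -119.71, 34.41],
    "properties": {"datetime": "2022-05-13T16:00:00Z", "gsd": 0.6},  # metadata only
    "links": [],
    "assets": {
        "image": {  # hypothetical asset key
            "href": "https://example.com/data/example-item.tif",  # link to the file, not the pixels
            "type": "image/tiff; application=geotiff",
            "roles": ["data"],
        }
    },
}

# Metadata lives in "properties"...
print("acquired:", item_dict["properties"]["datetime"])

# ...while "assets" only hold links (hrefs) to the data files
hrefs = {key: asset["href"] for key, asset in item_dict["assets"].items()}
for key, href in hrefs.items():
    print(key, "->", href)
```

A `pystac.Item` exposes the same two parts as `item.properties` (a dictionary) and `item.assets` (a dictionary of `Asset` objects, each with an `href` attribute); reading pixel values requires opening one of those hrefs, for example with `rioxr.open_rasterio()`.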

In [13]:
# Print item ID and properties
print('ID', item.id)
item.properties

ID ca_m_3411935_sw_11_060_20220513


{'gsd': 0.6,
 'datetime': '2022-05-13T16:00:00Z',
 'naip:year': '2022',
 'proj:bbox': [246930.0, 3806808.0, 253260.0, 3814296.0],
 'providers': [{'url': 'https://www.fsa.usda.gov/programs-and-services/aerial-photography/imagery-programs/naip-imagery/',
   'name': 'USDA Farm Service Agency',
   'roles': ['producer', 'licensor']}],
 'naip:state': 'ca',
 'proj:shape': [12480, 10550],
 'proj:centroid': {'lat': 34.40624, 'lon': -119.71877},
 'proj:transform': [0.6, 0.0, 246930.0, 0.0, -0.6, 3814296.0, 0.0, 0.0, 1.0],
 'proj:code': 'EPSG:26911'}
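The `proj:*` fields above come from the STAC projection extension and are enough to recover the item's footprint in map coordinates. A worked sketch using only the values printed above, assuming the usual conventions: `proj:transform` holds affine coefficients in the same order as rasterio's `Affine` (`a, b, c, d, e, f`, then `0, 0, 1`), and `proj:shape` is `[rows, columns]`. EPSG:26911 (NAD83 / UTM zone 11N) uses metres, so the coefficients are in metres.

```python
import math

# Values copied from the item.properties output above
transform = [0.6, 0.0, 246930.0, 0.0, -0.6, 3814296.0, 0.0, 0.0, 1.0]
shape = [12480, 10550]  # proj:shape, assumed to be [rows, columns]
proj_bbox = [246930.0, 3806808.0, 253260.0, 3814296.0]

a, b, c, d, e, f = transform[:6]
n_rows, n_cols = shape


def pixel_to_map(col, row):
    """Map coordinates of a pixel corner: x = a*col + b*row + c, y = d*col + e*row + f."""
    return a * col + b * row + c, d * col + e * row + f


x_left, y_top = pixel_to_map(0, 0)                # upper-left corner of the raster
x_right, y_bottom = pixel_to_map(n_cols, n_rows)  # lower-right corner of the raster

computed_bbox = [x_left, y_bottom, x_right, y_top]
print("pixel size (m):", a, abs(e))
print("computed bbox:", computed_bbox)
print("matches proj:bbox:", all(math.isclose(u, v) for u, v in zip(computed_bbox, proj_bbox)))
```

Under these assumptions the raster spans 10550 × 0.6 = 6330 m east-west and 12480 × 0.6 = 7488 m north-south, and the pixel size `a` matches the item's `gsd` of 0.6 m.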