![Slide reading: If your metadata is not indexed your data does not exist](images/your-data-does-not-exist.png)
<div style="text-align: right"> Slide pulled from Matt Hanson's talk on <a href="https://docs.google.com/presentation/d/1BWYiOwBVwZfcrSDG7tdA5TmQkz-3DwGElZIzm15NpKQ/edit#slide=id.g127dd1b9487_0_0">Cloud-Native Geospatial</a></div>

# STAC <img src=images/e84-logo.png align="right"></img>
<div style="text-align: right"> Julia Signell | @jsignell </div>

This lightning talk is about a few things:

- how catalogs make it easier for people to access data
- how specifications make it faster to develop new highly-usable tools
- how xarray is the best and it's even better if you have STAC

![Slide showing the directory-like structure of a STAC catalog](images/stac-catalog.png)
<div style="text-align: right"> Slide pulled from Matt Hanson's talk on <a href="https://docs.google.com/presentation/d/1BWYiOwBVwZfcrSDG7tdA5TmQkz-3DwGElZIzm15NpKQ/edit#slide=id.g127dd1b9487_0_0">Cloud-Native Geospatial</a></div>

In [1]:
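import planetary_computer
import pystac_client
import xarray as xr
# Assumes the xpystac package is also installed: it registers the "stac"
# engine with xarray, so it does not need to be imported explicitly.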

catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)

In [3]:
collection = catalog.get_collection("nasa-nex-gddp-cmip6")
asset = collection.assets["ACCESS-CM2.historical"]
asset

# What's in there?

The STAC metadata contains everything you need to know to read the data.

 - where it is stored
 - how it is structured
 - how you can access it

In this case the asset points to a reference file because the data has been kerchunked. But do I need to know that? Nope.
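To make that concrete, here is a minimal sketch of the kind of dispatch a reader could do using only an asset's metadata. It works on plain dictionaries rather than pystac objects. The `plan_open` helper, the reader labels, and the example hrefs are all hypothetical and are not taken from any library's internals. The `xarray:open_kwargs` key is a field from the STAC xarray-assets extension.

```python
# Hypothetical sketch: decide how to open an asset from its STAC metadata alone.
# The mapping below is an assumption for illustration, not a real library's logic.

def plan_open(asset: dict) -> dict:
    """Return the reader label and keyword arguments a tool could use."""
    media_type = asset.get("type", "")
    roles = asset.get("roles", [])
    # The xarray-assets extension lets the provider store read options
    kwargs = dict(asset.get("xarray:open_kwargs", {}))

    if "references" in roles:
        # Kerchunk reference file: a JSON index pointing into the original files
        reader = "kerchunk-reference"
    elif media_type == "application/vnd+zarr":
        reader = "zarr"
    elif media_type.startswith("image/tiff"):
        reader = "geotiff"
    else:
        raise ValueError(f"don't know how to open media type {media_type!r}")

    return {"href": asset["href"], "reader": reader, "kwargs": kwargs}


# Two made-up assets, loosely shaped like the ones used in this talk
reference_asset = {
    "href": "https://example.com/ACCESS-CM2.historical.json",
    "type": "application/json",
    "roles": ["references"],
}
zarr_asset = {
    "href": "abfs://example-container/hi.zarr",
    "type": "application/vnd+zarr",
    "roles": ["data"],
    "xarray:open_kwargs": {"consolidated": True},
}

for example in (reference_asset, zarr_asset):
    print(plan_open(example))
```

The point of the sketch is that the caller never states the storage format: the metadata carries enough information for the tool to choose.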

In [4]:
%%time

xr.open_dataset(asset)

CPU times: user 1.77 s, sys: 106 ms, total: 1.87 s
Wall time: 8.42 s


Let's do another one.

In [5]:
collection = catalog.get_collection("daymet-daily-hi")
asset = collection.assets["zarr-abfs"]
asset

In [6]:
%%time

xr.open_dataset(asset)

CPU times: user 310 ms, sys: 17.8 ms, total: 328 ms
Wall time: 10.5 s


What if it's just regular COGs? You can read from a whole bunch of items (scenes) returned by a search:

In [7]:
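search = catalog.search(
    collections=["landsat-c2-l2"],
    # GeoJSON point, in (longitude, latitude) order
    intersects={"type": "Point", "coordinates": [-97.74, 30.26]},
    # Closed interval: start/end
    datetime="2022-07-01/2022-08-01",
)
# Peek at the first matching item
next(search.items())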

In [8]:
xr.open_dataset(search, engine="stac")


This should be feeling boring at this point. _It is supposed to feel boring._

The interesting part should happen after you have the data.

# Separation of responsibilities

- The data provider writes STAC metadata.
- The STAC API takes queries and returns STAC metadata.
- The software tools take STAC metadata and return data.

The data consumers write queries and get data.

![](images/separation-of-responsibilities.png)
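The same split can be sketched as a toy, in-memory example. Nothing here is real STAC or a real API: `make_item`, `search`, `read`, the item fields, and the hrefs are hypothetical stand-ins, and a dictionary plays the part of cloud storage.

```python
# Toy, in-memory sketch of the three roles. Every name here is hypothetical.
from datetime import date


# 1. The data provider writes metadata describing each scene.
def make_item(item_id: str, collection: str, day: date, href: str) -> dict:
    return {"id": item_id, "collection": collection, "datetime": day, "href": href}


# 2. The API takes a query and returns matching metadata (never the data itself).
def search(items: list, collection: str, start: date, end: date) -> list:
    return [
        item
        for item in items
        if item["collection"] == collection and start <= item["datetime"] <= end
    ]


# 3. The tool takes metadata and returns data. Reading is faked with a
#    dictionary lookup standing in for cloud storage.
def read(item: dict, storage: dict) -> list:
    return storage[item["href"]]


storage = {"s3://bucket/a.tif": [1, 2, 3], "s3://bucket/b.tif": [4, 5, 6]}
items = [
    make_item("a", "landsat", date(2022, 7, 10), "s3://bucket/a.tif"),
    make_item("b", "landsat", date(2022, 9, 1), "s3://bucket/b.tif"),
]

# The data consumer only writes the query and gets data back.
matches = search(items, "landsat", date(2022, 7, 1), date(2022, 8, 1))
data = [read(item, storage) for item in matches]
print(data)
```

Each function can change independently as long as the metadata shape between them stays the same, which is the job the specification does.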