![Slide reading: If your metadata is not indexed your data does not exist](images/your-data-does-not-exist.png)
<div style="text-align: right"> Slide pulled from Matt Hanson's talk on <a href="https://docs.google.com/presentation/d/1BWYiOwBVwZfcrSDG7tdA5TmQkz-3DwGElZIzm15NpKQ/edit#slide=id.g127dd1b9487_0_0">Cloud-Native Geospatial</a></div>


# Data Access with STAC <img src="images/e84-logo.png" align="right">
<div style="text-align: right"> Julia Signell | @jsignell </div>

This lightning talk is about a few things:

- how specifications let data providers abstract dataset-specific knowledge away from the data consumer (and absorb this burden themselves)
- how specifications make it faster to develop new highly-usable tools
- how xarray is the best and it's even better if you have STAC

In [None]:
import planetary_computer
import pystac_client
import xarray as xr

catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)

In [None]:
collection = catalog.get_collection("nasa-nex-gddp-cmip6")
asset = collection.assets["ACCESS-CM2.historical"]
asset

# What's in there?

The STAC metadata contains everything you need to know to read the data.

 - where it is stored
 - how it is structured
 - how you can access it
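
As a rough illustration, here is how those three points might map onto the fields of a single asset. The dictionary is hand-written for this sketch: the URL and values are placeholders, not copied from the Planetary Computer, and `describe` is a hypothetical helper. The `xarray:open_kwargs` key is the field name defined by the xarray-assets STAC extension.

```python
# Hand-written stand-in for a STAC asset; every value here is illustrative.
asset_dict = {
    # where it is stored (placeholder URL)
    "href": "https://example.com/references/model.json",
    # how it is structured (media type and roles)
    "type": "application/json",
    "roles": ["index", "references"],
    # how you can access it (keyword arguments a reader could pass to xarray)
    "xarray:open_kwargs": {"engine": "zarr", "chunks": {}},
}


def describe(asset):
    """Summarise the three things a reader needs to know about an asset."""
    return {
        "where": asset["href"],
        "structure": asset.get("type", "unknown"),
        "access": asset.get("xarray:open_kwargs", {}),
    }


summary = describe(asset_dict)
for key, value in summary.items():
    print(f"{key}: {value}")
```

The point is that all of this lives in the metadata, so the person reading the data never has to type it.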

The metadata can be hard to write properly, but you (if you are the data provider) only have to figure that out **once**. Once the metadata is written, everyone who uses the data has an easier time.

In this case we have a reference file because the data has been kerchunked, but do I need to know that? Nope.

In [None]:
%%time 

xr.open_dataset(asset)

Let's do another one.

In [None]:
collection = catalog.get_collection("daymet-daily-hi")
asset = collection.assets["zarr-abfs"]
asset

In [None]:
%%time

xr.open_dataset(asset)
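
Both calls above are the same one-liner, even though one asset is a kerchunk reference file and the other is a Zarr store. One way a tool could manage that is to pick a reader based on the asset's metadata. The function below is a hypothetical sketch of that idea using plain dictionaries. It is not the actual implementation behind `xr.open_dataset`, and the dispatch rules, reader names, and example assets are assumptions made for illustration.

```python
def plan_open(asset):
    """Return (reader, kwargs) for a dict-like asset.

    Hypothetical dispatch rules: look at the roles first, then the media type.
    """
    media_type = asset.get("type", "")
    roles = asset.get("roles", [])
    # Copy so the caller's metadata is never mutated.
    kwargs = dict(asset.get("xarray:open_kwargs", {}))

    if "references" in roles:
        return "kerchunk-references", kwargs
    if media_type == "application/vnd+zarr":
        return "zarr", kwargs
    if media_type.startswith("image/tiff"):
        return "cog", kwargs
    raise ValueError(f"no reader known for media type {media_type!r}")


# Illustrative assets, not real catalog entries.
kerchunked = {
    "href": "refs.json",
    "type": "application/json",
    "roles": ["index", "references"],
}
zarr_store = {
    "href": "store.zarr",
    "type": "application/vnd+zarr",
    "roles": ["data"],
    "xarray:open_kwargs": {"consolidated": True},
}

for example in (kerchunked, zarr_store):
    reader, kwargs = plan_open(example)
    print(example["href"], "->", reader, kwargs)
```

The consumer's code stays identical in both cases; only the metadata differs.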

What if the data is just regular COGs? You can read from a whole bunch of items (scenes) returned by a search:

In [None]:
search = catalog.search(
    collections=["landsat-c2-l2"], 
    intersects={"type": "Point", "coordinates": [-97.74, 30.26]},
    datetime="2022-07-01/2022-08-01",
)

In [None]:
next(search.items())

In [None]:
xr.open_dataset(search, engine="stac")

This should be feeling boring at this point. _It is supposed to feel boring._ 

As a data consumer you only need to think about the parts that are interesting from a scientific perspective: 
> I want to see what Austin looked like last July. 

# Separation of responsibilities

- The data provider writes STAC metadata
- The STAC API takes queries and returns STAC metadata
- The software tools take STAC metadata and return data

The data consumers write queries and get data.

![](images/separation-of-responsibilities.png)
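
To make the split concrete, here is a toy, in-memory version of the three roles. Nothing in it talks to a real STAC API: `ITEMS`, `search`, and `open_items` are hypothetical stand-ins, the items are made up, and "data" is just a placeholder string.

```python
# 1. The data provider writes metadata once (made-up items).
ITEMS = [
    {"id": "scene-a", "collection": "landsat-c2-l2",
     "datetime": "2022-07-04", "href": "a.tif"},
    {"id": "scene-b", "collection": "landsat-c2-l2",
     "datetime": "2022-09-12", "href": "b.tif"},
    {"id": "grid-c", "collection": "daymet-daily-hi",
     "datetime": "2022-07-10", "href": "c.zarr"},
]


# 2. The API takes a query and returns metadata, not data.
def search(collection, start, end):
    # ISO 8601 date strings sort chronologically, so string comparison works.
    return [
        item for item in ITEMS
        if item["collection"] == collection and start <= item["datetime"] <= end
    ]


# 3. The tool takes metadata and returns data (a placeholder string here).
def open_items(items):
    return [f"<data read from {item['href']}>" for item in items]


# The data consumer only writes the query.
results = open_items(search("landsat-c2-l2", "2022-07-01", "2022-08-01"))
print(results)
```

Each piece can change independently: the provider can move files, or the tool can get a faster reader, without the consumer's query changing.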