# Finding Data using the NASA CMR-STAC API

**Summary**

This notebook will demonstrate how to find and access remote sensing data using the [NASA Common Metadata Repository (CMR) SpatioTemporal Asset Catalog (STAC) application programming interface (API)](https://cmr.earthdata.nasa.gov/stac/docs/index.html). The CMR-STAC API is NASA's implementation of the [STAC API specification](https://stacspec.org/) for all Earth Science data archived by NASA Earthdata. The current implementation allows users to execute searches within provider catalogs (e.g., LPCLOUD) to find the STAC Items they are searching for. All the providers can be found at the CMR-STAC endpoint here: <https://cmr.earthdata.nasa.gov/stac/>. In this exercise, we will query the **LPCLOUD** provider to identify STAC Items matching our search criteria.

After finding the results we are interested in, we will write the links needed to access the HLS assets in the cloud to text files: one file with links for access via HTTPS and one with links for direct access to Amazon Web Services (AWS) S3 buckets.

**What is STAC?**  

[SpatioTemporal Asset Catalog (STAC)](https://stacspec.org/) is a specification that provides a common language for interpreting geospatial information in order to standardize indexing and discovering data.  

The [STAC specification](https://stacspec.org/en/about/stac-spec/) is made up of a collection of related, yet independent specifications that when used together provide search and discovery capabilities for remote assets.

**Four STAC Specifications**  

- [STAC Item](https://github.com/radiantearth/stac-spec/blob/master/item-spec/item-spec.md)
- [STAC Catalog](https://github.com/radiantearth/stac-spec/blob/master/catalog-spec/catalog-spec.md)
- [STAC Collection](https://github.com/radiantearth/stac-spec/blob/master/collection-spec/collection-spec.md)
- [STAC API](https://github.com/radiantearth/stac-api-spec)

In the following sections, we will explore each of these STAC elements using NASA's CMR-STAC API.  
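
As a rough orientation before querying the live API, the sketch below shows the top-level fields that a STAC Item carries according to the STAC Item specification. The item itself, its ID, and the `example.com` URL are hypothetical placeholders rather than a real CMR-STAC record, and `missing_item_fields` is an illustrative helper, not part of any library. The specification also requires `bbox` whenever `geometry` is not null.

```python
# Top-level fields required of every STAC Item (per the STAC Item specification).
REQUIRED_ITEM_FIELDS = (
    "type", "stac_version", "id", "geometry", "properties", "links", "assets",
)

# Hypothetical placeholder item, NOT a real CMR-STAC record.
example_item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-item",
    "geometry": {"type": "Point", "coordinates": [-101.66, 41.05]},
    "bbox": [-101.66, 41.05, -101.66, 41.05],
    "properties": {"datetime": "2021-05-13T17:24:06Z"},
    "links": [],
    "assets": {
        "B04": {"href": "https://example.com/example-item.B04.tif", "type": "image/tiff"},
    },
}


def missing_item_fields(item):
    """Return the required top-level fields that are absent from an item dict."""
    return [field for field in REQUIRED_ITEM_FIELDS if field not in item]


print(missing_item_fields(example_item))
```

The Items returned later in this notebook have the same shape: `properties` holds searchable metadata such as `datetime` and `eo:cloud_cover`, and `assets` holds the links to the actual files.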

**Requirements:**
- A NASA [Earthdata Login](https://urs.earthdata.nasa.gov/) account is required to download HLS data
- Follow the [instructions](https://github.com/nasa/LPDAAC-Data-Resources/blob/main/setup/setup_instructions_python.md) to set up a compatible local Python environment.

**Learning Objectives:**
- Understand the STAC Specification and how to use it to find data
- Write an output text file with links to access the data
- Open a file and visualize it


## Setup
Import the required packages.

In [769]:
# Client is the entry point used below to open the CMR-STAC catalogs
from pystac_client import Client
from collections import defaultdict    
import json
import geopandas as gpd
import hvplot.pandas

## CMR STAC Client 

### Submit request to the CMR STAC API

Use the `pystac_client` package to submit a request to the CMR STAC API.

In [770]:
stac_url = 'https://cmr.earthdata.nasa.gov/stac'

In [771]:
provider_cat = Client.open(stac_url)
provider_cat.to_dict()


{'type': 'Catalog',
 'id': 'CMR-STAC',
 'stac_version': '1.1.0',
 'description': 'This is the landing page for CMR-STAC. Each provider link contains a STAC endpoint.',
 'links': [{'rel': 'self',
   'href': 'https://cmr.earthdata.nasa.gov/stac',
   'type': 'application/json'},
  {'rel': 'root',
   'href': 'https://cmr.earthdata.nasa.gov/stac',
   'type': 'application/json',
   'title': 'NASA Common Metadata Repository CMR-STAC API'},
  {'rel': 'service-desc',
   'href': 'https://api.stacspec.org/v1.0.0-beta.1/openapi.yaml',
   'type': 'application/yaml',
   'title': 'OpenAPI Documentation'},
  {'rel': 'service-doc',
   'href': 'https://wiki.earthdata.nasa.gov/display/ED/CMR+SpatioTemporal+Asset+Catalog+%28CMR-STAC%29+Documentation',
   'type': 'text/html',
   'title': 'NASA CMR-STAC Documentation'},
  {'rel': 'child',
   'href': 'https://cmr.earthdata.nasa.gov/stac/ESA',
   'type': 'application/json',
   'title': 'ESA'},
  {'rel': 'child',
   'href': 'https://cmr.earthdata.nasa.gov/stac

The CMR STAC API endpoint lists the available providers. Each **provider** is a separate STAC Catalog endpoint against which spatiotemporal queries can be submitted.

In [772]:
providers = {}
for p in provider_cat.to_dict()['links']:
    if 'child' in p['rel']:
        providers[p['title']] =  p['href'] 
providers

{'ESA': 'https://cmr.earthdata.nasa.gov/stac/ESA',
 'GHRC': 'https://cmr.earthdata.nasa.gov/stac/GHRC',
 'ECHO': 'https://cmr.earthdata.nasa.gov/stac/ECHO',
 'ISRO': 'https://cmr.earthdata.nasa.gov/stac/ISRO',
 'EDF_DEV04': 'https://cmr.earthdata.nasa.gov/stac/EDF_DEV04',
 'ASF': 'https://cmr.earthdata.nasa.gov/stac/ASF',
 'EUMETSAT': 'https://cmr.earthdata.nasa.gov/stac/EUMETSAT',
 'CDDIS': 'https://cmr.earthdata.nasa.gov/stac/CDDIS',
 'JAXA': 'https://cmr.earthdata.nasa.gov/stac/JAXA',
 'AU_AADC': 'https://cmr.earthdata.nasa.gov/stac/AU_AADC',
 'ECHO10_OPS': 'https://cmr.earthdata.nasa.gov/stac/ECHO10_OPS',
 'LANCEAMSR2': 'https://cmr.earthdata.nasa.gov/stac/LANCEAMSR2',
 'GESDISCCLD': 'https://cmr.earthdata.nasa.gov/stac/GESDISCCLD',
 'GHRSSTCWIC': 'https://cmr.earthdata.nasa.gov/stac/GHRSSTCWIC',
 'LARC_CLOUD': 'https://cmr.earthdata.nasa.gov/stac/LARC_CLOUD',
 'LANCEMODIS': 'https://cmr.earthdata.nasa.gov/stac/LANCEMODIS',
 'NSIDCV0': 'https://cmr.earthdata.nasa.gov/stac/NSIDCV0',

In this example, we will explore the collections available through the `LPCLOUD` provider.

In [773]:
provider = 'LPCLOUD'
provider_url = providers[provider]
provider_url

'https://cmr.earthdata.nasa.gov/stac/LPCLOUD'
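
The provider lookup above can also be tried without a network connection. The sketch below runs the same idea against a trimmed-down, hand-written stand-in for `provider_cat.to_dict()`; `sample_catalog` and `child_links` are illustrative names, not part of `pystac_client`. It compares `rel` for equality instead of using a substring test, and skips links without a title.

```python
# Hand-written, trimmed-down stand-in for provider_cat.to_dict() (illustration only).
sample_catalog = {
    "type": "Catalog",
    "id": "CMR-STAC",
    "links": [
        {"rel": "self", "href": "https://cmr.earthdata.nasa.gov/stac"},
        {"rel": "child", "href": "https://cmr.earthdata.nasa.gov/stac/ESA", "title": "ESA"},
        {"rel": "child", "href": "https://cmr.earthdata.nasa.gov/stac/LPCLOUD", "title": "LPCLOUD"},
    ],
}


def child_links(catalog_dict):
    """Map each child link's title to its href, ignoring all other link types."""
    return {
        link["title"]: link["href"]
        for link in catalog_dict.get("links", [])
        if link.get("rel") == "child" and "title" in link
    }


providers = child_links(sample_catalog)
print(providers["LPCLOUD"])
```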

We can use either `get_collections()` or `collection_search()` to get collections from the provider endpoint; the `collection_search()` method additionally allows the results to be filtered. The call below yields PySTAC `CollectionClient` objects, which are collected into a list so that the contents can be accessed without further iteration.

In [774]:
catalog = Client.open(provider_url)
# collections = list(catalog.collection_search().collections())
collections = list(catalog.get_collections())
collections

[<CollectionClient id=ASTGTM_NUMNC_003>,
 <CollectionClient id=ASTGTM_NC_003>,
 <CollectionClient id=ASTGTM_003>,
 <CollectionClient id=AG1km_003>,
 <CollectionClient id=AG100_003>,
 <CollectionClient id=AG5KMMOH_041>,
 <CollectionClient id=CAM5K30CFCLIM_003>,
 <CollectionClient id=CAM5K30CF_002>,
 <CollectionClient id=CAM5K30COVCLIM_003>,
 <CollectionClient id=CAM5K30EMCLIM_003>,
 <CollectionClient id=CAM5K30EM_002>,
 <CollectionClient id=CAM5K30UCCLIM_003>,
 <CollectionClient id=CAM5K30UC_002>,
 <CollectionClient id=WaterBalance_Daily_Historical_GRIDMET_1.5>,
 <CollectionClient id=ECO_L2G_CLOUD_002>,
 <CollectionClient id=ECO_L3G_MET_002>,
 <CollectionClient id=ECO_L3G_SM_002>,
 <CollectionClient id=ECO_L4G_ESI_002>,
 <CollectionClient id=ECO_L3G_JET_002>,
 <CollectionClient id=ECO_L2G_LSTE_002>,
 <CollectionClient id=ECO_L3G_SEB_002>,
 <CollectionClient id=ECO_L1CG_RAD_002>,
 <CollectionClient id=ECO_L4G_WUE_002>,
 <CollectionClient id=ECO_L1B_ATT_002>,
 <CollectionClient id=ECO_L2_

View the content returned for the first collection. 

In [775]:
collections[0].to_dict()

{'type': 'Collection',
 'id': 'ASTGTM_NUMNC_003',
 'stac_version': '1.1.0',
 'description': 'The ASTER Global Digital Elevation Model (GDEM) Version 3 (ASTGTM) provides a global digital elevation model (DEM) of land areas on Earth at a spatial resolution of 1 arc second (approximately 30 meter horizontal posting at the equator).\r\n\r\nThe development of the ASTER GDEM data products is a collaborative effort between National Aeronautics and Space Administration (NASA) and Japan’s Ministry of Economy, Trade, and Industry (METI). The ASTER GDEM data products are created by the Sensor Information Laboratory Corporation (SILC) in Tokyo. \r\n\r\nThe ASTER GDEM Version 3 data product was created from the automated processing of the entire ASTER Level 1A (https://doi.org/10.5067/ASTER/AST_L1A.003) archive of scenes acquired between March 1, 2000, and November 30, 2013. Stereo correlation was used to produce over one million individual scene based ASTER DEMs, to which cloud masking was applied

There's a lot of information returned for each collection. Let's print all available STAC Collections within the `LPCLOUD` catalog.

In [776]:
for c in collections:
    print(f'{c.id}: {c.title}')
    

ASTGTM_NUMNC_003: ASTER Global Digital Elevation Model Attributes NetCDF V003
ASTGTM_NC_003: ASTER Global Digital Elevation Model NetCDF V003
ASTGTM_003: ASTER Global Digital Elevation Model V003
AG1km_003: ASTER Global Emissivity Dataset, 1 kilometer, HDF5 V003
AG100_003: ASTER Global Emissivity Dataset, 100 meter, HDF5 V003
AG5KMMOH_041: ASTER Global Emissivity Dataset, Monthly, 0.05 deg, HDF5 V041
CAM5K30CFCLIM_003: Combined ASTER and MODIS Emissivity database over Land (CAMEL) Coefficient Climatology Monthly Global 0.05Deg V003
CAM5K30CF_002: Combined ASTER and MODIS Emissivity database over Land (CAMEL) Coefficient Monthly Global 0.05Deg V002
CAM5K30COVCLIM_003: Combined ASTER and MODIS Emissivity database over Land (CAMEL) Covariances Climatology Monthly Global 0.25Deg V003
CAM5K30EMCLIM_003: Combined ASTER and MODIS Emissivity database over Land (CAMEL) Emissivity Climatology Monthly Global 0.05Deg V003
CAM5K30EM_002: Combined ASTER and MODIS Emissivity database over Land (CAMEL

In [777]:
print(f'LPCLOUD has {len(collections)} Collections')

LPCLOUD has 438 Collections


The `LPCLOUD` provider offers over 400 data products available in Earthdata Cloud. With the `collection_search` method, a string can be used to search through the available collections, and the `collections` method iterates over the returned results. In this example, collections are filtered to retain only those that include "HLS" in their records.

In [778]:
for result in catalog.collection_search(q="HLS").collections():
    for link in result.links:
        if 'self' in link.rel:
            print(result.id, f"{link.href}", sep=": ")
            

HLSL30_2.0: https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/HLSL30_2.0
HLSS30_2.0: https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/HLSS30_2.0
HLSL30_VI_2.0: https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/HLSL30_VI_2.0
HLSS30_VI_2.0: https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/HLSS30_VI_2.0
ECO_L2T_LSTE_002: https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/ECO_L2T_LSTE_002
OPERA_L3_DIST-ALERT-HLS_V1_1: https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/OPERA_L3_DIST-ALERT-HLS_V1_1
ECO_L2T_STARS_002: https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/ECO_L2T_STARS_002
ECO_L3T_JET_002: https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/ECO_L3T_JET_002
ECO_L1CT_RAD_002: https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/ECO_L1CT_RAD_002
ECO_L4T_ESI_002: https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/ECO_L4T_ESI_002
ECO_L4T_WUE_002: https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/ECO_L4T_WUE_002
ECO_L3T_MET_002
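
The free-text search returns some collections whose IDs do not contain "HLS" (for example `ECO_L2T_LSTE_002`), presumably because the text matches other parts of their metadata. If only the HLS products themselves are wanted, one option is to narrow the result client-side by ID prefix. The sketch below does that on a hard-coded subset of the IDs printed above; `ids_starting_with` is an illustrative helper, and this is a plain string filter, not a feature of the API.

```python
# Subset of the collection IDs printed above, hard-coded so the example runs offline.
collection_ids = [
    "HLSL30_2.0",
    "HLSS30_2.0",
    "HLSL30_VI_2.0",
    "HLSS30_VI_2.0",
    "ECO_L2T_LSTE_002",
    "OPERA_L3_DIST-ALERT-HLS_V1_1",
    "ECO_L2T_STARS_002",
]


def ids_starting_with(ids, prefix):
    """Keep only the IDs that begin with the given prefix (case-insensitive)."""
    return [cid for cid in ids if cid.upper().startswith(prefix.upper())]


hls_ids = ids_starting_with(collection_ids, "HLS")
print(hls_ids)
```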

### Get information about an individual Collection

Once the desired collections are identified, their collection IDs can be used for further queries. Below we'll specify a STAC Collection, `HLSL30_2.0`, and request its STAC metadata. Note that the Collection IDs in CMR STAC differ from product short names. 

In [779]:
# Define desired collection
collection = 'HLSL30_2.0'

In [780]:
# Request the STAC metadata for the selected collection
catalog.get_collection(collection).to_dict()

{'type': 'Collection',
 'id': 'HLSL30_2.0',
 'stac_version': '1.1.0',
 'description': 'The Harmonized Landsat Sentinel-2 (HLS) project provides consistent surface reflectance (SR) and top of atmosphere (TOA) brightness data from a virtual constellation of satellite sensors. The Operational Land Imager (OLI) is housed aboard the joint NASA/USGS Landsat 8 and Landsat 9 satellites, while the Multi-Spectral Instrument (MSI) is mounted aboard Europe’s Copernicus Sentinel-2A, Sentinel-2B, and Sentinel-2C satellites. The combined measurement enables global observations of the land every 2–3 days at 30-meter (m) spatial resolution. The HLS project uses a set of algorithms to obtain seamless products from OLI and MSI that include atmospheric correction, cloud and cloud-shadow masking, spatial co-registration and common gridding, illumination and view angle normalization, and spectral bandpass adjustment.\n\nThe HLSL30 product provides 30-m Nadir Bidirectional Reflectance Distribution Function (

## Item Search

For this section, a spatiotemporal query is performed to filter the data assets available from a collection (or across multiple collections). The area of interest is defined using a geojson file. Additionally, the data collection(s) and time range for our example are defined.

In [781]:
# Open geojson file
field = gpd.read_file('../../data/ne_w_agfields.geojson')
field

Unnamed: 0,geometry
0,"POLYGON ((-101.67272 41.04754, -101.65345 41.0..."


In [782]:
# Visualize area of interest using hvplot
field.hvplot(tiles='ESRI', crs='EPSG:4326', line_color='yellow', line_width=2, fill_color=None)

We will now specify the search criteria we are interested in, i.e., the **date range**, the **region of interest** (roi), and the **data collections**, to pass to the STAC API.

In [783]:
# Specify area of interest
roi = field['geometry'][0].__geo_interface__
roi

{'type': 'Polygon',
 'coordinates': (((-101.67271614074707, 41.04754380304359),
   (-101.65344715118408, 41.04754380304359),
   (-101.65344715118408, 41.06213891056728),
   (-101.67271614074707, 41.06213891056728),
   (-101.67271614074707, 41.04754380304359)),)}
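
The polygon above is later passed to the search as `intersects`. STAC API searches can alternatively take a bounding box (the `bbox` parameter in `pystac_client`), so it can be handy to derive one from the same geometry. The sketch below does this with plain Python on the coordinates shown above; `polygon_bbox` is an illustrative helper and only considers the exterior ring of a single Polygon.

```python
# Region of interest copied from the output above.
roi = {
    "type": "Polygon",
    "coordinates": (
        (
            (-101.67271614074707, 41.04754380304359),
            (-101.65344715118408, 41.04754380304359),
            (-101.65344715118408, 41.06213891056728),
            (-101.67271614074707, 41.06213891056728),
            (-101.67271614074707, 41.04754380304359),
        ),
    ),
}


def polygon_bbox(geometry):
    """Return [min_lon, min_lat, max_lon, max_lat] for a GeoJSON Polygon."""
    exterior = geometry["coordinates"][0]  # first ring is the exterior boundary
    lons = [point[0] for point in exterior]
    lats = [point[1] for point in exterior]
    return [min(lons), min(lats), max(lons), max(lats)]


print(polygon_bbox(roi))
```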

Date ranges can be submitted as a start and an end date separated by a "/". For our search we will only use the year and month to define our range, but an example using the full `%Y-%m-%dT%H:%M:%SZ` format is included below.

In [784]:
# Specify date range
date_range = "2021-05/2021-08"
# date_range = "2021-05-01T00:00:00Z/2021-08-31T23:59:59Z"
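
To make the relationship between the short and the full form concrete, the sketch below expands a year-month range into full timestamps using only the standard library. It assumes that a year-month range is meant to cover whole calendar months, from the first second of the start month to the last second of the end month; that is my understanding of how `pystac_client` treats partial dates, but its documentation is the authority. `expand_month_range` is an illustrative helper, not a library function.

```python
import calendar
from datetime import datetime, timezone


def expand_month_range(date_range):
    """Expand 'YYYY-MM/YYYY-MM' into full UTC timestamps covering whole months."""
    start_text, end_text = date_range.split("/")
    start_year, start_month = (int(part) for part in start_text.split("-"))
    end_year, end_month = (int(part) for part in end_text.split("-"))
    # monthrange returns (weekday of first day, number of days in the month)
    last_day = calendar.monthrange(end_year, end_month)[1]
    start = datetime(start_year, start_month, 1, 0, 0, 0, tzinfo=timezone.utc)
    end = datetime(end_year, end_month, last_day, 23, 59, 59, tzinfo=timezone.utc)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return f"{start.strftime(fmt)}/{end.strftime(fmt)}"


print(expand_month_range("2021-05/2021-08"))
# prints 2021-05-01T00:00:00Z/2021-08-31T23:59:59Z
```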


A STAC Collection is synonymous with what LP DAAC refers to as a Data Product, though within STAC, the collection name may not match the product short name exactly.

In [785]:
# Define Collections
collections =['HLSL30_2.0', 'HLSS30_2.0'] 
collections

['HLSL30_2.0', 'HLSS30_2.0']

The CMR STAC API is a powerful search and discovery utility that supports filtering as defined by the STAC API specification. Below, we specify the highest cloud cover percentage to keep so that it can be passed to the query.

In [786]:
cloudcover = 25

In [787]:
# Define Search Parameters
search_params = {
    "collections": collections,
    "intersects": roi,
    "datetime": date_range,
    "query":{"eo:cloud_cover": {"lt": cloudcover}}
}

Now that our search parameters are collected in a dictionary, we can use `pystac_client` to submit the request and search the catalog.

In [788]:
# Perform the search
query = catalog.search(**search_params)

Print the number of STAC Items returned from our search.

In [789]:
# Count result quantity
query.matched()

64
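
The `query` entry in the search parameters uses the operator style of the STAC API Query extension (`lt`, `lte`, `gt`, `gte`, `eq`, and others). If a strict threshold matters for an analysis, one option is to re-apply the same condition client-side once the Items have been retrieved. The sketch below shows how such a check could look on a few made-up features; `matches_query`, the `OPERATORS` table, and the sample data are illustrative and cover only a handful of operators.

```python
import operator

# Small subset of Query-extension operators mapped to Python comparisons.
OPERATORS = {
    "lt": operator.lt,
    "lte": operator.le,
    "gt": operator.gt,
    "gte": operator.ge,
    "eq": operator.eq,
}


def matches_query(properties, query):
    """Return True if the properties satisfy every condition in the query dict."""
    for prop, conditions in query.items():
        if prop not in properties:
            return False
        for op_name, threshold in conditions.items():
            if not OPERATORS[op_name](properties[prop], threshold):
                return False
    return True


# Made-up features for illustration only.
sample_features = [
    {"id": "item-a", "properties": {"eo:cloud_cover": 3}},
    {"id": "item-b", "properties": {"eo:cloud_cover": 25}},
    {"id": "item-c", "properties": {"eo:cloud_cover": 60}},
]
query_filter = {"eo:cloud_cover": {"lt": 25}}

kept = [f["id"] for f in sample_features if matches_query(f["properties"], query_filter)]
print(kept)
```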

We now have a search object containing the STAC records that matched our query. Let's pull out all of the STAC Items as a dictionary (a GeoJSON FeatureCollection) and explore the contents (i.e., the STAC Items).

In [790]:
# Return Resulting Items
items = query.item_collection_as_dict()


We can view the returned STAC Items.

In [791]:
items

{'type': 'FeatureCollection',
 'features': [{'type': 'Feature',
   'id': 'HLS.L30.T13TGF.2021133T172406.v2.0',
   'stac_version': '1.0.0',
   'stac_extensions': ['https://stac-extensions.github.io/eo/v1.0.0/schema.json'],
   'properties': {'datetime': '2021-05-13T17:24:06.480Z',
    'eo:cloud_cover': 25,
    'start_datetime': '2021-05-13T17:24:06.480Z',
    'end_datetime': '2021-05-13T17:24:30.362Z'},
   'geometry': {'type': 'Polygon',
    'coordinates': [[[-101.344445, 40.5048858],
      [-101.2894253, 41.4919436],
      [-101.9209422, 41.5106056],
      [-102.2213228, 40.5293241],
      [-101.344445, 40.5048858]]]},
   'bbox': [-102.2213228, 40.5048858, -101.2894253, 41.5106056],
   'assets': {'browse': {'href': 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-public/HLSL30.020/HLS.L30.T13TGF.2021133T172406.v2.0/HLS.L30.T13TGF.2021133T172406.v2.0.jpg',
     'title': 'Download HLS.L30.T13TGF.2021133T172406.v2.0.jpg',
     'type': 'image/jpeg',
     'roles': ['browse']},
    'thumb

Below, view the metadata record for the first item. 

In [792]:
items['features'][0]

{'type': 'Feature',
 'id': 'HLS.L30.T13TGF.2021133T172406.v2.0',
 'stac_version': '1.0.0',
 'stac_extensions': ['https://stac-extensions.github.io/eo/v1.0.0/schema.json'],
 'properties': {'datetime': '2021-05-13T17:24:06.480Z',
  'eo:cloud_cover': 25,
  'start_datetime': '2021-05-13T17:24:06.480Z',
  'end_datetime': '2021-05-13T17:24:30.362Z'},
 'geometry': {'type': 'Polygon',
  'coordinates': [[[-101.344445, 40.5048858],
    [-101.2894253, 41.4919436],
    [-101.9209422, 41.5106056],
    [-102.2213228, 40.5293241],
    [-101.344445, 40.5048858]]]},
 'bbox': [-102.2213228, 40.5048858, -101.2894253, 41.5106056],
 'assets': {'browse': {'href': 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-public/HLSL30.020/HLS.L30.T13TGF.2021133T172406.v2.0/HLS.L30.T13TGF.2021133T172406.v2.0.jpg',
   'title': 'Download HLS.L30.T13TGF.2021133T172406.v2.0.jpg',
   'type': 'image/jpeg',
   'roles': ['browse']},
  'thumbnail_0': {'href': 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-public/HLSL

We will also specify the STAC Assets (i.e., bands/layers) of interest for both the S30 and L30 collections. These differ because B8A is the NIR band for the S30 collection, while B05 is the NIR band for the L30 collection.

In [793]:
s30_bands = ['B8A', 'B04', 'B02', 'Fmask']
l30_bands = ['B05', 'B04', 'B02', 'Fmask']

Retrieve the asset links for each band from the STAC Items that match our cloud cover threshold and store them in a list.

In [794]:
evi_band_links = []

for i in items['features']:
    if i['collection'] == 'HLSS30_2.0':
        evi_bands = s30_bands
    elif i['collection'] == 'HLSL30_2.0':
        evi_bands = l30_bands
    for a in i['assets']:
        if a in evi_bands:
            evi_band_links.append(i['assets'][a]['href'])


The filtering done in the previous steps produces a list of links to STAC Assets. Let's print out the first ten links.

In [795]:
evi_band_links[:10]

['https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T13TGF.2021133T172406.v2.0/HLS.L30.T13TGF.2021133T172406.v2.0.B04.tif',
 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T13TGF.2021133T172406.v2.0/HLS.L30.T13TGF.2021133T172406.v2.0.B05.tif',
 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T13TGF.2021133T172406.v2.0/HLS.L30.T13TGF.2021133T172406.v2.0.Fmask.tif',
 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T13TGF.2021133T172406.v2.0/HLS.L30.T13TGF.2021133T172406.v2.0.B02.tif',
 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T14TKL.2021133T172406.v2.0/HLS.L30.T14TKL.2021133T172406.v2.0.B02.tif',
 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T14TKL.2021133T172406.v2.0/HLS.L30.T14TKL.2021133T172406.v2.0.B04.tif',
 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL
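
The band selection above can also be written with a lookup table keyed by collection ID, which avoids relying on a variable left over from a previous loop iteration if an Item from an unexpected collection appears. The sketch below is one possible variant run on made-up features; `BANDS_BY_COLLECTION`, `select_band_links`, and the `example.com` URLs are illustrative and not part of the original workflow.

```python
# Bands of interest per collection, matching the lists defined earlier.
BANDS_BY_COLLECTION = {
    "HLSS30_2.0": {"B8A", "B04", "B02", "Fmask"},
    "HLSL30_2.0": {"B05", "B04", "B02", "Fmask"},
}


def select_band_links(features, bands_by_collection):
    """Collect asset hrefs whose asset key is wanted for the item's collection."""
    links = []
    for feature in features:
        # Unknown collections map to an empty set, so nothing is selected for them.
        wanted = bands_by_collection.get(feature.get("collection"), set())
        for name, asset in feature.get("assets", {}).items():
            if name in wanted:
                links.append(asset["href"])
    return links


# Made-up features with placeholder URLs, for illustration only.
sample_features = [
    {"collection": "HLSL30_2.0",
     "assets": {"B05": {"href": "https://example.com/L30.B05.tif"},
                "B01": {"href": "https://example.com/L30.B01.tif"}}},
    {"collection": "HLSS30_2.0",
     "assets": {"B8A": {"href": "https://example.com/S30.B8A.tif"},
                "B05": {"href": "https://example.com/S30.B05.tif"}}},
]

print(select_band_links(sample_features, BANDS_BY_COLLECTION))
```

Note that `B05` is kept for the L30 feature but skipped for the S30 feature, because the wanted set is looked up per collection.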

Save the list of links to a text file.

In [812]:
name = 'HTTPS_HLS_Links.txt'
with open(f'../../data/{name}', 'w') as f:
    for link in evi_band_links:
        f.write(link + "\n")


You can also write links for S3 direct access in a separate text file.

In [813]:
name = 'S3_HLS_Links.txt'
with open(f'../../data/{name}', 'w') as f:
    for link in evi_band_links:
        s3 = link.replace('https://data.lpdaac.earthdatacloud.nasa.gov/', 's3://')
        f.write(s3 + "\n")
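
The string replacement above silently leaves a link unchanged if it does not start with the expected prefix. A slightly more defensive variant, sketched below with the standard library's `urllib.parse`, checks the scheme and host before converting. `https_to_s3` is an illustrative helper, and the assumption that the first path segment is the bucket name is taken from the replacement used above.

```python
from urllib.parse import urlparse


def https_to_s3(url, host="data.lpdaac.earthdatacloud.nasa.gov"):
    """Convert an HTTPS asset link on the given host to its s3:// equivalent."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or parsed.netloc != host:
        raise ValueError(f"Unexpected URL: {url}")
    # The path starts with '/<bucket>/...'; drop the leading slash.
    return "s3://" + parsed.path.lstrip("/")


https_link = (
    "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/"
    "HLS.L30.T13TGF.2021133T172406.v2.0/HLS.L30.T13TGF.2021133T172406.v2.0.B04.tif"
)
s3_link = https_to_s3(https_link)
print(s3_link)
```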

## Additional Resources

- https://github.com/nasa/cmr-stac
- https://stacspec.org/
- https://pystac-client.readthedocs.io/en/latest/index.html

## Contact Info:  

Email: LPDAAC@usgs.gov  
Voice: +1-866-573-3222  
Organization: Land Processes Distributed Active Archive Center (LP DAAC)¹  
Website: <https://lpdaac.usgs.gov/>  

¹Work performed under USGS contract G15PD00467 for NASA contract NNG14HH33I.  