# Finding Data using the NASA CMR-STAC API

**Summary**

This notebook will demonstrate how to find and access remote sensing data using the [NASA Common Metadata Repository (CMR) SpatioTemporal Asset Catalog (STAC) application programming interface (API)](https://cmr.earthdata.nasa.gov/stac/docs/index.html). The CMR-STAC API is NASA's implementation of the [STAC API specification](https://stacspec.org/) for all Earth Science data archived by NASA Earthdata. The current implementation allows users to execute searches within provider catalogs (e.g., LPCLOUD) to find the STAC Items they are searching for. All the providers can be found at the CMR-STAC endpoint here: <https://cmr.earthdata.nasa.gov/stac/>. In this exercise, we will query the **LPCLOUD** provider to identify STAC Items matching our search criteria.

After finding the results we are interested in, we will write text files containing links that allow us to access the HLS assets in the cloud. We will create example text files of these cloud access links for use via HTTPS and via Amazon Web Services (AWS) S3 buckets.

**What is STAC?**  

[SpatioTemporal Asset Catalog (STAC)](https://stacspec.org/) is a specification that provides a common language for interpreting geospatial information in order to standardize indexing and discovering data.  

The [STAC specification](https://stacspec.org/core.html) is made up of a collection of related, yet independent, specifications that, when used together, provide search and discovery capabilities for remote assets.

**Four STAC Specifications**  

- [STAC Item](https://github.com/radiantearth/stac-spec/blob/master/item-spec/item-spec.md)
- [STAC Catalog](https://github.com/radiantearth/stac-spec/blob/master/catalog-spec/catalog-spec.md)
- [STAC Collection](https://github.com/radiantearth/stac-spec/blob/master/collection-spec/collection-spec.md)
- [STAC API](https://github.com/radiantearth/stac-api-spec)

In the following sections, we will explore each STAC element using NASA's CMR-STAC API.  

**Requirements:**
- A NASA [Earthdata Login](https://urs.earthdata.nasa.gov/) account is required to download EMIT data   
- *No Python setup requirements if connected to the workshop cloud instance!*
- **Local Only** Set up Python Environment - See **setup_instructions.md** in the `/setup/` folder to set up a local compatible Python environment

**Learning Objectives:**
- Understand the STAC Specification and how to use it to find data
- Write an output text file with links to access the data
- Open a file and visualize it


## Setup
Import the required packages.

In [1]:
import requests
from pystac_client import Client
from collections import defaultdict    
import json
import geopandas as gpd
import hvplot.pandas

### Submit `GET` request to the CMR STAC API

Use the `requests` package to submit a `GET` request to the CMR STAC API. We'll parse the response and extract the information we need to navigate the STAC Catalog.

In [2]:
stac_url = 'https://cmr.earthdata.nasa.gov/stac'

In [3]:
provider_cat = requests.get(stac_url)

The CMR STAC API endpoint lists the available providers. Each **provider** is a separate STAC Catalog endpoint against which spatiotemporal queries can be submitted.

In [4]:
providers = [p['title'] for p in provider_cat.json()['links'] if 'child' in p['rel']]
print(providers)

['ESA', 'GHRC', 'ECHO', 'ISRO', 'EDF_DEV04', 'ASF', 'EUMETSAT', 'CDDIS', 'JAXA', 'AU_AADC', 'ECHO10_OPS', 'LANCEAMSR2', 'GESDISCCLD', 'GHRSSTCWIC', 'LARC_CLOUD', 'LANCEMODIS', 'NSIDCV0', 'NSIDC_ECS', 'NCCS', 'OBPG', 'OMINRT', 'USGS_LTA', 'ASIPS', 'ESDIS', 'NSIDC_CPRD', 'ORNL_CLOUD', 'FEDEO', 'MLHUB', 'LAADS', 'LARC_ASDC', 'LPDAAC_ECS', 'NOAA_NCEI', 'OB_DAAC', 'XYZ_PROV', 'GHRC_DAAC', 'CSDA', 'NRSCC', 'CEOS_EXTRA', 'AMD_KOPRI', 'AMD_USAPDC', 'LARC', 'SCIOPS', 'USGS_EROS', 'LPCUMULUS', 'MOPITT', 'GHRC_CLOUD', 'LPCLOUD', 'ORNL_DAAC', 'CCMEO', 'POCLOUD', 'PODAAC', 'SEDAC', 'GES_DISC', 'LM_FIRMS', 'ENVIDAT', 'TROPICSDPC', 'INPE', 'OB_CLOUD', 'USGS', 'all']
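The link filtering above can also be tried offline. The following is a minimal sketch, assuming a hand-written catalog fragment (`sample_catalog` is hypothetical, not real API output) and a hypothetical helper `child_titles`. Unlike the notebook cell, it compares `rel` for equality rather than using a substring test, and it falls back to the last URL segment when a link has no `title`.

```python
def child_titles(catalog: dict) -> list:
    """Return the title of every link whose rel is 'child'."""
    return [
        # Fall back to the last path segment when no title is present
        link.get("title", link["href"].rstrip("/").split("/")[-1])
        for link in catalog.get("links", [])
        if link.get("rel") == "child"
    ]


# Hypothetical catalog fragment shaped like a STAC Catalog response
sample_catalog = {
    "links": [
        {"rel": "self", "href": "https://cmr.earthdata.nasa.gov/stac"},
        {"rel": "child", "title": "LPCLOUD",
         "href": "https://cmr.earthdata.nasa.gov/stac/LPCLOUD"},
        {"rel": "child", "href": "https://cmr.earthdata.nasa.gov/stac/POCLOUD"},
    ]
}

print(child_titles(sample_catalog))
```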


In this example we will explore the `LPCLOUD` provider.

In [5]:
provider = 'LPCLOUD'

In [6]:
provider_url = f'{stac_url}/{provider}'
provider_url

'https://cmr.earthdata.nasa.gov/stac/LPCLOUD'

We can submit a `GET` request to the provider endpoint to get details about it, including the data collections available.

In [7]:
cat = requests.get(provider_url)
cat.json()

{'type': 'Catalog',
 'id': 'LPCLOUD',
 'title': 'LPCLOUD STAC Catalog',
 'stac_version': '1.0.0',
 'description': 'Root STAC catalog for LPCLOUD',
 'conformsTo': ['https://api.stacspec.org/v1.0.0-rc.2/core',
  'https://api.stacspec.org/v1.0.0-rc.2/item-search',
  'https://api.stacspec.org/v1.0.0-rc.2/ogcapi-features',
  'https://api.stacspec.org/v1.0.0-rc.2/item-search#fields',
  'https://api.stacspec.org/v1.0.0-rc.2/item-search#features',
  'https://api.stacspec.org/v1.0.0-rc.2/item-search#query',
  'https://api.stacspec.org/v1.0.0-rc.2/item-search#sort',
  'https://api.stacspec.org/v1.0.0-rc.2/item-search#context',
  'http://www.opengis.net/spec/ogcapi-features-1/1.0/conf/core',
  'http://www.opengis.net/spec/ogcapi-features-1/1.0/conf/oas30',
  'http://www.opengis.net/spec/ogcapi-features-1/1.0/conf/geojson',
  'https://api.stacspec.org/v1.0.0-rc.2/collection-search',
  'https://api.stacspec.org/v1.0.0-rc.2/collection-search#free-text',
  'https://api.stacspec.org/v1.0.0-rc.2/collec

There's a lot of information returned. Filter it using string matching to create a dictionary of collections and their endpoints.

In [8]:
cols = [{l['href'].split('/')[-1]: l['href']} for l in cat.json()['links'] if 'child' in l['rel']]
for c in cols:
    print(c)

{'ASTGTM_NUMNC_003': 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/ASTGTM_NUMNC_003'}
{'ASTGTM_NC_003': 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/ASTGTM_NC_003'}
{'ASTGTM_003': 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/ASTGTM_003'}
{'AG1km_003': 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/AG1km_003'}
{'AG100_003': 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/AG100_003'}
{'AG5KMMOH_041': 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/AG5KMMOH_041'}
{'CAM5K30CFCLIM_003': 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/CAM5K30CFCLIM_003'}
{'CAM5K30CF_002': 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/CAM5K30CF_002'}
{'CAM5K30COVCLIM_003': 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/CAM5K30COVCLIM_003'}
{'CAM5K30EMCLIM_003': 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/CAM5K30EMCLIM_003'}
{'CAM5K30EM_002': 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/CAM5K30

**Notice** that only 10 collections are returned here, but the `LPCLOUD` provider has over 100 data products available in Earthdata Cloud. This is because the CMR STAC API returns 10 collections per page by default. A `limit` parameter can be added to the end of the `LPCLOUD` STAC API endpoint to increase the number of collections returned at one time, but this can be slow. Here we'll follow each `next` page link, making a separate request for each additional page of 10 collections.

In [9]:
print("Requesting page:")
# Follow each `next` link until the catalog no longer provides one
while nxt_pg := [l for l in cat.json()['links'] if 'next' in l['rel']]:
    href = nxt_pg[0]['href']
    print(f"{href.split('=')[-1]}...", end=' ')
    cat = requests.get(href)
    cols.extend([{l['href'].split('/')[-1]: l['href']} for l in cat.json()['links'] if 'child' in l['rel']])
print('No additional pages')

Requesting page:
eyJqc29uIjoiW1wibW9kaXMvYXF1YSBsYW5kIHN1cmZhY2UgdGVtcGVyYXR1cmUvMy1iYW5kIGVtaXNzaXZpdHkgZGFpbHkgbDMgZ2xvYmFsIDAuMDVkZWcgY21nIHYwNjFcIixcIkxQQ0xPVURcIixcIk1ZRDIxQzFcIixcIjYxXCIsMjU2NTgwNTgwNSwyOF0ifQ%3D%3D... eyJqc29uIjoiW1wibW9kaXMvdGVycmErYXF1YSBicmRmL2FsYmVkbyBuYWRpciBicmRmLWFkanVzdGVkIHJlZiBkYWlseSBsMyBnbG9iYWwgMC4wNWRlZyBjbWcgdjA2MVwiLFwiTFBDTE9VRFwiLFwiTUNENDNDNFwiLFwiNjFcIiwyNTMyNDQ5MTc5LDMwXSJ9... eyJqc29uIjoiW1widmVnZXRhdGlvbiBpbmRleCBhbmQgcGhlbm9sb2d5ICh2aXApIHZlZ2V0YXRpb24gaW5kaWNlcyA3ZGF5cyBnbG9iYWwgMC4wNWRlZyBjbWcgdjAwNFwiLFwiTFBDTE9VRFwiLFwiVklQMDdcIixcIjRcIiwyNzYzMjY4NDQ5LDI0XSJ9... eyJqc29uIjoiW1widmlpcnMvbnBwIHN1cmZhY2UgcmVmbGVjdGFuY2UgZGFpbHkgbDMgZ2xvYmFsIDAuMDVkZWcgY21nIHYwMDJcIixcIkxQQ0xPVURcIixcIlZOUDA5Q01HXCIsXCIyXCIsMjUxOTEyNjc5Myw0NF0ifQ%3D%3D... No additional pages
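The pagination pattern can be separated from the network call so it is easy to test. This is a minimal sketch, not the notebook's method: `next_href`, `collect_children`, and the two fake pages are hypothetical names and data introduced for illustration, and `fetch` is any callable that turns a URL into a parsed JSON dictionary.

```python
from typing import Callable, Optional


def next_href(page: dict) -> Optional[str]:
    """Return the href of the page's 'next' link, or None on the last page."""
    for link in page.get("links", []):
        if link.get("rel") == "next":
            return link["href"]
    return None


def collect_children(first_page: dict, fetch: Callable[[str], dict]) -> list:
    """Gather 'child' hrefs from every page, following 'next' links."""
    hrefs = []
    page = first_page
    while True:
        hrefs.extend(l["href"] for l in page.get("links", []) if l.get("rel") == "child")
        url = next_href(page)
        if url is None:
            break
        page = fetch(url)
    return hrefs


# Hypothetical two-page catalog used in place of real HTTP responses
first = {"links": [
    {"rel": "child", "href": "collections/A"},
    {"rel": "child", "href": "collections/B"},
    {"rel": "next", "href": "page2"},
]}
fake_pages = {"page2": {"links": [{"rel": "child", "href": "collections/C"}]}}

all_children = collect_children(first, fake_pages.__getitem__)
print(all_children)
```

Against the live API, `fetch` could be something like `lambda url: requests.get(url).json()`; the loop ends when a page has no `next` link rather than relying on an exception.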


Print all available STAC Collections within the `LPCLOUD` catalog.

In [10]:
print(f'LPCLOUD has {len(cols)} Collections')

LPCLOUD has 409 Collections


In [11]:
for c in cols:
    print(c)

{'ASTGTM_NUMNC_003': 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/ASTGTM_NUMNC_003'}
{'ASTGTM_NC_003': 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/ASTGTM_NC_003'}
{'ASTGTM_003': 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/ASTGTM_003'}
{'AG1km_003': 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/AG1km_003'}
{'AG100_003': 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/AG100_003'}
{'AG5KMMOH_041': 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/AG5KMMOH_041'}
{'CAM5K30CFCLIM_003': 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/CAM5K30CFCLIM_003'}
{'CAM5K30CF_002': 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/CAM5K30CF_002'}
{'CAM5K30COVCLIM_003': 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/CAM5K30COVCLIM_003'}
{'CAM5K30EMCLIM_003': 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/CAM5K30EMCLIM_003'}
{'CAM5K30EM_002': 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/CAM5K30

### Get information about an individual Collection

Below we'll specify a STAC Collection, `HLSL30_2.0`, and request the STAC metadata.

In [12]:
# Define desired collection
collection = 'HLSL30_2.0'

In [13]:
# Filter list of collections
collection_link = list(filter(lambda c: collection == list(c.keys())[0], cols))[0]
collection_link

{'HLSL30_2.0': 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/HLSL30_2.0'}
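Because `cols` is a list of single-entry dictionaries, finding one collection requires scanning the list. A possible alternative, sketched here with a hypothetical two-entry stand-in (`cols_sample`) rather than the real list, is to merge the entries into one dictionary keyed by collection ID.

```python
# Hypothetical stand-in for `cols`: a list of single-entry dictionaries
cols_sample = [
    {"HLSL30_2.0": "https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/HLSL30_2.0"},
    {"HLSS30_2.0": "https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/HLSS30_2.0"},
]

# Merge the single-entry dictionaries into one mapping of id -> URL
collection_urls = {cid: url for entry in cols_sample for cid, url in entry.items()}

print(collection_urls["HLSL30_2.0"])
# A missing ID returns None here instead of raising IndexError
print(collection_urls.get("NOT_A_COLLECTION"))
```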

In [14]:
# Request STAC collection metadata
requests.get(collection_link[collection]).json()

{'type': 'Collection',
 'id': 'HLSL30_2.0',
 'title': 'HLS Landsat Operational Land Imager Surface Reflectance and TOA Brightness Daily Global 30m v2.0',
 'description': 'The Harmonized Landsat Sentinel-2 (HLS) project provides consistent surface reflectance (SR) and top of atmosphere (TOA) brightness data from a virtual constellation of satellite sensors. The Operational Land Imager (OLI) is housed aboard the joint NASA/USGS Landsat 8 and Landsat 9 satellites, while the Multi-Spectral Instrument (MSI) is mounted aboard Europe’s Copernicus Sentinel-2A, Sentinel-2B, and Sentinel-2C satellites. The combined measurement enables global observations of the land every 2–3 days at 30-meter (m) spatial resolution. The HLS project uses a set of algorithms to obtain seamless products from OLI and MSI that include atmospheric correction, cloud and cloud-shadow masking, spatial co-registration and common gridding, illumination and view angle normalization, and spectral bandpass adjustment.\n\nThe 

For this next section, we will use `pystac_client` to submit a spatiotemporal query for data assets across multiple collections. We will define our area of interest using a GeoJSON file, and specify the data collections and time range needed for our example.

In [15]:
# Open geojson file
field = gpd.read_file('../../data/ne_w_agfields.geojson')
field

Unnamed: 0,geometry
0,"POLYGON ((-101.67272 41.04754, -101.65345 41.0..."


In [16]:
# Visualize area of interest using hvplot
field.hvplot(tiles='ESRI', crs='EPSG:4326', line_color='yellow', line_width=2, fill_color=None)

We will now specify the search criteria we are interested in, i.e., the **date range**, the **region of interest** (roi), and the **data collections**, to pass to the STAC API.

In [17]:
# Specify area of interest
roi = field['geometry'][0].__geo_interface__
roi

{'type': 'Polygon',
 'coordinates': (((-101.67271614074707, 41.04754380304359),
   (-101.65344715118408, 41.04754380304359),
   (-101.65344715118408, 41.06213891056728),
   (-101.67271614074707, 41.06213891056728),
   (-101.67271614074707, 41.04754380304359)),)}
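The search in this notebook passes the polygon through `intersects`. The STAC API item search also defines a `bbox` parameter, so as a sketch of that alternative, the hypothetical helper `polygon_bbox` below derives a bounding box from a GeoJSON polygon like the one printed above (`roi_example` is a copy of that output).

```python
def polygon_bbox(geometry: dict) -> tuple:
    """Return (min_lon, min_lat, max_lon, max_lat) for a GeoJSON Polygon."""
    if geometry.get("type") != "Polygon":
        raise ValueError("expected a GeoJSON Polygon")
    exterior = geometry["coordinates"][0]  # first ring is the outer boundary
    lons = [pt[0] for pt in exterior]
    lats = [pt[1] for pt in exterior]
    return (min(lons), min(lats), max(lons), max(lats))


# Copy of the roi printed above
roi_example = {
    "type": "Polygon",
    "coordinates": (((-101.67271614074707, 41.04754380304359),
                     (-101.65344715118408, 41.04754380304359),
                     (-101.65344715118408, 41.06213891056728),
                     (-101.67271614074707, 41.06213891056728),
                     (-101.67271614074707, 41.04754380304359)),),
}

print(polygon_bbox(roi_example))
```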

Date ranges can be submitted as a start and end date with a "/". For our search we will only use year and month to define our range, but an example is included below using a full `%Y-%m-%dT%H:%M:%SZ` format.

In [18]:
# Specify date range
date_range = "2021-05/2021-08"
# date_range = "2021-05-01T00:00:00Z/2021-08-30T23:59:59Z" 
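If the start and end are already `datetime` objects, the full-format string can be built rather than typed by hand. This is a minimal sketch, assuming timezone-aware inputs; `stac_interval` is a hypothetical helper, not part of `pystac_client`.

```python
from datetime import datetime, timezone


def stac_interval(start: datetime, end: datetime) -> str:
    """Format two timezone-aware datetimes as 'start/end' in UTC."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    start_utc = start.astimezone(timezone.utc).strftime(fmt)
    end_utc = end.astimezone(timezone.utc).strftime(fmt)
    return f"{start_utc}/{end_utc}"


interval = stac_interval(
    datetime(2021, 5, 1, tzinfo=timezone.utc),
    datetime(2021, 8, 31, 23, 59, 59, tzinfo=timezone.utc),
)
print(interval)  # 2021-05-01T00:00:00Z/2021-08-31T23:59:59Z
```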


A STAC Collection is synonymous with what LP DAAC refers to as a Data Product, though within STAC, the collection name may not exactly match the product short name.

In [19]:
# Define Collections
collections = ['HLSL30_2.0', 'HLSS30_2.0']
collections

['HLSL30_2.0', 'HLSS30_2.0']

In [20]:
# Define Search Parameters
search_params = {
    "collections": collections,
    "intersects": roi,
    "datetime": date_range,
}

Now that our search parameters are collected in a dictionary, we will use `pystac_client` to submit the request and search the catalog.

In [21]:
# Define Catalog to Search
catalog = Client.open(provider_url)

In [22]:
# Perform the search
query = catalog.search(**search_params)

Print the number of STAC Items returned from our search.

In [23]:
# Count result quantity
query.matched()

113

We now have a search object containing the STAC records that matched our query. Let's pull out all of the STAC Items (as a list of PySTAC Item objects) and explore the contents.

In [24]:
# Return Resulting Items
items = list(query.items())

We can view the individual STAC Items and their metadata using their index.

In [25]:
items[0]

We can use the rich notebook representation of each STAC Item to view its metadata, or we can convert the Item to a dictionary.

In [26]:
# Convert to dictionary
items[0].to_dict()

{'type': 'Feature',
 'stac_version': '1.1.0',
 'stac_extensions': ['https://stac-extensions.github.io/eo/v1.1.0/schema.json'],
 'id': 'HLS.L30.T13TGF.2021124T173013.v2.0',
 'geometry': {'type': 'Polygon',
  'coordinates': [[[-101.5423534, 40.5109845],
    [-101.3056118, 41.2066375],
    [-101.2894253, 41.4919436],
    [-102.6032964, 41.5268623],
    [-102.638891, 40.5386175],
    [-101.5423534, 40.5109845]]]},
 'bbox': [-102.638891, 40.5109845, -101.2894253, 41.5268623],
 'properties': {'datetime': '2021-05-04T17:30:13.428000Z',
  'eo:cloud_cover': 36,
  'start_datetime': '2021-05-04T17:30:13.428Z',
  'end_datetime': '2021-05-04T17:30:37.319Z'},
 'links': [{'rel': 'self',
   'href': 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/HLSL30_2.0/items/HLS.L30.T13TGF.2021124T173013.v2.0',
   'type': 'application/geo+json'},
  {'rel': 'parent',
   'href': 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/HLSL30_2.0/',
   'type': 'application/geo+json'},
  {'rel': 'collection',

While the CMR-STAC API is a powerful search and discovery utility, it is still maturing and currently does not have the full gamut of filtering capabilities that the STAC API specification allows for. Hence, additional filtering is required if we want to filter by a property like cloud cover. Below we will loop through and filter the STAC items by a specified cloud cover as well as extract the bands we want.

Set the max cloud cover threshold to 25%.

In [27]:
cloudcover = 25

We will also specify the STAC Assets (i.e., bands/layers) of interest for both the S30 and L30 collections. These differ because B8A is the NIR band for the S30 collection, while B05 is the NIR band for the L30 collection.

In [28]:
s30_bands = ['B8A', 'B04', 'B02', 'Fmask']
l30_bands = ['B05', 'B04', 'B02', 'Fmask']

Retrieve the asset links for each band from the STAC Items that match our cloud cover threshold and store them in a list.

In [29]:
evi_band_links = []

for i in items:
    if i.properties['eo:cloud_cover'] <= cloudcover:
        if i.collection_id == 'HLSS30_2.0':
            #print(i.properties['eo:cloud_cover'])
            evi_bands = s30_bands
        elif i.collection_id == 'HLSL30_2.0':
            #print(i.properties['eo:cloud_cover'])
            evi_bands = l30_bands

        for a in i.assets:
            if any(b==a for b in evi_bands):
                evi_band_links.append(i.assets[a].href)
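The same filter can be expressed over plain dictionaries shaped like `Item.to_dict()` output, which makes the logic testable without a live search. This is a sketch under that assumption: `select_asset_hrefs` and the three sample items are hypothetical, and items from collections without a band list are skipped rather than reusing the previous band list.

```python
def select_asset_hrefs(items, max_cloud, bands_by_collection):
    """Return asset hrefs for the wanted bands of sufficiently clear items."""
    hrefs = []
    for item in items:
        cloud = item["properties"].get("eo:cloud_cover")
        bands = bands_by_collection.get(item.get("collection"))
        # Skip items with no cloud value, no band list, or too much cloud
        if cloud is None or bands is None or cloud > max_cloud:
            continue
        hrefs.extend(
            asset["href"] for name, asset in item["assets"].items() if name in bands
        )
    return hrefs


bands_by_collection = {
    "HLSS30_2.0": ["B8A", "B04", "B02", "Fmask"],
    "HLSL30_2.0": ["B05", "B04", "B02", "Fmask"],
}

# Hypothetical items; hrefs are placeholders, not real asset links
sample_items = [
    {"collection": "HLSL30_2.0", "properties": {"eo:cloud_cover": 10},
     "assets": {"B05": {"href": "l30/B05.tif"}, "B01": {"href": "l30/B01.tif"}}},
    {"collection": "HLSS30_2.0", "properties": {"eo:cloud_cover": 80},
     "assets": {"B8A": {"href": "s30/B8A.tif"}}},
    {"collection": "OTHER", "properties": {"eo:cloud_cover": 0},
     "assets": {"B05": {"href": "other/B05.tif"}}},
]

selected = select_asset_hrefs(sample_items, 25, bands_by_collection)
print(selected)
```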

The filtering done in the previous steps produces a list of links to STAC Assets. Let's print out the first ten links.

In [30]:
evi_band_links[:10]

['https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T13TGF.2021133T172406.v2.0/HLS.L30.T13TGF.2021133T172406.v2.0.B04.tif',
 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T13TGF.2021133T172406.v2.0/HLS.L30.T13TGF.2021133T172406.v2.0.B05.tif',
 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T13TGF.2021133T172406.v2.0/HLS.L30.T13TGF.2021133T172406.v2.0.Fmask.tif',
 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T13TGF.2021133T172406.v2.0/HLS.L30.T13TGF.2021133T172406.v2.0.B02.tif',
 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T14TKL.2021133T172406.v2.0/HLS.L30.T14TKL.2021133T172406.v2.0.B02.tif',
 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T14TKL.2021133T172406.v2.0/HLS.L30.T14TKL.2021133T172406.v2.0.B04.tif',
 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL

Notice that the list of links includes multiple tiles, **T14TKL** and **T13TGF**, that intersect our region of interest. These tiles fall in neighboring UTM zones (14 and 13). We will split the list of links into separate lists for each tile.

Split the links by the tile ID specified in the file name (e.g., T14TKL and T13TGF), whose leading digits give the Universal Transverse Mercator (UTM) zone.

In [31]:
tile_dicts = defaultdict(list) 
tile_dicts

defaultdict(list, {})

In [32]:
for l in evi_band_links:
    tile = l.split('.')[-6]
    tile_dicts[tile].append(l)
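Indexing from the end with `[-6]` works because every link ends in a file name with the same dot-separated layout. As a sketch of a more explicit approach, the hypothetical helper `parse_hls_name` below names each field and rejects file names that do not have the layout seen in the links above.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def parse_hls_name(url: str) -> dict:
    """Split an HLS asset URL's file name into its named fields."""
    filename = PurePosixPath(urlparse(url).path).name
    parts = filename.split(".")
    # Expected layout: HLS.<sensor>.<tile>.<acquired>.v<major>.<minor>.<band>.tif
    if len(parts) != 8 or parts[0] != "HLS":
        raise ValueError(f"unexpected HLS file name: {filename}")
    return {
        "sensor": parts[1],
        "tile": parts[2],
        "acquired": parts[3],
        "version": ".".join(parts[4:6]),
        "band": parts[6],
    }


example_url = (
    "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/"
    "HLS.L30.T13TGF.2021133T172406.v2.0/HLS.L30.T13TGF.2021133T172406.v2.0.B04.tif"
)
info = parse_hls_name(example_url)
print(info["tile"], info["band"])  # T13TGF B04
```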

Print dictionary keys and values, i.e. the data links.

In [33]:
tile_dicts.keys()

dict_keys(['T13TGF', 'T14TKL'])

In [34]:
tile_dicts['T13TGF'][:5]

['https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T13TGF.2021133T172406.v2.0/HLS.L30.T13TGF.2021133T172406.v2.0.B04.tif',
 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T13TGF.2021133T172406.v2.0/HLS.L30.T13TGF.2021133T172406.v2.0.B05.tif',
 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T13TGF.2021133T172406.v2.0/HLS.L30.T13TGF.2021133T172406.v2.0.Fmask.tif',
 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T13TGF.2021133T172406.v2.0/HLS.L30.T13TGF.2021133T172406.v2.0.B02.tif',
 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021133T173859.v2.0/HLS.S30.T13TGF.2021133T173859.v2.0.B8A.tif']

Now we will create a separate list of data links for each tile.

In [35]:
tile_links_T14TKL = tile_dicts['T14TKL']
tile_links_T13TGF = tile_dicts['T13TGF']

Print the band/layer links for HLS tile T13TGF.

In [36]:
tile_links_T13TGF[:10]

['https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T13TGF.2021133T172406.v2.0/HLS.L30.T13TGF.2021133T172406.v2.0.B04.tif',
 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T13TGF.2021133T172406.v2.0/HLS.L30.T13TGF.2021133T172406.v2.0.B05.tif',
 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T13TGF.2021133T172406.v2.0/HLS.L30.T13TGF.2021133T172406.v2.0.Fmask.tif',
 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T13TGF.2021133T172406.v2.0/HLS.L30.T13TGF.2021133T172406.v2.0.B02.tif',
 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021133T173859.v2.0/HLS.S30.T13TGF.2021133T173859.v2.0.B8A.tif',
 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021133T173859.v2.0/HLS.S30.T13TGF.2021133T173859.v2.0.Fmask.tif',
 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HL

Split the links by band.

In [37]:
bands_dicts = defaultdict(list)

In [38]:
for b in tile_links_T13TGF:
    band = b.split('.')[-2]
    bands_dicts[band].append(b)

In [39]:
bands_dicts.keys()

dict_keys(['B04', 'B05', 'Fmask', 'B02', 'B8A'])

In [40]:
bands_dicts['B04']

['https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T13TGF.2021133T172406.v2.0/HLS.L30.T13TGF.2021133T172406.v2.0.B04.tif',
 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021133T173859.v2.0/HLS.S30.T13TGF.2021133T173859.v2.0.B04.tif',
 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T13TGF.2021140T173021.v2.0/HLS.L30.T13TGF.2021140T173021.v2.0.B04.tif',
 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021140T172859.v2.0/HLS.S30.T13TGF.2021140T172859.v2.0.B04.tif',
 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021145T172901.v2.0/HLS.S30.T13TGF.2021145T172901.v2.0.B04.tif',
 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021155T172901.v2.0/HLS.S30.T13TGF.2021155T172901.v2.0.B04.tif',
 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30

Save the individual link lists as separate text files with descriptive names.

In [41]:
for k, v in bands_dicts.items():
    name = (f'HTTPS_T13TGF_{k}_Links.txt')
    with open(f'../../data/{name}', 'w') as f:
        for l in v:
            f.write(f"{l}" + '\n')

Write links to files for S3 direct access.

In [42]:
for k, v in bands_dicts.items():
    name = (f'S3_T13TGF_{k}_Links.txt')
    with open(f'../../data/{name}', 'w') as f:
        for l in v:
            s3l = l.replace('https://data.lpdaac.earthdatacloud.nasa.gov/', 's3://')
            f.write(f"{s3l}" + '\n')
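The string replacement above turns the first path segment of the HTTPS link into the S3 bucket name. A minimal sketch of the same conversion with URL parsing is shown below; `https_to_s3` and `LPDAAC_HOST` are hypothetical names, and the host check is an added safeguard so that links from other hosts are not silently rewritten.

```python
from urllib.parse import urlparse

LPDAAC_HOST = "data.lpdaac.earthdatacloud.nasa.gov"


def https_to_s3(url: str) -> str:
    """Convert an LP DAAC HTTPS asset link to its s3:// form."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or parsed.netloc != LPDAAC_HOST:
        raise ValueError(f"not an LP DAAC HTTPS link: {url}")
    # The first path segment (e.g. lp-prod-protected) becomes the bucket
    return "s3://" + parsed.path.lstrip("/")


https_link = (
    "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/"
    "HLS.L30.T13TGF.2021133T172406.v2.0/HLS.L30.T13TGF.2021133T172406.v2.0.B04.tif"
)
s3_link = https_to_s3(https_link)
print(s3_link)
```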

## Additional Resources

- https://github.com/nasa/cmr-stac
- https://stacspec.org/
- https://stackoverflow.com/questions/26367812/appending-to-list-in-python-dictionary
- https://pystac-client.readthedocs.io/en/latest/index.html
- https://pystac.readthedocs.io/en/1.0/

## Contact Info:  

Email: LPDAAC@usgs.gov  
Voice: +1-866-573-3222  
Organization: Land Processes Distributed Active Archive Center (LP DAAC)¹  
Website: <https://lpdaac.usgs.gov/>  

¹Work performed under USGS contract G15PD00467 for NASA contract NNG14HH33I.  