## GEE STAC Catalog

Google Earth Engine publishes its data catalog as a static STAC catalog, stored as JSON files on Google Cloud Storage.

We use the `pystac_client` package to read and extract information from this catalog.

PySTAC Client (`pystac_client`) builds on PySTAC (`pystac`) and adds higher-level functionality. Some of its classes and methods are inherited from the `pystac` library.

In [22]:
try:
    import pystac_client
except ModuleNotFoundError:
    if 'google.colab' in str(get_ipython()):
        !pip install pystac-client -q
    else:
        print('pystac_client not found, please install via conda in your environment')

In [23]:
from pystac_client import Client
import pandas as pd

## Catalog

A STAC Catalog is a top-level object. Create a STAC Catalog object from the `catalog.json`.

In [24]:
catalog = Client.open('https://earthengine-stac.storage.googleapis.com/catalog/catalog.json')

Check the catalog description. The description is written in Markdown, so we render it with IPython's `Markdown` display class.

In [25]:
from IPython.display import Markdown
Markdown(catalog.description)

The [Earth Engine](https://earthengine.google.com/) Public Data Catalog.

See also:

- [HTML version of the catalog](https://developers.google.com/earth-engine/datasets/catalog)
- [STAC Browser version](https://radiantearth.github.io/stac-browser/#/external/storage.googleapis.com/earthengine-stac/catalog/catalog.json)


There are [many types](https://pystac.readthedocs.io/en/stable/concepts.html#catalog-types) of STAC Catalogs. We check the type of this catalog.

In [26]:
catalog.catalog_type

<CatalogType.ABSOLUTE_PUBLISHED: 'ABSOLUTE_PUBLISHED'>
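To make the distinction between catalog types concrete, the sketch below classifies a catalog by inspecting the `href` values of its links. It works on a plain dictionary rather than on PySTAC objects. The helper `guess_catalog_type` is a hypothetical illustration, and PySTAC's own detection logic may differ in detail.

```python
from urllib.parse import urlparse


def is_absolute(href):
    """Return True when the href carries a scheme such as https."""
    return bool(urlparse(href).scheme)


def guess_catalog_type(catalog_dict):
    """Rough classification into the three PySTAC catalog types (assumed rules)."""
    links = catalog_dict.get("links", [])
    has_self = any(link["rel"] == "self" for link in links)
    # Only the hierarchical links are checked for absolute vs. relative hrefs.
    hierarchical = [l for l in links if l["rel"] in ("child", "item", "parent")]
    all_absolute = all(is_absolute(l["href"]) for l in hierarchical)
    if has_self and all_absolute:
        return "ABSOLUTE_PUBLISHED"
    if has_self:
        return "RELATIVE_PUBLISHED"
    return "SELF_CONTAINED"


# Links taken from the USDA catalog in the GEE STAC catalog.
sample = {
    "id": "USDA",
    "links": [
        {"rel": "self",
         "href": "https://storage.googleapis.com/earthengine-stac/catalog/USDA/catalog.json"},
        {"rel": "child",
         "href": "https://storage.googleapis.com/earthengine-stac/catalog/USDA/USDA_NASS_CDL.json"},
    ],
}

print(guess_catalog_type(sample))
```

A catalog with a `self` link and absolute hrefs is classified as `ABSOLUTE_PUBLISHED`, which matches the type reported for the GEE catalog.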

## Collection

A STAC Collection is used to group related Items and provide aggregate or summary metadata for those Items.

STAC Catalogs may have many nested layers of Catalogs or Collections within the top-level Catalog.

The GEE Catalog contains nested collections, grouped by provider. Here we fetch the child catalog for the provider 'USDA', which holds all of that provider's collections.

In [27]:
cols = catalog.get_child('USDA')
cols

ID: USDA
Title: USDA
Description: Datasets from the [U.S. Department of Agriculture](https://www.usda.gov/).
type: Catalog

ID: USDA/NAIP/DOQQ
Title: NAIP: National Agriculture Imagery Program
Description: The National Agriculture Imagery Program (NAIP) acquires aerial imagery during the agricultural growing seasons in the continental U.S. NAIP projects are contracted each year based upon available funding and the imagery acquisition cycle. Beginning in 2003, NAIP was acquired on a 5-year cycle. 2008 was a transition year, and a three-year cycle began in 2009. NAIP imagery is acquired at a one-meter ground sample distance (GSD) with a horizontal accuracy that matches within six meters of photo-identifiable ground control points, which are used during image inspection. Older images were collected using 3 bands (Red, Green, and Blue: RGB), but newer imagery is usually collected with an additional near-infrared band (RGBN). RGB asset ids begin with 'n_', NRG asset ids begin with 'c_', RGBN asset ids begin with 'm_'.
Providers: USDA Farm Production and Conservation - Business Center, Geospatial Enterprise Operations (producer, licensor); Google Earth Engine (host)
gee:terms_of_use: Most information presented on the FSA Web site is considered public domain information. Public domain information may be freely distributed or copied, but use of appropriate byline/photo/image credits is requested. For more information visit the [FSA Policies and Links](https://www.fsa.usda.gov/help/policies-and-links) website. Users should acknowledge USDA Farm Production and Conservation - Business Center, Geospatial Enterprise Operations when using or distributing this data set.
gee:type: image_collection
keywords: ['aerial', 'agriculture', 'fpac', 'highres', 'imagery', 'naip', 'usda']
providers: [{'name': 'USDA Farm Production and Conservation - Business Center, Geospatial Enterprise Operations', 'roles': ['producer', 'licensor'], 'url': 'https://naip-usdaonline.hub.arcgis.com/'}, {'name': 'Google Earth Engine', 'roles': ['host'], 'url': 'https://developers.google.com/earth-engine/datasets/catalog/USDA_NAIP_DOQQ'}]
sci:citation: USDA Farm Production and Conservation - Business Center, Geospatial Enterprise Operations
stac_extensions: ['https://stac-extensions.github.io/eo/v1.0.0/schema.json', 'https://stac-extensions.github.io/scientific/v1.0.0/schema.json']

Rel: root
Target:
Media Type: application/json

Rel: related
Target: https://code.earthengine.google.com/?scriptPath=Examples:Datasets/USDA_NAIP_DOQQ
Media Type: text/html
code: JavaScript

Rel: preview
Target: https://developers.google.com/earth-engine/datasets/images/USDA/USDA_NAIP_DOQQ_sample.png
Media Type: image/png

Rel: license
Target: https://developers.google.com/earth-engine/datasets/catalog/USDA_NAIP_DOQQ#terms-of-use
Media Type: text/html

Rel: self
Target: https://storage.googleapis.com/earthengine-stac/catalog/USDA/USDA_NAIP_DOQQ.json
Media Type: application/json

Rel: parent
Target:
Media Type: application/json

Rel: root
Target:
Media Type: application/json

Rel: child
Target:
Media Type: application/json

Rel: child
Target: https://storage.googleapis.com/earthengine-stac/catalog/USDA/USDA_NASS_CDL.json
Media Type: application/json

Rel: self
Target: https://storage.googleapis.com/earthengine-stac/catalog/USDA/catalog.json
Media Type: application/json

Rel: parent
Target:
Media Type: application/json

We can get a child collection by specifying its id. Since this collection is nested below the provider catalog rather than directly under the root, we specify `recursive=True`.

In [28]:
col = catalog.get_child('USDA/NAIP/DOQQ', recursive=True)
col

ID: USDA/NAIP/DOQQ
Title: NAIP: National Agriculture Imagery Program
Description: The National Agriculture Imagery Program (NAIP) acquires aerial imagery during the agricultural growing seasons in the continental U.S. NAIP projects are contracted each year based upon available funding and the imagery acquisition cycle. Beginning in 2003, NAIP was acquired on a 5-year cycle. 2008 was a transition year, and a three-year cycle began in 2009. NAIP imagery is acquired at a one-meter ground sample distance (GSD) with a horizontal accuracy that matches within six meters of photo-identifiable ground control points, which are used during image inspection. Older images were collected using 3 bands (Red, Green, and Blue: RGB), but newer imagery is usually collected with an additional near-infrared band (RGBN). RGB asset ids begin with 'n_', NRG asset ids begin with 'c_', RGBN asset ids begin with 'm_'.
Providers: USDA Farm Production and Conservation - Business Center, Geospatial Enterprise Operations (producer, licensor); Google Earth Engine (host)
gee:terms_of_use: Most information presented on the FSA Web site is considered public domain information. Public domain information may be freely distributed or copied, but use of appropriate byline/photo/image credits is requested. For more information visit the [FSA Policies and Links](https://www.fsa.usda.gov/help/policies-and-links) website. Users should acknowledge USDA Farm Production and Conservation - Business Center, Geospatial Enterprise Operations when using or distributing this data set.
gee:type: image_collection
keywords: ['aerial', 'agriculture', 'fpac', 'highres', 'imagery', 'naip', 'usda']
providers: [{'name': 'USDA Farm Production and Conservation - Business Center, Geospatial Enterprise Operations', 'roles': ['producer', 'licensor'], 'url': 'https://naip-usdaonline.hub.arcgis.com/'}, {'name': 'Google Earth Engine', 'roles': ['host'], 'url': 'https://developers.google.com/earth-engine/datasets/catalog/USDA_NAIP_DOQQ'}]
sci:citation: USDA Farm Production and Conservation - Business Center, Geospatial Enterprise Operations
stac_extensions: ['https://stac-extensions.github.io/eo/v1.0.0/schema.json', 'https://stac-extensions.github.io/scientific/v1.0.0/schema.json']

Rel: root
Target:
Media Type: application/json

Rel: related
Target: https://code.earthengine.google.com/?scriptPath=Examples:Datasets/USDA_NAIP_DOQQ
Media Type: text/html
code: JavaScript

Rel: preview
Target: https://developers.google.com/earth-engine/datasets/images/USDA/USDA_NAIP_DOQQ_sample.png
Media Type: image/png

Rel: license
Target: https://developers.google.com/earth-engine/datasets/catalog/USDA_NAIP_DOQQ#terms-of-use
Media Type: text/html

Rel: self
Target: https://storage.googleapis.com/earthengine-stac/catalog/USDA/USDA_NAIP_DOQQ.json
Media Type: application/json

Rel: parent
Target:
Media Type: application/json
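The effect of `recursive=True` can be illustrated with a small sketch on nested dictionaries. This is only a model of the idea: the `find_child` helper and the `children` key are hypothetical, and PySTAC resolves child links from JSON files rather than walking an in-memory dictionary like this one.

```python
def find_child(node, target_id, recursive=False):
    """Return the descendant dict whose 'id' matches target_id, or None."""
    for child in node.get("children", []):
        if child["id"] == target_id:
            return child
        if recursive:
            # Descend into the child's own children before moving on.
            found = find_child(child, target_id, recursive=True)
            if found is not None:
                return found
    return None


# Toy stand-in for the nested layout: root -> provider catalog -> collection.
root = {
    "id": "root",  # placeholder id, not the real root catalog id
    "children": [
        {"id": "USDA", "children": [{"id": "USDA/NAIP/DOQQ", "children": []}]},
    ],
}

direct = find_child(root, "USDA/NAIP/DOQQ")                  # direct children only
nested = find_child(root, "USDA/NAIP/DOQQ", recursive=True)  # searches all levels

print(direct)
print(nested["id"])
```

Without recursion only the provider-level entries are examined, so the lookup comes back empty. With recursion the search continues into each provider catalog and finds the collection.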


The result is a `CollectionClient` object. Let's extract the basic information about this collection.

In [33]:
col.title

'NAIP: National Agriculture Imagery Program'

In [34]:
Markdown(col.description)

The National Agriculture Imagery Program (NAIP) acquires aerial imagery
during the agricultural growing seasons in the continental U.S.

NAIP projects are contracted each year based upon available funding and the
imagery acquisition cycle. Beginning in 2003, NAIP was acquired on
a 5-year cycle. 2008 was a transition year, and a three-year cycle began
in 2009.

NAIP imagery is acquired at a one-meter ground sample distance (GSD) with a
horizontal accuracy that matches within six meters of photo-identifiable
ground control points, which are used during image inspection.

Older images were collected using 3 bands (Red, Green, and Blue: RGB), but
newer imagery is usually collected with an additional near-infrared band
(RGBN). RGB asset ids begin with 'n_', NRG asset ids begin with 'c_', RGBN
asset ids begin with 'm_'.


A catalog consists of one or more collections, and each collection carries its own metadata.

This collection also has some GEE-specific metadata, which is available through the `extra_fields` dictionary.

In [46]:
Markdown(col.extra_fields['gee:terms_of_use'])

Most information presented on the FSA Web site is considered public domain
information. Public domain information may be freely distributed or copied,
but use of appropriate byline/photo/image credits is requested. For more
information visit the [FSA Policies and Links](https://www.fsa.usda.gov/help/policies-and-links)
website.

Users should acknowledge USDA Farm Production and Conservation -
Business Center, Geospatial Enterprise Operations when using or
distributing this data set.
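Since GEE-specific keys share the `gee:` prefix, they can be separated from other extension fields with a dictionary comprehension. In the sketch below a plain dictionary stands in for `col.extra_fields`. Its values are copied from the collection output shown earlier, with the terms of use truncated, and the helper `fields_with_prefix` is a hypothetical name.

```python
# Stand-in for `col.extra_fields`; a real collection holds more keys.
extra_fields = {
    "gee:type": "image_collection",
    # Truncated to the first sentence for brevity.
    "gee:terms_of_use": "Most information presented on the FSA Web site "
                        "is considered public domain information.",
    "sci:citation": "USDA Farm Production and Conservation - Business Center, "
                    "Geospatial Enterprise Operations",
}


def fields_with_prefix(fields, prefix):
    """Keep only the entries whose key starts with the given prefix."""
    return {key: value for key, value in fields.items() if key.startswith(prefix)}


gee_fields = fields_with_prefix(extra_fields, "gee:")
print(sorted(gee_fields))
```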


STAC collections support extensions that allow one to specify additional domain-specific information. Most remote sensing datasets have fields from the [Electro-Optical (EO)](https://github.com/stac-extensions/eo) extension, which describe the images, their bands and their spatial resolutions.

In [49]:
summaries = col.summaries.to_dict()

summaries['eo:bands']

[{'description': 'Red', 'gee:units': 'DN', 'name': 'R'},
 {'description': 'Green', 'gee:units': 'DN', 'name': 'G'},
 {'description': 'Blue', 'gee:units': 'DN', 'name': 'B'},
 {'description': 'Near infrared', 'gee:units': 'DN', 'name': 'N'}]
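The band summary is a list of dictionaries, so the usual list and dictionary comprehensions apply. A short sketch, using the `eo:bands` value printed above as literal input:

```python
# The `eo:bands` summary exactly as returned for the NAIP collection.
bands = [
    {'description': 'Red', 'gee:units': 'DN', 'name': 'R'},
    {'description': 'Green', 'gee:units': 'DN', 'name': 'G'},
    {'description': 'Blue', 'gee:units': 'DN', 'name': 'B'},
    {'description': 'Near infrared', 'gee:units': 'DN', 'name': 'N'},
]

# Ordered list of band names, e.g. for selecting bands later on.
band_names = [band['name'] for band in bands]

# Lookup table from band name to its human-readable description.
descriptions = {band['name']: band['description'] for band in bands}

print(band_names)         # ['R', 'G', 'B', 'N']
print(descriptions['N'])  # Near infrared
```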

## Collections

We can query the catalog for all collections instead of specifying a single collection by id.

In [50]:
collections = catalog.get_all_collections()

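`collections` is a generator object. You can iterate over it and extract information from each [STAC Collection](https://pystac.readthedocs.io/en/stable/api/collection.html) object. A generator can be consumed only once, so the cell below calls `get_all_collections()` again to get a fresh one.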

In [52]:
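datasets = []

# Collect the id, title and description of every collection in the catalog.
# A separate loop variable avoids overwriting `col` from the earlier cells.
for collection in catalog.get_all_collections():
    datasets.append({
        'id': collection.id,
        'title': collection.title,
        'description': collection.description
    })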

Remove all deprecated collections.

In [54]:
datasets = [d for d in datasets if 'deprecated' not in d['title']]
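The filter above is case-sensitive and assumes every collection has a title. If titles might use a different capitalization or be missing, a slightly more defensive variant could look like the sketch below. The helper `is_deprecated` and the two `example/...` entries are hypothetical and serve only to exercise the edge cases.

```python
def is_deprecated(title):
    """Case-insensitive check that tolerates a missing (None) title."""
    return 'deprecated' in (title or '').lower()


sample_datasets = [
    {'id': 'AAFC/ACI', 'title': 'Canada AAFC Annual Crop Inventory'},
    {'id': 'example/old', 'title': 'Example Dataset [Deprecated]'},  # hypothetical
    {'id': 'example/untitled', 'title': None},                       # hypothetical
]

active = [d for d in sample_datasets if not is_deprecated(d['title'])]
print([d['id'] for d in active])
```

Entries without a title are kept here. Whether that is the desired behaviour depends on how the resulting table will be used.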

Create a Pandas DataFrame.

In [55]:
df = pd.DataFrame(datasets)
df

|     | id | title | description |
| --- | --- | --- | --- |
| 0 | AAFC/ACI | Canada AAFC Annual Crop Inventory | Starting in 2009, the Earth Observation Team o... |
| 1 | ACA/reef_habitat/v1_0 | Allen Coral Atlas (ACA) - Geomorphic Zonation ... | The [Allen Coral Atlas](https://allencoralatla... |
| 2 | AHN/AHN2_05M_INT | AHN Netherlands 0.5m DEM, Interpolated | The AHN DEM is a 0.5m DEM covering the Netherl... |
| 3 | AHN/AHN2_05M_NON | AHN Netherlands 0.5m DEM, Non-Interpolated | The AHN DEM is a 0.5m DEM covering the Netherl... |
| 4 | AHN/AHN2_05M_RUW | AHN Netherlands 0.5m DEM, Raw Samples | The AHN DEM is a 0.5m DEM covering the Netherl... |
| ... | ... | ... | ... |
| 580 | YALE/YCEO/UHI/UHI_yearly_pixel/v4 | YCEO Surface Urban Heat Islands: Pixel-Level A... | This dataset contains annual, summertime, and ... |
| 581 | YALE/YCEO/UHI/Winter_UHI_yearly_pixel/v4 | YCEO Surface Urban Heat Islands: Pixel-Level Y... | This dataset contains annual, summertime, and ... |
| 582 | projects/planet-nicfi/assets/basemaps/africa | Planet & NICFI Basemaps for Tropical Forest Mo... | This image collection provides access to high-... |
| 583 | projects/planet-nicfi/assets/basemaps/americas | Planet & NICFI Basemaps for Tropical Forest Mo... | This image collection provides access to high-... |


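Save the DataFrame to an Excel file. Writing `.xlsx` files requires an Excel engine such as `openpyxl` to be installed.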

In [19]:
df.to_excel('ee_datasets.xlsx', index=False)