## GEE STAC Catalog

Google Earth Engine produces a static STAC catalog as a JSON file on Google Cloud Storage. We use PySTAC Client to read and extract information from this catalog.

PySTAC Client (`pystac_client`) builds on PySTAC (`pystac`) and adds higher-level functionality. Some of its functions are inherited from the `pystac` library.

In [1]:
try:
    import pystac_client
except ModuleNotFoundError:
    if 'google.colab' in str(get_ipython()):
        !pip install pystac_client -qq
    else:
        print('pystac_client not found, please install via conda in your environment')

In [2]:
from pystac_client import Client
import pandas as pd

## Catalog

A STAC Catalog is a top-level object. Create a STAC Catalog object from the `catalog.json`.

In [3]:
catalog = Client.open('https://earthengine-stac.storage.googleapis.com/catalog/catalog.json')
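To make concrete what `Client.open` reads, here is a minimal sketch of the kind of JSON document a static STAC catalog is built from. The sample is a hypothetical, heavily trimmed stand-in (not the actual contents of the GEE `catalog.json`), the `example.com` URLs are placeholders, and `child_links` is an illustrative helper rather than part of `pystac_client`.

```python
import json

# Hypothetical, trimmed-down catalog document for illustration only;
# the real GEE catalog.json has many more links and fields.
SAMPLE_CATALOG = """
{
  "type": "Catalog",
  "stac_version": "1.0.0",
  "id": "sample-root",
  "description": "A sample root catalog.",
  "links": [
    {"rel": "self", "href": "https://example.com/catalog.json"},
    {"rel": "root", "href": "https://example.com/catalog.json"},
    {"rel": "child", "href": "https://example.com/USDA/catalog.json", "title": "USDA"},
    {"rel": "child", "href": "https://example.com/ESA/catalog.json", "title": "ESA"}
  ]
}
"""


def child_links(catalog_doc):
    """Return the links of a catalog document whose relation is 'child'."""
    return [link for link in catalog_doc.get("links", []) if link.get("rel") == "child"]


doc = json.loads(SAMPLE_CATALOG)
children = child_links(doc)

# Print the catalog id followed by the titles of its child links.
print(doc["id"], [link["title"] for link in children])
```

Each `child` link points at another JSON file, which is how a static catalog forms a tree that a client can walk without a search API.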

Check the catalog description. The description is in Markdown format, so we render it with IPython's `Markdown` display class.

In [4]:
from IPython.display import Markdown
Markdown(catalog.description)

The [Earth Engine](https://earthengine.google.com/) Public Data Catalog.

See also:

- [HTML version of the catalog](https://developers.google.com/earth-engine/datasets/catalog)
- [STAC Browser version](https://radiantearth.github.io/stac-browser/#/external/storage.googleapis.com/earthengine-stac/catalog/catalog.json)


PySTAC distinguishes [several types](https://pystac.readthedocs.io/en/stable/concepts.html#catalog-types) of STAC Catalogs (self-contained, relative published, and absolute published). We check the type of this catalog.

In [5]:
catalog.catalog_type

<CatalogType.ABSOLUTE_PUBLISHED: 'ABSOLUTE_PUBLISHED'>

## Collection

A STAC Collection is used to group related Items and provide aggregate or summary metadata for those Items.

STAC Catalogs may have many nested layers of Catalogs or Collections within the top-level catalog.

The GEE catalog contains nested collections, grouped by provider. Here we get the child catalog that holds all collections from the provider `USDA`.

In [6]:
cols = catalog.get_child('USDA')
cols

<Client id=USDA>
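Because providers sit at the top and collections beneath them, a lookup by collection id has to descend past the direct children of the root. The sketch below pictures this with plain dictionaries as a hypothetical stand-in for the catalog tree; `find_child` and the sample tree are illustrative assumptions and do not reproduce PySTAC's actual implementation.

```python
def find_child(node, target_id, recursive=False):
    """Return the descendant of `node` whose id is `target_id`, or None.

    Without `recursive`, only the direct children of `node` are checked.
    """
    children = node.get("children", [])

    # First look among the direct children.
    for child in children:
        if child["id"] == target_id:
            return child

    # Optionally descend into each child (depth-first).
    if recursive:
        for child in children:
            found = find_child(child, target_id, recursive=True)
            if found is not None:
                return found

    return None


# Hypothetical tree mirroring the provider -> collection nesting.
tree = {
    "id": "root",
    "children": [
        {"id": "USDA", "children": [{"id": "USDA/NAIP/DOQQ", "children": []}]},
        {"id": "ESA", "children": [{"id": "ESA/WorldCover/v100", "children": []}]},
    ],
}

shallow = find_child(tree, "USDA/NAIP/DOQQ")                # not a direct child
deep = find_child(tree, "USDA/NAIP/DOQQ", recursive=True)   # found one level down

print(shallow)
print(deep["id"])
```

A shallow lookup is cheaper because it never leaves the root's own links, while a recursive one may need to visit many child catalogs before it finds a match.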

We can get a child collection by specifying its id. Since this is a nested collection, we specify `recursive=True`.

In [None]:
col = catalog.get_child('USDA/NAIP/DOQQ', recursive=True)
col

The result is a `CollectionClient` object. Let's extract the basic information about this collection.

In [None]:
col.title

In [None]:
Markdown(col.description)

A catalog consists of one or more collections, and each collection carries its own metadata. This collection also has some GEE-specific metadata, such as its terms of use.

In [None]:
Markdown(col.extra_fields['gee:terms_of_use'])

STAC collections have extensions that allow one to specify additional domain-specific information. Most remote sensing datasets have fields from the [Electro-Optical (EO)](https://github.com/stac-extensions/eo) extension that can describe the images, bands and their spatial resolutions.

In [None]:
summaries = col.summaries.to_dict()

summaries['eo:bands']
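As a rough illustration of working with such a summary, the sketch below pulls band names, descriptions and ground sample distances out of an `eo:bands`-style list. The `sample_bands` values are hypothetical placeholders rather than the actual NAIP metadata. The `name` and `description` keys come from the EO extension, while a per-band `gsd` key is an assumption about how this catalog stores resolution, so the helper treats it as optional.

```python
# Hypothetical band entries shaped like an 'eo:bands' summary.
sample_bands = [
    {"name": "R", "description": "Red", "gsd": 1.0},
    {"name": "G", "description": "Green", "gsd": 1.0},
    {"name": "B", "description": "Blue", "gsd": 1.0},
    {"name": "N", "description": "Near infrared"},  # gsd deliberately missing
]


def band_table(bands):
    """Return (name, description, gsd) tuples, using None for a missing gsd."""
    rows = []
    for band in bands:
        rows.append((band.get("name"), band.get("description", ""), band.get("gsd")))
    return rows


for name, description, gsd in band_table(sample_bands):
    gsd_text = "n/a" if gsd is None else str(gsd)
    print(f"{name:<3} {description:<15} {gsd_text}")
```

Using `dict.get` keeps the helper from failing on collections that omit one of these keys.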

## Collections

We can query the catalog for all collections instead of specifying a single collection by id.

In [8]:
collections = catalog.get_all_collections()

`collections` is a generator object. You can iterate over it and extract information from each [STAC Collection](https://pystac.readthedocs.io/en/stable/api/collection.html) object.

In [11]:
datasets = []

for col in catalog.get_all_collections():
    datasets.append({
        'id': col.id,
        'title': col.title,
        'description': col.description,
        'keywords': col.keywords
    })
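Note that the loop above calls `get_all_collections()` again rather than reusing an existing generator. Either approach works for a single pass, but a generator can only be consumed once. The sketch below demonstrates this with a plain Python generator; `fake_collections` is a hypothetical stand-in for `catalog.get_all_collections()` and yields ids instead of collection objects.

```python
def fake_collections():
    """Hypothetical stand-in for catalog.get_all_collections()."""
    for collection_id in ["AAFC/ACI", "ESA/WorldCover/v100", "USDA/NAIP/DOQQ"]:
        yield collection_id


gen = fake_collections()
first_pass = list(gen)    # consumes the generator
second_pass = list(gen)   # nothing left to yield

print(len(first_pass), len(second_pass))

# Materialize the results once if they are needed more than once.
cached = list(fake_collections())
print(cached)
```

For a large remote catalog, caching the results in a list avoids walking the whole tree again, at the cost of holding every collection in memory.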

Remove all deprecated collections.

In [12]:
# Titles are optional in STAC, so treat a missing title as an empty string.
datasets = [d for d in datasets if 'deprecated' not in (d['title'] or '')]

Select all datasets tagged with the keyword `landcover`.

In [13]:
# Keywords are optional in STAC, so treat missing keywords as an empty list.
datasets = [d for d in datasets if 'landcover' in (d['keywords'] or [])]
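The two filtering steps above can also be combined into one reusable helper. This is a sketch under the assumption that `title` or `keywords` may be `None` for some collections (both fields are optional in the STAC specification); `filter_datasets` and the sample records are hypothetical.

```python
def filter_datasets(datasets, keyword, skip_deprecated=True):
    """Keep records tagged with `keyword`, optionally dropping deprecated ones."""
    selected = []
    for d in datasets:
        title = d.get("title") or ""        # a missing title becomes ''
        keywords = d.get("keywords") or []  # missing keywords become []
        if skip_deprecated and "deprecated" in title:
            continue
        if keyword in keywords:
            selected.append(d)
    return selected


# Hypothetical records shaped like the entries of the `datasets` list.
sample = [
    {"id": "a", "title": "Land cover map", "keywords": ["landcover", "esa"]},
    {"id": "b", "title": "Old land cover map [deprecated]", "keywords": ["landcover"]},
    {"id": "c", "title": None, "keywords": None},
    {"id": "d", "title": "Elevation model", "keywords": ["dem"]},
]

landcover_ids = [d["id"] for d in filter_datasets(sample, "landcover")]
print(landcover_ids)
```

Keeping the keyword as a parameter makes it easy to repeat the selection for other tags without editing the list comprehensions.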

Create a Pandas DataFrame.

In [14]:
df = pd.DataFrame(datasets)
df

| | id | title | description | keywords |
|---|---|---|---|---|
| 0 | AAFC/ACI | Canada AAFC Annual Crop Inventory | Starting in 2009, the Earth Observation Team o... | [aafc, canada, crop, landcover] |
| 1 | COPERNICUS/CORINE/V20/100m | Copernicus CORINE Land Cover | The CORINE (coordination of information on the... | [clc, copernicus, corine, eea, esa, eu, landco... |
| 2 | COPERNICUS/Landcover/100m/Proba-V-C3/Global | Copernicus Global Land Cover Layers: CGLS-LC10... | The Copernicus Global Land Service (CGLS) is e... | [copernicus, eea, esa, eu, landcover, proba, p... |
| 3 | CSP/HM/GlobalHumanModification | CSP gHM: Global Human Modification | The global Human Modification dataset (gHM) pr... | [csp, fragmentation, human_modification, landc... |
| 4 | DLR/WSF/WSF2015/v1 | World Settlement Footprint 2015 | The World Settlement Footprint (WSF) 2015 is a... | [landcover, landsat-derived, sentinel1-derived... |
| 5 | ESA/CCI/FireCCI/5_1 | FireCCI51: MODIS Fire_cci Burned Area Pixel Pr... | The MODIS Fire_cci Burned Area pixel product v... | [burn, c3s, cci, climate_change, copernicus, e... |
| 6 | ESA/GLOBCOVER_L4_200901_200912_V2_3 | GlobCover: Global Land Cover Map | GlobCover 2009 is a global land cover map base... | [esa, globcover, landcover] |
| 7 | ESA/WorldCover/v100 | ESA WorldCover 10m v100 | The European Space Agency (ESA) WorldCover 10 ... | [esa, landcover, landuse, sentinel1-derived, s... |
| 8 | GLIMS/20210914 | GLIMS 2021: Global Land Ice Measurements From ... | Global Land Ice Measurements from Space (GLIMS... | [glacier, glims, ice, landcover, nasa, nsidc, ... |
| 9 | GLIMS/current | GLIMS Current: Global Land Ice Measurements Fr... | Global Land Ice Measurements from Space (GLIMS... | [glacier, glims, ice, landcover, nasa, nsidc, ... |


Save to Excel. Writing `.xlsx` files requires an Excel engine such as `openpyxl` to be installed.

In [None]:
df.to_excel('ee_datasets.xlsx', index=False)
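If no Excel engine is available, a CSV export is a workable alternative. The sketch below uses only the standard library. The `rows` records reuse two entries from the table above, `to_csv_text` is a hypothetical helper, and joining the keywords with `;` is just one possible convention.

```python
import csv
import io

# Two records shaped like the entries of the `datasets` list.
rows = [
    {
        "id": "AAFC/ACI",
        "title": "Canada AAFC Annual Crop Inventory",
        "keywords": ["aafc", "canada", "crop", "landcover"],
    },
    {
        "id": "ESA/GLOBCOVER_L4_200901_200912_V2_3",
        "title": "GlobCover: Global Land Cover Map",
        "keywords": ["esa", "globcover", "landcover"],
    },
]


def to_csv_text(records, fieldnames=("id", "title", "keywords")):
    """Serialize records to CSV text, flattening the keyword list into one cell."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    for record in records:
        flat = dict(record)
        flat["keywords"] = ";".join(record.get("keywords") or [])
        writer.writerow(flat)
    return buffer.getvalue()


csv_text = to_csv_text(rows)
print(csv_text)
```

To write a file instead of a string, the same `DictWriter` can be pointed at a handle opened with `open('ee_datasets.csv', 'w', newline='')`.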