# Generate a STAC Catalog

For better or worse, it is easier to build a STAC catalog at the same time that the STAC items are created. Below, we generate a catalog and metadata for a directory of data.

---------------------
## Step I: Make the catalog

In [22]:
import datetime
import json
import os
from pathlib import Path
import ssl
# This is required for verification / validation using remote resources when inside the network
ssl._create_default_https_context = ssl._create_unverified_context

import pystac
from pystac import Collection, SpatialExtent, TemporalExtent, Extent

from amg.isismetadata import IsisMetadata
from amg.fgdcmetadata import FGDCMetadata, EquirectangularFgdcParser, PolarStereoGraphicFgdcParser
from amg.gdalmetadata import GDALMetadata
from amg.formatters.stac_formatter import to_stac
from amg.formatters.fgdc_formatter import to_fgdc
from amg import UnifiedMetadata

In [23]:
description = """
The Solid State Imager (SSI) on NASA's Galileo spacecraft acquired more than 500 images of Jupiter's moon, Europa, 
providing the only moderate- to high-resolution images of the moon's surface. Images were acquired as observation 
sequences during each orbit that targeted the moon. Each of these observation sequences consists of between 1 and 
19 images acquired close in time, that typically overlap, have consistent illumination and similar pixel scale. 
The observations vary from relatively low-resolution hemispherical imaging, to high-resolution targeted images that 
cover a small portion of the surface. Here we provide average mosaics of each of the individual observation sequences 
acquired by the Galileo spacecraft. These observation mosaics were constructed from a set of 481 Galileo images that 
were photogrammetrically controlled globally (along with 221 Voyager 1 and 2 images) to improve their relative 
locations on Europa's surface. The 92 observation mosaics provide users with nearly the entire Galileo Europa 
imaging dataset at its native resolution and with improved relative image locations.

The Solid State Imager (SSI) on NASA's Galileo spacecraft provided the only moderate- to high-resolution images 
of Jupiter's moon, Europa. Unfortunately, uncertainty in the position and pointing of the spacecraft, as well as 
the position and orientation of Europa, when the images were acquired resulted in significant errors in image 
locations on the surface. The result of these errors is that images acquired during different Galileo orbits, or 
even at different times during the same orbit, are significantly misaligned (errors of up to 100 km on the surface).
Previous work has generated global mosaics of Galileo and Voyager images that photogrammetrically control a subset 
of the available images to correct their relative locations. However, these efforts result in a "static" mosaic 
that is projected to a consistent pixel scale, and only use a fraction of the dataset (e.g., high resolution images 
are not included). The purpose of this current dataset is to increase the usability of the entire Galileo image set 
by photogrammetrically improving the locations of nearly every Europa image acquired by Galileo, and making them 
available to the community at their native resolution and in easy-to-use regional mosaics based on their acquisition time.
The dataset therefore provides a set of image mosaics that can be used for scientific analysis and mission planning activities.
"""

coll = Collection(id='usgs_controlled_voy1_voy2_galileo',
                  title='USGS Controlled Europa Voyager 1, Voyager 2, and Galileo Image Data',
                  description=description,
                  extent=Extent(SpatialExtent([[-180, -90, 180, 90]]),
                                TemporalExtent([[datetime.datetime(2021, 1, 1), None]])),
                  href='https://asc-jupiter.s3-us-west-2.amazonaws.com/europa/individual_l2/collection.json',
                  license='PDDL-1.0'
                 )
coll.validate()
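
For reference, the STAC Collection specification stores the extent as nested lists: `spatial.bbox` is a list of bounding boxes and `temporal.interval` is a list of `[start, end]` pairs, where `null` marks an open end. The sketch below builds that structure with the standard library only, to illustrate what the serialized collection is expected to contain. It does not call pystac, the helper name `make_extent` is hypothetical, and it assumes the naive `datetime` is in UTC.

```python
import datetime
import json


def make_extent(bbox, start, end=None):
    """Build a STAC-style extent dictionary (illustrative helper, not part of pystac)."""
    def fmt(dt):
        # Assumption: naive datetimes are UTC; None marks an open-ended interval
        return None if dt is None else dt.strftime('%Y-%m-%dT%H:%M:%SZ')

    return {'spatial': {'bbox': [list(bbox)]},
            'temporal': {'interval': [[fmt(start), fmt(end)]]}}


extent = make_extent([-180, -90, 180, 90], datetime.datetime(2021, 1, 1))
print(json.dumps(extent, indent=2))
```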

----------------
## Step II: Get a list of the input data

Below, we generate the catalog from a list file that contains fully qualified paths. One could also use glob to generate a file list dynamically from within a notebook.

The `UPLOAD_DIR` variable defines where the collection and any metadata files are written. In practice, the workflow that I have been using is:

1. Generate the Cloud Optimized GeoTIFFs (COGs) and stage them into the `UPLOAD_DIR`
1. Generate the metadata and collection files, pointing at the original data, and stage them into the `UPLOAD_DIR`
1. Push all of the data to S3
1. Scrape the new S3 bucket using a local stac-browser
1. Push the updated stac-browser (it is a static site after all) to the web hosting S3 bucket

In [3]:
UPLOAD_DIR = '/scratch/ARD/stac/jupiter/europa/'

# List the products to generate STAC for...
with open('/archive/projects/europa/GLL_FinProducts/observation_lev2_products.lis', 'r') as f:
    products = f.readlines()
products = [Path(p.rstrip()) for p in products]

The first two entries from the above list file are printed as a sanity check.

In [19]:
print(products[0:2])

[PosixPath('/archive/projects/europa/GLL_FinProducts/10ESGLOBAL01/Lev2/s0413742778.equi.cub'), PosixPath('/archive/projects/europa/GLL_FinProducts/10ESGLOBAL01/Lev2/s0413742778.equi.photr.cub')]


-----------------------
## Step III: Cook Metadata and Update the Catalog

Now it is necessary to loop over the individual files and generate appropriate metadata. Before doing that, three items are defined:
- the template to use / parse for metadata 
- the overrides
- the mappings

Check out the GenerateIndividualMetadata notebook for a full description of these arguments.

In [11]:
FGDC_TEMPLATE = '../templates/europa_individual_l2_fgdc.xml'

# Define overrides
overrides = {'license': 'PDDL-1.0',
             'missions':['Voyager 1', 'Voyager 2', 'Galileo'],
             'doi':'https://doi.org/10.5066/P9VKKK7C',
             'href':'https://asc-jupiter.s3-us-west-2.amazonaws.com/europa/individual_l2'}

# Define mappings
mappings = {'bbox':IsisMetadata, }

STAC also has a concept of assets: files that are closely associated with one another. For each data set, it is necessary to define an assets template. The code dynamically populates entries in the list of assets by filling in variables that are indicated by `{}`. For example, in the assets below, the title of the first asset reads `'JPEG thumbnail of image {productid}'`. The code parses this string and replaces `{productid}` with the `productid` read from the `UnifiedMetadata` object.

In [13]:
assets = [{'title':'JPEG thumbnail of image {productid}',
           'href':'{href}/{productid}.jpeg',
           'media_type':'image/jpeg',
           'roles':['thumbnail'],
           'key':'thumbnail'},
          {'title': 'Cloud optimized GeoTiff for image {productid}',
           'href':'{href}/{productid}-cog.tif',
           'media_type':'image/tiff; application=geotiff; profile=cloud-optimized',
           'roles':['data'],
           'key':'B1'},
          {'title': 'GDAL PAM Metadata for image {productid}',
           'href':'{href}/{productid}-cog.tif.aux.xml',
           'media_type':'application/xml',
           'roles':['metadata'],
           'key':'gdal_metadata'},
          {'title': 'FGDC Metadata for image {productid}',
           'href':'{href}/{productid}.xml',
           'media_type':'application/xml',
           'roles':['metadata'],
           'key':'fgdc_metadata'}]
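
The placeholder substitution described above can be sketched with Python's built-in `str.format`. This is only an illustration of the idea: how `amg` performs the substitution internally is not shown in this notebook, so the helper `fill_asset_template` and the example values (including the `example.com` URL) are assumptions.

```python
def fill_asset_template(template, values):
    """Return a copy of an asset template with {placeholders} in string fields filled in."""
    filled = {}
    for key, value in template.items():
        # Only string fields can carry placeholders; lists such as 'roles' pass through unchanged
        filled[key] = value.format(**values) if isinstance(value, str) else value
    return filled


template = {'title': 'JPEG thumbnail of image {productid}',
            'href': '{href}/{productid}.jpeg',
            'media_type': 'image/jpeg',
            'roles': ['thumbnail'],
            'key': 'thumbnail'}

# Hypothetical values; in the notebook these come from the metadata record and the overrides
values = {'productid': 's0413742778', 'href': 'https://example.com/europa'}

filled = fill_asset_template(template, values)
print(filled['title'])
print(filled['href'])
```

A name written without braces is treated as literal text and is not replaced, so every variable in a template needs its `{}`.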

The cell below, in this example, is going to run for a fair amount of time simply because lots of metadata files are being generated.

In [25]:
for product in products:
    # Munge the path to get the base product name without the directory or final file extension
    product = str(product)
    basename = os.path.basename(product)
    outname = os.path.splitext(basename)[0]

    # Parse the filename (in this case) to get the projection so the correct FGDC projection injection can occur
    if 'equi' in product:
        proj = 'equirect'
    elif 'npola' in product or 'spola' in product:
        proj = 'polarst'
    else:
        # Fail loudly rather than silently reusing the projection from the previous product
        raise ValueError(f'Unable to determine the projection for {product}')

    # Create the unified metadata record
    fgdc = FGDCMetadata(FGDC_TEMPLATE, proj=proj)
    gd = GDALMetadata(product)
    imd = IsisMetadata(product)

    record = UnifiedMetadata([fgdc, gd, imd], overrides=overrides, mappings=mappings)

    # Generate the FGDC metadata
    fgdc_md = to_fgdc(record)
    with open(f'{UPLOAD_DIR}/{outname}.xml', 'w') as fgdc_file:
        fgdc_file.write(fgdc_md)

    # Convert the generic metadata record into a STAC formatted metadata record
    as_stac = to_stac(record, assets=assets)
    as_stac.validate()

    # Add the item to the parent collection. This also adds the collection to the item (win-win)
    coll.add_item(as_stac)

    # Write the STAC metadata
    with open(f'{UPLOAD_DIR}/{outname}.json', 'w') as stac_file:
        json.dump(as_stac.to_dict(), stac_file, indent=2)

# Now write the collection
coll.validate()
with open(f'{UPLOAD_DIR}/collection.json', 'w') as f:
    json.dump(coll.to_dict(), f)
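
Because the loop is slow, it can help to check the filename handling on its own before running it. The sketch below factors the name munging and projection detection into a helper that needs neither ISIS nor GDAL. The helper name `describe_product` is hypothetical, and unlike the loop it inspects only the basename rather than the full path, which is a deliberate deviation so that directory names cannot trigger a match.

```python
import os


def describe_product(path):
    """Return (outname, proj) for a product path; illustrative helper, not part of amg."""
    basename = os.path.basename(str(path))
    # splitext strips only the final extension, so 'a.equi.photr.cub' becomes 'a.equi.photr'
    outname = os.path.splitext(basename)[0]
    if 'equi' in basename:
        proj = 'equirect'
    elif 'npola' in basename or 'spola' in basename:
        proj = 'polarst'
    else:
        raise ValueError(f'Unable to determine the projection for {path}')
    return outname, proj


result = describe_product(
    '/archive/projects/europa/GLL_FinProducts/10ESGLOBAL01/Lev2/s0413742778.equi.photr.cub')
print(result)
```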

---------------------------
## Step IV: What about using glob?

It is also sometimes desirable not to have to generate a file list beforehand. It is possible to use the glob module to generate a listing similar to the one above. The cells below:
- Create a new collection for Europa mosaics
- Glob a local directory full of said mosaics
- Generate STAC and FGDC metadata for the image mosaics
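
The globbing step can be sketched with `pathlib`, which offers the same pattern matching as the `glob` module and returns `Path` objects like the `products` list used earlier. This is a minimal sketch: the helper name `find_products`, the `*.cub` pattern, and the recursive search are assumptions, since the mosaic directory layout is not described here.

```python
from pathlib import Path


def find_products(directory, pattern='*.cub'):
    """Recursively collect files under `directory` matching `pattern`, sorted for reproducibility."""
    return sorted(Path(directory).rglob(pattern))


# Example usage (hypothetical directory):
# products = find_products('/path/to/mosaics')
```

Sorting the result keeps the order of items in the collection stable between runs, which makes diffs of the generated `collection.json` easier to read.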