# Create a STAC Collection

<a style="display: inline-block;" href="https://mybinder.org/v2/gh/RadiantMLHub/ml4eo-bootcamp-2021/main?filepath=Lecture%205%2Fexercises%2F3_create_stac_collection.ipynb"><img src="https://mybinder.org/badge_logo.svg" alt="Launch in Binder"/></a>

In the [second exercise](./2_create_stac_label_item.ipynb) we learned about STAC Extensions, and we created a STAC Item to represent our label assets. In this exercise, we will:

* Describe how Items can be grouped together into Collections
* Create a Collection to hold all of the Items we have created so far

## STAC Collections

A [**STAC Collection**](https://github.com/radiantearth/stac-spec/tree/master/collection-spec) is used to describe a set of related Items and provide additional metadata about the set of Items as a whole. Items in a Collection generally share the same properties and high-level metadata. As with STAC Items, the core spec defines a minimal set of fields that are applicable to most Collections and allows the user to add extensions or define custom fields to assign additional metadata to the Collection.
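
To make the core fields more concrete, here is a minimal sketch of a Collection written as a plain Python dictionary. It assumes the top-level field names used by STAC 1.0.0 (`type`, `stac_version`, `id`, `description`, `license`, `extent`, `links`); earlier beta versions of the spec differ slightly, and every value below is a placeholder rather than real metadata.

```python
import json

# Hypothetical minimal Collection; all values are placeholders.
minimal_collection = {
    "type": "Collection",
    "stac_version": "1.0.0",
    "id": "example-collection",
    "description": "A placeholder description.",
    "license": "proprietary",
    "extent": {
        # One overall bounding box: [west, south, east, north]
        "spatial": {"bbox": [[-180.0, -90.0, 180.0, 90.0]]},
        # One overall interval: [start, end]; None (null) means open-ended
        "temporal": {"interval": [["2017-10-08T00:00:00Z", None]]},
    },
    "links": [],
}

# Assumed set of required top-level fields (STAC 1.0.0)
REQUIRED_FIELDS = {
    "type", "stac_version", "id", "description", "license", "extent", "links"
}


def missing_fields(collection: dict) -> set:
    """Return the required top-level fields that are absent."""
    return REQUIRED_FIELDS - set(collection)


print(missing_fields(minimal_collection))
print(json.dumps(minimal_collection["extent"], indent=4))
```

Extensions and custom fields would appear as additional keys alongside these core fields.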

## STAC Catalogs & STAC API

The STAC spec also defines a [**Catalog**](https://github.com/radiantearth/stac-spec/tree/master/catalog-spec) object, which is meant to be a top-level object that groups together Collections in a single location. Since we can use either a Catalog or a Collection as our top-level object, we will not cover the creation of Catalogs in these exercises. However, in the next exercise we will work with the Radiant MLHub API, which implements the [STAC API spec](https://github.com/radiantearth/stac-api-spec). The landing page for a STAC API implementation returns a Catalog, which in turn links to the Collections it contains. We will cover this in more detail in a future exercise.
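
As a rough illustration of how a Catalog points at the Collections it contains, the sketch below builds a hypothetical Catalog dictionary and pulls out its `child` links. The ids and hrefs are invented for the example, and a real STAC API landing page will typically carry additional links.

```python
# Hypothetical Catalog; ids and hrefs are made up for illustration.
catalog = {
    "type": "Catalog",
    "stac_version": "1.0.0",
    "id": "example-catalog",
    "description": "A placeholder top-level Catalog.",
    "links": [
        {"rel": "self", "href": "./catalog.json"},
        {"rel": "child", "href": "./collection-a/collection.json"},
        {"rel": "child", "href": "./collection-b/collection.json"},
    ],
}


def child_hrefs(stac_object: dict) -> list:
    """Return the hrefs of all links whose rel is 'child'."""
    return [
        link["href"]
        for link in stac_object.get("links", [])
        if link.get("rel") == "child"
    ]


print(child_hrefs(catalog))
```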

## Create a Collection

We will use the [`pystac.Collection`](https://pystac.readthedocs.io/en/latest/api.html#collection) class to create a new Collection and add the Items we created in the previous two exercises.

The ["Use of links"](https://github.com/radiantearth/stac-spec/blob/master/best-practices.md#use-of-links) section in the STAC Best Practices documentation gives us some guidance on different ways of handling links in our Catalogs and Collections. Our first decision is whether we are going to create a "published" catalog or a "self-contained" catalog. Since we will just be working with our files locally and are not planning on publishing this Collection on the web, we will go with a "self-contained" Collection. Among "self-contained" catalogs, we can choose to create a metadata-only catalog or to include our assets. Our assets will sit alongside the STAC objects in the same file system, so we will create a [**self-contained catalog with assets**](https://github.com/radiantearth/stac-spec/blob/master/best-practices.md#self-contained-with-assets). In this case, we will use relative links for both `links` and `assets`.
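
To see what "relative links" means in practice, the following sketch computes an href from one file to another using only the standard library. The paths are hypothetical, and `relative_href` is an illustrative helper, not how PySTAC resolves hrefs internally.

```python
import posixpath


def relative_href(target: str, source: str) -> str:
    """Express `target` relative to the directory that contains `source`."""
    rel = posixpath.relpath(target, start=posixpath.dirname(source))
    # Prefix with "./" so the href reads clearly as relative
    return rel if rel.startswith("..") else "./" + rel


# Hypothetical absolute paths on the machine where the files were created
collection = "/data/stac/collection.json"
item = "/data/stac/item-1/item-1.json"

print(relative_href(item, collection))   # ./item-1/item-1.json
print(relative_href(collection, item))   # ../collection.json
```

Because neither result mentions `/data`, the links keep working when the whole `stac` directory is moved or zipped and shared.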

First, let's import the libraries we will be working with.

In [1]:
import json
from pathlib import Path

import pystac

tmp_dir = Path.cwd().parent / 'tmp'

Next, we will load the Items that we created in the previous two exercises. When we create the core Collection, we will provide a value of `None` for the `extent` initially, and then update this extent based on the Item geometries and datetimes after we have added the Items.

In [2]:
# Load the previously created Items
source_img_item = pystac.Item.from_file(str(tmp_dir / 'S2A_34HCH_20171008_0_L2A_TCI_source.json'))
label_item = pystac.Item.from_file(str(tmp_dir / 'S2A_34HCH_20171008_0_L2A_TCI_labels.json'))

Now we will construct the Collection using its required arguments. We will save our STAC objects in a `stac` sub-directory in the directory for this lecture.

In [3]:
stac_path = Path.cwd().parent / 'stac'
collection_path = stac_path / 'collection.json'

# The extent is left as None here; it is computed from the Items after they are added
collection = pystac.Collection(
    id='example-collection',
    description='An example STAC Collection containing labeled crop types.',
    extent=None,
    href=str(collection_path),
    catalog_type=pystac.CatalogType.SELF_CONTAINED
)

In [4]:
# Add the items
collection.add_item(source_img_item)
collection.add_item(label_item)

# Update the extent based on the item geometries and datetimes
collection.update_extent_from_items()
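
Conceptually, deriving an extent from Items means taking the union of their bounding boxes and the earliest and latest datetimes. The sketch below shows that idea with plain dictionaries; it is not PySTAC's implementation, it ignores edge cases such as geometries that cross the antimeridian, and the second item is hypothetical (the first uses the source image footprint and datetime from this exercise).

```python
from datetime import datetime, timezone


def polygon_bbox(geometry: dict) -> list:
    """Return [west, south, east, north] for a GeoJSON Polygon."""
    exterior = geometry["coordinates"][0]
    lons = [lon for lon, lat in exterior]
    lats = [lat for lon, lat in exterior]
    return [min(lons), min(lats), max(lons), max(lats)]


def extent_from_items(items: list) -> dict:
    """Union of the item bounding boxes plus the overall datetime range."""
    boxes = [polygon_bbox(item["geometry"]) for item in items]
    times = [item["datetime"] for item in items]
    return {
        "bbox": [
            min(b[0] for b in boxes),
            min(b[1] for b in boxes),
            max(b[2] for b in boxes),
            max(b[3] for b in boxes),
        ],
        "interval": [min(times), max(times)],
    }


items = [
    {   # Footprint and datetime of the source image item
        "geometry": {"type": "Polygon", "coordinates": [[
            [19.282309157846285, -33.65646563177132],
            [19.28079050631379, -33.732735055742886],
            [19.353625203525958, -33.733727878895834],
            [19.35507958936285, -33.65745561206109],
            [19.282309157846285, -33.65646563177132],
        ]]},
        "datetime": datetime(2017, 10, 8, tzinfo=timezone.utc),
    },
    {   # Hypothetical second item, invented for illustration
        "geometry": {"type": "Polygon", "coordinates": [[
            [19.30, -33.70], [19.30, -33.60], [19.40, -33.60],
            [19.40, -33.70], [19.30, -33.70],
        ]]},
        "datetime": datetime(2017, 10, 18, tzinfo=timezone.utc),
    },
]

print(extent_from_items(items))
```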

We can use the [`get_items`](https://pystac.readthedocs.io/en/latest/api.html#pystac.Catalog.get_items) method on our Collection to get a generator over all of its Items. Let's use this to check out the first Item in the Collection.

In [5]:
first_item = next(collection.get_items())
print(json.dumps(first_item.to_dict(), indent=4))

{
    "type": "Feature",
    "stac_version": "1.0.0-beta.2",
    "id": "S2A_34HCH_20171008_0_L2A_TCI_source",
    "properties": {
        "platform": "Sentinel-2",
        "constellation": "Sentinel-2",
        "datetime": "2017-10-08T00:00:00Z"
    },
    "geometry": {
        "type": "Polygon",
        "coordinates": [
            [
                [
                    19.282309157846285,
                    -33.65646563177132
                ],
                [
                    19.28079050631379,
                    -33.732735055742886
                ],
                [
                    19.353625203525958,
                    -33.733727878895834
                ],
                [
                    19.35507958936285,
                    -33.65745561206109
                ],
                [
                    19.282309157846285,
                    -33.65646563177132
                ]
            ]
        ]
    },
    "links": [
        {
            "rel": "root",
 

We can see that PySTAC has automatically added links back to the parent collection. Very helpful! However, our Asset links and the `"self"` links for our Items are still absolute. This means if we zip up our files and send them to someone, the links will be broken. We can fix the asset links using the [`make_all_asset_hrefs_relative`](https://pystac.readthedocs.io/en/latest/api.html#pystac.Catalog.make_all_asset_hrefs_relative) method on our Collection. 

We can fix the `"self"` links in the Items (and save all of our STAC files at the same time) using the [`normalize_and_save`](https://pystac.readthedocs.io/en/latest/api.html#pystac.Catalog.normalize_and_save) method.

In [6]:
collection.make_all_asset_hrefs_relative()
collection.normalize_and_save(str(stac_path))
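
One way to check that an object is portable is to scan its `links` and `assets` for hrefs that are still absolute. The helper below is a small sketch that works on the dictionary form of a STAC object (for example, the result of loading a saved file with `json.load`); `absolute_hrefs` and the sample item are hypothetical and not part of PySTAC.

```python
from urllib.parse import urlparse


def absolute_hrefs(stac_object: dict) -> list:
    """Collect link and asset hrefs that are absolute paths or URLs."""
    hrefs = [link["href"] for link in stac_object.get("links", [])]
    hrefs += [asset["href"] for asset in stac_object.get("assets", {}).values()]
    # An href counts as absolute if it starts at the file system root
    # or carries a scheme such as "https"
    return [h for h in hrefs if h.startswith("/") or urlparse(h).scheme]


# Hypothetical item with one leftover absolute link
item = {
    "links": [
        {"rel": "root", "href": "../collection.json"},
        {"rel": "self", "href": "/home/user/stac/item-1/item-1.json"},
    ],
    "assets": {"image": {"href": "../../data/image.tif"}},
}

print(absolute_hrefs(item))
```

An empty list means every href in the object is relative.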

In [7]:
!find ../stac -type f

../stac/S2A_34HCH_20171008_0_L2A_TCI_labels/S2A_34HCH_20171008_0_L2A_TCI_labels.json
../stac/collection.json
../stac/S2A_34HCH_20171008_0_L2A_TCI_source/S2A_34HCH_20171008_0_L2A_TCI_source.json
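
The listing above places each Item in a sub-directory named after its id, next to a single `collection.json`. The sketch below reproduces that layout from a list of Item ids; `expected_layout` is an illustrative helper that mirrors the listing, not a PySTAC function.

```python
from pathlib import PurePosixPath


def expected_layout(root: str, item_ids: list) -> list:
    """Paths following the <root>/<item-id>/<item-id>.json layout."""
    root_path = PurePosixPath(root)
    paths = [root_path / "collection.json"]
    paths += [root_path / item_id / f"{item_id}.json" for item_id in item_ids]
    return [str(path) for path in paths]


item_ids = [
    "S2A_34HCH_20171008_0_L2A_TCI_source",
    "S2A_34HCH_20171008_0_L2A_TCI_labels",
]

for path in expected_layout("../stac", item_ids):
    print(path)
```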
