# Create a STAC Catalog with a Collection Using PySTAC

This tutorial builds on the previous tutorials, where you learned how to [create a STAC Catalog](/en/tutorials/2-create-stac-catalog-python/index.html) and [create a STAC Item that utilizes extensions](/en/tutorials/3-create-stac-item-with-extension/index.html). Now that you know the basics of creating a STAC Catalog, we will add more functionality to it: this tutorial shows you how to add a STAC Collection to a Catalog to better organize the catalog's items.

## Dependencies
If you need to install pystac, rasterio, or shapely, run the cell below.

In [None]:
! pip install pystac
! pip install rasterio
! pip install shapely

Collecting pystac
  Downloading pystac-1.9.0-py3-none-any.whl (181 kB)
Installing collected packages: pystac
Successfully installed pystac-1.9.0
Collecting rasterio
  Downloading rasterio-1.3.9-cp310-cp310-manylinux2014_x86_64.whl (20.6 MB)
Collecting affine (from rasterio)
  Downloading affine-2.4.0-py3-none-any.whl (15 kB)
Collecting snuggs>=1.4.1 (from rasterio)
  Downloading snuggs-1.4.7-py3-none-any.whl (5.4 kB)
Installing collected packages: snuggs, affine, rasterio
Successfully installed affine-2.4.0 rasterio-1.3.9 snuggs-1.4.7


## STAC Collections

Collections are a subtype of Catalog with additional fields that make them more searchable. They can also define properties shared by their items, so that each item does not have to repeat the same data. Let's create a collection to hold the properties common to two Sentinel-2 images of the same area: the image we have been working with and a second image.

## Import Packages and Store Data
To begin, import the packages that you need to access data and work with STAC in Python.

In [None]:
import os
import rasterio
import urllib.request
import pystac

from shapely.geometry import Polygon, mapping
from datetime import datetime, timezone
from pystac.extensions.eo import Band, EOExtension
from tempfile import TemporaryDirectory

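Next, we point to the two Sentinel-2 images we will catalog and create a temporary directory to save the finished catalog into. The image paths below assume the GeoTIFFs are already stored under `/content/`; adjust them to wherever your files live.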

## Collect the Items' `geometry` and `bbox`
To get the bounding box and footprint of each image, we will use the `get_bbox_and_footprint` function we first used in the [Create a STAC Catalog Tutorial](/en/tutorials/2-create-stac-catalog-python/index.html).

We will do this for both images.

In [None]:
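# Paths to the two Sentinel-2 GeoTIFFs; adjust them to where your files live
image_1 = '/content/sentinel_20m_20240109.tif'
image_2 = '/content/sentinel_20m_20240129.tif'

# Temporary directory that the finished catalog will be saved into
tmp_dir = TemporaryDirectory()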

In [None]:
print("image_1:", image_1)
print("image_2:", image_2)

image_1: /content/sentinel_20m_20240109.tif
image_2: /content/sentinel_20m_20240129.tif


In [None]:
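def get_bbox_and_footprint(raster):
    """Return the bounding box and a GeoJSON footprint polygon of a raster.

    Coordinates are in the raster's own CRS, exactly as stored in the file.
    """
    with rasterio.open(raster) as r:
        bounds = r.bounds
        # STAC bbox order: [min x, min y, max x, max y]
        bbox = [bounds.left, bounds.bottom, bounds.right, bounds.top]
        # Rectangle through the four corners of the image
        footprint = Polygon([
            [bounds.left, bounds.bottom],
            [bounds.left, bounds.top],
            [bounds.right, bounds.top],
            [bounds.right, bounds.bottom]
        ])

        # mapping() converts the shapely polygon into a GeoJSON-like dict
        return (bbox, mapping(footprint))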

In [None]:
# Run the function and print out the results for image 1
bbox, footprint = get_bbox_and_footprint(image_1)
print("bbox: ", bbox, "\n")
print("footprint: ", footprint)

bbox:  [600000.0, 2990220.0, 709800.0, 3100020.0] 

footprint:  {'type': 'Polygon', 'coordinates': (((600000.0, 2990220.0), (600000.0, 3100020.0), (709800.0, 3100020.0), (709800.0, 2990220.0), (600000.0, 2990220.0)),)}


In [None]:
# Run the function and print out the results for image 2
bbox2, footprint2 = get_bbox_and_footprint(image_2)
print("bbox: ", bbox2, "\n")
print("footprint: ", footprint2)

bbox:  [600000.0, 2990220.0, 709800.0, 3100020.0] 

footprint:  {'type': 'Polygon', 'coordinates': (((600000.0, 2990220.0), (600000.0, 3100020.0), (709800.0, 3100020.0), (709800.0, 2990220.0), (600000.0, 2990220.0)),)}


## Define the Bands of Sentinel-2

In [None]:
# Each description lists the band's central wavelength on Sentinel-2A and on
# Sentinel-2B (not a wavelength range). common_name must be one of the values
# defined by the EO extension, e.g. 'coastal', 'rededge', 'nir08', 'swir16'.
S2_bands = [Band.create(name='Aerosols', description='Aerosols: 443 - 442 nm', common_name='coastal'),
             Band.create(name='Blue', description='Blue: 496.6 - 492.1 nm', common_name='blue'),
             Band.create(name='Green', description='Green: 560 - 559 nm', common_name='green'),
             Band.create(name='Red', description='Red: 664.5 - 665 nm', common_name='red'),
             Band.create(name='Red Edge1', description='Red Edge1: 703.9 - 703.8 nm', common_name='rededge'),
             Band.create(name='Red Edge2', description='Red Edge2: 740.2 - 739.1 nm', common_name='rededge'),
             Band.create(name='Red Edge3', description='Red Edge3: 782.5 - 779.7 nm', common_name='rededge'),
             Band.create(name='NIR', description='NIR: 835.1 - 833 nm', common_name='nir'),
             Band.create(name='Red Edge4', description='Red Edge4: 864.8 - 864 nm', common_name='nir08'),
             Band.create(name='Water Vapor', description='Water Vapor: 945 - 943.2 nm', common_name='nir09'),
             Band.create(name='SWIR1', description='SWIR1: 1613.7 - 1610.4 nm', common_name='swir16'),
             Band.create(name='SWIR2', description='SWIR2: 2202.4 - 2185 nm', common_name='swir22')]
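The EO extension only allows a fixed set of values for `common_name`, while `Band.create` accepts any string, so a typo or a made-up name only surfaces when the item is validated against the extension's JSON schema. The sketch below checks names locally before that point. `EO_COMMON_NAMES` is copied by hand from the EO extension (v1.x) and `invalid_common_names` is a hypothetical helper; check the extension's documentation for the version you target.

```python
# Allowed values of `common_name` in the STAC EO extension (v1.x), copied by hand
EO_COMMON_NAMES = {
    "coastal", "blue", "green", "red", "yellow", "pan", "rededge",
    "nir", "nir08", "nir09", "cirrus", "swir16", "swir22",
    "lwir", "lwir11", "lwir12",
}


def invalid_common_names(common_names):
    """Return, sorted, the names that the EO extension does not recognise.

    Hypothetical helper: `common_names` is any iterable of strings,
    for example [band.common_name for band in S2_bands].
    """
    return sorted({name for name in common_names if name not in EO_COMMON_NAMES})


print(invalid_common_names(["blue", "rededge1", "swir16", "water vapor", "nir09"]))
# ['rededge1', 'water vapor']
```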

## Create the Collection

Take a look at the PySTAC API Documentation for [Collection](https://pystac.readthedocs.io/en/stable/api/collection.html#pystac-collection) to see what information we need to supply in order to satisfy the specification.

Beyond what a Catalog requires, a Collection requires a `license` for the data in the collection and an `extent` that describes the spatial and temporal range covered by the items it holds.

An `extent` consists of a `SpatialExtent` and a `TemporalExtent`. These hold one or more bounding boxes and time intervals, respectively, that completely cover the items contained in the collection.
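The covering bounding box is the element-wise minimum and maximum of the items' bounding boxes, and the covering interval runs from the earliest to the latest item datetime. A minimal pure-Python sketch of that arithmetic follows; `covering_extent` is a hypothetical helper, the sample coordinates are made up, and it assumes 2D bboxes that do not cross the antimeridian.

```python
from datetime import datetime, timezone


def covering_extent(bboxes, datetimes):
    """Return ([west, south, east, north], [start, end]) covering all inputs.

    Hypothetical helper: bboxes are 2D [xmin, ymin, xmax, ymax] lists that do
    not cross the antimeridian; datetimes are timezone-aware datetimes.
    """
    bbox = [
        min(b[0] for b in bboxes),
        min(b[1] for b in bboxes),
        max(b[2] for b in bboxes),
        max(b[3] for b in bboxes),
    ]
    interval = [min(datetimes), max(datetimes)]
    return bbox, interval


bbox, interval = covering_extent(
    [[88.0, 27.0, 88.5, 27.5], [88.3, 27.2, 89.0, 28.0]],
    [datetime(2024, 1, 29, tzinfo=timezone.utc), datetime(2024, 1, 9, tzinfo=timezone.utc)],
)
print(bbox)                                    # [88.0, 27.0, 89.0, 28.0]
print([d.date().isoformat() for d in interval])  # ['2024-01-09', '2024-01-29']
```

With PySTAC, these values would go into `pystac.SpatialExtent(bboxes=[bbox])` and `pystac.TemporalExtent(intervals=[interval])`.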

Let's start by creating the two items. Each item implements the EO extension: calling `EOExtension.ext(..., add_if_missing=True)` on the item's asset also adds the extension's schema URI to the item's `stac_extensions` list.

In [None]:
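# Item for the first image (file name dated 2024-01-09), described with all bands
collection_item = pystac.Item(id='sentinel2_09_2024',
                              geometry=footprint,
                              bbox=bbox,
                              # Placeholder: STAC expects the acquisition time here
                              datetime=datetime.now(timezone.utc),
                              properties={})

# Common metadata: the Sentinel-2 satellites carry the MSI instrument.
# Bhoonidhi is a data portal, not a platform; set `platform` to 'sentinel-2a'
# or 'sentinel-2b' if you know which satellite acquired the image.
collection_item.common_metadata.constellation = 'sentinel-2'
collection_item.common_metadata.instruments = ['msi']

# Attach the GeoTIFF as an asset and describe its bands with the EO extension
asset = pystac.Asset(href=image_1,
                     media_type=pystac.MediaType.GEOTIFF)
collection_item.add_asset("image", asset)
eo = EOExtension.ext(collection_item.assets["image"], add_if_missing=True)
eo.apply(S2_bands)

# Item for the second image (file name dated 2024-01-29)
collection_item2 = pystac.Item(id='sentinel2_29_2024',
                               geometry=footprint2,
                               bbox=bbox2,
                               datetime=datetime.now(timezone.utc),
                               properties={})

collection_item2.common_metadata.constellation = 'sentinel-2'
collection_item2.common_metadata.instruments = ['msi']

asset2 = pystac.Asset(href=image_2,
                      media_type=pystac.MediaType.GEOTIFF)
collection_item2.add_asset("image", asset2)
eo = EOExtension.ext(collection_item2.assets["image"], add_if_missing=True)
# Only the red, green and blue bands are described here; adjust the filter
# to the bands actually stored in image_2
eo.apply([
    band for band in S2_bands if band.name in ["Red", "Green", "Blue"]
])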
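The cell above stamps each item with the current time, but an item's `datetime` should record when the data was acquired. Because the file names in this tutorial embed a `YYYYMMDD` date, one option is to parse it. The sketch below assumes that naming pattern; `acquisition_datetime_from_filename` is a hypothetical helper, and it returns midnight UTC because the file name carries no time of day.

```python
import os
import re
from datetime import datetime, timezone


def acquisition_datetime_from_filename(path):
    """Parse a YYYYMMDD date from a name like 'sentinel_20m_20240109.tif'.

    Hypothetical helper: assumes the file-naming pattern used in this tutorial
    and returns midnight UTC, since the name holds no time of day.
    """
    match = re.search(r"(\d{8})", os.path.basename(path))
    if match is None:
        raise ValueError(f"no YYYYMMDD date found in {path!r}")
    day = datetime.strptime(match.group(1), "%Y%m%d")
    return day.replace(tzinfo=timezone.utc)


print(acquisition_datetime_from_filename("/content/sentinel_20m_20240109.tif").isoformat())
# 2024-01-09T00:00:00+00:00
```

You could then pass `datetime=acquisition_datetime_from_filename(image_1)` when creating the first item. If the exact acquisition time matters, take it from the product's own metadata instead.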


We can use our two items' metadata to find out what the proper bounds are:

In [None]:
from shapely.geometry import shape

unioned_footprint = shape(footprint).union(shape(footprint2))
collection_bbox = list(unioned_footprint.bounds)
spatial_extent = pystac.SpatialExtent(bboxes=[collection_bbox])

In [None]:
collection_interval = sorted([collection_item.datetime, collection_item2.datetime])
temporal_extent = pystac.TemporalExtent(intervals=[collection_interval])

In [None]:
collection_extent = pystac.Extent(spatial=spatial_extent, temporal=temporal_extent)

In [None]:
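collection = pystac.Collection(id='S2-images',
                               description='Sentinel-2 images of Sikkim, January 2024',
                               extent=collection_extent,
                               # STAC 1.0 accepts an SPDX identifier, 'various' or 'proprietary';
                               # replace this with the license that applies to your data
                               license='proprietary',
                              )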

Now if we add our items to our collection, and our collection to a catalog, we get the following STAC that can be saved:

In [None]:
collection.add_items([collection_item, collection_item2])
catalog = pystac.Catalog(id='catalog-with-collection',
                         description='Catalog for Sikkim Sentinel2 data.')
catalog.add_child(collection)


In [None]:
catalog.describe()

* <Catalog id=catalog-with-collection>
    * <Collection id=S2-images>
      * <Item id=sentinel2_09_2024>
      * <Item id=sentinel2_29_2024>


In [None]:
catalog.normalize_and_save(root_href=os.path.join(tmp_dir.name, 'stac-collection'),
                           catalog_type=pystac.CatalogType.SELF_CONTAINED)

## Cleanup

Don't forget to clean up the temporary directory.

In [None]:
tmp_dir.cleanup()

There you have it: a STAC Catalog with a STAC Collection, STAC Items, and a STAC Extension. You are now ready to build a STAC Catalog for a dataset of your own.

#### Join the conversation
If you have any questions, you’re welcome to ask our community on [Gitter](https://app.gitter.im/#/room/#SpatioTemporal-Asset-Catalog_Lobby).