# Create Label STAC Item

<a style="display: inline-block;" href="https://mybinder.org/v2/gh/RadiantMLHub/ml4eo-bootcamp-2021/main?filepath=Lecture%205%2Fexercises%2F2_create_stac_label_item.ipynb"><img src="https://mybinder.org/badge_logo.svg" alt="Launch in Binder"/></a>

In the [first exercise](./1_create_source_imagery_stac_item.ipynb) we learned about STAC Items and Assets, and we created a STAC Item to represent our source image tile. In this exercise, we will:

* Describe how STAC Extensions enhance STAC Items
* Create a STAC Item to represent our label assets
* Link our label Item to the corresponding source imagery Item we created in the first exercise.

## STAC Extensions

The core STAC specification intentionally defines a minimal set of properties that should apply to all EO assets, but additional fields can be introduced through [STAC Extensions](https://stac-extensions.github.io/). STAC Extensions may apply to any combination of Catalogs, Collections, and/or Items and are often used to add fields that better describe the data. Data providers are encouraged to combine multiple Extensions to properly describe their data. 
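As a rough illustration of how an extension shows up in an Item, the sketch below uses a plain dictionary rather than PySTAC. The Item declares the extension in `stac_extensions`, and the extension's fields sit in `properties` under a `prefix:` namespace. The helper `undeclared_prefixes` is hypothetical. It assumes that each prefix equals the extension's short ID, which holds for `label` and `eo` but not for every extension. Short IDs like `"label"` follow the STAC 1.0.0-beta.2 convention used in this notebook; STAC 1.0.0 expects schema URLs instead.

```python
from typing import Any, Dict, List

# A plain-dict Item that opts into the Label Extension (illustrative values).
# Short extension IDs such as "label" match the STAC 1.0.0-beta.2 convention;
# STAC 1.0.0 expects full schema URLs instead.
item: Dict[str, Any] = {
    "type": "Feature",
    "stac_version": "1.0.0-beta.2",
    "stac_extensions": ["label"],
    "id": "example_labels",
    "properties": {
        "datetime": "2017-10-08T00:00:00Z",  # core field: no prefix
        "label:type": "raster",  # extension field: "label:" prefix
    },
}


def undeclared_prefixes(item: Dict[str, Any]) -> List[str]:
    """Hypothetical check: list property prefixes with no matching entry in
    stac_extensions. Simplifying assumption: each prefix equals its
    extension's short ID (not true for every extension)."""
    declared = set(item.get("stac_extensions", []))
    prefixes = {key.split(":", 1)[0] for key in item["properties"] if ":" in key}
    return sorted(prefixes - declared)


print(undeclared_prefixes(item))  # → []

# Adding a field from another extension without declaring it gets flagged
item["properties"]["eo:cloud_cover"] = 12.5
print(undeclared_prefixes(item))  # → ['eo']
```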

Anyone can create a STAC Extension and host it either in their own GitHub organization or in the official [stac-extensions](https://github.com/stac-extensions/) organization. Extensions progress through levels of maturity, from Proposal through Pilot and Candidate to Stable.

For a full list of STAC Extensions, including their maturity level, see the official ["List of STAC Extensions"](https://stac-extensions.github.io/#list-of-stac-extensions).

### Label Extension

The [STAC Label Extension](https://github.com/stac-extensions/label) is used within the STAC community to describe training data for EO machine learning models. Despite officially being in the "Proposal" stage, this extension has JSON schemas and examples defined and is being used in at least a few production systems (including the Radiant MLHub API). 

In particular, the Label Extension lets data providers describe the type of task the labels apply to (e.g. regression, classification, detection, segmentation), how the labels were generated, and the classes present in the labeled data. It also lets providers link labeled data to the corresponding source imagery using STAC links.
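The sketch below collects these Label Extension fields (`label:type`, `label:properties`, `label:description`, `label:classes`, `label:tasks`, `label:methods`) into a properties dictionary. The helper `make_label_properties` and its checks are our own simplification, not part of PySTAC or the extension. The vector example values are hypothetical and do not describe this notebook's dataset.

```python
from typing import Any, Dict, List, Optional


def make_label_properties(
    label_type: str,
    description: str,
    classes: List[Dict[str, Any]],
    tasks: List[str],
    methods: List[str],
    properties: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper that assembles Label Extension Item properties."""
    if label_type not in ("vector", "raster"):
        raise ValueError(f"label:type must be 'vector' or 'raster', got {label_type!r}")
    if label_type == "raster" and properties is not None:
        # The extension says label:properties should be null for raster
        # labels; this helper enforces that strictly.
        raise ValueError("label:properties must be null for raster labels")
    return {
        "label:type": label_type,
        "label:properties": properties,
        "label:description": description,
        "label:classes": classes,
        "label:tasks": tasks,      # e.g. "classification", "detection"
        "label:methods": methods,  # e.g. "manual", "automated"
    }


# Illustrative vector example (not this notebook's dataset)
vector_props = make_label_properties(
    label_type="vector",
    description="Hypothetical field boundaries with a crop attribute.",
    classes=[{"name": "crop", "classes": ["maize", "wheat"]}],
    tasks=["classification"],
    methods=["manual"],
    properties=["crop"],
)
print(vector_props["label:properties"])  # → ['crop']
```

For vector labels, `label:classes[].name` points at the attribute listed in `label:properties`. For raster labels the class values live in the pixels, which is why the Item later in this exercise uses `null` for both.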

**Input and feedback on this extension from ML practitioners and data scientists is welcome and encouraged.** You can contribute by creating and commenting on issues or contributing pull requests in the [Label Extension repo](https://github.com/stac-extensions/label) or by joining the conversation on [Gitter](https://gitter.im/SpatioTemporal-Asset-Catalog/Lobby).

## Create Label Item

We will start by creating a STAC Item to represent our labeled data, much like in the first exercise. This Item (and its Assets) will include additional properties from the Label Extension. Then, we will add a link that points to the source imagery Item we created in the first exercise.

First, we import the libraries we will be using.

```python
import json
from pathlib import Path

import rasterio
import rasterio.warp
import rasterio.features
import pystac

# This is the tmp_dir we created in the first exercise
tmp_dir = Path.cwd().parent / 'tmp'
```

Next, we create Assets for our label raster and for the JSON file that describes the label values.

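```python
# Asset for the label raster itself (the same file we open below to get the geometry)
label_asset = pystac.Asset(
    href=str(Path('../data/labels.tif').resolve()),
    media_type='image/tiff; application=geotiff',
    roles=['labels', 'labels-raster']
)

# Asset for the JSON file describing the label values
classes_asset = pystac.Asset(
    href=str(Path('../data/labels.json').resolve()),
    media_type='application/json',
    roles=['labels', 'labels-classes']
)
```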
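Roughly speaking, each Asset created above ends up in the Item's `assets` object as a small JSON record. The media type is stored under `type` and the role list under `roles`. The sketch below mirrors that shape with plain dictionaries and shows how a consumer might pick assets out by role. The hrefs are placeholders, and the helpers `asset_entry` and `keys_with_role` are hypothetical, not PySTAC functions. The keys `"labels"` and `"classes"` match the `add_asset` calls used later in this notebook.

```python
import json
from typing import Dict, List


def asset_entry(href: str, media_type: str, roles: List[str]) -> Dict:
    """Hypothetical helper mirroring the serialized shape of a STAC Asset."""
    return {"href": href, "type": media_type, "roles": roles}


# Placeholder hrefs; keys match the add_asset calls later in the notebook
assets = {
    "labels": asset_entry(
        "/path/to/labels.tif", "image/tiff; application=geotiff", ["labels", "labels-raster"]
    ),
    "classes": asset_entry(
        "/path/to/labels.json", "application/json", ["labels", "labels-classes"]
    ),
}


def keys_with_role(assets: Dict[str, Dict], role: str) -> List[str]:
    """Return the keys of all assets that list `role` among their roles."""
    return [key for key, asset in assets.items() if role in asset.get("roles", [])]


print(keys_with_role(assets, "labels"))         # → ['labels', 'classes']
print(keys_with_role(assets, "labels-raster"))  # → ['labels']
print(json.dumps(assets["classes"], indent=2))
```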

Now we'll create the STAC Item and add the Assets to it. We get the geometry the same way as in the first exercise, and we reuse the ID and datetime of the source imagery Item from the first exercise to fill in some of this Item's fields. We also use the [Label Extension Item Properties](https://github.com/stac-extensions/label#item-properties) documentation to determine which property values to include (because our labels are a raster, some of these, such as `label:properties`, are `null`).
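The ID convention used in the next cell swaps the source Item's `_source` suffix for `_labels`. The cell does this with `str.rsplit`, which cuts at the last `_source` anywhere in the ID, not only at the end. The hypothetical helper below is a stricter variant that only strips `_source` when it really is a suffix. It is a sketch of an alternative, not what the notebook uses.

```python
def label_id_from_source_id(source_id: str, suffix: str = "_source") -> str:
    """Hypothetical helper: replace a trailing `suffix` with "_labels"."""
    # Only strip the suffix when it really ends the ID
    base = source_id[: -len(suffix)] if suffix and source_id.endswith(suffix) else source_id
    return base + "_labels"


print(label_id_from_source_id("S2A_34HCH_20171008_0_L2A_TCI_source"))
# → S2A_34HCH_20171008_0_L2A_TCI_labels

# rsplit cuts at the last "_source" anywhere in the string, suffix or not
print("tile_source_v2".rsplit("_source", 1)[0] + "_labels")  # → tile_labels
print(label_id_from_source_id("tile_source_v2"))              # → tile_source_v2_labels
```

For the IDs in this exercise both approaches give the same result. The difference only matters if `_source` can appear elsewhere in an ID.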

```python
# Open the source imagery Item so we can reuse some of its information
source_img_path = tmp_dir / 'S2A_34HCH_20171008_0_L2A_TCI_source.json'
source_img_item = pystac.Item.from_file(str(source_img_path))

# The labels Item ID is the source imagery Item ID with its "_source" suffix replaced by "_labels"
labels_id = source_img_item.id.rsplit('_source', 1)[0] + '_labels'

# Read the native bounds and CRS of the label raster
labels = '../data/labels.tif'
with rasterio.open(labels) as src:
    bounds_native = src.bounds
    native_crs = src.crs

# Construct the native geometry and transform into EPSG:4326
geom_native = {
    'type': 'Polygon',
    'coordinates': [
        [
            [bounds_native.left, bounds_native.top],
            [bounds_native.left, bounds_native.bottom],
            [bounds_native.right, bounds_native.bottom],
            [bounds_native.right, bounds_native.top],
            [bounds_native.left, bounds_native.top]
        ]
    ]
}

geom_4326 = rasterio.warp.transform_geom(
    native_crs,
    'EPSG:4326',
    geom_native,
)

# Get the bounds in EPSG:4326
bounds_4326 = rasterio.features.bounds(geom_4326)

# Create the Item, declaring the Label Extension and its raster label properties
label_item = pystac.Item(
    id=labels_id,
    geometry=geom_4326,
    bbox=bounds_4326,
    datetime=source_img_item.datetime,
    stac_extensions=['label'],
    properties={
        'label:type': 'raster',
        'label:properties': None,
        'label:description': 'Crop type labels created from Sentinel-2 imagery.',
        'label:classes': [
            {'name': None, 'classes': [0, 1, 2, 3, 4, 5, 6]}
        ]
    }
)

# Add assets
label_item.add_asset("labels", label_asset)
label_item.add_asset("classes", classes_asset)

print(json.dumps(label_item.to_dict(), indent=4))
```

```
{
    "type": "Feature",
    "stac_version": "1.0.0-beta.2",
    "id": "S2A_34HCH_20171008_0_L2A_TCI_labels",
    "properties": {
        "label:type": "raster",
        "label:properties": null,
        "label:description": "Crop type labels created from Sentinel-2 imagery.",
        "label:classes": [
            {
                "name": null,
                "classes": [
                    0,
                    1,
                    2,
                    3,
                    4,
                    5,
                    6
                ]
            }
        ],
        "datetime": "2017-10-08T00:00:00Z"
    },
    "geometry": {
        "type": "Polygon",
        "coordinates": [
            [
                [
                    19.282368894925145,
                    -33.65648731266274
                ],
                [
                    19.280850294948134,
                    -33.73275673876073
                ],
                [
                    19.353685012706
...
```
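The geometry steps above can be sketched with the standard library alone, leaving out the reprojection that `rasterio.warp.transform_geom` performs. `bounds_to_polygon` builds a closed ring in the same vertex order as the notebook, and `polygon_bbox` recovers `(minx, miny, maxx, maxy)` from every vertex, roughly what `rasterio.features.bounds` returns for a Polygon. Both helpers are hypothetical, and the input coordinates are placeholders rather than the real label raster's bounds.

```python
from typing import Any, Dict, Tuple


def bounds_to_polygon(left: float, bottom: float, right: float, top: float) -> Dict[str, Any]:
    """Closed rectangular ring in the same vertex order as the notebook cell."""
    ring = [[left, top], [left, bottom], [right, bottom], [right, top], [left, top]]
    return {"type": "Polygon", "coordinates": [ring]}


def polygon_bbox(geom: Dict[str, Any]) -> Tuple[float, float, float, float]:
    """(minx, miny, maxx, maxy) over every vertex of a GeoJSON Polygon."""
    xs = [pt[0] for ring in geom["coordinates"] for pt in ring]
    ys = [pt[1] for ring in geom["coordinates"] for pt in ring]
    return (min(xs), min(ys), max(xs), max(ys))


# Placeholder projected bounds (metres), not the real label raster
poly = bounds_to_polygon(300000.0, 6260000.0, 310000.0, 6270000.0)
print(polygon_bbox(poly))  # → (300000.0, 6260000.0, 310000.0, 6270000.0)

# After reprojection the corners no longer line up, so use all vertices
skewed = {"type": "Polygon", "coordinates": [[[0.0, 1.0], [0.1, 0.0], [1.1, 0.0], [1.0, 1.0], [0.0, 1.0]]]}
print(polygon_bbox(skewed))  # → (0.0, 0.0, 1.1, 1.0)
```

This is why the cell computes the bbox from the reprojected geometry instead of reprojecting only two corners. In the printed output the left edge is already slightly skewed (19.2824 vs 19.2809 degrees of longitude), so the EPSG:4326 footprint is no longer an axis-aligned rectangle.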

### Link to Source Imagery

Next, we need to link our labels to the source imagery Item. We do this by adding a link with a `"rel"` type of `"source"`.

```python
source_link = pystac.Link(
    rel="source",
    target=source_img_item,
    media_type='application/json',
    title="Source Imagery"
)
label_item.add_link(source_link)

print(json.dumps(label_item.to_dict()['links'], indent=4))
```

```
[
    {
        "rel": "source",
        "href": "/Users/jduckworth/Code/ml-hub/ml4eo-bootcamp-2021/Lecture 5/tmp/S2A_34HCH_20171008_0_L2A_TCI_source.json",
        "type": "application/json",
        "title": "Source Imagery"
    }
]
```
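On the consuming side, finding the imagery behind a set of labels comes down to filtering the links by `rel`. The sketch below works on the serialized link dictionaries, like the ones printed above. The helper `source_hrefs` is hypothetical, and the hrefs are placeholders. It returns a list because nothing stops an Item from carrying more than one `source` link.

```python
from typing import Any, Dict, List


def source_hrefs(item_dict: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: hrefs of every link whose rel is "source"."""
    return [link["href"] for link in item_dict.get("links", []) if link.get("rel") == "source"]


# Placeholder hrefs; the link shape matches the printed output above
example = {
    "id": "S2A_34HCH_20171008_0_L2A_TCI_labels",
    "links": [
        {"rel": "self", "href": "/path/to/tmp/S2A_34HCH_20171008_0_L2A_TCI_labels.json"},
        {
            "rel": "source",
            "href": "/path/to/tmp/S2A_34HCH_20171008_0_L2A_TCI_source.json",
            "type": "application/json",
            "title": "Source Imagery",
        },
    ],
}
print(source_hrefs(example))
```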


Notice that PySTAC has used an absolute path in our link, based on the location of the source imagery Item. This is fine for now; in the next exercise we will turn it into a relative path so that the Items are more portable.
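As a preview of the underlying arithmetic, a relative link href is the target's path taken relative to the directory of the Item that holds the link. The sketch below uses only `posixpath`. `make_relative_href` here is our own hypothetical helper. PySTAC ships its own utilities for this (for example `pystac.utils.make_relative_href`), whose exact output format may differ (for instance, a leading `./`). The paths are placeholders.

```python
import posixpath


def make_relative_href(target_href: str, item_href: str) -> str:
    """Hypothetical helper: target path relative to the linking Item's directory."""
    start = posixpath.dirname(item_href)
    return posixpath.relpath(target_href, start)


# Placeholder absolute paths: both Items live in the same tmp directory
labels_item_href = "/data/tmp/S2A_34HCH_20171008_0_L2A_TCI_labels.json"
source_item_href = "/data/tmp/S2A_34HCH_20171008_0_L2A_TCI_source.json"
print(make_relative_href(source_item_href, labels_item_href))
# → S2A_34HCH_20171008_0_L2A_TCI_source.json

# A linking Item in a sibling directory gets a "../" path instead
print(make_relative_href("/data/tmp/a.json", "/data/catalog/b.json"))  # → ../tmp/a.json
```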

### Save Item

Finally, we save the labels Item to the same temporary directory as our source imagery Item.

```python
# Name the file after the Item ID, next to the source imagery Item
label_item_path = tmp_dir / f'{label_item.id}.json'
label_item.save_object(dest_href=str(label_item_path))
```
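At its core, saving writes the Item's JSON to `<tmp_dir>/<Item ID>.json`. The stdlib sketch below shows only that naming and serialization step. PySTAC's `save_object` does more, such as handling the Item's links. The helper `save_item_dict` is hypothetical, and the round trip uses a throwaway directory and a placeholder Item.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def save_item_dict(item_dict: Dict[str, Any], directory: Path) -> Path:
    """Hypothetical helper: write an Item dict to <directory>/<id>.json."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{item_dict['id']}.json"
    path.write_text(json.dumps(item_dict, indent=4))
    return path


# Round-trip a minimal placeholder Item through a throwaway directory
with tempfile.TemporaryDirectory() as d:
    saved = save_item_dict({"type": "Feature", "id": "example_labels"}, Path(d) / "tmp")
    saved_name = saved.name
    reloaded = json.loads(saved.read_text())

print(saved_name)  # → example_labels.json
```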