# Create Source Imagery STAC Item

<a style="display: inline-block;" href="https://mybinder.org/v2/gh/RadiantMLHub/ml4eo-bootcamp-2021/main?filepath=Lecture%205%2Fexercises%2F1_create_source_imagery_stac_item.ipynb"><img src="https://mybinder.org/badge_logo.svg" alt="Launch in Binder"/></a>

In this exercise, we will:

* Discuss STAC Items and what they represent
* Discuss STAC Assets and how they relate to Items
* Use the `pystac` package to create a STAC Item to represent our source imagery tile

## STAC Items

A **[STAC Item](https://github.com/radiantearth/stac-spec/tree/master/item-spec)** is the core, atomic unit in STAC. It represents a single spatio-temporal asset for a particular place and time. STAC Items are represented as [GeoJSON Features](https://tools.ietf.org/html/rfc7946#section-3.2) with additional [foreign members](https://tools.ietf.org/html/rfc7946#section-6) (properties) relevant to the STAC spec.

Among other things, every STAC Item has the following fields:

* `assets`: A dictionary of objects, keyed by name, that describe the actual files that can be downloaded (each one includes an `href` link)
* `datetime`: A date-time associated with the resource, stored inside the Item's `properties` object
* `geometry`: A GeoJSON Geometry that describes the footprint of the resource
* `links`: List of links to other relevant STAC entities or external resources

The following is a simple example of a STAC Item (taken from the official STAC examples [here](https://github.com/radiantearth/stac-spec/blob/master/examples/simple-item.json)):

```json
{
  "stac_version": "1.0.0-rc.2",
  "stac_extensions": [],
  "type": "Feature",
  "id": "20201211_223832_CS2",
  "bbox": [172.91173669923782, 1.3438851951615003, 172.95469614953714, 1.3690476620161975],
  "geometry": {
    "type": "Polygon",
    "coordinates": [
      [
        [172.91173669923782, 1.3438851951615003],
        [172.95469614953714, 1.3438851951615003],
        [172.95469614953714, 1.3690476620161975],
        [172.91173669923782, 1.3690476620161975],
        [172.91173669923782, 1.3438851951615003]
      ]
    ]
  },
  "properties": {
    "datetime": "2020-12-11T22:38:32.125000Z"
  },
  "collection": "simple-collection",
  "links": [
    {
      "rel": "collection",
      "href": "./collection.json",
      "type": "application/json",
      "title": "Simple Example Collection"
    },
    {
      "rel": "root",
      "href": "./collection.json",
      "type": "application/json"
    }
  ],
  "assets": {
    "visual": {
      "href": "https://storage.googleapis.com/open-cogs/stac-examples/20201211_223832_CS2.tif",
      "type": "image/tiff; application=geotiff; profile=cloud-optimized",
      "title": "3-Band Visual",
      "roles": [
        "visual"
      ]
    },
    "thumbnail": {
      "href": "https://storage.googleapis.com/open-cogs/stac-examples/20201211_223832_CS2.jpg",
      "title": "Thumbnail",
      "type": "image/jpeg",
      "roles": [
        "thumbnail"
      ]
    }
  }
}
```
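
Because a STAC Item is plain GeoJSON, its fields can be inspected with nothing more than the standard `json` module. The sketch below uses a trimmed copy of the example above (the trimming is ours, for brevity) and shows where each field lives: `datetime` sits inside `properties`, while `assets` is keyed by name.

```python
import json

# A trimmed copy of the simple-item example (shortened for illustration)
item_json = """
{
  "stac_version": "1.0.0-rc.2",
  "type": "Feature",
  "id": "20201211_223832_CS2",
  "bbox": [172.91173669923782, 1.3438851951615003, 172.95469614953714, 1.3690476620161975],
  "geometry": {
    "type": "Polygon",
    "coordinates": [[
      [172.91173669923782, 1.3438851951615003],
      [172.95469614953714, 1.3438851951615003],
      [172.95469614953714, 1.3690476620161975],
      [172.91173669923782, 1.3690476620161975],
      [172.91173669923782, 1.3438851951615003]
    ]]
  },
  "properties": {"datetime": "2020-12-11T22:38:32.125000Z"},
  "links": [{"rel": "root", "href": "./collection.json", "type": "application/json"}],
  "assets": {
    "visual": {
      "href": "https://storage.googleapis.com/open-cogs/stac-examples/20201211_223832_CS2.tif",
      "type": "image/tiff; application=geotiff; profile=cloud-optimized",
      "roles": ["visual"]
    },
    "thumbnail": {
      "href": "https://storage.googleapis.com/open-cogs/stac-examples/20201211_223832_CS2.jpg",
      "type": "image/jpeg",
      "roles": ["thumbnail"]
    }
  }
}
"""

item_dict = json.loads(item_json)

# The datetime lives under "properties", not at the top level
item_datetime = item_dict["properties"]["datetime"]

# Assets are a dictionary keyed by a user-chosen name
asset_keys = sorted(item_dict["assets"])

# Links are a list of objects, each with a "rel" type
link_rels = [link["rel"] for link in item_dict["links"]]

print(item_datetime, asset_keys, link_rels)
```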


## Source Imagery Item

### Create Core Item

To get started, we will create a [`pystac.Item`](https://pystac.readthedocs.io/en/latest/api.html#item) for our source imagery with just the core properties. Later, we will add properties from the extensions relevant to our data. Taking a look at the `pystac.Item` docs, we can see that we'll at least need to specify the `id`, `geometry`, `bbox`, and `datetime` arguments, as well as a `properties` dictionary.

First, let's import the libraries that we'll be using.

In [1]:
from datetime import datetime
import json
from pathlib import Path

import rasterio
import rasterio.warp
import rasterio.features
import pystac

Next, we derive the `geometry` and `bbox` arguments. We read the image tile using [`rasterio`](https://rasterio.readthedocs.io/en/latest/) and get the image `bounds`, then use these `bounds` to construct a GeoJSON Geometry. This geometry is in the native CRS of the image, but the STAC spec requires a geometry in EPSG:4326. We can use the `rasterio.warp` module to transform the geometry into the correct CRS, and then compute the bounding box of the transformed geometry.

In [2]:
source_img = Path('../data/tiles/S2A_34HCH_20171008_0_L2A_TCI.tif')

# Read the source image and save the bounds and CRS
with rasterio.open(source_img) as src:
    bounds_native = src.bounds
    native_crs = src.crs
  
# Create a GeoJSON Geometry from the image bounds
geom_native = {
    'type': 'Polygon',
    'coordinates': [
        [
            [bounds_native.left, bounds_native.top],
            [bounds_native.left, bounds_native.bottom],
            [bounds_native.right, bounds_native.bottom],
            [bounds_native.right, bounds_native.top],
            [bounds_native.left, bounds_native.top]
        ]
    ]
}

# Transform the GeoJSON Geometry into EPSG:4326
geom_4326 = rasterio.warp.transform_geom(
    native_crs,
    'EPSG:4326',
    geom_native,
)

# Get the bounds in EPSG:4326
bounds_4326 = rasterio.features.bounds(geom_4326)

print('Geometry (EPSG:4326):')
print(json.dumps(geom_4326, indent=4))
print('')
print('Bounds (EPSG:4326):')
print(bounds_4326)

Geometry (EPSG:4326):
{
    "type": "Polygon",
    "coordinates": [
        [
            [
                19.282309157846285,
                -33.65646563177132
            ],
            [
                19.28079050631379,
                -33.732735055742886
            ],
            [
                19.353625203525958,
                -33.733727878895834
            ],
            [
                19.35507958936285,
                -33.65745561206109
            ],
            [
                19.282309157846285,
                -33.65646563177132
            ]
        ]
    ]
}

Bounds (EPSG:4326):
(19.28079050631379, -33.733727878895834, 19.35507958936285, -33.65646563177132)
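
The bounds tuple is simply the minimum and maximum longitude and latitude over the polygon's vertices, in `(west, south, east, north)` order. Because the reprojected polygon is no longer axis-aligned, each of the four values can come from a different corner. A minimal pure-Python sketch of that computation, using the coordinates printed above (the helper name `polygon_bounds` is ours and is not part of `rasterio`):

```python
def polygon_bounds(geometry):
    """Return (west, south, east, north) for a GeoJSON Polygon."""
    # The first ring of a Polygon is its exterior ring
    exterior = geometry['coordinates'][0]
    xs = [point[0] for point in exterior]
    ys = [point[1] for point in exterior]
    return (min(xs), min(ys), max(xs), max(ys))


# The EPSG:4326 geometry printed above
geom = {
    'type': 'Polygon',
    'coordinates': [
        [
            [19.282309157846285, -33.65646563177132],
            [19.28079050631379, -33.732735055742886],
            [19.353625203525958, -33.733727878895834],
            [19.35507958936285, -33.65745561206109],
            [19.282309157846285, -33.65646563177132],
        ]
    ],
}

bounds = polygon_bounds(geom)
print(bounds)
```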


We will parse the file name to get the `datetime` property. We will also create a unique `id` based on the file name.

In [3]:
# The datetime is the 3rd part of the filename split on the underscore ("_") character
datetime_str = source_img.name.split('_', 3)[2]

# Create a naive datetime object by parsing this string
dt = datetime.strptime(datetime_str, '%Y%m%d')

# The Item ID will be the file name with a "_source" suffix
item_id = source_img.name.split('.')[0] + "_source"
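
The parsed value is a naive datetime, meaning it has no time zone attached. If you would rather state the UTC assumption explicitly than rely on how a library treats naive datetimes, one option is to attach `timezone.utc` yourself. A small sketch using only the standard library (the variable name `dt_utc` is ours; the file name is the one used above):

```python
from datetime import datetime, timezone
from pathlib import Path

source_img = Path('../data/tiles/S2A_34HCH_20171008_0_L2A_TCI.tif')

# Split into at most 4 parts; the date is the third part
datetime_str = source_img.name.split('_', 3)[2]

# Parse the date, then mark it explicitly as UTC
dt_utc = datetime.strptime(datetime_str, '%Y%m%d').replace(tzinfo=timezone.utc)

# Path.stem drops the final suffix, giving the same base name as split('.')[0] here
item_id = source_img.stem + '_source'

print(dt_utc.isoformat())
print(item_id)
```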

Finally, we put this all together into a PySTAC Item.

In [4]:
item = pystac.Item(
    id=item_id,
    geometry=geom_4326,
    bbox=bounds_4326,
    datetime=dt,
    properties={
        'platform': 'Sentinel-2',
        'constellation': 'Sentinel-2'
    }
)
print(json.dumps(item.to_dict(), indent=4))

{
    "type": "Feature",
    "stac_version": "1.0.0-beta.2",
    "id": "S2A_34HCH_20171008_0_L2A_TCI_source",
    "properties": {
        "platform": "Sentinel-2",
        "constellation": "Sentinel-2",
        "datetime": "2017-10-08T00:00:00Z"
    },
    "geometry": {
        "type": "Polygon",
        "coordinates": [
            [
                [
                    19.282309157846285,
                    -33.65646563177132
                ],
                [
                    19.28079050631379,
                    -33.732735055742886
                ],
                [
                    19.353625203525958,
                    -33.733727878895834
                ],
                [
                    19.35507958936285,
                    -33.65745561206109
                ],
                [
                    19.282309157846285,
                    -33.65646563177132
                ]
            ]
        ]
    },
    "links": [],
    "assets": {},
    "bbox": [
    

### Source Imagery Asset

A **STAC Asset** represents a file associated with an Item (or Collection) that can be downloaded. Some common examples of assets include:

* Multi-band data images
* Low-resolution thumbnail images
* Sidecar metadata files
* Documentation files

Our source imagery item has only one asset: the `S2A_34HCH_20171008_0_L2A_TCI.tif` image tile. We will create a `pystac.Asset` for this tile and add it to the `pystac.Item` we just created.

The [`pystac.Asset` docs](https://pystac.readthedocs.io/en/latest/api.html#asset) tell us that we only need to define the `href` argument to create a new Asset. The `href` property represents the location of the asset, which can be either an absolute URL or a relative URL/path. A relative path must be *relative to the STAC Item that contains the Asset*, which is not necessarily the current working directory.

Since we will only be working with our STAC Items locally, we will eventually want relative paths, which are more portable if we zip the entire collection and move it to a new location. For now, we will use an absolute path to our asset and turn it into a relative path in a later exercise when we create our Collection.
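
To make "relative to the STAC Item" concrete, here is a small sketch using `posixpath.relpath` from the standard library. The directory layout is hypothetical (we have not yet decided where the Item will finally live). The point is only that the relative href is computed from the directory holding the Item JSON, not from the current working directory.

```python
import posixpath

# Hypothetical absolute locations, for illustration only
item_href = '/project/stac/S2A_34HCH_20171008_0_L2A_TCI_source/item.json'
asset_href = '/project/data/tiles/S2A_34HCH_20171008_0_L2A_TCI.tif'

# Directory that contains the Item JSON file
item_dir = posixpath.dirname(item_href)

# Asset path expressed relative to the Item's directory
relative_href = posixpath.relpath(asset_href, start=item_dir)

print(relative_href)
```

`posixpath` is used here rather than `os.path` so that the result uses forward slashes on every platform, which is what an href expects.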

We will also define the `roles` and `type` arguments to make it clearer what this asset represents. The STAC Best Practices documentation has a section on ["Common Media Types in STAC"](https://github.com/radiantearth/stac-spec/blob/master/best-practices.md#common-media-types-in-stac) that we can use to select the right `type` argument. Likewise, we can use the ["List of Asset Roles"](https://github.com/radiantearth/stac-spec/blob/master/best-practices.md#list-of-asset-roles) section of the STAC Best Practices docs to select the right `roles`. In our case, we will use the `visual` role and the `image/tiff; application=geotiff` media type.
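
As a rough illustration of choosing a `type`, the sketch below maps a few file suffixes to media types from the Best Practices list linked above. The helper name `guess_media_type` and the suffix mapping are ours, for illustration only. A suffix alone cannot tell you whether a TIFF is georeferenced or cloud-optimized, so treat this as a starting point rather than a rule.

```python
from pathlib import Path

# Media type strings from the STAC Best Practices list.
# The suffix-to-type mapping itself is an illustrative assumption.
MEDIA_TYPES = {
    '.tif': 'image/tiff; application=geotiff',
    '.tiff': 'image/tiff; application=geotiff',
    '.jpg': 'image/jpeg',
    '.jpeg': 'image/jpeg',
    '.png': 'image/png',
    '.json': 'application/json',
    '.geojson': 'application/geo+json',
}


def guess_media_type(path):
    """Return a likely media type for a file path, or None if the suffix is unknown."""
    return MEDIA_TYPES.get(Path(path).suffix.lower())


media_type = guess_media_type('../data/tiles/S2A_34HCH_20171008_0_L2A_TCI.tif')
print(media_type)
```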

In [5]:
asset = pystac.Asset(
    href=str(source_img.resolve()),
    media_type='image/tiff; application=geotiff',
    roles=['visual']
)

Finally, we need to associate this Asset with the Item we created above. Since the `assets` property of a STAC Item is an object, we need to provide a key that should be used for this asset. The choice of key is up to the user, so we will just use `"visual"` to match our role.

In [6]:
item.add_asset("visual", asset)
print(json.dumps(item.to_dict(), indent=4))

{
    "type": "Feature",
    "stac_version": "1.0.0-beta.2",
    "id": "S2A_34HCH_20171008_0_L2A_TCI_source",
    "properties": {
        "platform": "Sentinel-2",
        "constellation": "Sentinel-2",
        "datetime": "2017-10-08T00:00:00Z"
    },
    "geometry": {
        "type": "Polygon",
        "coordinates": [
            [
                [
                    19.282309157846285,
                    -33.65646563177132
                ],
                [
                    19.28079050631379,
                    -33.732735055742886
                ],
                [
                    19.353625203525958,
                    -33.733727878895834
                ],
                [
                    19.35507958936285,
                    -33.65745561206109
                ],
                [
                    19.282309157846285,
                    -33.65646563177132
                ]
            ]
        ]
    },
    "links": [],
    "assets": {
        "visual": {


We can see that our Item now has the Asset that we just added.

### Save Item

Finally, we'll save our Item to a JSON file. We'll save this Item into a `tmp` folder because we will be making some changes to the links when we create a Collection in the 3rd exercise.

In [7]:
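# Create a `tmp` folder next to the `data` folder (if it does not already exist)
tmp_dir = source_img.parent.parent.parent / 'tmp'
tmp_dir.mkdir(exist_ok=True)

# Save the Item as <item id>.json inside the `tmp` folder
item_path = tmp_dir / f'{item.id}.json'
item.save_object(dest_href=item_path)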
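
A quick way to sanity-check a saved Item is to read the JSON back and look at its `id` and asset keys. The sketch below is a minimal, standard-library-only version. The helper name `summarize_item_file` is ours, and the temporary file is a stand-in so the example runs on its own; in the notebook you would pass `item_path` instead.

```python
import json
import tempfile
from pathlib import Path


def summarize_item_file(path):
    """Load a STAC Item JSON file and return its id and sorted asset keys."""
    with open(path) as f:
        item_dict = json.load(f)
    return item_dict['id'], sorted(item_dict.get('assets', {}))


# Stand-in file with placeholder content, used only to make the sketch runnable
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / 'demo_item.json'
    demo_path.write_text(json.dumps({
        'type': 'Feature',
        'id': 'S2A_34HCH_20171008_0_L2A_TCI_source',
        'assets': {'visual': {'href': 'placeholder.tif'}},
    }))
    summary = summarize_item_file(demo_path)

print(summary)
```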