# STAC Ingestion

## Ingest STAC via STAC API

This section uses the STAC API to ingest STAC Items through endpoints that implement the ["transactions" STAC API extension](https://github.com/stac-api-extensions/transaction).

This section presumes you've started the notebook with a few environment variables defined for configuration:

* `STAC_API_ROOT` is the URL to the root of your STAC API
* HTTP Basic authentication credentials (optional)
    * `STAC_API_USERNAME` is a user with permission to use the transaction endpoints
    * `STAC_API_PASSWORD` is the password for that user

```python
import json
import os
from itertools import batched  # requires Python 3.12+
from pathlib import Path
from urllib.parse import urljoin

import pystac
import requests
import rustac

STAC_API_ROOT = os.environ["STAC_API_ROOT"]
# A trailing slash keeps urljoin from dropping the last path segment of the root
if not STAC_API_ROOT.endswith("/"):
    STAC_API_ROOT += "/"

session = requests.Session()

# Only use Basic authentication when both credentials are provided
if (username := os.getenv("STAC_API_USERNAME")) and (
    password := os.getenv("STAC_API_PASSWORD")
):
    session.auth = (username, password)
```
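The trailing-slash normalization above matters because `urljoin` resolves its second argument as a relative reference, the same way a browser resolves a link. A quick check, using a hypothetical root URL (`https://example.com/stac`) purely for illustration:

```python
from urllib.parse import urljoin

# Hypothetical API root used only for illustration
root_without_slash = "https://example.com/stac"
root_with_slash = root_without_slash + "/"

# Without the trailing slash, "stac" is treated like a file name and replaced
print(urljoin(root_without_slash, "collections"))  # → https://example.com/collections

# With the trailing slash, "collections" is resolved underneath /stac/
print(urljoin(root_with_slash, "collections"))  # → https://example.com/stac/collections

# A leading slash makes the path absolute, discarding /stac/ entirely
print(urljoin(root_with_slash, "/collections"))  # → https://example.com/collections
```

For the same reason, the relative paths passed to `urljoin` in this notebook (`"collections"`, `f"collections/{collection.id}/bulk_items"`) do not start with `/`.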

First, we need to create the STAC Collections that our Items belong to, reading them from the `*collection.json` files in the working directory.

```python
collections = [pystac.read_file(path) for path in Path(".").glob("*collection.json")]
collections
```

```text
[<Collection id=openaerialmap>, <Collection id=maxar-opendata>]
```

```python
for collection in collections:
    r = session.post(urljoin(STAC_API_ROOT, "collections"), json=collection.to_dict())
    r.raise_for_status()
    print(f"Created {collection.id}")
```

```text
Created openaerialmap
Created maxar-opendata
```
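Re-running the cell above against a catalog that already contains these Collections will likely fail, because many implementations answer a `POST /collections` for an existing ID with `409 Conflict`. A minimal sketch of an idempotent alternative, assuming the API also supports replacing a Collection with `PUT /collections/{collectionId}` (as defined by the collection transaction extension); `create_or_update_collection` is a hypothetical helper, not part of `requests` or `pystac`:

```python
from urllib.parse import urljoin


def create_or_update_collection(session, api_root: str, collection: dict) -> str:
    """Create a Collection, or replace it if it already exists.

    `session` is assumed to be a requests.Session (anything with compatible
    `post`/`put` methods works) and `api_root` is assumed to end with "/".
    Assumes the API answers a duplicate POST with HTTP 409.
    """
    r = session.post(urljoin(api_root, "collections"), json=collection)
    if r.status_code == 409:
        # The Collection already exists: replace it with the local definition
        r = session.put(
            urljoin(api_root, f"collections/{collection['id']}"), json=collection
        )
        r.raise_for_status()
        return "updated"
    r.raise_for_status()
    return "created"
```

In the notebook this would be called as `create_or_update_collection(session, STAC_API_ROOT, collection.to_dict())` inside the same loop, printing the returned status instead of always printing "Created".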


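Because some of these STAC Items may already have been added to our STAC Catalog, we use the "bulk items" endpoint to "upsert" (insert or update) Items in batches. Note that this endpoint is not defined by the transactions extension itself; it comes from a separate bulk transactions extension offered by some implementations, such as stac-fastapi.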
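Each `bulk_items` request body pairs a `method` with an `items` object that maps Item IDs to Items. A minimal sketch of building those bodies in fixed-size chunks; `bulk_upsert_payloads` is a hypothetical helper, and it uses `itertools.islice` instead of `itertools.batched` so it also runs on Python versions older than 3.12:

```python
from itertools import islice
from typing import Iterable, Iterator


def bulk_upsert_payloads(items: Iterable[dict], chunk_size: int) -> Iterator[dict]:
    """Yield bulk_items request bodies holding at most `chunk_size` Items each."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(items)
    # islice pulls the next chunk; an empty list means the input is exhausted
    while chunk := list(islice(iterator, chunk_size)):
        # Keying by ID means a repeated ID within one chunk keeps only its last Item
        yield {"method": "upsert", "items": {item["id"]: item for item in chunk}}


payloads = list(bulk_upsert_payloads([{"id": "a"}, {"id": "b"}, {"id": "c"}], 2))
print([list(p["items"]) for p in payloads])  # → [['a', 'b'], ['c']]
```

Each yielded dictionary can be passed directly as the `json=` argument of the POST to `collections/{collection_id}/bulk_items`.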

```python
# this may need to be tuned depending on the API's maximum supported request size
ITEM_CHUNKSIZE = 15
```
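Rather than hand-tuning `ITEM_CHUNKSIZE`, one option is to split a batch in half whenever the server rejects it as too large. A minimal sketch, assuming the API (or a proxy in front of it) signals an oversized body with HTTP `413 Payload Too Large`, which not every deployment does; `post_items_splitting` is a hypothetical helper:

```python
from typing import Sequence
from urllib.parse import urljoin


def post_items_splitting(
    session, api_root: str, collection_id: str, items: Sequence[dict]
) -> int:
    """Upsert `items` via bulk_items, halving the batch on HTTP 413.

    `session` is assumed to be a requests.Session (anything with a compatible
    `post` method works) and `api_root` is assumed to end with "/".
    Returns the number of requests that were accepted.
    """
    url = urljoin(api_root, f"collections/{collection_id}/bulk_items")
    body = {"method": "upsert", "items": {item["id"]: item for item in items}}
    r = session.post(url, json=body)
    if r.status_code == 413 and len(items) > 1:
        # Payload too large: retry each half as its own request
        mid = len(items) // 2
        return post_items_splitting(
            session, api_root, collection_id, items[:mid]
        ) + post_items_splitting(session, api_root, collection_id, items[mid:])
    # Any other error (including a 413 for a single Item) is raised as usual
    r.raise_for_status()
    return 1
```

Inside the upload loop, the `session.post(...)` call and `r.raise_for_status()` could then be replaced with `post_items_splitting(session, STAC_API_ROOT, collection.id, items)`, keeping `ITEM_CHUNKSIZE` as an upper bound rather than a value that must fit every API.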

```python
for collection in collections:
    item_paths = sorted(Path(".").glob(f"{collection.id}*.parquet"))
    for item_path in item_paths:
        print(f"Adding {item_path} to STAC Catalog")
        # rustac.read is asynchronous; top-level `await` works inside Jupyter
        item_collection = await rustac.read(str(item_path))
        for items in batched(item_collection["features"], ITEM_CHUNKSIZE):
            # The bulk_items body maps each Item ID to its Item
            items_by_id = {item["id"]: item for item in items}
            r = session.post(
                urljoin(STAC_API_ROOT, f"collections/{collection.id}/bulk_items"),
                json={"method": "upsert", "items": items_by_id},
            )
            r.raise_for_status()
```

```text
Adding openaerialmap-20250512T180412.parquet to STAC Catalog
Adding maxar-opendata-20250513T092240.parquet to STAC Catalog
```


## Ingest STAC Into PgSTAC via pypgstac

This section is an example of using `pypgstac` tools to load the STAC Collections and Items produced by the other example notebooks into PgSTAC. Once in PgSTAC, these data will be available through the STAC API.

This notebook assumes you have defined environment variables for the PgSTAC database connection. If they are not defined, it falls back to the defaults from the [eoAPI](https://github.com/developmentseed/eoapi) Docker Compose setup.

The expected environment variables are the ["standard PG environment variables"](https://github.com/stac-utils/pgstac/blob/v0.9.6/docs/src/pypgstac.md?plain=1#L42-L48) used by PgSTAC:

* `PGHOST`
* `PGPORT`
* `PGUSER`
* `PGDATABASE`
* `PGPASSWORD`

```python
import json  # used by read_items_from_ndjson below
import os
from pathlib import Path
from typing import Iterator

from pypgstac.db import PgstacDB
from pypgstac.load import Loader, Methods
```

```python
# Fall back to the eoAPI Docker Compose defaults for any unset variable
dsn = "postgresql://{username}:{password}@{host}:{port}/{database}".format(
    username=os.getenv("PGUSER", "username"),
    password=os.getenv("PGPASSWORD", "password"),
    host=os.getenv("PGHOST", "localhost"),
    port=os.getenv("PGPORT", 5439),
    database=os.getenv("PGDATABASE", "postgis"),
)
```
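The connection URL above inserts the user name and password verbatim, so a password containing characters such as `@`, `:` or `/` would corrupt the URL. A minimal sketch of the same DSN with those components percent-encoded by `urllib.parse.quote` (libpq decodes percent-encoded components in connection URIs); `build_dsn` is a hypothetical helper that keeps the same eoAPI defaults:

```python
import os
from urllib.parse import quote


def build_dsn() -> str:
    """Build a PostgreSQL connection URI from the standard PG* variables."""
    # safe="" also encodes "/", which would otherwise read as a path separator
    username = quote(os.getenv("PGUSER", "username"), safe="")
    password = quote(os.getenv("PGPASSWORD", "password"), safe="")
    host = os.getenv("PGHOST", "localhost")
    port = os.getenv("PGPORT", "5439")
    database = os.getenv("PGDATABASE", "postgis")
    return f"postgresql://{username}:{password}@{host}:{port}/{database}"
```

For example, a password of `p@ss:w/rd` becomes `p%40ss%3Aw%2Frd` in the resulting URI, while the eoAPI defaults are unchanged by the encoding.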

```python
pgstac = PgstacDB(dsn)
loader = Loader(pgstac)
```

```python
def read_items_from_ndjson(
    ndjson: Path,
    collection_id: str,
) -> Iterator[dict]:
    """Read STAC Items from an NDJSON file, assigning each to `collection_id`."""
    with ndjson.open() as src:
        for line in src:
            item = json.loads(line)
            # Override the Item's collection so it lands in the target Collection
            item["collection"] = collection_id
            yield item
```
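A quick way to check the reader is to run it on a throwaway file. This sketch is a drop-in variant of `read_items_from_ndjson` that also skips blank lines (which `json.loads` would reject), exercised on a temporary two-Item NDJSON file with made-up IDs:

```python
import json
import tempfile
from pathlib import Path
from typing import Iterator


def read_items_from_ndjson(ndjson: Path, collection_id: str) -> Iterator[dict]:
    """Read STAC Items from an NDJSON file, assigning each to `collection_id`."""
    with ndjson.open() as src:
        for line in src:
            if not line.strip():
                continue  # tolerate blank lines, e.g. between concatenated files
            item = json.loads(line)
            item["collection"] = collection_id
            yield item


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.ndjson"
    path.write_text('{"id": "item-1"}\n\n{"id": "item-2"}\n')
    # Consume the generator while the temporary file still exists
    items = list(read_items_from_ndjson(path, "example-collection"))

print([(item["id"], item["collection"]) for item in items])
# → [('item-1', 'example-collection'), ('item-2', 'example-collection')]
```

Because the function is a generator, `loader.load_items` can stream Items from a large file without holding them all in memory.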

### OpenAerialMap Collection

```python
loader.load_collections("openaerialmap-collection.json", insert_mode=Methods.upsert)
```

```python
oam_ndjson_files = sorted(Path(".").glob("openaerialmap-*.ndjson"))

for oam_ndjson in oam_ndjson_files:
    print(f"Loading Items from {oam_ndjson}")
    loader.load_items(
        read_items_from_ndjson(oam_ndjson, "openaerialmap"),
        # insert_ignore skips Items that already exist instead of updating them
        insert_mode=Methods.insert_ignore,
    )
```

```text
Loading Items from openaerialmap-20250501T142120.ndjson
Loading Items from openaerialmap-20250506T175956.ndjson
Loading Items from openaerialmap-20250506T181630.ndjson
```


### Maxar for OAM Collection

```python
loader.load_collections("maxar-opendata-collection.json", insert_mode=Methods.upsert)
```

```python
maxar_ndjson_files = sorted(Path(".").glob("maxar-*.ndjson"))

for maxar_ndjson in maxar_ndjson_files:
    print(f"Loading Items from {maxar_ndjson}")
    loader.load_items(
        read_items_from_ndjson(maxar_ndjson, "maxar-opendata"),
        insert_mode=Methods.insert_ignore,
    )
```

```text
Loading Items from maxar-opendata-20250505T085548.ndjson
```
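The two loading cells above differ only in the file pattern and the Collection ID, so they could be folded into one helper. A minimal sketch; `load_ndjson_collection` is a hypothetical name, `loader` is assumed to be the `pypgstac` `Loader` created earlier, and `insert_mode` would be a `Methods` value such as `Methods.insert_ignore`. To stay self-contained it repeats the NDJSON reader:

```python
import json
from pathlib import Path
from typing import Any, Iterator


def read_items_from_ndjson(ndjson: Path, collection_id: str) -> Iterator[dict]:
    """Read STAC Items from an NDJSON file, assigning each to `collection_id`."""
    with ndjson.open() as src:
        for line in src:
            item = json.loads(line)
            item["collection"] = collection_id
            yield item


def load_ndjson_collection(
    loader: Any,
    pattern: str,
    collection_id: str,
    insert_mode: Any,
    directory: Path = Path("."),
) -> list[Path]:
    """Load every NDJSON file in `directory` matching `pattern` into a Collection.

    Returns the loaded files in sorted order.
    """
    paths = sorted(directory.glob(pattern))
    for path in paths:
        print(f"Loading Items from {path}")
        loader.load_items(
            read_items_from_ndjson(path, collection_id), insert_mode=insert_mode
        )
    return paths
```

With this helper, the two cells would become `load_ndjson_collection(loader, "openaerialmap-*.ndjson", "openaerialmap", Methods.insert_ignore)` and `load_ndjson_collection(loader, "maxar-*.ndjson", "maxar-opendata", Methods.insert_ignore)`.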
