# Create a geoparquet with STAC Items

Goals:

* Read STAC Items
* Create an iterable of STAC Items
* Create an Apache Arrow record batch reader
* Create an Apache Arrow table
* Serialize to geoparquet 

## Configure the environment

Create a Python environment with:

* stac-geoparquet
* pystac
* ipykernel
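
Before running the cells, it can help to confirm that the packages are visible to the active kernel. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the distribution names are simply the ones listed above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in distributions:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names taken from the list above.
report = installed_versions(["stac-geoparquet", "pystac", "ipykernel"])
for name, ver in report.items():
    print(f"{name}: {ver or 'not installed'}")
```

A `None` entry means the package still needs to be installed in this environment.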

## Imports

In [1]:
import stac_geoparquet
from stac_geoparquet.arrow import parse_stac_items_to_arrow, to_parquet
from pystac import read_file
from pyarrow.parquet import read_table

## Create an iterable of STAC Items

Use `pystac.read_file` to read the local STAC Item files.

See [https://pystac.readthedocs.io/en/stable/api/pystac.html#pystac.read_file](https://pystac.readthedocs.io/en/stable/api/pystac.html#pystac.read_file)

In [2]:
item_paths = [
    "data-files/items/S2A_10TFK_20220524_0_L2A.json",
    "data-files/items/S2B_10TFK_20210713_0_L2A.json",
]

In [3]:
items_iterable = [read_file(item_path) for item_path in item_paths]

items_iterable

[<Item id=S2A_10TFK_20220524_0_L2A>, <Item id=S2B_10TFK_20210713_0_L2A>]
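
The list comprehension above loads every Item up front. For larger collections, a generator keeps only one Item in memory at a time. The sketch below uses the standard library `json` module and yields plain dictionaries. The helper name `iter_item_dicts` is hypothetical, the demo files are minimal stand-ins rather than valid STAC Items, and passing dictionaries (instead of `pystac.Item` objects) to `parse_stac_items_to_arrow` is an assumption to check against the installed stac-geoparquet version.

```python
import json
import tempfile
from pathlib import Path


def iter_item_dicts(paths):
    """Lazily yield one parsed JSON document per path."""
    for path in paths:
        with open(path, encoding="utf-8") as f:
            yield json.load(f)


# Demo with two minimal stand-in files (not complete STAC Items).
with tempfile.TemporaryDirectory() as tmp:
    demo_paths = []
    for item_id in ["S2A_10TFK_20220524_0_L2A", "S2B_10TFK_20210713_0_L2A"]:
        path = Path(tmp) / f"{item_id}.json"
        path.write_text(
            json.dumps({"type": "Feature", "id": item_id}), encoding="utf-8"
        )
        demo_paths.append(path)

    # The generator opens each file only when the loop asks for the next item.
    demo_ids = [item["id"] for item in iter_item_dicts(demo_paths)]

print(demo_ids)
```

A generator can be consumed only once, so build a new one if the items are needed again.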

## Create a record batch reader

Use `stac_geoparquet.arrow.parse_stac_items_to_arrow` to create an Apache Arrow record batch reader

See:

* `stac_geoparquet.arrow.parse_stac_items_to_arrow` [https://stac-utils.github.io/stac-geoparquet/latest/api/arrow/#stac_geoparquet.arrow.parse_stac_items_to_arrow](https://stac-utils.github.io/stac-geoparquet/latest/api/arrow/#stac_geoparquet.arrow.parse_stac_items_to_arrow)
* `RecordBatchReader` [https://arrow.apache.org/docs/python/generated/pyarrow.RecordBatchReader.html#pyarrow.RecordBatchReader](https://arrow.apache.org/docs/python/generated/pyarrow.RecordBatchReader.html#pyarrow.RecordBatchReader)

In [4]:
record_batch_reader = parse_stac_items_to_arrow(items_iterable)

## Create a `pyarrow.lib.Table`

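`RecordBatchReader.read_all` consumes the reader and collects every record batch into a single table.

See [https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow-table](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow-table)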

In [5]:
table = record_batch_reader.read_all()

table.schema

assets: struct<AOT: struct<href: string, proj:shape: list<item: int64>, proj:transform: list<item: int64>, roles: list<item: string>, title: string, type: string>, B01: struct<eo:bands: list<item: struct<center_wavelength: double, common_name: string, full_width_half_max: double, name: string>>, gsd: int64, href: string, proj:shape: list<item: int64>, proj:transform: list<item: int64>, roles: list<item: string>, title: string, type: string>, B02: struct<eo:bands: list<item: struct<center_wavelength: double, common_name: string, full_width_half_max: double, name: string>>, gsd: int64, href: string, proj:shape: list<item: int64>, proj:transform: list<item: int64>, roles: list<item: string>, title: string, type: string>, B03: struct<eo:bands: list<item: struct<center_wavelength: double, common_name: string, full_width_half_max: double, name: string>>, gsd: int64, href: string, proj:shape: list<item: int64>, proj:transform: list<item: int64>, roles: list<item: string>, title: string, type:

Inspect the first column (`assets`). Indexing a table with an integer returns a column as a `ChunkedArray`, not a row.

In [6]:
table[0]

<pyarrow.lib.ChunkedArray object at 0x7f2c72cb75e0>
[
  -- is_valid: all not null
  -- child 0 type: struct<href: string, proj:shape: list<item: int64>, proj:transform: list<item: int64>, roles: list<item: string>, title: string, type: string>
    -- is_valid: all not null
    -- child 0 type: string
      [
        "https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/10/T/FK/2022/5/S2A_10TFK_20220524_0_L2A/AOT.tif",
        "https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/10/T/FK/2021/7/S2B_10TFK_20210713_0_L2A/AOT.tif"
      ]
    -- child 1 type: list<item: int64>
      [
        [
          1830,
          1830
        ],
        [
          1830,
          1830
        ]
      ]
    -- child 2 type: list<item: int64>
      [
        [
          60,
          0,
          600000,
          0,
          -60,
          4500000,
          0,
          0,
          1
        ],
        [
          60,
          0,
          600000,
          0,
 

## Serialize as a geoparquet

Use `stac_geoparquet.arrow.to_parquet` to serialize as geoparquet.

See [https://stac-utils.github.io/stac-geoparquet/latest/api/arrow/#stac_geoparquet.arrow.to_parquet](https://stac-utils.github.io/stac-geoparquet/latest/api/arrow/#stac_geoparquet.arrow.to_parquet)

In [7]:
s2_parquet_path = "s2.parquet"
to_parquet(table, s2_parquet_path)

## Verify serialized geoparquet

Read the file back with `pyarrow.parquet.read_table` and compare the result with the in-memory table. See [https://arrow.apache.org/docs/python/generated/pyarrow.parquet.read_table.html#pyarrow-parquet-read-table](https://arrow.apache.org/docs/python/generated/pyarrow.parquet.read_table.html#pyarrow-parquet-read-table)


In [8]:
read_table(s2_parquet_path) == table

True