## Use `stac-geoparquet` to convert CMR STAC records to GeoParquet
Adapted from the stac-geoparquet NAIP example: https://stac-utils.github.io/stac-geoparquet/latest/examples/naip/

In [1]:
import json
from pathlib import Path

import pyarrow.parquet as pq
import pystac_client
import stac_geoparquet

In [2]:
catalog = pystac_client.Client.open(
    "https://cmr.earthdata.nasa.gov/stac/LPCLOUD/"
)

In [3]:
collection_shortname = "MCD19A2_061"
file_basename = collection_shortname + "_items"

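In [4]:
from itertools import islice

max_items = 20
# get_items() returns a lazy iterator over the collection's items
items_iter = catalog.get_collection(collection_shortname).get_items()

# Write newline-delimited JSON (NDJSON): one compact STAC Item per line,
# stopping after the first max_items items
out_json_path = Path(file_basename + ".jsonl")
with open(out_json_path, "w") as f:
    for item in islice(items_iter, max_items):
        json.dump(item.to_dict(), f, separators=(",", ":"))
        f.write("\n")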

/python3.12/site-packages/pystac_client/client.py:650: MissingLink: No link with rel='data' could be found on this Client.
  href = self._get_href("data", data_link, "collections")
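Before converting, it can be worth sanity-checking the NDJSON file, since `parse_stac_ndjson_to_arrow` expects one JSON-encoded STAC Item per line. The sketch below uses only the standard library and assumes nothing beyond "each non-blank line is a GeoJSON `Feature`". The helper name `summarize_ndjson` is hypothetical, and the demo runs on a tiny synthetic file rather than the CMR download; on the real file you would call `summarize_ndjson(out_json_path)`.

```python
import json
import tempfile
from pathlib import Path


def summarize_ndjson(path):
    """Return (count, ids) for a newline-delimited file of STAC Items.

    Raises ValueError if a non-blank line is not a JSON object with type "Feature".
    """
    ids = []
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            if not isinstance(record, dict) or record.get("type") != "Feature":
                raise ValueError(f"line {lineno} is not a GeoJSON Feature")
            ids.append(record.get("id"))
    return len(ids), ids


# Demo on a tiny synthetic file with hypothetical IDs, not the CMR download
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_items.jsonl"
    demo_path.write_text('{"type":"Feature","id":"item-a"}\n{"type":"Feature","id":"item-b"}\n')
    count, ids = summarize_ndjson(demo_path)

print(count, ids)  # → 2 ['item-a', 'item-b']
```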


### Convert from STAC JSON to Arrow

In [5]:
record_batch_reader = stac_geoparquet.arrow.parse_stac_ndjson_to_arrow(out_json_path)

In [6]:
table = record_batch_reader.read_all()

In [7]:
print("Arrow table shape:   ", table.shape)
print("Arrow table columns: \n  ", str(table.column_names).replace(',', ',\n  '))

Arrow table shape:    (20, 13)
Arrow table columns: 
   ['assets',
   'bbox',
   'collection',
   'geometry',
   'id',
   'links',
   'stac_extensions',
   'stac_version',
   'type',
   'datetime',
   'end_datetime',
   'eo:cloud_cover',
   'start_datetime']
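Note that the column list has no nested `properties` column: item properties such as `datetime` and `eo:cloud_cover` appear as top-level columns. The sketch below only illustrates that flattening on a plain dict. It is not stac-geoparquet's implementation, which also has to handle column types, geometry encoding, and schema inference. The item and the helper `flatten_item` are hypothetical.

```python
def flatten_item(item):
    """Return a copy of a STAC Item dict with `properties` lifted to top-level keys.

    Illustrative only: this sketch ignores types and geometry encoding, and lets a
    property silently overwrite a top-level field with the same name.
    """
    flat = {key: value for key, value in item.items() if key != "properties"}
    flat.update(item.get("properties", {}))
    return flat


# Hypothetical item whose property names mirror the columns listed above
example_item = {
    "type": "Feature",
    "id": "example-granule",
    "bbox": [0.0, 0.0, 1.0, 1.0],
    "properties": {
        "datetime": None,
        "start_datetime": "2024-01-01T00:00:00Z",
        "end_datetime": "2024-01-01T00:05:00Z",
        "eo:cloud_cover": 12.5,
    },
}
print(sorted(flatten_item(example_item)))
# → ['bbox', 'datetime', 'end_datetime', 'eo:cloud_cover', 'id', 'start_datetime', 'type']
```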


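### Convert from STAC JSON to GeoParquet

`parse_stac_ndjson_to_parquet` reads the NDJSON file directly; it does not reuse the Arrow table built above.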

In [8]:
out_parquet_path = file_basename + ".parquet"
stac_geoparquet.arrow.parse_stac_ndjson_to_parquet(out_json_path, out_parquet_path)

### Read it back in from Parquet format

In [9]:
pq_table = pq.read_table(out_parquet_path)
type(pq_table)

pyarrow.lib.Table

In [10]:
print("Parquet table shape:   ", pq_table.shape)
print("Parquet table columns: \n  ", str(pq_table.column_names).replace(',', ',\n  '))

Parquet table shape:    (20, 13)
Parquet table columns: 
   ['assets',
   'bbox',
   'collection',
   'geometry',
   'id',
   'links',
   'stac_extensions',
   'stac_version',
   'type',
   'datetime',
   'end_datetime',
   'eo:cloud_cover',
   'start_datetime']


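### Verify GeoParquet compliance with `gpq`

`gpq` is a command-line tool for working with GeoParquet; its `validate` command checks a file against the GeoParquet specification. The cell below passes `out_parquet_path` to bash as the positional argument `$1`.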

In [11]:
%%bash -s "$out_parquet_path"
gpq validate $1


Summary: Passed 20 checks.

 ✓ file must include a "geo" metadata key
 ✓ metadata must be a JSON object
 ✓ metadata must include a "version" string
 ✓ metadata must include a "primary_column" string
 ✓ metadata must include a "columns" object
 ✓ column metadata must include the "primary_column" name
 ✓ column metadata must include a valid "encoding" string
 ✓ column metadata must include a "geometry_types" list
 ✓ optional "crs" must be null or a PROJJSON object
 ✓ optional "orientation" must be a valid string
 ✓ optional "edges" must be a valid string
 ✓ optional "bbox" must be an array of 4 or 6 numbers
 ✓ optional "epoch" must be a number
 ✓ geometry columns must not be grouped
 ✓ geometry columns must be stored using the BYTE_ARRAY parquet type
 ✓ geometry columns must be required or optional, not repeated
 ✓ all geometry values match the "encoding" metadata
 ✓ all geometry types must be included in the "geometry_types" metadata (if not empty)
 ✓ all polygon geometries must follow
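The first several `gpq` checks concern only the JSON stored under the Parquet `geo` metadata key. As a rough illustration, here is a partial stdlib re-implementation of a few of them. It assumes the JSON string has already been pulled out of the file, for example with `pq.read_schema(out_parquet_path).metadata[b"geo"]`. The helper `check_geo_metadata` is hypothetical, checks only presence and types (not allowed values such as valid encodings), and is no substitute for `gpq`.

```python
import json


def check_geo_metadata(geo_json):
    """Re-run a subset of the JSON-level checks that `gpq validate` reports.

    `geo_json` is the value of the Parquet "geo" metadata key (str or bytes).
    Returns a list of failure messages; an empty list means these checks passed.
    """
    meta = json.loads(geo_json)
    if not isinstance(meta, dict):
        return ["metadata must be a JSON object"]

    failures = []
    if not isinstance(meta.get("version"), str):
        failures.append('metadata must include a "version" string')
    primary = meta.get("primary_column")
    if not isinstance(primary, str):
        failures.append('metadata must include a "primary_column" string')
    columns = meta.get("columns")
    if not isinstance(columns, dict):
        failures.append('metadata must include a "columns" object')
        return failures

    if isinstance(primary, str) and primary not in columns:
        failures.append('column metadata must include the "primary_column" name')
    for name, column in columns.items():
        if not isinstance(column, dict):
            failures.append(f"column {name!r} metadata must be an object")
            continue
        # Only checks that these fields exist with the right JSON type
        if not isinstance(column.get("encoding"), str):
            failures.append(f'column {name!r} must include an "encoding" string')
        if not isinstance(column.get("geometry_types"), list):
            failures.append(f'column {name!r} must include a "geometry_types" list')
    return failures


# Illustrative metadata, not read from the file written above
example_geo = json.dumps({
    "version": "1.1.0",
    "primary_column": "geometry",
    "columns": {"geometry": {"encoding": "WKB", "geometry_types": ["Polygon"]}},
})
print(check_geo_metadata(example_geo))  # → []
```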

## Quick visualization of granule geometry

`Lonboard` can visualize the Arrow table read from the GeoParquet file directly. However, these first granules happen to span the antimeridian, so their footprints do not render correctly.

In [12]:
import lonboard
lonboard.viz(pq_table)

(Output: interactive Lonboard map)
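To find which granules are affected without plotting them, two simple checks can help. The first is a heuristic on the polygon ring: if consecutive vertices jump by more than 180° of longitude, the edge was probably meant to wrap across ±180°. The second uses the GeoJSON (RFC 7946, section 5.2) bbox convention, where `west > east` marks an antimeridian-crossing box; whether the CMR records follow that convention is an assumption not verified here. Both helpers are hypothetical. On the real table, a ring could come from `list(wkb.loads(pq_table["geometry"][i].as_py()).exterior.coords)` for Polygon geometries.

```python
def crosses_antimeridian(ring, max_jump=180.0):
    """Heuristic: True if consecutive vertices jump more than `max_jump` degrees in longitude.

    `ring` is a sequence of (lon, lat[, z]) tuples. Assumes every real edge spans
    less than 180 degrees of longitude, so a bigger jump means a wrap across ±180.
    """
    return any(abs(b[0] - a[0]) > max_jump for a, b in zip(ring, ring[1:]))


def bbox_crosses_antimeridian(bbox):
    """RFC 7946 section 5.2 convention: west > east marks an antimeridian-crossing bbox.

    Handles 2D [west, south, east, north] and 3D [west, south, zmin, east, north, zmax].
    """
    west, east = bbox[0], bbox[len(bbox) // 2]
    return west > east


ring = [(179.0, 0.0), (-179.0, 0.0), (-179.0, 1.0), (179.0, 1.0), (179.0, 0.0)]
print(crosses_antimeridian(ring))                            # → True
print(bbox_crosses_antimeridian([179.0, 0.0, -179.0, 1.0]))  # → True
```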

### Antimeridian example fix
The following is a quick fix for the first polygon in the GeoParquet table. It is needed only for visualization: the antimeridian crossing in this example data is not a sign of corruption introduced by the format conversion.

In [13]:
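import antimeridian
from shapely import wkb

# The geometry column holds WKB bytes; decode the first granule's footprint
granule_shape_0 = wkb.loads(pq_table["geometry"][0].as_py())
# Fix the antimeridian crossing (the antimeridian package splits the shape at ±180°)
granule_shape_0_fixed = antimeridian.fix_shape(granule_shape_0)
lonboard.viz(granule_shape_0_fixed)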



(Output: interactive Lonboard map)
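`antimeridian.fix_shape` splits the footprint at ±180°. A different, simpler trick that is sometimes enough for plotting is to "unwrap" the longitudes so the ring stays continuous, shifting vertices by whole multiples of 360°. The sketch below is that alternative technique, not what the `antimeridian` package does. Its output can fall outside [-180, 180], which some tools reject, and a ring that encloses a pole will not close after unwrapping. The helper `unwrap_ring` is hypothetical.

```python
def unwrap_ring(ring):
    """Shift longitudes by multiples of 360 so consecutive vertices never jump more than 180 degrees.

    `ring` is a sequence of (lon, lat[, z]) tuples; extra coordinates are kept as-is.
    """
    if not ring:
        return []
    out = [tuple(ring[0])]
    for point in ring[1:]:
        lon = point[0]
        prev_lon = out[-1][0]
        # Move by whole turns until lon is within 180 degrees of the previous vertex
        while lon - prev_lon > 180:
            lon -= 360
        while lon - prev_lon < -180:
            lon += 360
        out.append((lon, *point[1:]))
    return out


ring = [(179.0, 0.0), (-179.0, 0.0), (-179.0, 1.0), (179.0, 1.0), (179.0, 0.0)]
print(unwrap_ring(ring))
# → [(179.0, 0.0), (181.0, 0.0), (181.0, 1.0), (179.0, 1.0), (179.0, 0.0)]
```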