Add GeoArrow encoding as an option to the specification #189

Merged (13 commits, Mar 25, 2024)
15 changes: 15 additions & 0 deletions examples/example_metadata_geoarrow.json
@@ -0,0 +1,15 @@
```json
{
  "geo": {
    "columns": {
      "geometry": {
        "encoding": "geoarrow",
        "geoarrow_type": "geoarrow.point",
        "geometry_types": [
          "Point"
        ]
      }
    },
    "primary_column": "geometry",
    "version": "1.1.0-dev"
  }
}
```
2 changes: 1 addition & 1 deletion format-specs/compatible-parquet.md
@@ -12,7 +12,7 @@

* The geometry column should be named either `"geometry"` or `"geography"`.

- * The geometry column should be a `BYTE_ARRAY` with Well Known Binary (WKB) used to define the geometries, as defined in the [encoding](./geoparquet.md#encoding) section of the GeoParquet spec.
+ * The geometry column should be a `BYTE_ARRAY` with Well Known Binary (WKB) used to define the geometries, as defined in the [encoding](./geoparquet.md#encoding) section of the GeoParquet spec. Alternatively, the geometry column can be stored according to the Point, MultiPoint, MultiLineString, or MultiPolygon memory layouts with separated (struct) coordinates as specified in the [GeoArrow format](https://geoarrow.org/format).
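For illustration (a non-normative pyarrow sketch; the `x`/`y` field names follow the GeoArrow spec): with separated coordinates a point column is a plain struct of doubles, one child per dimension.

```python
import pyarrow as pa

xs = pa.array([0.0, 2.0], type=pa.float64())
ys = pa.array([100.0, 102.0], type=pa.float64())
# A GeoArrow point column with separated coordinates: struct<x, y>,
# so each dimension lands in its own Parquet column chunk.
points = pa.StructArray.from_arrays([xs, ys], names=["x", "y"])
table = pa.table({"geometry": points})
```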


Does this mean that a geometry column with mixed geometry types cannot be encoded as GeoArrow?

Collaborator:

Not until apache/parquet-format#44 is merged (if ever)

Collaborator Author:

It can (or will be once we sort out the details of geoarrow/geoarrow#43), although it's unclear exactly how we'd do that in Parquet or if it would be useful in Parquet. In any case, it would be a future addition!


Columns with mixed geometry values are quite common for most query engines with geospatial support. Most of the time geometry columns have the umbrella type "geometry" or "geography", and it is not practical to first resolve the subtypes of the geometries before writing out Parquet files. I look forward to a columnar encoding supporting mixed geometry types as well as geometry collections.

Collaborator:

Arrow and Parquet are two different specs. Arrow has a union type, which allows for mixed geometry types in a single column, while maintaining constant-time access to any coordinate. Parquet does not today have a union type, so it's impossible to write the Geometry and GeometryCollection arrays in geoarrow/geoarrow#43 natively to Parquet.

GeoArrow implementations are able to statically know whether they have singly-typed geometries in a column, in which case they can write one of the 6 primitive types. GeoArrow implementations will have to fall back to WKB-encoded geometries for mixed-type columns. I don't see how this is something we could realistically change, unless we essentially re-implement union handling in a struct, which would be a big ask for implementors.
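For context, a toy pyarrow sketch of the dense union layout being described (the children here are stand-ins; real GeoArrow unions would carry the geometry layouts of geoarrow/geoarrow#43):

```python
import pyarrow as pa

point = pa.struct([("x", pa.float64()), ("y", pa.float64())])
points = pa.array([{"x": 0.0, "y": 1.0}, {"x": 2.0, "y": 3.0}], type=point)
lines = pa.array(
    [[{"x": 0.0, "y": 0.0}, {"x": 1.0, "y": 1.0}]], type=pa.list_(point)
)
# Each slot stores (type id, offset into that child), giving
# constant-time access to any geometry; Parquet has no physical
# type this layout can map onto today.
type_ids = pa.array([0, 1, 0], type=pa.int8())
offsets = pa.array([0, 0, 1], type=pa.int32())
mixed = pa.UnionArray.from_dense(
    type_ids, offsets, [points, lines], field_names=["point", "linestring"]
)
```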

Collaborator Author (@paleolimbot, Jan 31, 2024):

All good points!

I think Kyle put the summary best:

> In any case, I'd argue that a large majority of use cases will be solved by these 6 primitive types, and we can come back to union types in the future.

In the context of this PR, that would mean that the column option `geoarrow_type` could in the future be set to `"geoarrow.mixed"`.

I don't think we anticipated that writing mixed geometries in geoarrow to Parquet would be the main use-case. If this is an important use, please chime in on geoarrow/geoarrow#43 with some details! We definitely don't want to represent mixed geometries in a way that prevents them being used.

> The only way to know it is that scanning every single record of a big dataset first (get all geometry types), then in the second round, write the table to GeoParquet

This is only true if there's no ability for a user to supply any information about the encoding. If there is, `write_geoparquet(..., encoding = geoarrow("geoarrow.point"))` should do it. Typically the user does know this information (even if the database does not).
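A hypothetical sketch of that writer-side flow (the function name and signature are invented for illustration, not a real library API):

```python
import pyarrow as pa

# Hypothetical writer: the user asserts the column's geometry type up
# front, so the writer validates while writing instead of scanning the
# whole dataset first to discover the types.
def write_geoparquet(table: pa.Table, path: str, encoding: str = "WKB") -> None:
    if encoding.startswith("geoarrow."):
        # validate each geometry against the asserted type; raise on
        # the first mismatch rather than pre-scanning all rows
        ...

points = pa.table({"geometry": pa.array([b"\x01\x01..."], type=pa.binary())})
write_geoparquet(points, "points.parquet", encoding="geoarrow.point")
```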

Collaborator:

This discussion is making me think that the GeoParquet spec should not be defined in terms of GeoArrow. Rather it should be defined as "native type" or "flat type" or similar. Then a sentence in prose can mention that it overlaps partially with the GeoArrow spec.

I'm also becoming convinced that the serialized form need not exactly overlap with GeoArrow. On the topic of mixed arrays specifically, as possibly the only one who has written an implementation of GeoArrow mixed arrays, I've become an even stronger proponent of using an Arrow union type for GeoArrow mixed arrays because of its ability to know geometry types statically. So I think the best way forward for GeoParquet (for a future PR) would be to discuss a "struct-union" approach for GeoParquet that is not the same in-memory representation as GeoArrow.

> all database engines that didn't use Arrow as the in-memory format have no way to know if a geometry column is a mixed type or not. In other words, they don't know if they can use this GeoArrow encoding to GeoParquet files

I think changing the nomenclature will also make it clearer to non-Arrow-based implementations that reading and writing this "native" encoding of GeoParquet is not dependent on using Arrow internally.

So my recommendation would be to take out most references to geoarrow from this PR. I.e., we don't want the metadata key called `geoarrow_type` if there's a possibility that the GeoParquet union type is not the same as the GeoArrow union type.

Collaborator Author:

I actually think the strength of this PR is the strong delegation to the GeoArrow specification: I don't think we should be developing two specifications, particularly since we have very little engagement on the GeoArrow spec already. We've spent quite a bit of time documenting the memory layouts for geoarrow in that specification and I don't think it would be productive to copy/paste those here and maintain them independently. I also don't think it would be productive to link to the GeoArrow specification for documentation of all the memory layouts but very pointedly not call it GeoArrow.

It may be that representing mixed geometry is not important in the context of GeoParquet (Maybe WKB is just as fast in the context of compression + IO + Parquet's list type? Have we checked?), or it may be that there is a common memory representation that makes sense for both specifications that will improve interoperability (although that would involve quite a lot of reimplementation on Kyle's end 😬 ).

I don't want us to lose track of the main point here, which is that this PR is mostly about enabling very efficient representations of single-type geometries, which are very commonly the types of files that you might want to put in a giant S3 bucket and scan efficiently.

Collaborator (@jorisvandenbossche, Feb 12, 2024):

> This discussion is making me think that the GeoParquet spec should not be defined in terms of GeoArrow. Rather it should be defined as "native type" or "flat type" or similar. Then a sentence in prose can mention that it overlaps partially with the GeoArrow spec.

We could go back to the question about which `"encoding"` values to allow, and instead of a generic `"geoarrow"` option (with an additional `"geoarrow_type"` key to then be more specific), have actual encoding options `"point"`, `"linestring"`, `"polygon"`, etc. (i.e., one of the options we initially discussed was also `"geoarrow.point"`, `"geoarrow.linestring"`, etc., but then just dropping the `"geoarrow."` prefix).

For the rest, it is mostly a question about how to document this: how to phrase this exactly in the specification, how strongly to tie it to geoarrow (or just reference as mostly similar), how much to duplicate the details of the memory layout, etc. But that's all "details" about the best way to document it, while keeping the actual specification (what ends up in the metadata in a file) agnostic to geoarrow.

I think I am somewhat convinced by @kylebarron's points on this, and like the idea of having the actual spec changes not use "geoarrow" (and then we can still debate how much to use the term in the explanation of it).

For example, as long as the specification would exactly overlap (or be a strict subset of geoarrow), we can still point to the geoarrow spec for the details to avoid too much duplication (and having to maintain two versions). And this is also easy to change then in the future if we would want to introduce differences.

> I also don't think it would be productive to link to the GeoArrow specification for documentation of all the memory layouts but very pointedly not call it GeoArrow.

On the other hand, for an implementation of GeoParquet in some library that has nothing to do with Arrow (doesn't use an Arrow implementation under the hood), the "geoarrow" name is also somewhat uninformative, when strictly looking at it from a GeoParquet point of view.

Collaborator Author:

Perhaps it's worth crafting a new PR that uses the language you all are hoping for with a draft implementation? I don't currently have the bandwidth to do that but am happy to review!


* All data is stored in longitude, latitude based on the WGS84 datum, as defined as the default in the [crs](./geoparquet.md#crs) section of the GeoParquet spec.

19 changes: 12 additions & 7 deletions format-specs/geoparquet.md
@@ -12,11 +12,7 @@

## Geometry columns

- Geometry columns MUST be stored using the `BYTE_ARRAY` parquet type. They MUST be encoded as [WKB](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry#Well-known_binary).
Collaborator:

Can we keep the information of the first sentence somewhere? (i.e. that for a WKB encoding, the geometry column MUST be stored using the `BYTE_ARRAY` parquet type)

(you kept the "Implementation note" just below that also mentions `BYTE_ARRAY`, but that is not as specific as the above)

Collaborator Author:

Done! We could also add some more in here about the Parquet physical description of how nesting works (but maybe in a future PR?)
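For reference, a quick way to see that physical nesting (a sketch using the same geoarrow.pyarrow tooling as the demos below; the exact schema a given writer produces may differ):

```python
import geoarrow.pyarrow as ga
import pyarrow as pa
from pyarrow import parquet

# Write a single "geoarrow.linestring" column (list of struct-of-xy)
# and print the physical Parquet schema it maps to.
array = ga.as_geoarrow(["LINESTRING (0 0, 1 1)"])
parquet.write_table(pa.table([array], ["geom"]), "test_linestring.parquet")
print(parquet.ParquetFile("test_linestring.parquet").schema)
```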


- Implementation note: when using the ecosystem of Arrow libraries, Parquet types such as `BYTE_ARRAY` might not be directly accessible. Instead, the corresponding Arrow data type can be `Arrow::Type::BINARY` (for arrays whose elements can be indexed through a 32-bit index) or `Arrow::Type::LARGE_BINARY` (64-bit index). It is recommended that GeoParquet readers are compatible with both data types, and writers preferably use `Arrow::Type::BINARY` (thus limiting to row groups with content smaller than 2 GB) for broader compatibility.

- See the [encoding](#encoding) section below for more details.
+ Geometry columns MUST be encoded as [WKB](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry#Well-known_binary) or [GeoArrow](https://geoarrow.org/). See the [encoding](#encoding) section below for more details.

### Nesting

@@ -51,13 +47,14 @@

| Field Name | Type | Description |
| -------------- | ------------ | ----------- |
- | encoding | string | **REQUIRED.** Name of the geometry encoding format. Currently only `"WKB"` is supported. |
+ | encoding | string | **REQUIRED.** Name of the geometry encoding format. Currently `"WKB"` and `"geoarrow"` are supported. |
| geometry_types | \[string] | **REQUIRED.** The geometry types of all geometries, or an empty array if they are not known. |
| crs | object\|null | [PROJJSON](https://proj.org/specifications/projjson.html) object representing the Coordinate Reference System (CRS) of the geometry. If the field is not provided, the default CRS is [OGC:CRS84](https://www.opengis.net/def/crs/OGC/1.3/CRS84), which means the data in this column must be stored in longitude, latitude based on the WGS84 datum. |
| orientation | string | Winding order of exterior ring of polygons. If present must be `"counterclockwise"`; interior rings are wound in opposite order. If absent, no assertions are made regarding the winding order. |
| edges | string | Name of the coordinate system for the edges. Must be one of `"planar"` or `"spherical"`. The default value is `"planar"`. |
| bbox | \[number] | Bounding Box of the geometries in the file, formatted according to [RFC 7946, section 5](https://tools.ietf.org/html/rfc7946#section-5). |
| epoch | number | Coordinate epoch in case of a dynamic CRS, expressed as a decimal year. |
+ | geoarrow_type | string | The [GeoArrow extension name](https://geoarrow.org/extension-types#extension-names) corresponding to the column's memory layout. This is required when `encoding` is `"geoarrow"` and must be omitted otherwise. |

#### crs

@@ -83,10 +80,18 @@

#### encoding

- This is the binary format that the geometry is encoded in. The string `"WKB"`, signifying Well Known Binary is the only current option, but future versions of the spec may support alternative encodings. This SHOULD be the ["OpenGIS® Implementation Specification for Geographic information - Simple feature access - Part 1: Common architecture"](https://portal.ogc.org/files/?artifact_id=18241) WKB representation (using codes for 3D geometry types in the \[1001,1007\] range). This encoding is also consistent with the one defined in the ["ISO/IEC 13249-3:2016 (Information technology - Database languages - SQL multimedia and application packages - Part 3: Spatial)"](https://www.iso.org/standard/60343.html) standard.
+ This is the memory layout used to encode geometries in the geometry column.

+ The preferred option for maximum portability is `"WKB"`, signifying Well Known Binary. This SHOULD be the ["OpenGIS® Implementation Specification for Geographic information - Simple feature access - Part 1: Common architecture"](https://portal.ogc.org/files/?artifact_id=18241) WKB representation (using codes for 3D geometry types in the \[1001,1007\] range). This encoding is also consistent with the one defined in the ["ISO/IEC 13249-3:2016 (Information technology - Database languages - SQL multimedia and application packages - Part 3: Spatial)"](https://www.iso.org/standard/60343.html) standard.

+ Note that the current version of the spec only allows for a subset of WKB: 2D or 3D geometries of the standard geometry types (the Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection geometry types). This means that M values or non-linear geometry types are not yet supported.
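For example, a minimal sketch of this layout for a 2D point (illustrative only, not spec text):

```python
import struct

# WKB for POINT (1 2), little endian: 1-byte order flag (1 = little
# endian), uint32 geometry type code (1 = Point), then x and y doubles.
wkb = struct.pack("<BIdd", 1, 1, 1.0, 2.0)
print(wkb.hex())
#> 0101000000000000000000f03f0000000000000040
```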

+ Using the `"geoarrow"` encoding may provide better performance and enable readers to leverage more features of the Parquet format to accelerate geospatial queries (e.g., row group-level min/max statistics). When `encoding` is set to `"geoarrow"`, the column metadata must also specify `geoarrow_type` according to the [GeoArrow metadata specification for extension names](https://geoarrow.org/extension-types#extension-names) to signify the memory layout used by the geometry column.

+ Note that the current version of the spec only allows for a subset of GeoArrow: separated (struct) coordinates are required, only 2D or 3D geometries are permitted, and the supported extension names are currently `"geoarrow.point"`, `"geoarrow.linestring"`, `"geoarrow.polygon"`, `"geoarrow.multipoint"`, `"geoarrow.multilinestring"`, and `"geoarrow.multipolygon"`. This means that M values and serialized encodings are not yet supported.
Collaborator:

Oh I see, this doc doesn't even allow interleaved coordinates as GeoParquet.

I'm sensitive to the complexity concerns of having too many options in the spec, but I see this as favoring the "support cloud-native remote queries" use case over the "efficient file format, but reading and writing whole tables" use case. It "feels" like there's still a strong pull in general towards storing interleaved coordinates across the geo ecosystem.

That said, the memory copy to and from separated coordinates is pretty fast, so I can tolerate this.

Collaborator Author:

The summary of why interleaved coordinates are not a good candidate as of this writing:

- They don't give useful column statistics
- Current tools are slow to read them compared to separated encodings (important for points)
- NULL values randomly error in some cases

Demo of column statistics:

```python
import geoarrow.pyarrow as ga
import pyarrow as pa
from pyarrow import parquet

array_interleaved = ga.as_geoarrow(
    ["POINT (0 100)", "POINT (2 102)"], coord_type=ga.CoordType.INTERLEAVED
)
tbl_interleaved = pa.table([array_interleaved], ["geom"])

parquet.write_table(tbl_interleaved, "test_interleaved.parquet")

f = parquet.ParquetFile("test_interleaved.parquet")
f.metadata.row_group(0).column(0).statistics
#> <pyarrow._parquet.Statistics object at 0x1223e7600>
#>   has_min_max: True
#>   min: -0.0
#>   max: 102.0
#>   null_count: 0
#>   distinct_count: None
#>   num_values: 4
#>   physical_type: DOUBLE
#>   logical_type: None
#>   converted_type (legacy): NONE
```
Demo of slowness:

```python
import geoarrow.pyarrow as ga
import numpy as np
import pyarrow as pa
from pyarrow import parquet

n = int(1e6)
array = ga.point().from_geobuffers(None, np.random.random(n), np.random.random(n))
array_interleaved = ga.as_geoarrow(array, coord_type=ga.CoordType.INTERLEAVED)
tbl = pa.table([array], ["geom"])
tbl_interleaved = pa.table([array_interleaved], ["geom"])

parquet.write_table(tbl, "test.parquet")
parquet.write_table(tbl_interleaved, "test_interleaved.parquet")

%timeit parquet.read_table("test.parquet")
#> 7.36 ms ± 2.21 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit parquet.read_table("test_interleaved.parquet")
#> 15.8 ms ± 49.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

Demo of random errors for NULL values:

```python
import geoarrow.pyarrow as ga
import pyarrow as pa
from pyarrow import parquet

array_interleaved = ga.as_geoarrow(
    ["POINT (0 1)", None], coord_type=ga.CoordType.INTERLEAVED
)
tbl_interleaved = pa.table([array_interleaved], ["geom"])

parquet.write_table(tbl_interleaved, "test_interleaved.parquet")
parquet.read_table("test_interleaved.parquet")
#> ArrowInvalid: Expected all lists to be of size=2 but index 2 had size=0
```

These are probably all solvable (and might be unique to Arrow C++-backed implementations), but I am not sure it is the best encoding to start with (and it does seem like a good idea to start with just one encoding to minimize the burden on implementors).


+ Implementation note: when using WKB encoding with the ecosystem of Arrow libraries, Parquet types such as `BYTE_ARRAY` might not be directly accessible. Instead, the corresponding Arrow data type can be `Arrow::Type::BINARY` (for arrays whose elements can be indexed through a 32-bit index) or `Arrow::Type::LARGE_BINARY` (64-bit index). It is recommended that GeoParquet readers are compatible with both data types, and writers preferably use `Arrow::Type::BINARY` (thus limiting to row groups with content smaller than 2 GB) for broader compatibility.
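A sketch of that reader-side recommendation in pyarrow (assuming a reader that normalizes to 32-bit offsets when the data allows):

```python
import pyarrow as pa

def normalize_wkb(col: pa.ChunkedArray) -> pa.ChunkedArray:
    # Accept both layouts; downcast LARGE_BINARY (64-bit offsets) to
    # BINARY (32-bit offsets). The cast raises if a chunk exceeds 2 GB.
    if col.type == pa.large_binary():
        return col.cast(pa.binary())
    return col
```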

#### Coordinate axis order

The axis order of the coordinates in WKB stored in a GeoParquet follows the de facto standard for axis order in WKB and is therefore always (x, y) where x is easting or longitude and y is northing or latitude. This ordering explicitly overrides the axis order as specified in the CRS. This follows the precedent of [GeoPackage](https://geopackage.org), see the [note in their spec](https://www.geopackage.org/spec130/#gpb_spec).
6 changes: 5 additions & 1 deletion format-specs/schema.json
@@ -23,7 +23,7 @@
"properties": {
"encoding": {
"type": "string",
"const": "WKB"
"pattern": "^(WKB|geoarrow)$"
},
"geometry_types": {
"type": "array",
@@ -71,6 +71,10 @@
},
"epoch": {
"type": "number"
},
"geoarrow_type": {
"type": "string",
"pattern": "^geoarrow\\.(point|linestring|polygon|multipoint|multilinestring|multipolygon)$"
Collaborator:

I'm not a JSON Schema expert, but would we be able to make this conditionally required? It looks like `dependentRequired` meets what we need, though I don't know what version of JSON Schema we're pinned to.
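One caveat: `dependentRequired` (draft 2019-09) triggers on the *presence* of a key, and `encoding` is always required here, so a value-conditional `if`/`then` (draft-07 and later) may be the closer fit. A sketch, assuming the draft in use supports it:

```json
{
  "if": {
    "properties": { "encoding": { "const": "geoarrow" } },
    "required": ["encoding"]
  },
  "then": { "required": ["geoarrow_type"] }
}
```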

}
}
}