Support pluggable authentication providers for REST Namespace (e.g. AWS SigV4) #6583

@shiwk

(Note: Migrated from lance-format/lance-namespace#327 for better visibility.)

Motivation

Cloud-managed catalog services are becoming the norm. AWS S3 Tables provides an Iceberg-compatible catalog directly from the storage layer. GCP BigLake offers similar capabilities. These services use their native IAM authentication (e.g. AWS SigV4) rather than OAuth2 or static Bearer tokens.

For the Lance REST Namespace to integrate with these services — or any catalog service fronted by cloud API gateways with IAM auth — it needs to support request-signing authentication at the protocol level. This is essentially an enrichment of the Credential Vending mechanism in Lance Namespace: beyond vending storage credentials for data access, the namespace itself needs flexible authentication when talking to catalog endpoints.

This was discussed in lance-format/lance-namespace#10 where the conclusion was that REST only needs OAuth2 since native implementations handle other auth flows. I believe this deserves reconsideration given how the ecosystem has evolved — expecting every cloud catalog deployment to set up an OAuth2 token exchange layer creates unnecessary friction.

Current State

The OpenAPI spec (docs/src/rest.yaml) defines three securitySchemes: OAuth2 (client credentials), Bearer Token, and API Key. No scheme exists for request-signing mechanisms. The existing client implementations (Rust, Java, Python) all mirror this limitation — none of them support request-level signing.

While native implementations (lance-namespace-impls) can work around this by talking to cloud services directly (e.g. the Glue implementation uses AWS SDK with IAM credentials), relying on native implementations as the sole solution for cloud auth has several drawbacks:

  • Misplaced responsibility: Catalog service providers should focus on catalog semantics, not on reimplementing the full Lance Namespace interface. If a cloud catalog already speaks the REST protocol, requiring a dedicated native implementation just for auth is unnecessary overhead.
  • Community fragmentation: As more cloud vendors offer catalog services, each requiring its own native implementation, the ecosystem risks becoming fragmented with many parallel implementations that are hard to maintain and keep consistent.
  • Undermines the value of REST as a standard: The REST Namespace is designed to be a universal interop layer. If auth limitations push users toward vendor-specific native implementations even when the catalog already speaks REST, it effectively encourages the community to bypass the REST standard.
  • Duplicated effort: Each native implementation must reimplement protocol handling, error mapping, pagination, and other concerns that the REST client already handles well. The only missing piece is authentication.

Proposal

Introduce a pluggable, registry-based authentication provider mechanism for the REST Namespace. Rather than hardcoding specific auth schemes, allow users to register an auth provider that can intercept and mutate outgoing HTTP requests.

Key design points:

  • Pluggable architecture: Define an auth provider interface (e.g. a RequestSigner or AuthProvider trait/interface) that has access to the full HTTP request and can sign or modify it before execution. Users and cloud vendors can implement this interface for their specific auth mechanism.
  • Registry-based: Auth providers should be registerable by name (e.g. sigv4, gcp-iam), similar to how namespace implementations are registered via ConnectBuilder. Configuration would drive which provider is activated and with what parameters.
  • Spec-level extensibility: The OpenAPI spec should acknowledge that the REST Namespace security model is extensible beyond the three built-in schemes, potentially via a convention for custom security schemes.
  • Built-in providers for major clouds: Ship first-party providers for widely-used mechanisms (starting with AWS SigV4) behind feature flags, so common use cases work out of the box.
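To make the design points above concrete, here is a minimal Python sketch of what a provider interface plus a name-based registry might look like. Everything in it is an assumption for illustration: `HttpRequest`, `AuthProvider`, `register_auth_provider`, `create_auth_provider`, and the `auth.type` / `auth.token` property keys are hypothetical names, not part of any existing Lance Namespace client.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Protocol


@dataclass
class HttpRequest:
    """Hypothetical stand-in for whatever request type the REST client uses."""
    method: str
    url: str
    headers: Dict[str, str] = field(default_factory=dict)
    body: bytes = b""


class AuthProvider(Protocol):
    """Hypothetical interface: sees the full request and may sign or mutate it."""

    def authenticate(self, request: HttpRequest) -> HttpRequest: ...


# Registry: provider name -> factory that builds a provider from config properties.
_PROVIDERS: Dict[str, Callable[[Dict[str, str]], AuthProvider]] = {}


def register_auth_provider(
    name: str, factory: Callable[[Dict[str, str]], AuthProvider]
) -> None:
    _PROVIDERS[name] = factory


def create_auth_provider(properties: Dict[str, str]) -> AuthProvider:
    # "auth.type" is an assumed configuration key, chosen for this sketch only.
    name = properties.get("auth.type", "")
    if name not in _PROVIDERS:
        raise ValueError(f"unknown auth provider: {name!r}")
    return _PROVIDERS[name](properties)


class BearerTokenProvider:
    """An existing static scheme expressed through the same interface."""

    def __init__(self, properties: Dict[str, str]) -> None:
        self._token = properties["auth.token"]  # assumed key

    def authenticate(self, request: HttpRequest) -> HttpRequest:
        request.headers["Authorization"] = f"Bearer {self._token}"
        return request


register_auth_provider("bearer", BearerTokenProvider)
```

With this shape, a `sigv4` or `gcp-iam` provider would be registered the same way and selected purely by configuration, e.g. `create_auth_provider({"auth.type": "bearer", "auth.token": "..."})`. The important property is that `authenticate` receives the whole request (method, URL, headers, body), which is what request signing needs and what a static header cannot express.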

Use Cases

  • AWS S3 Tables: S3 Tables exposes an Iceberg REST catalog interface authenticated via SigV4. Although it does not yet support the Lance format, it represents the direction cloud-managed catalogs are heading. A Lance REST Namespace client with SigV4 support could connect to services like this directly without an intermediate auth proxy.
  • AWS API Gateway + IAM Auth: Catalog services deployed behind API Gateway using IAM authorization require SigV4-signed requests from clients.
  • GCP BigLake / Managed Catalogs: GCP-managed catalog services that rely on Google IAM and signed requests.
  • General cloud-native deployments: Any environment where IAM-based auth is already the standard and adding OAuth2 infrastructure is unnecessary overhead.
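For the SigV4 use cases above, the signing step that a built-in provider would have to perform can be sketched with the Python standard library alone. This is a simplified illustration, not a proposed implementation: the function name `sigv4_headers` and its parameters are my own, it signs only the `host` and `x-amz-date` headers, it assumes a simple URL path, and it omits temporary credentials (the `X-Amz-Security-Token` header). A real provider would more likely delegate to the AWS SDK's signer.

```python
import hashlib
import hmac
from datetime import datetime, timezone
from urllib.parse import parse_qsl, quote, urlsplit


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sigv4_headers(method, url, body, access_key, secret_key, region, service, now=None):
    """Return the headers to add to a request so that it is SigV4-signed."""
    now = now or datetime.now(timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")

    # 1. Canonical request: method, path, sorted query, headers, payload hash.
    parts = urlsplit(url)
    host = parts.netloc.lower()
    canonical_uri = quote(parts.path or "/", safe="/-_.~")
    encoded = [
        (quote(k, safe="-_.~"), quote(v, safe="-_.~"))
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
    ]
    canonical_query = "&".join(f"{k}={v}" for k, v in sorted(encoded))
    canonical_headers = f"host:{host}\nx-amz-date:{amz_date}\n"
    signed_headers = "host;x-amz-date"
    payload_hash = hashlib.sha256(body).hexdigest()
    canonical_request = "\n".join(
        [method, canonical_uri, canonical_query,
         canonical_headers, signed_headers, payload_hash]
    )

    # 2. String to sign, bound to a date/region/service credential scope.
    scope = f"{date_stamp}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join(
        ["AWS4-HMAC-SHA256", amz_date, scope,
         hashlib.sha256(canonical_request.encode("utf-8")).hexdigest()]
    )

    # 3. Derive the signing key by chaining HMACs, then sign.
    key = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    for part in (region, service, "aws4_request"):
        key = _hmac(key, part)
    signature = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

    return {
        "X-Amz-Date": amz_date,
        "Authorization": (
            f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
            f"SignedHeaders={signed_headers}, Signature={signature}"
        ),
    }
```

For a catalog behind API Gateway the service name would be `execute-api`. Because the signature covers the method, path, query string, and body hash, it has to be computed per request, which is why this cannot be modelled as a static Bearer token or API key.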


I'm happy to discuss this further and contribute to the design and implementation.
