Merged
114 changes: 100 additions & 14 deletions CONTRIBUTING.md
@@ -1,10 +1,14 @@
# Contributing to Lance Namespace

The Lance Namespace codebase is at [lance-format/lance-namespace](https://github.com/lance-format/lance-namespace).
This codebase contains code of the Lance Namespace specification
as well as generated clients and servers using OpenAPI generator.
This codebase contains:

This project should only be used to make spec changes to Lance Namespace,
- The Lance Namespace specification
- The core `LanceNamespace` interface and generic connect functionality for all languages except Rust
(for Rust, these are located in the [lance-format/lance](https://github.com/lance-format/lance) repo)
- Generated clients and servers using OpenAPI generator

This project should only be used to make spec and interface changes to Lance Namespace,
or to add new clients and servers to be generated based on community demand.
In general, we welcome more generated components to be added as long as
the contributor is willing to set up all the automations for generation and publication.
@@ -15,17 +19,94 @@ For contributing changes to implementations other than the directory and REST na
or for adding new namespace implementations,
please go to the [lance-namespace-impls](https://github.com/lance-format/lance-namespace-impls) repo.

## Project Dependency

This project contains the core Lance Namespace specification, interface and generated modules across all languages.
The dependency structure varies by language due to different build and distribution models.

### Rust

For Rust, the interface module `lance-namespace` and implementations (`lance-namespace-impls` for REST and directory namespaces)
are located in the core [lance-format/lance](https://github.com/lance-format/lance) repository.
This is because Rust uses source code builds, and separating modules across repositories makes dependency management complicated.

The dependency chain is: `lance-namespace` → `lance` → `lance-namespace-impls`

### Other Languages (e.g. Python, Java)

For Python, Java, and other languages, the core `LanceNamespace` interface and generic connect functionality
are maintained in **this repository** (e.g., `lance-namespace` for Python, `lance-namespace-core` for Java).
The core [lance-format/lance](https://github.com/lance-format/lance) repository then imports these modules.

This import direction exists because `lance-namespace-impls` (the REST and directory namespace implementations)
is used in the Lance Python and Java bindings and is exposed back through the corresponding language interfaces.
These language interfaces can also be imported dynamically, without depending on the Lance core library bindings in those languages.
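
To illustrate dynamic importing in general terms, here is a minimal Python sketch. It is not the connect implementation in this repository: the `load_class` helper and the `module:ClassName` path format are assumptions made for illustration. The point is only that an implementation can be resolved by name at runtime, so the interface package needs no static dependency on the Lance bindings.

```python
import importlib


def load_class(path: str) -> type:
    """Resolve a 'module:ClassName' string to a class at runtime (hypothetical helper)."""
    module_name, _, class_name = path.partition(":")
    if not module_name or not class_name:
        raise ValueError(f"expected 'module:ClassName', got {path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Stand-in target from the standard library so the sketch runs anywhere;
# a real caller would pass the path of a namespace implementation class.
ordered_dict_cls = load_class("collections:OrderedDict")
instance = ordered_dict_cls(root="/tmp/lance")
```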

### Other Implementations

Namespace implementations other than the directory and REST namespaces
are stored in the [lance-format/lance-namespace-impls](https://github.com/lance-format/lance-namespace-impls) repository,
with one implementation per language.

### Dependency Diagram

```mermaid
flowchart TB
subgraph this_repo["lance-namespace repo"]
spec["Spec & Generated Clients"]
py_core["Python: lance-namespace"]
java_core["Java: lance-namespace-core"]
end

subgraph lance_repo["lance repo"]
subgraph rust_modules["Rust Modules"]
rs_ns["lance-namespace"]
rs_lance["lance"]
rs_impls["lance-namespace-impls<br/>(dir, rest)"]
end
py_lance["Python: lance"]
java_lance["Java: lance"]
end

subgraph impls_repo["namespace-impls repo"]
polaris["Apache Polaris"] ~~~ hive["Apache Hive"] ~~~ iceberg_rest["Apache Iceberg REST"] ~~~ unity["Unity Catalog"] ~~~ glue["AWS Glue"]
end

%% Rust dependencies (source build)
rs_ns --> rs_lance
rs_lance --> rs_impls

%% Python/Java dependencies
py_core --> py_lance
java_core --> java_lance
rs_impls -.-> py_lance
rs_impls -.-> java_lance

%% Other implementations depend on core interfaces and lance bindings
py_core -.-> impls_repo
java_core -.-> impls_repo
py_lance -.-> impls_repo
java_lance -.-> impls_repo

style this_repo fill:#1565c0,color:#fff
style lance_repo fill:#e65100,color:#fff
style impls_repo fill:#7b1fa2,color:#fff
style rust_modules fill:#ff8a65,color:#000
```

## Repository structure

This repository currently contains the following components:

| Component | Language | Path | Description |
|----------------------|----------|----------------------------------------|------------------------------------------------------------|
| spec | | docs/src/spec | Lance Namespace Specification |
| Rust Reqwest Client | Rust | rust/lance-namespace-reqwest-client | Generated Rust reqwest client for Lance REST Namespace |
| Python UrlLib3 Client| Python | python/lance_namespace_urllib3_client | Generated Python urllib3 client for Lance REST Namespace |
| Java Apache Client | Java | java/lance-namespace-apache-client | Generated Java Apache HTTP client for Lance REST Namespace |
| Java Springboot Server| Java | java/lance-namespace-springboot-server | Generated Java SpringBoot server for Lance REST Namespace |
| Component | Language | Path | Description |
|-----------------------|----------|----------------------------------------|------------------------------------------------------------|
| Spec | | docs/src | Lance Namespace Specification |
| Python Core | Python | python/lance_namespace | Core LanceNamespace interface and connect functionality |
| Python UrlLib3 Client | Python | python/lance_namespace_urllib3_client | Generated Python urllib3 client for Lance REST Namespace |
| Java Core | Java | java/lance-namespace-core | Core LanceNamespace interface and connect functionality |
| Java Apache Client | Java | java/lance-namespace-apache-client | Generated Java Apache HTTP client for Lance REST Namespace |
| Java SpringBoot Server | Java | java/lance-namespace-springboot-server | Generated Java SpringBoot server for Lance REST Namespace |
| Rust Reqwest Client | Rust | rust/lance-namespace-reqwest-client | Generated Rust reqwest client for Lance REST Namespace |


## Install uv
@@ -74,15 +155,20 @@ Start the server with:
make serve-docs
```

### Generated Doc from OpenAPI Spec
### Generated Model Documentation

The OpenAPI spec at `docs/src/rest.yaml` is digested and generated as Markdown documents for better readability.
Generate the latest documents with:
The operation request and response model documentation is generated from the Java Apache Client.
When building or serving docs, the Java client must be generated first to produce the model Markdown files,
which are then copied to `docs/src/operations/models/`.

This happens automatically when running:

```shell
make gen-docs
make build-docs # or make serve-docs
```

These commands depend on `gen-java` to ensure the Java client docs are up-to-date before building the documentation.

### Understanding the Build Process

The contents of `lance-namespace/docs` exist so that contributors can easily edit and preview the documentation.
10 changes: 3 additions & 7 deletions Makefile
@@ -50,16 +50,12 @@ gen-java:
build-java:
	cd java; make build

.PHONY: sync gen-docs
gen-docs:
	cd docs; make gen

.PHONY: build-docs
build-docs:
build-docs: gen-java
	cd docs; make build

.PHONY: serve-docs
serve-docs:
serve-docs: gen-java
	cd docs; make serve

.PHONY: sync
@@ -70,7 +66,7 @@ sync:
clean: clean-rust clean-python clean-java

.PHONY: gen
gen: lint gen-docs gen-rust gen-python gen-java
gen: lint gen-rust gen-python gen-java

.PHONY: build
build: lint build-docs build-rust build-python build-java
22 changes: 17 additions & 5 deletions README.md
@@ -1,10 +1,22 @@
# Lance Namespace

**Lance Namespace** is an open specification on top of the storage-based Lance data format
to standardize access to a collection of Lance tables (a.k.a. Lance datasets).
It describes how a metadata service like Apache Hive MetaStore (HMS),
Apache Gravitino, Unity Catalog, etc. should store and use Lance tables,
as well as how ML/AI tools and analytics compute engines should integrate with Lance tables.
**Lance Namespace** is an open specification for describing access and operations against a collection of tables in a multimodal lakehouse.
The spec provides a unified model for table-related objects, their relationships within a hierarchy,
and the operations available on these objects — enabling integration with metadata services and compute engines alike.

The Lance Namespace spec consists of three main parts:

1. **Client-Side Standardized Access Spec**: A consistent abstraction that adapts to various catalog specifications
(e.g. Apache Gravitino, Apache Polaris, Unity Catalog, Apache Hive Metastore, Apache Iceberg REST Catalog),
allowing users to choose any catalog to store and use tables.

2. **Directory Namespace Spec**: A natively maintained storage-only catalog spec that is compliant with the
Lance Namespace client-side access spec. It requires no external metadata service — tables are organized directly
on storage (local filesystem, S3, GCS, etc.) with metadata stored alongside the data.

3. **REST Namespace Spec**: A natively maintained REST-based catalog spec that is compliant with the Lance
Namespace client-side access spec. It suits teams that want to implement their own custom handling,
such as data infrastructure teams in enterprise environments with high customization requirements.

For more details, please visit the [documentation website](https://lance.org/format/namespace).

34 changes: 29 additions & 5 deletions docs/Makefile
@@ -10,14 +10,38 @@
# See the License for the specific language governing permissions and
# limitations under the License.

.PHONY: gen
gen:
	cd src; uv run update_line_numbers.py
# Java model docs source and destination
JAVA_DOCS_SRC := ../java/lance-namespace-apache-client/docs
MODELS_DEST := src/operations/models

# API files to exclude (Java-specific, not data models)
API_FILES := DataApi.md IndexApi.md MetadataApi.md NamespaceApi.md TableApi.md TagApi.md TransactionApi.md

.PHONY: gen-models
gen-models:
	@echo "Copying model docs from Java generated docs..."
	@rm -rf $(MODELS_DEST)
	@mkdir -p $(MODELS_DEST)
	@for f in $(JAVA_DOCS_SRC)/*.md; do \
		filename=$$(basename "$$f"); \
		skip=false; \
		for api in $(API_FILES); do \
			if [ "$$filename" = "$$api" ]; then \
				skip=true; \
				break; \
			fi; \
		done; \
		if [ "$$skip" = "false" ]; then \
			cp "$$f" $(MODELS_DEST)/; \
		fi; \
	done
	@echo "title: Models" > $(MODELS_DEST)/.pages
	@echo "Model docs copied to $(MODELS_DEST)"

.PHONY: build
build: gen
build: gen-models
	uv run mkdocs build

.PHONY: serve
serve: gen
serve: gen-models
	uv run mkdocs serve
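
For readers less familiar with shell loops in Make, the `gen-models` recipe above is roughly equivalent to the following Python sketch. The function name `copy_model_docs` is hypothetical; the repository uses the Makefile recipe, not this script.

```python
import shutil
from pathlib import Path

# Java-specific API docs that are excluded because they are not data models.
API_FILES = {
    "DataApi.md", "IndexApi.md", "MetadataApi.md", "NamespaceApi.md",
    "TableApi.md", "TagApi.md", "TransactionApi.md",
}


def copy_model_docs(java_docs_src: Path, models_dest: Path) -> list[str]:
    """Recreate models_dest and copy every non-API Markdown file into it."""
    shutil.rmtree(models_dest, ignore_errors=True)  # like `rm -rf`
    models_dest.mkdir(parents=True)                 # like `mkdir -p` after the removal
    copied = []
    for md_file in sorted(java_docs_src.glob("*.md")):
        if md_file.name in API_FILES:
            continue
        shutil.copy(md_file, models_dest / md_file.name)
        copied.append(md_file.name)
    # Navigation title file, matching the recipe's `echo "title: Models"`.
    (models_dest / ".pages").write_text("title: Models\n")
    return copied
```
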
9 changes: 8 additions & 1 deletion docs/mkdocs.yml
@@ -38,7 +38,11 @@ theme:
markdown_extensions:
- admonition
- pymdownx.details
- pymdownx.superfences
- pymdownx.superfences:
    custom_fences:
      - name: mermaid
        class: mermaid
        format: !!python/name:pymdownx.superfences.fence_code_format
- pymdownx.highlight:
    anchor_linenums: true
    line_spans: __span
@@ -64,3 +68,6 @@ extra:
- icon: fontawesome/brands/discord
  link: https://discord.gg/lance

extra_javascript:
- https://unpkg.com/mermaid@10/dist/mermaid.min.js

3 changes: 2 additions & 1 deletion docs/src/.pages
@@ -1,4 +1,5 @@
nav:
- index.md
- operations
- impls
- Directory Namespace: dir
- REST Namespace: rest
3 changes: 3 additions & 0 deletions docs/src/dir/.pages
@@ -0,0 +1,3 @@
nav:
- Catalog Spec: catalog-spec.md
- Implementation Spec: impl-spec.md
10 changes: 5 additions & 5 deletions docs/src/impls/dir.md → docs/src/dir/catalog-spec.md
@@ -1,7 +1,7 @@
# Lance Directory Namespace
# Lance Directory Namespace Spec

**Lance directory namespace** is a Lance namespace implementation that stores tables in a directory structure
on any local or remote storage system. It supports two modes:
on any local or remote storage system. It has gone through 2 major spec versions:

- **V1 (Directory Listing)**: A lightweight, simple 1-level namespace that discovers tables by scanning the directory.
- **V2 (Manifest)**: A more advanced implementation backed by a manifest table (a Lance table) that supports nested namespaces and better performance at scale.
@@ -140,17 +140,17 @@ Please visit [Lance ObjectStore Configurations](https://lance.org/guide/object_s

### Compatibility Mode

`manifest_enabled` and `dir_listing_enabled` are used to control using V1 or V2 scheme.
`manifest_enabled` and `dir_listing_enabled` control whether the V1 spec, the V2 spec, or both are used.
By default, both V1 and V2 are enabled, which means:

1. When checking whether a table exists in the root namespace, it first checks the manifest, then checks whether the `<table_name>.lance` directory exists.
2. When listing tables in the root namespace, it merges tables from both the manifest and the directory listing, deduplicating by location and table name, with manifest tables taking precedence.
3. When creating tables in the root namespace, it registers them in the manifest and uses the V1 `<table_name>.lance` naming.
4. If a table in the root namespace is renamed, it starts to follow the V2 path definition.
5. For operations in child namespaces, only V2 scheme is used.
5. For operations in child namespaces, only V2 spec is used.
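
The root-namespace rules above can be sketched in Python as follows. This is an illustrative model only, not the actual implementation: the manifest is represented as a plain dictionary from table name to location, the storage root as a local directory, and the function names `table_exists` and `list_tables` are hypothetical. Only the existence check (rule 1) and listing (rule 2) are shown.

```python
from pathlib import Path


def table_exists(name, manifest, root, manifest_enabled=True, dir_listing_enabled=True):
    """Rule 1: check the manifest first, then fall back to `<name>.lance` on storage."""
    if manifest_enabled and name in manifest:
        return True
    return dir_listing_enabled and (Path(root) / f"{name}.lance").is_dir()


def list_tables(manifest, root, manifest_enabled=True, dir_listing_enabled=True):
    """Rule 2: merge both sources, deduplicating by name and location; manifest wins."""
    tables = dict(manifest) if manifest_enabled else {}
    if dir_listing_enabled:
        known_locations = set(tables.values())
        for path in sorted(Path(root).glob("*.lance")):
            name, location = path.stem, path.name
            # Skip anything the manifest already covers, by name or by location.
            if path.is_dir() and name not in tables and location not in known_locations:
                tables[name] = location
    return tables
```

With both flags left at their defaults, a table that exists only as a `<table_name>.lance` directory is still found and listed; setting `dir_listing_enabled=False` restricts both functions to the manifest.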

### Migration from V1 to V2

A migration should add all the V1 table directory paths to the manifest.
Once the user is certain there is no table following v1 scheme,
Once the user is certain that no table still follows the V1 spec,
`dir_listing_enabled` can be set to `false` to disable the compatibility mode.
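
A migration along these lines could look like the following sketch. It assumes, for illustration only, that the manifest can be modeled as a dictionary from table name to location and that V1 tables are `<table_name>.lance` directories directly under a local root. `migrate_v1_tables` is a hypothetical name, and a real migration would write to the manifest table instead of a dictionary.

```python
from pathlib import Path


def migrate_v1_tables(root, manifest):
    """Register every V1 `<table_name>.lance` directory that the manifest lacks."""
    added = []
    for path in sorted(Path(root).glob("*.lance")):
        if path.is_dir() and path.stem not in manifest:
            manifest[path.stem] = path.name  # record the V1 directory path as the location
            added.append(path.stem)
    return added
```

In this model, an empty result on a second run suggests that no unregistered V1 table remains, which is the point at which disabling `dir_listing_enabled` becomes safe.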