lancedb · prrao87 · Jun 29, 2026 · Jun 28, 2026 · Jun 29, 2026 · Jun 29, 2026
diff --git a/docs/docs.json b/docs/docs.json
@@ -63,6 +63,16 @@
               "performance"
             ]
           },
+          {
+            "group": "Model training",
+            "pages": [
+              "training/why-lancedb",
+              "training/index",
+              "training/torch",
+              "training/object-detection",
+              "training/vlm-finetuning"
+            ]
+          },
           {
             "group": "Guides",
             "pages": [
@@ -141,15 +151,6 @@
                   "storage/index",
                   "storage/configuration"
                 ]
-              },
-              {
-                "group": "Training",
-                "pages": [
-                  "training/index",
-                  "training/torch",
-                  "training/object-detection",
-                  "training/vlm-finetuning"
-                ]
               }
             ]
           },

diff --git a/docs/index.mdx b/docs/index.mdx
@@ -3,51 +3,85 @@ title: LanceDB
 sidebarTitle: "LanceDB"
 description: "Multimodal lakehouse for AI."
 icon: "/static/assets/logo/lancedb-icon-gray.svg"
-keywords: ["open source", "oss"]
+keywords: ["multimodal lakehouse", "training", "feature engineering", "search", "open source", "oss"]
 ---
 
-**LanceDB** is a [multimodal lakehouse](https://lancedb.com/blog/multimodal-lakehouse/) for
-AI, built on top of [Lance](/lance), an open-source lakehouse format. Below, we list a few
-ways LanceDB can help you build and scale your AI and ML workloads.
+**LanceDB** is a [multimodal lakehouse](https://lancedb.com/blog/multimodal-lakehouse/) for AI teams that need
+one data layer for curation, feature engineering, search and retrieval, and model training.
+It is built on top of [Lance](/lance), an open-source lakehouse format designed for multimodal AI data.
+
+Move from data exploration to model training on one unified platform, without managing a
+fragmented stack of storage, feature, retrieval, and training systems.
+
+## Build better models, faster
+
+Training data preparation and experimentation slow down when raw data, metadata, embeddings, features, and governance
+artifacts live in separate systems. LanceDB keeps them together in one versioned multimodal table, so AI teams spend less
+time stitching infrastructure together and more time improving datasets, testing features, and keeping GPUs fed.
+
+![Training data lifecycle: Curation, Feature Engineering, Search and Retrieval, Training](/static/assets/images/overview/training-data-lifecycle.svg)
+
+Use the same table to curate training data, add derived features, retrieve examples, and feed training jobs that rely on expensive GPUs.
+Training workloads can sample, shuffle, and scan projected columns from local storage or object storage, then assemble
+GPU-ready batches from a tagged dataset version.
+
+For a deeper look at how this works in training pipelines, start with [Why LanceDB for training](/training/why-lancedb).
+
+## LanceDB suite
+
+The LanceDB suite includes LanceDB OSS, an open-source embedded retrieval library, and LanceDB Enterprise,
+a multimodal lakehouse platform for the full AI data lifecycle.
+OSS is easy to set up on a local machine for search and regular-scale workflows. LanceDB Enterprise is built
+for teams that need scale without building bespoke infrastructure for curation,
+feature engineering, search and retrieval, and efficient training data access.
+
+![LanceDB suite: OSS search and Enterprise multimodal lakehouse on Lance format](/static/assets/images/overview/lancedb-suite.svg)
+
+## Why teams use LanceDB
 
 <Steps>
-  <Step title="High-performance random access and data management for model training">
-    Use LanceDB to curate, explore and distribute very large multimodal datasets for training and fine-tuning models.
-    LanceDB comes with built-in table versioning, schema evolution, and fast random access, making it far more efficient to do
-    dataset slicing, sampling, filtering and shuffles on large, rapidly evolving datasets.
+  <Step title="One table for the whole AI data loop">
+    Store images, video, audio, text, annotations, embeddings, and model-generated features together in one schema-enforced table.
+    The same table can support dataset curation, feature backfills, experiment splits, retrieval, and training.
+  </Step>
+  <Step title="High-throughput data access for training">
+    Training workloads mix fast random access with high-throughput sequential scans. LanceDB is designed for both, so
+    teams can shuffle data into GPU-ready batches more efficiently, improve input throughput, and iterate on experiments faster.
   </Step>
-  <Step title="Massively scalable, fast and high-quality retrieval − without breaking the bank">
-    Use LanceDB as the data + retrieval layer for production AI workloads: RAG, agents, semantic search,
-    recommendation systems, and more.
-    Keep multimodal data, metadata, and embeddings in the same table and query them via vector search,
-    full-text search or SQL. Easily add new features (columns in your tables) as your
-    application evolves, without copying existing data.
+  <Step title="Fast, versatile search and retrieval">
+    Whether the end user is a human or an agent, LanceDB powers production retrieval workloads such as semantic search,
+    hybrid search, RAG, agent memory, and recommendation systems. Retrieval runs against the same LanceDB tables used
+    for curation, feature engineering, and training workflows.
   </Step>
 </Steps>
 
-LanceDB is designed for a variety of workloads and deployment scenarios, and supports use cases
-that are way beyond traditional vector search. The LanceDB suite includes LanceDB OSS, an open-source embedded library,
-and LanceDB Enterprise, a distributed and managed multimodal lakehouse.
-Both are built on top of the same open-source Lance format and table abstractions.
-
-![](/static/assets/images/overview/lancedb-suite.png)
+## Start with your workload
 
-## Use cases
-
-- **Search**: Build high-performance search and retrieval applications using LanceDB's optimized storage, including vector search, full-text search, and hybrid search with secondary indexes.
-- **Data Curation**: Manage and filter on petabyte-scale multimodal datasets, including video and point cloud data, to gain insights, explore data and inform model development.
-- **Feature engineering**: Add new columns (features), create embeddings, and transform your data at
-scale. LanceDB lets you extend tables both vertically and horizontally with minimal I/O overhead.
-- **Training**: Efficiently access and manage large-scale multimodal datasets for training and fine-tuning AI models.
+<CardGroup cols={2}>
+  <Card title="Train and fine-tune models" icon="fire" href="/training/why-lancedb">
+    Learn why LanceDB works well as the data layer for training workloads.
+  </Card>
+  <Card title="Load data into PyTorch" icon="boxes-stacked" href="/training/">
+    Use LanceDB tables and permutations for projected, shuffled, random-access training reads.
+  </Card>
+  <Card title="Browse ready-to-use datasets" icon="database" href="/datasets">
+    Explore Lance-formatted multimodal datasets with raw bytes, metadata, embeddings, and indexes.
+  </Card>
+  <Card title="Build search and retrieval" icon="search" href="/search/">
+    Use vector search, full-text search, hybrid search, reranking, filtering, and SQL.
+  </Card>
+</CardGroup>
 
-## Choose how you run LanceDB
+## From local development to production scale
 
-Depending on your needs, you can choose one of the following ways to run LanceDB.
+LanceDB OSS and LanceDB Enterprise share the same Lance format and table model. Start locally with the embedded OSS
+library, then move to Enterprise when your team needs distributed scale, managed infrastructure, private deployment,
+or higher-throughput curation, feature engineering, search and retrieval, and training workflows.
 
 ### 1. LanceDB OSS
 The fastest way to get started is the open-source embedded library, with client SDKs in Python, TypeScript
-and Rust. Run it locally during development, then use the same data model and APIs as you scale up
-and need a managed solution. Start here:
+and Rust. Run it locally in a few steps to explore datasets, curate data, and run search and retrieval workloads
+for agents. Start here:
 
 <Columns cols={2}>
   <Card
@@ -59,19 +93,18 @@ and need a managed solution. Start here:
 </Card>
   <Card
     title="Basic Table Operations"
-    icon="search"
+    icon="table"
     href="/tables/"
   >
-    Create tables, search vectors, and modify data in LanceDB.
+    Create tables, evolve schemas, version data, and modify rows in LanceDB.
   </Card>
 </Columns>
 
 ### 2. LanceDB Enterprise
 
-[LanceDB Enterprise](/enterprise) is a distributed and managed **multimodal lakehouse** built for
-search, curation, feature engineering, and training-oriented data access workflows
-on top of the same core table abstraction. This eliminates the need for teams to build bespoke
-infrastructure to manage petabyte-scale multimodal datasets.
+[LanceDB Enterprise](/enterprise) is a distributed **multimodal lakehouse** platform that scales to petabytes and beyond. It is built for
+search, curation, feature engineering, and high-throughput training data access workflows on top of the same core table
+abstraction. This eliminates the need for teams to build bespoke infrastructure to manage large multimodal datasets.
 To set up LanceDB Enterprise in your organization, reach out to us at
 [contact@lancedb.com](mailto:contact@lancedb.com).
 
@@ -88,4 +121,4 @@ private deployments, and can operate under strict [security requirements](/enter
   href="/enterprise/quickstart"
 >
   Get started with LanceDB Enterprise in minutes.
-</Card>
+</Card>
diff --git a/docs/lance.mdx b/docs/lance.mdx
@@ -5,15 +5,15 @@ description: "Open-source lakehouse format for multimodal AI."
 icon: "/static/assets/logo/lance-logo-gray.svg"
 ---
 
-[Lance](https://lance.org/) is an open-source lakehouse format, which provides the
-foundation for LanceDB's capabilities. It provides a file format,
-table format, and catalog spec with multimodal data at the center of its design, allowing developers
+[Lance](https://lance.org/) is an open-source, columnar lakehouse format for multimodal AI.
+It provides a file format, table format, and lightweight catalog spec, allowing developers
 to build a complete open lakehouse on top of object storage.
 
-Building on top of open foundations and optimizing the format for AI workloads brings
-high-performance vector search, full-text search, random access, and feature engineering capabilities
-to a single unified system ([LanceDB](/enterprise)), eliminating the need for bespoke ETL and data pipelines that move data
-to multiple other specialized data systems.
+Because Lance builds on open foundations and is optimized for random access
+(without compromising scan performance), it enables
+high-performance vector search, full-text search, indexing, and feature engineering capabilities.
+[LanceDB](/enterprise) builds on these capabilities so teams can work with one multimodal data layer
+instead of moving data across separate storage, search, feature, and training systems.
 
 <Card
   title="Lance format documentation"
@@ -23,15 +23,17 @@ to multiple other specialized data systems.
   Visit the Lance format documentation to learn more about its design, features, and how it enables the multimodal lakehouse.
 </Card>
 
-## Advantages of the Lance format
+## Capabilities of the Lance format
 
-Advantage | Description
+Capability | What it enables
 --- | ---
-Multimodal storage | Efficiently holds vectors, images, videos, audio, text, and more
-Version control | Built-in data versioning for reproducible ML experiments and data lineage
-ML-optimized | Designed for training and inference workloads with fast random access
-Query performance | Columnar storage enables blazing-fast vector search and analytics
-Cloud-native | Seamless integration with cloud object stores (S3, GCS, Azure Blob)
+Multimodal storage | Store images, video, audio, text, embeddings, annotations, metadata, features, and more, all in one table.
+First-class blob API | Store large binary objects such as images, video, audio, and model artifacts in blob columns with lazy reads and streaming byte access.
+Fast random access and scans | Sample, shuffle, and retrieve individual rows efficiently without giving up high-throughput sequential reads.
+Flexible data evolution | Add, drop, rename, or alter columns as datasets change, often without rewriting existing data files.
+Versioned tables | Reproduce experiments, restore previous states, and tie downstream artifacts to the exact table version they used.
+Hybrid search and indexing | Combine vector search, full-text search, and scalar filters on the same dataset with Lance indexes.
+Open lakehouse interoperability | Build on object storage and connect Lance tables to open engines and frameworks such as PyTorch, Ray, Spark, Trino, DuckDB, and Polars.
 
 ## Key concepts
 

diff --git a/docs/static/assets/images/overview/lancedb-suite.svg b/docs/static/assets/images/overview/lancedb-suite.svg