ColBERT Live!

ColBERT Live! implements efficient ColBERT and ColPali search on top of vector indexes that support live updates (without rebuilding the entire index) as well as arbitrary predicates against other indexed fields.

Background

ColBERT (Contextualized Late Interaction over BERT) is a state-of-the-art semantic search model that combines the effectiveness of BERT-based language models with the performance required for practical, large-scale search applications.

Compared to traditional dense passage retrieval (i.e., one vector per passage), ColBERT is particularly strong at handling unusual terms and short queries.
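The "late interaction" in ColBERT's name refers to how relevance is scored: every query token keeps its own vector, and each one is matched against its best-fitting document token vector before the matches are summed (often called MaxSim). The following is a minimal, dependency-free sketch of that scoring rule on toy 2-d vectors. The function names and the toy data are illustrative assumptions and are not taken from the ColBERT Live! codebase.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Late-interaction score: for each query token vector, take the best
    dot product against any document token vector, then sum those maxima."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy "token embeddings" (real models emit one higher-dimensional vector per token).
query = [(1.0, 0.0), (0.0, 1.0)]
doc_a = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]  # has a strong match for each query token
doc_b = [(0.5, 0.5)]                          # a single "averaged" vector

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
print(score_a, score_b)
```

In this toy case `doc_a` outscores `doc_b` because each query token finds an exact match, which is the intuition behind ColBERT's strength on unusual terms.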

It is reasonable to think of ColBERT as combining the best of semantic vector search with traditional keyword search à la BM25, but without having to tune the weighting of hybrid search or deal with corner cases where the vector and keyword sides play poorly together.

However, the initial ColBERT implementation is designed around a custom index that cannot be updated incrementally, and can only be combined with other indexes with difficulty. Adding, modifying, or removing documents from the custom index requires reindexing the entire collection, which can be prohibitively slow for large datasets.

ColBERT Live!

ColBERT Live! implements ColBERT on any vector database. This means you can add, modify, or remove documents from your search system without costly reindexing of the entire collection, making it ideal for dynamic content environments. It also means that you can easily apply other predicates, such as access controls or metadata filters from your database, to your vector searches.

ColBERT Live! features:

- Efficient ColBERT search implementation
- Support for live updates to the vector index
- Abstraction layer for database backends, starting with AstraDB and SQLite
- State-of-the-art ColBERT techniques, including:
  - Answer.AI ColBERT model for higher relevance
  - Document embedding pooling for reduced storage requirements
  - Query embedding pooling for improved search performance
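To illustrate what embedding pooling means in general, here is a minimal sketch that merges near-duplicate token vectors into a single averaged vector, so fewer vectors need to be stored or searched. The greedy threshold clustering below is a deliberately simple stand-in chosen for readability. It is an assumption for illustration, not the pooling algorithm that ColBERT Live! or the Answer.AI work actually uses, and `pool_embeddings` and its `threshold` parameter are hypothetical names.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def mean(vectors: Sequence[Vector]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def pool_embeddings(vectors: Sequence[Vector], threshold: float = 0.9) -> List[List[float]]:
    """Greedy pooling (illustrative only): each vector joins the first cluster
    whose centroid is at least `threshold` cosine-similar to it, otherwise it
    starts a new cluster. Returns one unit-length centroid per cluster."""
    clusters: List[List[Vector]] = []
    for v in vectors:
        for members in clusters:
            if cosine(v, mean(members)) >= threshold:
                members.append(v)
                break
        else:
            clusters.append([v])
    return [normalize(mean(members)) for members in clusters]


# Three toy token vectors; the first two point in almost the same direction.
tokens = [(1.0, 0.0), (0.99, 0.1), (0.0, 1.0)]
pooled = pool_embeddings(tokens)
print(len(tokens), "->", len(pooled), "vectors")
```

The trade-off is between storage or query cost and fidelity: a lower threshold merges more aggressively and discards more token-level detail.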

Installation

You can install ColBERT Live! using pip:

```
pip install colbert-live
```

Usage

Subclass your database backend and implement the required methods for retrieving embeddings:

```
from colbert_live.db.astra import AstraCQL
# or
from colbert_live.db.sqlite import Sqlite3DB

class MyDB(AstraCQL):
    # implement the required methods for retrieving embeddings here
    ...

db = MyDB()
```

Instantiate a model:

```
import colbert_live.models

model = colbert_live.models.ColbertModel()
# or
model = colbert_live.models.ColpaliModel()
```

Initialize the ColbertLive instance:
```
colbert = ColbertLive(db, model)
```
Call search:
```
colbert.search(query_str, top_k)
```

Two cheat sheets are available:

Using ColBERT Live! with Astra: for humans; for LLMs
Implementing a new DB subclass: for humans; for LLMs

Supported databases

ColBERT Live! initially supports DataStax Astra and SQLite out of the box. Adding support for other databases is straightforward; check out the Astra implementation for an example to follow. If you're not concerned about making it reusable, you just have to implement the two methods of the base DB class.
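To give a feel for what a backend has to provide, here is a self-contained toy that keeps everything in a Python dict. It assumes the two required methods amount to (1) a nearest-neighbor lookup that returns candidate documents for a single query vector and (2) a fetch of all embeddings for those candidates. That split is an assumption made for illustration. The names `InMemoryDB`, `nearest_docs`, `embeddings_for`, and `search` are hypothetical and are not the library's base class or API, so consult the Astra implementation for the real method names and signatures.

```python
from typing import Dict, Iterable, List, Sequence, Set, Tuple

Vector = Tuple[float, ...]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


class InMemoryDB:
    """Hypothetical stand-in for a database backend (brute force, no real index)."""

    def __init__(self) -> None:
        self.docs: Dict[str, List[Vector]] = {}

    def upsert(self, doc_id: str, vectors: Sequence[Vector]) -> None:
        self.docs[doc_id] = list(vectors)  # add or replace without touching other docs

    def delete(self, doc_id: str) -> None:
        self.docs.pop(doc_id, None)

    def nearest_docs(self, query_vec: Vector, limit: int) -> List[str]:
        """Assumed method 1: ids of the documents that own the `limit` token
        vectors most similar to `query_vec`."""
        scored = sorted(
            ((dot(query_vec, v), doc_id)
             for doc_id, vecs in self.docs.items() for v in vecs),
            reverse=True,
        )
        ids: List[str] = []
        for _, doc_id in scored[:limit]:
            if doc_id not in ids:
                ids.append(doc_id)
        return ids

    def embeddings_for(self, doc_ids: Iterable[str]) -> Dict[str, List[Vector]]:
        """Assumed method 2: every stored token vector for each candidate document."""
        return {doc_id: self.docs[doc_id] for doc_id in doc_ids}


def search(db: InMemoryDB, query_vecs: Sequence[Vector], top_k: int,
           candidates_per_vec: int = 4) -> List[Tuple[str, float]]:
    """Two-stage search: gather candidates per query vector, then rank them by
    the sum over query vectors of the best dot product with any document vector."""
    candidates: Set[str] = set()
    for q in query_vecs:
        candidates.update(db.nearest_docs(q, candidates_per_vec))
    scored = [
        (sum(max(dot(q, d) for d in vecs) for q in query_vecs), doc_id)
        for doc_id, vecs in db.embeddings_for(candidates).items()
    ]
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:top_k]]


db = InMemoryDB()
db.upsert("a", [(1.0, 0.0), (0.0, 1.0)])
db.upsert("b", [(0.5, 0.5)])
query = [(1.0, 0.0), (0.0, 1.0)]

before = search(db, query, top_k=2)
db.delete("a")  # a live update: no rebuild step is needed
after = search(db, query, top_k=2)
print(before, after)
```

Because candidate lookup and embedding fetch are ordinary database queries in this design, a real backend could add filters such as access controls to those same queries.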

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.


jbellis/colbert-live
