Vexpresso

Vexpresso is a simple and scalable multi-modal vector database built with Daft

Querying Pokemon with images and text

Features

🍵 Simple: Vexpresso is lightweight and is very easy to get started!

🔌 Flexible: Unlike many other vector databases, Vexpresso supports arbitrary datatypes. This means that you can query muti-modal objects (images, audio, video, etc...)

🌐 Scalable: Because Vexpresso uses Daft, it can be scaled using Ray to multi-gpu / cpu clusters.

📚 Persistent: Easy Saving and Loading functionality: Vexpresso has easily accessible functions for saving / loading to huggingface datasets.

Installation

To install from PyPi:

pip install vexpresso

To install from source:

git clone git@github.com:shyamsn97/vexpresso.git
cd vexpresso
pip install -e .

Usage

🔥 Check out our Showcase notebook for a more detailed walkthrough!

In this simple example, we create a simple collection and embed using huggingface sentence transformers.

from typing import List, Any
import vexpresso
# import embedding functions from vexpresso
import vexpresso.embedding_functions as ef

# creating a collection object!
collection = vexpresso.create(
    data = {
        "documents":[
            "This is document1",
            "This is document2",
            "This is document3",
            "This is document4",
            "This is document5",
            "This is document6"
        ],
        "source":["notion", "google-docs", "google-docs", "notion", "google-docs", "google-docs"],
        "num_lines":[10, 20, 30, 40, 50, 60]
    }
    # backend="ray" # turn this flag on to start / connect to a ray cluster!
)

# create a simple embedding function from sentence_transformers
def hf_embed_fn(content: List[Any]):
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
    return model.encode(content, convert_to_tensor=True).detach().cpu().numpy()

# or use a langchain embedding function
def langchain_embed_fn(content: List[Any]):
    from langchain.embeddings import OpenAIEmbeddings
    embeddings_model = OpenAIEmbeddings()
    return embeddings_model.embed_documents(content)

# embed function creates a column in the collection with embeddings. There can be more than one embedding column!
# lazy execution until .execute is called
collection = collection.embed(
    "documents",
    embedding_fn=hf_embed_fn,
    to="document_embeddings",
    # lazy=False # if this is false, execute doesn't need to be called
).execute()

# creating a queried collection with a subset of content closest to query
queried_collection = collection.query(
    "document_embeddings",
    query="query document6",
    k = 4, # return 2 closest
    lazy=False
    # query_embedding=[query1, query2, ...]
    # filter_conditions={"metadata_field":{"operator, ex: 'eq'":"value"}} # optional metadata filter
)

# batch query -- return a list of collections
# batch_queried_collection = collection.batch_query(
#     "document_embeddings",
#     queries=["doc1", "doc2"],
#     k = 2
# )

# filter collection for documents with num_lines less than or equal to 30
filtered_collection = queried_collection.filter(
    {
        "num_lines": {"lte":30}
    }
).execute()

# show dataframe
filtered_collection.show()

# convert to dictionary
filtered_dict = filtered_collection.to_dict()
documents = filtered_dict["documents"]

# add an entry!
collection = collection.add(
    [
        {"documents":"new documents 1", "source":"notion", "num_lines":2},
        {"documents":"new documents 2", "source":"google-docs", "num_lines":40}
    ]
)
collection = collection.execute()

Resources

Contributing

Feel free to make a pull request or open an issue for a feature!

Name		Name	Last commit message	Last commit date
Latest commit History 99 Commits
.github/workflows		.github/workflows
docs		docs
examples		examples
tests		tests
vexpresso		vexpresso
.flake8		.flake8
.gitignore		.gitignore
.readthedocs.yaml		.readthedocs.yaml
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
mkdocs.yml		mkdocs.yml
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

.github/workflows

.github/workflows

docs

docs

examples

examples

tests

tests

vexpresso

vexpresso

.flake8

.flake8

.gitignore

.gitignore

.readthedocs.yaml

.readthedocs.yaml

LICENSE

LICENSE

Makefile

Makefile

README.md

README.md

mkdocs.yml

mkdocs.yml

pyproject.toml

pyproject.toml

Repository files navigation

Vexpresso

Features

Installation

Usage

Resources

Contributing

About

Releases 2

Packages

Languages

License

jesterlabs/vexpresso

Folders and files

Latest commit

History

Repository files navigation

Vexpresso

Features

Installation

Usage

Resources

Contributing

About

Topics

Resources

License

Stars

Watchers

Forks

Languages