# VLite

>[VLite](https://github.com/sdan/vlite) is a simple and fast vector database for storing and retrieving embeddings.

It supports:
- Exact and approximate nearest neighbor search
- Cosine distance
- Ingesting text, PDF, CSV, PPTX, and webpages
- Metadata filtering
- PDF OCR support for extracting text from scanned PDFs
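
To make "exact nearest neighbor search" and "cosine distance" concrete, here is a minimal pure-Python sketch. It is not VLite's implementation; the function names `cosine_distance` and `exact_nearest` and the toy 2-D vectors are assumptions made up for illustration. Exact search here simply means scoring every stored vector against the query and sorting.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def exact_nearest(query, vectors, k=1):
    """Brute-force search: score every stored vector, return the k closest indices."""
    ranked = sorted(
        range(len(vectors)),
        key=lambda i: cosine_distance(query, vectors[i]),
    )
    return ranked[:k]


# Toy 2-D "embeddings" (hypothetical values, not real model output).
stored = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
nearest = exact_nearest([0.9, 0.1], stored, k=2)
print(nearest)  # indices of the two closest stored vectors
```

Approximate search trades some of this accuracy for speed by not comparing the query against every stored vector.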

This notebook shows how to use the VLite vector database.

## Installation

Install VLite with pip:
```bash
pip install vlite
```

For PDF OCR support, install the `vlite[ocr]` extra:
```bash
pip install "vlite[ocr]"
```

In [None]:
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import VLite

In [None]:
loader = TextLoader("../../modules/state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

embeddings = HuggingFaceEmbeddings(model_name="mixedbread-ai/mxbai-embed-large-v1")
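
To illustrate what `chunk_size` and `chunk_overlap` control, the following simplified sketch slices a string into fixed-size character windows. It is only an approximation: `CharacterTextSplitter` splits on a separator and then merges the pieces up to `chunk_size`, so its chunk boundaries will generally differ. The helper name `chunk_text` and the placeholder text are hypothetical.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=0):
    """Split text into windows of at most chunk_size characters.

    Simplified stand-in: slices on raw character offsets and ignores
    separators, unlike LangChain's CharacterTextSplitter.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


sample = "abcdefghij" * 25  # 250 characters of placeholder text
chunks = chunk_text(sample, chunk_size=100, chunk_overlap=0)
print([len(c) for c in chunks])
```

With `chunk_overlap=0`, as in the cell above, consecutive chunks share no characters; a positive overlap repeats the tail of one chunk at the head of the next.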

## Similarity Search

In [None]:
db = VLite.from_documents(docs, embeddings)

In [None]:
query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query)

In [None]:
print(docs[0].page_content)

## Similarity Search with Score

In [None]:
docs = db.similarity_search_with_score(query)

In [None]:
docs[0]
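
LangChain vector stores conventionally return `similarity_search_with_score` results as a list of `(document, score)` pairs. The sketch below shows one way to unpack that shape. It uses a hypothetical `FakeDocument` stand-in and made-up scores so that it runs without VLite installed; whether a lower or higher score means "more similar" depends on the store and is not assumed here.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


# Made-up results in the (document, score) shape.
results = [
    (FakeDocument("First matching chunk", {"source": "a.txt"}), 0.12),
    (FakeDocument("Second matching chunk", {"source": "b.txt"}), 0.34),
]


def summarize(results, width=40):
    """Return one 'score  preview' line per (document, score) pair."""
    return [f"{score:.2f}  {doc.page_content[:width]}" for doc, score in results]


lines = summarize(results)
for line in lines:
    print(line)
```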

## Adding Documents

Documents can be added to an existing VLite instance.

In [None]:
# add_texts is the standard LangChain VectorStore method for adding raw strings
db.add_texts(["This is a new document"], metadatas=[{"source": "new_source"}])

In [None]:
docs = db.similarity_search("new document")
print(docs[0].page_content)

## Retrieving Documents

Documents can be retrieved from VLite based on their IDs or metadata.

In [None]:
db.get(where={"source": "new_source"})
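
Conceptually, a `where` filter like the one above keeps only the documents whose metadata matches every requested key. The sketch below shows that idea in plain Python. It assumes simple equality matching; VLite's actual matching rules may differ, and the helper names and sample records are hypothetical.

```python
def matches(metadata, where):
    """True when every key in `where` is present in metadata with an equal value."""
    return all(
        key in metadata and metadata[key] == value
        for key, value in where.items()
    )


def filter_by_metadata(records, where):
    """Return the records whose metadata satisfies the `where` filter."""
    return [record for record in records if matches(record["metadata"], where)]


# Made-up records for illustration.
records = [
    {"text": "This is a new document", "metadata": {"source": "new_source"}},
    {"text": "An older document", "metadata": {"source": "old_source"}},
    {"text": "No metadata at all", "metadata": {}},
]

hits = filter_by_metadata(records, {"source": "new_source"})
print([record["text"] for record in hits])
```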

## Updating Documents

Documents in VLite can be updated by their ID.

In [None]:
db.update("doc_id", text="Updated document text", metadata={"new_key": "new_value"})

## Deleting Documents

Documents can be deleted from VLite by their IDs.

In [None]:
db.delete("doc_id")

## Counting Documents

`count()` returns the number of documents currently stored in the VLite collection.

In [None]:
db.count()

## Saving and Loading

The VLite instance can be saved to disk and loaded later.

In [None]:
db.save()

loaded_db = VLite.load("path/to/saved/db")