# Vectorstore

This is an implementation of a LangChain vectorstore using `mariadb` as the backend.

Vector support requires MariaDB 11.7.1 or later.

You can run the following command to spin up a MariaDB container:

```shell
docker run --name mariadb-container \
  -e MARIADB_ROOT_PASSWORD=langchain \
  -e MARIADB_DATABASE=langchain \
  -e MARIADB_USER=langchain \
  -e MARIADB_PASSWORD=langchain \
  -p 3306:3306 -d mariadb:11.7-rc
```

## Status

This code provides a MariaDB vectorstore implementation with the following features:

* Uses MariaDB's native vector similarity search capabilities
* Supports both cosine and euclidean distance metrics
* Provides comprehensive metadata filtering
* Uses connection pooling for better performance
* Supports custom table and column configurations
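
To make the two distance metrics concrete, here is a minimal pure-Python sketch of how cosine and Euclidean distance are computed for a pair of vectors. It is for illustration only and is not the vectorstore's code: the function names `cosine_distance` and `euclidean_distance` are hypothetical, and in practice the distance is evaluated by MariaDB itself.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cos(angle between a and b); ignores vector magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the straight-line distance between a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Same direction, different length: cosine distance is 0, Euclidean is not.
print(cosine_distance([1.0, 0.0], [2.0, 0.0]))
print(euclidean_distance([1.0, 0.0], [2.0, 0.0]))
```

Cosine distance depends only on the direction of the vectors, while Euclidean distance also reflects their magnitude.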

There is currently **no mechanism** for migrating data when the schema changes, so any schema change requires you to recreate the tables and re-add the documents.

## Install dependencies

```shell
# Install MariaDB Connector/C, which the `mariadb` Python package builds against
# On Ubuntu
sudo apt install libmariadb3 libmariadb-dev
# On CentOS, RHEL, Rocky Linux
sudo yum install MariaDB-shared MariaDB-devel

# Install the Python packages
pip install --quiet -U langchain_openai mariadb
```

## Initialize the vectorstore

```python
from langchain_openai import OpenAIEmbeddings
from langchain_mariadb import MariaDBStore
from langchain_core.documents import Document
import mariadb

# Create a connection pool
pool = mariadb.ConnectionPool(
    pool_name="mypool",
    pool_size=3,
    host="localhost",
    port=3306,
    user="langchain",
    password="langchain",
    database="langchain",
)

# Create a new vector store
vectorstore = MariaDBStore(
    embeddings=OpenAIEmbeddings(),
    embedding_length=1536,  # dimension of the vectors produced by the embedding model
    pool=pool,
    collection_name="my_docs",
)
```

## Drop tables

If you need to drop the tables (for example, after switching to an embedding model with a different dimension, or to a different embedding provider):

```python
vectorstore.drop_tables()
```
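
Since the store is configured with a fixed `embedding_length`, it can help to check the model's actual output dimension before deciding whether the tables need to be dropped. The sketch below is an illustration under assumptions: `embedding_dimension` and `FakeEmbeddings` are hypothetical names, and the only API relied on is the `embed_query` method that LangChain embedding classes expose.

```python
from typing import List


class FakeEmbeddings:
    """Hypothetical stand-in for a real embedding model (e.g. OpenAIEmbeddings)."""

    def __init__(self, size: int) -> None:
        self.size = size

    def embed_query(self, text: str) -> List[float]:
        # Return a constant vector of the configured size.
        return [0.0] * self.size


def embedding_dimension(embeddings) -> int:
    """Embed a short probe string and return the length of the resulting vector."""
    return len(embeddings.embed_query("dimension probe"))


configured_length = 1536  # the embedding_length the store was created with
model = FakeEmbeddings(size=768)  # pretend the embedding provider changed

if embedding_dimension(model) != configured_length:
    print("Dimension changed: drop the tables and re-add the documents.")
```

With a real model, the probe costs one embedding call, so it is best run once at startup rather than on every request.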

## Add documents

Add documents to the vectorstore:

```python
docs = [
    Document(page_content='there are cats in the pond', metadata={"id": 1, "location": "pond", "topic": "animals"}),
    Document(page_content='ducks are also found in the pond', metadata={"id": 2, "location": "pond", "topic": "animals"}),
    Document(page_content='fresh apples are available at the market', metadata={"id": 3, "location": "market", "topic": "food"}),
    Document(page_content='the market also sells fresh oranges', metadata={"id": 4, "location": "market", "topic": "food"}),
    Document(page_content='the new art exhibit is fascinating', metadata={"id": 5, "location": "museum", "topic": "art"}),
]
vectorstore.add_documents(docs)
```

## Add from text

Embed raw texts and add them to the vectorstore:

```python
texts = [
    'a sculpture exhibit is also at the museum',
    'a new coffee shop opened on Main Street',
    'the book club meets at the library',
    'the library hosts a weekly story time for kids',
    'a cooking class for beginners is offered at the community center'
]

# Metadata is optional
metadatas = [
    {"id": 6, "location": "museum", "topic": "art"},
    {"id": 7, "location": "Main Street", "topic": "food"},
    {"id": 8, "location": "library", "topic": "reading"},
    {"id": 9, "location": "library", "topic": "reading"},
    {"id": 10, "location": "community center", "topic": "classes"}
]

vectorstore.add_texts(texts=texts, metadatas=metadatas)
```

## Similarity search

Search the vectorstore for documents similar to a query:

```python
# Search for similar texts
results = vectorstore.similarity_search("ducks", k=2)

# Search with a metadata filter
results = vectorstore.similarity_search(
    "ducks",
    filter={"location": "pond"}
)
```
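
`similarity_search` returns a list of LangChain `Document` objects, each carrying `page_content` and `metadata`. The sketch below shows one way to summarize such a list. `summarize` is a hypothetical helper, and `StubDocument` is a stand-in so the example runs without a database.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class StubDocument:
    """Stand-in exposing the two Document attributes used in this example."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def summarize(results: List[Any]) -> List[str]:
    """Render each result as '<id>: <text>' from its metadata and content."""
    return [f"{doc.metadata.get('id')}: {doc.page_content}" for doc in results]


# With a live store this list would come from vectorstore.similarity_search(...)
results = [
    StubDocument("ducks are also found in the pond", {"id": 2, "location": "pond"}),
    StubDocument("there are cats in the pond", {"id": 1, "location": "pond"}),
]
for line in summarize(results):
    print(line)
```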


## Filtering Support

The vectorstore supports a set of filters that can be applied against the metadata fields of the documents.

| Operator | Meaning                    |
|----------|----------------------------|
| `$eq`    | Equality (==)              |
| `$ne`    | Inequality (!=)            |
| `$lt`    | Less than (<)              |
| `$lte`   | Less than or equal (<=)    |
| `$gt`    | Greater than (>)           |
| `$gte`   | Greater than or equal (>=) |
| `$in`    | Membership (in)            |
| `$nin`   | Membership (not in)        |
| `$like`  | Text (like)                |
| `$nlike` | Text (not like)            |
| `$and`   | Logical (and)              |
| `$or`    | Logical (or)               |
| `$not`   | Logical (not)              |
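
To illustrate how these operators combine, here is a minimal in-memory sketch that evaluates a filter against a metadata dictionary. It is not the library's implementation, which runs the filtering in MariaDB; `matches`, `like`, and `FIELD_OPS` are hypothetical names. It assumes that `$not` wraps a single nested filter and that `$like` follows SQL `LIKE` wildcards with case-sensitive matching, both of which may differ from the real behavior.

```python
import re
from typing import Any, Dict


def like(value: Any, pattern: str) -> bool:
    """SQL-style LIKE: '%' matches any run of characters, '_' exactly one."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return isinstance(value, str) and re.fullmatch(regex, value, re.DOTALL) is not None


# Field-level operators: each takes (stored value, filter argument).
FIELD_OPS = {
    "$eq": lambda v, arg: v == arg,
    "$ne": lambda v, arg: v != arg,
    "$lt": lambda v, arg: v is not None and v < arg,
    "$lte": lambda v, arg: v is not None and v <= arg,
    "$gt": lambda v, arg: v is not None and v > arg,
    "$gte": lambda v, arg: v is not None and v >= arg,
    "$in": lambda v, arg: v in arg,
    "$nin": lambda v, arg: v not in arg,
    "$like": like,
    "$nlike": lambda v, arg: not like(v, arg),
}


def matches(metadata: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Return True if metadata satisfies every top-level entry of flt."""
    for key, cond in flt.items():
        if key == "$and":
            ok = all(matches(metadata, sub) for sub in cond)
        elif key == "$or":
            ok = any(matches(metadata, sub) for sub in cond)
        elif key == "$not":
            ok = not matches(metadata, cond)  # assumes a single nested filter
        elif isinstance(cond, dict):
            value = metadata.get(key)
            ok = all(FIELD_OPS[op](value, arg) for op, arg in cond.items())
        else:
            ok = metadata.get(key) == cond  # a bare value means equality
        if not ok:
            return False
    return True


rows = [
    {"id": 1, "location": "pond", "topic": "animals"},
    {"id": 3, "location": "market", "topic": "food"},
    {"id": 5, "location": "museum", "topic": "art"},
]
flt = {"$or": [{"topic": "animals"}, {"location": {"$like": "m%"}, "id": {"$gt": 3}}]}
print([row["id"] for row in rows if matches(row, flt)])  # prints [1, 5]
```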

```python
# Search with a simple filter
results = vectorstore.similarity_search('kitty', k=10, filter={
    'id': {'$in': [1, 5, 2, 9]}
})
```

```python
# Search with multiple conditions (implicit AND)
results = vectorstore.similarity_search('ducks', k=10, filter={
    'id': {'$in': [1, 5, 2, 9]},
    'location': {'$in': ["pond", "market"]}
})
```

```python
# Search with an explicit AND operator
results = vectorstore.similarity_search('ducks', k=10, filter={
    '$and': [
        {'id': {'$in': [1, 5, 2, 9]}},
        {'location': {'$in': ["pond", "market"]}},
    ]
})
```
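
The two filters above express the same condition: several keys in one dictionary are combined with AND. A minimal sketch of that equivalence, using a hypothetical helper `to_explicit_and` that is not part of the library, rewrites the implicit form into the explicit one:

```python
from typing import Any, Dict


def to_explicit_and(flt: Dict[str, Any]) -> Dict[str, Any]:
    """Rewrite a multi-key filter as an explicit '$and' list of single-key filters."""
    if len(flt) <= 1:
        return flt  # nothing to combine
    return {"$and": [{key: cond} for key, cond in flt.items()]}


implicit = {
    "id": {"$in": [1, 5, 2, 9]},
    "location": {"$in": ["pond", "market"]},
}
explicit = to_explicit_and(implicit)
print(explicit)
```

The explicit form is mainly useful when an AND group has to be nested inside `$or` or `$not`.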

```python
# Search with the inequality operator ($ne)
results = vectorstore.similarity_search('bird', k=10, filter={
    'location': {'$ne': 'pond'}
})
```

```python
# Search with a complex filter
results = vectorstore.similarity_search('animal', k=10, filter={
    '$or': [
        {'topic': 'animals'},
        {'$and': [
            {'location': {'$like': '%park%'}},
            {'topic': {'$ne': 'art'}}
        ]}
    ]
})
```