# TiDB Vector SDK V2

A powerful Python SDK for vector storage and retrieval operations with TiDB.

- 🔄 Automatic embedding generation
- 🔍 Vector similarity search
- 🎯 Advanced filtering capabilities
- 📦 Bulk operations support

## Installation

```
%pip install autoflow-ai==0.0.1.dev10
%pip install python-dotenv ipywidgets pymysql sqlmodel
```

### Configure environment variables

Create a TiDB cluster, either a free cluster on [tidbcloud.com](http://tidbcloud.com/) or a local one with [tiup playground](https://docs.pingcap.com/tidb/stable/tiup-playground/).

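Configuration is provided through environment variables, which you can also put in a `.env` file. You need `DATABASE_URL` (the TiDB connection string) and `OPENAI_API_KEY` (used by the OpenAI embedding model):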

```
# Create a .env file from the template (.env.example), then edit it with your own values, for example:
# $ cat .env
# DATABASE_URL='mysql+pymysql://root@localhost:4000/test'
# OPENAI_API_KEY='your_openai_api_key'
%cp .env.example .env
```
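
A `.env` file is just a list of `KEY=value` lines. For intuition, here is a simplified, hypothetical stand-in for what `python-dotenv`'s `load_dotenv()` does: skip blank lines and comments, strip matching quotes, and (as with `load_dotenv`'s default) leave variables that are already set untouched. The real library supports more syntax, such as `export` prefixes, escapes, and variable expansion, so use it rather than this sketch in practice.

```python
import os
from typing import Dict


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines (hypothetical helper, not python-dotenv)."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # Not a KEY=value line.
        key, value = key.strip(), value.strip()
        # Strip one pair of matching surrounding quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key] = value
    return values


def apply_env(values: Dict[str, str], override: bool = False) -> None:
    """Copy parsed values into os.environ, keeping existing ones unless override=True."""
    for key, value in values.items():
        if override or key not in os.environ:
            os.environ[key] = value


example = """
# Connection string for TiDB
DATABASE_URL='mysql+pymysql://root@localhost:4000/test'
OPENAI_API_KEY='your_openai_api_key'
"""
parsed = parse_env_text(example)
print(parsed["DATABASE_URL"])  # mysql+pymysql://root@localhost:4000/test
```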

## Quickstart

### Connect to TiDB

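```python
import os

from dotenv import load_dotenv
from autoflow.storage.tidb import TiDBClient

# Load DATABASE_URL and OPENAI_API_KEY from .env into the process environment.
load_dotenv()

# Format: mysql+pymysql://<username>:<password>@<host>:4000/<database>
db = TiDBClient.connect(os.getenv("DATABASE_URL"))
```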
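
The `DATABASE_URL` uses the SQLAlchemy-style `dialect+driver://user:password@host:port/database` format noted in the comment above. A quick stdlib-only sanity check can catch a malformed URL before calling `TiDBClient.connect`; `describe_database_url` is a hypothetical helper, not part of the SDK.

```python
from urllib.parse import urlsplit


def describe_database_url(url: str) -> dict:
    """Split a mysql+pymysql URL into its parts (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.scheme != "mysql+pymysql":
        raise ValueError(f"expected scheme 'mysql+pymysql', got {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("missing host")
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        # TiDB listens on port 4000 by default.
        "port": parts.port or 4000,
        "database": parts.path.lstrip("/") or None,
    }


print(describe_database_url("mysql+pymysql://root@localhost:4000/test"))
# {'user': 'root', 'password': None, 'host': 'localhost', 'port': 4000, 'database': 'test'}
```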

### Create table

```python
from typing import Optional, Any
from autoflow.storage.tidb.base import TiDBModel
from sqlmodel import Field
from autoflow.llms.embeddings import EmbeddingFunction

# Define your embedding model.
text_embed = EmbeddingFunction("openai/text-embedding-3-small")

class Chunk(TiDBModel, table=True):
    __tablename__ = "chunks"
    __table_args__ = {'extend_existing': True}

    id: int = Field(primary_key=True)
    text: str = Field()
    text_vec: Optional[Any] = text_embed.VectorField(source_field="text")   # 👈 Define the vector field.
    user_id: int = Field()

table = db.create_table(schema=Chunk)
```

### Insert Data

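🔢 Auto embedding: when you insert rows, the SDK automatically generates each vector field from its source field (here, `text_vec` from `text`), so you don't have to call the embedding model yourself.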

```python
# Insert a single row.
table.insert(Chunk(id=1, text="The quick brown fox jumps over the lazy dog", user_id=1))
# Insert several rows in one call.
table.bulk_insert([
    Chunk(id=2, text="A quick brown dog runs in the park", user_id=2),
    Chunk(id=3, text="The lazy fox sleeps under the tree", user_id=2),
    Chunk(id=4, text="A dog and a fox play in the park", user_id=3)
])
# Count the rows in the table.
table.rows()
```

```text
4
```
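
Conceptually, auto embedding means the vector field declared with `VectorField(source_field="text")` is filled from `text` before each row is written. The sketch below mimics that flow in memory with a hypothetical toy embedding function (a hashed bag-of-words) in place of `openai/text-embedding-3-small`; `ToyChunk`, `ToyTable`, and `toy_embed` are illustrative names, not SDK classes, and this is not how the SDK is implemented.

```python
import hashlib
import math
from dataclasses import dataclass
from typing import List, Optional


def toy_embed(text: str, dim: int = 8) -> List[float]:
    """Hypothetical stand-in for a real embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # md5 gives a stable bucket across runs (Python's hash() is salted per process).
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


@dataclass
class ToyChunk:
    id: int
    text: str
    user_id: int
    text_vec: Optional[List[float]] = None  # Derived from `text` on insert.


class ToyTable:
    """Hypothetical in-memory table that fills the vector field on insert."""

    def __init__(self) -> None:
        self.items: List[ToyChunk] = []

    def insert(self, chunk: ToyChunk) -> None:
        # Auto embedding: compute the vector from the source field when it is missing.
        if chunk.text_vec is None:
            chunk.text_vec = toy_embed(chunk.text)
        self.items.append(chunk)

    def bulk_insert(self, chunks: List[ToyChunk]) -> None:
        for chunk in chunks:
            self.insert(chunk)

    def rows(self) -> int:
        return len(self.items)


toy = ToyTable()
toy.insert(ToyChunk(id=1, text="The quick brown fox jumps over the lazy dog", user_id=1))
toy.bulk_insert([
    ToyChunk(id=2, text="A quick brown dog runs in the park", user_id=2),
    ToyChunk(id=3, text="The lazy fox sleeps under the tree", user_id=2),
])
print(toy.rows())  # 3
```

Because the vector is derived from `text`, callers never pass `text_vec` explicitly, which matches how the `Chunk` rows are constructed in the insert example.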

### Vector Search

```python
from autoflow.storage.tidb import DistanceMetric

chunks = (
    table.search("A quick fox in the park")         # 👈 The query text is embedded automatically.
        # .distance_metric(metric=DistanceMetric.COSINE)
        # .num_candidate(20)
        .filter({
            "user_id": 2
        })
        .limit(2)
        .to_pydantic()
)
[(c.text, c.score) for c in chunks]
```

```text
[('A quick brown dog runs in the park', 0.665493189763966),
 ('The lazy fox sleeps under the tree', 0.554631888866523)]
```
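
The chained call above combines a query embedding, a metadata filter, and a result limit. The sketch below shows that pipeline in plain Python over a few hand-made 3-dimensional vectors (hypothetical data, not real embeddings): keep rows that match the filter, rank them by cosine similarity to the query vector, and return the top `limit`. It illustrates the semantics only; TiDB computes distances in the database, and how the SDK converts a distance into `score` is not assumed here.

```python
import math
from typing import Any, Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def toy_search(
    rows: List[Dict[str, Any]], query_vec: Sequence[float], where: Dict[str, Any], limit: int
) -> List[Tuple[str, float]]:
    """Hypothetical helper: exact-match filter, rank by cosine similarity, then limit."""
    candidates = [r for r in rows if all(r.get(k) == v for k, v in where.items())]
    scored = [(r["text"], cosine_similarity(r["vec"], query_vec)) for r in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # Most similar first.
    return scored[:limit]


# Hypothetical 3-d vectors standing in for real embeddings.
rows = [
    {"text": "fox and dog", "user_id": 1, "vec": [1.0, 1.0, 0.0]},
    {"text": "dog in park", "user_id": 2, "vec": [0.0, 1.0, 1.0]},
    {"text": "fox sleeping", "user_id": 2, "vec": [1.0, 0.0, 0.0]},
    {"text": "cat indoors", "user_id": 2, "vec": [0.0, 0.0, -1.0]},
]
results = toy_search(rows, query_vec=[1.0, 0.5, 0.5], where={"user_id": 2}, limit=2)
for text, score in results:
    print(f"{text}: {score:.3f}")
# fox sleeping: 0.816
# dog in park: 0.577
```

This sketch applies the filter before ranking; whether the SDK filters before or after the vector search is not specified here.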

### Advanced Filtering

Filters (the dict passed to `.filter()` in a search, or to `table.query()`) support the following operators. A bare value such as `{"user_id": 1}` matches by equality:

| Operator | Description | Example |
|----------|-------------|---------|
| `$eq` | Equal to | `{"field": {"$eq": "hello"}}` |
| `$gt` | Greater than | `{"field": {"$gt": 1}}` |
| `$gte` | Greater than or equal | `{"field": {"$gte": 1}}` |
| `$lt` | Less than | `{"field": {"$lt": 1}}` |
| `$lte` | Less than or equal | `{"field": {"$lte": 1}}` |
| `$in` | In array | `{"field": {"$in": [1, 2, 3]}}` |
| `$nin` | Not in array | `{"field": {"$nin": [1, 2, 3]}}` |
| `$and` | Logical AND | `{"$and": [{"field1": 1}, {"field2": 2}]}` |
| `$or` | Logical OR | `{"$or": [{"field1": 1}, {"field2": 2}]}` |
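
To make the operator semantics concrete, here is a small in-memory evaluator for the same filter syntax. It is a hypothetical sketch, not the SDK's implementation (which runs filters in TiDB). It assumes, as the `$and`/`$or` examples suggest, that a bare value like `{"field1": 1}` means equality, and it additionally assumes that several keys at the same level are combined with AND.

```python
from typing import Any, Dict

# Comparison operators from the table, expressed as Python predicates.
OPERATORS = {
    "$eq": lambda actual, expected: actual == expected,
    "$gt": lambda actual, expected: actual > expected,
    "$gte": lambda actual, expected: actual >= expected,
    "$lt": lambda actual, expected: actual < expected,
    "$lte": lambda actual, expected: actual <= expected,
    "$in": lambda actual, expected: actual in expected,
    "$nin": lambda actual, expected: actual not in expected,
}


def matches(row: Dict[str, Any], query_filter: Dict[str, Any]) -> bool:
    """Return True if `row` satisfies `query_filter` (hypothetical evaluator)."""
    for key, cond in query_filter.items():
        if key == "$and":
            if not all(matches(row, sub) for sub in cond):
                return False
        elif key == "$or":
            if not any(matches(row, sub) for sub in cond):
                return False
        elif isinstance(cond, dict):
            # {"field": {"$op": value, ...}}: every operator must hold.
            if not all(OPERATORS[op](row.get(key), value) for op, value in cond.items()):
                return False
        elif row.get(key) != cond:
            # Bare value: shorthand for $eq.
            return False
    return True


rows = [
    {"id": 1, "user_id": 1},
    {"id": 2, "user_id": 2},
    {"id": 3, "user_id": 2},
    {"id": 4, "user_id": 3},
]
query_filter = {"$or": [{"user_id": 1}, {"user_id": {"$gte": 3}}]}
print([r["id"] for r in rows if matches(r, query_filter)])  # [1, 4]
```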


```python
# Fetch rows by metadata only (no vector search).
chunks = table.query({"user_id": 1})
[
    (c.id, c.text, c.user_id)
    for c in chunks
]
```

```text
[(1, 'The quick brown fox jumps over the lazy dog', 1)]
```

### Truncate table

Remove all rows from the table (the table itself is kept):

```python
table.truncate()
table.rows()
```

```text
0
```