# TiDB Python SDK V2

A powerful Python SDK for vector storage and retrieval operations with TiDB.

- 🔄 Automatic embedding generation
- 🔍 Vector similarity search
- 🎯 Advanced filtering capabilities
- 📦 Bulk operations support

## Installation

In [1]:
%pip install autoflow-ai==0.0.1.dev20

Note: you may need to restart the kernel to use updated packages.


## Configuration

- Create a free TiDB cluster on [tidbcloud.com](https://tidbcloud.com/), or start a local one with [tiup playground](https://docs.pingcap.com/tidb/stable/tiup-playground/).
- Go to the [OpenAI platform](https://platform.openai.com/api-keys) to create your API key.

You can provide the configuration through environment variables or a `.env` file:

In [None]:
# Create .env file, then edit your .env, for example:
# $ cat .env
# TIDB_HOST=localhost
# TIDB_PORT=4000
# TIDB_USERNAME=root
# TIDB_PASSWORD=
# TIDB_DATABASE=test
# OPENAI_API_KEY='your_openai_api_key'
#
# Or you can use DATABASE_URL to connect to TiDB, for example:
# $ cat .env
# DATABASE_URL=mysql+pymysql://root:@localhost:4000/test
#
# If you are using TiDB Serverless, the DATABASE_URL should be like:
# DATABASE_URL=mysql+pymysql://<USERNAME>:<PASSWORD>@<HOST>:4000/test?ssl_verify_cert=true&ssl_verify_identity=true
%cp .env.example .env

In [2]:
import dotenv

dotenv.load_dotenv()

True
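
If you keep the individual `TIDB_*` settings but a tool expects a single `DATABASE_URL`, the URL format shown above can be assembled from them. The sketch below uses only the standard library; `build_database_url` is a hypothetical helper (not part of the SDK), and the defaults are assumptions taken from the local `.env` example. Percent-encoding the credentials avoids breaking the URL when a password contains characters such as `@` or `/`.

```python
import os
from urllib.parse import quote_plus


def build_database_url(env=None):
    """Assemble a mysql+pymysql URL from TIDB_* settings (hypothetical helper)."""
    env = os.environ if env is None else env
    host = env.get("TIDB_HOST", "localhost")
    port = int(env.get("TIDB_PORT", "4000"))
    # Percent-encode credentials so special characters do not break the URL.
    username = quote_plus(env.get("TIDB_USERNAME", "root"))
    password = quote_plus(env.get("TIDB_PASSWORD", ""))
    database = env.get("TIDB_DATABASE", "test")
    return f"mysql+pymysql://{username}:{password}@{host}:{port}/{database}"


# With no settings at all, the defaults reproduce the local example URL.
local_url = build_database_url({})

# A password containing "@" and "/" is encoded instead of corrupting the URL.
special_url = build_database_url(
    {"TIDB_USERNAME": "prefix.root", "TIDB_PASSWORD": "p@ss/word"}
)
```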

## Quickstart

### Connect to TiDB

In [3]:
import os
from autoflow.storage.tidb import TiDBClient

db = TiDBClient.connect(
    host=os.getenv("TIDB_HOST"),
    port=int(os.getenv("TIDB_PORT", "4000")),
    username=os.getenv("TIDB_USERNAME"),
    password=os.getenv("TIDB_PASSWORD"),
    database=os.getenv("TIDB_DATABASE"),
)

# If you are using DATABASE_URL
# db = TiDBClient.connect(database_url=os.getenv("DATABASE_URL"))

### Create table

In [4]:
from typing import Optional, Any
from autoflow.storage.tidb import TiDBModel, Field
from autoflow.llms.embeddings import EmbeddingFunction

# Define your embedding model.
text_embed = EmbeddingFunction("openai/text-embedding-3-small")


class Chunk(TiDBModel, table=True):
    __tablename__ = "chunks"
    __table_args__ = {"extend_existing": True}

    id: int = Field(primary_key=True)
    text: str = Field()
    text_vec: Optional[Any] = text_embed.VectorField(
        source_field="text"
    )  # 👈 Define the vector field.
    user_id: int = Field()


table = db.create_table(schema=Chunk)

### Insert Data

🔢 Auto embedding: when you insert rows, the SDK automatically generates the `text_vec` embedding from the `text` field.

In [5]:
table.insert(
    Chunk(text="The quick brown fox jumps over the lazy dog", user_id=1),
)
table.bulk_insert(
    [
        Chunk(text="A quick brown dog runs in the park", user_id=2),
        Chunk(text="The lazy fox sleeps under the tree", user_id=2),
        Chunk(text="A dog and a fox play in the park", user_id=3),
    ]
)
table.rows()

4

### Vector Search

In [6]:
chunks = (
    table.search(
        "A quick fox in the park"
    )  # 👈 The query is embedded automatically.
    .filter({"user_id": 2})
    .limit(2)
    .to_pydantic()
)
[(c.text, c.score) for c in chunks]

[('A quick brown dog runs in the park', 0.665493189763966),
 ('The lazy fox sleeps under the tree', 0.554631888866523)]
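
To build intuition for what the search chain does, here is a toy, pure-Python version of "filter, score, sort, limit". It is only a sketch: the vectors are hand-written rather than produced by an embedding model, `toy_search` is a hypothetical function, and the use of cosine similarity is an assumption, since this notebook does not state which metric or filter ordering the SDK uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def toy_search(rows, query_vec, user_id, limit):
    """Filter rows by user_id, score them against the query, return the top hits."""
    candidates = [r for r in rows if r["user_id"] == user_id]
    scored = [(r["text"], cosine_similarity(r["vec"], query_vec)) for r in candidates]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Hand-written 2-D vectors stand in for real embeddings.
rows = [
    {"text": "doc a", "user_id": 1, "vec": [1.0, 0.0]},
    {"text": "doc b", "user_id": 2, "vec": [0.0, 1.0]},
    {"text": "doc c", "user_id": 2, "vec": [1.0, 1.0]},
    {"text": "doc d", "user_id": 2, "vec": [-1.0, 0.0]},
]
results = toy_search(rows, query_vec=[1.0, 0.0], user_id=2, limit=2)
```

Here `doc a` is excluded by the filter even though it matches the query exactly, which mirrors how `.filter({"user_id": 2})` narrows the candidates in the cell above.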

### Advanced Filtering

The TiDB client supports the following filter operators:

| Operator | Description | Example |
|----------|-------------|---------|
| `$eq` | Equal to | `{"field": {"$eq": "hello"}}` |
| `$gt` | Greater than | `{"field": {"$gt": 1}}` |
| `$gte` | Greater than or equal | `{"field": {"$gte": 1}}` |
| `$lt` | Less than | `{"field": {"$lt": 1}}` |
| `$lte` | Less than or equal | `{"field": {"$lte": 1}}` |
| `$in` | In array | `{"field": {"$in": [1, 2, 3]}}` |
| `$nin` | Not in array | `{"field": {"$nin": [1, 2, 3]}}` |
| `$and` | Logical AND | `{"$and": [{"field1": 1}, {"field2": 2}]}` |
| `$or` | Logical OR | `{"$or": [{"field1": 1}, {"field2": 2}]}` |


In [7]:
chunks = table.query({"user_id": 1})
[(c.id, c.text, c.user_id) for c in chunks]

[(1, 'The quick brown fox jumps over the lazy dog', 1)]
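
The operator table above can be read as a small predicate language. The sketch below evaluates such filter dictionaries against plain Python dicts to make the semantics concrete. It is illustrative only: `matches` and `COMPARATORS` are hypothetical names, and the SDK presumably translates filters into SQL rather than evaluating them in Python.

```python
import operator

# Map each comparison operator from the table to a Python predicate.
COMPARATORS = {
    "$eq": operator.eq,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}


def matches(row, filters):
    """Return True if `row` (a dict) satisfies the filter dict."""
    for key, condition in filters.items():
        if key == "$and":
            if not all(matches(row, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(row, sub) for sub in condition):
                return False
        elif isinstance(condition, dict):
            # {"field": {"$op": value, ...}}: every operator must hold.
            for op, expected in condition.items():
                if not COMPARATORS[op](row[key], expected):
                    return False
        elif row[key] != condition:
            # {"field": value} is shorthand for equality.
            return False
    return True


rows = [
    {"id": 1, "user_id": 1},
    {"id": 2, "user_id": 2},
    {"id": 3, "user_id": 2},
    {"id": 4, "user_id": 3},
]
either = [r["id"] for r in rows
          if matches(r, {"$or": [{"user_id": 1}, {"user_id": {"$gte": 3}}]})]
only_two = [r["id"] for r in rows if matches(r, {"user_id": {"$in": [2]}})]
```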

### Multiple Tables Join

In [8]:
# Create a table to store user data:
class User(TiDBModel, table=True):
    __tablename__ = "users"
    __table_args__ = {"extend_existing": True}

    id: int = Field(primary_key=True)
    name: str = Field(max_length=20)


user_table = db.create_table(schema=User)

In [12]:
user_table.insert(User(id=1, name="Alice"))

User(name='Alice', id=1)

In [9]:
from sqlmodel import select, Session

db_engine = db.db_engine
with Session(db_engine) as db_session:
    query = (
        select(Chunk).join(User, Chunk.user_id == User.id).where(User.name == "Alice")
    )
    chunks = db_session.exec(query).all()

[(c.id, c.text, c.user_id) for c in chunks]

[(1, 'The quick brown fox jumps over the lazy dog', 1)]
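
For readers who think in SQL, the join above corresponds roughly to the query below. This is a sketch that runs against an in-memory SQLite database purely so it works without a TiDB cluster; the column types are assumptions, the vector column is omitted, and the SQL that SQLModel actually emits may differ.

```python
import sqlite3

# SQLite stands in for TiDB here only to keep the example self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name VARCHAR(20));
    CREATE TABLE chunks (id INTEGER PRIMARY KEY, text TEXT, user_id INTEGER);
    INSERT INTO users VALUES (1, 'Alice');
    INSERT INTO chunks VALUES
        (1, 'The quick brown fox jumps over the lazy dog', 1),
        (2, 'A quick brown dog runs in the park', 2);
    """
)

# Join chunks to their owner and keep only Alice's rows.
joined = conn.execute(
    """
    SELECT chunks.id, chunks.text, chunks.user_id
    FROM chunks
    JOIN users ON chunks.user_id = users.id
    WHERE users.name = ?
    """,
    ("Alice",),
).fetchall()
conn.close()
```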

### Delete Data

In [10]:
table.delete(filters={"user_id": 2})
table.rows()

2

### Execute raw SQL

In [11]:
db.execute("SELECT id, text, user_id FROM chunks")

{'success': True,
 'result': [(1, 'The quick brown fox jumps over the lazy dog', 1),
  (4, 'A dog and a fox play in the park', 3)],
 'error': None}
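
Because the output above is a dictionary with `success`, `result`, and `error` keys rather than a raised exception, callers may want to check it before using the rows. A minimal sketch, assuming the dictionary always has this shape; `unwrap_result` is a hypothetical helper, not an SDK function:

```python
def unwrap_result(response):
    """Return the rows from an execute-style response dict, or raise on failure."""
    if not response.get("success"):
        raise RuntimeError(f"SQL execution failed: {response.get('error')}")
    return response.get("result")


# A response shaped like the output shown above.
ok_response = {
    "success": True,
    "result": [
        (1, "The quick brown fox jumps over the lazy dog", 1),
        (4, "A dog and a fox play in the park", 3),
    ],
    "error": None,
}
rows = unwrap_result(ok_response)
ids = [row[0] for row in rows]
```

With a real client, this would wrap the call as `unwrap_result(db.execute("SELECT ..."))`.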

### Truncate table

Clear all data in the table:

In [12]:
table.truncate()
table.rows()

0