---
sidebar_label: AI Functions
---


# OceanBase AI Functions

This notebook shows how to use the OceanBase AI functions `AI_EMBED`, `AI_COMPLETE`, and `AI_RERANK`, which are available in OceanBase 4.4.1+ and SeekDB.

## Table of Contents

- [Setup](#setup) - Deploy OceanBase and install dependencies
- [Initialization](#initialization) - Configure and create AI functions client
- [Query AI Models and Endpoints](#query-ai-models-and-endpoints) - List configured models and endpoints, then configure or remove them
- [Test AI Functions](#test-ai-functions) - Test AI_COMPLETE, AI_EMBED, and AI_RERANK
- [AI_EMBED](#ai_embed) - Convert text to vector embeddings
- [AI_COMPLETE](#ai_complete) - Generate text using LLM
- [AI_RERANK](#ai_rerank) - Rerank search results for better accuracy
- [Batch Operations](#batch-operations) - Process multiple texts efficiently
- [Use Cases](#use-cases) - Real-world application examples


## Setup

To use OceanBase AI functions, you'll need to deploy OceanBase 4.4.1+ or SeekDB:


In [None]:
docker run --name=oceanbase -e MODE=mini -e OB_SERVER_IP=127.0.0.1 -p 2881:2881 -d oceanbase/oceanbase-ce:4.4.1.0-100000032025101610


And install the `langchain-oceanbase` integration package:


In [None]:
pip install -qU "langchain-oceanbase"


Check the connection to OceanBase and set the memory usage ratio for vector data:


In [None]:
from pyobvector import ObVecClient

tmp_client = ObVecClient()
tmp_client.perform_raw_text_sql("ALTER SYSTEM ob_vector_memory_limit_percentage = 30")


## Initialization

Configure the connection parameters and initialize the OceanBase AI functions client:


In [None]:
from langchain_oceanbase.ai_functions import OceanBaseAIFunctions

connection_args = {
    "host": "127.0.0.1",
    "port": "2881",
    "user": "root@test",
    "password": "",
    "db_name": "test",
}

ai_functions = OceanBaseAIFunctions(connection_args=connection_args)
print("AI Functions client initialized successfully!")
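
If you prefer not to hardcode connection details in the notebook, the same dictionary can be built from environment variables. The following is a minimal sketch: the variable names (`OB_HOST`, `OB_PORT`, `OB_USER`, `OB_PASSWORD`, `OB_DB_NAME`) and the helper `build_connection_args` are hypothetical, and the defaults simply mirror the local values used above.

```python
import os
from typing import Mapping


def build_connection_args(env: Mapping[str, str] = os.environ) -> dict:
    """Build connection_args from a mapping, falling back to the local defaults."""
    # The OB_* variable names are an assumption made for this example.
    return {
        "host": env.get("OB_HOST", "127.0.0.1"),
        "port": env.get("OB_PORT", "2881"),
        "user": env.get("OB_USER", "root@test"),
        "password": env.get("OB_PASSWORD", ""),
        "db_name": env.get("OB_DB_NAME", "test"),
    }


connection_args = build_connection_args()
# Print everything except the password to confirm what will be used.
print({key: value for key, value in connection_args.items() if key != "password"})
```

The resulting dictionary can be passed to `OceanBaseAIFunctions(connection_args=connection_args)` exactly as in the cell above.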


## Query AI Models and Endpoints

You can query all configured AI models and endpoints to check your configuration.

### List AI Models


In [None]:
# Query all configured AI models
models = ai_functions.list_ai_models()

print(f"Found {len(models)} AI model(s):\n")
for i, model in enumerate(models, 1):
    print(f"Model {i}:")
    print(f"  Name: {model.get('model_name')}")
    print(f"  Type: {model.get('type')} (1=embedding, 3=completion)")
    print(f"  Model ID: {model.get('model_id')}")
    print(f"  Created: {model.get('gmt_create')}")
    print()
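
The numeric `type` field is easier to read once mapped to a label. The sketch below uses only the two codes mentioned in the cell above (1 and 3) and reports anything else as unknown rather than guessing. It assumes `type` is returned as an integer; the sample rows are hypothetical and only mimic the shape of the dictionaries printed above.

```python
# Codes taken from the comment in the cell above; other codes are not
# documented in this notebook, so they are reported as unknown.
MODEL_TYPE_LABELS = {1: "embedding", 3: "completion"}


def describe_model(model: dict) -> str:
    """Return 'name: label' for one row returned by list_ai_models()."""
    code = model.get("type")
    label = MODEL_TYPE_LABELS.get(code, f"unknown (type={code})")
    return f"{model.get('model_name')}: {label}"


# Hypothetical rows for illustration only.
sample_models = [
    {"model_name": "your-embedding-model", "type": 1},
    {"model_name": "your-completion-model", "type": 3},
    {"model_name": "some-other-model", "type": 2},
]

for row in sample_models:
    print(describe_model(row))
```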


### List AI Model Endpoints


In [None]:
# Query all configured AI model endpoints
endpoints = ai_functions.list_ai_model_endpoints()

print(f"Found {len(endpoints)} AI model endpoint(s):\n")
for i, endpoint in enumerate(endpoints, 1):
    print(f"Endpoint {i}:")
    print(f"  Name: {endpoint.get('ENDPOINT_NAME')}")
    print(f"  AI Model: {endpoint.get('AI_MODEL_NAME')}")
    print(f"  URL: {endpoint.get('URL')}")
    print(f"  Provider: {endpoint.get('PROVIDER')}")
    print(f"  Scope: {endpoint.get('SCOPE')}")
    print()


**Note**: AI functions are only supported in OceanBase 4.4.1+ or SeekDB. If you're using an older version, initialization will raise a `ValueError`.
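
To check a version string yourself before connecting, a numeric comparison along these lines can help. This is an illustrative sketch, not the check the library performs: `parse_version` and `meets_minimum` are hypothetical helpers, and they only cover dotted OceanBase version strings (SeekDB versioning is not handled).

```python
def parse_version(version: str) -> tuple:
    """Turn '4.4.1.0' into (4, 4, 1, 0); parsing stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(version: str, minimum: str = "4.4.1") -> bool:
    """Compare versions numerically, so '4.10.0' is newer than '4.4.1'."""
    return parse_version(version) >= parse_version(minimum)


print(meets_minimum("4.4.1.0"))  # True
print(meets_minimum("4.3.5.5"))  # False
print(meets_minimum("4.10.0"))   # True
```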


### Step 3: Configure Model Endpoints

Set up API endpoints for each model:


In [None]:
# Configure embedding model endpoint
ai_functions.create_ai_model_endpoint(
    endpoint_name="embedding_endpoint",
    ai_model_name="your-embedding-model",
    url="https://api.example.com/v1",
    access_key="YOUR_API_KEY",
    provider="openai"
)

# Configure completion model endpoint
ai_functions.create_ai_model_endpoint(
    endpoint_name="complete_endpoint",
    ai_model_name="your-completion-model",
    url="https://api.example.com/v1",
    access_key="YOUR_API_KEY",
    provider="openai"
)

print("✅ Endpoints configured successfully!")
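
When several endpoints share most of their settings, it can be tidier to describe them as data and create them in a loop. The sketch below assumes `create_ai_model_endpoint` accepts the keyword arguments shown above. `configure_endpoints` and `RecordingClient` are hypothetical names; `RecordingClient` is a stand-in so the example runs without a database.

```python
def configure_endpoints(client, specs):
    """Create one endpoint per spec and return the endpoint names created."""
    created = []
    for spec in specs:
        client.create_ai_model_endpoint(**spec)
        created.append(spec["endpoint_name"])
    return created


# Settings shared by both endpoints (placeholder values from the cell above).
common = {
    "url": "https://api.example.com/v1",
    "access_key": "YOUR_API_KEY",
    "provider": "openai",
}
endpoint_specs = [
    {"endpoint_name": "embedding_endpoint", "ai_model_name": "your-embedding-model", **common},
    {"endpoint_name": "complete_endpoint", "ai_model_name": "your-completion-model", **common},
]


class RecordingClient:
    """Stand-in for ai_functions that only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def create_ai_model_endpoint(self, **kwargs):
        self.calls.append(kwargs)


demo_client = RecordingClient()
print(configure_endpoints(demo_client, endpoint_specs))
```

With a real connection, pass `ai_functions` in place of `demo_client`.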


### Step 4: Query Model Endpoints

Verify that endpoints are configured:


In [None]:
# Query all configured AI model endpoints
endpoints = ai_functions.list_ai_model_endpoints()

print(f"Found {len(endpoints)} AI model endpoint(s):\n")
for i, endpoint in enumerate(endpoints, 1):
    print(f"Endpoint {i}:")
    print(f"  Name: {endpoint.get('ENDPOINT_NAME')}")
    print(f"  AI Model: {endpoint.get('AI_MODEL_NAME')}")
    print(f"  URL: {endpoint.get('URL')}")
    print(f"  Provider: {endpoint.get('PROVIDER')}")
    print(f"  Scope: {endpoint.get('SCOPE')}")
    print()


### Step 5: Delete Models (Optional)

If you need to remove models, drop their endpoints first (Step 6), then drop the models:


In [None]:
# Delete models (Note: delete endpoints first)
# ai_functions.drop_ai_model("your-embedding-model")
# ai_functions.drop_ai_model("your-completion-model")
print("Note: Model deletion is commented out. Uncomment to delete models.")


### Step 6: Delete Model Endpoints

Remove model endpoints:


In [None]:
# Delete model endpoints
# ai_functions.drop_ai_model_endpoint("embedding_endpoint")
# ai_functions.drop_ai_model_endpoint("complete_endpoint")
print("Note: Endpoint deletion is commented out. Uncomment to delete endpoints.")


## Test AI Functions

After configuration, test each AI function to verify that it works correctly.

### Step 7: Test AI_COMPLETE

Test text generation:


In [None]:
# Test AI_COMPLETE
completion = ai_functions.ai_complete(
    prompt="Explain what machine learning is in one sentence",
    model_name="your-completion-model"
)
print(f"Completion: {completion}")


### Step 8: Test AI_EMBED

Test text embedding:


In [None]:
# Test AI_EMBED
vector = ai_functions.ai_embed(
    text="Test text: Machine learning is a subset of artificial intelligence",
    model_name="your-embedding-model"
)
print(f"✅ Embedding successful: {len(vector)} dimensions")
print(f"First 5 values: {vector[:5]}")


### Step 9: Test AI_RERANK

Test document reranking:


In [None]:
# Test AI_RERANK
query = "machine learning algorithms"
documents = [
    "Deep learning is a branch of machine learning that uses multi-layer neural networks",
    "Python is a popular programming language widely used in data science",
    "Supervised learning requires labeled data to train models"
]

reranked = ai_functions.ai_rerank(
    query=query,
    documents=documents,
    model_name="rerank-model",  # use a configured rerank model here, not the embedding model
    top_k=2
)

print("Reranked results:")
for result in reranked:
    print(f"Rank {result['rank']}: Score {result['score']:.4f}")
    print(f"  Document: {result['document']}")
    print()


## AI_EMBED

The `AI_EMBED` function converts text to vector embeddings, which can be used for semantic search and similarity matching.

### Basic Usage


In [None]:
# Embed text to vector
text = "Machine learning is a subset of artificial intelligence"
vector = ai_functions.ai_embed(text=text)
print(f"Embedding dimension: {len(vector)}")
print(f"First 5 values: {vector[:5]}")
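
Similarity matching on embeddings is commonly done with cosine similarity. The sketch below implements it in plain Python on small hand-written vectors so that it runs without a database. In practice the inputs would be vectors returned by `ai_embed`, and `cosine_similarity` is a helper defined here, not part of the library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # similarity with a zero vector is treated as 0 here
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings.
same_direction = cosine_similarity([1.0, 0.0, 1.0], [2.0, 0.0, 2.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(f"same direction: {same_direction:.2f}")
print(f"orthogonal: {orthogonal:.2f}")
```

Values close to 1 indicate vectors pointing in nearly the same direction; values near 0 indicate unrelated directions.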


### With Model Name


In [None]:
# Specify embedding model
vector = ai_functions.ai_embed(
    text="Hello, world!",
    model_name="your-embedding-model"
)
print(f"Embedding generated with the specified model: {len(vector)} dimensions")


### With Dimension


In [None]:
# Specify embedding dimension
vector = ai_functions.ai_embed(
    text="Natural language processing",
    model_name="your-embedding-model",
    dimension=384
)
print(f"Embedding with specified dimension: {len(vector)}")


## AI_COMPLETE

The `AI_COMPLETE` function generates text completions using Large Language Models (LLMs).

### Basic Usage


In [None]:
# Generate text completion
prompt = "What is machine learning?"
completion = ai_functions.ai_complete(prompt=prompt)
print(f"Completion: {completion}")


### With Model Name


In [None]:
# Specify LLM model
completion = ai_functions.ai_complete(
    prompt="Explain quantum computing in simple terms",
    model_name="text-generation-model"
)
print(f"Completion: {completion}")


### With Content Replacement


In [None]:
# Use template with {{TEXT}} placeholder
prompt = "Translate to English: {{TEXT}}"
completion = ai_functions.ai_complete(
    prompt=prompt,
    model_name="text-generation-model",
    content="Hello world"
)
print(f"Translation: {completion}")
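
The effect of the `{{TEXT}}` placeholder can be pictured as a plain textual substitution. The sketch below only illustrates that idea under the assumption of simple string replacement. The real substitution is performed by the library or the database, and `render_prompt` is a hypothetical helper. It also shows a Python pitfall: inside an f-string, doubled braces are an escape for a single brace, so the placeholder is lost.

```python
PLACEHOLDER = "{{TEXT}}"


def render_prompt(template: str, content: str) -> str:
    """Illustrative only: replace the placeholder with the content."""
    return template.replace(PLACEHOLDER, content)


template = "Translate to English: {{TEXT}}"
print(render_prompt(template, "Hello world"))  # Translate to English: Hello world

# Pitfall: an f-string turns "{{TEXT}}" into "{TEXT}", so the placeholder disappears.
broken = f"Translate to English: {{TEXT}}"
print(broken)                 # Translate to English: {TEXT}
print(PLACEHOLDER in broken)  # False
```

Keep prompt templates as ordinary strings (no `f` prefix) when they contain `{{TEXT}}`.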


### With Options


In [None]:
# Customize generation parameters
options = {
    "temperature": 0.7,
    "top_p": 0.9,
    "presence_penalty": 0.1
}

completion = ai_functions.ai_complete(
    prompt="Write a short story about AI",
    model_name="text-generation-model",
    options=options
)
print(f"Completion: {completion}")


## AI_RERANK

The `AI_RERANK` function reranks search results to improve relevance by using semantic understanding.

### Basic Usage


In [None]:
# Rerank documents
query = "machine learning algorithms"
documents = [
    "Deep learning uses neural networks for pattern recognition",
    "Supervised learning requires labeled training data",
    "Python is a popular programming language",
    "Reinforcement learning learns through trial and error",
    "Databases store structured information"
]

reranked = ai_functions.ai_rerank(
    query=query,
    documents=documents,
    top_k=3
)

print("Reranked results:")
for result in reranked:
    print(f"Rank {result['rank']}: Score {result['score']:.4f}")
    print(f"  Document: {result['document'][:50]}...")
    print()
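
Each result is a dictionary with `rank`, `score`, and `document` keys, so ordinary list operations are enough for post-processing. The sketch below keeps only results above a score threshold. The sample scores are made up for illustration, and a sensible threshold depends on the rerank model in use. `filter_by_score` is a hypothetical helper.

```python
def filter_by_score(results, min_score):
    """Keep results scoring at least min_score, highest score first."""
    kept = [result for result in results if result["score"] >= min_score]
    return sorted(kept, key=lambda result: result["score"], reverse=True)


# Made-up results shaped like the output of ai_rerank above.
sample_results = [
    {"rank": 1, "score": 0.91, "document": "Supervised learning requires labeled training data"},
    {"rank": 2, "score": 0.74, "document": "Deep learning uses neural networks for pattern recognition"},
    {"rank": 3, "score": 0.12, "document": "Databases store structured information"},
]

for result in filter_by_score(sample_results, min_score=0.5):
    print(f"{result['score']:.2f}  {result['document']}")
```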


### With Model Name


In [None]:
# Specify reranking model
reranked = ai_functions.ai_rerank(
    query="artificial intelligence",
    documents=[
        "Machine learning enables computers to learn from data",
        "Natural language processing understands human language",
        "Computer vision interprets visual information"
    ],
    model_name="rerank-model",
    top_k=2
)

print("Top 2 reranked results:")
for result in reranked:
    print(f"Rank {result['rank']}: {result['document']}")
    print(f"  Score: {result['score']:.4f}\n")


### Rerank All Documents


In [None]:
# Return all reranked documents (no top_k limit)
reranked = ai_functions.ai_rerank(
    query="neural networks",
    documents=[
        "Convolutional neural networks excel at image recognition",
        "Recurrent neural networks process sequential data",
        "Transformers revolutionized NLP tasks"
    ]
)

print("All reranked results:")
for result in reranked:
    print(f"Rank {result['rank']}: Score {result['score']:.4f}")
    print(f"  {result['document']}\n")


## Batch Operations

Process multiple texts efficiently using batch operations.

### Batch Embedding


In [None]:
# Embed multiple texts at once
texts = [
    "Machine learning algorithms",
    "Deep learning neural networks",
    "Natural language processing",
    "Computer vision systems"
]

vectors = ai_functions.batch_ai_embed(
    texts=texts,
    model_name="your-embedding-model"
)

print(f"Generated {len(vectors)} embeddings")
print(f"Each embedding has {len(vectors[0])} dimensions")
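
For long lists, it may be useful to send the texts in fixed-size chunks instead of one large call. The sketch below assumes `batch_ai_embed` accepts `texts` and `model_name` and returns one vector per text, as in the cell above. `chunked`, `embed_in_chunks`, and `FakeEmbedder` are hypothetical names, and `FakeEmbedder` exists only so the example runs without a database.

```python
def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_in_chunks(client, texts, model_name, size=2):
    """Call batch_ai_embed once per chunk and concatenate the vectors in order."""
    vectors = []
    for chunk in chunked(texts, size):
        vectors.extend(client.batch_ai_embed(texts=chunk, model_name=model_name))
    return vectors


class FakeEmbedder:
    """Stand-in client: returns a one-dimensional 'embedding' per text."""

    def __init__(self):
        self.batch_sizes = []

    def batch_ai_embed(self, texts, model_name):
        self.batch_sizes.append(len(texts))
        return [[float(len(text))] for text in texts]


fake = FakeEmbedder()
result = embed_in_chunks(fake, ["a", "bb", "ccc"], "your-embedding-model", size=2)
print(f"{len(result)} vectors from batches of sizes {fake.batch_sizes}")
```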


## Use Cases

### Use Case 1: Building a Semantic Search System

Combine AI_EMBED with vector search for semantic search:


In [None]:
# Step 1: Embed query
query = "How does neural network training work?"
query_vector = ai_functions.ai_embed(
    text=query,
    model_name="your-embedding-model"
)

# Step 2: Use vector for similarity search
# (This would typically be done with OceanbaseVectorStore)
# vector_store.similarity_search_by_vector(query_vector, k=5)

print(f"Query vector dimension: {len(query_vector)}")
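
To make Step 2 concrete without a vector store, the sketch below ranks a tiny in-memory corpus by dot product, which equals cosine similarity when all vectors are unit-normalized. The vectors are toy values, not real embeddings, and `top_k_by_dot` is a hypothetical helper. A real deployment would delegate this to the vector store's similarity search.

```python
import heapq


def top_k_by_dot(query_vector, corpus, k):
    """Return the k (score, text) pairs with the highest dot product.

    corpus is a list of (text, vector) pairs; vectors are assumed to be
    unit-normalized so the dot product matches cosine similarity.
    """
    scored = (
        (sum(q * v for q, v in zip(query_vector, vector)), text)
        for text, vector in corpus
    )
    return heapq.nlargest(k, scored)


# Toy unit vectors standing in for stored document embeddings.
corpus = [
    ("doc about training", [1.0, 0.0]),
    ("doc about databases", [0.0, 1.0]),
    ("doc about optimizers", [0.6, 0.8]),
]

for score, text in top_k_by_dot([1.0, 0.0], corpus, k=2):
    print(f"{score:.2f}  {text}")
```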


### Use Case 2: RAG with Reranking

Improve RAG results by reranking retrieved documents:


In [None]:
# Step 1: Retrieve documents (example)
retrieved_docs = [
    "Neural networks consist of layers of interconnected nodes",
    "Training involves forward and backward propagation",
    "Gradient descent optimizes network parameters",
    "Python libraries like TensorFlow simplify implementation"
]

# Step 2: Rerank for better relevance
query = "How to train a neural network?"
reranked = ai_functions.ai_rerank(
    query=query,
    documents=retrieved_docs,
    model_name="rerank-model",
    top_k=2
)

print("Most relevant documents:")
for result in reranked:
    print(f"{result['document']}\n")
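
After reranking, a RAG pipeline typically joins the top documents into a context string for the generation prompt. The sketch below does this under a simple character budget. `build_context` is a hypothetical helper, the budget ignores the newline separators, and the sample results are made up to mirror the `rank` and `document` keys used above.

```python
def build_context(reranked, max_chars):
    """Join documents in rank order until adding one would exceed max_chars."""
    parts = []
    used = 0
    for result in sorted(reranked, key=lambda item: item["rank"]):
        document = result["document"]
        if used + len(document) > max_chars:
            break
        parts.append(document)
        used += len(document)
    return "\n".join(parts)


# Made-up reranked results for illustration.
sample_reranked = [
    {"rank": 2, "document": "Gradient descent optimizes network parameters"},
    {"rank": 1, "document": "Training involves forward and backward propagation"},
]

context = build_context(sample_reranked, max_chars=200)
print(context)
```

The resulting string could then be passed as `content` to `ai_complete` with a prompt template that contains `{{TEXT}}`.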


### Use Case 3: Text Generation Pipeline

Use AI_COMPLETE for content generation:


In [None]:
# Generate summaries
documents = [
    "Machine learning is transforming industries by enabling computers to learn from data without explicit programming.",
    "Deep learning enables breakthrough applications in image recognition, natural language processing, and autonomous systems.",
    "AI is reshaping the future of technology, creating new possibilities and challenges."
]

for doc in documents:
    prompt = "Summarize the following text in one sentence: {{TEXT}}"  # plain string: an f-string would collapse {{TEXT}} to {TEXT}
    summary = ai_functions.ai_complete(
        prompt=prompt,
        model_name="text-generation-model",
        content=doc
    )
    print(f"Summary: {summary}\n")
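
In a longer pipeline, one failed call should not necessarily stop the whole run. The sketch below collects a result or an error message per document. It assumes `ai_complete` accepts `prompt`, `model_name`, and `content` as shown above, and it catches `Exception` broadly because the specific exception types are not documented in this notebook. `summarize_all` and `FlakyClient` are hypothetical names.

```python
def summarize_all(client, documents, model_name):
    """Return a (document, summary, error) tuple for every input document."""
    prompt = "Summarize the following text in one sentence: {{TEXT}}"
    results = []
    for document in documents:
        try:
            summary = client.ai_complete(
                prompt=prompt, model_name=model_name, content=document
            )
            results.append((document, summary, None))
        except Exception as exc:  # broad on purpose; see the note above
            results.append((document, None, str(exc)))
    return results


class FlakyClient:
    """Stand-in client that fails on empty input, for demonstration."""

    def ai_complete(self, prompt, model_name, content):
        if not content:
            raise RuntimeError("empty input")
        return f"summary of {len(content)} chars"


for document, summary, error in summarize_all(FlakyClient(), ["abc", ""], "text-generation-model"):
    print(summary if error is None else f"failed: {error}")
```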


### Use Case 4: Multi-language Support

Use AI_COMPLETE for translation:


In [None]:
# Translate text
texts = [
    "Hello, how are you?",
    "Machine learning is fascinating",
    "Thank you for your help"
]

for text in texts:
    prompt = "Translate to Chinese: {{TEXT}}"
    translation = ai_functions.ai_complete(
        prompt=prompt,
        model_name="text-generation-model",
        content=text
    )
    print(f"{text} -> {translation}")


## Key Features

### Version Support
- **OceanBase 4.4.1+**: Full support for all AI functions
- **SeekDB**: Full support for all AI functions
- **Automatic version checking**: Validates database version on initialization

### Function Capabilities
- **AI_EMBED**: Convert text to high-dimensional vector embeddings
- **AI_COMPLETE**: Generate text with a configured LLM
- **AI_RERANK**: Improve search result relevance with semantic reranking

### Error Handling
- Graceful handling of missing model configurations
- Clear error messages for unsupported database versions
- Fallback mechanisms for batch operations

### Performance
- Efficient batch processing for multiple texts
- Optimized SQL execution for AI function calls
- Support for concurrent operations


## API Reference

The `OceanBaseAIFunctions` methods are summarized below; see the API reference for detailed documentation of their parameters.

### AI Functions
- `ai_embed()`: Convert text to vector embeddings
- `ai_complete()`: Generate text completions
- `ai_rerank()`: Rerank documents by relevance
- `batch_ai_embed()`: Batch process multiple texts

### Model Configuration and Query
- `create_ai_model()`: Create an AI model
- `drop_ai_model()`: Drop an AI model
- `create_ai_model_endpoint()`: Create a model endpoint
- `alter_ai_model_endpoint()`: Alter a model endpoint
- `drop_ai_model_endpoint()`: Drop a model endpoint
- `list_ai_models()`: List all configured AI models
- `list_ai_model_endpoints()`: List all configured AI model endpoints

## References

- [OceanBase AI Functions Documentation](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000004018305)
- [OceanBase AI Functions Guide](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000004018306)
