# Working with Voyage AI in Pixeltable

Pixeltable's Voyage AI integration gives you access to Voyage AI's text embedding, multimodal embedding, and reranker models through the Voyage AI API.

### Prerequisites
- A Voyage AI account with an API key (https://www.voyageai.com/)

### Important Notes

- Voyage AI usage may incur costs based on your Voyage AI plan.
- Be mindful of sensitive data: the text and images you embed or rerank are sent to the Voyage AI API, so consider appropriate security measures when integrating with external services.


First, install the required libraries and enter your Voyage AI API key.


In [None]:
%pip install -qU pixeltable voyageai

In [None]:
import os
import getpass
if 'VOYAGEAI_API_KEY' not in os.environ:
    os.environ['VOYAGEAI_API_KEY'] = getpass.getpass('Enter your Voyage AI API key:')
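
In a non-interactive script (a scheduled job, for example), an interactive `getpass` prompt is not practical, so it can help to fail early with a clear message instead. The following is a minimal sketch under that assumption: `require_api_key` is a hypothetical helper, not part of Pixeltable or the `voyageai` package, and it relies only on the same `VOYAGEAI_API_KEY` variable name used above.

```python
import os
from typing import Mapping, Optional


def require_api_key(name: str = 'VOYAGEAI_API_KEY', env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key stored under `name`, or raise if it is missing or blank."""
    # Hypothetical helper for illustration; defaults to the real process environment
    env = os.environ if env is None else env
    key = env.get(name, '').strip()
    if not key:
        raise RuntimeError(f'{name} is not set; export it before running this script.')
    return key


# Example with an explicit mapping, so the sketch runs without touching the real environment
key = require_api_key(env={'VOYAGEAI_API_KEY': 'example-key'})
print(f'Key loaded ({len(key)} characters)')
```

Passing `env` explicitly keeps the helper easy to test; in a real script you would call `require_api_key()` with no arguments.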

Now let's create a Pixeltable directory to hold the tables for our demo.


In [None]:
import pixeltable as pxt

# Remove the 'voyageai_demo' directory and its contents, if it exists
pxt.drop_dir('voyageai_demo', force=True)
pxt.create_dir('voyageai_demo')

Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'voyageai_demo'.


<pixeltable.catalog.dir.Dir at 0x1690fe590>

## Text Embeddings

Voyage AI's text embedding models convert text into vectors that can be used for semantic search and RAG applications. Here we store the embedding of each document in a computed column.


In [None]:
from pixeltable.functions import voyageai

# Create a table for document embeddings
docs_t = pxt.create_table('voyageai_demo.documents', {'text': pxt.String})

# Add computed column with Voyage embeddings
docs_t.add_computed_column(
    embedding=voyageai.embeddings(
        docs_t.text,
        model='voyage-3.5',
        input_type='document'
    )
)

Created table 'documents'.
Added 0 column values with 0 errors.


No rows affected.

In [None]:
# Insert some sample documents
documents = [
    "The Mediterranean diet emphasizes fish, olive oil, and vegetables, believed to reduce chronic diseases.",
    "Photosynthesis in plants converts light energy into glucose and produces essential oxygen.",
    "20th-century innovations, from radios to smartphones, centered on electronic advancements.",
    "Rivers provide water, irrigation, and habitat for aquatic species, vital for ecosystems.",
    "Apple's conference call to discuss fourth fiscal quarter results is scheduled for Thursday, November 2, 2023.",
    "Shakespeare's works, like 'Hamlet' and 'A Midsummer Night's Dream,' endure in literature."
]

docs_t.insert([{'text': doc} for doc in documents])

Inserting rows into `documents`: 6 rows [00:00, 734.30 rows/s]
Inserted 6 rows with 0 errors.


6 rows inserted, 12 values computed.

In [5]:
# View the embeddings
docs_t.select(docs_t.text, docs_t.embedding).head(3)

text,embedding
"The Mediterranean diet emphasizes fish, olive oil, and vegetables, believed to reduce chronic diseases.",[ 0.048 0.016 0.002 0.026 0.038 0.013 ... -0.015 -0.034 -0.016 0.007 0.046 -0.011]
Photosynthesis in plants converts light energy into glucose and produces essential oxygen.,[ 0.013 0.023 -0.004 0.052 0.037 0.022 ... -0.013 -0.042 0.001 0.008 -0.02 -0.016]
"20th-century innovations, from radios to smartphones, centered on electronic advancements.",[ 4.373e-03 4.474e-02 -6.796e-05 2.745e-02 4.904e-02 6.148e-03 ... -1.833e-02 -4.274e-02 -4.713e-03 -1.739e-02 -1.540e-03 -2.306e-02]


## Embedding Index for Similarity Search

You can use Voyage AI embeddings with Pixeltable's embedding index for efficient similarity search.


In [6]:
# Create a table with an embedding index
search_t = pxt.create_table('voyageai_demo.search', {'text': pxt.String})

# Add embedding index for similarity search
embed_fn = voyageai.embeddings.using(model='voyage-3.5', input_type='document')
search_t.add_embedding_index('text', string_embed=embed_fn)

# Insert documents
search_t.insert([{'text': doc} for doc in documents])

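# Build a similarity search query. Displaying the query object shows its plan;
# append .collect() to the last line to execute it and retrieve the matching rows.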
sim = search_t.text.similarity("What are the health benefits of Mediterranean food?")
search_t.order_by(sim, asc=False).limit(3).select(search_t.text, score=sim)


Created table 'search'.
Inserting rows into `search`: 6 rows [00:00, 914.36 rows/s]
Inserted 6 rows with 0 errors.


Name,Type,Expression
text,String,text
score,Required[Float],text.similarity('What are the health benefits of Mediterranean food?')

0,1
From,search
Order By,text.similarity('What are the health benefits of Mediterranean food?') desc
Limit,3
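
To make the `score` column more concrete: an embedding-based similarity score is commonly the cosine similarity between the query vector and each document vector. The sketch below computes such a ranking in plain Python over tiny made-up 3-dimensional vectors. It does not call Voyage AI or Pixeltable, the vectors and labels are illustrative placeholders rather than real embeddings, and whether the index above uses cosine similarity or another metric depends on how it is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return the k (label, score) pairs with the highest cosine similarity."""
    scored = [(label, cosine_similarity(query_vec, vec)) for label, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up vectors for illustration only; real embeddings have far more dimensions
toy_docs = {
    'diet': [0.9, 0.1, 0.0],
    'rivers': [0.1, 0.9, 0.1],
    'shakespeare': [0.0, 0.2, 0.9],
}
toy_query = [0.8, 0.2, 0.1]

for label, score in top_k(toy_query, toy_docs, k=2):
    print(f'{label}: {score:.3f}')
```

An embedding index exists so that this comparison does not have to be a brute-force scan over every row, which is what the toy `top_k` function does.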


## Reranking

Voyage AI's rerankers can refine search results by providing more accurate relevance scores.


In [7]:
# Create a table for reranking
rerank_t = pxt.create_table(
    'voyageai_demo.rerank',
    {'query': pxt.String, 'documents': pxt.Json}
)

# Add computed column with reranking results
rerank_t.add_computed_column(
    reranked=voyageai.rerank(
        rerank_t.query,
        rerank_t.documents,
        model='rerank-2.5',
        top_k=3
    )
)


Created table 'rerank'.
Added 0 column values with 0 errors.


No rows affected.

In [8]:
# Insert query and documents to rerank
rerank_t.insert([{
    'query': "When is Apple's conference call scheduled?",
    'documents': documents
}])


Inserting rows into `rerank`: 1 rows [00:00, 499.74 rows/s]
Inserted 1 row with 0 errors.


1 row inserted, 2 values computed.

In [9]:
# View reranking results
results = rerank_t.select(rerank_t.query, rerank_t.reranked).collect()
reranked_results = results[0]['reranked']['results']

print("Top 3 most relevant documents:\n")
for i, result in enumerate(reranked_results, 1):
    print(f"{i}. (Score: {result['relevance_score']:.3f})")
    print(f"   {result['document']}\n")


Top 3 most relevant documents:

1. (Score: 0.930)
   Apple's conference call to discuss fourth fiscal quarter results is scheduled for Thursday, November 2, 2023.

2. (Score: 0.283)
   20th-century innovations, from radios to smartphones, centered on electronic advancements.

3. (Score: 0.264)
   The Mediterranean diet emphasizes fish, olive oil, and vegetables, believed to reduce chronic diseases.
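
Because `top_k=3` always returns three documents, the lower-ranked entries may be only weakly related to the query, as the scores of 0.283 and 0.264 above suggest. One option is to drop results below a relevance threshold. The following is a minimal sketch: `filter_by_score` is a hypothetical helper, the 0.5 cutoff is an arbitrary assumption that should be tuned per application, and the sample data simply mirrors the `relevance_score` and `document` keys and the rounded values printed above.

```python
def filter_by_score(results, min_score=0.5):
    """Keep only reranked entries whose relevance score is at least min_score."""
    return [r for r in results if r['relevance_score'] >= min_score]


# Sample data mirroring the result shape and rounded scores printed above
sample_results = [
    {
        'relevance_score': 0.930,
        'document': "Apple's conference call to discuss fourth fiscal quarter results "
                    "is scheduled for Thursday, November 2, 2023.",
    },
    {
        'relevance_score': 0.283,
        'document': '20th-century innovations, from radios to smartphones, '
                    'centered on electronic advancements.',
    },
    {
        'relevance_score': 0.264,
        'document': 'The Mediterranean diet emphasizes fish, olive oil, and vegetables, '
                    'believed to reduce chronic diseases.',
    },
]

# Only entries at or above the (assumed) 0.5 threshold are printed
for r in filter_by_score(sample_results):
    print(f"{r['relevance_score']:.3f}  {r['document']}")
```

In the notebook, the same helper could be applied directly to `reranked_results` from the previous cell.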



## Multimodal Embeddings

Voyage AI's multimodal embeddings can embed images with optional text captions.


In [None]:
# Create a table for multimodal embeddings
mm_t = pxt.create_table('voyageai_demo.multimodal', {'image': pxt.Image, 'caption': pxt.String}, if_exists='replace')

# Add computed column with multimodal embeddings
mm_t.add_computed_column(
    embedding=voyageai.multimodal_embed(
        mm_t.image,
        mm_t.caption,
        input_type='document'
    )
)

# Insert a sample image with caption
mm_t.insert([{
    'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000139.jpg',
    'caption': 'A person standing next to an elephant'
}])

mm_t.select(mm_t.caption, mm_t.embedding).head()



### Learn More

To learn more about RAG operations in Pixeltable, check out the [RAG Operations in Pixeltable](https://docs.pixeltable.com/notebooks/use-cases/rag-operations) tutorial.

For more information about Voyage AI models and features, visit:
- [Voyage AI Documentation](https://docs.voyageai.com/)
- [Text Embeddings](https://docs.voyageai.com/docs/embeddings)
- [Multimodal Embeddings](https://docs.voyageai.com/docs/multimodal-embeddings)
- [Rerankers](https://docs.voyageai.com/docs/reranker)

If you have any questions, don't hesitate to reach out.
