# Working with Twelve Labs in Pixeltable

Pixeltable's Twelve Labs integration lets you create multimodal embeddings from text, images, audio, and video using the Twelve Labs API.

### Prerequisites

- A Twelve Labs account with an API key (https://playground.twelvelabs.io/)

### Important notes

- Twelve Labs usage may incur costs based on your Twelve Labs plan.
- Be mindful of sensitive data and consider security measures when integrating with external services.


First, install the required libraries and enter your Twelve Labs API key.


In [None]:
%pip install -qU pixeltable twelvelabs

In [None]:
import os
import getpass

if 'TWELVELABS_API_KEY' not in os.environ:
    os.environ['TWELVELABS_API_KEY'] = getpass.getpass('Enter your Twelve Labs API key: ')

Now let's create a Pixeltable directory to hold the tables for our demo.


In [None]:
import pixeltable as pxt

# Remove the 'twelvelabs_demo' directory and its contents, if it exists
pxt.drop_dir('twelvelabs_demo', force=True)
pxt.create_dir('twelvelabs_demo')

Created directory 'twelvelabs_demo'.


<pixeltable.catalog.dir.Dir at 0x32561bd90>

## Text embeddings

Twelve Labs provides multimodal embedding models that accept text, images, audio, and video. Let's start with text embeddings.


In [None]:
from pixeltable.functions import twelvelabs

# Create a table for text embeddings
text_t = pxt.create_table('twelvelabs_demo.text', {'input': pxt.String})

# Add computed column with Twelve Labs embeddings
text_t.add_computed_column(
    embedding=twelvelabs.embed(model_name='marengo3.0', text=text_t.input)
)

Created table 'text'.
Added 0 column values with 0 errors.


No rows affected.

In [None]:
# Insert sample text
text_t.insert([
    {'input': 'Twelve Labs provides multimodal embedding models.'},
    {'input': 'The model can embed text, images, audio, and video.'},
    {'input': 'Embeddings enable semantic search and retrieval.'}
])

Inserting rows into `text`: 3 rows [00:00, 1283.18 rows/s]
Inserted 3 rows with 0 errors.


3 rows inserted, 6 values computed.

In [None]:
# View the embeddings
text_t.select(text_t.input, text_t.embedding).head()

input,embedding
Twelve Labs provides multimodal embedding models.,[-0.086 -0.08 0.002 -0.065 0.011 0.022 ... -0.047 -0.108 -0.065 -0.023 0.034 -0.036]
"The model can embed text, images, audio, and video.",[-0.099 -0.052 -0.009 -0.062 0.019 -0.051 ... -0.045 -0.124 -0.04 -0.005 0.032 -0.006]
Embeddings enable semantic search and retrieval.,[-0.076 -0.019 0.03 -0.073 0.032 0.014 ... -0.075 -0.073 -0.071 -0.016 0.076 -0.021]
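The sample text above notes that embeddings enable semantic search. As a rough illustration of the comparison behind that, the sketch below ranks vectors by cosine similarity in plain Python. The three-element vectors and the `cosine_similarity` helper are made up for this example; they are not Twelve Labs output, and this is not necessarily how Pixeltable performs search internally.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings (hypothetical values)
query = [1.0, 0.0, 1.0]
candidates = {
    'close': [0.9, 0.1, 1.1],
    'far': [-1.0, 1.0, 0.0],
}

# Rank candidate names from most to least similar to the query
ranked = sorted(
    candidates,
    key=lambda name: cosine_similarity(query, candidates[name]),
    reverse=True,
)
print(ranked)
```

Real embeddings have hundreds of dimensions, but the ranking step works the same way: items whose vectors point in nearly the same direction as the query vector score highest.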


## Image embeddings

Twelve Labs can also create embeddings from images.


In [None]:
# Create a table for image embeddings
image_t = pxt.create_table('twelvelabs_demo.images', {'image': pxt.Image})

# Add computed column with image embeddings
image_t.add_computed_column(
    embedding=twelvelabs.embed(model_name='marengo3.0', image=image_t.image)
)

Created table 'images'.
Added 0 column values with 0 errors.


No rows affected.

In [None]:
# Insert sample images
image_t.insert([
    {'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000009.jpg'},
    {'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000025.jpg'}
])

Inserting rows into `images`: 2 rows [00:00, 1567.67 rows/s]
Inserted 2 rows with 0 errors.


2 rows inserted, 6 values computed.

In [None]:
# View the image embeddings
image_t.select(image_t.image, image_t.embedding).head()

image,embedding
,[-0.028 0.035 0.001 0.052 0.002 0.002 ... 0.046 0.016 -0.034 0.049 0.008 -0.032]
,[-0.009 0.02 -0.04 0.086 -0.053 0.031 ... 0.055 0.02 -0.027 0.028 0.039 -0.035]


## Text + Image embeddings

Twelve Labs can also embed a piece of text and an image together, producing a single combined embedding for the pair.


In [None]:
# Create a table for combined text+image embeddings
combined_t = pxt.create_table('twelvelabs_demo.combined', {
    'text': pxt.String,
    'image': pxt.Image
})

# Add computed column with combined text+image embeddings
combined_t.add_computed_column(
    embedding=twelvelabs.embed(
        model_name='marengo3.0',
        text=combined_t.text,
        image=combined_t.image
    )
)

Created table 'combined'.
Added 0 column values with 0 errors.


No rows affected.

In [None]:
# Insert sample data with both text and image
combined_t.insert([
    {
        'text': 'A cat sitting on a couch',
        'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000009.jpg'
    }
])

Inserting rows into `combined`: 1 rows [00:00, 626.48 rows/s]
Inserted 1 row with 0 errors.


1 row inserted, 4 values computed.

In [None]:
# View the combined embeddings
combined_t.select(combined_t.text, combined_t.image, combined_t.embedding).head()


text,image,embedding
A cat sitting on a couch,,[-0.057 -0.014 0.073 -0.037 0.021 0.025 ... -0.046 -0.019 0.015 -0.014 0.07 -0.028]


## Audio embeddings

Twelve Labs can create embeddings from audio content. Note that audio segments must be at least 4 seconds long.
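To make the 4-second minimum concrete, here is a small sketch of fixed-length chunking with a minimum chunk duration. The `chunk_bounds` helper is hypothetical and is not Pixeltable's implementation; it assumes that a trailing chunk shorter than the minimum is simply dropped.

```python
def chunk_bounds(total_sec: float, chunk_sec: float, min_sec: float) -> list[tuple[float, float]]:
    """Return (start, end) times for fixed-length chunks, skipping any chunk shorter than min_sec."""
    bounds = []
    start = 0.0
    while start < total_sec:
        end = min(start + chunk_sec, total_sec)
        # Only the final chunk can be shorter than chunk_sec
        if end - start >= min_sec:
            bounds.append((start, end))
        start += chunk_sec
    return bounds

# 12 s of audio: the 2 s remainder is below the 4 s minimum and is dropped
print(chunk_bounds(12.0, 5.0, 4.0))

# 14.5 s of audio: the 4.5 s remainder meets the minimum and is kept
print(chunk_bounds(14.5, 5.0, 4.0))
```

With a 5-second chunk length and a 4-second minimum, every chunk that survives is long enough to satisfy the requirement stated above.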


In [None]:
# Create a table for audio
audio_base = pxt.create_table('twelvelabs_demo.audio_base', {'audio': pxt.Audio})

# Create a view that splits audio into chunks (minimum 4 seconds for Twelve Labs)
audio_v = pxt.create_view(
    'twelvelabs_demo.audio_chunks',
    audio_base,
    iterator=pxt.iterators.AudioSplitter.create(
        audio=audio_base.audio,
        chunk_duration_sec=5.0,
        min_chunk_duration_sec=4.0
    )
)

# Add embedding column
audio_v.add_computed_column(
    embedding=twelvelabs.embed(model_name='marengo3.0', audio=audio_v.audio_chunk)
)

# Insert sample audio
audio_base.insert([
    {'audio': 'https://github.com/pixeltable/pixeltable/raw/main/docs/resources/10-minute%20tour%20of%20Pixeltable.mp3'}
])

Created table 'audio_base'.
Added 0 column values with 0 errors.
Inserting rows into `audio_base`: 1 rows [00:00, 988.29 rows/s]
Inserting rows into `audio_chunks`: 60 rows [00:00, 11978.59 rows/s]
Inserted 61 rows with 0 errors.


61 rows inserted, 2 values computed.

In [None]:
# View audio chunk embeddings
audio_v.select(audio_v.audio_chunk, audio_v.embedding).head(3)

audio_chunk,embedding
,[-0.053 0.077 -0.025 0.002 -0.109 0.088 ... -0.005 -0.003 -0.077 0.015 0.039 -0.027]
,[-0.04 0.005 -0.008 -0.098 -0.119 0.044 ... -0.041 -0.05 -0.037 0.005 0.035 -0.018]
,[-0.011 0.059 0.015 -0.025 0.013 0.005 ... -0.058 0.014 -0.073 -0.031 0.008 -0.049]


## Video embeddings

Twelve Labs can also create embeddings from video. As with audio, the example splits the video into segments first (5 seconds each, with a 4-second minimum) and embeds each segment.


In [None]:
# Create a table for video
video_base = pxt.create_table('twelvelabs_demo.video_base', {'video': pxt.Video})

# Create a view that splits video into segments
video_v = pxt.create_view(
    'twelvelabs_demo.video_segments',
    video_base,
    iterator=pxt.iterators.VideoSplitter.create(
        video=video_base.video,
        duration=5.0,
        min_segment_duration=4.0
    )
)

# Add embedding column for video segments
video_v.add_computed_column(
    embedding=twelvelabs.embed(model_name='marengo3.0', video=video_v.video_segment)
)

# Insert sample video
video_base.insert([
    {'video': 'https://github.com/pixeltable/pixeltable/raw/main/docs/resources/bangkok.mp4'}
])

Created table 'video_base'.
Added 0 column values with 0 errors.
Inserting rows into `video_base`: 1 rows [00:00, 695.46 rows/s]
Inserting rows into `video_segments`: 3 rows [00:00, 929.93 rows/s]
Inserted 4 rows with 0 errors.


4 rows inserted, 2 values computed.

In [None]:
# View video segment embeddings
video_v.select(video_v.video_segment, video_v.embedding).head(3)

video_segment,embedding
,[-0.011 0.025 0.053 0.054 -0.002 0.095 ... 0.077 0.012 0.005 0.032 -0.015 0.035]
,[-0.016 0.024 0.045 0.044 0.002 0.102 ... 0.078 0.008 -0.007 0.033 -0.024 0.028]
,[-0.013 0.019 0.048 0.051 -0.005 0.099 ... 0.079 0.008 0.002 0.03 -0.023 0.023]


## Available models

Twelve Labs provides several embedding models:

| Model | Embedding Dimension | Description |
|-------|---------------------|-------------|
| `marengo3.0` | 512 | Latest multimodal model |
| `Marengo-retrieval-2.7` | 1024 | Optimized for retrieval tasks |

Check the [Twelve Labs documentation](https://docs.twelvelabs.io/v1.3/docs/guides/create-embeddings) for the latest available models.


### Learn more

To learn more about advanced techniques like RAG operations in Pixeltable, check out the [RAG Operations in Pixeltable](https://docs.pixeltable.com/howto/use-cases/rag-operations) tutorial.

If you have any questions, don't hesitate to reach out.
