# 10-Minute Tour

Welcome to Pixeltable! In this tutorial, we'll survey how to create tables, populate them with data, and enhance them with built-in and user-defined transformations and AI operations.

## Install Python packages

First, run the following command to install Pixeltable and the related libraries needed for this tutorial.

In [None]:
%pip install -qU torch transformers openai pixeltable

## Creating a table

Let's begin by creating a fresh `demo` directory (dropping any existing one first, so that we start from a clean state) and a table that can hold image data, `demo/first`. The table will initially have just a single column to hold our input images, which we'll call `input_image`. We also need to specify a type for the column: `pxt.Image`.

In [1]:
import pixeltable as pxt

# Create the directory `demo`, dropping it first (if it exists)
# to ensure a clean environment.
pxt.drop_dir('demo', force=True)
pxt.create_dir('demo')

# Create the table `demo/first` with a single column `input_image`
t = pxt.create_table('demo/first', {'input_image': pxt.Image})

Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
Created directory 'demo'.
Created table 'first'.


We can use `t.describe()` to examine the table schema. We see that it now contains a single column, as expected.

In [2]:
t.describe()

table 'demo/first'

Column Name,Type,Computed With
input_image,Image,


The new table is initially empty, with no rows:

In [3]:
t.count()

0

Now let's put an image into it! We can add images simply by giving Pixeltable their URLs. The example images in this demo come from the [COCO dataset](https://cocodataset.org/), and we'll be referencing copies of them in the Pixeltable GitHub repo. In practice, though, the images can come from anywhere: an S3 bucket, say, or the local file system.

When we add the image, we see that Pixeltable gives us some useful status updates indicating that the operation was successful.

In [4]:
t.insert(
    [
        {
            'input_image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000025.jpg'
        }
    ]
)

Inserted 1 row with 0 errors in 0.21 s (4.86 rows/s)


1 row inserted.

We can use `t.head()` to examine the contents of the table.

In [5]:
t.head()

input_image


## Adding computed columns

Great! Now we have a table containing some data. Let's add an object detection model to our workflow. Specifically, we're going to use `facebook/detr-resnet-50`, a DETR ("DEtection TRansformer") object detection model with a ResNet-50 backbone, which runs through the Hugging Face `transformers` library. Pixeltable contains a built-in adapter for this model family, so all we have to do is call the `detr_for_object_detection` Pixeltable function. A nice thing about the Hugging Face models is that they run locally, so you don't need an account with a service provider in order to use them.

This is our first example of a __computed column__, a key concept in Pixeltable. Recall that when we created the `input_image` column, we specified a type, `pxt.Image`, indicating our intent to populate it with data in the future. When we create a _computed_ column, we instead specify a function that operates on other columns of the table. By default, when we add the new computed column, Pixeltable immediately evaluates it against all existing data in the table: in this case, by calling the `detr_for_object_detection` function on the image.

Depending on your setup, it may take a minute for the function to execute. In the background, Pixeltable is downloading the model from Hugging Face (if necessary), instantiating it, and caching it for later use.

In [6]:
from pixeltable.functions import huggingface

t.add_computed_column(
    detections=huggingface.detr_for_object_detection(
        t.input_image, model_id='facebook/detr-resnet-50'
    )
)

Added 1 column value with 0 errors in 3.26 s (0.31 rows/s)


1 row updated.

Let's examine the results.

In [7]:
t.head()

input_image,detections
,"{""boxes"": [[51.942, 356.174, 181.481, 413.975], [383.225, 58.66, 605.64, 361.346]], ""labels"": [25, 25], ""scores"": [0.99, 0.999], ""label_text"": [""giraffe"", ""giraffe""]}"


We see that the model returned a JSON structure containing a lot of information. In particular, it has the following fields:

- `label_text`: Descriptions of the objects detected
- `boxes`: Bounding boxes for each detected object
- `scores`: Confidence scores for each detection
- `labels`: The DETR model's internal IDs for the detected objects
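
The four lists are parallel: position `i` in each one describes the same detected object. As a minimal plain-Python sketch (no Pixeltable required), the dict below is copied from the `detections` output above and regrouped per object. The box layout `(x_min, y_min, x_max, y_max)` is an assumption based on the usual Hugging Face DETR post-processing output, not something this tutorial states.

```python
# Detection result copied from the `detections` output above.
detections = {
    'boxes': [[51.942, 356.174, 181.481, 413.975], [383.225, 58.66, 605.64, 361.346]],
    'labels': [25, 25],
    'scores': [0.99, 0.999],
    'label_text': ['giraffe', 'giraffe'],
}

# Regroup the parallel lists into one dict per detected object.
objects = [
    {'label': text, 'label_id': label_id, 'score': score, 'box': box}
    for text, label_id, score, box in zip(
        detections['label_text'],
        detections['labels'],
        detections['scores'],
        detections['boxes'],
    )
]

for obj in objects:
    # ASSUMPTION: boxes are (x_min, y_min, x_max, y_max) in pixels.
    x_min, y_min, x_max, y_max = obj['box']
    print(
        f"{obj['label']} (id {obj['label_id']}): score {obj['score']}, "
        f"width {x_max - x_min:.1f}, height {y_max - y_min:.1f}"
    )
```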

Perhaps this is more than we need, and all we really want are the text labels. We could add another computed column to extract `label_text` from the JSON struct:

In [8]:
t.add_computed_column(detections_text=t.detections.label_text)
t.head()

input_image,detections,detections_text
,"{""boxes"": [[51.942, 356.174, 181.481, 413.975], [383.225, 58.66, 605.64, 361.346]], ""labels"": [25, 25], ""scores"": [0.99, 0.999], ""label_text"": [""giraffe"", ""giraffe""]}","[""giraffe"", ""giraffe""]"


If we inspect the table schema now, we see how Pixeltable distinguishes between ordinary and computed columns.

In [9]:
t.describe()

table 'demo/first'

Column Name,Type,Computed With
input_image,Image,
detections,Json,"detr_for_object_detection(input_image, model_id='facebook/detr-resnet-50')"
detections_text,Json,detections.label_text


Now let's add some more images to our table. This demonstrates another important feature of computed columns: by default, they update incrementally any time new data shows up on their inputs. In this case, Pixeltable will run the DETR ResNet-50 model against each new image that is added, then extract the labels into the `detections_text` column. Pixeltable will orchestrate the execution of any sequence (or DAG) of computed columns.

Note how we can pass multiple rows to `t.insert` with a single statement, which will insert them more efficiently.

In [10]:
more_images = [
    'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000030.jpg',
    'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000034.jpg',
    'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000042.jpg',
    'https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/resources/images/000000000061.jpg',
]
t.insert({'input_image': image} for image in more_images)

Inserted 4 rows with 0 errors in 1.51 s (2.65 rows/s)


4 rows inserted.
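
To picture the incremental behavior, here is a toy model in plain Python. It is an illustration only: the `ToyTable` class and its methods are hypothetical and say nothing about how Pixeltable is implemented. Adding a computed column backfills existing rows once, and each later insert evaluates the computed columns only for the new rows, in the order the columns were added.

```python
from typing import Any, Callable, Iterable


class ToyTable:
    """Toy illustration of computed columns; NOT Pixeltable's implementation."""

    def __init__(self) -> None:
        self.rows: list[dict[str, Any]] = []
        # Insertion-ordered, so a computed column can read earlier ones.
        self.computed: dict[str, Callable[[dict[str, Any]], Any]] = {}

    def add_computed_column(self, name: str, fn: Callable[[dict[str, Any]], Any]) -> None:
        self.computed[name] = fn
        # Backfill: evaluate the new column once for every existing row.
        for row in self.rows:
            row[name] = fn(row)

    def insert(self, new_rows: Iterable[dict[str, Any]]) -> None:
        for values in new_rows:
            row = dict(values)
            # Incremental update: only the new row is evaluated.
            for name, fn in self.computed.items():
                row[name] = fn(row)
            self.rows.append(row)


calls = []  # records which rows the `upper` function was evaluated on


def upper(row: dict[str, Any]) -> str:
    calls.append(row['text'])
    return row['text'].upper()


toy = ToyTable()
toy.insert([{'text': 'giraffe'}])
toy.add_computed_column('upper', upper)  # backfills the one existing row
toy.add_computed_column('length', lambda r: len(r['upper']))  # depends on `upper`
toy.insert([{'text': 'zebra'}, {'text': 'dog'}])  # evaluates only the two new rows

print(toy.rows)
print(calls)
```

The `calls` list ends up with one entry per row, which is the point of the sketch: the first row is not recomputed when the later rows arrive.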

Let's see what the model came up with. We'll use `t.select` to suppress the display of the `detections` column, since right now we're only interested in the text labels.

In [11]:
t.select(t.input_image, t.detections_text).head()

input_image,detections_text
,"[""giraffe"", ""giraffe""]"
,"[""vase"", ""potted plant""]"
,"[""zebra""]"
,"[""dog"", ""dog""]"
,"[""person"", ""person"", ""bench"", ""person"", ""elephant"", ""elephant"", ""person""]"


## Pixeltable is persistent

An important feature of Pixeltable is that _everything is persistent_. Unlike in-memory Python libraries such as Pandas, Pixeltable is a database: all your data, transformations, and computed columns are stored and preserved between sessions. To see this, let's clear all the variables in our notebook and start fresh. You can optionally restart your notebook kernel at this point, to demonstrate how Pixeltable data persists across sessions.

In [12]:
# Clear all variables in the notebook
%reset -f

# Re-import Pixeltable (the reset cleared it) and get a fresh handle to the table
import pixeltable as pxt

t = pxt.get_table('demo/first')

# Display just the first two rows, to avoid cluttering the tutorial
t.select(t.input_image, t.detections_text).head(2)

input_image,detections_text
,"[""giraffe"", ""giraffe""]"
,"[""vase"", ""potted plant""]"


## GPT-4o

For comparison, let's try running our examples through a generative model, OpenAI's `gpt-4o-mini`. For this section, you'll need an OpenAI account with an API key. You can use the following command to add your API key to the environment (just enter your API key when prompted):

In [None]:
import getpass
import os

if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass(
        'Enter your OpenAI API key:'
    )

Now we can connect to OpenAI through Pixeltable. This may take some time, depending on how long OpenAI takes to process the query.

In [13]:
from pixeltable.functions import openai

# Construct a message dict for OpenAI. It follows the same pattern
# as the OpenAI SDK, except that in place of an image URL, we can
# put a reference to our image column, and Pixeltable will do the
# substitution once for each row of the table.

messages = [
    {
        'role': 'user',
        'content': [
            {'type': 'text', 'text': "What's in this image?"},
            {'type': 'image_url', 'image_url': t.input_image},
        ]
    }
]

t.add_computed_column(
    vision=openai.chat_completions(
        messages, model='gpt-4o-mini'
    )
)

Added 5 column values with 0 errors in 6.98 s (0.72 rows/s)


5 rows updated.

Let's see how GPT-4o mini's responses compare to the traditional discriminative (DETR) model.

In [14]:
t.select(t.input_image, t.detections_text, t.vision).head()

input_image,detections_text,vision
,"[""giraffe"", ""giraffe""]","{""id"": ""chatcmpl-DCYw7EsCy0gWSmikqyiT89Z7iABX4"", ""model"": ""gpt-4o-mini-2024-07-18"", ""usage"": {""total_tokens"": 14238, ""prompt_tokens"": 14179, ""completion_tokens"": 59, ""prompt_tokens_details"": {""audio_tokens"": 0, ""cached_tokens"": 0}, ""completion_tokens_details"": {""audio_tokens"": 0, ""reasoning_tokens"": 0, ""accepted_prediction_tokens"": 0, ""rejected_prediction_tokens"": 0}}, ""object"": ""chat.completion"", ""choices"": [{""index"": 0, ""message"": {""role"": ""assistant"", ""content"": ""The image shows two giraffes in a natural setting. One giraffe is closer to the camera, standing upright and likely reaching for leaves on a tree, ...... ther giraffe is further back and partially obscured. There are trees and greenery surrounding them, suggesting an open or safari-like environment."", ""refusal"": null, ""annotations"": []}, ""logprobs"": null, ""finish_reason"": ""stop""}], ""created"": 1771886603, ""service_tier"": ""default"", ""system_fingerprint"": ""fp_0a8a757e2a""}"
,"[""vase"", ""potted plant""]","{""id"": ""chatcmpl-DCYwA0flHt6LKtKnUgWfQSsHFnRmO"", ""model"": ""gpt-4o-mini-2024-07-18"", ""usage"": {""total_tokens"": 14248, ""prompt_tokens"": 14179, ""completion_tokens"": 69, ""prompt_tokens_details"": {""audio_tokens"": 0, ""cached_tokens"": 0}, ""completion_tokens_details"": {""audio_tokens"": 0, ""reasoning_tokens"": 0, ""accepted_prediction_tokens"": 0, ""rejected_prediction_tokens"": 0}}, ""object"": ""chat.completion"", ""choices"": [{""index"": 0, ""message"": {""role"": ""assistant"", ""content"": ""The image features a decorative white vase or urn that contains a vibrant arrangement of flowers. The bouquet includes various types of flowers, p ...... railing in a setting that appears to be outdoors, surrounded by soft, blurred greenery in the background, suggesting a garden or lush environment."", ""refusal"": null, ""annotations"": []}, ""logprobs"": null, ""finish_reason"": ""stop""}], ""created"": 1771886606, ""service_tier"": ""default"", ""system_fingerprint"": ""fp_373a14eb6f""}"
,"[""zebra""]","{""id"": ""chatcmpl-DCYwAq4XiJOFLgafcdlTeoQ4FExaT"", ""model"": ""gpt-4o-mini-2024-07-18"", ""usage"": {""total_tokens"": 14217, ""prompt_tokens"": 14179, ""completion_tokens"": 38, ""prompt_tokens_details"": {""audio_tokens"": 0, ""cached_tokens"": 0}, ""completion_tokens_details"": {""audio_tokens"": 0, ""reasoning_tokens"": 0, ""accepted_prediction_tokens"": 0, ""rejected_prediction_tokens"": 0}}, ""object"": ""chat.completion"", ""choices"": [{""index"": 0, ""message"": {""role"": ""assistant"", ""content"": ""The image shows a zebra grazing on green grass. The zebra has distinctive black and white stripes and is depicted in a natural setting. The background is mostly green, indicating a grassland environment."", ""refusal"": null, ""annotations"": []}, ""logprobs"": null, ""finish_reason"": ""stop""}], ""created"": 1771886606, ""service_tier"": ""default"", ""system_fingerprint"": ""fp_0a8a757e2a""}"
,"[""dog"", ""dog""]","{""id"": ""chatcmpl-DCYwAfDVom1lhA4mRuZxIFulvEfrv"", ""model"": ""gpt-4o-mini-2024-07-18"", ""usage"": {""total_tokens"": 14231, ""prompt_tokens"": 14179, ""completion_tokens"": 52, ""prompt_tokens_details"": {""audio_tokens"": 0, ""cached_tokens"": 0}, ""completion_tokens_details"": {""audio_tokens"": 0, ""reasoning_tokens"": 0, ""accepted_prediction_tokens"": 0, ""rejected_prediction_tokens"": 0}}, ""object"": ""chat.completion"", ""choices"": [{""index"": 0, ""message"": {""role"": ""assistant"", ""content"": ""The image shows a collection of shoes and flip-flops on a shoe rack, with a dog resting on top of them. The dog appears to have a curly coat and is nestled among the footwear. The background features a wall and part of the shoe rack."", ""refusal"": null, ""annotations"": []}, ""logprobs"": null, ""finish_reason"": ""stop""}], ""created"": 1771886606, ""service_tier"": ""default"", ""system_fingerprint"": ""fp_373a14eb6f""}"
,"[""person"", ""person"", ""bench"", ""person"", ""elephant"", ""elephant"", ""person""]","{""id"": ""chatcmpl-DCYwAzHazyDJBylIIFQogDU4AGJAq"", ""model"": ""gpt-4o-mini-2024-07-18"", ""usage"": {""total_tokens"": 14234, ""prompt_tokens"": 14179, ""completion_tokens"": 55, ""prompt_tokens_details"": {""audio_tokens"": 0, ""cached_tokens"": 0}, ""completion_tokens_details"": {""audio_tokens"": 0, ""reasoning_tokens"": 0, ""accepted_prediction_tokens"": 0, ""rejected_prediction_tokens"": 0}}, ""object"": ""chat.completion"", ""choices"": [{""index"": 0, ""message"": {""role"": ""assistant"", ""content"": ""The image depicts a dense jungle scene with two elephants carrying riders, who appear to be exploring the lush greenery. The surrounding vegetatio ...... l trees, creating a vibrant natural setting. The atmosphere suggests a tropical or subtropical environment, typical of areas rich in biodiversity."", ""refusal"": null, ""annotations"": []}, ""logprobs"": null, ""finish_reason"": ""stop""}], ""created"": 1771886606, ""service_tier"": ""default"", ""system_fingerprint"": ""fp_0a8a757e2a""}"


It looks like OpenAI returned a whole range of context information along with the image descriptions. Let's pluck out just the response content from inside those JSON structures, so that it's easier to see in the table. Note that we can unpack JSON columns in Pixeltable the same way we would with ordinary Python dicts and lists.
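
For reference, this is what that unpacking looks like on an ordinary Python dict. The `response` dict below is a trimmed-down stand-in for one value of the `vision` column: the token counts are taken from the zebra row above, and the content string is shortened to its first sentence.

```python
# Trimmed-down stand-in for one value of the `vision` column.
response = {
    'model': 'gpt-4o-mini-2024-07-18',
    'usage': {'total_tokens': 14217, 'prompt_tokens': 14179, 'completion_tokens': 38},
    'choices': [
        {
            'index': 0,
            'message': {
                'role': 'assistant',
                'content': 'The image shows a zebra grazing on green grass.',
            },
            'finish_reason': 'stop',
        }
    ],
}

# Walk the nested structure: dict key, list index, dict key, dict key.
content = response['choices'][0]['message']['content']
total_tokens = response['usage']['total_tokens']

print(content)
print(total_tokens)
```

The next cell applies the same `['choices'][0]['message']['content']` path to the `t.vision` column instead of a single dict.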

In [15]:
t.select(t.input_image, t.detections_text, t.vision['choices'][0]['message']['content']).head()

input_image,detections_text,vision_choices0_message_content
,"[""giraffe"", ""giraffe""]","The image shows two giraffes in a natural setting. One giraffe is closer to the camera, standing upright and likely reaching for leaves on a tree, while another giraffe is further back and partially obscured. There are trees and greenery surrounding them, suggesting an open or safari-like environment."
,"[""vase"", ""potted plant""]","The image features a decorative white vase or urn that contains a vibrant arrangement of flowers. The bouquet includes various types of flowers, predominantly shades of pink and white, with some greenery. The vase is placed on a railing in a setting that appears to be outdoors, surrounded by soft, blurred greenery in the background, suggesting a garden or lush environment."
,"[""zebra""]","The image shows a zebra grazing on green grass. The zebra has distinctive black and white stripes and is depicted in a natural setting. The background is mostly green, indicating a grassland environment."
,"[""dog"", ""dog""]","The image shows a collection of shoes and flip-flops on a shoe rack, with a dog resting on top of them. The dog appears to have a curly coat and is nestled among the footwear. The background features a wall and part of the shoe rack."
,"[""person"", ""person"", ""bench"", ""person"", ""elephant"", ""elephant"", ""person""]","The image depicts a dense jungle scene with two elephants carrying riders, who appear to be exploring the lush greenery. The surrounding vegetation includes thick foliage and tall trees, creating a vibrant natural setting. The atmosphere suggests a tropical or subtropical environment, typical of areas rich in biodiversity."


In addition to adapters for local models and inference APIs, Pixeltable can perform a range of more basic image operations. These image operations can be seamlessly chained with API calls, and Pixeltable will keep track of the sequence of operations, constructing new images and caching when necessary to keep things running smoothly. Just for fun (and to demonstrate the power of computed columns), let's see what OpenAI thinks of our sample images when we rotate them by 180 degrees.

In [16]:
t.add_computed_column(rot_image=t.input_image.rotate(180))

# This is identical to the preceding messages dict, but with
# `t.rot_image` in place of `t.input_image`.

messages = [
    {
        'role': 'user',
        'content': [
            {'type': 'text', 'text': "What's in this image?"},
            {'type': 'image_url', 'image_url': t.rot_image},
        ]
    }
]

t.add_computed_column(
    rot_vision=openai.chat_completions(
        messages, model='gpt-4o-mini'
    )
)

Added 5 column values with 0 errors in 6.19 s (0.81 rows/s)


5 rows updated.
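
The two `messages` lists used in this section differ only in the image reference, so they could be produced by a small helper. This is a minimal sketch: the name `vision_messages` is hypothetical, and a string placeholder stands in for the image so the snippet runs without Pixeltable. Passing a column reference such as `t.rot_image` instead is assumed to behave like the hand-written lists above.

```python
from typing import Any


def vision_messages(image_ref: Any, prompt: str = "What's in this image?") -> list[dict[str, Any]]:
    """Build a single-turn user message pairing a text prompt with an image reference."""
    return [
        {
            'role': 'user',
            'content': [
                {'type': 'text', 'text': prompt},
                {'type': 'image_url', 'image_url': image_ref},
            ],
        }
    ]


# A string placeholder stands in for a column reference such as `t.rot_image`.
messages = vision_messages('<image reference>')
print(messages[0]['content'][1])
```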

In [17]:
t.select(t.rot_image, t.rot_vision['choices'][0]['message']['content']).head()

rot_image,rotvision_choices0_message_content
,"The image features two giraffes in a natural setting. One giraffe is prominently displayed in the foreground, with its distinctive long neck and spotted coat. The background consists of greenery and trees, creating a serene and picturesque scene."
,"The image shows a white decorative vase hanging upside down, adorned with a bouquet of flowers. The bouquet features various types of flowers, including pink and white blooms, adding a vibrant and decorative touch. The background appears to be a lush, green area, suggesting a natural setting."
,"The image contains a zebra lying on grass. The zebra is characterized by its distinctive black and white stripes, and it appears to be in a natural outdoor setting."
,"The image appears to show a small, fluffy dog lying among various pairs of shoes on a shelf. The shoes include sneakers and sandals. The setting looks like an indoor space, possibly near an entrance."
,"The image appears to depict a lush, green landscape, likely in a forest or jungle setting. There are various plants and foliage, creating a dense environment. Some objects or individuals are visible, but it's difficult to make out specific details due to the overwhelming greenery. The overall scene conveys a sense of nature and wilderness."


## UDFs: Enhancing Pixeltable's capabilities

Another important principle of Pixeltable is that, although Pixeltable has a built-in library of useful operations and adapters, it will never prescribe a particular way of doing things. Pixeltable is built from the ground up to be extensible.

Let's take a specific example. Recall our use of the DETR ResNet-50 detection model, in which the `detections` column contains a JSON blob with bounding boxes, scores, and labels. Suppose we want to create a column containing the single label with the highest confidence score. There's no built-in Pixeltable function to do this, but it's easy to write our own. In fact, all we have to do is write a Python function that does the thing we want, and mark it with the `@pxt.udf` decorator.

In [18]:
@pxt.udf
def top_detection(detect: dict) -> str:
    scores = detect['scores']
    label_text = detect['label_text']
    # Get the index of the object with the highest confidence
    i = scores.index(max(scores))
    # Return the corresponding label
    return label_text[i]
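
Because the body of a UDF is ordinary Python, its logic can be checked on plain dicts before it is attached to a table. The sketch below uses a hypothetical, undecorated variant named `top_label` that also guards against an empty `scores` list, a case in which `max(scores)` would raise a `ValueError`. Whether the detector ever returns empty lists for these images, and how a `None` result should be typed when registering such a function as a UDF, are left as open assumptions here.

```python
from typing import Optional


def top_label(detect: dict) -> Optional[str]:
    """Return the label with the highest score, or None if nothing was detected."""
    scores = detect['scores']
    if not scores:
        return None
    # Index of the highest score; ties resolve to the first occurrence.
    best = max(range(len(scores)), key=scores.__getitem__)
    return detect['label_text'][best]


# Sample taken from the `detections` output earlier in the tutorial.
sample = {
    'boxes': [[51.942, 356.174, 181.481, 413.975], [383.225, 58.66, 605.64, 361.346]],
    'labels': [25, 25],
    'scores': [0.99, 0.999],
    'label_text': ['giraffe', 'giraffe'],
}

print(top_label(sample))
print(top_label({'scores': [], 'label_text': []}))
```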

In [19]:
t.add_computed_column(top=top_detection(t.detections))

Added 5 column values with 0 errors in 0.11 s (45.52 rows/s)


5 rows updated.

In [20]:
t.select(t.detections_text, t.top).show()

detections_text,top
"[""person"", ""person"", ""bench"", ""person"", ""elephant"", ""elephant"", ""person""]",elephant
"[""zebra""]",zebra
"[""giraffe"", ""giraffe""]",giraffe
"[""dog"", ""dog""]",dog
"[""vase"", ""potted plant""]",vase


Congratulations! You've reached the end of the tutorial. Hopefully, this gives a good overview of the capabilities of Pixeltable, but there's much more to explore. As a next step, you might check out one of the other tutorials, depending on your interests:

- [Object Detection in Videos](https://docs.pixeltable.com/howto/use-cases/object-detection-in-videos)
- [RAG Operations in Pixeltable](https://docs.pixeltable.com/howto/use-cases/rag-operations)
- [Working with OpenAI in Pixeltable](https://docs.pixeltable.com/howto/providers/working-with-openai)