[![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://github.com/pixeltable/pixeltable/blob/master/docs/tutorials/pixeltable-basics.ipynb)&nbsp;&nbsp;
<a target="_blank" href="https://colab.research.google.com/github/pixeltable/pixeltable/blob/master/docs/tutorials/pixeltable-basics.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Pixeltable Basics

Welcome to Pixeltable! In this tutorial, we'll survey how to create tables, populate them with data, and enhance them with built-in and user-defined transformations and AI operations.

If you want to follow along with this tutorial interactively, you have two options:
- Use a Kaggle or Colab container (easiest): Click on one of the badges above.
- Locally in a self-managed Python environment: You'll probably want to create your own empty notebook, then copy-paste each command from the website. Be sure your Jupyter kernel is running in a Python virtual environment; you can check out the [Getting Started with Pixeltable](https://pixeltable.github.io/pixeltable/getting-started/) guide for step-by-step instructions.

## Install Python Packages

First, run the following command to install Pixeltable and the related libraries needed for this tutorial.

In [None]:
%pip install torch transformers openai pixeltable

## Creating a Table

Let's begin by creating a table that can hold image data. The first thing to do is to instantiate a Pixeltable client.

In [2]:
import pixeltable as pxt
cl = pxt.Client()

Connected to Pixeltable database at: postgresql://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata


Next we create a namespace (if it doesn't already exist) along with our new table, `demo.first`. The table will initially have just a single column to hold our input images, which we'll call `input_image`. We also need to specify a type for the column: `pxt.ImageType()`.

In [3]:
# Create the namespace `demo` (if it doesn't already exist)
cl.create_dir('demo', ignore_errors=True)

# Create the table `demo.first` with a single column `input_image`
t = cl.create_table('demo.first', {'input_image': pxt.ImageType()})

Created directory `demo`.
Created table `first`.


We can use `t.describe()` to examine the table schema. We see that it now contains a single column, as expected:

In [4]:
t.describe()

Column Name,Type,Computed With
input_image,image,


The new table is initially empty, with no rows:

In [5]:
t.count()

0

Now let's put an image into it! We can add images simply by giving Pixeltable their URLs. The example images in this demo come from the [COCO dataset](https://cocodataset.org/), and we'll be referencing copies of them in the Pixeltable GitHub repo. But in practice, the images can come from anywhere: an S3 bucket, say, or the local file system.

When we add the image, we see that Pixeltable gives us some useful status updates indicating that the operation was successful.

In [6]:
t.insert(input_image='https://raw.github.com/pixeltable/pixeltable/master/docs/source/data/images/000000000009.jpg')

Inserting rows into `first`: 1 rows [00:00, 144.95 rows/s]
Inserted 1 row with 0 errors.


UpdateStatus(num_rows=1, num_computed_values=0, num_excs=0, updated_cols=[], cols_with_excs=[])

We can use `t.show()` to examine the contents of the table.

In [7]:
t.show()

input_image


## Adding Computed Columns

Great! Now we have a table containing some data. Let's add an object detection model to our workflow. Specifically, we're going to use `facebook/detr-resnet-50`, a DETR ("DEtection TRansformer") object detection model with a ResNet-50 backbone, hosted on Hugging Face. Pixeltable contains a built-in adapter for this model family, so all we have to do is call the `detr_for_object_detection` Pixeltable function. A nice thing about Hugging Face models like this one is that they run locally, so you don't need an account with a service provider in order to use them.

This is our first example of a __computed column__, a key concept in Pixeltable. Recall that when we created the `input_image` column, we specified a type, `ImageType`, indicating our intent to populate it with data in the future. When we create a _computed_ column, we instead specify a function that operates on other columns of the table. By default, when we add the new computed column, Pixeltable immediately evaluates it against all existing data in the table - in this case, by calling the `detr_for_object_detection` function on the image.

Depending on your setup, it may take a minute for the function to execute. In the background, Pixeltable is downloading the model from Hugging Face (if necessary), instantiating it, and caching it for later use.

In [8]:
from pixeltable.functions import huggingface
t['detect'] = huggingface.detr_for_object_detection(t.input_image, model_id='facebook/detr-resnet-50')

Added column `detect` to table `first`.
Computing cells: 100%|████████████████████████████████████████████| 1/1 [00:05<00:00,  5.48s/ cells]
Added 1 column value with 0 errors.


Let's examine the results.

In [9]:
t.show()

input_image,detect
,"{'boxes': [[0.5802387595176697, 0.08485905826091766, 0.7063707709312439, 0.18905319273471832], [0.7233275175094604, 0.08670079708099365, 0.8174974918365479, 0.18644198775291443], [0.0006367266178131104, 0.027185142040252686, 0.6707271337509155, 0.8104317784309387], [-1.1205673217773438e-05, 0.39773815870285034, 0.9554940462112427, 0.9870153069496155], [0.5725255608558655, 0.0005173087120056152, 0.7135650515556335, 0.14135970175266266], [0.40589094161987305, 0.48307526111602783, 0.8861547708511353, 0.9869147539138794], [0.6049936413764954, 0.14735108613967896, 0.7330133318901062, 0.2990023195743561], [0.48979803919792175, 0.0009566247463226318, 0.9874167442321777, 0.5126445293426514]], 'labels': [55, 55, 51, 51, 55, 56, 55, 51], 'scores': [0.9640840291976929, 0.9740563035011292, 0.9654219150543213, 0.9887107014656067, 0.9860836863517761, 0.9976664781570435, 0.9637528657913208, 0.9985143542289734], 'label_text': ['orange', 'orange', 'bowl', 'bowl', 'orange', 'broccoli', 'orange', 'bowl']}"


We see that the model returned a JSON struct containing a lot of information. In particular, it has the following fields:
- `label_text`: Descriptions of the objects detected
- `boxes`: Bounding boxes for each detected object
- `scores`: Confidence scores for each detection
- `labels`: The DETR model's internal IDs for the detected objects
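
To make the layout concrete, the sketch below rebuilds a shortened copy of the struct as a plain Python dict and walks through it. The values are the first three detections from the output above, rounded to four decimals for readability. The names `detection` and `id_to_text` are illustrative only, and nothing here calls Pixeltable.

```python
# A shortened, rounded copy of the `detect` output shown above (first three detections only).
detection = {
    'boxes': [
        [0.5802, 0.0849, 0.7064, 0.1891],
        [0.7233, 0.0867, 0.8175, 0.1864],
        [0.0006, 0.0272, 0.6707, 0.8104],
    ],
    'labels': [55, 55, 51],
    'scores': [0.9641, 0.9741, 0.9654],
    'label_text': ['orange', 'orange', 'bowl'],
}

# The four lists are parallel: index i in each list describes the same detected object.
for label_id, text, score, box in zip(
    detection['labels'], detection['label_text'], detection['scores'], detection['boxes']
):
    print(f'{text} (id={label_id}): score={score:.2f}, box={box}')

# Map each internal label ID to its text description.
id_to_text = dict(zip(detection['labels'], detection['label_text']))
print(id_to_text)
```

Because the lists line up index by index, any per-object question (which label, how confident, where in the image) reduces to picking an index and reading the same position in each list.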

Perhaps this is more than we need, and all we really want are the text labels. We could add another computed column to extract `label_text` from the JSON struct:

In [10]:
t['detect_text'] = t.detect.label_text
t.show()

Added column `detect_text` to table `first`.
Computing cells: 100%|███████████████████████████████████████████| 1/1 [00:00<00:00, 246.25 cells/s]
Added 1 column value with 0 errors.


input_image,detect,detect_text
,"{'boxes': [[0.5802387595176697, 0.08485905826091766, 0.7063707709312439, 0.18905319273471832], [0.7233275175094604, 0.08670079708099365, 0.8174974918365479, 0.18644198775291443], [0.0006367266178131104, 0.027185142040252686, 0.6707271337509155, 0.8104317784309387], [-1.1205673217773438e-05, 0.39773815870285034, 0.9554940462112427, 0.9870153069496155], [0.5725255608558655, 0.0005173087120056152, 0.7135650515556335, 0.14135970175266266], [0.40589094161987305, 0.48307526111602783, 0.8861547708511353, 0.9869147539138794], [0.6049936413764954, 0.14735108613967896, 0.7330133318901062, 0.2990023195743561], [0.48979803919792175, 0.0009566247463226318, 0.9874167442321777, 0.5126445293426514]], 'labels': [55, 55, 51, 51, 55, 56, 55, 51], 'scores': [0.9640840291976929, 0.9740563035011292, 0.9654219150543213, 0.9887107014656067, 0.9860836863517761, 0.9976664781570435, 0.9637528657913208, 0.9985143542289734], 'label_text': ['orange', 'orange', 'bowl', 'bowl', 'orange', 'broccoli', 'orange', 'bowl']}","[orange, orange, bowl, bowl, orange, broccoli, orange, bowl]"


If we inspect the table schema now, we see how Pixeltable distinguishes between ordinary and computed columns.

In [11]:
t.describe()

Column Name,Type,Computed With
input_image,image,
detect,json,"huggingface.detr_for_object_detection(input_image, model_id='facebook/detr-resnet-50')"
detect_text,json,detect.label_text


Now let's add some more images to our table. This demonstrates another important feature of computed columns: by default, they update incrementally any time new data becomes available upstream. In this case, Pixeltable will run the DETR model against each new image that is added, then extract the labels into the `detect_text` column. Pixeltable will orchestrate the execution of any sequence (or DAG) of computed columns.

Note how we can pass multiple rows to `t.insert` with a single statement, which will insert them more efficiently.

In [12]:
more_images = [
    'https://raw.github.com/pixeltable/pixeltable/master/docs/source/data/images/000000000025.jpg',
    'https://raw.github.com/pixeltable/pixeltable/master/docs/source/data/images/000000000030.jpg',
    'https://raw.github.com/pixeltable/pixeltable/master/docs/source/data/images/000000000034.jpg',
    'https://raw.github.com/pixeltable/pixeltable/master/docs/source/data/images/000000000042.jpg',
    'https://raw.github.com/pixeltable/pixeltable/master/docs/source/data/images/000000000061.jpg'
]
t.insert({'input_image': image} for image in more_images)

Computing cells:  50%|█████████████████████▌                     | 5/10 [00:10<00:10,  2.07s/ cells]
Inserting rows into `first`: 5 rows [00:00, 1580.73 rows/s]
Computing cells: 100%|██████████████████████████████████████████| 10/10 [00:10<00:00,  1.04s/ cells]
Inserted 5 rows with 0 errors.


UpdateStatus(num_rows=5, num_computed_values=10, num_excs=0, updated_cols=[], cols_with_excs=[])
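
The incremental behavior can be pictured with a toy in-memory table. The sketch below is only a mental model in plain Python, not how Pixeltable is implemented: `ToyTable`, `add_computed_column`, and the example columns are all hypothetical names invented for this illustration. It shows the two rules described above: a new computed column is evaluated against existing rows, and every later insert fills in the computed columns in order.

```python
from typing import Any, Callable


class ToyTable:
    """A toy in-memory stand-in for a table with computed columns (not Pixeltable)."""

    def __init__(self) -> None:
        self.rows: list[dict[str, Any]] = []
        # Computed columns in the order they were added: (name, function of a row).
        self.computed: list[tuple[str, Callable[[dict[str, Any]], Any]]] = []

    def add_computed_column(self, name: str, fn: Callable[[dict[str, Any]], Any]) -> None:
        # Evaluate the new column against every row already in the table.
        for row in self.rows:
            row[name] = fn(row)
        self.computed.append((name, fn))

    def insert(self, **values: Any) -> None:
        row = dict(values)
        # Evaluate computed columns in order, so later ones can read earlier ones.
        for name, fn in self.computed:
            row[name] = fn(row)
        self.rows.append(row)


toy = ToyTable()
toy.insert(text='two giraffes')
toy.add_computed_column('words', lambda row: row['text'].split())
toy.add_computed_column('n_words', lambda row: len(row['words']))
toy.insert(text='a zebra grazing')  # both computed columns are filled in automatically
print(toy.rows)
```

Here `n_words` depends on `words`, mirroring how `detect_text` depends on `detect`. The toy handles only a linear chain of columns, and it keeps nothing on disk.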

Let's see what the model came up with. We'll use `t.select` to suppress the display of the `detect` column, since right now we're only interested in the text labels.

In [13]:
t.select(t.input_image, t.detect_text).show()

input_image,detect_text
,"[orange, orange, bowl, bowl, orange, broccoli, orange, bowl]"
,"[giraffe, giraffe]"
,"[vase, potted plant]"
,[zebra]
,"[dog, dog]"
,"[person, person, bench, person, elephant, elephant, person]"


## Pixeltable Is Persistent

An important feature of Pixeltable is that _everything is persistent_. Unlike in-memory Python libraries such as Pandas, Pixeltable is a database: all your data, transformations, and computed columns are stored and preserved between sessions. To see this, let's clear all the variables in our notebook and start fresh with a new client.

In [14]:
# Clear all variables in the notebook
%reset -f

# Instantiate a new client object
import pixeltable as pxt
cl = pxt.Client()
t = cl.get_table('demo.first')

# Display just the first two rows, to avoid cluttering the tutorial
t.select(t.input_image, t.detect_text).show(2)

input_image,detect_text
,"[orange, orange, bowl, bowl, orange, broccoli, orange, bowl]"
,"[giraffe, giraffe]"


## GPT-4 Vision

For comparison, let's try running our examples through a generative model, OpenAI's GPT-4 Vision. For this section, you'll need an OpenAI account with an API key. You can use the following command to add your API key to the environment (replace the string `my_api_key` with your API key).
<!--
Create a file `~/.pixeltable/config.yaml` with the following structure:
```
openai:
  api_key: <my_api_key>
```
Replace the text `<my_api_key>` with your API key. Then restart the kernel again (Kernel / Restart Kernel) to ensure Pixeltable picks up the API key, and issue the following commands. Be patient; it sometimes takes time for OpenAI to fulfill the queries.
-->

In [None]:
import os
os.environ['OPENAI_API_KEY'] = 'my_api_key'

Now we can connect to OpenAI through Pixeltable.

In [2]:
from pixeltable.functions import openai
t['vision'] = openai.vision(prompt="Describe what's in this image.", image=t.input_image)

Computing cells: 100%|████████████████████████████████████████████| 6/6 [01:08<00:00, 11.46s/ cells]
Added 6 column values with 0 errors.


Let's see how GPT-4's responses compare to the output of the traditional discriminative (DETR) model.

In [3]:
t.select(t.input_image, t.detect_text, t.vision).show()

input_image,detect_text,vision
,"[giraffe, giraffe]","The image features two giraffes in what appears to be a natural habitat enclosure, likely in a zoo or wildlife park. The closest giraffe is standing upright and seems to be either feeding on leaves from a tree or possibly using its long neck to reach for something higher up. The distinct pattern of spots and the elongated neck immediately identify these animals as giraffes. In the background, trees and foliage suggesting a well-vegetated area can be seen, creating a serene environment. There's no visible fencing immediately around the giraffes, which gives the scene a more natural look, although safety measures are likely in place but not captured in the photo."
,"[orange, orange, bowl, bowl, orange, broccoli, orange, bowl]","The image depicts a colorful segmented lunch container filled with an assortment of foods. \n\nTop left compartment: It has a section with what looks like a piece of bread with butter on it and some almonds. \n\nTop right compartment: This section contains what appears to be canned or fresh pineapple chunks.\n\nBottom left compartment: This portion of the lunch box seems to contain some form of meatball or veggie ball alongside what could be a serving of condiment or dip, likely tomato-based given its color. \n\nBottom right compartment: There's a portion of steamed or blanched green vegetable, which looks like broccoli.\n\nForeground/bottom of the image: It features what seem to be chocolate chip cookies, suggesting a dessert option included in the meal.\n\nOverall, the lunch box seems to offer a balanced meal with components from various food groups, suggesting an emphasis on nutrition. The vibrant colors of the containers add a playful and inviting touch to the presentation of the food."
,"[vase, potted plant]","This is an image of a bouquet of flowers arranged in a white, decorative vase. The vase is placed on a white surface, which appears to be a ledge or railing, given the cast shadows suggesting outdoor lighting. The flowers within the vase are a mix of colors, primarily white with some deep pink or red accents, and there are a variety of blooms and foliage types contributing to the overall arrangement. The background of the image is out of focus, but it shows a bright, sunny day with greenery that hints at a garden or lawn setting."
,[zebra],"In the image, there is a zebra grazing on grass. The zebra is depicted from a side angle, allowing for a clear view of its distinctive black-and-white striped pattern, which covers its body, legs, and head. The mane stands erect along the neck, and the animal is shown in a green field under bright sunlight, which casts shadows on the ground. The zebra appears to be in a peaceful setting, perhaps in a natural reserve or a wildlife park."
,"[dog, dog]","This image shows a small tan or light brown curly-furred dog sleeping or resting on a wire shoe rack. The dog appears to be very comfortable nestled among various shoes, including sandals, sneakers, and possibly a garden shoe. The presence of a sports racket cover with the label ""Rucanor"" indicates that someone in the household may play racket sports. The shoe rack is situated in a space with a terracotta tiled floor, suggesting an indoor location, such as a mudroom, garage, or entryway to a home. The blue item in the top right corner is not fully visible, so it's difficult to identify what it is; it could be a piece of equipment or decoration. Overall, the scene depicts a cozy and somewhat amusing moment with the dog finding an unconventional spot to relax among the footwear."
,"[person, person, bench, person, elephant, elephant, person]","In the image, there are two elephants with riders on their backs. The elephants are adorned with what appears to be riding gear, including seats for the riders. They are walking through a dense jungle environment, with thick green foliage surrounding them. The setting is lush and verdant, suggesting a tropical or subtropical climate. It looks like a scene commonly associated with elephant trekking or safari experiences offered in certain regions where elephants are part of the local fauna and tourism activities."


In addition to adapters for local models and inference APIs, Pixeltable can of course do a range of more basic image operations. These image operations can be seamlessly chained with API calls, and Pixeltable will keep track of the sequence of operations, constructing new images and caching when necessary to keep things running smoothly. Just for fun (and to demonstrate the power of computed columns), let's see what OpenAI thinks of our sample images when we rotate them by 180 degrees.

In [4]:
t['rot_image'] = t.input_image.rotate(180)
t['rot_vision'] = openai.vision(prompt="Describe what's in this image.", image=t.rot_image)

Computing cells: 100%|████████████████████████████████████████████| 6/6 [01:04<00:00, 10.71s/ cells]
Added 6 column values with 0 errors.


In [5]:
t.select(t.rot_image, t.rot_vision).show()

rot_image,rot_vision
,"This image appears to be rotated upside down. It shows a giraffe reaching down, likely to feed on some vegetation. You can see the distinctive brown and white patterned coat of the giraffe, its long neck, and the surrounding environment which includes trees and shrubs. The viewpoint makes it seem as though the giraffe is defying gravity, but in reality, the image is just flipped, which creates a playful perspective."
,"The image shows a colorful meal arranged in a sectional container. Each section is a different vibrant color, making it visually appealing, particularly for a child or someone who enjoys playful presentations.\n\nIn the top section, which is yellow, there appears to be some curly, dark green leafy vegetables, which could be kale or a similar type of greens. Next to them, there are two brown baked items that could be some sort of muffin or meatless patties.\n\nThe middle section, which is pink, contains a handful of almonds, and adjacent to that appears to be some dried fruit, possibly apple chips, considering their shape and color.\n\nIn the bottom section, which is a purplish pink, there are some chunks of fresh fruit, likely pineapple, and some white bread with a spread that could be butter or margarine.\n\nOverall, the arrangement looks like a balanced meal with a focus on plant-based foods, suitable for a packed lunch or a snack box with diverse tastes and textures."
,"This image features a bouquet of flowers hanging upside down from what appears to be the underside of a white ledge or shelf. The bouquet includes a variety of flowers in different colors—including white and pink— intermixed with some greenery. The background is diffused with natural light, likely depicting an outdoor setting, with more greenery and what might be a tree seen blurred in the distance. This upside-down hanging of flowers could be a part of a decorative arrangement or a method for drying flowers."
,"The image shows a zebra lying on its back on a grassy ground. The zebra's legs are in the air and its head is turned to one side, looking towards the camera. The zebra's distinctive black and white striped pattern is clearly visible against the green background of the grass. It appears to be a sunny day, and the zebra seems relaxed in this unusual, playful, or resting position."
,"This image shows a small part of a dog's body, likely the head or neck area, peeking through a white wire rack or shelving unit. The dog appears to have curly fur, possibly indicating that it's a poodle or a poodle mix. On the wire rack above the dog, there are various pairs of shoes, including what appears to be sneakers, sandals, and flip-flops. One of the flip-flops has a red strap with cartoon decorations on it. The background suggests an indoor setting with a tiled floor."
,"The image shows an upside-down view of a dense forest area. There's lush green foliage throughout, and it appears to be a tropical or subtropical environment given the density and type of vegetation. The foliage is thick enough that the forest floor or sky is not visible. In the middle of the image, there is a clearing where a part of a vehicle is showing through the greenery. The upside-down presentation of the image gives it an unusual and disorienting effect."


## UDFs: Enhancing Pixeltable's Capabilities

Another important principle of Pixeltable is that, although Pixeltable has a built-in library of useful operations and adapters, it will never prescribe a particular way of doing things. Pixeltable is built from the ground up to be extensible.

Let's take a specific example. Recall our use of the DETR detection model, whose output in the `detect` column is a JSON struct with bounding boxes, scores, and labels. Suppose we want to create a column containing the single label with the highest confidence score. There's no built-in Pixeltable function to do this, but it's easy to write our own. In fact, all we have to do is write a Python function that does the thing we want, and stamp it with the `@pxt.udf` decorator.

In [6]:
@pxt.udf
def detect_top(detect: dict) -> str:
    scores = detect['scores']
    label_text = detect['label_text']
    # Get the index of the object with the highest confidence
    i = scores.index(max(scores))
    # Return the corresponding label
    return label_text[i]
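
Since the body of a UDF is ordinary Python, its logic can be checked outside Pixeltable before it is attached to a table. The sketch below is an undecorated variant with a hypothetical name, `top_label`, that also guards against an image with no detections (where `max` on an empty list would raise `ValueError`). Returning `None` in that case is an assumption made for this illustration; how a `@pxt.udf` should represent a missing value is not covered here.

```python
from typing import Optional


def top_label(detect: dict) -> Optional[str]:
    """Return the label with the highest confidence score, or None if nothing was detected."""
    scores = detect['scores']
    if not scores:
        return None
    # Index of the highest score; ties resolve to the first occurrence.
    best = max(range(len(scores)), key=scores.__getitem__)
    return detect['label_text'][best]


# Scores and labels copied from the first image's `detect` output earlier in this tutorial.
sample = {
    'scores': [0.9640840291976929, 0.9740563035011292, 0.9654219150543213, 0.9887107014656067,
               0.9860836863517761, 0.9976664781570435, 0.9637528657913208, 0.9985143542289734],
    'label_text': ['orange', 'orange', 'bowl', 'bowl', 'orange', 'broccoli', 'orange', 'bowl'],
}
print(top_label(sample))                            # bowl
print(top_label({'scores': [], 'label_text': []}))  # None
```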

In [7]:
t['top'] = detect_top(t.detect)

Computing cells: 100%|███████████████████████████████████████████| 6/6 [00:00<00:00, 680.27 cells/s]
Added 6 column values with 0 errors.


In [8]:
t.select(t.detect_text, t.top).show()

detect_text,top
"[orange, orange, bowl, bowl, orange, broccoli, orange, bowl]",bowl
"[vase, potted plant]",vase
[zebra],zebra
"[giraffe, giraffe]",giraffe
"[dog, dog]",dog
"[person, person, bench, person, elephant, elephant, person]",elephant


Congratulations! You've reached the end of the tutorial. Hopefully, this gives a good overview of the capabilities of Pixeltable, but there's much more to explore. As a next step, you might check out one of the other tutorials, depending on your interests:
- [RAG Operations in Pixeltable](https://pixeltable.github.io/pixeltable/tutorials/rag-demo/)
- [Object Detection in Videos](https://pixeltable.github.io/pixeltable/tutorials/object-detection-in-videos/)
- [Using the OpenAI API with Pixeltable](https://pixeltable.github.io/pixeltable/tutorials/openai-demo/)