[![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://github.com/pixeltable/pixeltable/blob/release/docs/release/fundamentals/queries-and-expressions.ipynb)&nbsp;&nbsp;
[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pixeltable/pixeltable/blob/release/docs/release/fundamentals/queries-and-expressions.ipynb)&nbsp;&nbsp;
<a href="https://raw.githubusercontent.com/pixeltable/pixeltable/release/docs/release/fundamentals/queries-and-expressions.ipynb" download><img src="https://img.shields.io/badge/%E2%AC%87-Download%20Notebook-blue" alt="Download Notebook"></a>

# Pixeltable Fundamentals

## Section 3: Queries and Expressions

Welcome to Section 3 of the __Pixeltable Fundamentals__ tutorial, __Queries and Expressions__.

In the previous section of this tutorial, [Computed Columns](https://docs.pixeltable.com/docs/computed-columns), we saw how to issue queries over Pixeltable tables, such as:

```python
pop_t.select(yoy_change=(pop_t.pop_2023 - pop_t.pop_2022)).collect()
```

We also saw how to define __computed columns__ that become part of the table and are updated automatically when new rows are inserted:

```python
pop_t.add_column(yoy_change=(pop_t.pop_2023 - pop_t.pop_2022))
```

Both these examples reference the _Pixeltable expression_ `pop_t.pop_2023 - pop_t.pop_2022`. We've seen a number of other expressions as well, such as the chain of image operations

```python
t.source.convert('RGBA').rotate(10)
```

and the model invocation

```python
detr_for_object_detection(
    t.source,
    model_id='facebook/detr-resnet-50',
    threshold=0.8
)
```

Expressions are the basic building blocks of Pixeltable workloads. An expression can be included in a `select()` statement, which will cause it to be evaluated dynamically, or in an `add_column()` statement, which will add it to the table schema as a computed column. In this section, we'll dive deeper into the different kinds of Pixeltable expressions and their uses. We'll:

- Understand the relationship between Pixeltable expressions and query execution
- Survey the different types of expressions and queries
- Learn more about the Pixeltable type system

To get started, let's import the necessary libraries for this tutorial and set up an example table.

In [None]:
%pip install -qU pixeltable datasets timm

In this section of the tutorial, we're going to work with a subset of the MNIST dataset, a classic reference database of handwritten digits. A copy of the MNIST dataset is hosted on the Hugging Face datasets repository, so we can use Pixeltable's built-in Hugging Face data importer to load it into a Pixeltable table.

In [1]:
import pixeltable as pxt

pxt.drop_dir('demo', force=True)
pxt.create_dir('demo')

Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
Created directory `demo`.


<pixeltable.catalog.dir.Dir at 0x338ff7d60>

In [2]:
import datasets

# Download the first 50 images of the MNIST dataset
ds = datasets.load_dataset('ylecun/mnist', split='train[:50]')

# Import them into a Pixeltable table
t = pxt.io.import_huggingface_dataset('demo.mnist', ds)

Created table `mnist_tmp_33217005`.
Inserting rows into `mnist_tmp_33217005`: 50 rows [00:00, 17081.96 rows/s]
Inserted 50 rows with 0 errors.


In [3]:
t.head(3)

image,label
,5
,0
,4


### Column References

The most basic type of expression is a __column reference__: that's what you get when you type, say, `t.image`. An expression by itself is just a Python object; it doesn't contain any actual data, and no data will be loaded until you use the expression in a `select()` query or `add_column()` statement. Here's what we get if we type `t.image` by itself:

In [4]:
t.image

Column Name,Type,Computed With
image,Image,


This is true of all Pixeltable expressions: we can freely create them and manipulate them in various ways, but no actual data will be loaded until we use them in a query.
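
To make the idea of "an expression is a description, not data" concrete, here is a minimal plain-Python sketch. It is not how Pixeltable is implemented: `LazyExpr`, `col`, and `select` are hypothetical names invented for this illustration, and the two rows are made-up toy values.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class LazyExpr:
    """Hypothetical stand-in for an expression: a recipe for a value, holding no data."""
    description: str
    fn: Callable[[dict], Any]

    def __sub__(self, other: "LazyExpr") -> "LazyExpr":
        # Combining two expressions builds a larger expression; nothing is computed here.
        return LazyExpr(
            f"({self.description} - {other.description})",
            lambda row: self.fn(row) - other.fn(row),
        )


def col(name: str) -> LazyExpr:
    """Hypothetical column reference: remembers the column name only."""
    return LazyExpr(name, lambda row: row[name])


def select(rows: list, expr: LazyExpr) -> list:
    """Hypothetical query: the only place where the expression touches data."""
    return [expr.fn(row) for row in rows]


# Made-up toy rows, used only to exercise the sketch.
rows = [{"pop_2022": 100, "pop_2023": 110}, {"pop_2022": 50, "pop_2023": 45}]

yoy_change = col("pop_2023") - col("pop_2022")  # builds an expression, reads no rows
print(yoy_change.description)                   # (pop_2023 - pop_2022)
print(select(rows, yoy_change))                 # [10, -5]
```

Building `yoy_change` never looks at `rows`; the values appear only once `select` runs, which mirrors the split between creating an expression and using it in a query.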

### JSON Collections (Dicts and Lists)

Data is commonly presented in JSON format; for example, API responses and model output often take the shape of JSON dictionaries or lists of dictionaries. Pixeltable has native support for JSON accessors. To demonstrate this, let's run an image classification model against the images in our dataset.

In [8]:
from pixeltable.functions.huggingface import vit_for_image_classification

t.add_column(classification=vit_for_image_classification(
    t.image, model_id='farleyknight-org-username/vit-base-mnist'
))

Computing cells: 100%|██████████████████████████████████████████| 50/50 [00:01<00:00, 30.87 cells/s]
Added 50 column values with 0 errors.


UpdateStatus(num_rows=50, num_computed_values=50, num_excs=0, updated_cols=[], cols_with_excs=[])

In [10]:
t.select(t.classification).head(3)

classification
"[{""p"": 0.981, ""class"": 5, ""label"": ""5""}, {""p"": 0.013, ""class"": 3, ""label"": ""3""}, {""p"": 0.002, ""class"": 2, ""label"": ""2""}, {""p"": 0.001, ""class"": 8, ""label"": ""8""}, {""p"": 0.001, ""class"": 7, ""label"": ""7""}]"
"[{""p"": 0.997, ""class"": 0, ""label"": ""0""}, {""p"": 0., ""class"": 6, ""label"": ""6""}, {""p"": 0., ""class"": 9, ""label"": ""9""}, {""p"": 0., ""class"": 8, ""label"": ""8""}, {""p"": 0., ""class"": 1, ""label"": ""1""}]"
"[{""p"": 0.997, ""class"": 4, ""label"": ""4""}, {""p"": 0.001, ""class"": 1, ""label"": ""1""}, {""p"": 0., ""class"": 9, ""label"": ""9""}, {""p"": 0., ""class"": 7, ""label"": ""7""}, {""p"": 0., ""class"": 0, ""label"": ""0""}]"


We see that the output is returned as a list of dicts, each containing a class number, a label (in this case, just the string form of the class number), and a probability. The Pixeltable type of the `classification` column is `pxt.Json`:

In [11]:
t

Column Name,Type,Computed With
image,Image,
label,String,
classification,Required[Json],"vit_for_image_classification(image, model_id='farleyknight-org-username/vit-base-mnist')"
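
As a plain-Python illustration of the shape of a single `classification` value, the sketch below copies the first row's value from the output shown earlier and treats it as an ordinary list of dicts. It does not call Pixeltable; the variable names are chosen here for illustration only.

```python
import json

# Value of the `classification` column for the first row, copied from the output above.
row_value = [
    {"p": 0.981, "class": 5, "label": "5"},
    {"p": 0.013, "class": 3, "label": "3"},
    {"p": 0.002, "class": 2, "label": "2"},
    {"p": 0.001, "class": 8, "label": "8"},
    {"p": 0.001, "class": 7, "label": "7"},
]

# The value uses only JSON-compatible types, so it survives a round trip through JSON text.
assert json.loads(json.dumps(row_value)) == row_value

# Pick the entry with the highest probability.
best = max(row_value, key=lambda entry: entry["p"])
print(best["label"], best["p"])  # 5 0.981
```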


Pixeltable provides a range of operators on `Json`-typed output that behave just as you'd expect. To select a specific entry from a list, use the syntax `t.classification[0]`:

In [12]:
t.select(t.classification[0]).head(3)

classification_0
"{""p"": 0.981, ""class"": 5, ""label"": ""5""}"
"{""p"": 0.997, ""class"": 0, ""label"": ""0""}"
"{""p"": 0.997, ""class"": 4, ""label"": ""4""}"


`t.classification[0]` is another Pixeltable expression; you can think of it as saying, "extract element 0 from every list in the column `t.classification`, and return the result as a new column." As before, the expression by itself contains no data; it's the query that actually does the work of retrieving data. Here's what we see if we just give the expression by itself, without a query:

In [13]:
t.classification[0]

<pixeltable.exprs.json_path.JsonPath at 0x458524ac0>

Pixeltable supports slicing over lists in the usual way:

In [14]:
t.select(t.classification[0:2]).head(3)

classification_02
"[{""p"": 0.981, ""class"": 5, ""label"": ""5""}, {""p"": 0.013, ""class"": 3, ""label"": ""3""}]"
"[{""p"": 0.997, ""class"": 0, ""label"": ""0""}, {""p"": 0., ""class"": 6, ""label"": ""6""}]"
"[{""p"": 0.997, ""class"": 4, ""label"": ""4""}, {""p"": 0.001, ""class"": 1, ""label"": ""1""}]"


Dictionary values can be looked up by key:

In [15]:
t.select(t.classification[0]['class']).head(3)

classification_0_class
5
0
4


You can also use "attribute" syntax for dictionary lookups:

In [17]:
t.select(t.classification[0].p).head(3)

classification_0_p
0.981
0.997
0.997


The "attribute" syntax isn't fully general (it won't work for dictionary keys that are not valid Python identifiers), but it's handy when it works.

Pixeltable is resilient against out-of-bounds indices and missing dictionary keys. If an index or key doesn't exist for a particular row, you'll get a `None` output for that row.
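
The same "missing means `None`" behavior can be sketched in plain Python. This is only an analogy for the semantics described here, not Pixeltable's implementation: `get_path` is a hypothetical helper, and the sample row reuses the first two entries from the output shown earlier.

```python
from typing import Any


def get_path(value: Any, *steps: Any) -> Any:
    """Follow indices, slices, and keys into nested JSON data.

    Returns None as soon as any step cannot be resolved, instead of raising.
    """
    for step in steps:
        try:
            value = value[step]
        except (IndexError, KeyError, TypeError):
            return None
    return value


# First two entries of one row's classification output.
row = [
    {"p": 0.981, "class": 5, "label": "5"},
    {"p": 0.013, "class": 3, "label": "3"},
]

print(get_path(row, 0, "label"))      # 5
print(get_path(row, 0, "not_a_key"))  # None
print(get_path(row, 7))               # None
```

Catching `TypeError` as well covers steps applied to the wrong kind of container, such as a string key applied to a list.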

In [21]:
t.select(t.classification[0].not_a_key).head(3)

classification_0_notakey
""
""
""


As always, any expression can be used to create a computed column.

In [20]:
t.add_column(pred_label=t.classification[0].label)
t

Column Name,Type,Computed With
image,Image,
label,String,
classification,Required[Json],"vit_for_image_classification(image, model_id='farleyknight-org-username/vit-base-mnist')"
pred_label,Required[Json],classification[0].label


### Function Calls

### Arithmetic and Boolean Operations

### Other Expressions