In [None]:
import sys, glob, os
import pandas as pd
import numpy as np
import PIL
import json
import urllib.request
import tempfile
import tqdm.notebook  # provides tqdm.notebook.tqdm, used for the download progress bar

import torch, torchvision
from torchvision import transforms
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

In [None]:
import pixeltable as pt
import pixeltable.functions
%load_ext autoreload
%autoreload 2

# Table of Contents

- [Creating databases and tables and inserting data](#Creating-databases-and-tables-and-inserting-data)
    - [Creating a table](#Creating-a-table)
    - [Inserting data](#Inserting-data)
    - [Versioning in Pixeltable](#Versioning-in-Pixeltable)
    - [Data persistence](#Data-persistence)
- [Retrieving data](#Retrieving-data)
    - [Filtering rows](#Filtering-rows)
    - [Selecting output](#Selecting-output)
        - [Operations on JSON data](#Operations-on-JSON-data)
        - [Operations on image data](#Operations-on-image-data)
        - [Image similarity search](#Image-similarity-search)
- [User-defined functions](#User-defined-functions)
    - [Stored functions](#Stored-functions)
    - [Computed columns](#Computed-columns)

# Creating databases and tables and inserting data

In Pixeltable, all data resides in tables, which in turn are assigned to databases.

Let's start by creating a client and a `tutorial` database:

In [None]:
cl = pt.Client()
cl.drop_db('tutorial', force=True)
db = cl.create_db('tutorial')

In this tutorial we're going to be working with a subset of the COCO dataset (10 samples each for the train, test, and validation splits). To avoid further installs, the tutorial comes pre-packaged with a data file (of JSON records) and a set of images, which we're going to download into a temporary directory now:

In [None]:
download_prefix = 'https://gitlab.com/pixeltable/python-sdk/-/raw/master/tutorials'
json_data_url = f'{download_prefix}/coco-records.json'

records = json.loads(urllib.request.urlopen(json_data_url).read().decode('utf-8'))

image_dir = tempfile.mkdtemp()
for r in tqdm.notebook.tqdm(records):
    filename = r['filepath'].split('/')[1]
    out_filepath = f'{image_dir}/{filename}'
    url = f'{download_prefix}/{r["filepath"]}'
    r['filepath'] = out_filepath
    img_data = urllib.request.urlopen(url).read()
    with open(out_filepath, 'wb') as img_file:
        img_file.write(img_data)

Each data record is a dictionary with top-level fields `filepath`, `tag`, `metadata`, and `ground_truth`:

In [None]:
records[0]

## Creating a table

A table for this data requires a column for each top-level field: `filepath`, `tag`, `metadata`, `ground_truth`.

Instead of storing the file path as a string, we declare an image-typed column `img`. The table columns are as follows:

In [None]:
schema = [
    pt.Column('img', pt.ImageType(), nullable=False, indexed=True),
    pt.Column('tag', pt.StringType(), nullable=False),
    pt.Column('metadata', pt.JsonType(), nullable=False),
    pt.Column('ground_truth', pt.JsonType(), nullable=True),
]

`nullable=False` means the values in this column can't be `None`; Pixeltable checks this when data is inserted. `indexed=True` tells Pixeltable to create a vector index of CLIP embeddings for the images in this column, which enables image and text similarity search. More on that later.
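
To make the `nullable` semantics concrete, here is a minimal sketch of the kind of check that runs at insertion time. It is not Pixeltable's code: `ColumnSpec` and `check_nulls()` are hypothetical names, and the sample rows (including the file names) are made up.

```python
from dataclasses import dataclass


@dataclass
class ColumnSpec:
    """Hypothetical stand-in for a column definition: a name and a nullability flag."""
    name: str
    nullable: bool


def check_nulls(schema, row):
    """Return the names of non-nullable columns whose value in `row` is None or missing."""
    return [col.name for col in schema if not col.nullable and row.get(col.name) is None]


# Mirrors the tutorial schema; only ground_truth may be None.
schema = [
    ColumnSpec('img', nullable=False),
    ColumnSpec('tag', nullable=False),
    ColumnSpec('metadata', nullable=False),
    ColumnSpec('ground_truth', nullable=True),
]

# Made-up rows for illustration.
ok_row = {'img': 'images/a.jpg', 'tag': 'train', 'metadata': {'width': 640}, 'ground_truth': None}
bad_row = {'img': 'images/b.jpg', 'tag': None, 'metadata': {'width': 480}}

print(check_nulls(schema, ok_row))   # []
print(check_nulls(schema, bad_row))  # ['tag']
```

A non-empty result is the situation in which an insert would be rejected.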

The available data types in Pixeltable are:

|Pixeltable type|Python type|
|:---|:---|
| `pt.StringType()`| `str` |
| `pt.IntType()`| `int` |
| `pt.FloatType()`| `float` |
| `pt.BoolType()`| `bool` |
| `pt.TimestampType()`| `datetime.datetime` |
| `pt.JsonType()`| lists and dicts that can be converted to JSON|
| `pt.ArrayType()`| `numpy.ndarray`|
| `pt.ImageType()`| `PIL.Image.Image`|
| `pt.VideoType()`| `str` (the file path)|
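
The table can be read as a lookup from Python values to column types. The sketch below uses a hypothetical `pixeltable_type_name()` helper that returns type names as plain strings, and it shows one subtlety of such a mapping: `bool` is a subclass of `int`, so it has to be tested first. NumPy arrays and PIL images are left out to keep it dependency-free.

```python
import datetime
import json


def pixeltable_type_name(value):
    """Hypothetical helper: name of the Pixeltable type (per the table above) for a value.

    NumPy arrays (ArrayType) and PIL images (ImageType) are omitted to keep the
    sketch free of third-party dependencies.
    """
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return 'BoolType'
    if isinstance(value, int):
        return 'IntType'
    if isinstance(value, float):
        return 'FloatType'
    if isinstance(value, str):
        # A str could also be a VideoType file path; the value alone can't tell.
        return 'StringType'
    if isinstance(value, datetime.datetime):
        return 'TimestampType'
    if isinstance(value, (list, dict)):
        json.dumps(value)  # raises TypeError if the value can't be converted to JSON
        return 'JsonType'
    raise TypeError(f'no Pixeltable type for {type(value).__name__}')


print(pixeltable_type_name(True))            # BoolType
print(pixeltable_type_name(640))             # IntType
print(pixeltable_type_name({'width': 640}))  # JsonType
```

Because a plain string could be either a `StringType` value or a `VideoType` file path, declaring column types in the schema is more reliable than inferring them from values.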


We then create a table `data`:

In [None]:
data = db.create_table('data', schema)

At this point, table `data` contains no data:

In [None]:
data.count()

## Inserting data

In order to populate `data` with the contents of `records`, we convert `records` into a Pandas DataFrame, rename its `filepath` column to `img` to match the table definition, and insert it with the `insert_pandas()` method:

In [None]:
pd_df = pd.DataFrame.from_records(records).rename({'filepath': 'img'}, axis=1)
data.insert_pandas(pd_df)
data.count()

In Pixeltable, images are 'inserted' as file paths, and Pixeltable only stores these paths and not the images themselves, so there is no duplication of storage.

Let's look at the first 3 rows:

In [None]:
data.show(3)

You can also insert the data directly, without first converting it to a Pandas DataFrame, with the `insert_rows()` method. It takes a list of rows, each of which is a list of column values in the order of the table's columns:

In [None]:
rows = [
    [r['filepath'], r['tag'], r['metadata'], r['ground_truth']] for r in records
]
rows[0]

In [None]:
data.insert_rows(rows)

We have now loaded our data twice:

In [None]:
data.count()

## Versioning in Pixeltable

Pixeltable maintains a version history for data changes to tables (i.e., inserting data and adding or dropping columns). The `revert()` method lets you go back to the preceding version.

For our table `data`, since we don't want duplicates, we revert the last update:

In [None]:
data.revert()
data.count()
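
Conceptually, a version history that supports `revert()` behaves like a stack of changes. The toy `VersionedTable` class below is a hypothetical model, not Pixeltable's implementation, and it only tracks inserts (not column changes). It reproduces the double insert and revert from above using 30 placeholder rows, one per record in the tutorial data.

```python
class VersionedTable:
    """Toy model of a table version history (hypothetical; tracks inserts only)."""

    def __init__(self):
        self.version = 0
        self._changes = []  # one list of rows per version, oldest first

    def insert_rows(self, rows):
        """Record an insert as a new version."""
        self._changes.append(list(rows))
        self.version += 1

    def revert(self):
        """Discard the most recent change, returning to the preceding version."""
        if not self._changes:
            raise RuntimeError('no change to revert')
        self._changes.pop()
        self.version -= 1

    def count(self):
        return sum(len(rows) for rows in self._changes)


t = VersionedTable()
t.insert_rows(range(30))     # first load: 30 placeholder rows
t.insert_rows(range(30))     # accidental second load
print(t.version, t.count())  # 2 60
t.revert()
print(t.version, t.count())  # 1 30
```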

## Data persistence

Unlike "computational containers" such as Pandas or Dask DataFrames, tables in Pixeltable are persistent. To illustrate that, let's create a new Pixeltable client and a new handle to the `data` table:

In [None]:
cl = pt.Client()

We already have a database `tutorial`, so now we call `get_db()` instead of `create_db()` (in fact, the latter would raise an exception). Likewise, we call `get_table()` to get a handle to the existing `data` table:

In [None]:
db = cl.get_db('tutorial')
data = db.get_table('data')
data.count()

# Retrieving data

The Pixeltable retrieval interface is patterned after Pandas DataFrame operations: the index operator (`[]`) is used both to select columns for output and to filter rows.

## Filtering rows

For example, to only look at test data:

In [None]:
data[data.tag == 'test'].show(2)

Or at data for images that are less than 640 pixels wide:

In [None]:
data[data.metadata.width < 640].show(2)

Pixeltable supports the standard comparison operators (`<`, `<=`, `>`, `>=`, `==`) and logical operators (`&` for `and`, `|` for `or`, `~` for `not`). As in Pandas, each comparison combined with a logical operator needs to be wrapped in parentheses, because Python's `&` and `|` bind more tightly than the comparison operators:

In [None]:
data[(data.tag == 'test') & (data.metadata.width < 640)].show(2)
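
Two pieces of plain Python explain this syntax. First, the parentheses are needed because `&` has higher precedence than `==` and `<`, which the standard `ast` module makes visible. Second, expressions like `data.tag == 'test'` can build predicate objects instead of returning `True`/`False` because comparison and logical operators are overloadable. The `Col` and `Pred` classes below are hypothetical stand-ins operating on flat dictionaries, not Pixeltable's classes.

```python
import ast


def root_node(src):
    """Parse a Python expression and return the type name of its root AST node."""
    return type(ast.parse(src, mode='eval').body).__name__


# Without parentheses, `&` binds tighter than the comparisons, so Python reads a
# chained comparison: tag == ('test' & width) < 640
print(root_node("tag == 'test' & width < 640"))      # Compare
# With parentheses, the root is the intended `&` of two comparisons.
print(root_node("(tag == 'test') & (width < 640)"))  # BinOp


class Pred:
    """Hypothetical predicate: wraps a function of a row and supports &, |, ~."""

    def __init__(self, fn):
        self.fn = fn

    def __and__(self, other):
        return Pred(lambda row: self.fn(row) and other.fn(row))

    def __or__(self, other):
        return Pred(lambda row: self.fn(row) or other.fn(row))

    def __invert__(self):
        return Pred(lambda row: not self.fn(row))


class Col:
    """Hypothetical column reference: comparisons return a Pred instead of a bool."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, value):
        return Pred(lambda row: row[self.name] == value)

    def __lt__(self, value):
        return Pred(lambda row: row[self.name] < value)


# Made-up rows with flattened fields.
rows = [
    {'tag': 'test', 'width': 480},
    {'tag': 'test', 'width': 640},
    {'tag': 'train', 'width': 320},
]
tag, width = Col('tag'), Col('width')
pred = (tag == 'test') & (width < 640)
print([r for r in rows if pred.fn(r)])  # [{'tag': 'test', 'width': 480}]
```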

## Selecting output

Let's retrieve columns `tag` and `metadata`:

In [None]:
data[data.tag, data.metadata].show(2)

In general, each element in `[]` needs to be a Pixeltable **expression**. In the previous example, the expressions were simple column references, but Pixeltable also supports most standard arithmetic operators as well as a set of type-specific functions (more on those in a bit). For example, to retrieve the total number of pixels per image:

In [None]:
data[data.tag, data.metadata.width * data.metadata.height].show(2)

### Operations on JSON data

The previous example illustrates the use of path expressions against JSON-typed data: `width` is a field in the `metadata` column, which we can simply access as `data.metadata.width`.

Another example: retrieve only the bounding boxes from the `ground_truth` column. This will come in handy later when we need to pass those bounding boxes (and not the surrounding dictionary) into a function.

In [None]:
data[data.ground_truth.detections['*'].bounding_box].show(2)

The field `detections` contains a list, and the `'*'` index indicates that you want all elements in that list. You can also use standard Python list indexing and slicing operations, such as

In [None]:
data[data.ground_truth.detections[0].bounding_box].show(2)

to select only the first bounding box, or

In [None]:
data[data.ground_truth.detections[::-1].bounding_box].show(2)

to select the bounding boxes in reverse order.
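
The semantics of these JSON paths can be sketched with a small recursive evaluator. `eval_json_path()` is a hypothetical helper, the path is spelled as a list of steps (`'*'`, a key, an integer index, or a `slice`), and the `ground_truth` value is made up. It only assumes that `detections` is a list of dictionaries with a `bounding_box` field, as the queries above imply.

```python
def eval_json_path(value, path):
    """Hypothetical evaluator for JSON paths such as detections['*'].bounding_box.

    `path` is a list of steps: '*' (every element of a list), a slice, a dict key,
    or an integer list index. After '*' or a slice, the remaining steps are applied
    to each selected element and the results are collected into a list.
    """
    if not path:
        return value
    step, rest = path[0], path[1:]
    if step == '*':
        return [eval_json_path(item, rest) for item in value]
    if isinstance(step, slice):
        return [eval_json_path(item, rest) for item in value[step]]
    return eval_json_path(value[step], rest)  # dict key or list index


# Made-up ground_truth value; the bounding boxes are arbitrary numbers.
ground_truth = {
    'detections': [
        {'label': 'car', 'bounding_box': [0.1, 0.2, 0.3, 0.4]},
        {'label': 'person', 'bounding_box': [0.5, 0.5, 0.2, 0.3]},
    ]
}

print(eval_json_path(ground_truth, ['detections', '*', 'bounding_box']))
# [[0.1, 0.2, 0.3, 0.4], [0.5, 0.5, 0.2, 0.3]]
print(eval_json_path(ground_truth, ['detections', 0, 'bounding_box']))
# [0.1, 0.2, 0.3, 0.4]
print(eval_json_path(ground_truth, ['detections', slice(None, None, -1), 'bounding_box']))
# [[0.5, 0.5, 0.2, 0.3], [0.1, 0.2, 0.3, 0.4]]
```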

### Operations on image data

Image data has properties `width`, `height`, and `mode`:

In [None]:
data[data.img.width, data.img.height, data.img.mode].show(2)

Pixeltable also has a number of built-in functions for images (these are a subset of what is available for `PIL.Image.Image`): 

|Image function|Description|
|:---|:---|
|[`convert()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.convert)|Returns a converted copy of this image|
|[`crop()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.crop)|Returns a rectangular region from this image|
|[`effect_spread()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.effect_spread)|Randomly spread pixels in an image|
|[`entropy()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.entropy)|Calculates and returns the entropy for the image|
|[`filter()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.filter)|Filters this image using the given filter|
|[`getbands()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.getbands)|Returns a tuple containing the name of each band in this image|
|[`getbbox()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.getbbox)|Calculates the bounding box of the non-zero regions in the image|
|[`getchannel()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.getchannel)|Returns an image containing a single channel of the source image|
|[`getcolors()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.getcolors)|Returns a list of colors used in this image|
|[`getextrema()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.getextrema)|Gets the minimum and maximum pixel values for each band in the image|
|[`getpalette()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.getpalette)|Returns the image palette as a list|
|[`getpixel()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.getpixel)|Returns the pixel value at a given position|
|[`getprojection()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.getprojection)|Get projection to x and y axes|
|[`histogram()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.histogram)|Returns a histogram for the image|
|[`point()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.point)|Maps this image through a lookup table or function|
|[`quantize()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.quantize)|Convert the image to ‘P’ mode with the specified number of colors|
|[`reduce()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.reduce)|Returns a copy of the image reduced factor times|
|[`remap_palette()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.remap_palette)|Rewrites the image to reorder the palette|
|[`resize()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.resize)|Returns a resized copy of this image|
|[`rotate()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.rotate)|Returns a rotated copy of this image|
|[`transform()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.transform)|Transforms this image|
|[`transpose()`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.transpose)|Transpose image (flip or rotate in 90 degree steps)|

These functions are invoked in the style of method calls and can be chained, as in this example, which rotates the image 30 degrees counterclockwise and converts it to grayscale (mode `'L'`):

In [None]:
data[data.img.rotate(30).convert('L')].show(2)
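
A chained call like `data.img.rotate(30).convert('L')` describes work to be done for every row rather than being run immediately. The sketch below illustrates that idea with a hypothetical `Deferred` class that records method calls and replays them on any object; strings stand in for images so that it runs without PIL. It illustrates deferred evaluation in general and is not how Pixeltable represents expressions internally.

```python
class Deferred:
    """Hypothetical recorder: chained method calls are stored, then replayed by apply()."""

    def __init__(self, calls=()):
        self._calls = tuple(calls)

    def __getattr__(self, name):
        # Only called for attributes not found normally, i.e. the methods being chained.
        if name.startswith('__'):
            raise AttributeError(name)  # don't intercept special-method lookups
        def record(*args, **kwargs):
            return Deferred(self._calls + ((name, args, kwargs),))
        return record

    def apply(self, obj):
        """Replay the recorded calls, in order, on `obj`."""
        for name, args, kwargs in self._calls:
            obj = getattr(obj, name)(*args, **kwargs)
        return obj


# Strings stand in for images; with PIL images the chain could be
# Deferred().rotate(30).convert('L').
pipeline = Deferred().strip().upper()
print([pipeline.apply(s) for s in ['  cat ', ' dog']])  # ['CAT', 'DOG']
```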

### Image similarity search

When we created the `img` column we specified `indexed=True`, which creates a vector index of CLIP embeddings for the images in that column. We can take advantage of that with the search functions `nearest()` and `matches()`. First, let's get a sample image from `data`:

In [None]:
sample_img = data[data.img].show(1)[0, 0]
sample_img

`show()` returns a result set, which is a two-dimensional structure you can access with standard Python indexing operations (i.e., `[<row-idx>, <column-idx>]`). In this case, we're selecting the first column value of the first row, which is a `PIL.Image.Image`:

In [None]:
type(sample_img)
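
The `[row, column]` syntax works because Python passes an index like `[0, 0]` to `__getitem__` as a single tuple `(0, 0)`. The hypothetical minimal `ResultSet` class below shows the mechanism; the column names and values are made up.

```python
class ResultSet:
    """Hypothetical minimal result set supporting rs[row_idx, col_idx]."""

    def __init__(self, column_names, rows):
        self.column_names = list(column_names)
        self.rows = [list(row) for row in rows]

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, key):
        # rs[0, 1] arrives here as the single tuple (0, 1)
        row_idx, col_idx = key
        return self.rows[row_idx][col_idx]


rs = ResultSet(['tag', 'width'], [['train', 640], ['test', 480]])  # made-up values
print(rs[0, 0])   # train
print(rs[1, 1])   # 480
print(rs[-1, 0])  # test
```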

To look for images like this one, use `nearest()`:

In [None]:
data[data.img.nearest(sample_img)][data.img].show(2)

To look for images based on text, use `matches()`:

In [None]:
data[data.img.matches('car')][data.img].show(2)
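
CLIP embeds images and text into a shared vector space, so both `nearest()` (image query) and `matches()` (text query) can be understood as ranking the stored image embeddings by similarity to a query embedding. The brute-force sketch below assumes cosine similarity and uses made-up three-dimensional vectors instead of real CLIP embeddings; a vector index is meant to avoid scanning every row like this.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (assumes neither is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, embeddings, k=2):
    """Indices of the k stored embeddings most similar to `query` (brute-force scan)."""
    ranked = sorted(
        range(len(embeddings)),
        key=lambda i: cosine_similarity(query, embeddings[i]),
        reverse=True,
    )
    return ranked[:k]


# Made-up 3-dimensional "embeddings", one per stored image.
stored = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
# Like sample_img, the query comes from an image that is itself in the table.
query = [1.0, 0.0, 0.0]
print(nearest(query, stored))  # [0, 1]
```

Since the sample image is stored in `data`, it is its own best match, which is why it ranks first.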

# User-defined functions

User-defined functions let you customize Pixeltable's functionality for your own data.

In this example, we're going to use a user-defined function to run a `torchvision` object detection model (Faster R-CNN) on the images in `data`:

In [None]:
model = torchvision.models.detection.fasterrcnn_mobilenet_v3_large_320_fpn(weights="DEFAULT")
model.eval()  # switch to inference mode

Our function converts the image to a PyTorch float tensor and obtains a prediction from the model. The model returns a list of dictionaries (one per input image) with fields `boxes`, `labels`, and `scores`. These fields are PyTorch tensors, so we convert them to standard Python lists, which makes the result JSON-serializable:

In [None]:
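def fasterrcnn_detect(img):
    # PIL image -> float tensor of shape (C, H, W) with values in [0, 1]
    t = transforms.ToTensor()(img)
    t = transforms.ConvertImageDtype(torch.float)(t)
    # inference only: disable gradient tracking
    with torch.no_grad():
        result = model([t])[0]
    # convert tensors to plain Python lists so the result is JSON-serializable
    return {
        'boxes': result['boxes'].tolist(),
        'labels': result['labels'].tolist(),
        'scores': result['scores'].tolist(),
    }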

Let's confirm that `fasterrcnn_detect()` works as expected:

In [None]:
sample_result = fasterrcnn_detect(sample_img)
sample_result
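
Because the result is a dictionary of parallel lists (entry `i` of `boxes`, `labels`, and `scores` describes detection `i`), a common follow-up is to keep only confident detections. The sketch below uses a hypothetical `top_detections()` helper, an arbitrary 0.5 score threshold, and made-up values in the same shape as `sample_result`; it also confirms that the result survives a JSON round trip.

```python
import json


def top_detections(result, min_score=0.5):
    """Keep detections whose score is >= min_score.

    `result` holds parallel lists: entry i of 'boxes', 'labels', and 'scores'
    all describe detection i, as in the output of fasterrcnn_detect().
    """
    keep = [i for i, score in enumerate(result['scores']) if score >= min_score]
    return {key: [result[key][i] for i in keep] for key in ('boxes', 'labels', 'scores')}


# Made-up values in the same shape as sample_result.
result = {
    'boxes': [[10.0, 20.0, 110.0, 220.0], [5.0, 5.0, 50.0, 60.0], [0.0, 0.0, 30.0, 40.0]],
    'labels': [3, 1, 62],
    'scores': [0.92, 0.31, 0.77],
}
filtered = top_detections(result)
print(filtered['labels'], filtered['scores'])  # [3, 62] [0.92, 0.77]

# Plain lists of numbers survive a JSON round trip unchanged.
assert json.loads(json.dumps(filtered)) == filtered
```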

We now need to wrap `fasterrcnn_detect()` with `make_function()` in order to tell Pixeltable what types of arguments the function takes and what type it returns:

In [None]:
detect = pt.make_function(pt.JsonType(), [pt.ImageType()], fasterrcnn_detect)

The first `make_function()` parameter is the return type, the second parameter is the list of parameter types, and the last parameter is the actual function we want Pixeltable to call.
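
The wrapper pairs a callable with a declared signature. A minimal, hypothetical analogue is sketched below: `TypedFunction` stores the return type and parameter types (here just as strings) and checks the argument count before delegating. It is not Pixeltable's implementation; it only shows that declaring the return type up front makes the result type known before the function is ever called.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class TypedFunction:
    """Hypothetical analogue of a typed function wrapper (not Pixeltable's class)."""
    return_type: str          # e.g. 'JsonType'
    param_types: List[str]    # e.g. ['ImageType']
    fn: Callable[..., Any]

    def __call__(self, *args):
        # Reject calls that don't match the declared number of parameters.
        if len(args) != len(self.param_types):
            raise TypeError(
                f'expected {len(self.param_types)} argument(s), got {len(args)}'
            )
        return self.fn(*args)


# Same argument order as make_function(): return type, parameter types, function.
word_count = TypedFunction('IntType', ['StringType'], lambda s: len(s.split()))
print(word_count.return_type)         # IntType (known without calling the function)
print(word_count('a small example'))  # 3
```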

We can then use `detect` in the Pixeltable index operator using standard Python function call syntax:

In [None]:
data[data.img, detect(data.img)].show(1)

`detect` returns JSON data, and we can use Pixeltable's JSON functionality to access that as well. For example, if we're only interested in the first detected bounding box and the first label:

In [None]:
data[detect(data.img).boxes[0], detect(data.img).labels[0]].show(1)

When running this query, Pixeltable evaluates `detect(data.img)` only once per row.
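
Evaluating a shared subexpression once per row is a form of common subexpression elimination. The sketch below shows the idea with a per-row cache keyed by subexpression name; `evaluate_row()` and `fake_detect()` are hypothetical, and `fake_detect()` merely counts how often it is called in place of the real model.

```python
def evaluate_row(row, subexprs, outputs):
    """Evaluate output expressions for one row, running each named subexpression once.

    subexprs: maps a subexpression name (e.g. 'detect(img)') to a function of the row.
    outputs:  list of (subexpression name, accessor) pairs; the accessor extracts
              the requested value from the subexpression's result.
    """
    cache = {}  # per-row cache: subexpression name -> result
    values = []
    for name, accessor in outputs:
        if name not in cache:
            cache[name] = subexprs[name](row)
        values.append(accessor(cache[name]))
    return values


calls = {'detect': 0}


def fake_detect(img):
    """Stand-in for the model call; counts invocations and returns fixed output."""
    calls['detect'] += 1
    return {'boxes': [[0.0, 0.0, 10.0, 10.0]], 'labels': [3], 'scores': [0.9]}


subexprs = {'detect(img)': lambda row: fake_detect(row['img'])}
# Mirrors: detect(data.img).boxes[0], detect(data.img).labels[0]
outputs = [
    ('detect(img)', lambda d: d['boxes'][0]),
    ('detect(img)', lambda d: d['labels'][0]),
]
print(evaluate_row({'img': 'placeholder'}, subexprs, outputs))  # [[0.0, 0.0, 10.0, 10.0], 3]
print(calls['detect'])  # 1
```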

## Stored functions

Functions, like tables, can be stored in the database, which assigns them a name and makes them persistent (via pickling):

In [None]:
db.create_function('frcnn_detect', detect)

Just like a table, you can now get a handle to the function and use it without having access to the code:

In [None]:
detect_udf = db.get_function('frcnn_detect')
data[data.img, detect_udf(data.img).boxes[0]].show(1)

We can also check that it's still the same function with `list()`:

In [None]:
detect_udf.list()

We can see what functions are available across all databases with `list_functions()`:

In [None]:
cl.list_functions()

## Computed columns

Being able to run models against any image stored in Pixeltable is very useful, but the runtime cost of model inference makes it impractical to run it every time we want to do something with the model output. In Pixeltable, we can use computed columns to precompute and cache the output of a function:

In [None]:
data.add_column(pt.Column('detections', computed_with=detect(data.img)))

`detections` is now a column in `data` that holds the model prediction for the `img` column. Like any other column, it is persistent. Pixeltable runs the computation automatically whenever new data is added to the table. Let's see what `data` looks like now:

In [None]:
data.describe()

In general, the `computed_with` keyword argument can be any Pixeltable expression. In this example, we're making the first label recorded in `detections` available as a separate column:

In [None]:
data.add_column(pt.Column('first_label', computed_with=data.detections.labels[0]))
data[data.detections.labels, data.first_label].show(1)