# UDFs in Pixeltable

Pixeltable has a library of built-in functions and integrations, but sooner or later, you'll want to introduce some customized logic into your workflow. This is where Pixeltable's rich UDF capability comes in. Pixeltable UDFs let you write code in Python, then directly insert your custom logic into Pixeltable expressions and computed columns. In this how-to guide, we'll show how to define UDFs, extend their capabilities, and use them in computed columns.

To start, we'll install the necessary dependencies, create a Pixeltable directory and table to experiment with, and add some sample data.

```python
%pip install -q pixeltable
```

```python
import pixeltable as pxt

cl = pxt.Client()
cl.create_dir('udf_demo', ignore_errors=True)
cl.drop_table('udf_demo.udfs', ignore_errors=True)
t = cl.create_table('udf_demo.udfs', {'input': pxt.StringType()})
t.insert([{'input': 'Hello, world!'}, {'input': 'You can do a lot with Pixeltable UDFs.'}])
t.show()
```

```
Created table `udfs`.
Inserted 2 rows with 0 errors.
```

| input |
| --- |
| Hello, world! |
| You can do a lot with Pixeltable UDFs. |


For our first UDF, we'll do something very simple: write a function to extract the longest word from a string. In Python that might look something like this:

```python
def longest_word(sentence: str, strip_punctuation: bool = False) -> str:
    words = sentence.split()
    if strip_punctuation:
        # Remove non-alphanumeric characters from each word
        words = [''.join(filter(str.isalnum, word)) for word in words]
    max_len = max(len(word) for word in words)
    return next(word for word in words if len(word) == max_len)
```

```python
longest_word("Let's check that it works", strip_punctuation=True)
```

```
'check'
```
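One caveat is worth knowing before this becomes a UDF. `max()` raises `ValueError` on an empty sequence, so `longest_word('')` (or a whitespace-only string) fails instead of returning a value. The sketch below is a defensive variant. `longest_word_safe` is a hypothetical name. Returning `''` for empty input is our assumption, not behavior from the original.

```python
def longest_word_safe(sentence: str, strip_punctuation: bool = False) -> str:
    words = sentence.split()
    if strip_punctuation:
        # Keep only alphanumeric characters in each word
        words = [''.join(filter(str.isalnum, word)) for word in words]
    if not words:
        # Assumption: empty or whitespace-only input yields '' instead of raising
        return ''
    # max() returns the first of several equally long words,
    # matching the tie-breaking of the original version
    return max(words, key=len)


print(longest_word_safe('Hello, world!'))  # 'Hello,' (tie with 'world!'; first one wins)
print(longest_word_safe(''))               # '' instead of a ValueError
```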

The `longest_word` Python function isn't a Pixeltable UDF (yet). It's an ordinary function written to accept a single Python string, not a Pixeltable column expression such as `t.input`. Turning it into a Pixeltable UDF is easy: all we need to do is add the `@pxt.udf` decorator to the function definition. The body itself doesn't change; Pixeltable takes care of applying it to the value in each row.
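To make the distinction concrete, here is a toy, plain-Python sketch of what a decorator of this kind *could* do. It is not Pixeltable's implementation. `ColumnRef`, `Call`, `toy_udf`, and `evaluate` are all hypothetical names. The wrapped function's body still handles one string at a time. Calling the decorated function on a column reference records an expression instead of running it. That expression is later evaluated once per row.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ColumnRef:
    """Hypothetical stand-in for a column reference such as `t.input`."""
    name: str


@dataclass
class Call:
    """A deferred call: recorded now, executed later by evaluate()."""
    fn: Callable[..., Any]
    args: tuple
    kwargs: dict


def toy_udf(fn: Callable[..., Any]) -> Callable[..., Call]:
    """Wrap a per-value function so that calling it records a Call instead of running it."""
    def build(*args: Any, **kwargs: Any) -> Call:
        return Call(fn, args, kwargs)
    return build


def evaluate(expr: Any, row: dict) -> Any:
    """Resolve column references against a single row, then run the wrapped function."""
    if isinstance(expr, ColumnRef):
        return row[expr.name]
    if isinstance(expr, Call):
        args = [evaluate(a, row) for a in expr.args]
        kwargs = {k: evaluate(v, row) for k, v in expr.kwargs.items()}
        return expr.fn(*args, **kwargs)
    return expr  # a literal value, e.g. True


@toy_udf
def word_count(sentence: str) -> int:
    # The body only ever sees one plain Python string
    return len(sentence.split())


rows = [{'input': 'Hello, world!'}, {'input': 'You can do a lot with Pixeltable UDFs.'}]
expr = word_count(ColumnRef('input'))            # builds an expression; nothing is computed yet
results = [evaluate(expr, row) for row in rows]  # the body runs once per row
print(results)  # [2, 8]
```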

```python
@pxt.udf
def longest_word(sentence: str, strip_punctuation: bool = False) -> str:
    words = sentence.split()
    if strip_punctuation:
        # Remove non-alphanumeric characters from each word
        words = [''.join(filter(str.isalnum, word)) for word in words]
    max_len = max(len(word) for word in words)
    return next(word for word in words if len(word) == max_len)
```

Now we can create a computed column using our new UDF. Pixeltable orchestrates the computation like it does with any other function, applying the UDF in turn to each existing row of the table, then updating incrementally each time a new row is added.
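The behavior of "compute for existing rows, then again for each new row" can be pictured with another toy sketch. `ToyTable`, `add_computed_column`, and `insert` are hypothetical plain-Python stand-ins, not Pixeltable's API. A computed column here is just a Python function of the row. A call log shows that each row is computed exactly once.

```python
from typing import Any, Callable

Row = dict[str, Any]


class ToyTable:
    """Hypothetical table that keeps its computed columns up to date on insert."""

    def __init__(self) -> None:
        self.rows: list[Row] = []
        self.computed: dict[str, Callable[[Row], Any]] = {}

    def add_computed_column(self, name: str, fn: Callable[[Row], Any]) -> None:
        # Backfill: apply the function to every row that already exists
        self.computed[name] = fn
        for row in self.rows:
            row[name] = fn(row)

    def insert(self, **values: Any) -> None:
        # Incremental update: compute the columns for the new row only
        row: Row = dict(values)
        for name, fn in self.computed.items():
            row[name] = fn(row)
        self.rows.append(row)


calls: list[str] = []


def n_words(row: Row) -> int:
    calls.append(row['input'])  # record which rows were computed
    return len(row['input'].split())


t_toy = ToyTable()
t_toy.insert(input='Hello, world!')
t_toy.insert(input='You can do a lot with Pixeltable UDFs.')
t_toy.add_computed_column('n_words', n_words)                  # backfills both existing rows
t_toy.insert(input='Pixeltable updates tables incrementally.')  # computes only the new row
print([row['n_words'] for row in t_toy.rows])  # [2, 8, 4]
print(len(calls))                              # 3: one computation per row
```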

```python
t['longest_word'] = longest_word(t.input)
```

```
Added 2 column values with 0 errors.
```


```python
t.show()
```

| input | longest_word |
| --- | --- |
| Hello, world! | Hello, |
| You can do a lot with Pixeltable UDFs. | Pixeltable |


```python
t.insert(input='Pixeltable updates tables incrementally.')
t.show()
```

```
Inserted 1 row with 0 errors.
```

| input | longest_word |
| --- | --- |
| Hello, world! | Hello, |
| You can do a lot with Pixeltable UDFs. | Pixeltable |
| Pixeltable updates tables incrementally. | incrementally. |


Oops, those trailing punctuation marks are kind of annoying. Let's add another column, this time using the handy `strip_punctuation` parameter from our UDF. (We could instead drop the first column before adding the new one, but for the purposes of this tutorial it's useful to see how Pixeltable executes both variants side by side.)

```python
t['longest_word_2'] = longest_word(t.input, strip_punctuation=True)
t.show()
```

```
Added 3 column values with 0 errors.
```

| input | longest_word | longest_word_2 |
| --- | --- | --- |
| Hello, world! | Hello, | Hello |
| You can do a lot with Pixeltable UDFs. | Pixeltable | Pixeltable |
| Pixeltable updates tables incrementally. | incrementally. | incrementally |
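The `strip_punctuation` option relies on `str.isalnum`. That method keeps letters and digits (including non-ASCII letters) and drops everything else. This removes trailing punctuation as intended, but it also removes apostrophes and hyphens inside a word, which can change which word counts as longest. The quick plain-Python check below applies the same expression the UDF uses. `strip_word` is a hypothetical helper name.

```python
def strip_word(word: str) -> str:
    # Same filtering expression the UDF applies to each word
    return ''.join(filter(str.isalnum, word))


print(strip_word('incrementally.'))    # 'incrementally': trailing period removed
print(strip_word("Let's"))             # 'Lets': the apostrophe is removed too
print(strip_word('state-of-the-art'))  # 'stateoftheart': hyphens removed, pieces merged
print(strip_word('café!'))             # 'café': non-ASCII letters are kept
```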


## Types in Pixeltable

You might have noticed that the `longest_word` UDF has _type hints_ in its signature.

```python
def longest_word(sentence: str, strip_punctuation: bool = False) -> str: ...
```

The `sentence` parameter, `strip_punctuation` parameter, and return value all have explicit types (`str`, `bool`, and `str`, respectively). In ordinary Python code, type hints are optional. But Pixeltable is a database system: _everything_ in Pixeltable has a type. And since Pixeltable is also an orchestrator (it sets up workflows and computed columns _before_ executing them), these types need to be known in advance. That's the reasoning behind a fundamental principle of Pixeltable UDFs:

- Type hints are _required_.

You can turn almost any Python function into a Pixeltable UDF, provided that it has type hints and that Pixeltable supports the types it uses.
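Type hints are ordinary Python annotations, so a system can read them when a function is defined, before any data flows through it. The sketch below shows that kind of check using the standard `inspect` and `typing` modules. `check_udf_signature` and the `SUPPORTED` mapping are illustrative assumptions. They are not Pixeltable's actual validation rules or its list of supported types, and the type names are only loosely modeled on column types.

```python
import inspect
import typing
from typing import Any, Callable

# Hypothetical mapping from Python types to column type names
SUPPORTED: dict[type, str] = {str: 'String', int: 'Int', float: 'Float', bool: 'Bool'}


def check_udf_signature(fn: Callable[..., Any]) -> dict[str, str]:
    """Map each parameter (and 'return') to a column type; raise if a hint is missing or unsupported."""
    hints = typing.get_type_hints(fn)
    names = list(inspect.signature(fn).parameters) + ['return']
    result: dict[str, str] = {}
    for name in names:
        if name not in hints:
            raise TypeError(f'{fn.__name__}: missing type hint for {name!r}')
        py_type = hints[name]
        if py_type not in SUPPORTED:
            raise TypeError(f'{fn.__name__}: unsupported type {py_type!r} for {name!r}')
        result[name] = SUPPORTED[py_type]
    return result


def longest_word_stub(sentence: str, strip_punctuation: bool = False) -> str: ...


def no_hints(sentence):
    return sentence


print(check_udf_signature(longest_word_stub))
# {'sentence': 'String', 'strip_punctuation': 'Bool', 'return': 'String'}
```

Calling `check_udf_signature(no_hints)` raises `TypeError` before the function ever runs. This mirrors the idea that types must be known up front.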