# Working with Hugging Face

Pixeltable unifies data and computation into a table interface. In this tutorial, we'll take a closer look at Pixeltable's Hugging Face integration: how to import Hugging Face datasets into Pixeltable tables, and how to incorporate Hugging Face models into Pixeltable workflows so that they run locally.

In [None]:
%pip install -qU pixeltable datasets torch transformers tiktoken spacy

First, let's load a Hugging Face dataset, as described in the [Hugging Face documentation](https://huggingface.co/docs/datasets/en/package_reference/loading_methods).

In [None]:
import datasets

padoru = (
    datasets.load_dataset("not-lain/padoru", split='train')
    .select_columns(['Image', 'ImageSize', 'Name', 'ImageSource'])
)

The returned `Dataset` object keeps track of which Hugging Face split (*test*, *train* or *validation*) the data came from; here we requested only the *train* split.

In [None]:
padoru

## Create a Pixeltable table from a Hugging Face dataset

Now we create a table, and Pixeltable maps the column types as needed. Other ways to bring data into Pixeltable, such as CSV, Parquet, pandas and JSON, are covered in the [pixeltable.io](https://docs.pixeltable.com/sdk/latest/io) documentation.

In [None]:
import pixeltable as pxt

pxt.drop_dir('hf_demo', force=True)
pxt.create_dir('hf_demo')
t = pxt.create_table('hf_demo.padoru', source=padoru)

In [None]:
t.head(3)

## Leveraging Hugging Face models with Pixeltable's embedding functionality

Pixeltable contains a built-in adapter for certain model families, so all we have to do is call the [Pixeltable function for Hugging Face](https://docs.pixeltable.com/sdk/latest/huggingface). A nice thing about Hugging Face models is that they run locally, so you don't need an account with a service provider to use them.

Pixeltable can also create and populate an index with `table.add_embedding_index()` for string and image embeddings. That definition is persisted as part of the table's metadata, which allows Pixeltable to maintain the index in response to updates to the table.
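
To illustrate why persisting the index definition matters, here is a minimal, purely conceptual sketch in plain Python. It is not Pixeltable's implementation: `ToyEmbeddingIndex` and `char_counts` are hypothetical names invented for this example. The point is only that once the embedding function is stored with the index, every later insert can be embedded automatically.

```python
from typing import Callable


class ToyEmbeddingIndex:
    """Hypothetical stand-in for an embedding index (not Pixeltable's code)."""

    def __init__(self, embed: Callable[[str], list[float]]) -> None:
        # The embedding function is stored with the index, mirroring how the
        # index definition is kept as part of the table's metadata.
        self.embed = embed
        self.rows: list[str] = []
        self.vectors: list[list[float]] = []

    def insert(self, value: str) -> None:
        # Each insert computes and stores the embedding automatically,
        # so the index never falls out of sync with the data.
        self.rows.append(value)
        self.vectors.append(self.embed(value))


def char_counts(text: str) -> list[float]:
    # Toy embedding (an assumption for illustration): counts of 'a', 'b', 'c'.
    return [float(text.count(ch)) for ch in "abc"]


index = ToyEmbeddingIndex(char_counts)
index.insert("abba")
index.insert("cab")
print(index.vectors)
```

In Pixeltable, the same role is played by the `embedding` argument passed to `add_embedding_index()`, as shown in the next cell.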

In this example we use `CLIP`, but you can use any embedding function you like via Pixeltable's UDF mechanism (described in detail in our [guide to user-defined functions](https://docs.pixeltable.com/platform/udfs-in-pixeltable)).

In [None]:
from pixeltable.functions.huggingface import clip

# create embedding index on the 'Image' column
t.add_embedding_index(
    'Image',
    embedding=clip.using(model_id='openai/clip-vit-base-patch32')
)

In [None]:
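# take the first image in the table as the query image
sample_img = t.select(t.Image).head(1)[0]['Image']

# similarity expression between each row's image and the query image
sim = t.Image.similarity(sample_img)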

# use 'similarity()' in the order_by() clause and apply a limit in order to utilize the index
t.order_by(sim, asc=False).limit(3).select(t.Image, sim=sim).collect()
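
Conceptually, the query above ranks rows by similarity in descending order and keeps the top few. The following is a brute-force sketch of that idea in plain Python, assuming cosine similarity as the metric; the vectors and the names `cosine_similarity` and `top_k` are made up for illustration and are not part of Pixeltable's API. A real index avoids scoring every row.

```python
import heapq
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], vectors: dict[str, list[float]], k: int) -> list[tuple[float, str]]:
    # Score every stored vector, then keep the k highest scores
    # (the equivalent of ordering by similarity descending with a limit).
    scored = [(cosine_similarity(query, vec), name) for name, vec in vectors.items()]
    return heapq.nlargest(k, scored)


# Hypothetical 2-dimensional "embeddings" for three images.
vectors = {
    "img_a": [1.0, 0.0],
    "img_b": [0.6, 0.8],
    "img_c": [0.0, 1.0],
}

results = top_k([1.0, 0.0], vectors, k=2)
for score, name in results:
    print(f"{name}: {score:.2f}")
```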

To learn more about working with indexes, see our tutorial [Working with Embedding and Vector Indexes](https://docs.pixeltable.com/platform/embedding-indexes).