## Feature Engineering with LanceDB and Geneva

This notebook focuses on the crucial process of feature engineering. We'll start with a raw dataset of fashion products, ingest it into LanceDB, then enrich the data with meaningful features that we could use to build a search engine or train a model.

We will cover the following steps:
1. **Data Ingestion**: Downloading a fashion dataset and loading it into a LanceDB table.
2. **Declarative Feature Engineering**: Using Geneva to define and compute features on-the-fly.
3. **Embedding Generation**: Creating vector embeddings for both images and text to enable semantic search.
4. **Updating**: Adding more raw data to our table and rerunning our backfills on only the new data.

## Note about Colab

This notebook runs on Google Colab, even the free tier, but it will be slow, because it has to start a local Ray cluster and execute multiple ML models on its workers. We recommend downloading this notebook and running it locally. But if you do run on Colab, we recommend:
- using a GPU instance (Runtime -> Change runtime type)
- running on only 100 rows
- not drawing conclusions about speed from this notebook. This notebook is meant as a demo of the basic workflow of feature engineering with LanceDB, not a benchmark or speed demo.

In [None]:
!uv pip install --upgrade geneva lancedb kubernetes "ray[default]" rerankers pandas torch torchvision open-clip-torch
# Pin transformers to a compatible version for BLIP models (needs >=4.40.0 for Unpack import)
# If you encounter "ImportError: cannot import name 'Unpack'", ensure transformers>=4.40.0
!uv pip install "transformers>=4.40.0,<5.0.0"
!uv pip install pillow
# Pin protobuf to avoid MessageFactory.GetPrototype AttributeError (removed in protobuf 6.30.0+)
!uv pip install "protobuf<6.30.0"
# working around a quirk on Colab:
!uv pip install --force-reinstall numpy scipy


## 1. Data Ingestion

First, let's download our dataset. We're using a small version of the Fashion Product Images dataset from Kaggle. This dataset contains images and metadata for a variety of fashion products.

In [21]:
#!sudo rm -r db fashion-dataset # Uncomment and run this to delete the dataset if it already exists

# Download the dataset if it doesn't exist
!test -d fashion-dataset && test -n "$(ls -A fashion-dataset 2>/dev/null)" && \
  echo "Dataset already exists, skipping download" || \
    (curl -L -o fashion-product-images-small.zip https://www.kaggle.com/api/v1/datasets/download/paramaggarwal/fashion-product-images-small \
    && unzip -q fashion-product-images-small.zip -d fashion-dataset/)

Dataset already exists, skipping download


## Set Scale based on your environment

This tutorial uses Ray locally to build features, which means the scale of concurrent jobs will be limited to the system you're working on. These parameters are good defaults, but feel free to adjust them if you'd like.

In [22]:
# Especially if on Colab, start with just 100 rows for testing.
DATASET_SIZE = 100
# Increase this if you're running locally, you have more CPUs available and want it to run faster.
CONCURRENCY = 4
# Integer division keeps the checkpoint size a whole number of rows (at least 1).
CHECKPOINT_SIZE = max(1, min(300, DATASET_SIZE // 2))

In [23]:
import io
import geneva
from geneva import udf
import lancedb
import pandas as pd
import pyarrow as pa
import numpy as np
from pathlib import Path
from PIL import Image
import torch
import open_clip
from typing import Callable
from transformers import BlipProcessor, BlipForConditionalGeneration

IMG_DIR = Path("fashion-dataset/images")
STYLE_CSV = Path("fashion-dataset/styles.csv")
DB_PATH = "./db"
TABLE_NAME = "products"


Now, let's load the data into a LanceDB table. We'll read the CSV file with the product metadata, and for each product, we'll also load the corresponding image from the `images` directory. We'll then create a LanceDB table and add the data to it in batches. LanceDB can store objects (images in this case) alongside vector embeddings and metadata.

In [24]:
# STYLE_CSV has info about the clothes: IDs, descriptions, and image paths
# Images themselves are stored in IMG_DIR; generate_rows will combine them with the metadata
# in STYLE_CSV and load them all into our LanceDB table.
df = pd.read_csv(STYLE_CSV, on_bad_lines='skip')
df = df.dropna(subset=["id", "productDisplayName"])
df = df.drop_duplicates(subset=["id"], keep="first")

def generate_rows(df, img_dir, start=0, end=DATASET_SIZE):
    for _, row in df.iloc[start:end].iterrows():
        img_path = img_dir / f"{row['id']}.jpg"
        if not img_path.exists():
            continue
        with open(img_path, "rb") as f:
            yield {
                "id": int(row["id"]),
                "description": row["productDisplayName"],
                "image_bytes": f.read()
            }


db = lancedb.connect(DB_PATH)
# Drop the table if it already exists so we can recreate it
try:
    db.drop_table(TABLE_NAME)
except ValueError:
    pass

data_stream = generate_rows(df, IMG_DIR)
table = None
# Whole-number batch size; at least 1 so very small DATASET_SIZE values still work
batch_size = max(1, min(1000, DATASET_SIZE // 2))

# Create the table and load rows in batches
rows = []
for row in data_stream:
    rows.append(row)
    if len(rows) >= batch_size:
        if table is not None:
            table.add(rows)
        else:
            # You have to provide a schema or some data to create the table. Here we create the
            # table with the first batch of data for simplicity.
            table = db.create_table(TABLE_NAME, data=rows)
        rows = []
# Write any leftover rows; create the table here if no full batch was ever reached
if rows:
    if table is not None:
        table.add(rows)
    else:
        table = db.create_table(TABLE_NAME, data=rows)

len(table)

100
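
The batching loop above can also be factored into a small reusable helper. The following is a minimal sketch using only the standard library; `batched` is a hypothetical helper name (not part of LanceDB or Geneva), and the final partial batch is yielded rather than dropped.

```python
from itertools import islice
from typing import Iterable, Iterator


def batched(items: Iterable, batch_size: int) -> Iterator[list]:
    """Yield lists of up to batch_size items, including a final partial batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(items)
    # islice pulls the next batch_size items; an empty list means we are done
    while batch := list(islice(it, batch_size)):
        yield batch


batches = list(batched(range(7), 3))
print(batches)
```

With a helper like this, the first batch could be passed to `db.create_table` and each later batch to `table.add`, which removes the need for a separate leftover-rows step.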

## 2. Feature Engineering with Geneva

Now that we have our data in a LanceDB table, we can start engineering features. We'll use Geneva to create new features for our products. 

### Defining a Geneva UDF

Geneva uses Python User Defined Functions (UDFs) to define features as columns in a Lance dataset. Adding a feature is straightforward:

1. Prototype your Python function in your favorite environment.
2. Wrap the function with the `@udf` decorator.
3. Register the UDF as a virtual column using `Table.add_columns()`.
4. Trigger a backfill operation.

UDFs can work on one row or a batch at a time, and can be stateful (e.g. some work is done to set up a model the first time, and future runs use the same model) or stateless. [Read more about geneva UDFs here.](https://docs.lancedb.com/geneva/udfs)
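
To make the stateful case concrete, here is a minimal plain-Python sketch of the lazy-setup pattern, with no Geneva involved. `LazyUpper` and its `setup_calls` counter are hypothetical names used only for illustration, and the "model" is just `str.upper`.

```python
class LazyUpper:
    """Toy stateful callable: the expensive setup runs once, on first use."""

    def __init__(self):
        self.is_loaded = False
        self.setup_calls = 0  # illustration only: counts how often setup() runs

    def setup(self):
        # Stand-in for loading a large model
        self.transform = str.upper
        self.setup_calls += 1
        # Without setting this flag, setup() would rerun on every call
        self.is_loaded = True

    def __call__(self, text: str) -> str:
        if not self.is_loaded:
            self.setup()
        return self.transform(text)


f = LazyUpper()
results = [f(s) for s in ["red shirt", "blue jeans", "white watch"]]
print(results, f.setup_calls)
```

The model-backed UDF classes later in this notebook follow the same shape: an `is_loaded` flag, a `setup()` method, and a `__call__` that triggers setup on first use.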

### Simple Feature Extraction

Let's start with a simple feature: extracting color tags from the product description. We'll define a User-Defined Function (UDF) that takes the product description as input and returns a comma-separated string of colors found in the description.

In [25]:
db = geneva.connect(DB_PATH)
table = db.open_table(TABLE_NAME)

In [26]:
@udf
def color_tags(description: str)-> str:
    colors = ["black", "white", "red", "blue", "green", "yellow", "pink", "brown", "grey", "silver"]
    return ", ".join([c for c in colors if c in description.lower()])
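
One caveat: `c in description.lower()` is a substring test, so "red" also matches inside a word such as "bordered". If that matters for your data, a stricter variant can match whole words only. The sketch below is plain Python with a hypothetical name, `color_tags_strict`; it is an alternative to consider, not the function used in the rest of this notebook.

```python
import re

COLORS = ["black", "white", "red", "blue", "green", "yellow", "pink", "brown", "grey", "silver"]
# \b anchors each match to word boundaries, so "red" no longer matches inside "bordered"
COLOR_PATTERNS = {c: re.compile(rf"\b{c}\b") for c in COLORS}


def color_tags_strict(description: str) -> str:
    text = description.lower()
    return ", ".join(c for c in COLORS if COLOR_PATTERNS[c].search(text))


print(color_tags_strict("Bordered Navy Blue Shirt"))
print(color_tags_strict("Black and White Dress"))
```

Prototyping on plain strings like this is step 1 of the workflow above; once the function behaves as expected, it could be wrapped with `@udf` in the same way as `color_tags`.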

### Adding a Computed Column

Now that we've defined our feature-generating UDF, we can add it to our table as a computed column. Adding the column only registers the UDF in the table's schema; the values themselves are computed when you run a backfill operation.

In [27]:
table.add_columns({
    "color_tags": color_tags,
})

INFO:geneva.table:Adding column: udf={'color_tags': UDF(func=<function color_tags at 0x419233d80>, name='color_tags', cuda=False, num_cpus=1.0, num_gpus=0.0, memory=None, batch_size=None, checkpoint_size=None, task_size=None, error_handling=None, input_columns=['description'], data_type=DataType(string), version='3776862a65c8823be3d91c0e69a4ea6f', _checkpoint_key_override=None, field_metadata={})}


Let's inspect the table schema to see our newly registered UDF.

In [28]:
table.schema

id: int64
description: string
image_bytes: binary
color_tags: string
  -- field metadata --
  virtual_column.udf_backend: 'DockerUDFSpecV1'
  virtual_column.udf: '_udfs/cfbbb9dcbfc95e5b11045f473079dbeac427eedfee72' + 20
  virtual_column.platform.arch: 'arm64'
  virtual_column.udf_inputs: '["description"]'
  virtual_column.platform.python_version: '3.12.12'
  virtual_column.platform.system: 'Darwin'
  virtual_column.udf_name: 'color_tags'
  virtual_column: 'true'

### Backfilling Features

Triggering backfill creates a distributed job to run the UDF and populate the column values in your LanceDB table. The Geneva framework simplifies several aspects of distributed execution.

Environment management: Geneva automatically packages and deploys your Python execution environment to worker nodes. This ensures that distributed execution uses the same environment and dependencies as your prototype.

Checkpoints: Each batch of UDF execution is checkpointed so that partial results are not lost in case of job failures. Jobs can resume and avoid most of the expense of having to recalculate values.
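
To illustrate the idea behind checkpointing, here is a toy sketch in plain Python. It is not Geneva's implementation: `run_with_checkpoints`, the in-memory `checkpoints` dict, and the `fail_at` parameter are all hypothetical, and the failure is simulated before any row of the affected chunk is processed.

```python
def run_with_checkpoints(rows, fn, checkpoint_size, checkpoints, fail_at=None):
    """Process rows in chunks; `checkpoints` maps chunk start index -> results."""
    for start in range(0, len(rows), checkpoint_size):
        if start in checkpoints:
            continue  # chunk finished in an earlier attempt; skip it
        chunk = rows[start:start + checkpoint_size]
        if fail_at is not None and start <= fail_at < start + len(chunk):
            raise RuntimeError(f"simulated failure at row {fail_at}")
        checkpoints[start] = [fn(r) for r in chunk]
    # Stitch the chunk results back together in row order
    return [v for start in sorted(checkpoints) for v in checkpoints[start]]


calls = []  # records every row the function was actually run on


def square(x):
    calls.append(x)
    return x * x


rows = list(range(10))
checkpoints = {}
try:
    run_with_checkpoints(rows, square, 4, checkpoints, fail_at=9)
except RuntimeError:
    pass  # the first attempt dies in the last chunk

# The second attempt reuses the two finished chunks and only computes the third
result = run_with_checkpoints(rows, square, 4, checkpoints)
print(result, calls)
```

A smaller checkpoint size loses less work on failure but writes checkpoints more often, which is the trade-off to keep in mind when choosing the value.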

`backfill()` accepts various parameters to control the scale of your workload. Here we'll use:

* `checkpoint_size` - the number of rows that are processed before writing a checkpoint
* `concurrency` - how many workers run the UDF in parallel

Here, we're running Geneva locally, so it starts a local Ray instance on this machine and there is no cluster to set up. The same code can also run distributed jobs on remote Ray clusters.

In [29]:
table.backfill("color_tags", checkpoint_size=CHECKPOINT_SIZE, concurrency=CONCURRENCY)

 [00:00]

[products - color_tags (2 fragments)] Rows checkpointed:   0%|          | 0/100 [00:00<?, ?it/s]

[products - color_tags (2 fragments)] Rows ready for commit:   0%|          | 0/100 [00:00<?, ?it/s]

[products - color_tags (2 fragments)] Rows committed (every 64 fragments):   0%|          | 0/100 [00:00<?, ?i…

(run_ray_add_column_remote pid=83330) [2026-01-09T16:29:14Z WARN  lance::dataset::transaction] Building manifest with DataReplacement operation. This operation is not stable yet, please use with caution.
(run_ray_add_column_remote pid=83330) Final metric reconciliation timed out; metrics will still be correct on the next tracker flush if the actor stays alive


'85a6a2e8-757e-428d-bc7c-0d74819199ed'

Let's take a look at our enriched data.

In [30]:
table.search().limit(4).to_pandas()

| | id | description | image_bytes | color_tags |
|---|---|---|---|---|
| 0 | 15970 | Turtle Check Men Navy Blue Shirt | `b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00...` | blue |
| 1 | 39386 | Peter England Men Party Blue Jeans | `b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00...` | blue |
| 2 | 44984 | Maxima Women White Dial Watch | `b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x01...` | white |
| 3 | 10268 | Clarks Men Hang Work Leather Black Formal Shoes | `b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00...` | black |


## 3. Embedding Generation

Now that we have our text-based features, let's create some vector embeddings. Embeddings are numerical representations of data that capture its semantic meaning. We'll create embeddings for our product images and, later, for the captions we generate from those images.
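
As a small illustration of how embeddings are compared, the sketch below uses plain Python lists in place of real model output. The image embedding UDF in this notebook scales each vector to unit length, and for unit-length vectors the dot product equals the cosine similarity. The 2D vectors and helper names here are made up for the example.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


shirt = normalize([3.0, 4.0])
similar = normalize([6.0, 8.0])     # same direction, different magnitude
different = normalize([-4.0, 3.0])  # perpendicular direction

print(dot(shirt, similar), dot(shirt, different))
```

Vectors pointing the same way score close to 1 and unrelated (perpendicular) vectors score close to 0, regardless of their original magnitudes.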

### Image Embeddings

We'll use a pretrained CLIP model to generate embeddings for our product images. We'll define a UDF that takes a batch of image bytes as input, preprocesses them, and then uses the CLIP model to generate embeddings.

In [31]:
@udf(version="0.1", num_gpus=1 if torch.cuda.is_available() else 0, data_type=pa.list_(pa.float32(), 512))
class GenImageEmbeddings(Callable):

    def __init__(self):
        self.is_loaded=False


    def setup(self):
        self.model, _, self.preprocess = open_clip.create_model_and_transforms("ViT-B-32", pretrained="laion2b_s34b_b79k")
        self.tokenizer = open_clip.get_tokenizer("ViT-B-32")
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model = self.model.to(self.device).eval()

        self.is_loaded=True

    def __call__(self, image_bytes:pa.Array) -> pa.Array:
        if not self.is_loaded:
            self.setup()

        embeddings = []
        for b in image_bytes:
            this_image_bytes = b.as_buffer().to_pybytes()

            image_stream = io.BytesIO(this_image_bytes)
            img = Image.open(image_stream).convert("RGB")
            img_tensor = self.preprocess(img).unsqueeze(0).to(self.device)
            with torch.no_grad():
                emb_tensor = self.model.encode_image(img_tensor)
                emb_tensor /= emb_tensor.norm(dim=-1, keepdim=True)
            np_emb = emb_tensor.squeeze().cpu().numpy().astype(np.float32)

            flat = pa.array(np_emb) # 1D float32 vector of shape (512,)
            embeddings.append(flat)

        stacked = pa.FixedSizeListArray.from_arrays(pa.concat_arrays(embeddings), 512)
        return stacked

### Image Captions

We'll also generate a short text caption for each product image using a pretrained BLIP captioning model. The UDF follows the same pattern as the image embedding UDF: it loads the model once, then processes a batch of image bytes per call.

In [32]:
@udf(version="0.1", num_gpus=1 if torch.cuda.is_available() else 0, data_type=pa.string())
class GenCaptions(Callable):

    def __init__(self):
        self.is_loaded=False

    def setup(self):
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base", use_fast=True)
        self.model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
        self.model = self.model.to(self.device).eval()
        self.is_loaded=True

    def __call__(self, image_bytes:pa.Array) -> pa.Array:
        if not self.is_loaded:
            self.setup()

        captions = []
        for b in image_bytes:
            this_image_bytes = b.as_buffer().to_pybytes()
            
            image_stream = io.BytesIO(this_image_bytes)
            img = Image.open(image_stream).convert("RGB")
            
            inputs = self.processor(img, return_tensors="pt").to(self.device)
            # Use a small beam search (num_beams=3), no sampling, and a short max_length to keep this demo fast
            with torch.no_grad():
                out = self.model.generate(**inputs, max_length=30, num_beams=3, do_sample=False)
            
            caption = self.processor.decode(out[0], skip_special_tokens=True)
            captions.append(caption)

        return pa.array(captions)

### Adding and Backfilling Embedding and Caption Columns

Now, let's add our new generators as virtual columns and then backfill them.

In [33]:
for column in ["image_embedding", "caption", "caption_embedding"]:
    if (column in table.schema.names):
        table.drop_columns([column])
        
table.add_columns({
    "image_embedding": GenImageEmbeddings(),
    "caption": GenCaptions(),
})

INFO:geneva.table:Adding column: udf={'image_embedding': UDF(func=<__main__.GenImageEmbeddings object at 0x4193deea0>, name='GenImageEmbeddings', cuda=False, num_cpus=1.0, num_gpus=0.0, memory=None, batch_size=None, checkpoint_size=None, task_size=None, error_handling=None, input_columns=['image_bytes'], data_type=FixedSizeListType(fixed_size_list<item: float>[512]), version='0.1', _checkpoint_key_override=None, field_metadata={})}
  self._validate_udf_input_columns(udf, input_columns)
INFO:geneva.table:Adding column: udf={'caption': UDF(func=<__main__.GenCaptions object at 0x3863f53a0>, name='GenCaptions', cuda=False, num_cpus=1.0, num_gpus=0.0, memory=None, batch_size=None, checkpoint_size=None, task_size=None, error_handling=None, input_columns=['image_bytes'], data_type=DataType(string), version='0.1', _checkpoint_key_override=None, field_metadata={})}
  self._validate_udf_input_columns(udf, input_columns)


In [34]:
import logging
import sys
logging.basicConfig(level=logging.INFO, stream=sys.stderr, force=True)

In [35]:
table.backfill("image_embedding", checkpoint_size=CHECKPOINT_SIZE, concurrency=CONCURRENCY)

  validate_backfill_args(self, col_name, udf, read_version=read_version)
  validate_backfill_args(self, col_name, udf, read_version=read_version)


 [00:00]

[products - image_embedding (2 fragments)] Rows checkpointed:   0%|          | 0/100 [00:00<?, ?it/s]

[products - image_embedding (2 fragments)] Rows ready for commit:   0%|          | 0/100 [00:00<?, ?it/s]

[products - image_embedding (2 fragments)] Rows committed (every 64 fragments):   0%|          | 0/100 [00:00<…

(run_ray_add_column_remote pid=83330) [2026-01-09T16:29:59Z WARN  lance::dataset::transaction] Building manifest with DataReplacement operation. This operation is not stable yet, please use with caution.
(run_ray_add_column_remote pid=83330) Final metric reconciliation timed out; metrics will still be correct on the next tracker flush if the actor stays alive


'edaaf50f-10f0-4b24-b148-439cdc9fa5a7'

In [36]:
# This may take a few minutes if you're running on a CPU.
table.backfill("caption", checkpoint_size=CHECKPOINT_SIZE, concurrency=CONCURRENCY)

  validate_backfill_args(self, col_name, udf, read_version=read_version)
  validate_backfill_args(self, col_name, udf, read_version=read_version)

RayTaskError(ImportError): ray::run_ray_add_column_remote() (pid=83330, ip=127.0.0.1)
  File "/Users/dantasse/src/geneva/src/geneva/runners/ray/pipeline.py", line 2956, in run_ray_add_column_remote
    raise e
  File "/Users/dantasse/src/geneva/src/geneva/runners/ray/pipeline.py", line 2908, in run_ray_add_column_remote
    validate_backfill_args(
  File "/Users/dantasse/src/geneva/src/geneva/runners/ray/pipeline.py", line 2845, in validate_backfill_args
    udf = tbl._conn._packager.unmarshal(udf_spec)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dantasse/src/geneva/src/geneva/packager/__init__.py", line 387, in unmarshal
    docker_spec = self.backend(spec)
                  ^^^^^^^^^^^^^^^^^^
  File "/Users/dantasse/src/geneva/src/geneva/packager/__init__.py", line 406, in backend
    return DockerUDFSpecV1.from_bytes(spec.udf_payload)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dantasse/src/geneva/src/geneva/packager/__init__.py", line 172, in from_bytes
    val = cls(**self_as_dict)
          ^^^^^^^^^^^^^^^^^^^
  File "<attrs generated init geneva.packager.DockerUDFSpecV1>", line 7, in __init__
    self.__attrs_post_init__()
  File "/Users/dantasse/src/geneva/src/geneva/packager/__init__.py", line 141, in __attrs_post_init__
    udf = cloudpickle.loads(self.udf_pickle)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dantasse/src/vectordb-recipes/venv/lib/python3.12/site-packages/transformers/models/blip/processing_blip.py", line 22, in <module>
    from ...processing_utils import ProcessingKwargs, ProcessorMixin, Unpack
ImportError: cannot import name 'Unpack' from 'transformers.processing_utils' (/Users/dantasse/src/vectordb-recipes/venv/lib/python3.12/site-packages/transformers/processing_utils.py)

(JobTracker(job_id=edaaf50f-10f0-4b24-b148-439cdc9fa5a7) pid=86132) error saving metrics lance error: Too many concurrent writers. Attempted 5 times, but failed on retry_timeout of 30.000 seconds., /Users/runner/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/lance-1.0.1/src/dataset/write/retry.rs:55:19


In [None]:
table.search().limit(2).to_pandas()

### Features on Features

Of course, feature engineering workflows often include chains of features: features that depend on other features we've already computed! Let's make some text embeddings for those captions we just generated.

In [None]:
@udf(version="0.1", num_gpus=1 if torch.cuda.is_available() else 0, data_type=pa.list_(pa.float32(), 512))
class GenTextEmbeddings(Callable):
    def __init__(self):
        self.is_loaded=False

    def setup(self):
        self.model, _, self.preprocess = open_clip.create_model_and_transforms("ViT-B-32", pretrained="laion2b_s34b_b79k")
        self.tokenizer = open_clip.get_tokenizer("ViT-B-32")
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model = self.model.to(self.device).eval()
        # Mark the model as loaded so later batches reuse it instead of reloading
        self.is_loaded = True

    def __call__(self, caption: pa.Array) -> pa.Array:
        if not self.is_loaded:
            self.setup()

        embeddings = []
        for this_caption in caption:
            # Convert PyArrow scalar to Python string
            caption_str = this_caption.as_py() if hasattr(this_caption, 'as_py') else str(this_caption)
            # Tokenizer expects a list of strings, not a single string
            tokens = self.tokenizer([caption_str])
            tokens = tokens.to(self.device)
            with torch.no_grad():
                emb_tensor = self.model.encode_text(tokens)
            np_emb = emb_tensor.squeeze().cpu().numpy().astype(np.float32)
            embeddings.append(pa.array(np_emb))  # 1D float32 vector of shape (512,)

        # Build a fixed-size list array so the result matches the declared data_type
        return pa.FixedSizeListArray.from_arrays(pa.concat_arrays(embeddings), 512)


In [None]:
if "caption_embedding" in table.schema.names:
    table.drop_columns(["caption_embedding"])
    
table.add_columns({
    "caption_embedding": GenTextEmbeddings(),
})
table.backfill("caption_embedding", checkpoint_size=CHECKPOINT_SIZE, concurrency=CONCURRENCY)
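
Because `caption_embedding` reads from `caption`, the `caption` column has to be backfilled first. With longer chains it can help to derive the backfill order from the dependencies. The sketch below does this with the standard library's `graphlib`; the `depends_on` mapping is written by hand to mirror this notebook's UDF inputs and is an assumption of the example, not something read from Geneva.

```python
from graphlib import TopologicalSorter

# Each column maps to the columns it reads from (mirrors this notebook's UDF inputs)
depends_on = {
    "color_tags": {"description"},
    "image_embedding": {"image_bytes"},
    "caption": {"image_bytes"},
    "caption_embedding": {"caption"},
}
raw_columns = {"description", "image_bytes"}  # ingested directly, never backfilled

# static_order() yields every column after the columns it depends on
order = [c for c in TopologicalSorter(depends_on).static_order() if c not in raw_columns]
print(order)
```

Looping over `order` and calling `table.backfill(...)` for each name would then always compute a feature's inputs before the feature itself.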

In [None]:
table.search().limit(1).to_pandas()

## 4. Updating

Let's add some new clothes to our table and rerun the backfills to add all our derived features. The backfills will only compute values for the new rows. This doesn't save much time in this tutorial, but it absolutely does in production. Imagine adding new data daily; you won't want to recompute your costly features on all your data every day!

As the following backfills run, you will notice that the progress bars start partway through (for example, around "100/600" with the default `DATASET_SIZE` of 100), reflecting that the original rows have already been computed.
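
The effect can be pictured with a toy model in plain Python. This is only an illustration of "compute what is missing", not how Geneva tracks completed work; the `backfill` function below is a hypothetical stand-in that operates on a list of dicts.

```python
def backfill(rows, column, fn, input_column):
    """Fill `column` only for rows that lack it; return how many rows were computed."""
    computed = 0
    for row in rows:
        if row.get(column) is None:
            row[column] = fn(row[input_column])
            computed += 1
    return computed


table_rows = [{"id": i, "description": f"item {i}"} for i in range(3)]
first = backfill(table_rows, "length", len, "description")

# Append new rows, then backfill again: only the new rows need work
table_rows += [{"id": i, "description": f"new item {i}"} for i in range(3, 5)]
second = backfill(table_rows, "length", len, "description")

print(first, second)
```

The second run touches only the two appended rows, which is the saving that matters when the feature function is an expensive model call.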

In [None]:
new_rows = list(generate_rows(df, IMG_DIR, start=DATASET_SIZE, end=DATASET_SIZE+500))
table.add(new_rows)
table.backfill("color_tags", checkpoint_size=CHECKPOINT_SIZE, concurrency=CONCURRENCY)
table.backfill("image_embedding", checkpoint_size=CHECKPOINT_SIZE, concurrency=CONCURRENCY)
table.backfill("caption", checkpoint_size=CHECKPOINT_SIZE, concurrency=CONCURRENCY)
table.backfill("caption_embedding", checkpoint_size=CHECKPOINT_SIZE, concurrency=CONCURRENCY)

In [None]:
table.to_pandas()

## 5. Wrapping up

That's the basics of feature engineering! If you wanted to go on to query this table, you might build indexes, as in the next cell:

In [None]:
# Full-Text Search (FTS) Index: This will allow us to quickly search for keywords in our `caption` column.
table.create_fts_index("caption")
# Vector Index: This will allow us to perform fast similarity searches on our `image_embedding` column.
table.create_index(vector_column_name="image_embedding", num_sub_vectors=128)

And if you want to expand this to run on a Ray cluster, to scale up to production use, see [Execution Contexts](https://docs.lancedb.com/geneva/jobs/contexts) for more info on how to do so.