Detect objects in images, expand bounding boxes with padding or a scale factor, crop to specific aspect ratios, and keep detection data aligned after transforms.

## Problem

Object detection gives you tight bounding boxes, but downstream tasks need more flexibility:

| Task | Challenge |
|------|-----------|
| Product thumbnails | Bounding boxes clip the subject; need padding for context |
| Social media repurposing | Source is 16:9, need 9:16 and 1:1 crops centred on the subject |
| Resize + re-detect | After resizing or cropping, bounding box coordinates no longer match the transformed image |
| Video reframing | Need to spatially crop a video around a detected subject |

## Solution

**What's in this recipe:**

- **`expand_bbox`** — Add pixel padding or a scale factor around a bounding box
- **`fit_bbox_to_aspect`** — Compute an aspect-ratio-matching crop region centred on a subject
- **`rescale_bbox`** — Keep bounding boxes aligned after `resize()`
- **`offset_bbox`** — Keep bounding boxes aligned after `crop()`
- **`video.crop()`** — Spatially crop a video, optionally resizing to a target resolution

All of these work inside computed columns, so they run automatically when you insert new data.

### Setup

In [None]:
%pip install -qU pixeltable-yolox

# Use local pixeltable from this branch (remove these two lines when the
# UDFs ship in a released version of pixeltable).
import sys, os
sys.path.insert(0, os.path.abspath(os.path.join(os.getcwd(), '..', '..', '..', '..', '..')))

In [2]:
import pixeltable as pxt
from pixeltable.functions.yolox import yolox
from pixeltable.functions.image import (
    expand_bbox,
    fit_bbox_to_aspect,
    offset_bbox,
    rescale_bbox,
)

### Load images

In [None]:
pxt.drop_dir('crop_demo', force=True)
pxt.create_dir('crop_demo')

In [None]:
images = pxt.create_table('crop_demo/images', {'image': pxt.Image})

In [None]:
image_urls = [
    'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000036.jpg',
    'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000090.jpg',
    'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000106.jpg',
]

images.insert([{'image': url} for url in image_urls])

### Detect objects

Run YOLOX to get bounding boxes and class labels. Then extract the best (highest-confidence) detection per image.

In [None]:
images.add_computed_column(
    detections=yolox(images.image, model_id='yolox_m', threshold=0.5)
)

In [None]:
@pxt.udf
def best_bbox(detections: dict) -> tuple[int, int, int, int] | None:
    """Return the bounding box of the highest-confidence detection."""
    scores = detections.get('scores', [])
    bboxes = detections.get('bboxes', [])
    if not scores:
        return None
    idx = scores.index(max(scores))
    b = bboxes[idx]
    return (round(b[0]), round(b[1]), round(b[2]), round(b[3]))


images.add_computed_column(bbox=best_bbox(images.detections))

In [None]:
images.select(images.image, images.bbox).collect()

## Expand bounding boxes with padding

Detection boxes are often tight to the subject. `expand_bbox` lets you add breathing room before cropping — either as a **scale factor** (e.g. 1.4 = 40% bigger) or **pixel padding** (e.g. 30px on every side).

In [None]:
# Expand the bounding box by 40% around the centre
images.add_computed_column(
    bbox_expanded=expand_bbox(
        images.bbox, images.image.width, images.image.height,
        margin_factor=1.4,
    )
)

# Crop: tight (original bbox) vs. expanded
images.add_computed_column(crop_tight=images.image.crop(images.bbox))
images.add_computed_column(crop_expanded=images.image.crop(images.bbox_expanded))

In [None]:
# Compare tight crop vs expanded crop side by side
images.select(images.crop_tight, images.crop_expanded).collect()

You can also use pixel padding instead of (or in addition to) a scale factor:

In [None]:
# Add 30 pixels on every side
images.add_computed_column(
    bbox_padded=expand_bbox(
        images.bbox, images.image.width, images.image.height,
        padding=30,
    )
)
images.add_computed_column(crop_padded=images.image.crop(images.bbox_padded))

images.select(images.crop_tight, images.crop_padded).collect()
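
To make the arithmetic concrete, here is a minimal plain-Python sketch of expanding a box by a scale factor and by pixel padding. The helper `expand_box_sketch` and its `scale` and `pad` parameters are hypothetical names used only for illustration. It is not Pixeltable's implementation, and it assumes the scale factor applies to width and height (not area) and that the result is clamped to the image.

```python
def expand_box_sketch(box, img_w, img_h, scale=1.0, pad=0):
    """Grow a (left, upper, right, lower) box about its centre, then clamp it.

    Hypothetical helper for illustration; not the Pixeltable `expand_bbox` UDF.
    """
    left, upper, right, lower = box
    cx, cy = (left + right) / 2, (upper + lower) / 2
    # Scale each half-extent first, then add the fixed pixel padding.
    half_w = (right - left) * scale / 2 + pad
    half_h = (lower - upper) * scale / 2 + pad
    # Clamp to the image so the box stays a valid crop region.
    return (
        max(0, round(cx - half_w)),
        max(0, round(cy - half_h)),
        min(img_w, round(cx + half_w)),
        min(img_h, round(cy + half_h)),
    )


print(expand_box_sketch((100, 100, 200, 200), 640, 480, scale=1.4))  # (80, 80, 220, 220)
print(expand_box_sketch((100, 100, 200, 200), 640, 480, pad=30))     # (70, 70, 230, 230)
print(expand_box_sketch((0, 0, 100, 100), 640, 480, pad=30))         # (0, 0, 130, 130)
```

The last call shows the clamping: a box in the image corner can only grow towards the interior.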

## Crop to a target aspect ratio

`fit_bbox_to_aspect` computes a crop region that **contains the subject** and **matches a target aspect ratio**. The region is centred on the bounding box and clamped to image bounds.

This is the key building block for social-media repurposing: generate 9:16, 1:1, and 4:5 crops from a single source image.

In [None]:
# Compute crop regions for three different aspect ratios
images.add_computed_column(
    box_9x16=fit_bbox_to_aspect(
        images.bbox, images.image.width, images.image.height,
        aspect_ratio='9:16',
    )
)
images.add_computed_column(
    box_1x1=fit_bbox_to_aspect(
        images.bbox, images.image.width, images.image.height,
        aspect_ratio='1:1',
    )
)
images.add_computed_column(
    box_4x5=fit_bbox_to_aspect(
        images.bbox, images.image.width, images.image.height,
        aspect_ratio='4:5',
    )
)

# Apply the crops
images.add_computed_column(crop_9x16=images.image.crop(images.box_9x16))
images.add_computed_column(crop_1x1=images.image.crop(images.box_1x1))
images.add_computed_column(crop_4x5=images.image.crop(images.box_4x5))

In [None]:
images.select(images.crop_9x16, images.crop_1x1, images.crop_4x5).collect()

## Keep bounding boxes aligned after transforms

When you resize or crop an image, the original bounding box coordinates no longer match the new pixel grid. Pixeltable provides two utilities to fix this:

- **`rescale_bbox`** — after `resize()`, scales coordinates proportionally
- **`offset_bbox`** — after `crop()`, offsets coordinates into the cropped image's space

### After resize

In [None]:
# Resize images to 320x240
images.add_computed_column(resized=images.image.resize((320, 240)))

# Rescale the bounding box to match the new dimensions
images.add_computed_column(
    bbox_resized=rescale_bbox(
        images.bbox,
        [images.image.width, images.image.height],
        (320, 240),
    )
)

# Crop the resized image using the rescaled bbox
images.add_computed_column(
    crop_from_resized=images.resized.crop(images.bbox_resized)
)

In [None]:
images.select(images.resized, images.bbox_resized, images.crop_from_resized).collect()
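
The proportional scaling behind this step can be sketched in plain Python. The helper `rescale_box_sketch` is a hypothetical name for illustration. It assumes sizes are `(width, height)` pairs and that coordinates are rounded to whole pixels, which may differ from what `rescale_bbox` does internally.

```python
def rescale_box_sketch(box, from_size, to_size):
    """Map a (left, upper, right, lower) box from one image size to another.

    Hypothetical helper for illustration; sizes are (width, height).
    """
    sx = to_size[0] / from_size[0]  # horizontal scale factor
    sy = to_size[1] / from_size[1]  # vertical scale factor
    left, upper, right, lower = box
    return (round(left * sx), round(upper * sy), round(right * sx), round(lower * sy))


# A 640x480 image resized to 320x240 halves every coordinate.
print(rescale_box_sketch((100, 100, 300, 200), (640, 480), (320, 240)))  # (50, 50, 150, 100)
```

When the resize changes the aspect ratio, the two scale factors differ, so x and y coordinates must be scaled independently, as above.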

### After crop

In [None]:
# After cropping to 1:1, translate the original bbox into the cropped image's coordinate space
images.add_computed_column(
    bbox_in_crop=offset_bbox(images.bbox, images.box_1x1)
)

In [None]:
images.select(images.crop_1x1, images.bbox_in_crop).collect()
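
The translation itself is a subtraction of the crop's top-left corner. Below is a plain-Python sketch under that assumption. `offset_box_sketch` is a hypothetical helper, and clamping to the cropped image is an assumption of this sketch, not confirmed behaviour of `offset_bbox`.

```python
def offset_box_sketch(box, crop_box):
    """Express a (left, upper, right, lower) box in the coordinates of a crop.

    Hypothetical helper for illustration; clamping is an assumption.
    """
    dx, dy = crop_box[0], crop_box[1]
    crop_w = crop_box[2] - crop_box[0]
    crop_h = crop_box[3] - crop_box[1]
    left, upper, right, lower = box
    # Shift by the crop origin, then clamp to the cropped image's extent.
    return (
        min(max(left - dx, 0), crop_w),
        min(max(upper - dy, 0), crop_h),
        min(max(right - dx, 0), crop_w),
        min(max(lower - dy, 0), crop_h),
    )


print(offset_box_sketch((250, 150, 350, 250), (200, 100, 400, 300)))  # (50, 50, 150, 150)
print(offset_box_sketch((150, 150, 350, 250), (200, 100, 400, 300)))  # (0, 50, 150, 150)
```

The second call shows a box that starts left of the crop region: its left edge is clamped to 0 in the cropped image.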

## Crop videos around detected subjects

The same workflow extends to videos. Detect on a representative frame, compute the crop region, then apply `video.crop()`, which uses ffmpeg under the hood.

> **Requires:** `ffmpeg` installed and in PATH

In [None]:
videos = pxt.create_table('crop_demo/videos', {'video': pxt.Video})

In [None]:
# Extract first frame for detection
videos.add_computed_column(
    first_frame=videos.video.extract_frame(timestamp=0.0)
)

# Run detection on the frame
videos.add_computed_column(
    detections=yolox(videos.first_frame, model_id='yolox_m', threshold=0.5)
)

# Get the best bounding box
videos.add_computed_column(bbox=best_bbox(videos.detections))

# Compute a 9:16 crop region centred on the subject
videos.add_computed_column(
    crop_box=fit_bbox_to_aspect(
        videos.bbox, videos.first_frame.width, videos.first_frame.height,
        aspect_ratio='9:16',
    )
)

# Crop the video and resize to 1080x1920
videos.add_computed_column(
    cropped_video=videos.video.crop(videos.crop_box, target_size=(1080, 1920))
)

In [None]:
# Insert a sample video to see it in action.
# Replace with your own file path or URL.
# videos.insert(video='path/to/video.mp4')
# videos.select(videos.first_frame, videos.bbox, videos.cropped_video).collect()
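
For intuition about the ffmpeg side, the sketch below turns a `(left, upper, right, lower)` box into an ffmpeg filter string using ffmpeg's `crop=w:h:x:y` and `scale=w:h` filters. The helper `crop_filter_sketch` is hypothetical, and whether `video.crop()` builds exactly this filter chain is an assumption.

```python
def crop_filter_sketch(box, target_size=None):
    """Build an ffmpeg -vf filter string for a (left, upper, right, lower) box.

    Hypothetical helper for illustration; not taken from Pixeltable.
    """
    left, upper, right, lower = box
    # ffmpeg's crop filter takes output width, output height, then the x/y offset.
    vf = f"crop={right - left}:{lower - upper}:{left}:{upper}"
    if target_size is not None:
        # Optionally resize the cropped region to (width, height).
        vf += f",scale={target_size[0]}:{target_size[1]}"
    return vf


print(crop_filter_sketch((420, 0, 1027, 1080), target_size=(1080, 1920)))
# crop=607:1080:420:0,scale=1080:1920
```

A string like this would be passed to ffmpeg as `ffmpeg -i input.mp4 -vf "<filter>" output.mp4`.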

## Explanation

### Bounding box utilities

| Function | Purpose | Key parameters |
|----------|---------|----------------|
| `expand_bbox` | Add breathing room around a detection | `margin_factor=1.3` (scale), `padding=20` (pixels) |
| `fit_bbox_to_aspect` | Compute a crop region matching a target aspect ratio | `aspect_ratio='9:16'` |
| `rescale_bbox` | Adjust bbox after `resize()` | `from_size`, `to_size` |
| `offset_bbox` | Adjust bbox after `crop()` | `crop_box` — the region that was cropped |
| `video.crop()` | Spatially crop a video via ffmpeg | `box`, optional `target_size` |

### How `fit_bbox_to_aspect` works

1. Centres on the bounding box
2. Expands whichever dimension is too short for the target aspect ratio, so the region still contains the subject
3. Constrains to frame bounds (shifts if needed, never goes out of frame)
4. Returns a crop box in `(left, upper, right, lower)` format — ready for `image.crop()` or `video.crop()`
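
The steps above can be sketched in plain Python. This is an illustrative approximation, not the library's code: `fit_box_to_aspect_sketch` is a hypothetical helper, the aspect ratio is passed as two numbers instead of a `'9:16'` string, and shrinking the region when it exceeds the frame is an assumption of this sketch.

```python
def fit_box_to_aspect_sketch(box, img_w, img_h, aspect_w, aspect_h):
    """Return a (left, upper, right, lower) crop with ratio aspect_w:aspect_h.

    Hypothetical helper for illustration; assumes a non-degenerate box.
    """
    left, upper, right, lower = box
    box_w, box_h = right - left, lower - upper
    # Step 2: grow whichever dimension is too short for the target ratio.
    if box_w * aspect_h < box_h * aspect_w:
        crop_w, crop_h = box_h * aspect_w / aspect_h, box_h
    else:
        crop_w, crop_h = box_w, box_w * aspect_h / aspect_w
    # Assumption: if the region exceeds the frame, shrink it uniformly to fit.
    shrink = min(1.0, img_w / crop_w, img_h / crop_h)
    crop_w, crop_h = crop_w * shrink, crop_h * shrink
    # Step 1: centre the region on the bounding box.
    x0 = (left + right) / 2 - crop_w / 2
    y0 = (upper + lower) / 2 - crop_h / 2
    # Step 3: shift the region back inside the frame if it sticks out.
    x0 = min(max(x0, 0), img_w - crop_w)
    y0 = min(max(y0, 0), img_h - crop_h)
    # Step 4: return in (left, upper, right, lower) order.
    return (round(x0), round(y0), round(x0 + crop_w), round(y0 + crop_h))


print(fit_box_to_aspect_sketch((200, 100, 400, 300), 640, 480, 9, 16))  # (200, 22, 400, 378)
print(fit_box_to_aspect_sketch((0, 0, 100, 200), 640, 480, 1, 1))       # (0, 0, 200, 200)
```

In the second call the centred square would start at x = -50, so it is shifted right until it sits inside the frame.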

### All bounding boxes use the PIL convention

Every function uses `(left, upper, right, lower)` — the same format as `PIL.Image.crop()` and Pixeltable's `image.crop()`. This means you can chain detection → expand → fit → crop without coordinate conversion.
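
If boxes come from another source in a different layout, such as `(x, y, width, height)`, they need one conversion before entering this chain. A minimal sketch, with hypothetical helper names that are not part of Pixeltable:

```python
def xywh_to_ltrb(box):
    """Convert (x, y, width, height) to (left, upper, right, lower)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def ltrb_size(box):
    """Return (width, height) of a (left, upper, right, lower) box."""
    left, upper, right, lower = box
    return (right - left, lower - upper)


box = xywh_to_ltrb((10, 20, 100, 50))
print(box)             # (10, 20, 110, 70)
print(ltrb_size(box))  # (100, 50)
```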

## See also

- [Detect objects in images](https://docs.pixeltable.com/howto/cookbooks/images/img-detect-objects) — YOLOX object detection basics
- [Extract frames from videos](https://docs.pixeltable.com/howto/cookbooks/video/video-extract-frames) — Frame extraction for video analysis
- [PIL image transforms](https://docs.pixeltable.com/howto/cookbooks/images/img-pil-transforms) — Built-in image operations