# Compare object detection and panoptic segmentation

Understand when to use bounding boxes versus pixel-level masks for image analysis.

**What's in this recipe:**
- Run object detection to get bounding boxes and labels
- Run panoptic segmentation to get pixel-level masks
- Visualize and compare outputs side-by-side


## Problem

You need to analyze objects in images, and there are two common approaches:

| Approach | Output | Example |
|----------|--------|---------|
| Object Detection | Bounding boxes | "Car at [100, 200, 300, 400]" |
| Panoptic Segmentation | Pixel masks | "These 45,000 pixels are a car" |

Which should you use? Detection is faster, but a bounding box only approximates an object's extent. Segmentation is slower, but it assigns a label to every pixel.


## Solution

Run both approaches on the same images using DETR models and compare the results.

### Setup


In [None]:
%pip install -qU pixeltable torch transformers



In [None]:
import numpy as np

import pixeltable as pxt
from pixeltable.functions.huggingface import detr_for_object_detection, detr_for_segmentation
from pixeltable.functions.vision import draw_bounding_boxes, overlay_segmentation


### Load images


In [None]:
pxt.drop_dir('detection_vs_seg', force=True)
pxt.create_dir('detection_vs_seg')


In [None]:
images = pxt.create_table('detection_vs_seg.images', {'image': pxt.Image})

base_url = 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images'
images.insert([
    {'image': f'{base_url}/000000000034.jpg'},
    {'image': f'{base_url}/000000000049.jpg'},
])


### Run object detection

The `detr_for_object_detection` function returns bounding boxes, labels, and confidence scores.

**Parameters:**
- `model_id`: DETR variant (`facebook/detr-resnet-50` or `facebook/detr-resnet-101`)
- `threshold`: Confidence threshold (0.0-1.0). Higher values return fewer, more confident detections

**Output:**
```python
{'boxes': [[x1, y1, x2, y2], ...], 'scores': [0.98, ...], 'label_text': ['person', ...]}
```
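
To make the shape of this output concrete, here is a minimal sketch that post-filters a hand-written detection dictionary in plain Python and counts the remaining labels. The sample values and the `filter_detections` helper are hypothetical and exist only for illustration; in the recipe itself, the `threshold` parameter does the filtering.

```python
from collections import Counter

# Hypothetical sample in the output shape shown above (values are made up).
detections = {
    'boxes': [[10, 20, 110, 220], [50, 60, 90, 100], [0, 0, 30, 30]],
    'scores': [0.98, 0.62, 0.91],
    'label_text': ['person', 'dog', 'person'],
}


def filter_detections(det, min_score):
    """Keep only the detections whose score is at least min_score."""
    keep = [i for i, score in enumerate(det['scores']) if score >= min_score]
    return {
        'boxes': [det['boxes'][i] for i in keep],
        'scores': [det['scores'][i] for i in keep],
        'label_text': [det['label_text'][i] for i in keep],
    }


confident = filter_detections(detections, min_score=0.8)

# Count how many detections remain per label.
label_counts = Counter(confident['label_text'])
print(label_counts)
```

The three lists are index-aligned, so any filter has to apply the same indices to `boxes`, `scores`, and `label_text`.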


In [None]:
images.add_computed_column(
    detections=detr_for_object_detection(
        images.image,
        model_id='facebook/detr-resnet-50',
        threshold=0.8
    )
)


In [None]:
# View detection results
images.select(images.image, images.detections).collect()


### Visualize detections with bounding boxes

Use `draw_bounding_boxes` to overlay the detection results on the original image.


In [None]:
images.add_computed_column(
    detection_viz=draw_bounding_boxes(
        images.image,
        boxes=images.detections.boxes,
        labels=images.detections.label_text,
        fill=True,
        width=2
    )
)


In [None]:
images.select(images.detection_viz).collect()


### Run panoptic segmentation

The `detr_for_segmentation` function returns pixel-level masks and segment metadata.

**Parameters:**
- `model_id`: Segmentation model (`facebook/detr-resnet-50-panoptic`)
- `threshold`: Confidence threshold for filtering segments

**Output:**
```python
{
    'segmentation': np.ndarray,  # (H, W) array where each pixel = segment ID
    'segments_info': [{'id': 1, 'label_text': 'person', 'score': 0.98}, ...]
}
```
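
As a rough illustration of how the two fields relate, the sketch below looks up each segment's pixels by ID and computes the fraction of the image it covers. The tiny array and its labels are made up, and treating `0` as "no segment" is an assumption about the ID convention, not something this recipe confirms.

```python
import numpy as np

# Hypothetical 4x6 segmentation map; 0 is assumed to mean "no segment".
segmentation = np.array(
    [
        [1, 1, 1, 2, 2, 2],
        [1, 1, 1, 2, 2, 2],
        [1, 1, 0, 0, 2, 2],
        [1, 1, 0, 0, 2, 2],
    ],
    dtype=np.int32,
)
segments_info = [
    {'id': 1, 'label_text': 'person', 'score': 0.98},
    {'id': 2, 'label_text': 'sky', 'score': 0.95},
]

total_pixels = segmentation.size
coverage = {}
for segment in segments_info:
    # Boolean mask that is True wherever the pixel belongs to this segment.
    mask = segmentation == segment['id']
    coverage[segment['label_text']] = int(mask.sum()) / total_pixels

print(coverage)
```

The same comparison (`segmentation == segment_id`) yields the per-object mask needed for tasks such as background removal.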


In [None]:
images.add_computed_column(
    segmentation=detr_for_segmentation(
        images.image,
        model_id='facebook/detr-resnet-50-panoptic',
        threshold=0.5
    )
)


In [None]:
# View segment info
images.select(images.segmentation.segments_info).collect()


### Visualize segmentation with colored overlay

Use `overlay_segmentation` to visualize the pixel masks with colored regions and contours.


In [None]:
# Cast the segmentation array to the proper type for overlay_segmentation
segmentation_map = images.segmentation.segmentation.astype(pxt.Array[(None, None), np.int32])

images.add_computed_column(
    segmentation_viz=overlay_segmentation(
        images.image,
        segmentation_map,
        alpha=0.5,
        draw_contours=True,
        contour_thickness=2
    )
)


In [None]:
images.select(images.segmentation_viz).collect()


### Compare side-by-side


In [None]:
images.select(
    images.image,
    images.detection_viz,
    images.segmentation_viz
).collect()


### Count objects per image


In [None]:
images.select(
    images.image,
    num_detections=images.detections.boxes.apply(len, col_type=pxt.Int),
    num_segments=images.segmentation.segments_info.apply(len, col_type=pxt.Int)
).collect()


## Explanation

Detection is fast and returns approximate locations as rectangles. Segmentation is slower and returns precise boundaries as per-pixel labels.
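
To see what "approximate" means in practice, the following sketch compares the area of a synthetic circular mask with the area of its tightest bounding box. The circle stands in for a hypothetical object; no model is involved, and the numbers only illustrate the geometry.

```python
import numpy as np

# Synthetic 100x100 mask: a filled circle of radius 40 (hypothetical object).
yy, xx = np.mgrid[0:100, 0:100]
mask = (xx - 50) ** 2 + (yy - 50) ** 2 <= 40 ** 2

# Tightest axis-aligned box around the mask, as [x1, y1, x2, y2]
# with exclusive right and bottom edges.
ys, xs = np.nonzero(mask)
x1, y1 = int(xs.min()), int(ys.min())
x2, y2 = int(xs.max()) + 1, int(ys.max()) + 1

box_area = (x2 - x1) * (y2 - y1)
mask_area = int(mask.sum())
fill_ratio = mask_area / box_area

print(f'box area: {box_area}, mask area: {mask_area}, fill ratio: {fill_ratio:.2f}')
```

Even for a compact shape like a circle, roughly a quarter of the box is background. Elongated or diagonal objects fill their boxes far less, which is why area measurements call for masks.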

### Capability comparison

| Use Case | Detection | Segmentation |
|----------|-----------|--------------|
| Object counting | Yes | Yes |
| Object localization | Yes | Yes |
| Precise boundaries | No | Yes |
| Background removal | No | Yes |
| Scene composition | No | Yes |
| Speed priority | Yes | No |

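### Performance tradeoffs

The figures below are rough estimates; actual values depend on hardware, image size, and model variant.
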
| Metric | Detection | Segmentation |
|--------|-----------|--------------|
| Inference time | ~100ms | ~200ms |
| Output size | ~1KB | ~1MB+ |
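
The gap in output size follows from simple arithmetic: a segmentation map stores one ID per pixel, while detection stores a few numbers per object. The sketch below assumes a 640x480 image, 10 detections, and 4-byte values throughout. These are illustrative assumptions, and the estimate ignores label strings and serialization overhead.

```python
# Assumed values for illustration only.
height, width = 480, 640
num_detections = 10
bytes_per_value = 4  # int32 segment IDs, float32 coordinates and scores

# One segment ID per pixel.
segmentation_bytes = height * width * bytes_per_value

# Four box coordinates plus one score per detection.
detection_bytes = num_detections * (4 + 1) * bytes_per_value

print(segmentation_bytes)  # 1228800
print(detection_bytes)     # 200
```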

### When to use each

**Choose detection when:**
- You need to know *what* objects are present and *where* (approximately)
- Speed matters (detection is roughly 2x faster in the estimates above)
- You need search, filtering, or counting
- Bounding boxes suffice for visualization

**Choose segmentation when:**
- You need *exact* object boundaries (pixel-level masks)
- You're doing image editing, compositing, or AR
- You need to measure actual object area/coverage
- You want scene composition analysis (for example, what percentage of the image is sky versus buildings)


## See also

- [Detect objects in images](./img-detect-objects) - Object detection with YOLOX
- [Visualize detections](./img-visualize-detections) - Draw bounding boxes and labels
- [DETR documentation](https://huggingface.co/docs/transformers/model_doc/detr) - Hugging Face model docs
