# Improve Coffee Dataset quality with SAM2 in FiftyOne

## 🏆 Learning Objectives
- Understand how to apply the SAM2 segmentation model.
- Learn how to integrate SAM2 with FiftyOne.
- Visualize segmentation results using FiftyOne.
- Improve dataset quality using FiftyOne's uniqueness features.

## Requirements
### Knowledge
- Understanding of image segmentation.
- Familiarity with deep learning-based annotation tools.
### Installation
Run the following commands to install necessary dependencies:
```bash
git clone https://github.com/facebookresearch/sam2.git && cd sam2
pip install -e .
pip install fiftyone
```

## 1. Loading the Dataset

In [1]:


import fiftyone as fo # base library and app
import fiftyone.utils.huggingface as fouh # Hugging Face integration
dataset_ = fouh.load_from_hub("pjramg/my_colombian_coffe_FO", persistent=True, overwrite=True)
# Alternative: if the dataset is already on disk from a previous run of this
# notebook, load it by its FiftyOne name with fo.load_dataset(...) instead

# Define the new dataset name
dataset_name = "coffee_FO_SAM2_process"

# Check if the dataset exists
if dataset_name in fo.list_datasets():
    print(f"Dataset '{dataset_name}' exists. Loading...")
    dataset = fo.load_dataset(dataset_name)
else:
    print(f"Dataset '{dataset_name}' does not exist. Creating a new one...")
    # Clone the dataset with a new name and make it persistent
    dataset = dataset_.clone(dataset_name, persistent=True)



Downloading config file fiftyone.yml from pjramg/my_colombian_coffe_FO
Loading dataset
Importing samples...
 100% |███████████████| 1593/1593 [28.4ms elapsed, 0s remaining, 56.0K samples/s]   
Dataset 'coffee_FO_SAM2_process' exists. Loading...


## 2. Applying the SAM2 Model

In [None]:
import fiftyone.zoo as foz
model = foz.load_zoo_model("segment-anything-2-hiera-tiny-image-torch")
# Prompt with boxes
dataset.apply_model(
    model,
    label_field="segmentations",
    prompt_field="categories_segmentations",
)
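
The call above prompts SAM2 with the boxes stored in `categories_segmentations`. FiftyOne stores each `bounding_box` as `[x, y, width, height]` in relative coordinates in `[0, 1]`, while box-prompted segmentation models generally work with absolute pixel corners. The following is a minimal sketch of that conversion; the helper name `rel_box_to_abs_xyxy` is hypothetical, and this is not necessarily how the zoo model handles it internally.

```python
def rel_box_to_abs_xyxy(bbox, img_w, img_h):
    """Convert a relative [x, y, w, h] box to absolute (x1, y1, x2, y2) pixels."""
    x, y, w, h = bbox
    x1 = round(x * img_w)
    y1 = round(y * img_h)
    x2 = round((x + w) * img_w)
    y2 = round((y + h) * img_h)
    # Clamp to the image bounds in case of rounding or slightly out-of-range labels
    x1, x2 = max(0, x1), min(img_w, x2)
    y1, y2 = max(0, y1), min(img_h, y2)
    return x1, y1, x2, y2


# Hypothetical 640x480 image with a box covering the lower-middle region
box = rel_box_to_abs_xyxy([0.25, 0.5, 0.5, 0.25], img_w=640, img_h=480)
print(box)  # (160, 240, 480, 360)
```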

## 3. Visualizing the Results

In [None]:
# Print dataset summary
print(dataset)

# Show the first sample
print("Dataset sample labels:")
print(dataset.first())

session = fo.launch_app(dataset, port=5161, auto=False)

Name:        coffee_FO_SAM2_process
Media type:  image
Num samples: 1593
Persistent:  True
Tags:        []
Sample fields:
    id:                       fiftyone.core.fields.ObjectIdField
    filepath:                 fiftyone.core.fields.StringField
    tags:                     fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:                 fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.ImageMetadata)
    created_at:               fiftyone.core.fields.DateTimeField
    last_modified_at:         fiftyone.core.fields.DateTimeField
    categories_coco_id:       fiftyone.core.fields.IntField
    categories_segmentations: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
    segmentations:            fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
Dataset sample labels:
<Sample: {
    'id': '67892eeb3adf1dd0f587f860',
    'media_type': 'image',
    'filepath': '/home/paula/fiftyone/huggi

INFO:fiftyone.core.session.session:Session launched. Run `session.show()` to open the App in a cell output.


## 4. Finding Unique Images

This section shows how to use uniqueness detection, similarity search, and embedding visualizations for agricultural AI.


In [4]:
import fiftyone.brain as fob

results = fob.compute_similarity(dataset, brain_key="img_sim")
results.find_unique(100)

Computing embeddings...
 100% |███████████████| 1593/1593 [30.8s elapsed, 0s remaining, 52.2 samples/s]
Computing unique samples...
Generating index for 1593 embeddings...
Index complete
threshold: 1.000000, kept: 5, target: 100
threshold: 0.500000, kept: 64, target: 100
threshold: 0.250000, kept: 184, target: 100
threshold: 0.375000, kept: 106, target: 100
threshold: 0.437500, kept: 84, target: 100
threshold: 0.406250, kept: 98, target: 100
threshold: 0.390625, kept: 99, target: 100
threshold: 0.382812, kept: 104, target: 100
threshold: 0.386719, kept: 101, target: 100
threshold: 0.388672, kept: 99, target: 100
threshold: 0.387695, kept: 101, target: 100
threshold: 0.388184, kept: 100, target: 100
Uniqueness computation complete
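
The `threshold` values in the log above move the way a bisection search would: the threshold is halved or averaged until the number of kept samples reaches the target of 100. The sketch below illustrates that idea on toy data, assuming a greedy rule that keeps a point only if it is farther than the threshold from every point already kept. The function names are hypothetical, and this is an illustration rather than FiftyOne's actual implementation.

```python
import numpy as np


def greedy_unique(embeddings, threshold):
    """Greedily keep points farther than `threshold` from every kept point."""
    kept = []
    for i, emb in enumerate(embeddings):
        if all(np.linalg.norm(emb - embeddings[j]) > threshold for j in kept):
            kept.append(i)
    return kept


def find_unique_sketch(embeddings, target, max_iters=50):
    """Bisect the distance threshold until `target` points are kept."""
    lo, hi = 0.0, None
    threshold = 1.0
    kept = greedy_unique(embeddings, threshold)
    for _ in range(max_iters):
        if len(kept) == target:
            break
        if len(kept) > target:
            # Too many points kept: raise the threshold
            lo = threshold
            threshold = threshold * 2 if hi is None else (lo + hi) / 2
        else:
            # Too few points kept: lower the threshold
            hi = threshold
            threshold = (lo + hi) / 2
        kept = greedy_unique(embeddings, threshold)
    return kept, threshold


# Toy 1-D "embeddings" at 0, 1, ..., 9
points = np.arange(10, dtype=float).reshape(-1, 1)
kept, threshold = find_unique_sketch(points, target=4)
print(kept, threshold)  # [0, 3, 6, 9] 2.0
```

Because the greedy count is not guaranteed to hit every target exactly, the sketch caps the number of iterations and returns the last selection it tried.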


In [5]:
vis_results = fob.compute_visualization(dataset, brain_key="img_vis")


Computing embeddings...
 100% |███████████████| 1593/1593 [30.8s elapsed, 0s remaining, 55.0 samples/s]
Generating visualization...

UMAP( verbose=True)
Mon Mar 17 10:53:16 2025 Construct fuzzy simplicial set
Mon Mar 17 10:53:17 2025 Finding Nearest Neighbors
Mon Mar 17 10:53:20 2025 Finished Nearest Neighbor Search
Mon Mar 17 10:53:21 2025 Construct embedding


Epochs completed:   0%|            0/500 [00:00]

	completed  0  /  500 epochs
	completed  50  /  500 epochs
	completed  100  /  500 epochs
	completed  150  /  500 epochs
	completed  200  /  500 epochs
	completed  250  /  500 epochs
	completed  300  /  500 epochs
	completed  350  /  500 epochs
	completed  400  /  500 epochs
	completed  450  /  500 epochs
Mon Mar 17 10:53:22 2025 Finished embedding
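
As the log shows, `compute_visualization` projected the image embeddings to a low-dimensional layout with UMAP. To show what such a projection does without requiring `umap-learn`, the sketch below uses PCA as a plainly different, simpler stand-in: it maps random placeholder embeddings to 2-D points that could be scatter-plotted. The data and the function name `pca_2d` are assumptions for illustration only.

```python
import numpy as np


def pca_2d(embeddings):
    """Project embeddings onto their two directions of largest variance."""
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


# Placeholder data: 200 samples with 64-d embeddings
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 64))
points = pca_2d(embeddings)
print(points.shape)  # (200, 2)
```

Unlike PCA, UMAP is nonlinear and tries to preserve local neighborhoods, so clusters of visually similar images usually separate more clearly in the actual FiftyOne plot.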


In [23]:
import fiftyone.brain as fob

fob.compute_uniqueness(dataset)

Downloading model from Google Drive ID '1SIO9XreK0w1ja4EuhBWcR10CnWxCOsom'...
 100% |████|  100.6Mb/100.6Mb [357.9ms elapsed, 0s remaining, 281.0Mb/s]
Computing embeddings...
 100% |███████████████| 1593/1593 [2.5s elapsed, 0s remaining, 902.8 samples/s]
Computing uniqueness...
Generating index for 1593 embeddings...
Index complete
Uniqueness computation complete
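
`compute_uniqueness` assigns each sample a scalar score based on how far its embedding lies from its neighbors. The sketch below shows one simple way such a score can be defined, assuming the mean distance to the `k` nearest neighbors scaled to `[0, 1]`. The function name and the exact formula are assumptions for illustration, not FiftyOne's implementation.

```python
import numpy as np


def uniqueness_sketch(embeddings, k=2):
    """Score each row by its mean distance to its k nearest neighbors, scaled to [0, 1]."""
    diffs = embeddings[:, None, :] - embeddings[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    np.fill_diagonal(dists, np.inf)  # ignore each point's distance to itself
    knn = np.sort(dists, axis=1)[:, :k]
    scores = knn.mean(axis=1)
    return scores / scores.max()  # assumes at least two distinct points


# Four near-duplicates and one outlier
embeddings = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]])
scores = uniqueness_sketch(embeddings, k=2)
print(scores.round(3))
```

In this toy example the outlier receives the maximum score of 1.0, while the near-duplicates score close to 0, which is the property that makes the score useful for spotting redundant images.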


In [24]:
unique_view = dataset.select(results.unique_ids)
session.view = unique_view

print(unique_view)

Dataset:     coffee_FO_SAM2_process
Media type:  image
Num samples: 100
Sample fields:
    id:                       fiftyone.core.fields.ObjectIdField
    filepath:                 fiftyone.core.fields.StringField
    tags:                     fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:                 fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.ImageMetadata)
    created_at:               fiftyone.core.fields.DateTimeField
    last_modified_at:         fiftyone.core.fields.DateTimeField
    categories_coco_id:       fiftyone.core.fields.IntField
    categories_segmentations: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
    segmentations:            fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
    auto:                     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
    polished_auto:            fiftyone.core.fields.EmbeddedDocumentFiel

## 5. Pre-annotating the 100 Unique Samples with SAM2

In [9]:
# Fully automatic segmentation (no prompt field), applied only to the 100 unique samples
# To run it on the whole dataset instead: dataset.apply_model(model, label_field="auto")
unique_view.apply_model(model, label_field="auto")


 100% |█████████████████| 100/100 [2.6m elapsed, 0s remaining, 0.6 samples/s]    




In [10]:
session = fo.launch_app(unique_view, port=5161, auto=False)

Session launched. Run `session.show()` to open the App in a cell output.




## 6. Assigning Labels to the Auto-Generated Segmentations

In [12]:
import fiftyone as fo
import numpy as np
import torch
import torchvision.transforms as transforms
from PIL import Image
from torchvision.models import resnet18
from sklearn.metrics.pairwise import cosine_similarity

# Load dataset
#dataset = fo.load_dataset("coffee_FO")

# Ensure the `polished_auto` field exists on the underlying dataset
if "polished_auto" not in dataset.get_field_schema():
    dataset.add_sample_field("polished_auto", fo.EmbeddedDocumentField, embedded_doc_type=fo.Detections)

# Load a pre-trained ResNet18 and replace its classification head with an
# identity layer, so the forward pass returns the 512-d pooled features
# instead of 1000 ImageNet class logits. A separate name is used so the
# SAM2 `model` loaded earlier is not overwritten.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
feature_extractor = resnet18(weights="DEFAULT")  # torchvision >= 0.13; older versions use pretrained=True
feature_extractor.fc = torch.nn.Identity()
feature_extractor = feature_extractor.eval().to(device)

# Define preprocessing for the bounding box patches
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def extract_patch(sample, bbox):
    """Extracts the image patch for a relative [x, y, w, h] bounding box."""
    image = Image.open(sample.filepath).convert("RGB")
    img_w, img_h = image.size

    # Convert relative bounding box to absolute pixels (at least 1 px per side)
    x, y, w, h = bbox
    abs_x, abs_y = int(x * img_w), int(y * img_h)
    abs_w, abs_h = max(1, int(w * img_w)), max(1, int(h * img_h))

    # Crop and preprocess
    patch = image.crop((abs_x, abs_y, abs_x + abs_w, abs_y + abs_h))
    return transform(patch).unsqueeze(0).to(device)  # Add batch dimension

def compute_embedding(image_patch):
    """Computes the 512-d feature embedding of a cropped patch using ResNet18."""
    with torch.no_grad():
        features = feature_extractor(image_patch)
    return features.cpu().numpy().flatten()  # Convert to 1D vector

def compute_iou(boxA, boxB):
    """ Computes Intersection over Union (IoU) between two bounding boxes. """
    xA = max(boxA[0], boxB[0])
    yA = max(boxA[1], boxB[1])
    xB = min(boxA[0] + boxA[2], boxB[0] + boxB[2])
    yB = min(boxA[1] + boxA[3], boxB[1] + boxB[3])

    inter_area = max(0, xB - xA) * max(0, yB - yA)
    boxA_area = boxA[2] * boxA[3]
    boxB_area = boxB[2] * boxB[3]
    union_area = boxA_area + boxB_area - inter_area

    return inter_area / union_area if union_area > 0 else 0

# Step 1: Extract ground truth information from the whole dataset (Embeddings)
ground_truth_boxes = []
y_positions = []
gt_embeddings = []
gt_labels = []

for sample in dataset:
    if sample.categories_segmentations and sample.categories_segmentations.detections:
        for det in sample.categories_segmentations.detections:
            bbox = det.bounding_box
            ground_truth_boxes.append(bbox)
            y_positions.append(bbox[1])  # Store Y positions
            image_patch = extract_patch(sample, bbox)
            gt_embeddings.append(compute_embedding(image_patch))
            gt_labels.append(det.label)

# Convert embeddings list to NumPy array
gt_embeddings = np.array(gt_embeddings) if gt_embeddings else np.empty((0, 512))

# Compute size and Y-axis constraints
box_areas = [w * h for _, _, w, h in ground_truth_boxes]
avg_box_area = np.mean(box_areas)
std_box_area = np.std(box_areas)
lower_size_limit = max(0, avg_box_area - 1.5 * std_box_area)
upper_size_limit = avg_box_area + 1.5 * std_box_area
min_y_gt = min(y_positions) if y_positions else 0
max_y_gt = max(y_positions) if y_positions else 1

# Step 2: Filter auto-generated bounding boxes
for sample in unique_view:
    if sample.auto and sample.auto.detections:
        valid_detections = []
        for detection in sample.auto.detections:
            x, y, bw, bh = detection.bounding_box
            area = bw * bh
            aspect_ratio = bw / bh if bh > 0 else 1
            # Width/height ratio in relative coordinates; this range is a heuristic
            # meant to keep roughly circular/elliptical boxes
            is_circular = 0.25 <= aspect_ratio <= 0.8

            if (lower_size_limit <= area <= upper_size_limit and  
                min_y_gt <= y <= max_y_gt and  
                is_circular):  
                valid_detections.append(detection)

        # Step 3: Assign labels using embeddings
        for det in valid_detections:
            image_patch = extract_patch(sample, det.bounding_box)
            embedding = compute_embedding(image_patch)

            if len(gt_embeddings) > 0:
                similarities = cosine_similarity([embedding], gt_embeddings)[0]
                best_match_idx = np.argmax(similarities)
                best_match_label = gt_labels[best_match_idx]
            else:
                best_match_label = "unknown"  # This should not happen

            det.label = best_match_label

        # Save filtered detections in `polished_auto`
        sample["polished_auto"] = fo.Detections(detections=valid_detections)
        sample.save()

print("Filtering and label assignment completed for `polished_auto`.")




Filtering and label assignment completed for `polished_auto`.
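
The cell above combines two ideas: a size filter that keeps boxes whose area lies within 1.5 standard deviations of the ground-truth mean, and label assignment by the most similar ground-truth embedding. The sketch below reproduces both on toy numbers so the behavior is easy to inspect. The helper names and the data are hypothetical, and plain NumPy replaces `sklearn`'s `cosine_similarity`.

```python
import numpy as np


def size_limits(areas, n_std=1.5):
    """Return (lower, upper) area limits as mean -/+ n_std standard deviations."""
    areas = np.asarray(areas, dtype=float)
    mean, std = areas.mean(), areas.std()
    return max(0.0, mean - n_std * std), mean + n_std * std


def nearest_label(embedding, gt_embeddings, gt_labels):
    """Return the label and cosine similarity of the closest ground-truth embedding."""
    emb = embedding / np.linalg.norm(embedding)
    gt = gt_embeddings / np.linalg.norm(gt_embeddings, axis=1, keepdims=True)
    sims = gt @ emb
    best = int(np.argmax(sims))
    return gt_labels[best], float(sims[best])


# Toy ground-truth box areas (relative units) and candidate areas to filter
gt_areas = [0.010, 0.012, 0.011, 0.009]
lower, upper = size_limits(gt_areas)
candidates = [0.0105, 0.05, 0.0001]
kept = [a for a in candidates if lower <= a <= upper]
print(kept)  # [0.0105]

# Toy 2-D embeddings standing in for the 512-d ResNet features
gt_embeddings = np.array([[1.0, 0.0], [0.0, 1.0]])
gt_labels = ["mature", "immature"]
label, score = nearest_label(np.array([0.9, 0.1]), gt_embeddings, gt_labels)
print(label, round(score, 3))  # mature 0.994
```

Returning the similarity along with the label makes it possible to reject weak matches with a minimum-similarity threshold, whereas the cell above always accepts the best match.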


In [25]:
print(dataset)
print(unique_view)

Name:        coffee_FO_SAM2_process
Media type:  image
Num samples: 1593
Persistent:  True
Tags:        []
Sample fields:
    id:                       fiftyone.core.fields.ObjectIdField
    filepath:                 fiftyone.core.fields.StringField
    tags:                     fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:                 fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.ImageMetadata)
    created_at:               fiftyone.core.fields.DateTimeField
    last_modified_at:         fiftyone.core.fields.DateTimeField
    categories_coco_id:       fiftyone.core.fields.IntField
    categories_segmentations: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
    segmentations:            fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
    auto:                     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
    polished_auto:            fiftyo

In [26]:
session = fo.launch_app(unique_view, port=5161, auto=False)

Session launched. Run `session.show()` to open the App in a cell output.




In [None]:
# Step 1: Duplicate `polished_auto` into `polished_auto_export`
# Fields are declared on the underlying dataset, which the view then inherits
if "polished_auto_export" not in dataset.get_field_schema():
    dataset.add_sample_field("polished_auto_export", fo.EmbeddedDocumentField, embedded_doc_type=fo.Detections)

for sample in unique_view:
    if sample["polished_auto"]:
        sample["polished_auto_export"] = sample["polished_auto"].copy()  # Create a true duplicate
    else:
        sample["polished_auto_export"] = None  # Ensure field exists
    sample.save()

print("Duplicated `polished_auto` to `polished_auto_export`.")

# Step 2: Clean `polished_auto_export` to remove `score` and `confidence`
def clean_detections(sample, label_field):
    """Removes 'score' and 'confidence' fields to fix COCO export issues."""
    if sample[label_field] and sample[label_field].detections:
        for det in sample[label_field].detections:
            if hasattr(det, "attributes"):
                det.attributes.pop("score", None)  # Remove score field
                det.attributes.pop("confidence", None)  # Remove confidence field
            if hasattr(det, "score"):
                delattr(det, "score")  # Delete score if it exists
            if hasattr(det, "confidence"):
                delattr(det, "confidence")  # Delete confidence if it exists
            det["iscrowd"] = 0  # Ensure compatibility with COCO format
    return sample

# Apply cleaning function
for sample in unique_view:
    clean_detections(sample, "polished_auto_export")
    sample.save()

print("Cleaned `polished_auto_export` to remove conflicting fields.")

# Step 3: Export the dataset (see the export cells that follow)

In [21]:
unique_view = dataset.select(results.unique_ids)
session.view = unique_view

new_dataset= unique_view.clone()
print(new_dataset)

Name:        2025.03.17.13.05.22
Media type:  image
Num samples: 100
Persistent:  False
Tags:        []
Sample fields:
    id:                       fiftyone.core.fields.ObjectIdField
    filepath:                 fiftyone.core.fields.StringField
    tags:                     fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:                 fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.ImageMetadata)
    created_at:               fiftyone.core.fields.DateTimeField
    last_modified_at:         fiftyone.core.fields.DateTimeField
    categories_coco_id:       fiftyone.core.fields.IntField
    categories_segmentations: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
    segmentations:            fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
    auto:                     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
    polished_auto:            fiftyone.

In [None]:
# Step 3: Export dataset in COCO format
export_dir = "100_unique_coffee_coco"
new_dataset.export(
    export_dir=export_dir,
    dataset_type=fo.types.COCODetectionDataset,
    label_field="polished_auto_export",  # Use cleaned duplicate field
    export_media=True,  # Export images along with annotations
)
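
COCO annotations store boxes as absolute `[x, y, width, height]` pixel values together with an `area` and an `iscrowd` flag, which is why the cleaning step sets `iscrowd` on each detection. The sketch below shows how one relative FiftyOne box maps onto that layout. The helper name and the IDs are hypothetical, and this illustrates the target format rather than FiftyOne's exporter.

```python
def to_coco_annotation(bbox, img_w, img_h, image_id, category_id, ann_id):
    """Build a COCO-style annotation dict from a relative [x, y, w, h] box."""
    x, y, w, h = bbox
    abs_box = [x * img_w, y * img_h, w * img_w, h * img_h]
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "bbox": abs_box,  # absolute [x, y, width, height] in pixels
        "area": abs_box[2] * abs_box[3],
        "iscrowd": 0,
    }


# Hypothetical 640x480 image with one detection
ann = to_coco_annotation(
    [0.25, 0.5, 0.5, 0.25], img_w=640, img_h=480,
    image_id=1, category_id=2, ann_id=1,
)
print(ann["bbox"], ann["area"])  # [160.0, 240.0, 320.0, 120.0] 38400.0
```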

In [None]:
# Step 3: Export dataset in CVAT format
export_dir = "100_unique_coffee_cvat"
new_dataset.export(
    export_dir=export_dir,
    dataset_type=fo.types.CVATImageDataset,
    label_field="polished_auto_export",  # Use cleaned duplicate field
    export_media=True,  # Export images along with annotations
)

In [22]:
# Step 3: Export dataset in FiftyOne format
export_dir = "100_unique_coffee_FO"
new_dataset.export(
    export_dir=export_dir,
    dataset_type=fo.types.FiftyOneDataset,
    label_field="polished_auto_export",  # Use cleaned duplicate field
    export_media=True,  # Export images along with annotations
)




Exporting samples...




 100% |████████████████████| 100/100 [164.3ms elapsed, 0s remaining, 608.7 docs/s]     




### Optional: Send Images to CVAT to Fix Annotations

In [None]:
# Ideally all 100 unique samples would be sent to CVAT; to keep this example
# small, we randomly select 5 of them
unique_5_view = unique_view.take(5)

# A unique identifier for this run
anno_key = "segs_run"

# Upload the samples and launch CVAT
anno_results = unique_5_view.annotate(
    anno_key,
    label_field="auto",
    label_type="instances",
    classes=["immature", "mature", "overmature", "semimature"],
    launch_editor=True,
    url="https://app.cvat.ai",
    username="your_user_name",
    password="your_password",
)

![Image](https://github.com/user-attachments/assets/498d632a-c93a-41d7-82da-a81d6c29bbdf)

## Next Steps
- Fine-tune the SAM2 model for improved segmentation.
- Integrate additional annotation tools with FiftyOne.
- Explore active learning workflows for improving dataset quality.