# Segment Anything with FiftyOne
In this notebook, you will use Segment Anything to segment images from a downloaded dataset and then examine the predicted segments using FiftyOne.

## Setup

### Environment setup
To begin, create a virtual environment with your tool of choice. If you are following along in this notebook, run the following commands in the activated environment to register it as a kernel that the notebook can use:

```
$ pip install ipykernel
$ python -m ipykernel install --user --name=fiftyoneSAMenv
```

In [None]:
!pip install fiftyone git+https://github.com/facebookresearch/segment-anything.git torch torchvision opencv-python numpy==1.24.4

In [None]:
from copy import deepcopy

import cv2
import fiftyone as fo
import fiftyone.utils.voc  # make sure fo.utils.voc is loaded for the box conversion used later
import fiftyone.zoo as foz
import numpy as np
from segment_anything import SamPredictor, sam_model_registry
import torch

### Segment Anything Setup
After installing and importing the dependencies above, download the [default Segment Anything model checkpoint](https://github.com/facebookresearch/segment-anything?tab=readme-ov-file#model-checkpoints) (the ViT-H model, `sam_vit_h_4b8939.pth`) to the same directory as this notebook.

The cells below will load your downloaded SAM checkpoint into a model instance called `sam`, which will then be used to create a `SamPredictor` object. If you don't have a CUDA-enabled GPU, skip the third cell in this sequence.
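If you would rather not skip cells by hand, the device can be chosen at runtime. The following is a minimal sketch and not part of the original notebook: `pick_device` is a hypothetical helper name, and it relies only on `torch.cuda.is_available()`, falling back to the CPU when no CUDA device (or no PyTorch install) is found.

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA device is usable, otherwise "cpu".

    Hypothetical helper for illustration; not part of the original notebook.
    """
    try:
        import torch  # imported lazily so the helper still works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Segment Anything would run on: {device}")
```

Under these assumptions, `sam.to(device=pick_device())` could stand in for the CUDA-only cell; on a machine without a GPU the model stays on the CPU and inference is slower.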

In [None]:
sam_checkpoint = "sam_vit_h_4b8939.pth"
model_type = "vit_h"

In [None]:
sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)

In [None]:
# Only run this if you have a CUDA-enabled GPU
sam.to(device="cuda")

In [None]:
predictor = SamPredictor(sam)

### Dataset Setup
In the cells below, you use FiftyOne's Dataset Zoo to download the `quickstart` dataset. If you've previously downloaded it, the `load_zoo_dataset()` call loads it from disk instead of downloading it again. Then, to save time, the next line takes a slice of the first 10 samples, creating a [DatasetView](https://docs.voxel51.com/user_guide/using_views.html) that you will use for the rest of this tutorial.

In [None]:
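# Optional cleanup from an earlier run; guarded because `dataset` is not defined yet on a first run
if fo.dataset_exists("quickstart"):
    fo.delete_dataset("quickstart")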

In [None]:
dataset = foz.load_zoo_dataset("quickstart")
sliced_view = dataset[:10]

## Segment the Sliced DatasetView
Next, you will use Segment Anything to generate segmentation masks for each image in the sliced dataset view. While Segment Anything can segment an entire image without prompts, here you will use the ground truth bounding boxes in the dataset as prompts, so that each mask targets one labeled object.

### Segmenting a Single Sample
The collection of cells below will show you how to get segment masks for every detection in a single image sample, load them into the sample, and then view them in FiftyOne. After that, you'll learn how to put it all together for a whole dataset. First, grab the first image in the sliced dataset.

In [None]:
sample = sliced_view.first()

Next, use OpenCV to open the image and change the color format from OpenCV's default BGR to RGB. The call to `set_image()` computes the image embedding that the `SamPredictor` then reuses for every mask prompt on this image.

In [None]:
image = cv2.imread(sample["filepath"])
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
predictor.set_image(image)
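To illustrate what the color conversion does, here is a small sketch that uses nested Python lists in place of an image array. `bgr_to_rgb` is a hypothetical name used only for this illustration. Each pixel's channel order is reversed, so blue and red trade places while green stays where it is.

```python
def bgr_to_rgb(image):
    """Reverse the channel order of every pixel in a rows x cols x 3 nested list."""
    return [[pixel[::-1] for pixel in row] for row in image]


# A 1x2 "image": one pure-blue and one pure-red pixel, stored in BGR order
bgr_image = [[(255, 0, 0), (0, 0, 255)]]
rgb_image = bgr_to_rgb(bgr_image)
print(rgb_image)
```

The notebook performs the same reordering on the real image with `cv2.cvtColor(image, cv2.COLOR_BGR2RGB)`.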

The cell that follows this explanation does the main work. In it, you iterate through all of the ground truth detections in the sample image. For each detection, you use a helper in the `fiftyone.utils.voc` module to convert the bounding box from FiftyOne's relative `[x, y, width, height]` format to the absolute `[xmin, ymin, xmax, ymax]` format accepted by Segment Anything, pass that box as a prompt, and receive a mask and a score for that mask. Then, you make a copy of the ground truth detection and add the mask and its score to the copy. After the loop exits, you use the `predictions` list to construct a `Detections` object and store it in a new sample field called `predictions`.
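The coordinate conversion can be sketched in plain Python. This is an illustration only: `rel_xywh_to_abs_xyxy` is a hypothetical function, and rounding to the nearest pixel is an assumption made for the sketch. The notebook itself relies on `VOCBoundingBox.from_detection_format()`.

```python
def rel_xywh_to_abs_xyxy(bounding_box, frame_size):
    """Convert a relative [x, y, width, height] box to absolute (xmin, ymin, xmax, ymax).

    bounding_box: values in [0, 1], relative to the image dimensions
    frame_size: (image_width, image_height) in pixels
    """
    x, y, w, h = bounding_box
    width, height = frame_size
    return (
        int(round(x * width)),
        int(round(y * height)),
        int(round((x + w) * width)),
        int(round((y + h) * height)),
    )


# A box starting a quarter of the way across and halfway down a 640x480 image
box = rel_xywh_to_abs_xyxy([0.25, 0.5, 0.5, 0.25], (640, 480))
print(box)  # (160, 240, 480, 360)
```

Note that the frame size is passed as `(width, height)`, which is why the notebook unpacks `image.shape` into `h, w, _` and then passes `(w, h)`.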

Note that the returned mask needs two transformations to render properly in FiftyOne. First, index it with `mask[0]` to get a two-dimensional array, since `predict()` returns one mask per output along the first axis. Then, trim the mask to the size of the input box, because FiftyOne expects a detection's mask to cover only its bounding box. Since the mask is a NumPy array, you can do this with array slicing, using the range of y coordinates for the rows and the range of x coordinates for the columns.
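The cropping step can be illustrated with a small sketch that uses nested Python lists as a stand-in for the NumPy mask. `crop_mask` is a hypothetical helper for this illustration; on a real array, the notebook's expression `mask[0][ymin:ymax+1, xmin:xmax+1]` selects the same rows and columns.

```python
def crop_mask(mask, xmin, ymin, xmax, ymax):
    """Keep rows ymin..ymax and columns xmin..xmax (both inclusive) of a 2-D mask."""
    return [row[xmin : xmax + 1] for row in mask[ymin : ymax + 1]]


# A 4x5 full-image mask containing a 2x2 object
full_mask = [
    [False, False, False, False, False],
    [False, True, True, False, False],
    [False, True, True, False, False],
    [False, False, False, False, False],
]
cropped = crop_mask(full_mask, xmin=1, ymin=1, xmax=2, ymax=2)
print(cropped)  # [[True, True], [True, True]]
```

Rows are indexed by y and columns by x, so the y range comes first in the slice even though the box is written x-first.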

The very last step is to save the sample so that the new predictions can be loaded in FiftyOne.

In [None]:
predictions = []
h, w, _ = image.shape
for detection in sample["ground_truth"]["detections"]:
    input_bbox = fo.utils.voc.VOCBoundingBox.from_detection_format(detection["bounding_box"], (w, h))
    mask, score, _ = predictor.predict(
        box=np.array([input_bbox.xmin, input_bbox.ymin, input_bbox.xmax, input_bbox.ymax]),
        multimask_output=False,
    )
    prediction = deepcopy(detection)
    prediction["mask"] = mask[0][input_bbox.ymin:input_bbox.ymax+1, input_bbox.xmin:input_bbox.xmax+1]
    prediction["confidence"] = float(score[0])  # predict() returns an array of scores; store a scalar
    predictions.append(prediction)
sample["predictions"] = fo.Detections(detections=predictions)
sample.save()

#### Start FiftyOne
Next, launch the FiftyOne App on the sliced dataset view you created earlier. Make sure the "predictions" checkbox is toggled on, and you should see the instance segmentations predicted by Segment Anything on the first image.

In [None]:
session = fo.launch_app(sliced_view)

### Segmenting the Dataset
Now that the hard work is done, all that is left is to assemble the pieces you already implemented into another loop to generate segments for the whole dataset view.

In [None]:
for sample in sliced_view:
    image = cv2.imread(sample["filepath"])
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    predictor.set_image(image)
    predictions = []
    h, w, _ = image.shape
    for detection in sample["ground_truth"]["detections"]:
        input_bbox = fo.utils.voc.VOCBoundingBox.from_detection_format(detection["bounding_box"], (w, h))
        mask, score, _ = predictor.predict(
            box=np.array([input_bbox.xmin, input_bbox.ymin, input_bbox.xmax, input_bbox.ymax]),
            multimask_output=False,
        )
        prediction = deepcopy(detection)
        prediction["mask"] = mask[0][input_bbox.ymin:input_bbox.ymax+1, input_bbox.xmin:input_bbox.xmax+1]
        prediction["confidence"] = float(score[0])  # predict() returns an array of scores; store a scalar
        predictions.append(prediction)
    sample["predictions"] = fo.Detections(detections=predictions)
    sample.save()

Now refresh the current session so that the App shows the new predictions for every image in the view.

In [None]:
session.refresh()

And that's it! In this tutorial you learned how to:
- Load FiftyOne and Segment Anything
- Use ground truth detections in a FiftyOne dataset to prompt Segment Anything for segmentation masks
- Inspect those masks within the FiftyOne application