In [None]:
!pip install fiftyone umap-learn
!pip install git+https://github.com/huggingface/transformers.git#egg=transformers
!pip install shapely

In this tutorial we'll make use of the [RIS-LAD](https://huggingface.co/datasets/Voxel51/RIS-LAD) dataset. [RIS-LAD is the first fine-grained benchmark](https://arxiv.org/abs/2507.20920) designed specifically for low-altitude drone image segmentation.

The dataset features 13,871 annotations with image-text-mask triplets captured from real drone footage at 30-100 meter altitudes with oblique viewing angles. Unlike existing remote sensing datasets that rely on high-altitude satellite imagery, RIS-LAD focuses on the visual complexities of low-altitude drone perception. These challenges include perspective changes, densely packed tiny objects, variable lighting conditions, and the notorious problems of **category drift** (tiny targets causing confusion with larger, semantically similar objects) and **object drift** (difficulty distinguishing among crowded same-class instances) that plague crowded aerial scenes.

This benchmark addresses the gap in understanding how Visual AI systems see the world from a drone's perspective.

You can download the dataset from the Hugging Face Hub as follows:

In [None]:
import fiftyone as fo
from fiftyone.utils.huggingface import load_from_hub

dataset = load_from_hub(
    "Voxel51/RIS-LAD",
    overwrite=True,
    persistent=True
)

This dataset is in [FiftyOne format](https://docs.voxel51.com/user_guide/using_datasets.html). 

FiftyOne provides powerful functionality to inspect, search, and modify your data at every level, from the entire [Dataset](https://docs.voxel51.com/api/fiftyone.core.dataset.html#fiftyone.core.dataset.Dataset) down to individual [Samples](https://docs.voxel51.com/api/fiftyone.core.sample.html#fiftyone.core.sample.Sample).

To see the schema of this dataset, you can simply evaluate the Dataset object in a cell:

In [None]:
dataset

A FiftyOne dataset is composed of [Samples](https://docs.voxel51.com/api/fiftyone.core.sample.html#fiftyone.core.sample.Sample).

Samples store all information associated with a particular piece of data in a dataset, including basic metadata about the data, one or more sets of labels, and additional features associated with subsets of the data and/or label sets.

The attributes of a Sample are called [Fields](https://docs.voxel51.com/api/fiftyone.core.fields.html#fiftyone.core.fields.Field), which store information about the Sample. When a new Field is assigned to a Sample in a Dataset, it is automatically added to the dataset's schema and is thus accessible on all other samples in the dataset.

To see the schema of a single Sample and the contents of its Fields, you can call the [`first()` method](https://docs.voxel51.com/api/fiftyone.core.dataset.html#fiftyone.core.dataset.Dataset.first):

In [None]:
dataset.first()

You can use the FiftyOne SDK to quickly compute high-level statistics about your dataset with its [built-in Aggregation methods](https://docs.voxel51.com/user_guide/using_aggregations.html).

For example, you can use the [`count()` aggregation](https://docs.voxel51.com/api/fiftyone.core.collections.html#fiftyone.core.collections.SampleCollection.count) to compute the number of non-None field values in a collection:

In [None]:
dataset.count("ground_truth.detections.label")

In [None]:
dataset.count("ground_truth.detections.referring_expression")

You can use the [`count_values()` aggregation](https://docs.voxel51.com/api/fiftyone.core.collections.html#fiftyone.core.collections.SampleCollection.count_values) to compute the occurrences of field values in a collection:

In [None]:
dataset.count_values("ground_truth.detections.label")

You can use the [`distinct()` aggregation](https://docs.voxel51.com/api/fiftyone.core.collections.html#fiftyone.core.collections.SampleCollection.distinct) to compute the distinct values of a field in a collection:

In [None]:
len(dataset.distinct("ground_truth.detections.referring_expression"))
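
To make the semantics of these aggregations concrete, here is a plain-Python sketch (no FiftyOne required) of what `count()`, `count_values()`, and `distinct()` compute over a list field nested inside a label field such as `ground_truth.detections.label`. The toy data and variable names are hypothetical; FiftyOne runs the equivalent computation inside its database rather than in Python.

```python
from collections import Counter

# Hypothetical toy data: each inner list holds the detection labels of one
# sample; None stands for a sample whose ground_truth field is unset.
labels_per_sample = [
    ["car", "car", "people"],
    ["bus"],
    None,
    ["car", "truck", "people", "people"],
]

# Flatten the nested lists, skipping samples (and values) that are None
all_labels = [
    label
    for labels in labels_per_sample
    if labels is not None
    for label in labels
    if label is not None
]

count = len(all_labels)                    # analogous to count("...label")
counts_by_label = Counter(all_labels)      # analogous to count_values("...label")
distinct_labels = sorted(set(all_labels))  # analogous to distinct("...label")

print(count)
print(dict(counts_by_label))
print(distinct_labels)
```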

### Adding a new Field to the Dataset

A useful piece of information to have about a sample is the number of detection labels it contains. You can easily add this to each sample in your Dataset using a `ViewField` expression.

The [`ViewField`](https://docs.voxel51.com/api/fiftyone.core.expressions.html#fiftyone.core.expressions.ViewField) and [`ViewExpression`](https://docs.voxel51.com/api/fiftyone.core.expressions.html#fiftyone.core.expressions.ViewExpression) classes allow you to use native Python operators to define expressions. Simply wrap the target field of your sample in a `ViewField` and then apply comparison, logic, arithmetic, or array operations to it to create a `ViewExpression`.

The idiomatic FiftyOne way to count the number of instance labels in a sample is to use a `ViewField` expression to access the list of labels and then use `.length()` to count them.

To add the number of instances per image as a field on each sample in your dataset, you can use FiftyOne's [`set_values()`](https://docs.voxel51.com/api/fiftyone.core.dataset.html#fiftyone.core.dataset.Dataset.set_values) method. This will efficiently compute and store the count for each sample.

You can learn more about creating Dataset Views [in these docs](https://docs.voxel51.com/user_guide/using_views.html).

In [None]:
import fiftyone as fo
from fiftyone import ViewField as F

num_instances = dataset.values(F("ground_truth.detections").length())

dataset.set_values("num_instances", num_instances)

dataset.save()

In a similar manner, you can count the number of unique instance types for each sample in your Dataset:

In [None]:
labels_per_sample = dataset.values("ground_truth.detections.label")

num_distinct_labels_per_sample = [len(set(labels)) if labels else 0 for labels in labels_per_sample]

dataset.set_values("num_unique_instances", num_distinct_labels_per_sample)

dataset.save()

You can then combine these values to create a complexity score for each Sample in your Dataset. As a simple example, you can define the complexity score as the number of instances plus the number of unique instance types. Note that the [`.values()` method](https://docs.voxel51.com/api/fiftyone.core.dataset.html#fiftyone.core.dataset.Dataset.values) is used to efficiently extract the values of a field across all Samples in a Dataset.

In [None]:
unique_instance_counts = dataset.values("num_unique_instances")

num_instances_values = dataset.values("num_instances")

# Compute complexity scores for all samples
complexity_scores = [nd + nul for nd, nul in zip(num_instances_values, unique_instance_counts)]

# Set the values
dataset.set_values("complexity_score", complexity_scores)

dataset.save()
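
The per-sample logic behind `num_instances`, `num_unique_instances`, and `complexity_score` can be sketched in plain Python on hypothetical toy data, which makes it easy to sanity-check the arithmetic before running it on the full dataset:

```python
# Hypothetical per-sample detection labels (None = no ground truth)
labels_per_sample = [
    ["car", "car", "people"],
    ["bus"],
    None,
    ["car", "truck", "people", "people"],
]

# Number of instances per sample (what the ViewField .length() expression yields)
num_instances = [len(labels) if labels else 0 for labels in labels_per_sample]

# Number of distinct classes per sample
num_unique = [len(set(labels)) if labels else 0 for labels in labels_per_sample]

# Complexity score = number of instances + number of distinct classes
complexity_scores = [n + u for n, u in zip(num_instances, num_unique)]

# Sample indices ordered from most to least complex
ranking = sorted(
    range(len(complexity_scores)),
    key=lambda i: complexity_scores[i],
    reverse=True,
)

print(complexity_scores)
print(ranking)
```

In FiftyOne itself, the equivalent ranking is a one-liner such as `dataset.sort_by("complexity_score", reverse=True)`, which returns a view with the most complex samples first.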

There are many interesting and non-trivial things, like those shown above, that you can do with FiftyOne. Here are some additional resources for you to check out later:

- If you are familiar with `pandas`, you may want to check out this [pandas vs FiftyOne cheat sheet](https://docs.voxel51.com/cheat_sheets/pandas_vs_fiftyone.html) to learn how to translate common pandas operations into FiftyOne syntax.

- How to [create Views of your Dataset](https://docs.voxel51.com/cheat_sheets/views_cheat_sheet.html) 

- [Filtering cheat sheet docs](https://docs.voxel51.com/cheat_sheets/filtering_cheat_sheet.html)

Of course, the most interesting part of FiftyOne is [the FiftyOne App](https://docs.voxel51.com/user_guide/app.html#using-the-fiftyone-app) (which runs locally on your machine). One tool that can help us explore our Dataset in the App is [the Dashboard plugin](https://docs.voxel51.com/plugins/plugins_ecosystem/dashboard.html). You can install the plugin as follows:

In [None]:
!fiftyone plugins download https://github.com/voxel51/fiftyone-plugins --plugin-names @voxel51/dashboard

FiftyOne is open-source and hackable, and it has a robust framework for [building Plugins](https://docs.voxel51.com/plugins/developing_plugins.html), which allow you to extend and customize the functionality of the core tool to suit your specific needs. FiftyOne also integrates with various computer vision models and other popular AI tools; [browse this curated collection of plugins](https://docs.voxel51.com/plugins/) to see how you can transform FiftyOne into a bespoke visual AI development workbench.

To launch the FiftyOne App, all you need to do is run the following:

In [None]:
session = fo.launch_app(dataset, auto=False)
session.url

<img src="ris_lad_in_fo_1.gif">


Of course, you can go deeper in your analysis of the dataset by [visualizing image embeddings](https://docs.voxel51.com/brain.html#visualizing-embeddings) in the App. You can use one of the models from the [FiftyOne Model Zoo](https://docs.voxel51.com/model_zoo/overview.html), or a custom model that you integrate as a [Remote Zoo Model](https://docs.voxel51.com/model_zoo/remote.html#remotely-sourced-zoo-models).

One example of a Remote Zoo Model is the integration of [SigLIP2](https://docs.voxel51.com/plugins/plugins_ecosystem/siglip2.html), which you can use to visualize image embeddings, perform zero-shot classification, and perform image retrieval by [searching via natural language](https://docs.voxel51.com/brain.html#text-similarity) in the App.

Let's start by registering the Remote Zoo Model source:

In [None]:
import fiftyone.zoo as foz

# Register this custom model source
foz.register_zoo_model_source(
    "https://github.com/harpreetsahota204/siglip2",
    overwrite=True
)

Then instantiate the model:

In [None]:
import fiftyone.zoo as foz

siglip_model = foz.load_zoo_model(
    "google/siglip2-giant-opt-patch16-256"
)

You can then use the [`compute_embeddings()` method](https://docs.voxel51.com/api/fiftyone.core.models.html#fiftyone.core.models.compute_embeddings) of the Dataset:

In [None]:
dataset.compute_embeddings(
    model=siglip_model,
    embeddings_field="siglip2_embeddings",
)

Then use the [`compute_visualization()` method](https://docs.voxel51.com/api/fiftyone.brain.html#fiftyone.brain.compute_visualization) to generate low-dimensional representations of the samples (and/or individual objects) in your Dataset.

In [None]:
import fiftyone.brain as fob

results = fob.compute_visualization(
    dataset,
    embeddings="siglip2_embeddings",
    method="umap",
    brain_key="siglip2_viz",
    num_dims=2,
)


You can then use the [`compute_similarity()` method](https://docs.voxel51.com/api/fiftyone.brain.html#fiftyone.brain.compute_similarity) to build a similarity index over the images in your dataset, which allows you to sort by similarity or search with natural language.

In [None]:
# Build a similarity index
text_img_index = fob.compute_similarity(
    dataset,
    model="google/siglip2-giant-opt-patch16-256",
    embeddings="siglip2_embeddings",
    brain_key="siglip2_similarity",
)
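
Conceptually, querying a similarity index comes down to ranking stored embeddings by how similar they are to a query embedding. The brute-force cosine-similarity sketch below illustrates that idea on hypothetical toy vectors; FiftyOne's similarity backends may use other metrics and approximate nearest-neighbor indexes, so treat it only as an illustration. For a text search, the query vector would come from the model's text encoder. The helpers `cosine_similarity` and `rank_by_similarity` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, embeddings, k=None):
    """Return sample ids sorted from most to least similar to the query."""
    scored = [(sid, cosine_similarity(query, emb)) for sid, emb in embeddings.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [sid for sid, _ in scored[:k]]  # scored[:None] keeps everything


# Hypothetical 3-d embeddings keyed by sample id
embeddings = {
    "img_a": [1.0, 0.0, 0.0],
    "img_b": [0.9, 0.1, 0.0],
    "img_c": [0.0, 1.0, 0.0],
}

# Hypothetical query embedding (e.g. the text embedding of a search prompt)
query = [1.0, 0.0, 0.0]
print(rank_by_similarity(query, embeddings, k=2))
```

With the index built above, a natural-language query in code might look like `dataset.sort_by_similarity("cars parked near a river", k=25, brain_key="siglip2_similarity")`; the query string is only an example.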

With the embeddings computed, you can perform a lot of non-trivial analysis, such as computing [uniqueness](https://docs.voxel51.com/brain.html#image-uniqueness) and [representativeness](https://docs.voxel51.com/brain.html#image-representativeness) scores and [identifying near duplicates](https://docs.voxel51.com/brain.html#near-duplicates), with simple function calls.


We can use the same SigLIP2 model to perform zero-shot classification and further enrich our Dataset with information it didn't have before:

In [None]:
siglip_model.text_prompt = "Low altitude drone footage taken at "
siglip_model.classes = ["day", "night", "dusk"]

dataset.apply_model(
    siglip_model,
    label_field="time_of_day"
)

In [None]:
siglip_model.text_prompt = "The scene in this low altitude drone footage is in a "
siglip_model.classes = ["urban area", "near water", "highway", "pedestrian area"]

dataset.apply_model(
    siglip_model,
    label_field="location"
)
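
Under the hood, zero-shot classification with a contrastive image-text model compares an image embedding against one text embedding per candidate class and picks the best match. The sketch below illustrates that idea with hypothetical 2-d embeddings. Two details are assumptions rather than documented behavior of the SigLIP2 integration: that each prompt is formed by appending the class name to `text_prompt`, and that scores are normalized with a softmax (SigLIP models are trained with a sigmoid loss, so the integration may score classes differently). The helpers `softmax` and `zero_shot_classify` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_emb, class_embs, classes):
    """Pick the class whose text embedding best matches the image embedding."""
    # Dot products stand in for similarity (assumes L2-normalized embeddings)
    scores = [sum(i * t for i, t in zip(image_emb, emb)) for emb in class_embs]
    probs = softmax(scores)
    best = max(range(len(classes)), key=lambda j: probs[j])
    return classes[best], probs[best]


text_prompt = "Low altitude drone footage taken at "
classes = ["day", "night", "dusk"]
# Assumed prompt construction: text prompt followed by the class name
prompts = [text_prompt + c for c in classes]

# Hypothetical normalized embeddings standing in for encoder outputs
image_emb = [0.0, 1.0]
class_embs = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]

label, confidence = zero_shot_classify(image_emb, class_embs, classes)
print(prompts[1])
print(label, round(confidence, 3))
```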



Let's launch the App again and see what we can uncover by inspecting [the Embeddings panel](https://docs.voxel51.com/user_guide/app.html#embeddings-panel).

In [None]:
session = fo.launch_app(dataset, auto=False)
session.url

<img src="ris_lad_in_fo_2.gif">

# Using SAM 3

We can use [SAM 3 in FiftyOne](https://docs.voxel51.com/plugins/plugins_ecosystem/sam3_images.html) as a Remote Zoo Model. The pattern is exactly the same as before:

In [None]:
import fiftyone.zoo as foz

# Register the remote model source
foz.register_zoo_model_source(
    "https://github.com/harpreetsahota204/sam3_images",
    overwrite=True
)

# Load the model
sam3_model = foz.load_zoo_model("facebook/sam3")

The FiftyOne implementation also allows us to compute image embeddings using SAM 3:

In [None]:
import fiftyone as fo
import fiftyone.zoo as foz
import fiftyone.brain as fob

sam3_model.pooling_strategy = "max"  # or "mean", "cls"

dataset.compute_embeddings(
    sam3_model,
    embeddings_field="sam_embeddings",
    batch_size=32
)

# Visualize with UMAP
fob.compute_visualization(
    dataset,
    method="umap",
    brain_key="sam_viz",
    embeddings="sam_embeddings",
    num_dims=2
)
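
The `pooling_strategy` setting controls how a sequence of per-token (or per-patch) features is collapsed into a single embedding per image. The sketch below shows what `max`, `mean`, and `cls` pooling typically mean; how the SAM 3 integration actually implements them is an assumption here, and the `cls` branch simply takes the first token, assuming it acts as a summary token. The helper `pool_tokens` and the toy features are hypothetical.

```python
def pool_tokens(tokens, strategy="max"):
    """Collapse a sequence of token feature vectors into one embedding.

    tokens: list of equal-length feature vectors, one per token/patch.
    strategy: "max" (element-wise max), "mean" (element-wise average),
              or "cls" (first token, assumed to be a summary token).
    """
    if not tokens:
        raise ValueError("tokens must be non-empty")
    if strategy == "max":
        return [max(values) for values in zip(*tokens)]
    if strategy == "mean":
        return [sum(values) / len(tokens) for values in zip(*tokens)]
    if strategy == "cls":
        return list(tokens[0])
    raise ValueError(f"unknown pooling strategy: {strategy}")


# Hypothetical features for three tokens with two channels each
tokens = [[1.0, -2.0], [3.0, 0.0], [2.0, 4.0]]
for s in ("max", "mean", "cls"):
    print(s, pool_tokens(tokens, s))
```

Max pooling keeps the strongest activation per channel, which can favor small but salient objects, while mean pooling summarizes the whole scene more evenly.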

To run the SAM 3 model on the dataset, all we have to do is set a few model parameters and call the Dataset's [`apply_model()` method](https://docs.voxel51.com/api/fiftyone.core.collections.html#fiftyone.core.collections.SampleCollection.apply_model):

In [None]:
sam3_model.operation = "concept_segmentation"
sam3_model.threshold = 0.5
sam3_model.mask_threshold = 0.5

sam3_model.prompt = dataset.distinct("ground_truth.detections.label")

dataset.apply_model(
    sam3_model,
    label_field="sam3_not_finetuned",
    batch_size=32,
    num_workers=8,
    skip_failures=False
)
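
A hedged reading of the two thresholds, based on how promptable segmentation pipelines commonly post-process their outputs: `threshold` drops instances whose confidence score falls below it, and `mask_threshold` binarizes each instance's per-pixel mask probabilities. This mapping is an assumption, not documented behavior of the SAM 3 integration; the helper `postprocess` and the toy instances below are hypothetical.

```python
def postprocess(instances, threshold=0.5, mask_threshold=0.5):
    """Filter instances by score and binarize their soft masks.

    instances: list of dicts with "label", "score" (float), and "mask"
               (2-D list of per-pixel probabilities in [0, 1]).
    """
    kept = []
    for inst in instances:
        if inst["score"] < threshold:
            continue  # drop low-confidence instances
        # Pixels at or above mask_threshold become part of the mask
        binary = [[p >= mask_threshold for p in row] for row in inst["mask"]]
        kept.append({"label": inst["label"], "score": inst["score"], "mask": binary})
    return kept


instances = [
    {"label": "car", "score": 0.9, "mask": [[0.8, 0.2], [0.6, 0.4]]},
    {"label": "people", "score": 0.3, "mask": [[0.9, 0.9], [0.9, 0.9]]},
]

result = postprocess(instances)
print(len(result), result[0]["mask"])
```

Under this reading, lowering `threshold` trades precision for recall, which matters for the densely packed tiny objects in RIS-LAD.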

We can view the embeddings and the predictions in the App as well:

<img src="ris_lad_in_fo_3.gif">

We can then use [FiftyOne's evaluation API](https://docs.voxel51.com/user_guide/evaluation.html) to see how well the model performs before fine-tuning. You can use the [`evaluate_detections()` method](https://docs.voxel51.com/user_guide/evaluation.html#detections) to evaluate the predictions of an object detection model stored in a [`Detections`](https://docs.voxel51.com/api/fiftyone.core.labels.html#fiftyone.core.labels.Detections), [`Polylines`](https://docs.voxel51.com/api/fiftyone.core.labels.html#fiftyone.core.labels.Polylines), or [`Keypoints`](https://docs.voxel51.com/api/fiftyone.core.labels.html#fiftyone.core.labels.Keypoints) field of your dataset, or of a temporal detection model stored in a [`TemporalDetections`](https://docs.voxel51.com/api/fiftyone.core.labels.html#fiftyone.core.labels.TemporalDetections) field of your dataset.

In [None]:
results = dataset.evaluate_detections(
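    "sam3_not_finetuned",          # predicted Detections with instance masks
    gt_field="ground_truth",       # ground truth Detections with instance masks
    eval_key="initial_sam3_eval",  # stores per-sample/per-object results under this key
    use_masks=True,                # use instance masks for IoU
    compute_mAP=True,              # needed so that results.mAP() can be computed
    tolerance=2                    # pixel tolerance when approximating masks as polylines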
)
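
With `use_masks=True`, the IoU between a prediction and a ground truth object is computed on their instance masks rather than their bounding boxes. The sketch below shows mask IoU plus a simplified greedy, confidence-ordered, same-class matching at an IoU threshold of 0.5, in the spirit of COCO-style evaluation. It is a simplification (no crowd handling, no sweep over IoU thresholds for mAP, and masks assumed to cover the same full-image grid), not FiftyOne's implementation; the helpers `mask_iou` and `greedy_match` are hypothetical.

```python
def mask_iou(mask_a, mask_b):
    """IoU of two equal-size binary masks given as 2-D lists of 0/1 values."""
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += a and b
            union += a or b
    return inter / union if union else 0.0


def greedy_match(preds, gts, iou_thresh=0.5):
    """Greedily match predictions to same-class ground truth objects.

    preds: list of (label, score, mask); gts: list of (label, mask).
    Returns (tp, fp, fn) counts.
    """
    matched = set()
    tp = fp = 0
    # Highest-confidence predictions get first pick of ground truth objects
    for label, _, pmask in sorted(preds, key=lambda p: p[1], reverse=True):
        best_iou, best_j = 0.0, None
        for j, (gt_label, gmask) in enumerate(gts):
            if j in matched or gt_label != label:
                continue
            iou = mask_iou(pmask, gmask)
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j is not None and best_iou >= iou_thresh:
            matched.add(best_j)
            tp += 1
        else:
            fp += 1
    fn = len(gts) - len(matched)
    return tp, fp, fn


gts = [("car", [[1, 1], [0, 0]]), ("bus", [[0, 0], [1, 1]])]
preds = [("car", 0.9, [[1, 1], [1, 0]]), ("people", 0.8, [[0, 0], [0, 1]])]
print(greedy_match(preds, gts))
```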


The `evaluate_detections()` method returns a [`DetectionResults` instance](https://docs.voxel51.com/api/fiftyone.utils.eval.detection.html#fiftyone.utils.eval.detection.DetectionResults) that provides a variety of methods for generating various aggregate evaluation reports about your model.

In addition, when you specify an `eval_key` parameter, a number of helpful fields will be populated on each sample and its predicted/ground truth objects that you can leverage via the FiftyOne App to interactively explore the strengths and weaknesses of your model on individual samples.

You can print the report to get a high-level picture of the model performance:

In [3]:
results.print_report()
print(results.mAP())

              precision    recall  f1-score   support

     bicycle       0.23      0.35      0.28       640
        boat       0.32      0.79      0.45       245
         bus       0.60      0.71      0.65       732
         car       0.34      0.85      0.48      4365
       motor       0.23      0.80      0.35      2803
      people       0.20      0.50      0.29      2910
    tricycle       0.54      0.43      0.48       528
       truck       0.51      0.49      0.50      1648

   micro avg       0.29      0.68      0.40     13871
   macro avg       0.37      0.62      0.43     13871
weighted avg       0.32      0.68      0.42     13871

0.27819947196402245
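
The `macro avg` and `weighted avg` rows can be reproduced from the per-class rows: macro averages weight every class equally, while weighted averages weight each class by its support (its number of ground truth objects). The supports sum to 13,871, matching the dataset's annotation count. Because the per-class values below are copied from the rounded report, recomputed averages may differ from it in the last digit; the micro average cannot be reproduced this way because it needs the raw true/false positive counts.

```python
# Per-class (precision, recall, support) copied from the report above
report = {
    "bicycle": (0.23, 0.35, 640),
    "boat": (0.32, 0.79, 245),
    "bus": (0.60, 0.71, 732),
    "car": (0.34, 0.85, 4365),
    "motor": (0.23, 0.80, 2803),
    "people": (0.20, 0.50, 2910),
    "tricycle": (0.54, 0.43, 528),
    "truck": (0.51, 0.49, 1648),
}


def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


total_support = sum(s for _, _, s in report.values())

# Macro: unweighted mean over classes
macro_p = sum(p for p, _, _ in report.values()) / len(report)
macro_r = sum(r for _, r, _ in report.values()) / len(report)

# Weighted: mean over classes weighted by support
weighted_p = sum(p * s for p, _, s in report.values()) / total_support
weighted_r = sum(r * s for _, r, s in report.values()) / total_support

print(total_support)
print(round(macro_p, 2), round(macro_r, 2))
print(round(weighted_p, 2), round(weighted_r, 2))
print(round(f1(0.23, 0.35), 2))
```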


You can also open the [Model Evaluation Panel](https://docs.voxel51.com/api/fiftyone.utils.eval.detection.html#fiftyone.utils.eval.detection.DetectionResults) to visualize and interactively explore the evaluation results in the App:

<img src="ris_lad_in_fo_4.gif">


You can use [Scenario Analysis](https://docs.voxel51.com/user_guide/app.html#scenario-analysis-sub-new) for a deep dive into model behavior across different scenarios.

This evaluation technique helps uncover edge cases, identify annotation errors, and understand performance variations in different contexts. It gives you a better insight into your model's strengths and weaknesses while enabling meaningful comparisons of performance under varying input conditions. 

Ultimately, this detailed analysis helps improve training data quality and builds intuition about when and why your model succeeds or fails.

<img src="ris_lad_in_fo_5.gif">

#### We're almost ready to fine-tune the model, but before we do, we should check whether there is any data leakage between the train and validation sets of the dataset.

Our dataset has [Sample level tags](https://docs.voxel51.com/user_guide/basics.html#tags) which indicate which split each sample belongs to:

In [6]:
dataset.distinct("tags")

['test', 'train', 'val']

Despite our best efforts, duplicates and other forms of non-IID samples show up in our data. 

When these samples end up in different splits, [this can have consequences when evaluating a model](https://voxel51.com/blog/on-leaky-datasets-and-a-clever-horse). It can often be easy to overestimate model capability due to this issue. The FiftyOne Brain offers a way to identify such cases in dataset splits.

Leaks can be computed directly, without needing predictions from a pre-trained model, via the [`compute_leaky_splits()`](https://docs.voxel51.com/brain.html#leaky-splits) method:



In [7]:
dataset

Name:        Voxel51/RIS-LAD
Media type:  image
Num samples: 2103
Persistent:  True
Tags:        []
Sample fields:
    id:                   fiftyone.core.fields.ObjectIdField
    filepath:             fiftyone.core.fields.StringField
    tags:                 fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:             fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.ImageMetadata)
    created_at:           fiftyone.core.fields.DateTimeField
    last_modified_at:     fiftyone.core.fields.DateTimeField
    ground_truth:         fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
    prompts:              fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    siglip2_embeddings:   fiftyone.core.fields.VectorField
    time_of_day:          fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)
    location:             fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labe

In [None]:
import fiftyone.brain as fob

split_tags = ["train", "val"]

index = fob.compute_leaky_splits(
    dataset,
    splits=split_tags,
    embeddings="sam_embeddings",
)

Computing duplicate samples...
Duplicates computation complete
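
The underlying idea is to flag pairs of samples that sit in different splits yet are near-duplicates in embedding space. The brute-force sketch below illustrates that idea with cosine distance and a hypothetical threshold on toy vectors; `compute_leaky_splits()` has its own defaults and implementation, so use this only to build intuition. The helpers `cosine_distance` and `find_cross_split_leaks` are hypothetical.

```python
import math
from itertools import product


def cosine_distance(a, b):
    """1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)


def find_cross_split_leaks(samples, split_a, split_b, threshold=0.05):
    """Return (id_a, id_b) pairs from two splits whose embeddings are near-duplicates.

    samples: list of (sample_id, split_tag, embedding).
    threshold: maximum cosine distance counted as a leak (hypothetical value).
    """
    a = [(sid, emb) for sid, tag, emb in samples if tag == split_a]
    b = [(sid, emb) for sid, tag, emb in samples if tag == split_b]
    leaks = []
    # Compare every sample in split_a against every sample in split_b
    for (id_a, emb_a), (id_b, emb_b) in product(a, b):
        if cosine_distance(emb_a, emb_b) <= threshold:
            leaks.append((id_a, id_b))
    return leaks


# Hypothetical embeddings: s3 (val) is a near-copy of s1 (train)
samples = [
    ("s1", "train", [1.0, 0.0]),
    ("s2", "train", [0.0, 1.0]),
    ("s3", "val", [0.99, 0.01]),
    ("s4", "val", [-1.0, 0.0]),
]
print(find_cross_split_leaks(samples, "train", "val"))
```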


The [`leaks_view()` method](https://docs.voxel51.com/api/fiftyone.brain.internal.core.leaky_splits.html#fiftyone.brain.internal.core.leaky_splits.LeakySplitsIndex.leaks_view) returns a view that contains only the leaks in the input splits. Once you have these leaks, it is wise to look through them. You may gain some insight into the source of the leaks:

In [9]:
leaks = index.leaks_view()

You can launch the app on this view like so:

```python
session = fo.launch_app(leaks)
```

Fortunately, there are no leaks between our splits, but it's always a good idea to check.