<a href="https://colab.research.google.com/drive/19HwWsoE8WhBVsSXIuKCKt29nWOoOei75" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# FiftyOne Workshop - Agriculture
# Exploring a Coffee-Beans Dataset

Welcome to this hands-on workshop where we will learn how to load and explore datasets using FiftyOne.
This notebook will guide you through programmatic interaction via the **FiftyOne SDK** and visualization using the **FiftyOne App**.

[![coffee-intro.png](https://i.postimg.cc/FR4MdMnB/coffee-intro.png)](https://postimg.cc/LqxCGbSt)

This notebook provides a brief walkthrough of [FiftyOne](https://voxel51.com/docs/fiftyone), highlighting features that will help you build better datasets and computer vision models.

We'll cover the following concepts:

- Loading your own dataset [into FiftyOne](https://voxel51.com/docs/fiftyone/user_guide/dataset_creation/index.html). You can replace [my Coffee-Beans Dataset](https://huggingface.co/datasets/pjramg/colombian_coffee) with one of your own.
- Using FiftyOne [in a notebook](https://voxel51.com/docs/fiftyone/environments/index.html#notebooks)
- Using [views](https://voxel51.com/docs/fiftyone/user_guide/using_views.html) and [the App](https://voxel51.com/docs/fiftyone/user_guide/app.html) to explore different aspects of your dataset
- Running

## Install FiftyOne and dependencies

In [None]:
# If you are running this on Google Colab, run this cell. Otherwise, check the README file for the requirements.
!pip install fiftyone huggingface_hub torch torchvision umap-learn gdown transformers

Let's write a quick function to help you select the best device for your machine:

In [None]:
import torch

def get_device():
    """Get the appropriate device for model inference."""
    if torch.cuda.is_available():
        return "cuda"
    elif hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
        return "mps"
    return "cpu"

DEVICE = get_device()

print(f"Using device: {DEVICE}")

## Download the dataset from source

We can download the dataset archive from Google Drive using `gdown`.

Let's get started by importing the FiftyOne library and the utilities we need for a COCO-format dataset. If your dataset uses a different format, change the importer accordingly; see the [supported formats](https://docs.voxel51.com/user_guide/dataset_creation/datasets.html#supported-formats).

In [None]:
import fiftyone as fo
import fiftyone.utils.huggingface as fouh
from fiftyone.utils.coco import COCODetectionDatasetImporter

In [None]:
import gdown

# Download the coffee dataset from Google Drive

url = "https://drive.google.com/uc?id=1TMeeIzj8EyocVyXmOgKSLYE3vTLc2gPe" # original
gdown.download(url, output="coffee_original.zip", quiet=False)

You can then extract the dataset as follows:

In [None]:
!unzip coffee_original.zip
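
The `!unzip` shell escape only works inside a notebook. If you run these steps as a plain Python script, a minimal standard-library equivalent could look like the sketch below. The helper name `extract_archive` and the `"."` destination are illustrative choices, not part of FiftyOne or `gdown`.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir="."):
    """Extract a zip archive into dest_dir and return the member names it contained."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


# Only extract if the archive downloaded in the previous cell is present
if Path("coffee_original.zip").exists():
    members = extract_archive("coffee_original.zip")
    print(f"Extracted {len(members)} entries")
```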

### Load into FiftyOne Format

FiftyOne [supports importing datasets from disk in various formats](https://docs.voxel51.com/user_guide/import_datasets.html), and it can be extended to import datasets in custom formats. The basic recipe involves specifying the path(s) to the data on disk and the type of dataset you’re loading. 

You can import a dataset from disk via [the `from_dir()` method](https://docs.voxel51.com/api/fiftyone.core.dataset.html#fiftyone.core.dataset.Dataset.from_dir). 

Read the docs for full details on all [supported formats](https://docs.voxel51.com/user_guide/import_datasets.html#common-formats).

The Coffee_Beans dataset is in COCO format, so you can use [FiftyOne's built-in importer for COCO datasets](https://docs.voxel51.com/api/fiftyone.types.html?highlight=cocodetectiondataset#fiftyone.types.COCODetectionDataset).

The relevant arguments we use here are:

• `data_path` - where the images reside on disk

• `labels_path` - the path to the annotations, which should be a `json` file

• `dataset_type` - lets FiftyOne know we are loading a dataset in COCO format

Read [the docs to learn more](https://docs.voxel51.com/integrations/coco.html?highlight=cocodetectiondataset) about working with datasets in COCO format.
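
For orientation, a COCO annotations file is a single JSON document with `images`, `annotations`, and `categories` lists. The sketch below builds a tiny, made-up example (the file name, sizes, and category names are illustrative, not taken from the coffee dataset) and counts annotations per category using only the standard library. The same count is a quick sanity check you could run on a real `labels_path` file before importing it.

```python
import json
from collections import Counter

# A tiny, made-up COCO-style annotation document (illustrative values only)
coco = {
    "images": [{"id": 1, "file_name": "branch_001.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 1, "name": "immature"}, {"id": 2, "name": "mature"}],
    "annotations": [
        # COCO bbox is [x, y, width, height] in absolute pixels
        {"id": 10, "image_id": 1, "category_id": 1, "bbox": [64, 48, 128, 96]},
        {"id": 11, "image_id": 1, "category_id": 2, "bbox": [320, 240, 64, 48]},
        {"id": 12, "image_id": 1, "category_id": 2, "bbox": [10, 20, 30, 40]},
    ],
}


def count_by_category(coco_dict):
    """Return a Counter mapping category name -> number of annotations."""
    id_to_name = {c["id"]: c["name"] for c in coco_dict["categories"]}
    return Counter(id_to_name[a["category_id"]] for a in coco_dict["annotations"])


# Round-trip through JSON text to mimic reading the file from disk
counts = count_by_category(json.loads(json.dumps(coco)))
print(dict(counts))
```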

In [None]:
# import fiftyone as fo # base library and app
# import fiftyone.utils.huggingface as fouh # Hugging Face integration
# dataset_ = fouh.load_from_hub("pjramg/my_colombian_coffe_FO", persistent=True, overwrite=True)

# Define the dataset name
dataset_name = "coffee_original"

# Check if the dataset exists
if dataset_name in fo.list_datasets():
    print(f"Dataset '{dataset_name}' exists. Loading...")
    dataset = fo.load_dataset(dataset_name)
else:
    print(f"Dataset '{dataset_name}' does not exist. Creating a new one...")
    # Import the COCO-format dataset from disk under the name checked above
    dataset = fo.Dataset.from_dir(
        dataset_type=fo.types.COCODetectionDataset,
        dataset_dir="./colombian_coffee",
        data_path="images/default",
        labels_path="annotations/instances_default.json",
        label_types="segmentations",
        label_field="categories",
        name=dataset_name,
        include_id=True,
        overwrite=True,
    )
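
If the import fails, a likely cause is a directory layout that differs from the one assumed in the cell above. The sketch below is a small pre-flight check; the helper `missing_paths` is hypothetical, and the paths simply mirror the arguments passed to `from_dir()`.

```python
from pathlib import Path


def missing_paths(dataset_dir, data_path, labels_path):
    """Return the expected paths under dataset_dir that do not exist."""
    root = Path(dataset_dir)
    expected = [root / data_path, root / labels_path]
    return [str(p) for p in expected if not p.exists()]


missing = missing_paths(
    "./colombian_coffee", "images/default", "annotations/instances_default.json"
)
print("Missing:", missing if missing else "nothing, layout looks as expected")
```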

### Alternative - Hugging Face Hub

You can also use the Hugging Face Hub: download the dataset from Hugging Face Hub Datasets and then work with it locally. See the docs for more information about [loading datasets into FiftyOne](https://docs.voxel51.com/user_guide/dataset_creation/index.html).

In [None]:
# import fiftyone as fo # base library and app
# import fiftyone.utils.huggingface as fouh # Hugging Face integration
# dataset_hub = fouh.load_from_hub("pjramg/my_colombian_coffe_FO", persistent=True, overwrite=True)

# # Define the new dataset name
# dataset_name = "coffee_FO_hub"

# # Check if the dataset exists
# if dataset_name in fo.list_datasets():
#     print(f"Dataset '{dataset_name}' exists. Loading...")
#     dataset_hub = fo.load_dataset(dataset_name)
# else:
#     print(f"Dataset '{dataset_name}' does not exist. Creating a new one...")
#     # Clone the dataset with a new name and make it persistent
#     dataset_hub = dataset_hub.clone(dataset_name, persistent=True)

You can call the dataset to see its associated fields:

In [None]:
dataset

Let's [persist the Dataset](https://docs.voxel51.com/user_guide/using_datasets.html#dataset-persistence) as non-persistent datasets are deleted from the database each time the database is shut down. Note, you could define dataset persistence when you create the dataset by passing `persistent=True` into the `from_dir` method above.

In [None]:
dataset.persistent = True

You can also call [the first Sample of the Dataset](https://docs.voxel51.com/api/fiftyone.core.dataset.html#fiftyone.core.dataset.Dataset.first) to see what the Fields look like:

In [None]:
print(dataset.first())

Check particular sample information:

In [None]:
# # Get the sample
# sample = dataset["REPLACE WITH THE SAMPLE ID"]

# # Print sample info
# print(sample)


Now let's launch the [FiftyOne App](https://voxel51.com/docs/fiftyone/user_guide/app.html) so we can explore the dataset visually. Because the cell below passes `auto=False`, the App is not embedded in the cell's output automatically; `session.open_tab()` opens it in a new browser tab instead.

The [Session](https://voxel51.com/docs/fiftyone/api/fiftyone.core.session.html#fiftyone.core.session.Session) object created below is a bi-directional connection between your Python kernel and the FiftyOne App, as we'll see later.

In [None]:
session = fo.launch_app(dataset, auto=False)
session.open_tab()

In [None]:
# session.close()


## 🔍 Querying and Filtering

FiftyOne provides a powerful querying engine to filter and analyze datasets efficiently.
We can apply filters to:
- Retrieve specific labels (e.g., all images with "cat" labels).
- Apply confidence thresholds to object detections.
- Filter data based on metadata (e.g., image size, timestamp).

🔗 **Relevant Documentation:** [Dataset views](https://docs.voxel51.com/user_guide/using_views.html#dataset-views), [Querying Samples](https://docs.voxel51.com/user_guide/using_views.html#querying-samples), [Common filters](https://docs.voxel51.com/user_guide/using_views.html#querying-samples)

### Examples:
- Show all images containing a particular class.
- Retrieve samples with object detection confidence above a threshold.
- Filter out low-quality images based on metadata.
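
The two view stages used in the cells below behave differently: `match()` keeps or drops whole samples, while `filter_labels()` keeps only the matching labels inside each sample (and, by default, drops samples left with no labels). The following sketch mimics that distinction on plain Python dictionaries. It is a toy model with made-up sample IDs, not FiftyOne code.

```python
# Toy stand-in for a dataset: each sample has an id and a list of label strings
samples = [
    {"id": "a", "labels": ["immature", "mature"]},
    {"id": "b", "labels": ["mature", "semimature"]},
    {"id": "c", "labels": ["immature", "semimature", "semimature"]},
]


def match_samples(samples, label):
    """Keep whole samples that contain at least one `label` (like match())."""
    return [s for s in samples if label in s["labels"]]


def filter_sample_labels(samples, label):
    """Keep only `label` entries in each sample and drop samples left empty
    (like filter_labels() with its default behavior)."""
    filtered = []
    for s in samples:
        kept = [l for l in s["labels"] if l == label]
        if kept:
            filtered.append({"id": s["id"], "labels": kept})
    return filtered


matched = match_samples(samples, "immature")        # samples a and c, labels untouched
only_mature = filter_sample_labels(samples, "mature")  # samples a and b, mature labels only
# Chaining: samples with an immature label, showing only their semimature labels
combined = filter_sample_labels(matched, "semimature")

print([s["id"] for s in matched])
print(only_mature)
print(combined)
```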


In [None]:
import fiftyone.core.expressions as foe

# Create a view with samples where at least one detection has label "immature"
view = dataset.match(
    foe.ViewField("categories_segmentations.detections").filter(
        foe.ViewField("label") == "immature"
    ).length() > 0
)

# Launch FiftyOne App with the filtered dataset
session = fo.launch_app(view, auto=False)


In [None]:
for sample in view:
    print("Sample:", sample.id)
    for det in sample.categories_segmentations.detections:
        if det.label == "immature":
            print(" -", det.label, det.bounding_box)
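
The `bounding_box` values printed above are stored in FiftyOne's relative format, `[top-left-x, top-left-y, width, height]`, with every value in `[0, 1]`. To get pixel coordinates you multiply by the image size. A minimal sketch, where the helper name `to_pixels` and the 640x480 image size are illustrative assumptions:

```python
def to_pixels(rel_box, img_width, img_height):
    """Convert a relative [x, y, w, h] box to absolute pixel [x, y, w, h]."""
    x, y, w, h = rel_box
    return [
        round(x * img_width),
        round(y * img_height),
        round(w * img_width),
        round(h * img_height),
    ]


# Example: a box covering the central quarter of a 640x480 image
print(to_pixels([0.25, 0.25, 0.5, 0.5], 640, 480))  # prints [160, 120, 320, 240]
```

In a real dataset the image size is available on `sample.metadata` once you have called `dataset.compute_metadata()`.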

In [None]:
# Filter detections within each sample to only show "mature"
filtered_view = dataset.filter_labels(
    "categories_segmentations",  # name of your Detections field
    foe.ViewField("label") == "mature"
)


In [None]:
combined_view = (
    dataset
    .match(
        foe.ViewField("categories_segmentations.detections").filter(
            foe.ViewField("label") == "immature"
        ).length() > 0
    )
    .filter_labels("categories_segmentations", foe.ViewField("label") == "semimature")
)

print(combined_view)


# Launch FiftyOne App with the filtered dataset
session = fo.launch_app(combined_view, auto=False)

In [None]:
new_dataset = filtered_view.clone()
print(new_dataset)

export_dir = "filtered_coffee_dataset_FO"
new_dataset.export(
    export_dir=export_dir,
    dataset_type=fo.types.FiftyOneDataset,
)


## 🖥️ Interactive Exploration with the FiftyOne App

The **FiftyOne App** allows users to interactively browse, filter, and analyze datasets.
This visual interface is an essential tool for understanding dataset composition and refining data exploration workflows.

Key features of the FiftyOne App:
- Interactive filtering of images/videos.
- Object detection visualization.
- Dataset statistics and metadata overview.

🔗 **Relevant Documentation:** [Using the FiftyOne App](https://voxel51.com/docs/fiftyone/user_guide/app.html)


### Interacting with Plugins to understand the dataset

FiftyOne provides a powerful [plugin framework](https://docs.voxel51.com/plugins/index.html) that allows for extending and customizing the functionality of the tool to suit your specific needs. In this case we will use the [@voxel51/dashboard](https://github.com/voxel51/fiftyone-plugins/blob/main/plugins/dashboard/README.md) plugin, which enables users to construct custom dashboards that display statistics of interest about the current dataset (and beyond).

In [None]:
!fiftyone plugins download https://github.com/voxel51/fiftyone-plugins --plugin-names @voxel51/dashboard

## Using the App

With the App, you can visualize your samples and their fields either in image grid view, or by double-clicking an image to enter an expanded sample view, where you can study individual samples in more detail.

The [view bar](https://voxel51.com/docs/fiftyone/user_guide/app.html#using-the-view-bar) allows you to search and filter your dataset to study specific samples or labels of interest.

With FiftyOne, you can seamlessly transition between the App and Python. For example, create a search using the `Shuffle() == 51` and `Limit() == 10` stages in the view bar:

In [None]:
#session.show()

You can access the current view back in your Python shell at any time:

In [None]:
# Access the current view in the App
print(session.view)

## Indexing images by uniqueness

FiftyOne includes a `fiftyone.brain` package that provides a collection of algorithms to help you gain insight into your datasets and models. For more information, [check out the user guide](https://voxel51.com/docs/fiftyone/user_guide/brain.html).

Let's use the `compute_uniqueness()` function to index the samples in our dataset according to their visual uniqueness:

In [None]:
import fiftyone.brain as fob

fob.compute_uniqueness(dataset)

Inspecting the dataset shows that a numeric `uniqueness` field has been added to each sample, which measures its visual uniqueness with respect to the other samples in the dataset:
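
To build intuition for what such a score captures, the sketch below computes a toy uniqueness value: the mean distance from each embedding to its `k` nearest neighbors, rescaled to `[0, 1]`. This is an illustrative assumption about the general idea, not FiftyOne Brain's actual implementation, and the 2D vectors are made up.

```python
import math


def toy_uniqueness(vectors, k=2):
    """Score each vector by its mean distance to its k nearest neighbors, scaled to [0, 1]."""
    raw = []
    for i, v in enumerate(vectors):
        # Distances from v to every other vector, smallest first
        dists = sorted(math.dist(v, u) for j, u in enumerate(vectors) if j != i)
        raw.append(sum(dists[:k]) / k)
    top = max(raw)
    return [r / top if top > 0 else 0.0 for r in raw]


# Three near-duplicates and one outlier (made-up 2D "embeddings")
vectors = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
scores = toy_uniqueness(vectors)
print([round(s, 3) for s in scores])
```

The outlier receives the highest score, while the three clustered points score close to zero, which is why sorting by low uniqueness surfaces near-duplicates.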

In [None]:
print(dataset.select_fields("uniqueness").first())

Let's visualize this information in the App by showing the most visually unique samples first:

In [None]:
# Launch the FiftyOne App and show the most unique samples first
session = fo.launch_app(dataset, auto=False)
session.view = dataset.sort_by("uniqueness", reverse=True)

Sorting by **least unique** can help us identify near-duplicate samples in our dataset. This can be useful in situations where you need to send a dataset for annotation and need to select a diverse set of images.

In [None]:
# Explore the least unique samples
session.view = dataset.sort_by("uniqueness")

## Embedding methods

The `embeddings` and `model` parameters of `compute_visualization()` support a variety of ways to generate embeddings for your data:

- Provide nothing, in which case a default general purpose model is used to embed your data
- Provide a Model instance or the name of any model from the Model Zoo that supports embeddings
- Provide your own precomputed embeddings in array form
- Provide the name of a VectorField or ArrayField of your dataset in which precomputed embeddings are stored

In [None]:
import fiftyone.zoo as foz
# Load a resnet from the model zoo
model = foz.load_zoo_model("resnet50-imagenet-torch")

# Verify that the model exposes embeddings
print(model.has_embeddings)
# True

# Compute embeddings for each image
embeddings = dataset.compute_embeddings(model) #, embeddings_field="resnet50_emb")

print(embeddings.shape)
# (num_samples, 2048)

In [None]:
print(dataset)

In [None]:
import fiftyone.brain as fob
# Compute 2D representation using pre-computed embeddings
results = fob.compute_visualization(
    dataset,
    embeddings=embeddings,
    num_dims=2,
    brain_key="image_embeddings_rs50",
    verbose=True,
    seed=51,
)

## Using Embeddings for Deeper Dataset Understanding

You can use embeddings to gain a deeper understanding of the images in this dataset.
Visual embeddings can help analyze your images in several key ways:

1. [**Relationship Visualization:**](https://docs.voxel51.com/brain.html#brain-embeddings-visualization) Use dimensionality reduction (like UMAP) to visualize how different types of fruit cluster together and to identify patterns across your coffee branches.

2. **Model Comparison:** Compare how different vision models encode and interpret maturation stages.

3. **Category Analysis:** Explore visual similarities and differences between the four maturation stages.

4. **Variation Study:** Understand how factors like weather conditions can affect the embedding space.

5. **Feature Detection:** Identify subtle visual features that distinguish different types of fruit.

For this analysis, we’ll use these models:

• CLIP

• AIMv2

• C-RADIOv3

Note that some of these models, such as CLIP, can also be used for zero-shot classification.
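
Most of these analyses come down to comparing embedding vectors. A common measure is cosine similarity; the sketch below implements it with the standard library on made-up 3D vectors (the variable names and values are illustrative, and real embeddings from the models listed above have hundreds or thousands of dimensions).

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up embeddings: two visually similar fruits and one different fruit
mature_1 = [0.9, 0.1, 0.0]
mature_2 = [0.8, 0.2, 0.0]
immature = [0.0, 0.2, 0.9]

print("mature_1 vs mature_2:", round(cosine_similarity(mature_1, mature_2), 3))
print("mature_1 vs immature:", round(cosine_similarity(mature_1, immature), 3))
```

Vectors pointing in similar directions score close to 1, so samples of the same maturation stage would be expected to score higher against each other than against other stages.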

In [None]:
#session = fo.launch_app(dataset)

### CLIP, AIM Embeddings

In [None]:
model_clip = foz.load_zoo_model("clip-vit-base32-torch")

In [None]:
import torch
from transformers import AutoModel
import fiftyone.utils.transformers as fout

aim_model = AutoModel.from_pretrained(
    "apple/aimv2-large-patch14-224",
    revision="ac764a25c832c7dc5e11871daa588e98e3cdbfb7",
    trust_remote_code=True,
)

In [None]:
aim_fo_model = fout.convert_transformers_model(
    aim_model,
    trust_remote_code=True
    )

In [None]:
dataset.compute_embeddings(
    model=model_clip,
    embeddings_field="clip_embeddings",
    skip_failures=False
)
dataset.compute_embeddings(
    model=aim_fo_model,
    embeddings_field="aim_embeddings"
)

### NVIDIA Labs C-RADIO Embeddings

In [None]:
import fiftyone.zoo as foz

foz.register_zoo_model_source(
    "https://github.com/harpreetsahota204/NVLabs_CRADIOV3",
)

radio_embeddings_model = foz.load_zoo_model(
    "nv_labs/c-radio_v3-b",
    feature_format="NCHW", # you can also pass NLC here
)

dataset.compute_embeddings(
    model=radio_embeddings_model,
    embeddings_field="radio_embeddings",
)

In [None]:
radio_spatial_model = foz.load_zoo_model(
    "nv_labs/c-radio_v3-b",
    output_type="spatial",
    apply_smoothing=True, # if you want smoothing
    smoothing_sigma=0.51, # how much smoothing you want to apply
    feature_format="NCHW" #this is the required for the heatmap
)

In [None]:
dataset.apply_model(
    radio_spatial_model,
    "radio_spatial_features"
)

In [None]:
dataset.compute_patch_embeddings(
    model=radio_embeddings_model,
    patches_field="categories_segmentations",
    embeddings_field="radio_mask_patch_emb"
)

In [None]:
import fiftyone.brain as fob

embedding_fields = [
    "clip_embeddings",
    "aim_embeddings",
    "radio_embeddings",
    "radio_mask_patch_emb"
]

# Compute UMAP for each embedding

for field in embedding_fields:
    brain_key = f"{field}_viz"

    results = fob.compute_visualization(
        dataset,
        embeddings=field,
        method="umap",
        brain_key=brain_key,
        num_dims=2,
        skip_failures=True,
        create_index=True
    )

In [None]:
# Mount drive
from google.colab import drive
drive.mount('/content/drive')

# Export the dataset to Google Drive in FiftyOne format
dataset.export(export_dir="/content/drive/MyDrive/coffee_dataset_FO", dataset_type=fo.types.FiftyOneDataset)