<a href="https://colab.research.google.com/drive/1OYipLzeOxvKYPBtOtRfyRn2lFV76lrZM" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# FiftyOne Workshop - Agriculture
# Coffee Dataset Exploration -> Geolocation

## 🏆 Learning Objectives
- Understand how to load and explore a dataset using FiftyOne.
- Perform basic dataset inspection and visualization.
- Explore geolocation data (if available) in the dataset.

## Requirements
### Knowledge
- Basic Python programming.
- Familiarity with Computer Vision concepts.
- Understanding of geospatial data (optional).
### Installation
Run the following command to install necessary dependencies:
```bash
pip install --upgrade pip
pip install fiftyone
pip install "huggingface_hub>=0.20.0"
```

In [1]:
!pip install fiftyone
!pip install "huggingface_hub>=0.20.0"



In [2]:
import torch

def get_device():
    """Get the appropriate device for model inference."""
    if torch.cuda.is_available():
        return "cuda"
    elif hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
        return "mps"
    return "cpu"

DEVICE = get_device()

print(f"Using device: {DEVICE}")

Using device: cuda


## Load a dataset

Start by importing the FiftyOne library and the utilities needed for a COCO-format dataset. If your dataset uses a different format, change the dataset type accordingly; see [Supported Formats](https://docs.voxel51.com/user_guide/dataset_creation/datasets.html#supported-formats).
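Before importing, it can help to check what a COCO annotation file contains. The sketch below is illustrative only: the `coco` dictionary is a hypothetical, hand-written example (it is not taken from the coffee dataset), and `summarize_coco` is a helper name introduced here. It relies only on the standard COCO top-level keys `images`, `categories`, and `annotations`.

```python
from collections import Counter

# Hypothetical, minimal COCO-style annotation content (not from the coffee dataset)
coco = {
    "images": [
        {"id": 1, "file_name": "tree_01.jpg", "width": 1920, "height": 1080},
        {"id": 2, "file_name": "tree_02.jpg", "width": 1920, "height": 1080},
    ],
    "categories": [
        {"id": 1, "name": "immature"},
        {"id": 2, "name": "mature"},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1, "bbox": [100, 120, 40, 40],
         "segmentation": [[100, 120, 140, 120, 140, 160, 100, 160]]},
        {"id": 11, "image_id": 1, "category_id": 2, "bbox": [300, 200, 35, 38],
         "segmentation": [[300, 200, 335, 200, 335, 238, 300, 238]]},
    ],
}


def summarize_coco(coco_dict):
    """Return (image count, file names without annotations, per-category counts)."""
    id_to_name = {c["id"]: c["name"] for c in coco_dict["categories"]}
    per_category = Counter(
        id_to_name[a["category_id"]] for a in coco_dict["annotations"]
    )
    annotated_ids = {a["image_id"] for a in coco_dict["annotations"]}
    unannotated = [
        im["file_name"] for im in coco_dict["images"] if im["id"] not in annotated_ids
    ]
    return len(coco_dict["images"]), unannotated, dict(per_category)


summary = summarize_coco(coco)
print(summary)
```

To run the same check on a real export, load the annotation JSON with `json.load` and pass the resulting dictionary to `summarize_coco`. Images that appear without annotations are worth knowing about before any label-based analysis.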

In [3]:
import fiftyone as fo
import fiftyone.utils.huggingface as fouh
from fiftyone.utils.coco import COCODetectionDatasetImporter

In [4]:
import gdown

# Download the coffee dataset from Google Drive

url = "https://drive.google.com/uc?id=1TMeeIzj8EyocVyXmOgKSLYE3vTLc2gPe" # original
gdown.download(url, output="coffee_original.zip", quiet=False)

Downloading...
From (original): https://drive.google.com/uc?id=1TMeeIzj8EyocVyXmOgKSLYE3vTLc2gPe
From (redirected): https://drive.google.com/uc?id=1TMeeIzj8EyocVyXmOgKSLYE3vTLc2gPe&confirm=t&uuid=1e1f36d7-176b-4c53-bee1-8d458ee7e502
To: /content/coffee_original.zip
100%|██████████| 687M/687M [00:10<00:00, 65.1MB/s]


'coffee_original.zip'

In [5]:
!unzip coffee_original.zip

Streaming output truncated to the last 5000 lines.
  inflating: colombian_coffee/images/default/lin_ln_20150813_093545_im_52.jpg  
  inflating: colombian_coffee/images/default/lin_ln_20150514_152714_im_60.jpg  
  inflating: colombian_coffee/images/default/lin_ln_20150813_093545_im_53.jpg  
  inflating: colombian_coffee/images/default/lin_ln_20150813_093545_im_47.jpg  
  inflating: colombian_coffee/images/default/lin_ln_20150514_152714_im_48.jpg  
  inflating: colombian_coffee/images/default/lin_ln_20150617_104246_im_31.jpg  
  inflating: colombian_coffee/images/default/lin_ln_20150617_104246_im_25.jpg  
  inflating: colombian_coffee/images/default/lin_ln_20150716_075409_im_39.jpg  
  inflating: colombian_coffee/images/default/lin_ln_20150702_113655_im_118.jpg  
  inflating: colombian_coffee/images/default/lin_ln_20150911_194542_im_82.jpg  
  inflating: colombian_coffee/images/default/lin_ln_20150827_090954_im_33.jpg  
  inflating: colombian_coffee/images/default/lin_ln_20

In [6]:
import gdown

# Download the coffee tree geolocation CSV from Google Drive
url = "https://drive.google.com/uc?id=1YHBdFd5SJuiqRK4YV6pJr4-6p4_830lE"
gdown.download(url, output="Coffee_Tree_Geolocations.csv", quiet=False)

Downloading...
From: https://drive.google.com/uc?id=1YHBdFd5SJuiqRK4YV6pJr4-6p4_830lE
To: /content/Coffee_Tree_Geolocations.csv
100%|██████████| 65.6k/65.6k [00:00<00:00, 50.3MB/s]


'Coffee_Tree_Geolocations.csv'

## 1. Loading the Dataset

In [7]:
# Alternative: load the dataset from the Hugging Face Hub
# dataset_ = fouh.load_from_hub("pjramg/my_colombian_coffe_FO", persistent=True, overwrite=True)

# Name under which the dataset is registered in FiftyOne
dataset_name = "coffee_original"

# Reuse the dataset if it already exists; otherwise import it from disk
if dataset_name in fo.list_datasets():
    print(f"Dataset '{dataset_name}' exists. Loading...")
    dataset = fo.load_dataset(dataset_name)
else:
    print(f"Dataset '{dataset_name}' does not exist. Creating a new one...")
    # Import the COCO-format export and register it under dataset_name
    dataset = fo.Dataset.from_dir(
        dataset_type=fo.types.COCODetectionDataset,
        dataset_dir="./colombian_coffee",
        data_path="images/default",
        labels_path="annotations/instances_default.json",
        label_types="segmentations",
        label_field="categories",
        name=dataset_name,  # must match the name checked above
        include_id=True,
        overwrite=True,
    )

Dataset 'coffee_original' does not exist. Creating a new one...
 100% |███████████████| 1593/1593 [14.1s elapsed, 0s remaining, 392.6 samples/s]      




## 2. Exploring the Dataset

In [8]:
session = fo.launch_app(dataset, auto=False)
session.open_tab()

Session launched. Run `session.show()` to open the App in a cell output.



## 3. Geolocation Analysis
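If the dataset contains geolocation metadata, you can inspect it with FiftyOne's visualization tools. The cell below reads `Coffee_Tree_Geolocations.csv` (downloaded above, and also available in this repo folder) and assigns its coordinates to samples in shuffled order, so the pairing between images and locations is random. Update `csv_file` if you saved the file elsewhere.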
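Two details are easy to get wrong when preparing the coordinates: `fo.GeoLocation` points are given as `[longitude, latitude]`, and an unseeded shuffle yields a different assignment on every run. The sketch below shows one way to handle both with the standard library. The CSV text is a hypothetical stand-in (only the `latitude` and `longitude` column names come from the notebook code), and `load_points` and its `seed` parameter are names introduced here for illustration.

```python
import csv
import io
import random

# Hypothetical stand-in for Coffee_Tree_Geolocations.csv; only the
# "latitude" and "longitude" column names come from the notebook code
csv_text = """latitude,longitude
5.0006,-74.9997
5.0011,-74.9989
91.0,-74.9990
"""


def load_points(handle, seed=51):
    """Read rows, drop out-of-range coordinates, shuffle reproducibly,
    and return [longitude, latitude] pairs."""
    points = []
    for row in csv.DictReader(handle):
        lat, lon = float(row["latitude"]), float(row["longitude"])
        # Skip rows whose coordinates fall outside valid ranges
        if -90 <= lat <= 90 and -180 <= lon <= 180:
            points.append([lon, lat])
    # A seeded generator makes the random assignment repeatable across runs
    random.Random(seed).shuffle(points)
    return points


points = load_points(io.StringIO(csv_text))
print(points)
```

Each returned pair can be passed as the `point` argument of `fo.GeoLocation`. The third row of the sample data is dropped because a latitude of 91 is invalid.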

In [10]:
import pandas as pd

# Load the CSV file with tree geolocations
csv_file = "Coffee_Tree_Geolocations.csv"  # Update with the correct file path
tree_data = pd.read_csv(csv_file)

# Shuffle geolocations to assign randomly
tree_data = tree_data.sample(frac=1).reset_index(drop=True)

# Assign geolocations to samples; points are [longitude, latitude],
# and zip() stops at the shorter of the dataset and the CSV
for sample, (_, row) in zip(dataset, tree_data.iterrows()):
    sample["location"] = fo.GeoLocation(
        point=[row["longitude"], row["latitude"]]
    )
    sample.save()

print("Geolocation metadata assigned successfully!")

# Inspect the first sample
print(dataset.first())

Geolocation metadata assigned successfully!
<Sample: {
    'id': '688cfed5aec268d37893972d',
    'media_type': 'image',
    'filepath': '/content/colombian_coffee/images/default/lin_ln_20150617_102755_im_01.jpg',
    'tags': [],
    'metadata': <ImageMetadata: {
        'size_bytes': None,
        'mime_type': None,
        'width': 1920,
        'height': 1080,
        'num_channels': None,
    }>,
    'created_at': datetime.datetime(2025, 8, 1, 17, 52, 21, 522000),
    'last_modified_at': datetime.datetime(2025, 8, 1, 17, 55, 28, 669000),
    'categories_coco_id': 1,
    'categories_segmentations': None,
    'location': <GeoLocation: {
        'id': '688cff90aec268d37893a70f',
        'tags': [],
        'point': [-74.99966819168937, 5.000620934239795],
        'line': None,
        'polygon': None,
    }>,
}>


## 4. Analyzing Maturation States in the Dataset
To gain more insights into the dataset, we analyze the segmentation results by categorizing coffee beans into different maturation stages.
This helps in understanding the distribution of different maturation states across the dataset.

### Steps in the Analysis:
- **Load the dataset** and ensure segmentations are available.
- **Count occurrences of different maturation states** based on segmentation labels.
- **Assign explicit numerical values** to facilitate analysis and visualization.
- **Ensure all fields are correctly set** to avoid issues with visualization plugins.

The following script processes each sample in the dataset and adds metadata fields representing:
- The count of beans in different maturation stages.
- The dominant maturation stage for each sample.
- Default values for every field, so that visualization plugins never encounter `None`.
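The per-sample logic described above can be sketched independently of FiftyOne, which makes it easy to check on plain lists of labels. This is a minimal illustration rather than the notebook's implementation: `summarize_stages` is a helper name introduced here, while the stage names, numeric codes, and output field names mirror those used in the notebook's code cell.

```python
from collections import Counter

# Stage names and numeric codes mirror the notebook's label_mapping
STAGE_CODES = {"immature": 1, "semimature": 2, "mature": 3, "overmature": 4}


def summarize_stages(labels):
    """Summarize a list of maturation labels into count and dominant-stage fields."""
    counts = Counter(label for label in labels if label is not None)

    # One explicit integer count per known stage, defaulting to 0
    summary = {f"{stage}_count": counts.get(stage, 0) for stage in STAGE_CODES}

    # Dominant stage; ties resolve to the label seen first
    if counts:
        top_label, top_count = max(counts.items(), key=lambda kv: kv[1])
    else:
        top_label, top_count = "unknown", 0

    summary["No_Segmentations"] = 1 if top_count == 0 else 0
    summary["max_maturation_count"] = top_count
    summary["max_maturation_stage"] = STAGE_CODES.get(top_label, 0)
    return summary


print(summarize_stages(["mature", "mature", "immature", None]))
print(summarize_stages([]))
```

A sample without labels yields zero counts, `No_Segmentations` set to 1, and stage code 0, so every field holds a valid number.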

### Code Implementation:


First, install the Plotly Map Panel with `fiftyone plugins download https://github.com/allenleetc/plotly-map-panel`. This community plugin for FiftyOne is an alternative to the built-in Map Panel, which relies on Mapbox. It uses PlotlyView for interactive geospatial visualizations, so it works without a Mapbox API key. Once it is installed, enable the panel in the FiftyOne App:

- Open the FiftyOne App.
- Go to the Plugins menu.
- Enable "Plotly Map Panel".
- Load a dataset with geolocation metadata.
- Start visualizing geospatial data interactively.

When to use this plugin:

- You need interactive maps without Mapbox.
- Your dataset contains latitude/longitude metadata.
- You want to compare spatial distributions of objects across locations.
- You want to customize geospatial visualizations with Plotly.

By integrating this plugin, you can unlock geospatial insights in your FiftyOne datasets more easily.

In [9]:
!fiftyone plugins download https://github.com/allenleetc/plotly-map-panel

Downloading allenleetc/plotly-map-panel...

Skipping existing plugin 'Plotly Map'


In [11]:
import fiftyone as fo
from collections import Counter

# Load the existing dataset
# dataset = fo.load_dataset("coffee_beans_dataset")  # Replace with your dataset name

# Define label-to-index mapping
label_mapping = {
    "immature": 1,
    "semimature": 2,
    "mature": 3,
    "overmature": 4,
}

# Iterate over each sample
for sample in dataset:
    # Ensure the "categories_segmentations" field exists and contains detections
    detections = getattr(sample, "categories_segmentations", None)
    if detections and hasattr(detections, "detections"):
        detections = detections.detections
    else:
        detections = []

    # Count occurrences of each maturation state
    label_counts = Counter(d.label for d in detections if d.label is not None)

    # Explicitly set count fields, ensuring no `None` values
    sample["immature_count"] = int(label_counts.get("immature", 0))
    sample["mature_count"] = int(label_counts.get("mature", 0))
    sample["semimature_count"] = int(label_counts.get("semimature", 0))
    sample["overmature_count"] = int(label_counts.get("overmature", 0))

    # Determine the maturation stage with the highest count
    if label_counts:
        max_label, max_count = max(label_counts.items(), key=lambda x: x[1])
    else:
        max_label, max_count = "unknown", 0  # Avoid NoneType errors

    # Assign segmentation status (1 = No segmentations, 0 = Has segmentations)
    sample["No_Segmentations"] = 1 if max_count == 0 else 0

    # Ensure numeric fields for compatibility with visualization
    sample["max_maturation_count"] = int(max_count) if max_count > 0 else 0
    sample["max_maturation_stage"] = int(label_mapping.get(max_label, 0)) if max_count > 0 else 0
    sample["max_maturation_count_str"] = str(max_count) if max_count > 0 else "0"

    # Save the updated metadata
    sample.save()

print("Maturation state metadata added successfully!")

# Verify the first sample
print(dataset.first())  # Print a sample to confirm
print(dataset)  # Print the dataset structure


Maturation state metadata added successfully!
<Sample: {
    'id': '688cfed5aec268d37893972d',
    'media_type': 'image',
    'filepath': '/content/colombian_coffee/images/default/lin_ln_20150617_102755_im_01.jpg',
    'tags': [],
    'metadata': <ImageMetadata: {
        'size_bytes': None,
        'mime_type': None,
        'width': 1920,
        'height': 1080,
        'num_channels': None,
    }>,
    'created_at': datetime.datetime(2025, 8, 1, 17, 52, 21, 522000),
    'last_modified_at': datetime.datetime(2025, 8, 1, 17, 59, 18, 723000),
    'categories_coco_id': 1,
    'categories_segmentations': None,
    'location': <GeoLocation: {
        'id': '688cff90aec268d37893a70f',
        'tags': [],
        'point': [-74.99966819168937, 5.000620934239795],
        'line': None,
        'polygon': None,
    }>,
    'immature_count': 0,
    'mature_count': 0,
    'semimature_count': 0,
    'overmature_count': 0,
    'No_Segmentations': 1,
    'max_maturation_count': 0,
    'max_maturati

In [13]:
session = fo.launch_app(dataset, auto=False)
session.open_tab()

Session launched. Run `session.show()` to open the App in a cell output.



In [12]:
session.close()

In [14]:
# Create a view that filters samples with No_Segmentations = 0
segmented_view = dataset.match({"No_Segmentations": 0})

# Print the number of matching samples
print(f"Number of segmented samples: {len(segmented_view)}")

# Launch FiftyOne App to visualize the view
session = fo.launch_app(segmented_view, auto=False)

Number of segmented samples: 146
Session launched. Run `session.show()` to open the App in a cell output.




[![ploty.png](https://i.postimg.cc/qBZnxy8p/ploty.png)](https://postimg.cc/F1379fW6)

## Next Steps
- Learn how to apply AI models for segmentation.
- Use FiftyOne for annotation and active learning workflows.
- Proceed to the SAM2 annotation notebook for object segmentation.