[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/harpreetsahota204/car_dd_dataset_workshop/blob/main/01_loading_and_exploring_dataset.ipynb)

Note: If using in Google Colab, make sure you [install all the requirements listed here](https://github.com/harpreetsahota204/car_dd_dataset_workshop/blob/main/requirements.txt).

# Download dataset from source

We can download the dataset archive from Google Drive using `gdown`:

In [None]:
import gdown

# Download the CarDD dataset from Google Drive
url = "https://drive.google.com/uc?id=1RgiZK970s0BffZiAkwmxrdJ7klqIY2PZ"
gdown.download(url, output="cardd.zip", quiet=False)

Downloading...
From (original): https://drive.google.com/uc?id=1RgiZK970s0BffZiAkwmxrdJ7klqIY2PZ
From (redirected): https://drive.google.com/uc?id=1RgiZK970s0BffZiAkwmxrdJ7klqIY2PZ&confirm=t&uuid=8709d12c-00e2-43d7-814b-7fabbcbf7132
To: /home/harpreet/workspace/car_dd_dataset_workshop/cardd.zip
  0%|          | 11.0M/6.05G [00:00<03:45, 26.8MB/s]

You can then extract the dataset as follows:


In [None]:
!unzip cardd.zip

### Load into FiftyOne Format

FiftyOne [supports importing datasets from disk in various formats](https://beta-docs.voxel51.com/fiftyone_concepts/dataset_creation/), and it can be extended to import datasets in custom formats. The basic recipe involves specifying the path(s) to the data on disk and the type of dataset you’re loading. 

You can import a dataset from disk via [the `from_dir()` method](https://beta-docs.voxel51.com/api/fiftyone.core.dataset.Dataset.html#from_dir). 

Read the docs for full detail on all [supported formats](https://beta-docs.voxel51.com/fiftyone_concepts/dataset_creation/datasets/#loading-datasets-from-disk).

The CarDD dataset is in COCO format, so you can use [FiftyOne's built-in importer for COCO datasets](https://beta-docs.voxel51.com/fiftyone_concepts/dataset_creation/datasets/#cocodetectiondataset). 

The relevant arguments we use here are:

• `data_path` - where the images reside on disk

• `labels_path` - the path to the annotations, which should be a `json` file

• `dataset_type` - lets FiftyOne know we are loading a dataset in COCO format

Read [the docs to learn more](https://beta-docs.voxel51.com/api/fiftyone.utils.coco.html#fiftyone.utils.coco.COCODetectionDatasetImporter) about working with datasets in COCO format.
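
For reference, a COCO detection annotation file is a single JSON object with `images`, `annotations`, and `categories` lists. Each annotation's `bbox` is `[x, y, width, height]` in absolute pixels and its `segmentation` is a list of flattened polygons. The sketch below builds a toy file in that layout (the file name, ids, and category names are illustrative, not taken from CarDD) and shows how annotations link back to images and category names:

```python
import json
from collections import defaultdict

# A toy COCO-style annotation file (hypothetical values, not from CarDD)
coco = {
    "images": [
        {"id": 1, "file_name": "000001.jpg", "width": 640, "height": 480},
    ],
    "annotations": [
        {
            "id": 10,
            "image_id": 1,
            "category_id": 2,
            "bbox": [64.0, 96.0, 320.0, 120.0],  # [x, y, width, height] in pixels
            "segmentation": [[64, 96, 384, 96, 384, 216, 64, 216]],  # flattened x, y pairs
            "area": 38400.0,
            "iscrowd": 0,
        },
    ],
    "categories": [
        {"id": 1, "name": "dent"},
        {"id": 2, "name": "scratch"},
    ],
}

# Round-trip through JSON, as if the file were read from disk
coco = json.loads(json.dumps(coco))

# Resolve category ids to names and group annotation labels by image file
id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
id_to_file = {im["id"]: im["file_name"] for im in coco["images"]}
labels_per_file = defaultdict(list)
for ann in coco["annotations"]:
    labels_per_file[id_to_file[ann["image_id"]]].append(id_to_name[ann["category_id"]])

print(dict(labels_per_file))  # {'000001.jpg': ['scratch']}
```

The importer does this linking for you; the sketch only shows which pieces of the JSON end up attached to each sample.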

In [None]:
import fiftyone as fo

dataset = fo.Dataset.from_dir(
    data_path="CarDD_release/CarDD_COCO/train2017",
    labels_path="CarDD_release/CarDD_COCO/annotations/instances_train2017.json",
    dataset_type=fo.types.COCODetectionDataset,
    name="car_dd",
    overwrite=True,
    include_id=True,
)

You can display the dataset to see its associated fields:

In [None]:
dataset

Let's [persist the Dataset](https://beta-docs.voxel51.com/fiftyone_concepts/using_datasets/#dataset-persistence), since non-persistent datasets are deleted from the database each time the database is shut down. Note that you could also make the dataset persistent at creation time by passing `persistent=True` to the `from_dir()` method above.

In [6]:
dataset.persistent = True

You can also look at [the first Sample of the Dataset](https://beta-docs.voxel51.com/api/fiftyone.core.dataset.Dataset.html#first) to see what its Fields look like:

In [None]:
dataset.first()

Notice that bounding box detections and the segmentations are parsed as [FiftyOne Detection types](https://beta-docs.voxel51.com/api/fiftyone.core.labels.Detection.html).

FiftyOne stores bounding boxes as relative coordinates in `[0, 1]`, in the following format: `[top-left-x, top-left-y, width, height]`.
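
FiftyOne's COCO importer performs this conversion for you. As a sketch of the arithmetic only (the helper names below are hypothetical, not FiftyOne API), converting between COCO's absolute-pixel `[x, y, width, height]` boxes and FiftyOne's relative boxes just divides or multiplies by the image dimensions:

```python
def coco_to_fiftyone_bbox(coco_bbox, image_width, image_height):
    """Convert a COCO [x, y, w, h] box in pixels to relative [0, 1] coordinates."""
    x, y, w, h = coco_bbox
    return [x / image_width, y / image_height, w / image_width, h / image_height]


def fiftyone_to_pixel_bbox(rel_bbox, image_width, image_height):
    """Convert a relative [x, y, w, h] box back to absolute pixels."""
    x, y, w, h = rel_bbox
    return [x * image_width, y * image_height, w * image_width, h * image_height]


rel = coco_to_fiftyone_bbox([64, 96, 320, 120], 640, 480)
print(rel)  # [0.1, 0.2, 0.5, 0.25]
print(fiftyone_to_pixel_bbox(rel, 640, 480))
```

Because the coordinates are relative, the same box stays valid if the image is resized, which is also why thumbnails (used later) can reuse the original labels.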



Since we are working with instance segmentation, each label is a `Detection` whose `mask` attribute holds the instance segmentation mask for the region inside its bounding box. These masks are stored as 2D binary or 0/1 integer `numpy` arrays.
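
Because a mask only covers its detection's bounding box, the share of the whole image that an object occupies is the mask's fill ratio scaled by the box's relative area. A minimal sketch, assuming the mask spans exactly the bounding box as described above (the helper and the toy mask are hypothetical):

```python
import numpy as np


def mask_area_fraction(mask, rel_bbox):
    """Fraction of the full image covered by an instance mask.

    Assumes `mask` spans exactly the detection's bounding box, so the share of
    "on" pixels in the mask is scaled by the box's share of the image.
    """
    mask = np.asarray(mask, dtype=bool)
    fill_ratio = mask.sum() / mask.size  # share of the box covered by the object
    _, _, w, h = rel_bbox
    return fill_ratio * w * h


# Toy 4x4 mask where the object fills the top half of its box
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :] = True

frac = mask_area_fraction(mask, [0.1, 0.2, 0.5, 0.25])
print(frac)              # 0.0625
print(frac * 640 * 480)  # 19200.0 pixels in a 640x480 image
```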

# Alternatively, download dataset from Hugging Face Hub

FiftyOne has an [integration with Hugging Face](https://beta-docs.voxel51.com/integrations/huggingface/), which allows you to push and pull FiftyOne Datasets from the Hugging Face Hub.

For the purposes of this workshop, I have already [parsed this dataset into FiftyOne format and pushed it to the Hub](https://huggingface.co/datasets/harpreetsahota/CarDD).

You can download it as follows:

In [None]:
import fiftyone as fo
from fiftyone.utils.huggingface import load_from_hub

hub_dataset = load_from_hub(
    "harpreetsahota/CarDD",
    name="cardd_from_hub",
    # max_samples=500, # if you want to work with a subset of the dataset
    persistent=True,
    overwrite=True,
    )


With your Dataset loaded into FiftyOne format, now is a good time to launch the App and perform a "visual vibe check" of its contents. 

You can launch the App in a notebook by running:

```python

import fiftyone as fo

fo.launch_app(hub_dataset)
```

Or, you can open your terminal and execute `fiftyone app launch`. This will open the App in a browser window, and you can select your Dataset from the dropdown menu.

# Exploration via FiftyOne App

High-resolution images can be slow to load in the App. It's often useful to create thumbnails of the full-resolution images and load those in the App instead:

In [None]:
import fiftyone.utils.image as foui

THUMBNAIL_SIZE = 224

foui.transform_images(
        hub_dataset,
        size=(-1, THUMBNAIL_SIZE),
        output_field="thumbnails_path",
        output_dir="thumbnails",
    )
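
With `size=(-1, THUMBNAIL_SIZE)`, the `-1` tells FiftyOne to preserve the aspect ratio, so every thumbnail is 224 pixels tall and its width scales proportionally. A sketch of that arithmetic (the helper is hypothetical, and rounding to the nearest integer is an assumption; FiftyOne's exact rounding may differ by a pixel):

```python
def preserve_aspect_size(width, height, target_height):
    """Output (width, height) when only the height is fixed, as with size=(-1, H).

    Rounding to the nearest integer is an assumption for illustration.
    """
    scale = target_height / height
    return round(width * scale), target_height


print(preserve_aspect_size(1920, 1080, 224))  # (398, 224)
```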

You'll also need to set the following properties in your App config:

In [12]:
hub_dataset.app_config.media_fields = ["filepath", "thumbnails_path"]
hub_dataset.app_config.grid_media_field = "thumbnails_path"
hub_dataset.save()  # must save after edits

You can use the App in a variety of ways to explore your Dataset. One way is to use [the Dashboard plugin](https://github.com/voxel51/fiftyone-plugins/tree/main/plugins/dashboard), which allows you to create interactive dashboards and explore the Dataset in detail.

We'll discuss [using and developing plugins](https://beta-docs.voxel51.com/plugins/using_plugins/) later in this workshop, but for now let's go ahead and install the required plugin:

In [None]:
!fiftyone plugins download \
    https://github.com/voxel51/fiftyone-plugins \
    --plugin-names @voxel51/dashboard

### Initial exploration via SDK

> If you're a seasoned Pandas user, you might want to learn more about performing Pandas-style queries in FiftyOne. Read [these docs to learn more](https://beta-docs.voxel51.com/tutorials/pandas_comparison/).

Let's explore the Dataset via the SDK before launching the FiftyOne App. From here on out, we will use the dataset that we downloaded from the Hugging Face Hub (`hub_dataset`).

You'll make use of a [`ViewExpression`](https://beta-docs.voxel51.com/api/fiftyone.core.expressions.ViewExpression.html) and [`ViewField`](https://beta-docs.voxel51.com/api/fiftyone.core.expressions.ViewField.html) to perform aggregations over the fields of a FiftyOne dataset. Using `ViewField` lets FiftyOne compute these values efficiently across the entire dataset, without you manually iterating over samples.  

You can learn more about aggregations in FiftyOne by [reading these docs](https://beta-docs.voxel51.com/fiftyone_concepts/using_aggregations/) and learn more about creating `Views` in [these docs](https://beta-docs.voxel51.com/how_do_i/recipes/creating_views/).

Let's see the counts and types of damages in this dataset:

In [None]:
from fiftyone import ViewField as F

hub_dataset.count_values("detections.detections.label")
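
`count_values()` runs inside the database, but conceptually it flattens the per-sample label lists and tallies each label. A plain-Python sketch on made-up labels (not CarDD output; how FiftyOne reports samples without detections is not modeled here):

```python
from collections import Counter

# Hypothetical per-sample label lists, standing in for
# hub_dataset.values("detections.detections.label")
labels_per_sample = [
    ["dent", "scratch"],
    ["scratch"],
    None,  # a sample without detections
    ["crack", "scratch", "dent"],
]

# Flatten across samples (skipping empty ones) and count each label
counts = Counter(
    label
    for sample_labels in labels_per_sample
    if sample_labels
    for label in sample_labels
)
print(dict(counts))  # {'dent': 2, 'scratch': 3, 'crack': 1}
```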

### Enriching the dataset

You'll notice that beyond the bounding boxes and segmentation masks, there is not much other information in this dataset. However, you can use FiftyOne to enrich it. 

One thing you might be interested in is the area of each bounding box, so let's start by adding it. Here's what the code below does:

- `rel_bbox_area`: Calculates bounding box area (width * height) as fraction of image size

- `im_width, im_height`: Gets image dimensions from metadata

- `abs_area`: Converts relative area to pixels by multiplying with image dimensions

The code adds two fields to each detection:

1. `relative_bbox_area`: Area as a fraction of the image (0-1), i.e., the share of the total image area covered by the box (multiply by 100 for a percentage).

2. `absolute_bbox_area`: Area in pixels




In [None]:
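from fiftyone import ViewField as F

# Populate `metadata` (image width/height) for any samples that don't have it yet
hub_dataset.compute_metadata()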

rel_bbox_area = F("bounding_box")[2] * F("bounding_box")[3]

im_width, im_height = F("$metadata.width"), F("$metadata.height")

abs_area = rel_bbox_area * im_width * im_height

hub_dataset.set_field("detections.detections.relative_bbox_area", rel_bbox_area).save()

hub_dataset.set_field("detections.detections.absolute_bbox_area", abs_area).save()
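
The two expressions apply the same per-box arithmetic to every detection in the dataset. For a single box, it looks like this (the helper and values are hypothetical, for illustration):

```python
def bbox_areas(rel_bbox, image_width, image_height):
    """Return (relative_area, absolute_area) for a relative [x, y, w, h] box."""
    _, _, w, h = rel_bbox
    relative_area = w * h  # fraction of the image covered by the box
    absolute_area = relative_area * image_width * image_height  # in pixels
    return relative_area, absolute_area


box_rel_area, box_abs_area = bbox_areas([0.1, 0.2, 0.5, 0.25], 640, 480)
print(box_rel_area, box_abs_area)  # 0.125 38400.0
```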

With these values computed, you can perform some useful aggregations. For example, you can compute the [upper and lower bounds of the bounding box areas](https://beta-docs.voxel51.com/api/fiftyone.core.collections.SampleCollection.html#bounds) as well as other summary statistics like the mean and standard deviation.

The code below analyzes these areas in the car damage dataset (CarDD). For each damage type (like "scratch", "dent", "crack", etc.), it:

1. Creates a filtered view containing only detections with that specific label
2. Calculates bounds (min/max) of the bounding box areas as percentage of image size
3. Computes the mean area and standard deviation
4. Formats and prints a summary showing the distribution of bounding box sizes for each damage type

This helps understand the relative size characteristics of different damage types - for instance, whether scratches tend to be smaller or larger than dents, which types have more size variation, and what the extreme values look like across the dataset.

In [None]:
labels = hub_dataset.distinct("detections.detections.label")

for label in labels:
    view = hub_dataset.filter_labels("detections", F("label") == label)
    bounds = view.bounds("detections.detections.relative_bbox_area")
    bounds = (bounds[0]*100, bounds[1]*100)
    area = view.mean("detections.detections.relative_bbox_area") * 100
    std = view.std("detections.detections.relative_bbox_area") * 100
    print("\033[1m%s:\033[0m Min: %.4f, Mean: %.4f, Std: %.4f, Max: %.4f" % (label, bounds[0], area, std, bounds[1]))

In the context of the CarDD damage dataset, this analysis is useful because:

1. It helps understand the physical characteristics of different damage types - scratches might be long and thin, while dents could be more compact

2. It identifies potential challenges for damage detection models - very small damages might be harder to detect

3. It can inform model architecture decisions - if damage sizes vary dramatically, you might need a model that handles multi-scale features well

4. It helps with data filtering and subset creation - you could focus on smaller damages for fine-grained detection tasks

5. It provides dataset quality insights - unusually large or small annotations might indicate labeling errors

6. It enables more targeted data augmentation - you could apply transformations specifically for damage types with limited size variations

This size distribution knowledge ultimately helps build more robust computer vision models for vehicle damage assessment.
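
For intuition, the per-label loop above boils down to grouping the box areas by damage type and summarizing each group. A plain-Python sketch on made-up areas (not CarDD values); it uses the population standard deviation, which as far as we know matches the default of FiftyOne's `std()` aggregation (it exposes a `sample` flag for the sample version):

```python
import statistics
from collections import defaultdict

# Hypothetical (label, relative_bbox_area) pairs, one per detection
detections = [
    ("scratch", 0.01), ("scratch", 0.03), ("scratch", 0.02),
    ("dent", 0.10), ("dent", 0.30),
]

# Group areas by damage type
areas_by_label = defaultdict(list)
for label, area in detections:
    areas_by_label[label].append(area)

# Report min/mean/std/max as percentages of the image area
for label, areas in areas_by_label.items():
    pct = [a * 100 for a in areas]
    print(
        "%s: Min: %.4f, Mean: %.4f, Std: %.4f, Max: %.4f"
        % (label, min(pct), statistics.mean(pct), statistics.pstdev(pct), max(pct))
    )
```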

You can also select the Samples that contain a scratch meeting some condition, such as a small relative bounding box area:

In [20]:
from fiftyone import ViewField as F

filter_to_scratch = F("label") == "scratch"

# Boxes covering less than 3% of the image
filter_to_small_boxes = F("relative_bbox_area") < 0.03

# Select samples that contain at least one small scratch
filtered_scratches = hub_dataset.match_labels(
    filter=(filter_to_scratch & filter_to_small_boxes),
    fields="detections",
    )
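
Because both conditions sit in one expression, they must hold for the same detection. In plain Python (hypothetical data), the difference between `match_labels()`, which selects whole samples that contain at least one matching label, and `filter_labels()`, which also strips the non-matching labels, looks roughly like this:

```python
# Hypothetical samples: each is a list of (label, relative_bbox_area) detections
samples = {
    "sample_a": [("scratch", 0.01), ("dent", 0.20)],
    "sample_b": [("scratch", 0.05)],
    "sample_c": [("dent", 0.02)],
}

MAX_REL_AREA = 0.03  # boxes covering less than 3% of the image


def is_small_scratch(label, rel_area):
    """Per-label predicate: both conditions must hold for the same detection."""
    return label == "scratch" and rel_area < MAX_REL_AREA


# match_labels-style: keep samples with at least one matching label,
# leaving all of their labels intact
matched = {
    sid: dets
    for sid, dets in samples.items()
    if any(is_small_scratch(label, area) for label, area in dets)
}

# filter_labels-style: keep only the matching labels (and drop empty samples)
filtered = {
    sid: [d for d in dets if is_small_scratch(*d)] for sid, dets in samples.items()
}
filtered = {sid: dets for sid, dets in filtered.items() if dets}

print(sorted(matched))  # ['sample_a']
print(filtered)         # {'sample_a': [('scratch', 0.01)]}
```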

You'll see that we have just created a View into our dataset:

In [None]:
type(filtered_scratches)

You can save this view to the Dataset so that you can open it later in the FiftyOne App:

In [22]:
hub_dataset.save_view("filtered_scratches", filtered_scratches)

If you want to delete a saved View, you can run:

```python
hub_dataset.delete_saved_view("<view-name>")
```

Or delete all the saved Views as follows:

```python
hub_dataset.delete_saved_views()
```


> Check out [this tutorial for more information](https://github.com/harpreetsahota204/Hands-on-Data-Centric-Visual-AI/blob/main/Module-2/Lesson_1_Exploring_Your_Dataset_with_FiftyOne.ipynb) about doing complex aggregations and filtering in FiftyOne

### Computing surface area of damages

The code below converts each damage annotation's instance mask into a polygon outline, a representation that is convenient for computing areas. 

1. First, it extracts all the ground truth detection masks from the dataset using the [`values` method](https://beta-docs.voxel51.com/api/fiftyone.core.aggregations.Values.html) of the Dataset. 

2. It then converts each mask into a [FiftyOne Polyline](https://beta-docs.voxel51.com/api/fiftyone.core.labels.Polyline.html), which is an outline of the damaged area defined by a series of connected points. 

3. The code then goes through each image's detections one by one:
   - For each image that has damage annotations, it converts every damage mask into a FiftyOne Polyline.
   - If an image has no damage annotations, it creates an empty list instead.
   - It packages these polylines into a [FiftyOne Polylines](https://beta-docs.voxel51.com/api/fiftyone.core.labels.Polylines.html) label, which is simply a list of polylines or polygons in an image.

4. Finally, it adds all these polyline representations back to the dataset as a new field called `polylines`. 

This conversion is useful because polylines can be easier to work with for certain types of analysis, like calculating the area of damaged regions or visualizing the boundaries of damage. It's like having both a coloring book (the masks) and just the outlines (the polylines) - each format has its own advantages for different tasks.


In [23]:
# Get all ground truth detection masks from the dataset
# This returns one list of Detection objects (each with its mask) per sample, or None where the field is unset
segmentation_masks = hub_dataset.values("detections.detections")

# Initialize an empty list to store polyline representations for each sample
all_polylines = []

# Iterate through detections for each sample in the dataset
for sample_segmentation in segmentation_masks:
    # For each detection in the sample, convert its segmentation mask to a polyline
    # If sample has no detections (None), create empty list
    polylines = [segmentation.to_polyline() for segmentation in sample_segmentation] if sample_segmentation else []
    
    # Create a FiftyOne Polylines label containing the polyline representations.
    # `closed` and `filled` are Polyline attributes (set by to_polyline()), not Polylines ones
    polylines_field = fo.Polylines(polylines=polylines)
    
    # Add the polylines for this sample to our list
    all_polylines.append(polylines_field)

# Add the polylines field to every sample in the dataset
# This creates a new field called "polylines" containing the polyline representations
hub_dataset.set_values("polylines", all_polylines)

Now we can define a function which will compute the area of a polygon:

In [24]:
import numpy as np

def compute_polygon_area(points, image_width, image_height):
    """
    Compute the area of a polygon in pixel units using the Shoelace formula.
    
    Args:
        points: List of (x,y) coordinates defining the polygon vertices, normalized to [0,1]
        image_width: Width of the image in pixels
        image_height: Height of the image in pixels
        
    Returns:
        float: Area of the polygon in square pixels
        
    Notes:
        The Shoelace formula (also known as the surveyor's formula) calculates the area 
        of a polygon by using the coordinates of its vertices. The formula gets its name
        from the way the computation "laces" together vertex coordinates.
    """
    # Convert points list to numpy array for vectorized operations
    points = np.array(points)
    
    # Scale normalized coordinates back to pixel dimensions
    points[:, 0] *= image_width  # Scale x coordinates
    points[:, 1] *= image_height # Scale y coordinates
    
    # Extract x and y coordinates into separate arrays
    x = points[:, 0]
    y = points[:, 1]
    
    # Create shifted versions of coordinate arrays
    # np.roll shifts array elements by 1 position for the formula
    x_shift = np.roll(x, 1)
    y_shift = np.roll(y, 1)
    
    # Apply the Shoelace formula: A = 1/2 * |sum(x_i*y_{i-1} - x_{i-1}*y_i)|
    # (pairing each vertex with the previous one only flips the sign of the sum)
    return 0.5 * np.abs(np.sum(x * y_shift - x_shift * y))
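
Written out, with vertices $(x_0, y_0), \dots, (x_{n-1}, y_{n-1})$ in pixel units and indices taken modulo $n$, the Shoelace formula is:

$$
A = \frac{1}{2}\left|\sum_{i=0}^{n-1}\left(x_i\,y_{i+1} - x_{i+1}\,y_i\right)\right|
$$

As a quick check on a 4 by 3 rectangle with vertices $(0,0), (4,0), (4,3), (0,3)$:

$$
A = \tfrac{1}{2}\left|(0\cdot 0 - 4\cdot 0) + (4\cdot 3 - 4\cdot 0) + (4\cdot 3 - 0\cdot 3) + (0\cdot 0 - 0\cdot 3)\right| = \tfrac{1}{2}\,|24| = 12
$$

The sign of the sum depends on whether the vertices run clockwise or counterclockwise, which is why the function takes the absolute value.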

Finally, we can add in the absolute and relative surface areas.

The code below processes each sample in the dataset to calculate the surface area of its damaged regions. For each sample:

1. It loops over every polyline in the sample, skipping samples with no polylines. Each polyline traces the outline of one damaged area on the car.

2. It retrieves the image dimensions (`width` and `height`) from the sample's metadata.

3. It calculates the `absolute_surface_area` of each damaged region in pixels using the `compute_polygon_area` helper defined above, which applies the Shoelace formula to the polygon's vertices.

4. It calculates the `relative_surface_area` as a fraction of the total image area, i.e., `absolute_surface_area / (width * height)`.

5. It stores both measurements as attributes on the polyline.

6. It saves the updated sample back to the dataset.

This information is valuable for analyzing damage severity and comparing different types of damage across the dataset.

In [25]:
import numpy as np

for sample in hub_dataset:
    # Skip samples without any polylines (no damage annotations)
    if sample.polylines is None or not sample.polylines.polylines:
        continue

    # Get image dimensions
    width = sample.metadata.width
    height = sample.metadata.height

    for polyline in sample.polylines.polylines:
        if not polyline.points:
            continue

        # Use the first shape of the polyline (a multi-part mask may trace more than one)
        points = np.array(polyline.points[0])

        # Compute the area in square pixels, then as a fraction of the image
        absolute_surface_area = compute_polygon_area(points, width, height)
        relative_surface_area = absolute_surface_area / (width * height)

        # Store both relative and absolute areas on the polyline
        polyline.relative_surface_area = relative_surface_area
        polyline.absolute_surface_area = absolute_surface_area

    # Save the sample
    sample.save()

With the surface areas stored, you can summarize them per damage type, just as you did for the bounding boxes:

In [None]:
labels = hub_dataset.distinct("polylines.polylines.label")

for label in labels:
    view = hub_dataset.filter_labels("polylines", F("label") == label)
    bounds = view.bounds("polylines.polylines.relative_surface_area")
    bounds = (bounds[0] * 100, bounds[1] * 100)
    area = view.mean("polylines.polylines.relative_surface_area") * 100
    std = view.std("polylines.polylines.relative_surface_area") * 100
    print("\033[1m%s:\033[0m Min: %.4f, Mean: %.4f, Std: %.4f, Max: %.4f" % (label, bounds[0], area, std, bounds[1]))

### Using VLMs for data enrichment

You can use Vision Language Models (VLMs) to enrich car damage analysis in some interesting ways:

1. **Basic Captioning:** Generate descriptions of damage type, severity, and affected car parts in natural language.

2. **Detailed Location:** Convert basic annotations (like "dent on door") into precise descriptions (like "dent on lower passenger door near handle").

3. **Damage Cause Analysis:** Infer potential causes from visual clues (e.g., "scratch likely from brushing against object").

4. **Scene Context:** Describe relevant environmental factors (e.g., "parked car, daylight conditions, clean vehicle").

5. **Multi-Damage Relations:** Explain how multiple damages relate (e.g., "scratch running across dent" or "cluster of dents").

6. **Component Recognition:** Identify and label both damaged and undamaged car parts for better context.


FiftyOne allows you to use VLMs rather seamlessly. For example, [Qwen2.5-VL](https://github.com/harpreetsahota204/qwen2_5_vl) is a VLM that has been integrated as a [Remotely Sourced Zoo Model](https://beta-docs.voxel51.com/models/model_zoo/remote/) (which we will discuss later).

Start by registering the model source:

In [None]:
import fiftyone.zoo as foz

foz.register_zoo_model_source(
    "https://github.com/harpreetsahota204/qwen2_5_vl",
    overwrite=True
    )

Next, you can download a checkpoint. Refer to the [Qwen2.5-VL's Remote Zoo Model's GitHub](https://github.com/harpreetsahota204/qwen2_5_vl/tree/main) to see the available checkpoints.

In [None]:
foz.download_zoo_model(
    "https://github.com/harpreetsahota204/qwen2_5_vl",
    model_name="Qwen/Qwen2.5-VL-3B-Instruct",
    overwrite=True
)

Then you can load the model as you would any [Built-in Zoo Model](https://beta-docs.voxel51.com/models/model_zoo/models/):

In [None]:
import fiftyone.zoo as foz

zoo_model = foz.load_zoo_model(
    "Qwen/Qwen2.5-VL-3B-Instruct",
    # install_requirements=True,  # if you are using it for the first time and need to install its requirements
    # ensure_requirements=True,  # ensure any requirements are installed before loading the model
)

Start by setting `operation="vqa"`, which we will use to generate answers/captions for each image, and then set the prompt:

In [50]:
zoo_model.operation="vqa"
zoo_model.prompt="Complete a damage report for this vehicle in this image. Include details about the damage, including the location and type of damage. If there is no damage, say 'No damage'."

And then apply the model to your Dataset via [the `apply_model` method](https://beta-docs.voxel51.com/api/fiftyone.core.models.html#apply_model) of the Dataset.

Image captioning/VQA typically takes longer than other operations. This took ~17 minutes to run on a single NVIDIA RTX 6000 Ada.

In [None]:
hub_dataset.apply_model(zoo_model, label_field="damage_report")

You can also use the model for classification, for example to get the color of the vehicle. All you have to do is:

In [None]:
zoo_model.operation="classify"
zoo_model.prompt = "What is the color of the damaged vehicle in this image? Please provide the color name."
hub_dataset.apply_model(zoo_model, label_field="vehicle_color")

You can also classify the location of the damage:

In [None]:
# `operation` is still "classify" from the previous cell
zoo_model.prompt = """You are required to report the location of vehicle damage. \
    List all the locations where this vehicle has been damaged. Choose one or more of the following, or include any location not explicitly listed: 
    - quarter panel
    - driver door
    - passenger door
    - rear door
    - hood
    - front bumper
    - rear bumper
    - tires
    - rim
    - wheel well
    - window
    - windshield

    What are all the locations where this vehicle has been damaged?"""

hub_dataset.apply_model(zoo_model, label_field="damage_location")

Let's briefly explore what the VLM has come up with. For context, let's open the first image and examine the model output:

In [None]:
from PIL import Image

Image.open(hub_dataset.first().filepath)

In [None]:
print(hub_dataset.first().damage_report)

In [None]:
hub_dataset.first().vehicle_color

In [None]:
hub_dataset.first().damage_location

In [25]:
hub_dataset.save()
hub_dataset.persistent = True

At this point, it's helpful to install the [Caption Viewer Plugin](https://github.com/mythrandire/caption-viewer) so that you can more easily read the captions as you explore them in the App.

In [None]:
!fiftyone plugins download https://github.com/mythrandire/caption-viewer

In [None]:
fo.launch_app(hub_dataset)

If the App is already open, you can also refresh it to view the model output for each sample in the Dataset.

## Your turn now!

There are several VLMs which are integrated as remotely sourced zoo models.

Pick from one (or more) of the following and see what you can get these models to do. Each of these repos has a complete notebook example and thorough README:

- [Moondream2](https://github.com/harpreetsahota204/moondream2)

- [Florence2](https://github.com/harpreetsahota204/florence2)

- [PaliGemma2](https://github.com/harpreetsahota204/paligemma2)

By the way, you may want to push your dataset to Hugging Face so that you can access it later. 

Make sure you have a Hugging Face token, and then sign in via the CLI: `huggingface-cli login` and enter your token.

Once you've done that, you can push your dataset to the hub as follows:

```python
from fiftyone.utils.huggingface import push_to_hub

# Push the enriched dataset from this notebook
push_to_hub(
    hub_dataset,
    "<some-repo-name>"
)
```