FiftyOne is an open-source tool for building high-quality datasets and computer vision models.

Sources:

https://docs.voxel51.com/integrations/coco.html

https://colab.research.google.com/github/voxel51/fiftyone/blob/v0.23.5/docs/source/tutorials/detectron2.ipynb#scrollTo=FsePPpwZSmqt

In [None]:
!pip install fiftyone  # install the FiftyOne library (includes the App)

In [2]:
import fiftyone as fo
import fiftyone.zoo as foz

Migrating database to v0.23.5


In [12]:
# Load a subset of COCO-2017 from the FiftyOne Dataset Zoo

# Classes of interest
classes = ["bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl"]

dataset = foz.load_zoo_dataset(
    "coco-2017",  # Name of the zoo dataset
    split="train",  # Split to load ("train", "validation", or "test")
    classes=classes,  # Only load samples that contain at least one of these classes
    label_types=["detections"],  # Label types to load ("detections" and/or "segmentations"); only detections by default
    max_samples=500,  # Maximum number of samples to load
    shuffle=True,  # Randomly shuffle the order in which samples are chosen
    seed=42,  # Random seed for the shuffle, for reproducibility
)

Downloading split 'train' to '/root/fiftyone/coco-2017/train' if necessary


Found annotations at '/root/fiftyone/coco-2017/raw/instances_train2017.json'


Sufficient images already downloaded


Existing download of split 'train' is sufficient


Loading existing dataset 'coco-2017-train-500'. To reload from disk, either delete the existing dataset or provide a custom `dataset_name` to use
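Conceptually, passing `classes`, `max_samples`, `shuffle` and `seed` selects only images that contain at least one of the requested classes and then takes a reproducible random subset of them. The plain-Python sketch below illustrates that idea on a toy COCO-style dictionary; `select_samples` is a hypothetical helper and the image ids are made up, so this is not FiftyOne's actual implementation.

```python
import random


def select_samples(coco, classes, max_samples, seed=None):
    """Hypothetical helper: ids of images that contain at least one of `classes`,
    shuffled with `seed` and capped at `max_samples`. Not FiftyOne's implementation."""
    # COCO category ids for the requested class names
    wanted_ids = {c["id"] for c in coco["categories"] if c["name"] in classes}
    # Every image with at least one annotation of a requested category
    matching = sorted(
        {a["image_id"] for a in coco["annotations"] if a["category_id"] in wanted_ids}
    )
    # Seeded shuffle, so the same subset is selected on every run
    random.Random(seed).shuffle(matching)
    return matching[:max_samples]


# Toy COCO-style annotations (made-up image ids)
toy_coco = {
    "categories": [
        {"id": 1, "name": "person"},
        {"id": 44, "name": "bottle"},
        {"id": 47, "name": "cup"},
    ],
    "annotations": [
        {"image_id": 10, "category_id": 1},   # person only -> not selected
        {"image_id": 11, "category_id": 44},  # bottle
        {"image_id": 12, "category_id": 47},  # cup; this image also has a person
        {"image_id": 12, "category_id": 1},
    ],
}
print(select_samples(toy_coco, ["bottle", "cup"], max_samples=500, seed=42))
```

Note that image 12 is selected even though it also contains a `person` annotation, which is why the labels are filtered again in the next step.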


In [13]:
print(dataset)

Name:        coco-2017-train-500
Media type:  image
Num samples: 500
Persistent:  False
Tags:        []
Sample fields:
    id:           fiftyone.core.fields.ObjectIdField
    filepath:     fiftyone.core.fields.StringField
    tags:         fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.ImageMetadata)
    ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)


In [14]:
# Launch the FiftyOne App to explore the dataset visually
session = fo.launch_app(dataset)

Specifying `classes` when downloading a dataset from the zoo ensures that only samples containing at least one of the given classes are loaded. However, these samples may still contain labels of other classes, so we use FiftyOne's filtering capabilities to keep only the labels of interest. We also remove the zoo's `train` tag from these samples so that we can create our own split.

In [15]:
from fiftyone import ViewField as F

# Keep only the labels of the classes of interest; save() writes the
# filtered labels back to the dataset
dataset.filter_labels("ground_truth", F("label").is_in(classes)).save()

# Remove the "train" tag added by the zoo so we can create a custom split
dataset.untag_samples("train")
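The difference between selecting samples and filtering labels can be sketched on plain dictionaries: every sample is kept, but detections of other classes are dropped, and the split tag is removed. `filter_labels` below is a hypothetical helper operating on made-up dicts, only illustrating what the FiftyOne calls above do to each sample; it is not FiftyOne code.

```python
def filter_labels(samples, classes, remove_tag="train"):
    """Hypothetical helper on plain dicts: keep only detections whose label is in
    `classes` and drop `remove_tag` from each sample's tags."""
    wanted = set(classes)
    result = []
    for sample in samples:
        # Label-level filter: the sample stays, unwanted detections are dropped
        detections = [d for d in sample["ground_truth"] if d["label"] in wanted]
        # Analogous to dataset.untag_samples("train")
        tags = [t for t in sample["tags"] if t != remove_tag]
        # Build a new dict so the input sample is left unchanged
        result.append({**sample, "ground_truth": detections, "tags": tags})
    return result


# Made-up sample with one label outside the classes of interest
samples = [
    {
        "filepath": "000000000001.jpg",
        "tags": ["train"],
        "ground_truth": [{"label": "cup"}, {"label": "person"}, {"label": "knife"}],
    },
]
print(filter_labels(samples, ["cup", "knife"]))
```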

In [16]:
## Show the App again to visualize the updated dataset
## Note: whenever a new App instance is opened in a notebook cell (e.g., by updating
## the Session object), previous App instances are replaced with static screenshots
session.show()

In [17]:
## Randomly split the dataset: each sample is tagged "train" (80%) or "val" (20%);
## adjust the fractions as needed

import fiftyone.utils.random as four

four.random_split(dataset, {"train": 0.8, "val": 0.2})
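`four.random_split` tags each sample with one of the split names according to the given fractions. The underlying idea can be sketched as a seeded shuffle followed by slicing; `random_split` below is a hypothetical stand-in that works on a list of ids rather than a FiftyOne dataset, and normalizing the fractions and giving the remainder to the last split are choices of this sketch.

```python
import random


def random_split(ids, split_fracs, seed=None):
    """Hypothetical helper: partition `ids` into named splits by fraction,
    e.g. {"train": 0.8, "val": 0.2}. Not FiftyOne's implementation."""
    ids = list(ids)
    random.Random(seed).shuffle(ids)
    total = sum(split_fracs.values())  # this sketch normalizes the fractions
    names = list(split_fracs)
    splits, start = {}, 0
    for i, name in enumerate(names):
        if i == len(names) - 1:
            end = len(ids)  # the last split takes the remainder
        else:
            end = min(len(ids), start + round(len(ids) * split_fracs[name] / total))
        splits[name] = ids[start:end]
        start = end
    return splits


splits = random_split(range(500), {"train": 0.8, "val": 0.2}, seed=51)
print({name: len(members) for name, members in splits.items()})  # {'train': 400, 'val': 100}
```

Passing a fixed seed makes the split reproducible across runs.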

In [22]:
session.show()

In [10]:
# Export a small COCO-format labels file for a handful of samples

import os

# Classes list (the distinct labels remaining after filtering)
class_list = dataset.distinct("ground_truth.detections.label")

# The directory in which the dataset's images are stored
IMAGES_DIR = os.path.dirname(dataset.first().filepath)

# Export the labels of 5 samples in COCO format (labels only)
dataset.take(5).export(
    dataset_type=fo.types.COCODetectionDataset,
    label_field="ground_truth",
    labels_path="/tmp/coco.json",
    classes=class_list,
)

 100% |█████████████████████| 5/5 [526.4ms elapsed, 0s remaining, 9.5 samples/s]      


EXPORTING FIFTYONE DATASETS - https://docs.voxel51.com/user_guide/export_datasets.html
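When exporting to COCO, the bounding boxes change representation: FiftyOne stores `Detection.bounding_box` as `[top-left-x, top-left-y, width, height]` in relative coordinates in `[0, 1]`, whereas the COCO `bbox` field uses `[x, y, width, height]` in absolute pixels. The exporter performs this conversion for you; the two hypothetical helpers below (not part of FiftyOne) only illustrate the arithmetic, assuming the image width and height are known (e.g., from `sample.metadata`).

```python
def fo_to_coco_bbox(rel_box, img_width, img_height):
    """FiftyOne [x, y, w, h] relative to image size (0-1) -> COCO [x, y, w, h] in pixels."""
    x, y, w, h = rel_box
    return [x * img_width, y * img_height, w * img_width, h * img_height]


def coco_to_fo_bbox(abs_box, img_width, img_height):
    """COCO [x, y, w, h] in pixels -> FiftyOne relative [x, y, w, h]."""
    x, y, w, h = abs_box
    return [x / img_width, y / img_height, w / img_width, h / img_height]


print(fo_to_coco_bbox([0.25, 0.5, 0.5, 0.25], 640, 480))  # [160.0, 240.0, 320.0, 120.0]
print(coco_to_fo_bbox([160.0, 240.0, 320.0, 120.0], 640, 480))  # [0.25, 0.5, 0.5, 0.25]
```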

In [20]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [21]:
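# Export the labels in the `ground_truth` field in COCO format, and
# move (rather than copy) the source media to the output directory.
# NOTE: "move" removes the images from their original location on disk;
# use export_media=True to copy them instead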
dataset.export(
    export_dir="/content/drive/My Drive/UofT Grad School/MIE 1517 - Intro to Deep Learning/Course Project/COCO_dataset/",
    dataset_type=fo.types.COCODetectionDataset,
    label_field="ground_truth",
    export_media="move",
)

Directory '/content/drive/My Drive/UofT Grad School/MIE 1517 - Intro to Deep Learning/Course Project/COCO_dataset/' already exists; export will be merged with existing files




 100% |█████████████████| 500/500 [1.4m elapsed, 0s remaining, 6.2 samples/s]       
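This export writes all 500 samples into a single labels file, so the `train`/`val` subsets created by `four.random_split` are not separated on disk. Within FiftyOne, one option is to export `dataset.match_tags("train")` and `dataset.match_tags("val")` to separate directories. Alternatively, a combined COCO labels dict can be split afterwards, as in the minimal sketch below. It assumes you have the list of image file names for each split (for example, the basenames of `filepath` in each tagged view); `split_coco` is a hypothetical helper and the data is a toy example.

```python
def split_coco(coco, split_file_names):
    """Hypothetical helper: return {split: coco_dict} keeping only the images whose
    file_name is listed for that split, plus their annotations."""
    result = {}
    for split, file_names in split_file_names.items():
        file_names = set(file_names)
        images = [img for img in coco["images"] if img["file_name"] in file_names]
        image_ids = {img["id"] for img in images}
        annotations = [a for a in coco["annotations"] if a["image_id"] in image_ids]
        # Other top-level keys (e.g. "categories") are shared by every split
        result[split] = {**coco, "images": images, "annotations": annotations}
    return result


# Toy COCO labels dict (made-up ids and file names)
coco = {
    "categories": [{"id": 1, "name": "cup"}],
    "images": [{"id": 1, "file_name": "a.jpg"}, {"id": 2, "file_name": "b.jpg"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [10, 10, 20, 20]},
        {"id": 2, "image_id": 2, "category_id": 1, "bbox": [5, 5, 10, 10]},
    ],
}
parts = split_coco(coco, {"train": ["a.jpg"], "val": ["b.jpg"]})
print({split: len(d["annotations"]) for split, d in parts.items()})  # {'train': 1, 'val': 1}
```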


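The `take(5)` export earlier in the notebook produced `/tmp/coco.json`, a file of COCO labels for 5 of the images in `IMAGES_DIR`. Note that the `export_media="move"` export above has since moved those image files to Google Drive.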

In [None]:
!python -m json.tool /tmp/coco.json
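Instead of printing the whole file, a quick summary can be computed with the standard library. The sketch below assumes the standard COCO layout (top-level `images`, `annotations` and `categories` lists, with each annotation referencing a `category_id`); `summarize_coco` is a hypothetical helper.

```python
import json
from collections import Counter


def summarize_coco(labels_path):
    """Hypothetical helper: number of images and annotation count per category
    in a COCO detection labels file."""
    with open(labels_path) as f:
        coco = json.load(f)
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    counts = Counter(id_to_name[a["category_id"]] for a in coco["annotations"])
    return len(coco["images"]), dict(counts)


# Usage on the file exported above (requires /tmp/coco.json to exist):
# num_images, counts = summarize_coco("/tmp/coco.json")
```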

See the link below for how to evaluate your model's predictions on a test set using FiftyOne:

- https://docs.voxel51.com/user_guide/evaluation.html#evaluating-detections
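FiftyOne's `evaluate_detections()` matches predicted objects to ground truth objects based on their intersection over union (IoU). For intuition, here is a minimal IoU computation for two boxes in `[x, y, width, height]` layout (used by both COCO and FiftyOne, in their respective units); `iou_xywh` is a hypothetical helper, not FiftyOne's implementation.

```python
def iou_xywh(box_a, box_b):
    """Intersection over union of two [x, y, width, height] boxes in the same units."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap along each axis (zero if the boxes do not intersect)
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


print(iou_xywh([0, 0, 10, 10], [5, 5, 10, 10]))  # 25 / 175, about 0.1429
```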

Useful image utilities for preprocessing datasets:
- https://docs.voxel51.com/api/fiftyone.utils.image.html