<a href="https://colab.research.google.com/github/kandong54/autodrone/blob/main/datasets/open_images.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# [Open Images Dataset V6](https://storage.googleapis.com/openimages/web/index.html) Flower Subset


## TODO
- Add [background images](https://github.com/ultralytics/yolov5/wiki/Tips-for-Best-Training-Results)
> Background images are images with no objects that are added to a dataset to reduce False Positives (FP). We recommend about 0-10% background images to help reduce FPs (COCO has 1000 background images for reference, 1% of the total). No labels are required for background images.

## Flower Subset
72,001 images with flower bounding boxes:
- Train: 62,716
- Test: 6,949
- Validation: 2,336
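
As a quick sanity check on the counts above, the following sketch sums the three splits and prints each split's share of the total. The dictionary simply restates the numbers listed here; nothing is read from the dataset itself.

```python
# Split sizes as listed above (not read from the dataset).
splits = {"train": 62716, "test": 6949, "validation": 2336}

total = sum(splits.values())

# Print each split with its count and share of the total.
for name, count in splits.items():
    print(f"{name:<10} {count:>6} {count / total:6.1%}")
print(f"{'total':<10} {total:>6}")
```

The splits come out at roughly 87% train, 10% test, and 3% validation.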

## Format
- open-images-v6: FiftyOne format
- yolov5: YOLOv5 format
- tfrecords: TFRecords in TensorFlow Object Detection API format

All files are stored in [Shareddrives/AutoDrone/Datasets/open-images-v6](https://drive.google.com/drive/u/1/folders/1TWXl1o9qkUCKIFzYmzXsQfC7e4ZJeTnY) on Google Drive.
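
The main practical difference between the FiftyOne and YOLOv5 formats is how a bounding box is written. FiftyOne detections store `[top-left-x, top-left-y, width, height]` relative to the image size, while a YOLOv5 label line is `class x_center y_center width height` in the same relative units. Below is a minimal sketch of that conversion, for illustration only. The helper names are hypothetical, and the real export is done by FiftyOne's exporter rather than by this code.

```python
def fiftyone_to_yolo(box):
    """Convert a relative [top-left-x, top-left-y, width, height] box
    to a relative (x_center, y_center, width, height) tuple."""
    x, y, w, h = box
    return (x + w / 2, y + h / 2, w, h)


def yolo_label_line(class_index, box):
    """Format one YOLOv5 label line from a FiftyOne-style box (hypothetical helper)."""
    xc, yc, w, h = fiftyone_to_yolo(box)
    return f"{class_index} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


# A box starting at (0.25, 0.5) that is 0.5 wide and 0.25 tall; class 0 is "Flower".
line = yolo_label_line(0, [0.25, 0.5, 0.5, 0.25])
print(line)  # 0 0.500000 0.625000 0.500000 0.250000
```

Since "Flower" is the only class in this subset, every label line uses class index 0.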

## Tools
- [FiftyOne](https://voxel51.com/docs/fiftyone/)

## Licenses
- Open Images Dataset V6
> The annotations are licensed by Google LLC under CC BY 4.0 license. The images are listed as having a CC BY 2.0 license. Note: while we tried to identify images that are licensed under a Creative Commons Attribution license, we make no representations or warranties regarding the license status of each image and you should verify the license for each image yourself.

# Load and Explore Images

Reference tutorial: [Downloading and Evaluating Open Images](https://voxel51.com/docs/fiftyone/tutorials/open_images.html)

In [1]:
!pip install -q opencv-python-headless==4.5.4.60 fiftyone

import fiftyone as fo
import fiftyone.zoo as foz
from google.colab import drive

drive.mount('/content/drive')

Mounted at /content/drive


Unzip the dataset archive from the mounted shared drive into `/root/fiftyone`.

This takes about 10 minutes.

In [2]:
!rm -rf /root/fiftyone
!mkdir /root/fiftyone
!unzip -q drive/Shareddrives/AutoDrone/Datasets/open-images-v6/open-images-v6.zip -d /root/fiftyone

Load the dataset into FiftyOne.

This takes about 15 minutes.

In [3]:
# Optional: set the zoo directory explicitly (defaults to ~/fiftyone).
# fo.config.dataset_zoo_dir = "drive/Shareddrives/AutoDrone/Datasets/open-images-v6/"

dataset = foz.load_zoo_dataset(
    "open-images-v6",
    label_types="detections",  # load bounding boxes only
    classes=["Flower"],
    only_matching=True,  # keep only the Flower labels on each sample
    dataset_name="open-images-flower",
)

Downloading split 'train' to '/root/fiftyone/open-images-v6/train' if necessary
Necessary images already downloaded
Existing download of split 'train' is sufficient
Downloading split 'test' to '/root/fiftyone/open-images-v6/test' if necessary
Necessary images already downloaded
Existing download of split 'test' is sufficient
Downloading split 'validation' to '/root/fiftyone/open-images-v6/validation' if necessary
Necessary images already downloaded
Existing download of split 'validation' is sufficient
Loading 'open-images-v6' split 'train'
 100% |█████████████| 62716/62716 [8.6m elapsed, 0s remaining, 108.5 samples/s]      
Loading 'open-images-v6' split 'test'
 100% |███████████████| 6949/6949 [30.7s elapsed, 0s remaining, 224.6 samples/s]      
Loading 'open-images-v6' split 'validation'
 100% |███████████████| 2336/2336 [10.0s elapsed, 0s remaining, 241.7 samples/s]     
Dataset 'open-images-flower' created


In [4]:
# explore
session = fo.launch_app(dataset)


In [5]:
session.freeze()  # replace the live App with a static screenshot for sharing

# Convert Format

In [None]:
# Export each split to ./yolov5 in YOLOv5 format (uncomment to run).
# export_dir = "./yolov5"
# label_field = "detections"
# splits = ["train", "test", "validation"]
# for split in splits:
#     view = dataset.match_tags(split)
#     view.export(
#         export_dir=export_dir,
#         split=split,
#         classes=["Flower"],
#         dataset_type=fo.types.YOLOv5Dataset,
#         label_field=label_field,
#     )


In [None]:
# Export each split to ./tfrecords as one TFRecords file per split (uncomment to run).
# export_dir = "./tfrecords"
# label_field = "detections"
# splits = ["train", "test", "validation"]
# for split in splits:
#     view = dataset.match_tags(split)
#     view.export(
#         export_dir=export_dir,
#         tf_records_path=split + ".tfrecords",
#         classes=["Flower"],
#         dataset_type=fo.types.TFObjectDetectionDataset,
#         label_field=label_field,
#     )
