In this notebook we download a subset of images from Google's Open Images Dataset. We use the OIDv4 toolkit, which lets us specify the classes we want to download, and then convert the annotations to the Darknet format so that they are ready for YOLOv3 training.


GitHub repo reference: https://github.com/theAIGuysCode/OIDv4_ToolKit

# Cloning the OIDv4 toolkit to download the training dataset for our custom object classes

In [None]:
!git clone https://github.com/theAIGuysCode/OIDv4_ToolKit

Cloning into 'OIDv4_ToolKit'...
remote: Enumerating objects: 444, done.
remote: Total 444 (delta 0), reused 0 (delta 0), pack-reused 444
Receiving objects: 100% (444/444), 34.09 MiB | 29.63 MiB/s, done.
Resolving deltas: 100% (157/157), done.


Moving into the toolkit dir and installing the requirements

In [None]:
%cd OIDv4_ToolKit/
!pip install -r requirements.txt

/content/OIDv4_ToolKit
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting awscli
  Downloading awscli-1.24.10-py3-none-any.whl (3.9 MB)
Collecting botocore==1.26.10
  Downloading botocore-1.26.10-py3-none-any.whl (8.8 MB)
Collecting docutils<0.17,>=0.10
  Downloading docutils-0.16-py2.py3-none-any.whl (548 kB)
Collecting colorama<0.4.5,>=0.2.5
  Downloading colorama-0.4.4-py2.py3-none-any.whl (16 kB)
Collecting s3transfer<0.6.0,>=0.5.0
  Downloading s3transfer-0.5.2-py3-none-any.whl (79 kB)
Collecting rsa<4.8,>=3.1.2
  Downloading rsa-4.7.2-py3-none-any.whl (34 kB)
Collecting jmespath<2.0.0,>=0.7.1
  Downloading jmespath-1.0.0-py3-none-any.whl (23 kB)
Collecting urllib3
  D

In [None]:
# We restarted the runtime, so we cd into the toolkit directory again
%cd OIDv4_ToolKit/

/content/OIDv4_ToolKit


Then we run the script, passing it the classes we wish to download and the limit on the number of images downloaded per class.

The classes we are interested in for our vehicle detection application are: 
Car, Bus, Van, Truck, Motorcycle and Bicycle

In [None]:
!python main.py downloader --classes Car Bus Van Truck Motorcycle Bicycle --type_csv train --limit 1000 --multiclasses 1

		   ___   _____  ______            _    _    
		 .'   `.|_   _||_   _ `.         | |  | |   
		/  .-.  \ | |    | | `. \ _   __ | |__| |_  
		| |   | | | |    | |  | |[ \ [  ]|____   _| 
		\  `-'  /_| |_  _| |_.' / \ \/ /     _| |_  
		 `.___.'|_____||______.'   \__/     |_____|

             _____                    _                 _             
            (____ \                  | |               | |            
             _   \ \ ___  _ _ _ ____ | | ___   ____  _ | | ____  ____ 
            | |   | / _ \| | | |  _ \| |/ _ \ / _  |/ || |/ _  )/ ___)
            | |__/ / |_| | | | | | | | | |_| ( ( | ( (_| ( (/ /| |    
            |_____/ \___/ \____|_| |_|_|\___/ \_||_|\____|\____)_|    

    [INFO] | Downloading ['Car', 'Bus', 'Van', 'Truck', 'Motorcycle', 'Bicycle'] together.
   [ERROR] | Missing the class-descriptions-boxable.csv file.
[DOWNLOAD] | Do you want to down

It can be observed in the output that the number of images for each class except Car is less than 1000, the per-class download limit. This is because an image that contains instances of several classes is downloaded only once: all classes share the same folder, so an image with both a car and a bus is downloaded for the Car class and not downloaded again for the Bus class.
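To check this effect on the downloaded data, the label files can be tallied per class. The following is a minimal sketch, not part of the toolkit: the helper name `images_per_class` is hypothetical, and it assumes each line of a label file has the form `ClassName left top right bottom`. It has to be run before the Label folder is deleted in a later step.

```python
import os
from collections import Counter

def images_per_class(label_dir):
    """Count, for each class, how many label files mention it at least once.

    Assumes one .txt file per image and lines shaped like
    'ClassName left top right bottom' (an assumption about the label format).
    """
    counts = Counter()
    for name in sorted(os.listdir(label_dir)):
        if not name.endswith(".txt"):
            continue
        classes_in_image = set()
        with open(os.path.join(label_dir, name)) as f:
            for line in f:
                # Split from the right so class names containing spaces survive.
                parts = line.strip().rsplit(" ", 4)
                if len(parts) == 5:
                    classes_in_image.add(parts[0])
        # Each image contributes at most once per class.
        counts.update(classes_in_image)
    return counts
```

For the training split this could be called as `images_per_class('OID/Dataset/train/Car_Bus_Van_Truck_Motorcycle_Bicycle/Label')`. Because one image can count toward several classes, the per-class totals can add up to more than the number of image files.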

The next step is to convert the generated labels to annotations in the format required by Darknet, using a ready-made script.

In [None]:
!python convert_annotations.py

Currently in subdirectory: train
Converting annotations for class:  Car_Bus_Van_Truck_Motorcycle_Bicycle
100% 5796/5796 [03:10<00:00, 30.36it/s]
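For reference, a Darknet annotation line has the form `<class_id> <x_center> <y_center> <width> <height>`, with all four box values expressed as fractions of the image width and height. The sketch below illustrates that arithmetic for a single box. It is not the code of `convert_annotations.py`; the function name `to_darknet` and the six-decimal formatting are assumptions made for illustration.

```python
def to_darknet(class_id, left, top, right, bottom, img_w, img_h):
    """Convert a pixel-coordinate box to a Darknet annotation line.

    The box is given by its left/top/right/bottom edges in pixels;
    img_w and img_h are the image dimensions in pixels.
    """
    x_center = (left + right) / 2.0 / img_w   # box center, relative to width
    y_center = (top + bottom) / 2.0 / img_h   # box center, relative to height
    width = (right - left) / img_w            # box width, relative to width
    height = (bottom - top) / img_h           # box height, relative to height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"
```

For example, a box spanning pixels (100, 50) to (300, 150) in a 400x200 image is centered in the image and covers half of each dimension, so every relative value is 0.5.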


We delete the old labels folder because we no longer need it.

In [None]:
!rm -r OID/Dataset/train/Car_Bus_Van_Truck_Motorcycle_Bicycle/Label/

We will rename the dataset folder to 'obj' so we can reference it easily later.
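The notebook does not show the cell that performs this rename. One possible way to do it, sketched under the assumption that the folder to rename is the combined class folder inside the train directory, is a small helper (the name `rename_dataset_folder` is hypothetical):

```python
import os

def rename_dataset_folder(parent_dir, old_name, new_name="obj"):
    """Rename parent_dir/old_name to parent_dir/new_name and return the new path."""
    src = os.path.join(parent_dir, old_name)
    dst = os.path.join(parent_dir, new_name)
    if not os.path.isdir(src):
        raise FileNotFoundError(src)
    # Refuse to overwrite an existing folder from an earlier run.
    if os.path.exists(dst):
        raise FileExistsError(dst)
    os.rename(src, dst)
    return dst
```

With the paths used in this notebook, the call would be `rename_dataset_folder('/content/OIDv4_ToolKit/OID/Dataset/train', 'Car_Bus_Van_Truck_Motorcycle_Bicycle')`.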

We have now successfully generated our custom dataset in the proper format, ready for training the YOLOv3 model with Darknet.

Finally, we mount our Google Drive and upload a zipped folder containing the dataset we just downloaded, so that we can retrieve it whenever we need it, even after the runtime session has ended.

In [None]:
from google.colab import drive
drive.mount('/content/gdrive',force_remount=True)

Mounted at /content/gdrive


In [None]:
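import os
import zipfile

zipname = 'obj'

def zipfolder(foldername, target_dir):
    """Zip the contents of target_dir into '<foldername>.zip' in the working directory."""
    # The context manager closes the archive, which writes its central directory.
    with zipfile.ZipFile(foldername + '.zip', 'w', zipfile.ZIP_DEFLATED) as zipobj:
        for base, dirs, files in os.walk(target_dir):
            for file in files:
                fn = os.path.join(base, file)
                # Store paths relative to target_dir rather than absolute paths.
                zipobj.write(fn, os.path.relpath(fn, target_dir))

zipfolder(zipname, '/content/OIDv4_ToolKit/OID/Dataset/train')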
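Before copying the archive to Drive, it can be worth confirming that it is readable and contains the expected files. The following is a minimal sketch using the standard `zipfile` module; the helper name `summarize_zip` is hypothetical.

```python
import zipfile

def summarize_zip(zip_path, sample_size=5):
    """Return the entry count, the first corrupt member (if any), and a few names."""
    with zipfile.ZipFile(zip_path) as zf:
        # testzip() reads every member and returns the first bad name, or None.
        first_bad_file = zf.testzip()
        names = zf.namelist()
    return {
        "entries": len(names),
        "first_bad_file": first_bad_file,
        "sample": names[:sample_size],
    }
```

Calling `summarize_zip('obj.zip')` should report `first_bad_file` as `None` and an entry count that matches the number of files in the dataset folder.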

In [None]:
%cp obj.zip /content/gdrive/My\ Drive/yolov3
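In a later session, once Drive is mounted again, the archive can be unpacked back into the runtime. A minimal sketch with the standard library follows; the helper name `restore_dataset` and the destination directory used in the example call are assumptions, not something the notebook defines.

```python
import os
import zipfile

def restore_dataset(zip_path, dest_dir):
    """Extract zip_path into dest_dir and return its top-level entries."""
    os.makedirs(dest_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
    return sorted(os.listdir(dest_dir))
```

For the archive copied above, the call might look like `restore_dataset('/content/gdrive/My Drive/yolov3/obj.zip', '/content/data')`, where `/content/data` is a hypothetical target directory.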

# Downloading the test dataset using the same method

In [None]:
!python main.py downloader --classes Car Bus Van Truck Motorcycle Bicycle --type_csv validation --limit 100 --multiclasses 1

		   ___   _____  ______            _    _    
		 .'   `.|_   _||_   _ `.         | |  | |   
		/  .-.  \ | |    | | `. \ _   __ | |__| |_  
		| |   | | | |    | |  | |[ \ [  ]|____   _| 
		\  `-'  /_| |_  _| |_.' / \ \/ /     _| |_  
		 `.___.'|_____||______.'   \__/     |_____|

             _____                    _                 _             
            (____ \                  | |               | |            
             _   \ \ ___  _ _ _ ____ | | ___   ____  _ | | ____  ____ 
            | |   | / _ \| | | |  _ \| |/ _ \ / _  |/ || |/ _  )/ ___)
            | |__/ / |_| | | | | | | | | |_| ( ( | ( (_| ( (/ /| |    
            |_____/ \___/ \____|_| |_|_|\___/ \_||_|\____|\____)_|    

    [INFO] | Downloading ['Car', 'Bus', 'Van', 'Truck', 'Motorcycle', 'Bicycle'] together.
   [ERROR] | Missing the class-descriptions-boxable.csv file.
[DOWNLOAD] | Do you want to down

In [None]:
!python convert_annotations.py

Currently in subdirectory: validation
Converting annotations for class:  Car_Bus_Van_Truck_Motorcycle_Bicycle
100% 552/552 [00:12<00:00, 42.62it/s]


In [None]:
!rm -r OID/Dataset/validation/Car_Bus_Van_Truck_Motorcycle_Bicycle/Label/

In [None]:
zipname = 'test'
zipfolder(zipname, '/content/OIDv4_ToolKit/OID/Dataset/validation')

In [None]:
%cp test.zip /content/gdrive/My\ Drive/yolov3