
Added -y option (Answer yes) and update README.md

keldrom committed Jun 18, 2019
1 parent 93954ca commit c22f26d367e997f748755c05fb7fdd1cdd356ed9
@@ -13,7 +13,7 @@ In particular, with this practical ToolKit written in Python3 we give you, for b
* download a single class or multiple classes with the desired [attributes](https://storage.googleapis.com/openimages/web/download.html)
* use the practical visualizer to inspect the downloaded classes

**(3.0) Image Classification**

* download any of the [19,794](https://storage.googleapis.com/openimages/web/download.html#attributes) classes in a common labeled folder
* exploit the many available commands to select only the desired images (e.g., only test images)
@@ -34,8 +34,8 @@ In these few lines are simply summarized some statistics and important tips.
<tr><td>Images</td><td>1,743,042</td><td>41,620 </td><td>125,436</td><td>-</td></tr>
<tr><td>Boxes</td><td>14,610,229</td><td>204,621</td><td>625,282</td><td>600</td></tr>
</table>
**Image Classification**

<table>
<tr><td></td><td><b>Train</b></td><td><b>Validation</b></td><td><b>Test</b></td><td><b>#Classes</b></td></tr>
@@ -179,11 +179,12 @@ The annotations of the dataset has been marked with a bunch of boolean values. T
- **IsInside**: Indicates a picture taken from the inside of the object (e.g., a car interior or inside of a building).
- **n_threads**: Select how many threads to use. The ToolKit downloads multiple images in parallel, considerably speeding up the downloading process.
- **limit**: Limit the number of images being downloaded. Useful if you want to restrict the size of your dataset.
- **y**: Automatically answer yes when asked to download missing CSV files.
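The `-y` behavior follows a simple auto-confirm pattern; a minimal sketch (the function name here is illustrative — in the ToolKit the real logic lives in `error_csv`):

```python
def confirm_download(filename, assume_yes=False):
    """Return True if the missing file should be downloaded.

    With -y/--yes the answer is forced to 'y'; otherwise the user
    is asked interactively, mirroring the ToolKit's error_csv prompt.
    """
    if assume_yes:
        ans = 'y'
        print("Automatic download of {}.".format(filename))
    else:
        ans = input("Do you want to download the missing file? [Y/n] ")
    return ans.lower() == 'y'
```

With `-y` on the command line no interactive input is needed, which makes the downloader usable in unattended scripts.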

Naturally, the ToolKit provides the same options as command-line parameters to filter the downloaded images.
For example, with:
```bash
python3 main.py downloader --classes Apple Orange --type_csv validation --image_IsGroupOf 0
python3 main.py downloader -y --classes Apple Orange --type_csv validation --image_IsGroupOf 0
```
only images without group annotations are downloaded.

@@ -193,24 +194,25 @@ The Toolkit is now able to access the huge dataset without bounding boxes
python3 main.py downloader_ill --sub m --classes Orange --type_csv train --limit 30
```
The previously explained options ```Dataset```, ```multiclasses```, ```n_threads``` and ```limit``` are also available.
The Toolkit will automatically place the dataset and the csv folder in dedicated folders renamed with a `_nl` suffix.
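The renaming convention can be sketched as follows (hypothetical helper, not part of the ToolKit):

```python
import os

def nl_variant(base_dir, name):
    """Return the image-level ('no labels') variant of a folder path,
    i.e. the same folder name with the `_nl` suffix appended."""
    return os.path.join(base_dir, name + "_nl")

# 'Dataset' -> 'Dataset_nl', 'csv_folder' -> 'csv_folder_nl'
```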
# Commands sum-up

| | downloader | visualizer | downloader_ill | |
|-------------:|:----------:|:----------:|:--------------:|--------------------------------------------------|
| Dataset | O | O | O | Dataset folder name |
| classes | R | | R | Considered classes |
| type_csv | R | | R | Train, test or validation dataset |
| multiclasses | O          |            | O              | Download classes together                        |
| noLabels | O | | | Don't create labels |
| IsOccluded | O | | | Consider or not this filter |
| IsTruncated | O | | | Consider or not this filter |
| IsGroupOf | O | | | Consider or not this filter |
| IsDepiction | O | | | Consider or not this filter |
| IsInside | O | | | Consider or not this filter |
| n_threads | O | | O | Indicates the maximum threads number |
| limit | O | | O | Max number of images to download |
| sub | | | R | Human-verified or Machine-generated images (h/m) |
| | downloader | visualizer | downloader_ill | |
|-------------------:|:----------:|:----------:|:--------------:|--------------------------------------------------|
| Dataset | O | O | O | Dataset folder name |
| classes | R | | R | Considered classes |
| type_csv | R | | R | Train, test or validation dataset |
| y                  | O          |            | O              | Answer yes when asked to download missing CSVs   |
| multiclasses       | O          |            | O              | Download classes together                        |
| noLabels | O | | | Don't create labels |
| Image_IsOccluded | O | | | Consider or not this filter |
| Image_IsTruncated | O | | | Consider or not this filter |
| Image_IsGroupOf | O | | | Consider or not this filter |
| Image_IsDepiction | O | | | Consider or not this filter |
| Image_IsInside | O | | | Consider or not this filter |
| n_threads          | O          |            | O              | Maximum number of threads                        |
| limit | O | | O | Max number of images to download |
| sub | | | R | Human-verified or Machine-generated images (h/m) |

R = required, O = optional
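Under the hood, the downloader builds one `aws s3 cp` command per image and, when `n_threads` is set, runs them through a thread pool. A simplified sketch of that pattern (helper names are illustrative; the command template matches the one used in the ToolKit's downloader):

```python
import os
from multiprocessing.dummy import Pool  # thread-based pool, as used by the ToolKit

def build_commands(image_ids, folder, download_dir):
    """Build one `aws s3 cp` command per image id."""
    template = ('aws s3 --no-sign-request --only-show-errors cp '
                's3://open-images-dataset/{}/{}.jpg "{}"')
    return [template.format(folder, img, download_dir) for img in image_ids]

def run_parallel(commands, n_threads=4):
    """Execute the commands with a pool of n_threads worker threads."""
    with Pool(n_threads) as pool:
        return list(pool.imap(os.system, commands))
```

Because the S3 bucket allows unsigned requests (`--no-sign-request`), no AWS credentials are needed.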

@@ -11,6 +11,7 @@
Licensed under the MIT License (see LICENSE for details)
------------------------------------------------------------
Usage:
refer to README.md file
"""
from sys import exit
from textwrap import dedent
@@ -33,4 +34,4 @@
if args.command == 'downloader_ill':
image_level(args, DEFAULT_OID_DIR)
else:
bounding_boxes_images(args, DEFAULT_OID_DIR)
Binary file not shown.
@@ -19,7 +19,7 @@ def bounding_boxes_images(args, DEFAULT_OID_DIR):
CLASSES_CSV = os.path.join(csv_dir, name_file_class)

if args.command == 'downloader':

logo(args.command)

if args.type_csv is None:
@@ -50,30 +50,30 @@ def bounding_boxes_images(args, DEFAULT_OID_DIR):
print(bc.INFO + 'Downloading {}.'.format(classes) + bc.ENDC)
class_name = classes

error_csv(name_file_class, csv_dir)
error_csv(name_file_class, csv_dir, args.yes)
df_classes = pd.read_csv(CLASSES_CSV, header=None)

class_code = df_classes.loc[df_classes[1] == class_name].values[0][0]

if args.type_csv == 'train':
name_file = file_list[0]
df_val = TTV(csv_dir, name_file)
df_val = TTV(csv_dir, name_file, args.yes)
if not args.n_threads:
download(args, df_val, folder[0], dataset_dir, class_name, class_code)
else:
download(args, df_val, folder[0], dataset_dir, class_name, class_code, threads = int(args.n_threads))

elif args.type_csv == 'validation':
name_file = file_list[1]
df_val = TTV(csv_dir, name_file)
df_val = TTV(csv_dir, name_file, args.yes)
if not args.n_threads:
download(args, df_val, folder[1], dataset_dir, class_name, class_code)
else:
download(args, df_val, folder[1], dataset_dir, class_name, class_code, threads = int(args.n_threads))

elif args.type_csv == 'test':
name_file = file_list[2]
df_val = TTV(csv_dir, name_file)
df_val = TTV(csv_dir, name_file, args.yes)
if not args.n_threads:
download(args, df_val, folder[2], dataset_dir, class_name, class_code)
else:
@@ -82,7 +82,7 @@ def bounding_boxes_images(args, DEFAULT_OID_DIR):
elif args.type_csv == 'all':
for i in range(3):
name_file = file_list[i]
df_val = TTV(csv_dir, name_file)
df_val = TTV(csv_dir, name_file, args.yes)
if not args.n_threads:
download(args, df_val, folder[i], dataset_dir, class_name, class_code)
else:
@@ -98,7 +98,7 @@ def bounding_boxes_images(args, DEFAULT_OID_DIR):
multiclass_name = ['_'.join(class_list)]
mkdirs(dataset_dir, csv_dir, multiclass_name, args.type_csv)

error_csv(name_file_class, csv_dir)
error_csv(name_file_class, csv_dir, args.yes)
df_classes = pd.read_csv(CLASSES_CSV, header=None)

class_dict = {}
@@ -109,23 +109,23 @@ def bounding_boxes_images(args, DEFAULT_OID_DIR):

if args.type_csv == 'train':
name_file = file_list[0]
df_val = TTV(csv_dir, name_file)
df_val = TTV(csv_dir, name_file, args.yes)
if not args.n_threads:
download(args, df_val, folder[0], dataset_dir, class_name, class_dict[class_name], class_list)
else:
download(args, df_val, folder[0], dataset_dir, class_name, class_dict[class_name], class_list, int(args.n_threads))

elif args.type_csv == 'validation':
name_file = file_list[1]
df_val = TTV(csv_dir, name_file)
df_val = TTV(csv_dir, name_file, args.yes)
if not args.n_threads:
download(args, df_val, folder[1], dataset_dir, class_name, class_dict[class_name], class_list)
else:
download(args, df_val, folder[1], dataset_dir, class_name, class_dict[class_name], class_list, int(args.n_threads))

elif args.type_csv == 'test':
name_file = file_list[2]
df_val = TTV(csv_dir, name_file)
df_val = TTV(csv_dir, name_file, args.yes)
if not args.n_threads:
download(args, df_val, folder[2], dataset_dir, class_name, class_dict[class_name], class_list)
else:
@@ -134,7 +134,7 @@ def bounding_boxes_images(args, DEFAULT_OID_DIR):
elif args.type_csv == 'all':
for i in range(3):
name_file = file_list[i]
df_val = TTV(csv_dir, name_file)
df_val = TTV(csv_dir, name_file, args.yes)
if not args.n_threads:
download(args, df_val, folder[i], dataset_dir, class_name, class_dict[class_name], class_list)
else:
@@ -155,7 +155,7 @@ def bounding_boxes_images(args, DEFAULT_OID_DIR):

if image_dir == 'exit':
exit(1)

class_image_dir = os.path.join(dataset_dir, image_dir)

print("Which class? <exit>")
@@ -8,7 +8,7 @@

OID_URL = 'https://storage.googleapis.com/openimages/2018_04/'

def TTV(csv_dir, name_file):
def TTV(csv_dir, name_file, args_y):
'''
Manage error_csv and read the correct .csv file.
@@ -17,11 +17,11 @@ def TTV(csv_dir, name_file):
:return: None
'''
CSV = os.path.join(csv_dir, name_file)
error_csv(name_file, csv_dir)
error_csv(name_file, csv_dir, args_y)
df_val = pd.read_csv(CSV)
return df_val

def error_csv(file, csv_dir):
def error_csv(file, csv_dir, args_y):
'''
Check the presence of the required .csv files.
@@ -31,7 +31,11 @@ def error_csv(file, csv_dir):
'''
if not os.path.isfile(os.path.join(csv_dir, file)):
print(bc.FAIL + "Missing the {} file.".format(os.path.basename(file)) + bc.ENDC)
ans = input(bc.OKBLUE + "Do you want to download the missing file? [Y/n] " + bc.ENDC)
if args_y:
ans = 'y'
print(bc.OKBLUE + "Automatic download." + bc.ENDC)
else:
ans = input(bc.OKBLUE + "Do you want to download the missing file? [Y/n] " + bc.ENDC)

if ans.lower() == 'y':
folder = str(os.path.basename(file)).split('-')[0]
@@ -76,4 +80,4 @@ def reporthook(count, block_size, total_size):
percent = int(count * block_size * 100 / (total_size + 1e-5))
sys.stdout.write("\r...%d%%, %d MB, %d KB/s, %d seconds passed" %
(percent, progress_size / (1024 * 1024), speed, duration))
sys.stdout.flush()
@@ -46,7 +46,7 @@ def download(args, df_val, folder, dataset_dir, class_name, class_code, class_li
class_name_list = '_'.join(class_list)
else:
class_name_list = class_name

download_img(folder, dataset_dir, class_name_list, images_list, threads)
if not args.sub:
get_label(folder, dataset_dir, class_name, class_code, df_val, class_name_list, args)
@@ -74,7 +74,7 @@ def download_img(folder, dataset_dir, class_name, images_list, threads):
commands = []
for image in images_list:
path = image_dir + '/' + str(image) + '.jpg ' + '"' + download_dir + '"'
command = 'aws s3 --no-sign-request --only-show-errors cp s3://open-images-dataset/' + path
commands.append(command)

list(tqdm(pool.imap(os.system, commands), total = len(commands) ))
@@ -35,7 +35,7 @@ def image_level(args, DEFAULT_OID_DIR):
'test-annotations-machine-imagelabels.csv']

if args.sub == 'h' or args.sub == 'm':

logo(args.command)

if args.type_csv is None:
@@ -64,30 +64,30 @@ def image_level(args, DEFAULT_OID_DIR):

class_name = classes

error_csv(name_file_class, csv_dir)
error_csv(name_file_class, csv_dir, args.yes)
df_classes = pd.read_csv(CLASSES_CSV, header=None)

class_code = df_classes.loc[df_classes[1] == class_name].values[0][0]

if args.type_csv == 'train':
name_file = file_list[0]
df_val = TTV(csv_dir, name_file)
df_val = TTV(csv_dir, name_file, args.yes)
if not args.n_threads:
download(args, df_val, folder[0], dataset_dir, class_name, class_code)
else:
download(args, df_val, folder[0], dataset_dir, class_name, class_code, threads = int(args.n_threads))

elif args.type_csv == 'validation':
name_file = file_list[1]
df_val = TTV(csv_dir, name_file)
df_val = TTV(csv_dir, name_file, args.yes)
if not args.n_threads:
download(args, df_val, folder[1], dataset_dir, class_name, class_code)
else:
download(args, df_val, folder[1], dataset_dir, class_name, class_code, threads = int(args.n_threads))

elif args.type_csv == 'test':
name_file = file_list[2]
df_val = TTV(csv_dir, name_file)
df_val = TTV(csv_dir, name_file, args.yes)
if not args.n_threads:
download(args, df_val, folder[2], dataset_dir, class_name, class_code)
else:
@@ -96,7 +96,7 @@ def image_level(args, DEFAULT_OID_DIR):
elif args.type_csv == 'all':
for i in range(3):
name_file = file_list[i]
df_val = TTV(csv_dir, name_file)
df_val = TTV(csv_dir, name_file, args.yes)
if not args.n_threads:
download(args, df_val, folder[i], dataset_dir, class_name, class_code)
else:
@@ -112,7 +112,7 @@ def image_level(args, DEFAULT_OID_DIR):
multiclass_name = ['_'.join(class_list)]
mkdirs(dataset_dir, csv_dir, multiclass_name, args.type_csv)

error_csv(name_file_class, csv_dir)
error_csv(name_file_class, csv_dir, args.yes)
df_classes = pd.read_csv(CLASSES_CSV, header=None)

class_dict = {}
@@ -123,23 +123,23 @@ def image_level(args, DEFAULT_OID_DIR):

if args.type_csv == 'train':
name_file = file_list[0]
df_val = TTV(csv_dir, name_file)
df_val = TTV(csv_dir, name_file, args.yes)
if not args.n_threads:
download(args, df_val, folder[0], dataset_dir, class_name, class_dict[class_name], class_list)
else:
download(args, df_val, folder[0], dataset_dir, class_name, class_dict[class_name], class_list, int(args.n_threads))

elif args.type_csv == 'validation':
name_file = file_list[1]
df_val = TTV(csv_dir, name_file)
df_val = TTV(csv_dir, name_file, args.yes)
if not args.n_threads:
download(args, df_val, folder[1], dataset_dir, class_name, class_dict[class_name], class_list)
else:
download(args, df_val, folder[1], dataset_dir, class_name, class_dict[class_name], class_list, int(args.n_threads))

elif args.type_csv == 'test':
name_file = file_list[2]
df_val = TTV(csv_dir, name_file)
df_val = TTV(csv_dir, name_file, args.yes)
if not args.n_threads:
download(args, df_val, folder[2], dataset_dir, class_name, class_dict[class_name], class_list)
else:
@@ -148,8 +148,8 @@ def image_level(args, DEFAULT_OID_DIR):
elif args.type_csv == 'all':
for i in range(3):
name_file = file_list[i]
df_val = TTV(csv_dir, name_file)
df_val = TTV(csv_dir, name_file, args.yes)
if not args.n_threads:
download(args, df_val, folder[i], dataset_dir, class_name, class_dict[class_name], class_list)
else:
download(args, df_val, folder[i], dataset_dir, class_name, class_dict[class_name], class_list, int(args.n_threads))
@@ -13,14 +13,17 @@ def parser_arguments():
parser.add_argument('--Dataset', required=False,
metavar="/path/to/OID/csv/",
help='Directory of the OID dataset folder')
parser.add_argument('-y', '--yes', required=False, action='store_true',
help='Answer yes when prompted to download missing files')
parser.add_argument('--classes', required=False, nargs='+',
metavar="list of classes",
help="Sequence of 'strings' of the wanted classes")
parser.add_argument('--type_csv', required=False, choices=['train', 'test', 'validation', 'all'],
metavar="'train' or 'validation' or 'test' or 'all'",
help='From what csv search the images')

parser.add_argument('--sub', required=False, choices=['h', 'm'],
metavar="Subset of human verified images or machine generated (h or m)",
help='Download from the human verified dataset or from the machine generated one.')

This file was deleted.
