# Scraping Google Images to create a multi-label dataset

The previous model, trained on images collected from open-source datasets, was limited by the dataset's size and its single label class. We'll run a scraping tool to download more images from Google Images for several classes of military vehicles.

## Defining labels

It's hard to find a unified taxonomy for military vehicles. We'll define broad class labels based on Wikipedia's [Military vehicles by type](https://en.wikipedia.org/wiki/Category:Military_vehicles_by_type) category. Vehicle model names can be found in this list of [modern armoured fighting vehicles](https://en.wikipedia.org/wiki/List_of_modern_armoured_fighting_vehicles).

- **Armoured fighting vehicle (AFV)** is an armed combat vehicle protected by armour, generally combining operational mobility with offensive and defensive capabilities. AFVs can be wheeled or tracked. Examples of AFVs are tanks, armoured cars, assault guns, self-propelled guns, infantry fighting vehicles (IFV), and armoured personnel carriers (APC).
- **Armoured personnel carrier (APC)** is a broad type of armoured military vehicle designed to transport personnel and equipment in combat zones.
- **Military engineering vehicle (MEV)** is a vehicle built for construction work or for the transportation of combat engineers on the battlefield.
- **Light armoured vehicle (LAV), including reconnaissance vehicles (RV),** is the lightest weight class of military vehicle: a Jeep-like four-wheel-drive vehicle for military use, with light or no armour. A **reconnaissance vehicle (RV)** is a military vehicle used for forward reconnaissance. Both tracked and wheeled reconnaissance vehicles are in service.

Based on these categories, we can define some search terms.

In [1]:
AFV = [
    "AFV Lynx",
    "Boxer AFV",
    "ZTZ-99",
    "ZTZ-96",
    "VT-4",
    "ZBD-04",
    "Leclerc tank",
    "AMX 10 RC",
    "Leopard tank",
    "T-90",
    "T-72",
    "challenger tank",
    "M1 abrams",
]
APC = [
    "AMX-10P",
    "VAB",
    "LAV III",
    "Berliet VXB",
    "Panhard VCR",
    "Didgori-3",
    "M113 APC",
    "AMPV",
    "VBTP-MR Guarani",
    "BTR-40",
    "BTR-60",
    "BTR-80",
    "TPZ Fuchs",
    "Bison APC",
    "ZBL-08",
    "fv103 spartan",
    "MRAP",
]
MEV = [
    "Engin blindé du génie",
    "ebg vulcain",
    "kodiak wisent armoured vehicle",
    "m728 cev",
    "terrier armoured vehicle",
    "imr-2 armoured vehicle",
]
LAV = [
    "LAV-25",
    "Iveco VM 90",
    "Panhard VBL",
    "Panhard AML",
    "Panhard ERC",
    "Humvee",
    "FV601 Saladin",
    "AMX-10 RC",
    "RG-32 Scout",
    "fv101 scorpion",
    "fv107 scimitar",
]
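
Since the four lists feed a single labelling scheme, it can help to gather them in one mapping and check for terms that appear under more than one label. The sketch below is an illustration, not part of the original notebook: `SEARCH_TERMS`, `normalize`, and `find_shared_terms` are hypothetical names, and the lists are shortened to a few entries from the cell above.

```python
import re
from collections import defaultdict

# Shortened copies of the lists above (assumption for brevity); in the
# notebook you would reuse the full AFV, APC, MEV and LAV lists instead.
SEARCH_TERMS = {
    "AFV": ["AFV Lynx", "AMX 10 RC", "Leclerc tank"],
    "APC": ["AMX-10P", "VAB", "LAV III"],
    "MEV": ["ebg vulcain", "m728 cev"],
    "LAV": ["LAV-25", "AMX-10 RC", "Humvee"],
}


def normalize(term: str) -> str:
    """Lower-case a term and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", term.lower())


def find_shared_terms(terms_by_label):
    """Return {normalized term: sorted labels} for terms listed under several labels."""
    labels_by_term = defaultdict(set)
    for label, terms in terms_by_label.items():
        for term in terms:
            labels_by_term[normalize(term)].add(label)
    return {t: sorted(ls) for t, ls in labels_by_term.items() if len(ls) > 1}


shared = find_shared_terms(SEARCH_TERMS)
print(shared)
```

In the full lists, "AMX 10 RC" (AFV) and "AMX-10 RC" (LAV) normalize to the same key, so images of that vehicle would land in both `google/AFV` and `google/LAV` if both labels are scraped.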

## Downloading images from Google

Once we've defined our labels and search terms, we can download images from Google for each category. We'll create our dataset by downloading up to 50 images for each search term.

In [2]:
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)

In [3]:
MAX_IMAGES_PER_TERM = 50
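
As a rough upper bound on dataset size, the number of search terms per label can be multiplied by this cap. The counts below are taken from the lists defined earlier (13 AFV, 17 APC, 6 MEV and 11 LAV terms); `TERM_COUNTS` and `max_images_by_label` are hypothetical names, and the real totals will be lower whenever a search yields fewer usable images.

```python
MAX_IMAGES_PER_TERM = 50

# Number of search terms per label, counted from the lists defined above.
TERM_COUNTS = {"AFV": 13, "APC": 17, "MEV": 6, "LAV": 11}


def max_images_by_label(term_counts, per_term):
    """Upper bound on the number of downloaded images for each label."""
    return {label: n * per_term for label, n in term_counts.items()}


budget = max_images_by_label(TERM_COUNTS, MAX_IMAGES_PER_TERM)
total = sum(budget.values())
print(budget)
print(total)
```

The MEV label has the fewest search terms, so its ceiling is 300 images against 850 for APC; this imbalance may be worth keeping in mind when training.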

In [4]:
from adomvi.scraper.google import GoogleImageScraper
from pathlib import Path

def worker_thread(klass, search_term):
    save_dir = Path(f"google/{klass}")
    scraper = GoogleImageScraper(
        save_dir,
        search_term,
        max_images=MAX_IMAGES_PER_TERM,
        min_resolution=(400, 300),
        max_resolution=(2048, 2048),
    )
    images = scraper.get_image_urls()
    scraper.save_images(images)
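
The `min_resolution` and `max_resolution` arguments suggest that images outside a size range are skipped. How `GoogleImageScraper` applies them is not shown here, so the following is only a sketch of one plausible check, assuming both bounds are inclusive `(width, height)` pairs. `within_resolution` is a hypothetical helper, not part of `adomvi`.

```python
from typing import Tuple

Size = Tuple[int, int]  # (width, height) in pixels


def within_resolution(size: Size, min_res: Size, max_res: Size) -> bool:
    """Return True if both dimensions fall inside the inclusive bounds."""
    width, height = size
    return (
        min_res[0] <= width <= max_res[0]
        and min_res[1] <= height <= max_res[1]
    )


# Same bounds as in worker_thread above.
MIN_RES, MAX_RES = (400, 300), (2048, 2048)

print(within_resolution((1024, 768), MIN_RES, MAX_RES))
print(within_resolution((320, 240), MIN_RES, MAX_RES))
print(within_resolution((4000, 3000), MIN_RES, MAX_RES))
```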

In [5]:
from concurrent.futures import ThreadPoolExecutor
from itertools import repeat

with ThreadPoolExecutor(max_workers=2) as executor:
    executor.map(worker_thread, repeat("AFV"), AFV)
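
`executor.map` pairs the n-th element of each iterable, so `repeat("AFV")` supplies the same label to every call while `AFV` supplies the search terms. The cell above covers only the AFV label. A minimal sketch of extending the same pattern to several labels follows, with a stand-in `fake_worker` in place of `worker_thread` so that it runs without a browser; `TERMS_BY_LABEL` and `fake_worker` are assumptions made for illustration. Consuming the results also re-raises any exception from a worker, which otherwise goes unnoticed when the iterator returned by `map` is never read.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import repeat

# Shortened, illustrative mapping; the notebook's full lists would go here.
TERMS_BY_LABEL = {
    "AFV": ["AFV Lynx", "Boxer AFV"],
    "APC": ["VAB", "M113 APC"],
}


def fake_worker(klass, search_term):
    """Stand-in for worker_thread: returns where images would be saved."""
    return f"google/{klass}", search_term


results = []
with ThreadPoolExecutor(max_workers=2) as executor:
    for klass, terms in TERMS_BY_LABEL.items():
        # repeat(klass) yields the label once per search term.
        # extend() consumes the iterator, surfacing worker exceptions.
        results.extend(executor.map(fake_worker, repeat(klass), terms))

print(results)
```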

INFO:adomvi.scraper.google:Chrome web driver initialized. Page title for https://www.google.com: Google
INFO:adomvi.scraper.google:Chrome web driver initialized. Page title for https://www.google.com: Google
INFO:adomvi.scraper.google:Seaching images for LAV-25
INFO:adomvi.scraper.google:Seaching images for AFV Lynx
INFO:adomvi.scraper.google:Clicking Images search button
INFO:adomvi.scraper.google:Clicking Images search button
INFO:adomvi.scraper.google:Fetching thumbnails.
INFO:adomvi.scraper.google:Found 48 thumbnails (48 new).


  0%|                                                                                                                | 0/48 [00:00<?, ?it/s]

INFO:adomvi.scraper.google:Fetching thumbnails.
INFO:adomvi.scraper.google:Found 48 thumbnails (48 new).



  2%|██▏                                                                                                     | 1/48 [00:00<00:29,  1.60it/s][A
  4%|████▎                                                                                                   | 2/48 [00:01<00:27,  1.67it/s][A
  6%|██████▌                                                                                                 | 3/48 [00:01<00:26,  1.69it/s][A
  8%|████████▋                                                                                               | 4/48 [00:02<00:26,  1.66it/s][A
 10%|██████████▊                                                                                             | 5/48 [00:02<00:25,  1.68it/s][A
 12%|█████████████                                                                                           | 6/48 [00:03<00:24,  1.69it/s][A
 15%|███████████████▏                                                                                        | 7/48 [00:04<00:24,  1.69

INFO:adomvi.scraper.google:Saving images to disk.



  0%|                                                                                                                | 0/10 [00:00<?, ?it/s]
 30%|███████████████████████████████▏                                                                        | 3/10 [00:00<00:01,  4.00it/s][A
 50%|████████████████████████████████████████████████████                                                    | 5/10 [00:01<00:01,  3.70it/s][A
 70%|████████████████████████████████████████████████████████████████████████▊                               | 7/10 [00:01<00:00,  3.63it/s][A
 80%|███████████████████████████████████████████████████████████████████████████████████▏                    | 8/10 [00:02<00:00,  2.80it/s][A
 90%|█████████████████████████████████████████████████████████████████████████████████████████████▌          | 9/10 [00:02<00:00,  2.79it/s][A
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:03<00:00,  2.85it/

INFO:adomvi.scraper.google:Writing metadata file.




 35%|████████████████████████████████████▍                                                                  | 17/48 [00:10<00:18,  1.67it/s][A
 38%|██████████████████████████████████████▋                                                                | 18/48 [00:10<00:17,  1.67it/s][A
 40%|████████████████████████████████████████▊                                                              | 19/48 [00:11<00:17,  1.68it/s][A

INFO:adomvi.scraper.google:Chrome web driver initialized. Page title for https://www.google.com: Google
INFO:adomvi.scraper.google:Seaching images for Boxer AFV



 42%|██████████████████████████████████████████▉                                                            | 20/48 [00:11<00:17,  1.65it/s][A

INFO:adomvi.scraper.google:Clicking Images search button



 44%|█████████████████████████████████████████████                                                          | 21/48 [00:12<00:16,  1.68it/s][A
 46%|███████████████████████████████████████████████▏                                                       | 22/48 [00:13<00:15,  1.70it/s][A
 48%|█████████████████████████████████████████████████▎                                                     | 23/48 [00:13<00:14,  1.72it/s][A

INFO:adomvi.scraper.google:Fetching thumbnails.
INFO:adomvi.scraper.google:Found 48 thumbnails (48 new).


  0%|                                                                                                                | 0/48 [00:00<?, ?it/s]
  2%|██▏                                                                                                     | 1/48 [00:00<00:29,  1.60it/s][A
  4%|████▎                                                                                                   | 2/48 [00:01<00:27,  1.68it/s][A
  6%|██████▌                                                                                                 | 3/48 [00:01<00:26,  1.67it/s][A
  8%|████████▋                                                                                               | 4/48 [00:02<00:25,  1.70it/s][A
 10%|██████████▊                                                                                             | 5/48 [00:02<00:25,  1.68it/s][A
 12%|█████████████                                                                                           | 6/48 [00:03<00:24,  1.70it/s

INFO:adomvi.scraper.google:Saving images to disk.




  0%|                                                                                                                | 0/10 [00:00<?, ?it/s][A
 19%|███████████████████▌                                                                                    | 9/48 [00:05<00:22,  1.73it/s][A
 20%|████████████████████▊                                                                                   | 2/10 [00:00<00:01,  7.69it/s][A
 30%|███████████████████████████████▏                                                                        | 3/10 [00:00<00:00,  8.45it/s][A
 40%|█████████████████████████████████████████▌                                                              | 4/10 [00:00<00:00,  8.61it/s][A
 21%|█████████████████████▍                                                                                 | 10/48 [00:05<00:21,  1.73it/s][A
 70%|████████████████████████████████████████████████████████████████████████▊                               | 7/10 [00:01<00:00,  6.8

INFO:adomvi.scraper.google:Writing metadata file.



 29%|██████████████████████████████                                                                         | 14/48 [00:08<00:20,  1.69it/s]

INFO:adomvi.scraper.google:Chrome web driver initialized. Page title for https://www.google.com: Google
INFO:adomvi.scraper.google:Seaching images for LAV III


 31%|████████████████████████████████▏                                                                      | 15/48 [00:08<00:19,  1.70it/s]

INFO:adomvi.scraper.google:Clicking Images search button


 38%|██████████████████████████████████████▋                                                                | 18/48 [00:10<00:17,  1.71it/s]

INFO:adomvi.scraper.google:Fetching thumbnails.
INFO:adomvi.scraper.google:Found 48 thumbnails (48 new).



 40%|████████████████████████████████████████▊                                                              | 19/48 [00:11<00:17,  1.70it/s][A
 42%|██████████████████████████████████████████▉                                                            | 20/48 [00:11<00:16,  1.69it/s][A
 42%|██████████████████████████████████████████▉                                                            | 20/48 [00:12<00:17,  1.62it/s][A

INFO:adomvi.scraper.google:Saving images to disk.



  0%|                                                                                                                | 0/10 [00:00<?, ?it/s]
 40%|█████████████████████████████████████████▌                                                              | 4/10 [00:00<00:00,  6.76it/s][A
 50%|████████████████████████████████████████████████████                                                    | 5/10 [00:01<00:01,  3.70it/s][A
 70%|████████████████████████████████████████████████████████████████████████▊                               | 7/10 [00:01<00:00,  3.45it/s][A
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:02<00:00,  4.30it/s][A

INFO:adomvi.scraper.google:Writing metadata file.




 15%|███████████████▏                                                                                        | 7/48 [00:04<00:24,  1.69it/s][A
 17%|█████████████████▎                                                                                      | 8/48 [00:04<00:23,  1.68it/s][A
 19%|███████████████████▌                                                                                    | 9/48 [00:05<00:23,  1.68it/s][A

INFO:adomvi.scraper.google:Chrome web driver initialized. Page title for https://www.google.com: Google
INFO:adomvi.scraper.google:Seaching images for ZTZ-99



 21%|█████████████████████▍                                                                                 | 10/48 [00:06<00:22,  1.66it/s][A
 23%|███████████████████████▌                                                                               | 11/48 [00:06<00:22,  1.68it/s][A

INFO:adomvi.scraper.google:Clicking Images search button



 25%|█████████████████████████▊                                                                             | 12/48 [00:07<00:21,  1.65it/s][A

INFO:adomvi.scraper.google:Fetching thumbnails.
INFO:adomvi.scraper.google:Found 48 thumbnails (48 new).


  0%|                                                                                                                | 0/48 [00:00<?, ?it/s]
 27%|███████████████████████████▉                                                                           | 13/48 [00:07<00:21,  1.66it/s][A
  2%|██▏                                                                                                     | 1/48 [00:00<00:29,  1.58it/s][A
  4%|████▎                                                                                                   | 2/48 [00:01<00:28,  1.59it/s][A
  6%|██████▌                                                                                                 | 3/48 [00:01<00:27,  1.63it/s][A
  8%|████████▋                                                                                               | 4/48 [00:02<00:26,  1.64it/s][A
 10%|██████████▊                                                                                             | 5/48 [00:03<00:26,  1.65it/s

INFO:adomvi.scraper.google:Saving images to disk.




  0%|                                                                                                                | 0/10 [00:00<?, ?it/s][A
 54%|███████████████████████████████████████████████████████▊                                               | 26/48 [00:16<00:13,  1.62it/s][A

INFO:adomvi.scraper.google:Saving images to disk.



 10%|██████████▍                                                                                             | 1/10 [00:00<00:02,  4.25it/s]
 30%|███████████████████████████████▏                                                                        | 3/10 [00:00<00:01,  3.58it/s][A
 30%|███████████████████████████████▏                                                                        | 3/10 [00:00<00:01,  4.76it/s][A
 40%|█████████████████████████████████████████▌                                                              | 4/10 [00:00<00:01,  4.35it/s][A
 60%|██████████████████████████████████████████████████████████████▍                                         | 6/10 [00:01<00:01,  3.71it/s][A
 70%|████████████████████████████████████████████████████████████████████████▊                               | 7/10 [00:01<00:00,  3.90it/s][A
 80%|███████████████████████████████████████████████████████████████████████████████████▏                    | 8/10 [00:02<00:00,  2.21it/

INFO:adomvi.scraper.google:Writing metadata file.



100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:03<00:00,  2.67it/s]

INFO:adomvi.scraper.google:Writing metadata file.





INFO:adomvi.scraper.google:Chrome web driver initialized. Page title for https://www.google.com: Google
INFO:adomvi.scraper.google:Seaching images for ZTZ-96
INFO:adomvi.scraper.google:Clicking Images search button
INFO:adomvi.scraper.google:Chrome web driver initialized. Page title for https://www.google.com: Google
INFO:adomvi.scraper.google:Seaching images for VT-4
INFO:adomvi.scraper.google:Fetching thumbnails.
INFO:adomvi.scraper.google:Found 48 thumbnails (48 new).


  4%|████▎                                                                                                   | 2/48 [00:01<00:28,  1.63it/s]

INFO:adomvi.scraper.google:Clicking Images search button


  8%|████████▋                                                                                               | 4/48 [00:02<00:25,  1.71it/s]

INFO:adomvi.scraper.google:Fetching thumbnails.
INFO:adomvi.scraper.google:Found 48 thumbnails (48 new).



 10%|██████████▊                                                                                             | 5/48 [00:02<00:25,  1.72it/s][A
 12%|█████████████                                                                                           | 6/48 [00:03<00:24,  1.68it/s][A
 15%|███████████████▏                                                                                        | 7/48 [00:04<00:24,  1.69it/s][A
 17%|█████████████████▎                                                                                      | 8/48 [00:04<00:23,  1.70it/s][A
 19%|███████████████████▌                                                                                    | 9/48 [00:05<00:22,  1.70it/s][A
 21%|█████████████████████▍                                                                                 | 10/48 [00:05<00:22,  1.70it/s][A
 23%|███████████████████████▌                                                                               | 11/48 [00:06<00:22,  1.66

INFO:adomvi.scraper.google:Saving images to disk.



 10%|██████████▍                                                                                             | 1/10 [00:00<00:01,  5.65it/s]
 50%|████████████████████████████████████████████████████                                                    | 5/10 [00:00<00:00,  5.82it/s][A
 70%|████████████████████████████████████████████████████████████████████████▊                               | 7/10 [00:01<00:00,  5.75it/s][A
 90%|█████████████████████████████████████████████████████████████████████████████████████████████▌          | 9/10 [00:01<00:00,  4.66it/s][A
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:02<00:00,  4.60it/s][A

INFO:adomvi.scraper.google:Writing metadata file.




 50%|███████████████████████████████████████████████████▌                                                   | 24/48 [00:14<00:14,  1.66it/s][A
 52%|█████████████████████████████████████████████████████▋                                                 | 25/48 [00:14<00:13,  1.66it/s][A

INFO:adomvi.scraper.google:Chrome web driver initialized. Page title for https://www.google.com: Google
INFO:adomvi.scraper.google:Seaching images for ZBD-04


 52%|█████████████████████████████████████████████████████▋                                                 | 25/48 [00:15<00:14,  1.62it/s]

INFO:adomvi.scraper.google:Saving images to disk.



 10%|██████████▍                                                                                             | 1/10 [00:00<00:02,  4.00it/s]

INFO:adomvi.scraper.google:Clicking Images search button


 60%|██████████████████████████████████████████████████████████████▍                                         | 6/10 [00:01<00:01,  3.31it/s]

INFO:adomvi.scraper.google:Fetching thumbnails.
INFO:adomvi.scraper.google:Found 50 thumbnails (50 new).



  0%|                                                                                                                | 0/50 [00:00<?, ?it/s][A

Traceback (most recent call last):
  File "/Users/jonas/workspace/adomvi/adomvi/scraper/google.py", line 195, in save_images
    with Image.open(
  File "/Users/jonas/workspace/adomvi/.venv/lib/python3.10/site-packages/PIL/Image.py", line 3280, in open
    raise UnidentifiedImageError(msg)
PIL.UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x106960ea0>


 70%|████████████████████████████████████████████████████████████████████████▊                               | 7/10 [00:02<00:00,  3.07it/s]
 80%|███████████████████████████████████████████████████████████████████████████████████▏                    | 8/10 [00:02<00:00,  3.24it/s][A
  4%|████▏                                                                                                   | 2/50 [00:01<00:28,  1.67it/s][A
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:03<00:00,  2.67it/s][A

INFO:adomvi.scraper.google:Writing metadata file.




  8%|████████▎                                                                                               | 4/50 [00:02<00:27,  1.64it/s][A
 10%|██████████▍                                                                                             | 5/50 [00:03<00:27,  1.66it/s][A

INFO:adomvi.scraper.google:Chrome web driver initialized. Page title for https://www.google.com: Google
INFO:adomvi.scraper.google:Seaching images for Leclerc tank



 12%|████████████▍                                                                                           | 6/50 [00:03<00:26,  1.68it/s][A

INFO:adomvi.scraper.google:Clicking Images search button



 14%|██████████████▌                                                                                         | 7/50 [00:04<00:25,  1.67it/s][A
 16%|████████████████▋                                                                                       | 8/50 [00:04<00:25,  1.67it/s][A
 18%|██████████████████▋                                                                                     | 9/50 [00:05<00:24,  1.68it/s][A

INFO:adomvi.scraper.google:Fetching thumbnails.
INFO:adomvi.scraper.google:Found 48 thumbnails (48 new).


  0%|                                                                                                                | 0/48 [00:00<?, ?it/s]
  2%|██▏                                                                                                     | 1/48 [00:00<00:28,  1.65it/s][A
  4%|████▎                                                                                                   | 2/48 [00:01<00:28,  1.64it/s][A
  6%|██████▌                                                                                                 | 3/48 [00:01<00:26,  1.68it/s][A
  8%|████████▋                                                                                               | 4/48 [00:02<00:25,  1.71it/s][A
 10%|██████████▊                                                                                             | 5/48 [00:02<00:25,  1.71it/s][A
 12%|█████████████                                                                                           | 6/48 [00:03<00:24,  1.70it/s

INFO:adomvi.scraper.google:Saving images to disk.




 42%|██████████████████████████████████████████▉                                                            | 20/48 [00:11<00:16,  1.65it/s][A
 10%|██████████▍                                                                                             | 1/10 [00:00<00:02,  3.31it/s][A
 44%|█████████████████████████████████████████████                                                          | 21/48 [00:12<00:16,  1.65it/s][A
 30%|███████████████████████████████▏                                                                        | 3/10 [00:00<00:01,  3.97it/s][A
 46%|███████████████████████████████████████████████▏                                                       | 22/48 [00:12<00:15,  1.66it/s][A
 48%|█████████████████████████████████████████████████▎                                                     | 23/48 [00:13<00:14,  1.67it/s][A
 70%|████████████████████████████████████████████████████████████████████████▊                               | 7/10 [00:01<00:00,  3.1

INFO:adomvi.scraper.google:Writing metadata file.



 54%|███████████████████████████████████████████████████████▊                                               | 26/48 [00:15<00:12,  1.71it/s]

INFO:adomvi.scraper.google:Chrome web driver initialized. Page title for https://www.google.com: Google
INFO:adomvi.scraper.google:Seaching images for AMX 10 RC


 58%|████████████████████████████████████████████████████████████                                           | 28/48 [00:16<00:11,  1.71it/s]

INFO:adomvi.scraper.google:Clicking Images search button


 62%|████████████████████████████████████████████████████████████████▍                                      | 30/48 [00:17<00:10,  1.67it/s]

INFO:adomvi.scraper.google:Fetching thumbnails.
INFO:adomvi.scraper.google:Found 48 thumbnails (48 new).



 65%|██████████████████████████████████████████████████████████████████▌                                    | 31/48 [00:18<00:10,  1.68it/s][A
 67%|████████████████████████████████████████████████████████████████████▋                                  | 32/48 [00:18<00:09,  1.69it/s][A
 69%|██████████████████████████████████████████████████████████████████████▊                                | 33/48 [00:19<00:08,  1.71it/s][A
 69%|██████████████████████████████████████████████████████████████████████▊                                | 33/48 [00:20<00:09,  1.65it/s][A

INFO:adomvi.scraper.google:Saving images to disk.



 10%|██████████▍                                                                                             | 1/10 [00:00<00:02,  3.60it/s]
 20%|████████████████████▊                                                                                   | 2/10 [00:00<00:03,  2.10it/s][A
 30%|███████████████████████████████▏                                                                        | 3/10 [00:01<00:03,  2.11it/s][A
 60%|██████████████████████████████████████████████████████████████▍                                         | 6/10 [00:02<00:01,  3.10it/s][A
 70%|████████████████████████████████████████████████████████████████████████▊                               | 7/10 [00:02<00:00,  3.06it/s][A
 90%|█████████████████████████████████████████████████████████████████████████████████████████████▌          | 9/10 [00:03<00:00,  2.90it/s][A
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:03<00:00,  2.61it/

INFO:adomvi.scraper.google:Writing metadata file.




 21%|█████████████████████▍                                                                                 | 10/48 [00:06<00:22,  1.65it/s][A
 23%|███████████████████████▌                                                                               | 11/48 [00:06<00:22,  1.68it/s][A

INFO:adomvi.scraper.google:Chrome web driver initialized. Page title for https://www.google.com: Google
INFO:adomvi.scraper.google:Seaching images for Leopard tank



 25%|█████████████████████████▊                                                                             | 12/48 [00:07<00:21,  1.70it/s][A
 27%|███████████████████████████▉                                                                           | 13/48 [00:07<00:20,  1.69it/s][A
 29%|██████████████████████████████                                                                         | 14/48 [00:08<00:20,  1.69it/s][A

INFO:adomvi.scraper.google:Clicking Images search button



 31%|████████████████████████████████▏                                                                      | 15/48 [00:09<00:19,  1.69it/s][A
 33%|██████████████████████████████████▎                                                                    | 16/48 [00:09<00:18,  1.70it/s][A

INFO:adomvi.scraper.google:Fetching thumbnails.
INFO:adomvi.scraper.google:Found 48 thumbnails (48 new).


  0%|                                                                                                                | 0/48 [00:00<?, ?it/s]
  2%|██▏                                                                                                     | 1/48 [00:00<00:28,  1.65it/s][A
  4%|████▎                                                                                                   | 2/48 [00:01<00:27,  1.67it/s][A
  6%|██████▌                                                                                                 | 3/48 [00:01<00:27,  1.66it/s][A
  8%|████████▋                                                                                               | 4/48 [00:02<00:26,  1.69it/s][A
 10%|██████████▊                                                                                             | 5/48 [00:02<00:25,  1.70it/s][A
 12%|█████████████                                                                                           | 6/48 [00:03<00:24,  1.70it/s

INFO:adomvi.scraper.google:Saving images to disk.




  0%|                                                                                                                | 0/10 [00:00<?, ?it/s][A
 17%|█████████████████▎                                                                                      | 8/48 [00:04<00:23,  1.71it/s][A
 20%|████████████████████▊                                                                                   | 2/10 [00:00<00:02,  3.80it/s][A
 21%|█████████████████████▍                                                                                 | 10/48 [00:05<00:22,  1.68it/s][A
 40%|█████████████████████████████████████████▌                                                              | 4/10 [00:01<00:02,  2.37it/s][A
 23%|███████████████████████▌                                                                               | 11/48 [00:06<00:21,  1.69it/s][A
 60%|██████████████████████████████████████████████████████████████▍                                         | 6/10 [00:02<00:01,  2.7

INFO:adomvi.scraper.google:Writing metadata file.



 60%|██████████████████████████████████████████████████████████████▏                                        | 29/48 [00:17<00:11,  1.65it/s]

INFO:adomvi.scraper.google:Saving images to disk.



  0%|                                                                                                                | 0/10 [00:00<?, ?it/s]

Traceback (most recent call last):
  File "/Users/jonas/workspace/adomvi/adomvi/scraper/google.py", line 195, in save_images
    with Image.open(
  File "/Users/jonas/workspace/adomvi/.venv/lib/python3.10/site-packages/PIL/Image.py", line 3280, in open
    raise UnidentifiedImageError(msg)
PIL.UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x1057e31f0>


 20%|████████████████████▊                                                                                   | 2/10 [00:00<00:02,  3.59it/s]

Traceback (most recent call last):
  File "/Users/jonas/workspace/adomvi/adomvi/scraper/google.py", line 195, in save_images
    with Image.open(
  File "/Users/jonas/workspace/adomvi/.venv/lib/python3.10/site-packages/PIL/Image.py", line 3280, in open
    raise UnidentifiedImageError(msg)
PIL.UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x10579c310>


 50%|████████████████████████████████████████████████████                                                    | 5/10 [00:02<00:02,  1.90it/s]

Traceback (most recent call last):
  File "/Users/jonas/workspace/adomvi/adomvi/scraper/google.py", line 195, in save_images
    with Image.open(
  File "/Users/jonas/workspace/adomvi/.venv/lib/python3.10/site-packages/PIL/Image.py", line 3280, in open
    raise UnidentifiedImageError(msg)
PIL.UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x105802cf0>


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:03<00:00,  3.11it/s]

INFO:adomvi.scraper.google:Writing metadata file.





INFO:adomvi.scraper.google:Chrome web driver initialized. Page title for https://www.google.com: Google
INFO:adomvi.scraper.google:Seaching images for T-72
INFO:adomvi.scraper.google:Chrome web driver initialized. Page title for https://www.google.com: Google
INFO:adomvi.scraper.google:Seaching images for T-90
INFO:adomvi.scraper.google:Clicking Images search button
INFO:adomvi.scraper.google:Clicking Images search button
INFO:adomvi.scraper.google:Fetching thumbnails.
INFO:adomvi.scraper.google:Found 48 thumbnails (48 new).


  0%|                                                                                                                | 0/48 [00:00<?, ?it/s]

INFO:adomvi.scraper.google:Fetching thumbnails.
INFO:adomvi.scraper.google:Found 48 thumbnails (48 new).




INFO:adomvi.scraper.google:Saving images to disk.





INFO:adomvi.scraper.google:Writing metadata file.



 62%|████████████████████████████████████████████████████████████████▍                                      | 30/48 [00:18<00:11,  1.62it/s]

INFO:adomvi.scraper.google:Saving images to disk.



 10%|██████████▍                                                                                             | 1/10 [00:00<00:02,  3.06it/s]

INFO:adomvi.scraper.google:Chrome web driver initialized. Page title for https://www.google.com: Google
INFO:adomvi.scraper.google:Seaching images for challenger tank


 40%|█████████████████████████████████████████▌                                                              | 4/10 [00:01<00:02,  2.56it/s]

INFO:adomvi.scraper.google:Clicking Images search button


 70%|████████████████████████████████████████████████████████████████████████▊                               | 7/10 [00:03<00:01,  2.10it/s]

INFO:adomvi.scraper.google:Fetching thumbnails.
INFO:adomvi.scraper.google:Found 48 thumbnails (48 new).



 90%|█████████████████████████████████████████████████████████████████████████████████████████████▌          | 9/10 [00:03<00:00,  2.85it/s][A
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:04<00:00,  2.40it/s][A

INFO:adomvi.scraper.google:Writing metadata file.




  4%|████▎                                                                                                   | 2/48 [00:01<00:28,  1.62it/s][A
  6%|██████▌                                                                                                 | 3/48 [00:01<00:27,  1.64it/s][A
  8%|████████▋                                                                                               | 4/48 [00:02<00:26,  1.67it/s][A

INFO:adomvi.scraper.google:Chrome web driver initialized. Page title for https://www.google.com: Google
INFO:adomvi.scraper.google:Seaching images for M1 abrams



 10%|██████████▊                                                                                             | 5/48 [00:03<00:25,  1.68it/s][A

INFO:adomvi.scraper.google:Clicking Images search button



 12%|█████████████                                                                                           | 6/48 [00:03<00:25,  1.66it/s][A
 15%|███████████████▏                                                                                        | 7/48 [00:04<00:24,  1.67it/s][A
 17%|█████████████████▎                                                                                      | 8/48 [00:04<00:23,  1.69it/s][A

INFO:adomvi.scraper.google:Fetching thumbnails.
INFO:adomvi.scraper.google:Found 48 thumbnails (48 new).


  0%|                                                                                                                | 0/48 [00:00<?, ?it/s]

INFO:adomvi.scraper.google:Saving images to disk.





INFO:adomvi.scraper.google:Writing metadata file.



 48%|█████████████████████████████████████████████████▎                                                     | 23/48 [00:14<00:15,  1.61it/s]

INFO:adomvi.scraper.google:Saving images to disk.



100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:03<00:00,  2.66it/s]

INFO:adomvi.scraper.google:Writing metadata file.
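
The `PIL.UnidentifiedImageError` tracebacks in the log above are raised when a downloaded payload is not a decodable image (for instance an HTML error page or an unsupported format). The scraper continues with the remaining downloads, so these errors can be ignored. As an illustration only, here is a minimal sketch of how such payloads could be filtered out; `decode_image` is a hypothetical helper and not the code in `adomvi/scraper/google.py`.

```python
import io

from PIL import Image, UnidentifiedImageError


def decode_image(payload: bytes):
    """Return an RGB PIL image, or None if the bytes cannot be decoded.

    Hypothetical helper for illustration; not part of the adomvi package.
    """
    try:
        with Image.open(io.BytesIO(payload)) as img:
            # convert() loads the pixel data and returns a new image, so the
            # result remains usable after the context manager closes `img`
            return img.convert("RGB")
    except (UnidentifiedImageError, OSError):
        # Not an image, or a truncated/corrupt one: signal the caller to skip
        return None
```

A caller would save the result only when it is not `None` and skip the download otherwise.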





## Annotate the dataset

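To annotate the dataset, use a tool like [CVAT](https://app.cvat.ai/) to draw a bounding box around each vehicle and assign it one of the class labels defined above. The loading step in the next section expects the annotations in YOLO format.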

## Load the dataset

We provide a sample annotated dataset with four classes (*AFV*, *APC*, *LAV* and *MEV*). We'll use fiftyone to load and preview it.

In [1]:
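import fiftyone as fo

name = "google-military-vehicles"
dataset_dir = "../resources/dataset"

# FiftyOne dataset names are unique: drop any copy left over from a previous
# run so that this cell can be re-executed
if fo.dataset_exists(name):
    fo.delete_dataset(name)

# Import the images and their YOLO-format annotations from disk
dataset = fo.Dataset.from_dir(
    dataset_dir=dataset_dir,
    dataset_type=fo.types.YOLOv4Dataset,
    name=name,
)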

Images file '/Users/jonas/workspace/adomvi/resources/dataset/images.txt' not found. Listing data directory '/Users/jonas/workspace/adomvi/resources/dataset/data/' instead
 100% |█████████████████| 669/669 [360.0ms elapsed, 0s remaining, 1.9K samples/s]      


In [2]:
session = fo.launch_app(dataset)
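
Alongside the visual preview, it can be useful to check how many annotated objects each class has. The sketch below counts them directly from the label files. It assumes the usual YOLOv4 layout: an `obj.names` file listing one class name per line, and one `.txt` file per image in `data/` with lines of the form `<class-index> <x-center> <y-center> <width> <height>`. `count_yolo_labels` is a hypothetical helper, not part of fiftyone or adomvi.

```python
from collections import Counter
from pathlib import Path


def count_yolo_labels(dataset_dir):
    """Count annotated objects per class in a YOLOv4-style dataset directory.

    Assumes `<dataset_dir>/obj.names` and `<dataset_dir>/data/*.txt`.
    """
    root = Path(dataset_dir)
    names = [
        line.strip()
        for line in (root / "obj.names").read_text().splitlines()
        if line.strip()
    ]
    counts = Counter()
    for label_file in sorted((root / "data").glob("*.txt")):
        for line in label_file.read_text().splitlines():
            parts = line.split()
            if len(parts) != 5:
                # Skip blank or malformed lines
                continue
            index = int(parts[0])
            if 0 <= index < len(names):
                counts[names[index]] += 1
    return counts
```

Calling `count_yolo_labels("../resources/dataset")` returns a `Counter` keyed by class name. If the annotations were imported under fiftyone's default `ground_truth` field, `dataset.count_values("ground_truth.detections.label")` should report the same tallies.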