# Scraping Google Images to create a multi-label dataset

The previous model, trained on images collected from open-source datasets, was limited by the size of its training set and by its single class label. We'll run a scraping tool to download more images from Google Images for different classes of military vehicles.

## Defining labels

It's hard to find a unified taxonomy for military vehicles. We'll define broad class labels based on Wikipedia's [Military vehicles by type](https://en.wikipedia.org/wiki/Category:Military_vehicles_by_type) category. Model names can be found in this list of [modern armoured fighting vehicles](https://en.wikipedia.org/wiki/List_of_modern_armoured_fighting_vehicles).

- **Armoured fighting vehicle (AFV)** is an armed combat vehicle protected by armour, generally combining operational mobility with offensive and defensive capabilities. AFVs can be wheeled or tracked. Examples of AFVs are tanks, armoured cars, assault guns, self-propelled guns, infantry fighting vehicles (IFV), and armoured personnel carriers (APC).
- **Armoured personnel carrier (APC)** is a broad type of armoured military vehicle designed to transport personnel and equipment in combat zones.
- **Military engineering vehicles (MEV)**
- **Reconnaissance vehicle (RV)** is a military vehicle used for forward reconnaissance. Both tracked and wheeled reconnaissance vehicles are in service.
- **4x4 Armoured Car (AC)**

In [4]:
AFV = ["AFV Lynx"]
APC = ["AMX-10P", "VAB", "Berliet VXB", "Panhard VCR", "Didgori-3", "M113 APC", "VBTP-MR Guarani", "BTR-40", "TPZ Fuchs"]
MEV = []
RV = ["FV601 Saladin", "AMX-10 RC"]
AC = ["Iveco VM 90", "Panhard VBL", "Panhard AML", "Panhard ERC"]
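For a multi-label dataset it can help to keep the class lists in a single mapping, so that a scraped vehicle name can be traced back to its class. The following is a minimal sketch and not part of the original notebook: `LABELS`, `LABEL_OF`, and `empty_classes` are hypothetical names, and the lists simply repeat the ones defined above.

```python
# Class label -> search terms, repeating the lists defined above.
LABELS = {
    "AFV": ["AFV Lynx"],
    "APC": ["AMX-10P", "VAB", "Berliet VXB", "Panhard VCR", "Didgori-3",
            "M113 APC", "VBTP-MR Guarani", "BTR-40", "TPZ Fuchs"],
    "MEV": [],
    "RV": ["FV601 Saladin", "AMX-10 RC"],
    "AC": ["Iveco VM 90", "Panhard VBL", "Panhard AML", "Panhard ERC"],
}

# Reverse lookup (hypothetical helper): vehicle name -> class label.
LABEL_OF = {name: label for label, names in LABELS.items() for name in names}


def empty_classes(labels):
    """Return the class labels that have no vehicle names yet."""
    return [label for label, names in labels.items() if not names]


print(LABEL_OF["Panhard VBL"])
print(empty_classes(LABELS))
```

The second call makes it easy to spot classes, such as MEV here, that still need vehicle names before scraping.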

## Downloading images from Google

We'll download up to 5 images for each Armoured Car (AC) vehicle in our list.

In [6]:
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)

In [13]:
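from pathlib import Path

from adomvi.scraper.google import GoogleImageScraper


def worker_thread(vehicle_name):
    """Search Google Images for one vehicle and save the results to google/AC."""
    save_dir = Path("google/AC")
    scraper = GoogleImageScraper(
        save_dir,
        vehicle_name,
        max_images=5,  # stop after 5 image URLs per vehicle
        min_resolution=(640, 300),
        max_resolution=(2048, 2048),
    )
    # Collect image URLs from the search results, then download them.
    images = scraper.get_image_urls()
    scraper.save_images(images)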
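The worker above always saves into `google/AC`. To scrape the other classes, one option is to derive the save directory from the class label. The sketch below only builds the list of (directory, search term) jobs and does not call the scraper; `build_jobs` is a hypothetical helper, and the `google/<label>` layout is an assumption that extends the `google/AC` path used above.

```python
from pathlib import Path


def build_jobs(labels, root="google"):
    """Return (save_dir, vehicle_name) pairs, one per vehicle name."""
    return [
        (Path(root) / label, name)
        for label, names in labels.items()
        for name in names
    ]


# Small example input; in the notebook this would be the full label mapping.
jobs = build_jobs({"RV": ["FV601 Saladin", "AMX-10 RC"], "AC": ["Panhard VBL"]})
for save_dir, name in jobs:
    print(save_dir, "<-", name)
```

Each pair could then be passed to a variant of the worker that takes the save directory as an argument instead of hard-coding it.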

In [14]:
from concurrent.futures import ThreadPoolExecutor

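# Two vehicles are scraped in parallel. list() consumes the results so that
# an exception raised inside a worker is re-raised here instead of being lost.
with ThreadPoolExecutor(max_workers=2) as executor:
    list(executor.map(worker_thread, AC))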

INFO:adomvi.scraper.google:Chrome web driver initialized. Page title for https://www.google.com: Google
INFO:adomvi.scraper.google:Seaching images for Iveco VM 90
INFO:adomvi.scraper.google:Chrome web driver initialized. Page title for https://www.google.com: Google
INFO:adomvi.scraper.google:Seaching images for Panhard VBL
INFO:adomvi.scraper.google:Clicking Images search button
INFO:adomvi.scraper.google:Clicking Images search button
INFO:adomvi.scraper.google:Scrolling page.
INFO:adomvi.scraper.google:Fetching thumbnails.
INFO:adomvi.scraper.google:Found 48 thumbnails (48 new).
INFO:adomvi.scraper.google:Scrolling page.
INFO:adomvi.scraper.google:Fetching thumbnails.
INFO:adomvi.scraper.google:Found 48 thumbnails (48 new).
INFO:adomvi.scraper.google:1	File:Panhard VBL (Vèhicule Blindé Legér), French army licence registration  '6924 0055' pic.JPG - Wikimedia Commons	https://upload.wikimedia.org/wikipedia/commons/thumb/f/f2/Panhard_VBL_%28V%C3%A8hicule_Blind%C3%A9_Leg%C3%A9r%29%2C_French_army_licence_registration_%276924_0055%27_pic.JPG/2560px-Panhard_VBL_%28V%C3%A8hicule_Blind%C3%A9_Leg%C3%A9r%29%2C_French_army_licence_registration_%276924_0055%27_pic.JPG
INFO:adomvi.scraper.google:1	Iveco vm 90 Banque de photographies et d'images à haute résolution - Alamy	https://c8.alamy.com/compfr/f9dm03/l-intervention-militaire-italienne-en-irak-10-2004-les-soldats-de-la-brigade-aeroportee-friuli-pour-la-preparation-d-une-patrouille-de-nuit-f9dm03.jpg
INFO:adomvi.scraper.google:2	Iveco per € 25.000,-	https://prod.pictures.autoscout24.net/listing-images/82deea47-8efb-46eb-87cb-9cb5eeac8eec_08b07175-1010-498f-81e9-a4de8c916d7f.jpg/1920x1080.webp
INFO:adomvi.scraper.google:3	Iveco VM 90 4X4 - Forum BMH	https://bernard.debucquoi.com/forum/download/file.php?id=58561&mode=view
INFO:adomvi.scraper.google:2	Panhard Vbl Véhicule Militaire Piraeus Photo éditorial - Image du port,  bataille: 227467196	https://thumbs.dreamstime.com/z/panhard-vbl-v%C3%A9hicule-militaire-piraeus-pir%C3%A9e-gr%C3%A8ce-juillet-blind%C3%A9-l%C3%A9ger-de-l-arm%C3%A9e-grecque-allterrain-et-enti%C3%A8rement-227467196.jpg
INFO:adomvi.scraper.google:4	Iveco VM90 40.10W/M 4x4, Véh. de voirie Autres occasion à 8502 Lannach  achetez sur TruckScout24	https://cdn-img.truckscout24.com/images-big/67/89/0019468967008.jpg
INFO:adomvi.scraper.google:3	Panhard VBL | Panhard Véhicule Blindé Léger (Light Armoured … | Flickr	https://live.staticflickr.com/4358/35623002353_085567dac8_b.jpg
INFO:adomvi.scraper.google:4	VBL Panhard : véhicule blindé léger - Yves Debay - Librairie Mollat Bordeaux	https://media.electre-ng.com/images/image-id/512bbcdfe5736ae8bac9bc241747fb6095c2659583e928d0162d95fd24a3ce7d.jpg
INFO:adomvi.scraper.google:5	Iveco VM 90 — Wikipédia	https://upload.wikimedia.org/wikipedia/commons/thumb/8/87/Iveco-Pegaso_40.10WM_E.T..JPG/220px-Iveco-Pegaso_40.10WM_E.T..JPG
INFO:adomvi.scraper.google:Saving images to disk.
INFO:adomvi.scraper.google:5	Panhard VBL, VB2L, Ultima et MKII au 1/48 (Master Fighter et Gaso.Line) -	https://img.over-blog-kiwi.com/1/47/73/87/20200617/ob_a1b37e_mf48599bt06.jpg
INFO:adomvi.scraper.google:Saving images to disk.
INFO:adomvi.scraper.google:Not saving image https://upload.wikimedia.org/wikipedia/commons/thumb/8/87/Iveco-Pegaso_40.10WM_E.T..JPG/220px-Iveco-Pegaso_40.10WM_E.T..JPG because of invalid dimension ((220, 163))
INFO:adomvi.scraper.google:Writing metadata file.
INFO:adomvi.scraper.google:Not saving image https://media.electre-ng.com/images/image-id/512bbcdfe5736ae8bac9bc241747fb6095c2659583e928d0162d95fd24a3ce7d.jpg because of invalid dimension ((400, 534))
INFO:adomvi.scraper.google:Chrome web driver initialized. Page title for https://www.google.com: Google
INFO:adomvi.scraper.google:Seaching images for Panhard AML
INFO:adomvi.scraper.google:Writing metadata file.
INFO:adomvi.scraper.google:Clicking Images search button
INFO:adomvi.scraper.google:Scrolling page.
INFO:adomvi.scraper.google:Fetching thumbnails.
INFO:adomvi.scraper.google:Chrome web driver initialized. Page title for https://www.google.com: Google
INFO:adomvi.scraper.google:Seaching images for Panhard ERC
INFO:adomvi.scraper.google:Found 48 thumbnails (48 new).
INFO:adomvi.scraper.google:Clicking Images search button
INFO:adomvi.scraper.google:Scrolling page.
INFO:adomvi.scraper.google:Fetching thumbnails.
INFO:adomvi.scraper.google:Found 100 thumbnails (100 new).
INFO:adomvi.scraper.google:1	Panhard AML armoured car (1960)	https://www.tanks-encyclopedia.com/coldwar/France/Panhard/Panhard_AML-90-AR.png
INFO:adomvi.scraper.google:2	Polished Sincerely layer aml 90 Read Hare Specificity	https://i.pinimg.com/originals/61/dc/13/61dc132724ed6d92916f3169f0d5d934.jpg
INFO:adomvi.scraper.google:3	Panhard AML-90 Reconessance vehicle ACE 72413	https://www.super-hobby.fr/zdjecia/4/4/5/1981_rd.jpg
INFO:adomvi.scraper.google:4	Voiture Blindée Française Panhard Aml-245 Banque D'Images Et Photos Libres  De Droits. Image 93183857.	https://us.123rf.com/450wm/ryzhov/ryzhov1801/ryzhov180100032/93183857-voiture-blind%C3%A9e-fran%C3%A7aise-panhard-aml-245.jpg
INFO:adomvi.scraper.google:5	WarWheels.Net-Panhard AML-90 Armored Car Index	https://warwheels.net/images/PanhardAML90foti%20(7).jpg
INFO:adomvi.scraper.google:Saving images to disk.
INFO:adomvi.scraper.google:Not saving image https://www.tanks-encyclopedia.com/coldwar/France/Panhard/Panhard_AML-90-AR.png because of invalid dimension ((414, 211))
INFO:adomvi.scraper.google:Not saving image https://www.super-hobby.fr/zdjecia/4/4/5/1981_rd.jpg because of invalid dimension ((600, 333))
INFO:adomvi.scraper.google:Not saving image https://us.123rf.com/450wm/ryzhov/ryzhov1801/ryzhov180100032/93183857-voiture-blind%C3%A9e-fran%C3%A7aise-panhard-aml-245.jpg because of invalid dimension ((450, 300))
INFO:adomvi.scraper.google:Writing metadata file.

KeyboardInterrupt: 

INFO:adomvi.scraper.google:Scrolling page.
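
The "Not saving image ... because of invalid dimension" messages above are consistent with a minimum-size check against `min_resolution=(640, 300)`. The sketch below is not the `adomvi` implementation; `meets_min_resolution` is a hypothetical helper that assumes sizes are `(width, height)` tuples and that the bound is inclusive. The log does not show how `max_resolution` is applied, so the sketch leaves it out.

```python
def meets_min_resolution(size, min_resolution=(640, 300)):
    """Return True if both width and height reach the minimum (assumed inclusive)."""
    width, height = size
    min_width, min_height = min_resolution
    return width >= min_width and height >= min_height


# Dimensions reported as invalid in the log above.
rejected = [(220, 163), (400, 534), (414, 211), (600, 333), (450, 300)]
for size in rejected:
    print(size, meets_min_resolution(size))
```

Under these assumptions every size listed is rejected because its width is below 640 pixels, even when its height would pass.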



