# _G-i[mage]-downloader_
#### _Download Google images with Selenium and Python_
***
#### _*Introduction*_
In this Jupyter notebook we will set up a Chrome driver with Selenium. Given a Google Images search page, the script downloads the images returned by that search into a specified folder. Follow the cells below for a full explanation of the code.

#### _Imports_

```python
# import the necessary packages
import os
import time
import traceback

import requests
from selenium import webdriver
```

#### _Start Chrome Session_
- Set up a Chrome driver session with Selenium.
- The chromedriver executable is expected to be in the same directory as the notebook. Change `driver_path` if it is somewhere else.
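The cell below hard-codes `chromedriver.exe`, which is the Windows file name; on macOS and Linux the executable is simply called `chromedriver`. A minimal sketch of choosing the name from the platform, assuming the driver still sits in the given folder (the helper `default_driver_path` is hypothetical, not part of the original notebook):

```python
import os
import sys


def default_driver_path(folder: str, platform: str = sys.platform) -> str:
    """Return the expected chromedriver path inside `folder` (hypothetical helper)."""
    # ChromeDriver is distributed as chromedriver.exe on Windows, chromedriver elsewhere
    name = "chromedriver.exe" if platform.startswith("win") else "chromedriver"
    return os.path.join(folder, name)


print(default_driver_path(os.getcwd()))
```

With this helper, `driver_path = default_driver_path(os.getcwd())` would work unchanged across operating systems.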

```python
# set up a Chrome driver and start a Chrome session (Selenium 4 API)
from selenium.webdriver.chrome.service import Service

driver_path = os.path.join(os.getcwd(), 'chromedriver.exe')
chrome_options = webdriver.ChromeOptions()
# prefer English-language pages
chrome_options.add_experimental_option('prefs', {'intl.accept_languages': 'en,en_US'})
driver = webdriver.Chrome(service=Service(driver_path), options=chrome_options)
```

If everything went well, a Chrome window opened, just like the one in the image below.
<img src="https://drive.google.com/uc?id=1aAjCU7II4b-FRxDzTML5M_TtjQe2yZq3" width="600px">
***

#### _Search for the images you need_
Now perform your search in the Chrome window controlled by Selenium, like the one in the image below.
***
<img src="assets/google_image_search.png" width="600px">

***

Note that there are two elements highlighted in green. **_It is mandatory that those elements are visible_** during the download phase, because Selenium will interact with:
* the image to download it
* the chevron right icon to go to the next image
***

#### _HTML structure of the page_
We can inspect the HTML structure of an element by right-clicking it and selecting "Inspect". The browser's developer tools will open on the right, showing the page's HTML.
<img src="assets/image_html.png" style ="width:600px;margin-top:50px;margin-bottom:50px">
We are interested in the ```<img>``` tag and its ```class``` attribute.
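To see how the tag and the class together single out the preview images, here is a small sketch that scans a static HTML snippet with Python's built-in `html.parser` and keeps only the `<img>` tags whose `class` is exactly `n3VNCb`. The snippet and the `thumbnail` class are made up for illustration; Google's real markup is far larger.

```python
from html.parser import HTMLParser


class ImgClassFinder(HTMLParser):
    """Collect the src of <img> tags whose class attribute equals `wanted_class`."""

    def __init__(self, wanted_class: str):
        super().__init__()
        self.wanted_class = wanted_class
        self.sources = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        if tag == "img" and attributes.get("class") == self.wanted_class:
            self.sources.append(attributes.get("src"))


# hypothetical page fragment: one thumbnail plus three images sharing the class
sample_html = """
<img class="thumbnail" src="thumb.jpg">
<img class="n3VNCb" src="previous.jpg">
<img class="n3VNCb" src="target.jpg">
<img class="n3VNCb" src="next.jpg">
"""

finder = ImgClassFinder("n3VNCb")
finder.feed(sample_html)
print(finder.sources)  # ['previous.jpg', 'target.jpg', 'next.jpg']
```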

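For demonstration purposes you can run the cell below. Selenium will search for all the elements with ```tag="img"``` and ```class="n3VNCb"``` and store their ```src``` attributes in a list. These class names are generated by Google and can change over time, so check the current ones with "Inspect" before running the cell.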

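```python
from selenium.webdriver.common.by import By  # Selenium 4 locator API

img_html_class = "n3VNCb"
# collect the src of every <img> whose class attribute is exactly img_html_class
img = [im.get_attribute('src') for im in driver.find_elements(By.CSS_SELECTOR, 'img')
       if im.get_attribute("class") == img_html_class]
print("I found {} images with the same class".format(len(img)))
```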

I found 3 images with the same class


There are three matches because the page stores the selected image, the previous one and the next one with the same tag and class. How do we know which one is the target? By trial and error: I downloaded each of them and worked out the order. This kind of experimentation is sometimes needed when scraping with Selenium or other libraries.

In conclusion, the list is ordered as ```[previous image, target image, next image]```, so we always select index ```[1]``` (the second element, since Python lists start at 0) to get the target image.
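The selection rule can be checked without a browser. The sketch below uses a hypothetical `FakeElement` stand-in that exposes only `get_attribute`, the one WebElement method the notebook relies on, then applies the same exact-class filter followed by index `[1]`. The helper `pick_target` is illustrative and adds a length check, so a missing preview panel fails with a clear message instead of a bare `IndexError`.

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement (only get_attribute)."""

    def __init__(self, **attributes):
        self._attributes = attributes

    def get_attribute(self, name):
        return self._attributes.get(name)


def pick_target(elements, html_class):
    """Keep elements whose class equals html_class and return the middle one."""
    matches = [el for el in elements if el.get_attribute("class") == html_class]
    # expected layout: [previous image, target image, next image]
    if len(matches) < 2:
        raise ValueError(f"expected at least 2 elements with class {html_class!r}, found {len(matches)}")
    return matches[1]


elements = [
    FakeElement(**{"class": "n3VNCb", "src": "previous.jpg"}),
    FakeElement(**{"class": "other", "src": "logo.png"}),
    FakeElement(**{"class": "n3VNCb", "src": "target.jpg"}),
    FakeElement(**{"class": "n3VNCb", "src": "next.jpg"}),
]
print(pick_target(elements, "n3VNCb").get_attribute("src"))  # target.jpg
```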

The same applies to the chevron right icon we use to move to the next image, except that here ```tag="a"``` and ```class="gvi3cf"```; again we click the element at index ```[1]```.
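Both filters compare the whole `class` attribute with `==`, so an element carrying one extra class (for example `gvi3cf active`) would be silently skipped. A token-based check, which is how the CSS class selector `a.gvi3cf` matches, is more forgiving. A small sketch of the difference; the class string `"gvi3cf active"` is invented for illustration:

```python
def has_class_exact(class_attr, wanted):
    """Mirror the notebook's check: the whole class attribute must equal `wanted`."""
    return class_attr == wanted


def has_class_token(class_attr, wanted):
    """Match like the CSS selector `.wanted`: any whitespace-separated token may equal it."""
    return wanted in (class_attr or "").split()


for attr in ["gvi3cf", "gvi3cf active", "other"]:
    print(repr(attr), has_class_exact(attr, "gvi3cf"), has_class_token(attr, "gvi3cf"))
```

The trade-off is that a token match may also pick up unrelated elements that happen to share the class, so the index-based selection should be re-checked if you switch.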

#### _Wrap everything in a loop and start downloading_

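```python
# Selenium 4 locator API (the find_elements_by_* helpers are gone in recent Selenium 4 releases)
from selenium.webdriver.common.by import By

nimages_to_download = 600         # stop after this many saved images
imagefoldername = "images"        # root folder for all downloads
speciefoldername = "whale_shark"  # sub-folder for this search

img_html_class = "n3VNCb"  # class of the preview <img> elements
btn_html_class = "gvi3cf"  # class of the "next image" <a> elements

image_folder = os.path.join(os.getcwd(), imagefoldername, speciefoldername)
os.makedirs(image_folder, exist_ok=True)  # create the target folder if it is missing

count = 0
while count < nimages_to_download:
    time.sleep(1)  # give the preview panel time to load the new image

    # find images: [previous image, target image, next image]
    img = [im.get_attribute('src') for im in driver.find_elements(By.CSS_SELECTOR, 'img')
           if im.get_attribute("class") == img_html_class]

    # save the target image (index 1) if it is a JPG or PNG
    try:
        src = img[1]
        response = requests.get(src, timeout=5)
        if response.status_code == 200:
            imageout = os.path.join(image_folder, src.split('/')[-1].split('?')[0])
            print(imageout)
            if ".jpg" in imageout or ".png" in imageout:
                with open(imageout, "wb") as file:
                    file.write(response.content)
                count += 1
    except Exception:
        print("there was a problem downloading the image")
        traceback.print_exc()

    # click next: the chevrons follow the same [previous, current, next] layout
    btn = [a for a in driver.find_elements(By.CSS_SELECTOR, 'a')
           if a.get_attribute("class") == btn_html_class]
    btn[1].click()
```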
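The loop takes the file name from the last path segment of the URL and keeps only names containing `.jpg` or `.png`. A slightly stricter sketch with the standard library: `urllib.parse.urlsplit` separates the query string from the path, and `os.path.splitext` checks the actual extension case-insensitively. Sequential names such as `whale_shark_000000001.jpg` are shown as an optional alternative. The helper names and the extension set are assumptions, not part of the original notebook.

```python
import os
from urllib.parse import urlsplit

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}  # assumption: formats worth keeping


def filename_from_url(url):
    """Return the last path segment of `url`, without query string or fragment."""
    return os.path.basename(urlsplit(url).path)


def is_allowed_image(filename):
    """True if the file extension (case-insensitive) is in ALLOWED_EXTENSIONS."""
    return os.path.splitext(filename)[1].lower() in ALLOWED_EXTENSIONS


def sequential_name(prefix, count, ext=".jpg"):
    """Hypothetical alternative naming scheme: prefix_000000001.jpg, ..."""
    return f"{prefix}_{count:09d}{ext}"


url = "https://example.com/photos/shark.JPG?width=800"
name = filename_from_url(url)
print(name, is_allowed_image(name))       # shark.JPG True
print(sequential_name("whale_shark", 1))  # whale_shark_000000001.jpg
```

Sequential names avoid silently overwriting two different images that happen to share the same URL file name.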