# Image Scraping
We want to get the links of some images so we can render them in a web app.

### Beautiful Soup

In [6]:
import urllib.request
from bs4 import BeautifulSoup

Pexels is a website that offers free stock photos.

In [20]:
from requests import get

url = 'https://www.pexels.com/search/asian%20girl/'

response = get(url)

In [24]:
soup = BeautifulSoup(response.text, 'html.parser')


Find all images on the page.

In [29]:
results = soup.find_all('img', class_='photo-item__img')


In [60]:
results[0]['data-big-src']

'https://images.pexels.com/photos/1386604/pexels-photo-1386604.jpeg?auto=compress&cs=tinysrgb&h=750&w=1260'

In [62]:
len(results)

30

In [63]:
# Collect the full-size image URL from each matching tag.
urls = [result['data-big-src'] for result in results]
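
Indexing a tag with `['data-big-src']` raises a `KeyError` if any matched tag lacks that attribute. A minimal sketch of a more forgiving extraction follows; the HTML snippet and the `sample_*` names are made up for illustration and are not taken from the Pexels page.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration; the second tag lacks the attribute.
sample_html = """
<img class="photo-item__img" data-big-src="https://example.com/a.jpeg">
<img class="photo-item__img">
<img class="other" data-big-src="https://example.com/b.jpeg">
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")
sample_tags = sample_soup.find_all("img", class_="photo-item__img")

# Tag.get returns None for a missing attribute instead of raising KeyError.
sample_urls = [tag.get("data-big-src") for tag in sample_tags]

# Keep only the tags that actually carried a URL.
sample_urls = [u for u in sample_urls if u is not None]
```

The same pattern should apply to `results` from the cell above, assuming the page still uses the `photo-item__img` class.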

In [64]:
urls

['https://images.pexels.com/photos/1386604/pexels-photo-1386604.jpeg?auto=compress&cs=tinysrgb&h=750&w=1260',
 'https://images.pexels.com/photos/1372134/pexels-photo-1372134.jpeg?auto=compress&cs=tinysrgb&h=750&w=1260',
 'https://images.pexels.com/photos/1321909/pexels-photo-1321909.jpeg?auto=compress&cs=tinysrgb&h=750&w=1260',
 'https://images.pexels.com/photos/1111304/pexels-photo-1111304.jpeg?auto=compress&cs=tinysrgb&h=750&w=1260',
 'https://images.pexels.com/photos/157015/pexels-photo-157015.jpeg?auto=compress&cs=tinysrgb&h=750&w=1260',
 'https://images.pexels.com/photos/1073567/pexels-photo-1073567.jpeg?auto=compress&cs=tinysrgb&h=750&w=1260',
 'https://images.pexels.com/photos/1386603/pexels-photo-1386603.jpeg?auto=compress&cs=tinysrgb&h=750&w=1260',
 'https://images.pexels.com/photos/262173/pexels-photo-262173.jpeg?auto=compress&cs=tinysrgb&h=750&w=1260',
 'https://images.pexels.com/photos/1319911/pexels-photo-1319911.jpeg?auto=compress&cs=tinysrgb&h=750&w=1260',
 'https://imag

In [65]:
import pandas as pd

Save results in a .csv file.

In [66]:
urls_df = pd.DataFrame(urls)

urls_df.to_csv('urls.csv')
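
As written, `to_csv` also writes the row index and a column header of `0`. If the web app only needs a plain list of links, a named column without the index may be easier to read back. A small sketch, using placeholder URLs and a hypothetical file name `urls_named.csv`:

```python
import os
import tempfile

import pandas as pd

sample_urls = [
    "https://example.com/a.jpeg",  # placeholder URLs, not real results
    "https://example.com/b.jpeg",
]

# A named column and index=False give a one-column file with a "url" header.
path = os.path.join(tempfile.mkdtemp(), "urls_named.csv")
pd.DataFrame({"url": sample_urls}).to_csv(path, index=False)

# Reading the file back recovers the original list.
loaded = pd.read_csv(path)["url"].tolist()
```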

### Selenium

The page loads more results only as we scroll, so we use `selenium` to drive a real browser.

In [4]:
import time

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

browser = webdriver.Chrome()

In [5]:
browser.get("https://www.pexels.com/search/asian%20girl/")


Scroll down so we load more images.

In [7]:
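from selenium.webdriver.common.by import By

# The find_element_by_* helpers were removed in Selenium 4.3; use By locators.
elem = browser.find_element(By.TAG_NAME, "body")

no_of_pagedowns = 150

# Press Page Down repeatedly, pausing briefly so new images can load.
while no_of_pagedowns:
    elem.send_keys(Keys.PAGE_DOWN)
    time.sleep(0.2)
    no_of_pagedowns -= 1

In [8]:
post_elems = browser.find_elements(By.CLASS_NAME, "photo-item__img")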
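
The fixed count of 150 Page Down presses is a guess: it may stop too early or keep scrolling after the page has run out of results. One possible alternative, sketched here under the assumption that the page grows taller whenever new images load, is to scroll until the page height stops changing. The function name `scroll_until_stable` and its parameters are hypothetical; it only relies on the driver's `execute_script` method.

```python
import time


def scroll_until_stable(browser, pause=0.5, max_rounds=50):
    """Scroll to the bottom until the page height stops growing.

    `browser` is any object with Selenium's `execute_script` method.
    Returns the number of scrolls performed.
    """
    get_height = "return document.body.scrollHeight"
    last_height = browser.execute_script(get_height)
    rounds = 0
    while rounds < max_rounds:
        browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        time.sleep(pause)  # give lazily loaded images time to appear
        new_height = browser.execute_script(get_height)
        if new_height == last_height:
            break  # nothing new was loaded, so stop
        last_height = new_height
    return rounds
```

Calling `scroll_until_stable(browser)` before collecting the elements would replace the Page Down loop. The `max_rounds` cap keeps the loop from running forever on a page that never stops loading.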

In [21]:
post_elems[0].get_attribute('data-big-src')

'https://images.pexels.com/photos/1386604/pexels-photo-1386604.jpeg?auto=compress&cs=tinysrgb&h=750&w=1260'

Extract the image URLs and place them in a DataFrame.

In [23]:
# Read the full-size image URL from each element.
urls = [p.get_attribute('data-big-src') for p in post_elems]
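
Selenium's `get_attribute` returns `None` when an element lacks the attribute, and a long scroll could in principle pick up the same image twice. A small sketch of a cleanup step before saving; the helper name `clean_urls` is hypothetical.

```python
def clean_urls(raw_urls):
    """Drop missing values and duplicates, keeping first-seen order."""
    present = [u for u in raw_urls if u]  # removes None and empty strings
    # dict.fromkeys keeps the first occurrence of each key, in order.
    return list(dict.fromkeys(present))
```

For example, `urls = clean_urls(urls)` could run just before the DataFrame is built.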

In [26]:
import pandas as pd 
urls_df = pd.DataFrame(urls)

urls_df.to_csv('urls2.csv')

# END