## Web scraping multiple pages of BestBuy search results to extract product names and prices

This is a short overview of how information can be extracted from multiple pages of search results. <br>
The example search lists Samsung TVs on the BestBuy website.

**Import libraries**

In [1]:
from bs4 import BeautifulSoup
import requests
import time

**Identify the URL to extract from**

In [3]:
url = "https://www.bestbuy.ca/en-CA/Search/SearchResults.aspx?type=product&filter=brandName%253aSAMSUNG%253bcategory%253aTV%20%26%20Home%20Theatre&fromBrandStore=samsung"
# A browser-like User-Agent header is sent so the website does not assume the request comes from a bot
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}
page = 1  # the first page of the search
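
The loop further down rebuilds the search URL by appending `&page=N` to the same base address. A minimal sketch of that step as a small function is shown here; the names `SEARCH_URL` and `page_url` are hypothetical, and returning the bare URL for page 1 simply mirrors how the first request is made in this notebook.

```python
# Base search URL copied from the cell above (no page parameter).
SEARCH_URL = (
    "https://www.bestbuy.ca/en-CA/Search/SearchResults.aspx"
    "?type=product"
    "&filter=brandName%253aSAMSUNG%253bcategory%253aTV%20%26%20Home%20Theatre"
    "&fromBrandStore=samsung"
)


def page_url(page):
    """Return the search URL for a 1-based page number (hypothetical helper)."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    if page == 1:
        # The first page is requested without a page parameter, as in the notebook.
        return SEARCH_URL
    return f"{SEARCH_URL}&page={page}"


print(page_url(2))
```

Keeping the base URL in one place avoids repeating the long string inside the loop.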

**A loop that extracts all the required info**

In [5]:
names=[]
prices=[]

while True:
    r = requests.get(url, headers=headers)
    html_doc = r.content
    
    soup = BeautifulSoup(html_doc, 'html.parser')
    
    # Extract titles
    for el in soup.find_all('h4', attrs={'class': 'prod-title'}):
        names.append(el.text.strip("\n"))
    
    # Extract prices
    for el in soup.find_all('span', attrs={'class': 'amount'}):
        prices.append(el.text)
    
    # Extract the page numbers from the pagination control
    # (converted to int so they are compared numerically, not as strings)
    pages = []
    for ul in soup.find_all('ul', attrs={'class': "pagination-control inline-list"}):
        for li in ul.find_all("li", recursive=False):
            for link in li.find_all("a", recursive=False):
                pages.append(int(link['data-page']))
    print("Page: ", page)
    if not pages:
        # No pagination links found: either a single page or the layout changed
        break
    max_page = max(pages) + 1
    page += 1
    
    # URL for the next page
    url = "https://www.bestbuy.ca/en-CA/Search/SearchResults.aspx?type=product&filter=brandName%253aSAMSUNG%253bcategory%253aTV%20%26%20Home%20Theatre&fromBrandStore=samsung&page=" + str(page)
    time.sleep(2)  # pause between requests to avoid hammering the server
    
    if page >= max_page:
        break


Page:  1
Page:  2
Page:  3
Page:  4
Page:  5
Page:  6
Page:  7
Page:  8
Page:  9
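
The `data-page` values come out of the HTML as strings, and `max` on strings compares them character by character, so a two-digit page can lose to a one-digit one. The sketch below illustrates the difference; `last_page` is a hypothetical helper, and treating a result without pagination links as a single page is an assumption.

```python
def last_page(page_labels):
    """Return the highest page number among scraped `data-page` values.

    Hypothetical helper: labels that are not plain digits are skipped, and an
    empty list is assumed to mean a single page of results.
    """
    numbers = [int(label) for label in page_labels if str(label).strip().isdigit()]
    if not numbers:
        return 1
    return max(numbers)


labels = ["2", "3", "9", "10"]
print(max(labels))        # prints 9: string comparison, "9" sorts after "10"
print(last_page(labels))  # prints 10: numeric comparison
```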


**Add the extracted data to a DataFrame**

In [6]:
import pandas as pd
df = pd.DataFrame(
    {'name': names,
     'price': prices
    })
df.head()

|   | name | price |
|---|------|-------|
| 0 | Samsung 58" 4K UHD HDR LED Tizen Smart TV (UN5... | $799.99 |
| 1 | Samsung NU7100 50" 4K UHD HDR LED Tizen Smart ... | $649.99 |
| 2 | Samsung NU7100 43" 4K UHD HDR LED Tizen Smart ... | $499.99 |
| 3 | Samsung 55" 4K UHD HDR LED Tizen Smart TV (UN5... | $699.99 |
| 4 | Samsung 55" 4K UHD HDR LED Tizen Smart TV (UN5... | $899.99 |
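
The prices are stored as text such as `$799.99`, so they cannot be sorted or averaged as numbers yet. A minimal sketch of a conversion step follows; `parse_price` is a hypothetical helper, and the handling of a thousands separator (`,`) is an assumption, since none appears in the rows shown here.

```python
from decimal import Decimal


def parse_price(text):
    """Convert a price string like '$799.99' into a Decimal (hypothetical helper)."""
    # Drop the currency sign and any thousands separator (assumed to be a comma).
    cleaned = text.strip().replace("$", "").replace(",", "")
    return Decimal(cleaned)


sample_prices = ["$799.99", "$649.99", "$499.99"]
amounts = [parse_price(p) for p in sample_prices]
print(min(amounts), max(amounts))  # prints 499.99 799.99
```

`Decimal` is used here to avoid floating-point rounding on currency values; `float` would also work if exactness is not a concern.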
