# Web Scraping Assignment 3: Solutions

#### Q1. Write a Python program that searches for all the products under a particular product vertical on www.amazon.in. The product vertical to search for is taken as input from the user. For example, if the user input is ‘guitar’, then search for guitars.

In [1]:
# Importing Libraries
import selenium
import pandas as pd
import time
from bs4 import BeautifulSoup

# Importing selenium webdriver 
from selenium import webdriver

# Importing the exceptions that need to be handled
from selenium.common.exceptions import StaleElementReferenceException, NoSuchElementException

#Importing requests
import requests

# importing regex
import re

In [2]:
# Activating the chrome browser
driver = webdriver.Chrome('chromedriver.exe')

In [7]:
# Opening the homepage of Amazon.in
driver.get('https://www.amazon.in/')
# Asking the user to input the keywords he/she wants to search
user_inp = input('Enter the product you want to search : ')

Enter the product you want to search : guitar


In [8]:
search_bar = driver.find_element_by_id("twotabsearchtextbox")    # Locating search_bar by id
search_bar.clear()                                               # clearing search_bar
search_bar.send_keys(user_inp)                                   # sending user input to search bar
search_button = driver.find_element_by_xpath('//div[@class="nav-search-submit nav-sprite"]/span/input')       # Locating search_button by xpath
search_button.click()                                                                # Clicking the button to start search
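
As an alternative to typing into the search box, the results page can usually be opened directly with `driver.get(...)`. The sketch below only builds the URL; the `/s?k=` query pattern and the helper name `build_search_url` are assumptions for illustration and are not taken from the notebook.

```python
from urllib.parse import urlencode

# Assumed search endpoint; check it against the live site before relying on it.
AMAZON_SEARCH = "https://www.amazon.in/s"


def build_search_url(keyword: str) -> str:
    """Return a search URL for the keyword, with the query string URL-encoded."""
    return AMAZON_SEARCH + "?" + urlencode({"k": keyword.strip()})


print(build_search_url("guitar"))
```

Encoding the keyword with `urlencode` keeps multi-word inputs such as "acoustic guitar" valid in the URL.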

#### Q2. In the above question, now scrape the following details of each product listed on the first 3 pages of your search results and save them in a dataframe and a CSV file. If a product vertical has fewer than 3 pages of search results, scrape all the products available under that vertical. Details to be scraped are: "Brand Name", "Name of the Product", "Rating", "No. of Ratings", "Price", "Return/Exchange", "Expected Delivery", "Availability", "Other Details" and “Product URL”. If any of the details are missing for a product, replace them with “-”.

In [9]:
start_page = 0
end_page = 3                                   # number of result pages to scrape
urls = []
for page in range(start_page,end_page):        # pages 1 to 3 only
    try:
        page_urls = driver.find_elements_by_xpath('//a[@class="a-link-normal s-no-outline"]')
        
        # appending all the urls on current page to urls list
        for url in page_urls:
            url = url.get_attribute('href')     
            if url and url[0:4]=='http':        # skipping empty and non-http links
                urls.append(url)                
        print("Product urls of page {} has been scraped.".format(page+1))
        
        # Moving to next page
        nxt_button = driver.find_element_by_xpath('//li[@class="a-last"]/a')      
        if nxt_button.text == 'Next→':                                            
            nxt_button.click()                                                    
            time.sleep(5)                                                         
          
        elif driver.find_element_by_xpath('//li[@class="a-disabled a-last"]/a').text == 'Next→':    
            print("No new pages exist. Breaking the loop")  
            break
            
    except NoSuchElementException:             # no Next link on the page, so this is the last page
        print("No new pages exist. Breaking the loop")
        break
    except StaleElementReferenceException as e:            
        print("Stale Exception")
        next_page = nxt_button.get_attribute('href')        
        driver.get(next_page)                               

Product urls of page 1 has been scraped.
Product urls of page 2 has been scraped.
Product urls of page 3 has been scraped.
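
The question asks for the first three pages only, so the page loop needs an explicit upper bound and a clean exit when fewer pages exist. A minimal sketch of that control flow, kept independent of Selenium: `collect_links`, `get_links` and `go_next` are hypothetical names, and in the notebook the two callables would wrap the `find_elements_by_xpath` call and the click on the Next link.

```python
from typing import Callable, List, Optional


def collect_links(get_links: Callable[[], List[Optional[str]]],
                  go_next: Callable[[], bool],
                  max_pages: int = 3) -> List[str]:
    """Collect http(s) links from at most `max_pages` result pages."""
    links: List[str] = []
    for page in range(max_pages):
        # Keep only real links; href can be None or a non-http scheme.
        links.extend(u for u in get_links() if u and u.startswith("http"))
        # Stop after the last wanted page, or when there is no next page.
        if page == max_pages - 1 or not go_next():
            break
    return links
```

Because `go_next` is only called while another page is still wanted, the browser is not moved on to a fourth page after the third one has been read.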


In [10]:
# creating dictionary for all columns

prod_dict = {}
prod_dict['Brand']=[]
prod_dict['Name']=[]
prod_dict['Rating']=[]
prod_dict['No. of ratings']=[]
prod_dict['Price']=[]
prod_dict['Return/Exchange']=[]
prod_dict['Expected Delivery']=[] 
prod_dict['Availability']=[]
prod_dict['Other Details']=[]
prod_dict['URL']=[]

In [11]:
for url in urls[:4]:
    driver.get(url)                                                        
    print("Scraping URL = ", url)
    #time.sleep(2)
    
    # Extracting Brand from xpath
    try:
        brand = driver.find_element_by_xpath('//a[@id="bylineInfo"]')      
        prod_dict['Brand'].append(brand.text)
    except NoSuchElementException:
        prod_dict['Brand'].append('-')
    
    # Extracting Name from xpath
    try:
        name = driver.find_element_by_xpath('//h1[@id="title"]/span')      
        prod_dict['Name'].append(name.text)
    except NoSuchElementException:
        prod_dict['Name'].append('-')
    
    # Extracting Ratings from xpath
    try:
        rating = driver.find_element_by_xpath('//span[@id="acrPopover"]')  
        prod_dict['Rating'].append(rating.get_attribute("title"))
    except NoSuchElementException:
        prod_dict['Rating'].append('-')
    
    # Extracting no. of Ratings from xpath
    try:
        n_rating = driver.find_element_by_xpath('//a[@id="acrCustomerReviewLink"]/span')     
        prod_dict['No. of ratings'].append(n_rating.text)
    except NoSuchElementException:
        prod_dict['No. of ratings'].append('-')
    
    # Extracting Price from xpath
    try:
        price = driver.find_element_by_xpath('//span[@id="priceblock_ourprice"]')            
        prod_dict['Price'].append(price.text)
    except NoSuchElementException:
        prod_dict['Price'].append('-')
        
    # Extracting Return/Exchange policy from xpath    
    try:                                                                    
        ret = driver.find_element_by_xpath('//div[@data-name="RETURNS_POLICY"]/span/div[2]/a')
        prod_dict['Return/Exchange'].append(ret.text)
    except NoSuchElementException:
        prod_dict['Return/Exchange'].append('-')
        
    # Extracting Expected Delivery from xpath    
    try:
        delivry = driver.find_element_by_xpath('//div[@id="ddmDeliveryMessage"]/b')         
        prod_dict['Expected Delivery'].append(delivry.text)
    except NoSuchElementException:
        prod_dict['Expected Delivery'].append('-')
    
    # Extracting Availability from xpath
    try:
        avl = driver.find_element_by_xpath('//div[@id="availability"]/span')                
        prod_dict['Availability'].append(avl.text)
    except NoSuchElementException:
        prod_dict['Availability'].append('-')

    # Extracting Other Details from xpath
    try:                                                                                    
        dtls = driver.find_element_by_xpath('//ul[@class="a-unordered-list a-vertical a-spacing-mini"]')
        prod_dict['Other Details'].append('  ||  '.join(dtls.text.split('\n')))
    except NoSuchElementException:
        prod_dict['Other Details'].append('-')
    
    # Saving url                                        
    prod_dict['URL'].append(url)    
    time.sleep(2)

Scraping URL =  https://www.amazon.in/Musical-Instruments/b/ref=sxts_spkl_4_0_460c4d37-38ea-441a-8493-97d687634812?ie=UTF8&node=3677697031&pd_rd_w=NX2i1&pf_rd_p=460c4d37-38ea-441a-8493-97d687634812&pf_rd_r=Z8MVV6Q8GGMCR02JT2XC&pd_rd_r=f741f07a-d42b-4794-9640-de9501cde6dd&pd_rd_wg=0WJBF&qid=1621076267
Scraping URL =  https://www.amazon.in/Juarez-Acoustic-Guitar-Cutaway-Strings/dp/B076QHZ4HZ/ref=sr_1_1?dchild=1&keywords=guitar&qid=1621076267&sr=8-1
Scraping URL =  https://www.amazon.in/Clapton-Natural-Dreadnought-Cutaway-Acoustic/dp/B093ZQ12SD/ref=sr_1_2?dchild=1&keywords=guitar&qid=1621076267&sr=8-2
Scraping URL =  https://www.amazon.in/Acoustic-Cutaway-RDS-Strings-Sunburst/dp/B076QGY91P/ref=sr_1_3?dchild=1&keywords=guitar&qid=1621076267&smid=A14CZOWI0VEHLG&sr=8-3
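
The loop above repeats the same try/except pattern for every field. One possible way to factor it out is a small helper that runs a lookup and falls back to "-". This is only a sketch: `text_or_dash` is a hypothetical name, and the default `LookupError` stands in for Selenium's `NoSuchElementException`, which would be passed as `missing=(NoSuchElementException,)` in the notebook.

```python
from typing import Callable, Optional, Tuple, Type


def text_or_dash(getter: Callable[[], Optional[str]],
                 missing: Tuple[Type[BaseException], ...] = (LookupError,),
                 default: str = "-") -> str:
    """Run `getter`; return its stripped text, or `default` if it fails or is empty."""
    try:
        value = getter()
    except missing:
        return default
    # get_attribute() can return None, and .text can be an empty string.
    return (value or "").strip() or default


# Stand-in lookups; a real call would look like
# text_or_dash(lambda: driver.find_element_by_xpath(...).text, (NoSuchElementException,))
print(text_or_dash(lambda: "  In stock. "))
print(text_or_dash(lambda: {}["price"]))
```

Passing the lookup as a lambda delays it until the helper's `try` block, so the exception is raised where it can be caught.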


In [12]:
# creating pandas dataset
prod_df = pd.DataFrame.from_dict(prod_dict)
prod_df

Unnamed: 0,Brand,Name,Rating,No. of ratings,Price,Return/Exchange,Expected Delivery,Availability,Other Details,URL
0,-,-,-,-,-,-,-,-,-,https://www.amazon.in/Musical-Instruments/b/re...
1,Visit the JUAREZ Store,"Juarez Acoustic Guitar, 38 Inch Curved Body Cu...",3.9 out of 5 stars,783 ratings,"₹ 2,499.00",7 Days Replacement,-,In stock.,"Jumbo Design, 38 Inch Acoustic Steel String Gu...",https://www.amazon.in/Juarez-Acoustic-Guitar-C...
2,Brand: Clapton,Clapton Natural Dreadnought Cutaway Acoustic G...,-,-,"₹ 3,980.00",7 Days Replacement,-,In stock.,41 inch Jumbo Sized Cut-A-Way Acoustic Guitar ...,https://www.amazon.in/Clapton-Natural-Dreadnou...
3,Visit the JUAREZ Store,"Juarez Acoustic Guitar, 38 Inch Cutaway with P...",3.9 out of 5 stars,"2,138 ratings",-,7 Days Replacement,-,In stock.,"Red Sunburst Glossy Finish, Number of Frets: 1...",https://www.amazon.in/Acoustic-Cutaway-RDS-Str...
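
The Rating, No. of ratings and Price columns are stored as display strings such as "3.9 out of 5 stars" or "₹ 2,499.00". If numeric values are needed for sorting or filtering, a regular expression can pull the first number out of each string. A minimal sketch, with `parse_number` as a hypothetical helper name:

```python
import re
from typing import Optional

# One digit, then digits or thousands separators, then an optional decimal part.
NUMBER_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_number(text: str) -> Optional[float]:
    """Return the first number found in `text` as a float, or None if there is none."""
    match = NUMBER_RE.search(text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))


for raw in ["3.9 out of 5 stars", "2,138 ratings", "₹ 2,499.00", "-"]:
    print(raw, "->", parse_number(raw))
```

Missing values (the "-" placeholder) come back as `None`, which pandas treats as a missing value once the column is converted.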


In [None]:
#saving data to csv
prod_df.to_csv('Amazon_{}.csv'.format(user_inp))

#### Q3. Write a Python program to access the search bar and search button on images.google.com and scrape 100 images each for the keywords ‘fruits’, ‘cars’ and ‘Machine Learning’.

In [14]:
# Activating the chrome browser
driver = webdriver.Chrome('chromedriver.exe')
# Opening images.google.com
driver.get('https://images.google.com/')

In [15]:
search_bar = driver.find_element_by_xpath('//*[@id="sbtc"]/div/div[2]/input')    
search_bar.send_keys("fruits")       
search_button = driver.find_element_by_xpath('//*[@id="sbtc"]/button')    
search_button.click()  

In [16]:
print("start scrolling to generate more images on the page...")

for _ in range(500):
    driver.execute_script("window.scrollBy(0,10000)")

start scrolling to generate more images on the page...


In [17]:
images = driver.find_elements_by_xpath('//img[@class="rg_i Q4LuWd"]')

In [18]:
img_urls = []
img_data = []
for image in images:
    source= image.get_attribute('src')
    if source is not None:
        if(source[0:4] == 'http'):
            img_urls.append(source)
len(img_urls)

171

In [20]:
for i in range(len(img_urls)):
    if i >= 100:
        break
    print("Downloading {0} of {1} images" .format(i, 100))
    response= requests.get(img_urls[i])
    # the with-block closes each file as soon as it has been written
    with open("E:/python"+str(i)+".jpg", "wb") as file:
        file.write(response.content)

Downloading 0 of 100 images
Downloading 1 of 100 images
Downloading 2 of 100 images
Downloading 3 of 100 images
Downloading 4 of 100 images
Downloading 5 of 100 images
Downloading 6 of 100 images
Downloading 7 of 100 images
Downloading 8 of 100 images
Downloading 9 of 100 images
Downloading 10 of 100 images
Downloading 11 of 100 images
Downloading 12 of 100 images
Downloading 13 of 100 images
Downloading 14 of 100 images
Downloading 15 of 100 images
Downloading 16 of 100 images
Downloading 17 of 100 images
Downloading 18 of 100 images
Downloading 19 of 100 images
Downloading 20 of 100 images
Downloading 21 of 100 images
Downloading 22 of 100 images
Downloading 23 of 100 images
Downloading 24 of 100 images
Downloading 25 of 100 images
Downloading 26 of 100 images
Downloading 27 of 100 images
Downloading 28 of 100 images
Downloading 29 of 100 images
Downloading 30 of 100 images
Downloading 31 of 100 images
Downloading 32 of 100 images
Downloading 33 of 100 images
Downloading 34 of 100 images
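
The cells above run the search for 'fruits' only, while the question lists three keywords. One way to reuse the same download step for all of them is to first build a list of (URL, file path) jobs per keyword. The sketch below does only that planning step; `plan_downloads`, the `images` folder and the file naming are assumptions for illustration.

```python
from pathlib import Path
from typing import Dict, List, Optional, Tuple


def plan_downloads(urls_by_keyword: Dict[str, List[Optional[str]]],
                   out_dir: str = "images",
                   limit: int = 100) -> List[Tuple[str, Path]]:
    """Return (url, target path) pairs, at most `limit` per keyword."""
    jobs: List[Tuple[str, Path]] = []
    for keyword, urls in urls_by_keyword.items():
        folder = Path(out_dir) / keyword.lower().replace(" ", "_")
        # Skip missing src values and inline data: URIs, as the notebook does.
        http_urls = [u for u in urls if u and u.startswith("http")]
        for index, url in enumerate(http_urls[:limit], start=1):
            jobs.append((url, folder / "{:03d}.jpg".format(index)))
    return jobs
```

Each job can then be passed to `requests.get` and written inside a `with open(path, "wb")` block, after creating the folder with `path.parent.mkdir(parents=True, exist_ok=True)`.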

#### Q4. Write a Python program to search for a smartphone (e.g. Oneplus Nord, pixel 4A, etc.) on www.flipkart.com and scrape the following details for all the search results displayed on the 1st page. Details to be scraped: “Brand Name”, “Smartphone name”, “Colour”, “RAM”, “Storage(ROM)”, “Primary Camera”, “Secondary Camera”, “Display Size”, “Display Resolution”, “Processor”, “Processor Cores”, “Battery Capacity”, “Price”, “Product URL”. If any of the details is missing, replace it with “-”. Save your results in a dataframe and a CSV file.

In [27]:
# Asking for user input
prod = input(" Enter the name of the mobile phone you want to search : ")
driver.get('https://www.flipkart.com/')
time.sleep(3)
try:
    login_X_button = driver.find_element_by_xpath('//button[@class="_2KpZ6l _2doB4z"]') 
    login_X_button.click()
except NoSuchElementException : 
    print("No Login page")
search_bar = driver.find_element_by_xpath('//*[@id="container"]/div/div[1]/div[1]/div[2]/div[2]/form/div/div/input')
search_bar.clear()               
search_bar.send_keys(prod)       
search_button = driver.find_element_by_xpath('//button[@class="L0Z3Pu"]')  
search_button.click()   

 Enter the name of the mobile phone you want to search : samsung
No Login page


In [28]:
# Fetching urls of phones coming on 1st page
flip_urls = []
urls = driver.find_elements_by_xpath('//a[@class="_1fQZEK"]')
for url in urls:
    flip_urls.append(url.get_attribute("href"))

In [29]:
len(flip_urls)

24

In [30]:
flip_dict = {}
flip_dict["Brand"] = []
flip_dict["Smartphone"] = []
flip_dict["Colour"] = []
flip_dict["RAM"] = []
flip_dict["Storage(ROM)"] = []
flip_dict["Primary Camera"] = []
flip_dict["Secondary Camera"] = []
flip_dict["Display Size"] = []
flip_dict["Display Resolution"] = []
flip_dict["Processor"] = []
flip_dict["Processor Cores"] = []
flip_dict["Battery Capacity"] = []
flip_dict["Battery Type"] = []
flip_dict["Price"] = []
flip_dict["URL"] = []

In [31]:
# Scraping data from each url
for url in flip_urls:
    driver.get(url)                                                       
    print("Scraping URL = ", url)
    flip_dict['URL'].append(url)                                                          
    time.sleep(2)
    
    try:
        read_more = driver.find_element_by_xpath('//button[@class="_2KpZ6l _1FH0tX"]')  
        read_more.click()
    except NoSuchElementException:
        print("Exception occurred: 'Read More' button not found")
    
    try:
        brand = driver.find_element_by_xpath('//span[@class="B_NuCI"]')     
        flip_dict["Brand"].append(brand.text.split()[0])
    except NoSuchElementException:
        flip_dict['Brand'].append('-')
        
    try:
        price = driver.find_element_by_xpath('//div[@class="_30jeq3 _16Jk6d"]')     
        flip_dict['Price'].append(price.text)
    except NoSuchElementException:
        flip_dict['Price'].append('-')
        
    try:
        name = driver.find_element_by_xpath('//div[@class="_3k-BhJ"][1]/table/tbody/tr[3]/td[2]/ul/li')      
        flip_dict['Smartphone'].append(name.text)
    except NoSuchElementException:
        flip_dict['Smartphone'].append('-')
    
    try:
        color = driver.find_element_by_xpath('//div[@class="_3k-BhJ"][1]/table/tbody/tr[4]/td[2]/ul/li')
        flip_dict['Colour'].append(color.text)
    except NoSuchElementException:
        flip_dict['Colour'].append('-')
    
    try:
        disp_chk = driver.find_element_by_xpath('//div[@class="_3k-BhJ"][2]/div')
        if disp_chk.text != "Display Features" : raise NoSuchElementException
        disp_size = driver.find_element_by_xpath('//div[@class="_3k-BhJ"][2]/table[1]/tbody/tr[1]/td[2]/ul/li')  
        flip_dict['Display Size'].append(disp_size.text)
    except NoSuchElementException:
        flip_dict['Display Size'].append('-')
    
    try:
        disp_chk = driver.find_element_by_xpath('//div[@class="_3k-BhJ"][2]/div')
        if disp_chk.text != "Display Features" : raise NoSuchElementException
        disp_res = driver.find_element_by_xpath('//div[@class="_3k-BhJ"][2]/table[1]/tbody/tr[2]/td[2]/ul/li') 
        flip_dict['Display Resolution'].append(disp_res.text)
    except NoSuchElementException:
        flip_dict['Display Resolution'].append('-')
    
    try:
        pro_chk = driver.find_element_by_xpath('//div[@class="_3k-BhJ"][3]/table[1]/tbody/tr[2]/td[1]')
        if pro_chk.text != "Processor Type" : raise NoSuchElementException
        processor = driver.find_element_by_xpath('//div[@class="_3k-BhJ"][3]/table[1]/tbody/tr[2]/td[2]/ul/li')   
        flip_dict['Processor'].append(processor.text)
    except NoSuchElementException:
        flip_dict['Processor'].append('-')
    
    try:                                                                                
        core_chk = driver.find_element_by_xpath('//div[@class="_3k-BhJ"][3]/table[1]/tbody/tr[3]/td[1]')
        if core_chk.text != "Processor Core" :
            core_chk = driver.find_element_by_xpath('//div[@class="_3k-BhJ"][3]/table[1]/tbody/tr[2]/td[1]')
            if core_chk.text != "Processor Core" : 
                raise NoSuchElementException
            else :
                cores = driver.find_element_by_xpath('//div[@class="_3k-BhJ"][3]/table[1]/tbody/tr[2]/td[2]/ul/li')
        else :
            cores = driver.find_element_by_xpath('//div[@class="_3k-BhJ"][3]/table[1]/tbody/tr[3]/td[2]/ul/li')
        flip_dict['Processor Cores'].append(cores.text)
    except NoSuchElementException:
        flip_dict['Processor Cores'].append('-')
    
    try:
        rom = driver.find_element_by_xpath('//div[@class="_3k-BhJ"][4]/table[1]/tbody/tr[1]/td[2]/ul/li')         
        flip_dict['Storage(ROM)'].append(rom.text)
    except NoSuchElementException:
        flip_dict['Storage(ROM)'].append('-')
    
    try:
        ram = driver.find_element_by_xpath('//div[@class="_3k-BhJ"][4]/table[1]/tbody/tr[2]/td[2]/ul/li')                
        flip_dict['RAM'].append(ram.text)
    except NoSuchElementException:
        flip_dict['RAM'].append('-')
    
    try:                                                                                    
        pri_cam = driver.find_element_by_xpath('//div[@class="_3k-BhJ"][5]/table[1]/tbody/tr[2]/td[2]/ul/li')
        flip_dict['Primary Camera'].append(pri_cam.text)
    except NoSuchElementException:
        flip_dict['Primary Camera'].append('-')
    
    try:                                                                                
        cam_chk = driver.find_element_by_xpath('//div[@class="_3k-BhJ"][5]/table[1]/tbody/tr[6]/td[1]')
        if cam_chk.text != "Secondary Camera" : 
            if driver.find_element_by_xpath('//div[@class="_3k-BhJ"][5]/table[1]/tbody/tr[5]/td[1]').text == "Secondary Camera":
                sec_cam = driver.find_element_by_xpath('//div[@class="_3k-BhJ"][5]/table[1]/tbody/tr[5]/td[2]/ul/li')
            else :
                raise NoSuchElementException
        else :
            sec_cam = driver.find_element_by_xpath('//div[@class="_3k-BhJ"][5]/table[1]/tbody/tr[6]/td[2]/ul/li')
        flip_dict['Secondary Camera'].append(sec_cam.text)
    except NoSuchElementException:
        flip_dict['Secondary Camera'].append('-')
        
    try:
        if driver.find_element_by_xpath('//div[@class="_3k-BhJ"][10]/div').text != "Battery & Power Features" :
            if driver.find_element_by_xpath('//div[@class="_3k-BhJ"][9]/div').text == "Battery & Power Features" :
                bat_chk = driver.find_element_by_xpath('//div[@class="_3k-BhJ"][9]/table/tbody/tr/td[1]')
                if bat_chk.text != "Battery Capacity" : raise NoSuchElementException
                bat_cap = driver.find_element_by_xpath('//div[@class="_3k-BhJ"][9]/table/tbody/tr/td[2]/ul/li')                
            elif driver.find_element_by_xpath('//div[@class="_3k-BhJ"][8]/div').text == "Battery & Power Features" :
                bat_chk = driver.find_element_by_xpath('//div[@class="_3k-BhJ"][8]/table/tbody/tr/td[1]')
                if bat_chk.text != "Battery Capacity" : raise NoSuchElementException
                bat_cap = driver.find_element_by_xpath('//div[@class="_3k-BhJ"][8]/table/tbody/tr/td[2]/ul/li')
            else:
                raise NoSuchElementException
        else :
            bat_chk = driver.find_element_by_xpath('//div[@class="_3k-BhJ"][10]/table/tbody/tr/td[1]')
            if bat_chk.text != "Battery Capacity" : raise NoSuchElementException
            bat_cap = driver.find_element_by_xpath('//div[@class="_3k-BhJ"][10]/table/tbody/tr/td[2]/ul/li')            
        flip_dict['Battery Capacity'].append(bat_cap.text)
    except NoSuchElementException:
        flip_dict['Battery Capacity'].append('-')
    
    try:
        if driver.find_element_by_xpath('//div[@class="_3k-BhJ"][10]/div').text != "Battery & Power Features" :
            if driver.find_element_by_xpath('//div[@class="_3k-BhJ"][9]/div').text == "Battery & Power Features" :
                bat_chk = driver.find_element_by_xpath('//div[@class="_3k-BhJ"][9]/table/tbody/tr[2]/td[1]')
                if bat_chk.text != "Battery Type" : raise NoSuchElementException
                bat_typ = driver.find_element_by_xpath('//div[@class="_3k-BhJ"][9]/table/tbody/tr[2]/td[2]/ul/li')
            elif driver.find_element_by_xpath('//div[@class="_3k-BhJ"][8]/div').text == "Battery & Power Features" :
                bat_chk = driver.find_element_by_xpath('//div[@class="_3k-BhJ"][8]/table/tbody/tr[2]/td[1]')
                if bat_chk.text != "Battery Type" : raise NoSuchElementException
                bat_typ = driver.find_element_by_xpath('//div[@class="_3k-BhJ"][8]/table/tbody/tr[2]/td[2]/ul/li')
            else:
                raise NoSuchElementException
        else :
            bat_chk = driver.find_element_by_xpath('//div[@class="_3k-BhJ"][10]/table/tbody/tr[2]/td[1]')
            if bat_chk.text != "Battery Type" : raise NoSuchElementException
            bat_typ = driver.find_element_by_xpath('//div[@class="_3k-BhJ"][10]/table/tbody/tr[2]/td[2]/ul/li') 
        flip_dict['Battery Type'].append(bat_typ.text)
    except NoSuchElementException:
        flip_dict['Battery Type'].append('-')
    
                                                               

Scraping URL =  https://www.flipkart.com/samsung-guru-1200/p/itmeb6w98vmhffjy?pid=MOBDA2CFCQ6CFHTZ&lid=LSTMOBDA2CFCQ6CFHTZKIMI91&marketplace=FLIPKART&q=samsung&store=tyy%2F4io&srno=s_1_1&otracker=search&otracker1=search&fm=SEARCH&iid=ab53906d-3806-4faf-94a9-4b3d13e9cc54.MOBDA2CFCQ6CFHTZ.SEARCH&ppt=hp&ppn=homepage&ssid=mk5zrklzr40000001621077750629&qH=fe546279a62683de
Scraping URL =  https://www.flipkart.com/samsung-galaxy-f41-fusion-blue-128-gb/p/itm4769d0667cdf9?pid=MOBFV5PWG5MGD4CF&lid=LSTMOBFV5PWG5MGD4CFZ8YQJZ&marketplace=FLIPKART&q=samsung&store=tyy%2F4io&srno=s_1_2&otracker=search&otracker1=search&fm=SEARCH&iid=ab53906d-3806-4faf-94a9-4b3d13e9cc54.MOBFV5PWG5MGD4CF.SEARCH&ppt=hp&ppn=homepage&ssid=mk5zrklzr40000001621077750629&qH=fe546279a62683de
Scraping URL =  https://www.flipkart.com/samsung-galaxy-f41-fusion-green-128-gb/p/itma41850a2f9e19?pid=MOBFV5PWEX7WJS7R&lid=LSTMOBFV5PWEX7WJS7RBAY0T5&marketplace=FLIPKART&q=samsung&store=tyy%2F4io&srno=s_1_3&otracker=search&otracker1=search

Scraping URL =  https://www.flipkart.com/samsung-galaxy-f62-laser-grey-128-gb/p/itmaf3ed4dbc2ee5?pid=MOBFZWSUZGKGHMHD&lid=LSTMOBFZWSUZGKGHMHD7IV39I&marketplace=FLIPKART&q=samsung&store=tyy%2F4io&srno=s_1_23&otracker=search&otracker1=search&fm=SEARCH&iid=ab53906d-3806-4faf-94a9-4b3d13e9cc54.MOBFZWSUZGKGHMHD.SEARCH&ppt=hp&ppn=homepage&ssid=mk5zrklzr40000001621077750629&qH=fe546279a62683de
Scraping URL =  https://www.flipkart.com/samsung-galaxy-f62-laser-blue-128-gb/p/itmf82cc5e797312?pid=MOBFZWSUHXGQMUPH&lid=LSTMOBFZWSUHXGQMUPHYZDVLJ&marketplace=FLIPKART&q=samsung&store=tyy%2F4io&srno=s_1_24&otracker=search&otracker1=search&fm=SEARCH&iid=ab53906d-3806-4faf-94a9-4b3d13e9cc54.MOBFZWSUHXGQMUPH.SEARCH&ppt=hp&ppn=homepage&ssid=mk5zrklzr40000001621077750629&qH=fe546279a62683de


In [32]:
print(len(flip_dict["Brand"]), len(flip_dict["Smartphone"]), len(flip_dict["Processor"]), len(flip_dict["Price"]), len(flip_dict['URL']))

24 24 24 24 24
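
The scraping loop above finds each specification by its row position and then checks the label, which needs a separate branch for every layout variation. An alternative is to read all label/value pairs of the specification tables once and look fields up by label. A minimal sketch on plain tuples; `specs_to_dict` and `pick` are hypothetical helpers, and in the notebook the pairs would come from the `td[1]` and `td[2]` cells of each table row.

```python
from typing import Dict, Iterable, Tuple


def specs_to_dict(rows: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each specification label to its value, keeping the first occurrence."""
    specs: Dict[str, str] = {}
    for label, value in rows:
        specs.setdefault(label.strip(), value.strip())
    return specs


def pick(specs: Dict[str, str], label: str, default: str = "-") -> str:
    """Return the value for `label`, or `default` if it is absent or empty."""
    return specs.get(label) or default


rows = [("Processor Type", "Exynos 9611"),
        ("Processor Core", "Octa Core"),
        ("Battery Capacity", "6000 mAh")]
specs = specs_to_dict(rows)
print(pick(specs, "Processor Type"), pick(specs, "Battery Type"))
```

With this approach a row that moves up or down in the table no longer changes the result, because only the label text is used for the lookup.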


In [33]:
# creating pandas dataset
flip_df = pd.DataFrame.from_dict(flip_dict)
flip_df

Unnamed: 0,Brand,Smartphone,Colour,RAM,Storage(ROM),Primary Camera,Secondary Camera,Display Size,Display Resolution,Processor,Processor Cores,Battery Capacity,Battery Type,Price,URL
0,SAMSUNG,Guru 1200,Black,1000,,No,-,3.81 cm (1.5 inch),128 x 128 Pixels,-,-,800 mAh,-,"₹1,150",https://www.flipkart.com/samsung-guru-1200/p/i...
1,SAMSUNG,Galaxy F41,Fusion Blue,6 GB,128 GB,64MP + 8MP + 5MP,32MP Front Camera,16.26 cm (6.4 inch),2340 x 1080 Pixels,Exynos 9611,Octa Core,6000 mAh,-,"₹14,499",https://www.flipkart.com/samsung-galaxy-f41-fu...
2,SAMSUNG,Galaxy F41,Fusion Green,6 GB,128 GB,64MP + 8MP + 5MP,32MP Front Camera,16.26 cm (6.4 inch),2340 x 1080 Pixels,Exynos 9611,Octa Core,6000 mAh,-,"₹14,499",https://www.flipkart.com/samsung-galaxy-f41-fu...
3,SAMSUNG,Galaxy F41,Fusion Black,6 GB,128 GB,64MP + 8MP + 5MP,32MP Front Camera,16.26 cm (6.4 inch),2340 x 1080 Pixels,Exynos 9611,Octa Core,6000 mAh,-,"₹14,499",https://www.flipkart.com/samsung-galaxy-f41-fu...
4,SAMSUNG,Galaxy F12,Celestial Black,4 GB,64 GB,48MP + 5MP + 2MP + 2MP,8MP Front Camera,16.55 cm (6.515 inch),1600 x 720 Pixels,Exynos 850,Octa Core,6000 mAh,-,"₹10,999",https://www.flipkart.com/samsung-galaxy-f12-ce...
5,SAMSUNG,Galaxy F12,Sky Blue,4 GB,64 GB,48MP + 5MP + 2MP + 2MP,8MP Front Camera,16.55 cm (6.515 inch),1600 x 720 Pixels,Exynos 850,Octa Core,6000 mAh,-,"₹10,999",https://www.flipkart.com/samsung-galaxy-f12-sk...
6,SAMSUNG,Galaxy F12,Sea Green,4 GB,64 GB,48MP + 5MP + 2MP + 2MP,8MP Front Camera,16.55 cm (6.515 inch),1600 x 720 Pixels,Exynos 850,Octa Core,6000 mAh,-,"₹10,999",https://www.flipkart.com/samsung-galaxy-f12-se...
7,SAMSUNG,Galaxy F41,Fusion Green,6 GB,64 GB,64MP + 8MP + 5MP,32MP Front Camera,16.26 cm (6.4 inch),2340 x 1080 Pixels,Exynos 9611,Octa Core,6000 mAh,-,"₹13,999",https://www.flipkart.com/samsung-galaxy-f41-fu...
8,SAMSUNG,Guru FM Plus SM-B110E/D,Black,"1000, Yes",,Yes,-,3.81 cm (1.5 inch),128 x 128,-,-,800 mAh,Li-Ion,"₹1,449",https://www.flipkart.com/samsung-guru-fm-plus-...
9,SAMSUNG,Guru Music 2,Blue,Yes,,-,-,5.08 cm (2 inch),128 x 160 Pixels,-,-,800 mAh,Li-on,"₹1,710",https://www.flipkart.com/samsung-guru-music-2/...


#### Q5. Write a program to scrape the geospatial coordinates (latitude, longitude) of a city searched on Google Maps.

In [34]:
# opening google maps
driver.get("https://www.google.co.in/maps")
time.sleep(3)

city = input('Enter City Name : ')                                         
search = driver.find_element_by_id("searchboxinput")                       
search.clear()                                                             
time.sleep(2)
search.send_keys(city)                                                     
button = driver.find_element_by_id("searchbox-searchbutton")               
button.click()                                                             
time.sleep(3)

try:
    url_string = driver.current_url
    print("URL Extracted: ", url_string)
    # the coordinates sit between '@' and 'data' in the URL
    lat_lng = re.findall(r'@(.*)data',url_string)
    if len(lat_lng):
        lat_lng_list = lat_lng[0].split(",")
        if len(lat_lng_list)>=2:
            lat = lat_lng_list[0]
            lng = lat_lng_list[1]
            # printing only when both values were found
            print("Latitude = {}, Longitude = {}".format(lat, lng))

except Exception as e:
    print("Error: ", str(e))

Enter City Name : delhi
URL Extracted:  https://www.google.co.in/maps/place/Delhi/@28.6466772,76.8130591,10z/data=!3m1!4b1!4m5!3m4!1s0x390cfd5b347eb62d:0x37205b715389640!8m2!3d28.7040592!4d77.1024902
Latitude = 28.6466772, Longitude = 76.8130591
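
The pattern `@(.*)data` captures everything between `@` and `data`, including the zoom level, and relies on a later split. A tighter pattern can capture the two numbers directly and return them as floats. A minimal sketch, with `extract_lat_lng` as a hypothetical helper name and the URL layout assumed to match the one printed above:

```python
import re
from typing import Optional, Tuple

# '@', then two signed decimal numbers separated by a comma.
COORD_RE = re.compile(r"@(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?)")


def extract_lat_lng(url: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) parsed from a maps URL, or None if absent."""
    match = COORD_RE.search(url)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


url = "https://www.google.co.in/maps/place/Delhi/@28.6466772,76.8130591,10z/data=!3m1!4b1"
print(extract_lat_lng(url))
```

Returning `None` when no coordinates are present makes the "URL not yet updated" case explicit, for example when the page has not finished loading after the search.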


#### Q6. Write a program to scrape the details of all the funding deals for the second quarter (i.e. July 2020 – September 2020) from trak.in.

In [35]:
driver.get('https://trak.in/')

In [36]:
button = driver.find_element_by_xpath('//li[@id="menu-item-51510"]/a').get_attribute('href')
driver.get(button)

In [37]:
fund_dict = {}
fund_dict['Date'] = []
fund_dict['Startup Name'] = []
fund_dict['Industry/Vertical'] = []
fund_dict['Sub-Vertical'] = []
fund_dict['Location'] = []
fund_dict['Investor'] = []
fund_dict['Investment Type'] = []
fund_dict['Amount(in USD)'] = []

In [38]:
for i in range(48,51):
    driver.find_element_by_xpath('//div[@id="tablepress-{}_wrapper"]/div/label/select/option[4]'.format(i)).click()

    # Date
    dt = driver.find_elements_by_xpath('//table[@id="tablepress-{}"]/tbody/tr/td[2]'.format(i))
    for d in dt:
        fund_dict['Date'].append(d.text)

    # Startup Name
    sn = driver.find_elements_by_xpath('//table[@id="tablepress-{}"]/tbody/tr/td[3]'.format(i))
    for n in sn:
        fund_dict['Startup Name'].append(n.text)
    
    # Industry/Vertical
    ind = driver.find_elements_by_xpath('//table[@id="tablepress-{}"]/tbody/tr/td[4]'.format(i))
    for n in ind:
        fund_dict['Industry/Vertical'].append(n.text)
    
    # Sub-Vertical
    sv = driver.find_elements_by_xpath('//table[@id="tablepress-{}"]/tbody/tr/td[5]'.format(i))
    for s in sv:
        fund_dict['Sub-Vertical'].append(s.text)

    # Location
    loc = driver.find_elements_by_xpath('//table[@id="tablepress-{}"]/tbody/tr/td[6]'.format(i))
    for l in loc:
        fund_dict['Location'].append(l.text)
    
    # Investor
    inv = driver.find_elements_by_xpath('//table[@id="tablepress-{}"]/tbody/tr/td[7]'.format(i))
    for n in inv:
        fund_dict['Investor'].append(n.text)
    
    # Investment Type
    invt = driver.find_elements_by_xpath('//table[@id="tablepress-{}"]/tbody/tr/td[8]'.format(i))
    for n in invt:
        fund_dict['Investment Type'].append(n.text)
    
    # Amount
    amt = driver.find_elements_by_xpath('//table[@id="tablepress-{}"]/tbody/tr/td[9]'.format(i))
    for a in amt:
        fund_dict['Amount(in USD)'].append(a.text)
    
fund_df = pd.DataFrame(fund_dict)
fund_df

Unnamed: 0,Date,Startup Name,Industry/Vertical,Sub-Vertical,Location,Investor,Investment Type,Amount(in USD)
0,15/07/2020,Flipkart,E-commerce,E-commerce,Bangalore,Walmart Inc,M&A,1200000000
1,16/07/2020,Vedantu,EduTech,Online Tutoring,Bangalore,Coatue Management,Series D,100000000
2,16/07/2020,Crio,EduTech,Learning Platform for Developers,Bangalore,021 Capital,pre-Series A,934160
3,14/07/2020,goDutch,FinTech,Group Payments,Mumbai,"Matrix India,Y Combinator, Global Founders Cap...",Seed,1700000
4,13/07/2020,Mystifly,Airfare Marketplace,"Ticketing, Airline Retailing, and Post-Ticketi...",Singapore and Bangalore,Recruit Co. Ltd.,pre-Series B,3300000
5,09/07/2020,JetSynthesys,Gaming and Entertainment,Gaming and Entertainment,Pune,Adar Poonawalla and Kris Gopalakrishnan.,Venture-Series Unknown,400000
6,10/07/2020,gigIndia,Marketplace,"Crowd Sourcing, Freelance",Pune,Incubate Fund India and Beyond Next Ventures,pre-Series A,974200
7,15/07/2020,PumPumPum,Automotive Rental,Used Car-leasing platform,Gurgaon,Early Adapters Syndicate,Seed,292800
8,14/07/2020,FLYX,OTT Player,Streaming Social Network,New York and Delhi,"Raj Mishra, founder of AIT Global Inc",pre-Seed,200000
9,13/07/2020,Open Appliances Pvt. Ltd.,Information Technology,Internet-of-Things Security Solutions,Bangalore,Unicorn India Ventures,Venture-Series Unknown,500000
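
The tables are selected by their position on the page, so it can be worth checking that every scraped row really falls between July and September 2020. A minimal sketch of such a check, assuming the dd/mm/yyyy format shown in the Date column; `in_quarter` is a hypothetical helper name.

```python
from datetime import date, datetime


def in_quarter(date_text: str,
               start: date = date(2020, 7, 1),
               end: date = date(2020, 9, 30)) -> bool:
    """Return True if a dd/mm/yyyy date string falls within [start, end]."""
    try:
        day = datetime.strptime(date_text.strip(), "%d/%m/%Y").date()
    except ValueError:
        # Unparseable dates are treated as outside the quarter.
        return False
    return start <= day <= end


for raw in ["15/07/2020", "30/06/2020", "n/a"]:
    print(raw, in_quarter(raw))
```

On the dataframe, `fund_df[fund_df['Date'].apply(in_quarter)]` would keep only the rows inside the quarter.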


In [39]:
fund_df.to_csv("Indian Startups_Q2_2020.csv")

#### Q7. Write a program to scrape all the available details of the top 10 gaming laptops from digit.in.

In [48]:
driver.get('https://www.digit.in/')

In [50]:
#clicking on the top 10 option
top_10=driver.find_element_by_xpath("//div[@class='menu']/ul/li[3]/a")
top_10.click()

#best gaming laptops link
best_gaming=driver.find_element_by_xpath("//ul[@class='list-unstyled sidebar-list']/li[9]/a")
driver.get(best_gaming.get_attribute('href'))

#initialising lists
name = []
price = []
OS = []
display = []
processor = []
HDD = []
RAM = []
weight = []
dimension = []
GPU = []

#names
names=driver.find_elements_by_xpath("//div[@class='right-container']/div/a/h3")
for i in names:
    name.append(i.text)
    
#os
os=driver.find_elements_by_xpath("//div[@class='product-detail']/div/ul/li[1]/div/div")
for i in os:
    OS.append(i.text)
    
#display
displays=driver.find_elements_by_xpath("//div[@class='product-detail']/div/ul/li[2]/div/div")
for i in displays:
    display.append(i.text)
    
#processor
processors=driver.find_elements_by_xpath("//div[@class='product-detail']/div/ul/li[3]/div/div")
for i in processors:
    processor.append(i.text)
processor

#memory
memories=driver.find_elements_by_xpath("//div[@class='Spcs-details'][1]/table/tbody/tr[6]/td[1]")#list of specification names
memories_spec=driver.find_elements_by_xpath("//div[@class='Spcs-details'][1]/table/tbody/tr[6]/td[3]")#values of the specifications
for i in range(len(memories)):
    if memories[i].text=='Memory':
        HDD.append(memories_spec[i].text.split('/')[0])
        RAM.append(memories_spec[i].text.split('/')[1])
    else:
        HDD.append('No details available')#memory details are missing for some of the laptops
        RAM.append('No details available')#memory details are missing for some of the laptops

#weight
weights=driver.find_elements_by_xpath("//div[@class='Spcs-details']/table/tbody/tr/td[1]")#list of specification names
weight_spec=driver.find_elements_by_xpath("//div[@class='Spcs-details']/table/tbody/tr/td[3]")#values of the specifications
for i in range(len(weights)):
    if weights[i].text=='Weight':
        weight.append(weight_spec[i].text)

#dimension
dimension=[]
dims=driver.find_elements_by_xpath("//div[@class='Spcs-details']/table/tbody/tr/td[1]")#list of specification names
dims_spec=driver.find_elements_by_xpath("//div[@class='Spcs-details']/table/tbody/tr/td[3]")#values of the specifications
for i in range(len(dims)):
    if dims[i].text=='Dimension':
        dimension.append(dims_spec[i].text)

#graphical processor
GPUs=driver.find_elements_by_xpath("//div[@class='Spcs-details']/table/tbody/tr/td[1]")#list of specification names
GPUs_spec=driver.find_elements_by_xpath("//div[@class='Spcs-details']/table/tbody/tr/td[3]")#values of the specifications
for i in range(len(GPUs)):
    if GPUs[i].text=='Graphics Processor':
        GPU.append(GPUs_spec[i].text)
        
full_specs=[]
urls=driver.find_elements_by_xpath("//div[@class='full-specs']/span")#getting the urls of the full specs links
for i in urls:
    if i.get_attribute('data-href'):
        full_specs.append(i.get_attribute('data-href'))
    
for i in full_specs:#iterating through every laptop's full specs page
    driver.get(i)
    try:
        prices=driver.find_element_by_xpath("//div[@class='Block-price']/b")
        price.append(prices.text)
    except NoSuchElementException:#exception handling for missing price details
        price.append("No details available")
        
df=pd.DataFrame({"Name":name,
                "Price":price,
                "OS":OS,
                "Display":display,
                "HDD":HDD,
                 "RAM":RAM,
                "processor":processor,
                "weight":weight,
                "Dimension":dimension,
                "Graphical processor":GPU})
df

Unnamed: 0,Name,Price,OS,Display,HDD,RAM,processor,weight,Dimension,Graphical processor
0,LENOVO IDEAPAD S145,32490,WINDOWS 10 HOME,"15.6"" (1920 X 1080)",1TB HDD,4GB DDR4,7TH GENERATION CORE INTEL I3-7020U | 2.3 GHZ,1.85,362 x 251 x 20,INTEGRATED GFX
1,HP 14S,34490,WINDOWS 10 HOME,"14"" (1366 X 768)",256 GB SSD,4GB DDR4,CORE I3 10TH GEN 1005G1 | 1.2 GHZ,1.47,324 x 225.9 x 19.9,Intel Integrated UHD
2,HP 245 G7,29090,WINDOWS-10,"14"" (1366 X 768)",1 TB HDD,4 GBGB DDR4,AMD RYZEN 3-3300U | 2.1 GHZ,1.52,335 x 234 x 19.9,AMD Radeon Vega 6
3,HP 15 DB1069AU,36900,WINDOWS 10,"15.6"" (1366 X 768)",1 TB HDD,4 GBGB DDR4,3RD GEN RYZEN 3 3200U | 2.6 GHZ,2.04,245 x 361 x 18,AMD Radeon Vega 3
4,ASUS VIVOBOOK X409JA-EK011T,33986,WINDOWS 10,"14"" (1920 X 1080)",1 TB HDD,4 GBGB DDR4,10TH GEN INTEL CORE I3-1005G1 | 1.2 GHZ,1.6,216 x 325 x 23,Intel Integrated UHD Graphics


In [51]:
df.to_csv("Gaming laptops_digit.csv")