## Web Scraping

**Web scraping is an automated method used to extract large amounts of data from websites.**

The data found on websites is often unstructured. Web scraping is a technique for collecting that data and storing it in a structured form.


When you run web scraping code, it sends a request to the URL you specify. The website responds by returning the page content, usually HTML (sometimes XML). The code then reads that page and extracts the required data from it.
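The request, response, and extraction cycle described above can be sketched with only the standard library. This is a minimal illustration, not the method used later in this notebook: `fetch_html` performs the request step with `urllib.request`, and the hypothetical `TitleExtractor` class uses `html.parser` to pull the text out of `<h2>` tags, a tag chosen purely for the example. The usage line parses an inline string so it runs without network access.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def fetch_html(url: str) -> str:
    """Send a GET request and return the response body as text (request/response step)."""
    # A User-Agent header is set because some sites reject the default Python one.
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class TitleExtractor(HTMLParser):
    """Hypothetical extractor: collects the text inside every <h2> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_h2 = False

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False

    def handle_data(self, data):
        if self._in_h2:
            self.titles[-1] += data


def extract_titles(html: str) -> list:
    """Extraction step: parse the HTML and return the cleaned <h2> texts."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return [t.strip() for t in parser.titles]


# Usage with an inline page instead of fetch_html(...), so no network is needed.
sample = "<html><body><h2> First item </h2><p>x</p><h2>Second item</h2></body></html>"
print(extract_titles(sample))  # ['First item', 'Second item']
```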

#### Basic steps to follow in any web scraping code:

1. Find the URL (address) of the web page you want to scrape
2. Inspect the page and find the data you want to extract
3. Write the logic for extracting the data
4. Store the extracted data in a structured form (e.g. a Pandas DataFrame)
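The four steps can be sketched as a small pipeline. This is a hypothetical skeleton, not code from this notebook: `run_scraper`, `fake_fetch`, and `extract_items` are illustrative names, the page content is a made-up string so the example runs offline, and Python's `csv` module stands in for the Pandas DataFrame mentioned in step 4. Step 2 (inspecting the page) is manual; its result is the pattern that the extraction logic looks for.

```python
import csv
import io
import re
from typing import Callable, Iterable


def run_scraper(
    url: str,
    fetch: Callable[[str], str],
    extract: Callable[[str], Iterable[dict]],
    fieldnames: list,
) -> str:
    """Hypothetical skeleton of the four steps; returns the rows as CSV text."""
    html = fetch(url)            # Step 1: request the page at the chosen URL
    rows = list(extract(html))   # Steps 2-3: extraction logic written after inspecting the page
    out = io.StringIO()          # Step 4: store the rows in a structured (CSV) form
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def fake_fetch(url: str) -> str:
    # Stand-in for a real download so the example runs offline.
    return "<li>A|100</li><li>B|200</li>"


def extract_items(html: str):
    # A regex keeps the example short; an HTML parser is the better tool for real pages.
    for name, price in re.findall(r"<li>(.*?)\|(.*?)</li>", html):
        yield {"name": name, "price": price}


csv_text = run_scraper("https://example.com/items", fake_fetch, extract_items, ["name", "price"])
print(csv_text)
```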

### Performing web scraping with multiple libraries

**Web scraping using Selenium and Beautiful Soup**

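**Selenium:** Selenium is a library for automating web browsers and is widely used for testing web applications. Because it drives a real browser, the page source it returns includes content generated by JavaScript.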

**BeautifulSoup:** Beautiful Soup is a Python package for parsing HTML and XML documents. It builds a parse tree from the page source, which makes it easy to search for and extract data.

**Pandas:** Pandas is a library for data manipulation and analysis. Here it is used to organize the scraped data into a table (DataFrame) and save it in the desired format.

**Requests:** Requests lets you send HTTP/1.1 requests easily. You don't need to manually add query strings to your URLs or form-encode your POST data.
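To illustrate those two conveniences, the sketch below builds requests with `requests.Request(...).prepare()` without sending them, so no network access is used. The `example.com` URLs and the parameter names are placeholders, not anything Flipkart uses.

```python
import requests

# GET: the params dict is encoded into the query string for you.
get_req = requests.Request(
    "GET",
    "https://example.com/search",
    params={"q": "washing machine", "page": 2},
).prepare()
print(get_req.url)  # https://example.com/search?q=washing+machine&page=2

# POST: a dict passed as data is form-encoded into the request body.
post_req = requests.Request(
    "POST",
    "https://example.com/login",
    data={"user": "demo", "item": "washing machine"},
).prepare()
print(post_req.body)                     # user=demo&item=washing+machine
print(post_req.headers["Content-Type"])  # application/x-www-form-urlencoded
```

In normal use you would call `requests.get(url, params=...)` or `requests.post(url, data=...)` directly, which prepare and send the request in one step.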

# Let's scrape washing machine data from the Flipkart website

### Suppose we want information such as each machine's name, price, and rating

In [None]:
#pip install selenium

In [4]:
#pip install beautifulsoup4

In [5]:
from bs4 import BeautifulSoup
from selenium import webdriver
import requests
import pandas as pd

In [7]:
# URL of the Flipkart listing page for fully automatic front-load washing machines

url = "https://www.flipkart.com/home-kitchen/home-appliances/washing-machines/fully-automatic-front-load~function/pr?sid=j9e%2Cabm%2C8qx&otracker=nmenu_sub_TVs%20%26%20Appliances_0_Fully%20Automatic%20Front%20Load"

In [9]:
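driver = webdriver.Chrome()   # launch a Chrome browser controlled by Selenium
driver.get(url)               # load the page (waits for the page's load event)
content = driver.page_source  # HTML of the page as rendered in the browser
driver.quit()                 # close the browser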

**Create empty lists to store the scraped data.**

In [12]:
Name=[]  #List to store names 
Price=[]   #List to store the price 
Rating=[] #List to store ratings 

In [14]:
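# `content` already holds the HTML rendered by Selenium above; display it.
# Alternative without a browser (works only if the data is present in the raw HTML):
# content = requests.get(url).content
content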

'<html lang="en"><head><script type="text/javascript" async="async" src="https://flipkart.d1.sc.omtrdc.net/id?d_visid_ver=1.5.4&amp;callback=s_c_il%5B0%5D._setAnalyticsFields&amp;mcorgid=17EB401053DAF4840A490D4C%40AdobeOrg&amp;mid=29889359884644254163012640867163219269"></script><style id="react-native-stylesheet"></style><script type="text/javascript" async="async" src="https://dpm.demdex.net/id?d_visid_ver=1.5.4&amp;d_rtbd=json&amp;d_ver=2&amp;d_orgid=17EB401053DAF4840A490D4C%40AdobeOrg&amp;d_nsid=0&amp;d_cb=s_c_il%5B0%5D._setMarketingCloudFields"></script><link href="https://rukminim2.flixcart.com" rel="preconnect"><link rel="stylesheet" href="//static-assets-web.flixcart.com/fk-p-linchpin-web/fk-cp-zion/css/atlas.chunk.f9cc90.css"><link rel="stylesheet" href="//static-assets-web.flixcart.com/fk-p-linchpin-web/fk-cp-zion/css/app_modules.chunk.c48a12.css"><link rel="stylesheet" href="//static-assets-web.flixcart.com/fk-p-linchpin-web/fk-cp-zion/css/app.chunk.f32dbb.css"><meta http-eq

In [16]:
#Store the HTML page in 'soup', a BeautifulSoup object
soup = BeautifulSoup(content, "html.parser")

soup.prettify()

'<html lang="en">\n <head>\n  <script async="async" src="https://flipkart.d1.sc.omtrdc.net/id?d_visid_ver=1.5.4&amp;callback=s_c_il%5B0%5D._setAnalyticsFields&amp;mcorgid=17EB401053DAF4840A490D4C%40AdobeOrg&amp;mid=29889359884644254163012640867163219269" type="text/javascript">\n  </script>\n  <style id="react-native-stylesheet">\n  </style>\n  <script async="async" src="https://dpm.demdex.net/id?d_visid_ver=1.5.4&amp;d_rtbd=json&amp;d_ver=2&amp;d_orgid=17EB401053DAF4840A490D4C%40AdobeOrg&amp;d_nsid=0&amp;d_cb=s_c_il%5B0%5D._setMarketingCloudFields" type="text/javascript">\n  </script>\n  <link href="https://rukminim2.flixcart.com" rel="preconnect"/>\n  <link href="//static-assets-web.flixcart.com/fk-p-linchpin-web/fk-cp-zion/css/atlas.chunk.f9cc90.css" rel="stylesheet"/>\n  <link href="//static-assets-web.flixcart.com/fk-p-linchpin-web/fk-cp-zion/css/app_modules.chunk.c48a12.css" rel="stylesheet"/>\n  <link href="//static-assets-web.flixcart.com/fk-p-linchpin-web/fk-cp-zion/css/app.ch

In [57]:
links = soup.find_all(class_='CGtC98')  # every element with class 'CGtC98' (product links)
links

[<a class="CGtC98" href="/samsung-7-kg-5-star-ai-wi-fi-digital-inverter-fully-automatic-front-load-washing-machine-in-built-heater-silver/p/itmb2a2336207b22?pid=WMNG87FQVRVSZ5DF&amp;lid=LSTWMNG87FQVRVSZ5DFL8G7RI&amp;marketplace=FLIPKART&amp;store=j9e%2Fabm%2F8qx&amp;srno=b_1_1&amp;otracker=nmenu_sub_TVs%20%26%20Appliances_0_Fully%20Automatic%20Front%20Load&amp;iid=en_P3166lAX8gCk1aTE3ymZNyn0OFuX7Od49xPjIWDQfxRrG2UMImhqHwV1zkvZTvEg2AMIiFHLLCb-UZAiPDYy5Q%3D%3D&amp;ssid=niyvjsekds0000001722299045361" rel="noopener noreferrer" target="_blank"><div class="Otbq5D" data-tkid="ADVIEW_en_P3166lAX8gCk1aTE3ymZNyn0OFuX7Od49xPjIWDQfxRrG2UMImhqHwV1zkvZTvEg2AMIiFHLLCb-UZAiPDYy5Q=="><div class="yPq5Io"><div><div class="_4WELSP" style="height: 200px; width: 200px;"><img alt="SAMSUNG 7 kg 5 Star, AI, Wi-Fi, Digital Inverter Fully Automatic Front Load Washing Machine with In-bu..." class="DByuf4" loading="eager" src="https://rukminim2.flixcart.com/image/312/312/xif0q/washing-machine-new/i/3/r/-original-i
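The `href` values in the output above are relative paths, so following them requires joining each one with the site's base URL. Below is a minimal sketch using `urllib.parse.urljoin` on a trimmed-down, made-up copy of the markup; `BASE_URL` assumes the links are relative to `https://www.flipkart.com`. Class names such as `CGtC98` appear to be generated by the site, so expect to re-inspect the page if the selectors stop matching.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.flipkart.com"  # assumption: base for the relative hrefs

# Hypothetical markup shaped like the output above, heavily shortened.
html = """
<div>
  <a class="CGtC98" href="/samsung-washing-machine/p/itm1?pid=A1">Samsung</a>
  <a class="CGtC98" href="/lg-washing-machine/p/itm2?pid=B2">LG</a>
  <a class="other" href="/cart">Cart</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
# find_all with class_ matches any <a> whose class list contains 'CGtC98'.
product_links = [urljoin(BASE_URL, a["href"]) for a in soup.find_all("a", class_="CGtC98")]
print(product_links)
```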

In [59]:
# Each product card has class 'yKfJKb'; print the name, price and rating from each one
for a in soup.find_all(class_='yKfJKb'):

    name = a.find('div', attrs={'class': 'KzDlHZ'})
    price = a.find('div', attrs={'class': 'Nx9bqj _4b5DiR'})
    rating = a.find('div', attrs={'class': 'XQDdHH'})
    print("Names of Washing Machines", name.text)
    print("Price", price.text)
    print("Rating", rating.text)

Names of Washing Machines SAMSUNG 7 kg 5 Star, AI, Wi-Fi, Digital Inverter Fully Automatic Front Load Washing Machine with In-bu...
Price ₹30,990
Rating 4.4
Names of Washing Machines SAMSUNG 8 kg Fully Automatic Front Load Washing Machine with In-built Heater Black, Grey
Price ₹35,990
Rating 4.4
Names of Washing Machines Voltas Beko by A Tata Product 6.5 kg 5 star Fully Automatic Front Load Washing Machine with In-built H...
Price ₹21,990
Rating 3.9
Names of Washing Machines IFB 8 kg 5 Star with Steam Refresh program, 9 Swirl Wash, Eco Inverter, Touch Panel with AI Fully Auto...
Price ₹33,990
Rating 4.3
Names of Washing Machines IFB 8 kg 5 Star with Steam Refresh program, 9 Swirl Wash, Eco Inverter, Touch Panel with AI Fully Auto...
Price ₹32,990
Rating 4.3
Names of Washing Machines LG 7 kg 5 Star with Steam, Inverter Direct Drive Technology, 6 Motion DD, Touch Panel and 1200 RPM Ful...
Price ₹29,990
Rating 4.4
Names of Washing Machines IFB 7 kg 5 Star 2X Power Steam,Hard Water Wash Fu
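The loop above calls `.text` directly, so it would raise `AttributeError` as soon as a card lacks one of the elements, because `find()` returns `None` in that case. Also, `attrs={'class': 'Nx9bqj _4b5DiR'}` matches only when the class attribute is exactly that string in that order; Beautiful Soup's documentation recommends a CSS selector for matching several classes. The sketch below combines both fixes with `select_one`; the `text_or_none` helper and the markup (including the reversed class order in the first card) are made up for illustration.

```python
from bs4 import BeautifulSoup


def text_or_none(tag, selector):
    """Return the stripped text of the first match for a CSS selector, or None if absent."""
    found = tag.select_one(selector)
    return found.get_text(strip=True) if found is not None else None


# Hypothetical markup: the second card has no rating element.
html = """
<div class="yKfJKb">
  <div class="KzDlHZ">Machine A</div>
  <div class="_4b5DiR Nx9bqj">₹30,990</div>
  <div class="XQDdHH">4.4</div>
</div>
<div class="yKfJKb">
  <div class="KzDlHZ">Machine B</div>
  <div class="Nx9bqj _4b5DiR">₹21,990</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
rows = []
for card in soup.select("div.yKfJKb"):
    rows.append((
        text_or_none(card, "div.KzDlHZ"),
        text_or_none(card, "div.Nx9bqj._4b5DiR"),  # both classes, in any order
        text_or_none(card, "div.XQDdHH"),
    ))
print(rows)
```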

In [61]:
import numpy as np

for a in soup.find_all(class_='yKfJKb'):

    name = a.find('div', attrs={'class': 'KzDlHZ'})
    price = a.find('div', attrs={'class': 'Nx9bqj _4b5DiR'})
    rating = a.find('div', attrs={'class': 'XQDdHH'})

    # find() returns None when an element is missing (e.g. an unrated product),
    # so append np.nan instead of calling .text on None
    Name.append(name.text if name is not None else np.nan)
    Price.append(price.text if price is not None else np.nan)
    Rating.append(rating.text if rating is not None else np.nan)


In [63]:
len(Name)

24

In [65]:
len(Price)

24

In [67]:
len(Rating)

24

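All three lists have 24 entries because each loop iteration appends exactly one value to every list, using `np.nan` as a placeholder when an element is missing. This keeps the i-th name, price, and rating from the same product card aligned. Equal lengths are also required to build a DataFrame from the lists: pandas raises a `ValueError` if the columns differ in length.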
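A small demonstration with made-up values (`Machine A` and `Machine B` are placeholders): the aligned lists, with `np.nan` filling the missing rating, build a DataFrame, while dropping the missing value instead leaves a shorter column that pandas rejects.

```python
import numpy as np
import pandas as pd

# Parallel lists kept aligned: np.nan fills the slot of a missing rating.
names = ["Machine A", "Machine B"]
prices = ["₹30,990", "₹21,990"]
ratings = ["4.4", np.nan]

df = pd.DataFrame({"Name": names, "Price": prices, "Rating": ratings})
print(df.shape)  # (2, 3)

# Skipping the missing rating instead would leave a shorter list.
try:
    pd.DataFrame({"Name": names, "Price": prices, "Rating": ["4.4"]})
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)  # True
```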

### Store the scraped data in Comma-separated values (CSV format)

In [78]:
df = pd.DataFrame({'Popular Washing Machines' : Name,
                   'Price' : Price,
                   'Rating' : Rating})
df.to_csv('Washing_Machines.csv', index=False, encoding='utf-8')  # write without the index column
# Read the data back from Washing_Machines.csv
data = pd.read_csv('Washing_Machines.csv')
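Passing `index=False` matters when the file is read back. A sketch with a one-row, made-up DataFrame, using an in-memory `io.StringIO` buffer instead of a file: with the default `index=True`, the index is written as an unnamed first column, which `read_csv` turns into an extra `Unnamed: 0` column.

```python
import io

import pandas as pd

df = pd.DataFrame({"Popular Washing Machines": ["Machine A"], "Price": ["₹30,990"], "Rating": [4.4]})

# Default to_csv writes the index as an unnamed first column...
with_index = pd.read_csv(io.StringIO(df.to_csv()))
print(list(with_index.columns))  # ['Unnamed: 0', 'Popular Washing Machines', 'Price', 'Rating']

# ...while index=False writes only the data columns, so the round trip is clean.
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))
print(list(without_index.columns))  # ['Popular Washing Machines', 'Price', 'Rating']
```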

In [80]:
data.head()

|   | Popular Washing Machines | Price | Rating |
|---|---|---|---|
| 0 | SAMSUNG 7 kg 5 Star, AI, Wi-Fi, Digital Invert... | ₹30,990 | 4.4 |
| 1 | SAMSUNG 8 kg Fully Automatic Front Load Washin... | ₹35,990 | 4.4 |
| 2 | Voltas Beko by A Tata Product 6.5 kg 5 star Fu... | ₹21,990 | 3.9 |
| 3 | IFB 8 kg 5 Star with Steam Refresh program, 9 ... | ₹33,990 | 4.3 |
| 4 | IFB 8 kg 5 Star with Steam Refresh program, 9 ... | ₹32,990 | 4.3 |
