# Web Scraping 


**Why Web Scraping Is Needed**


---


Suppose you need to collect some information from a website. Doing this by hand means copying and pasting the data the site displays, a tedious job that can take many hours or sometimes days to complete.

**What is Web Scraping**


---


Web scraping is a technique for extracting large amounts of data from websites automatically. The extracted data is saved to a local file on your computer or to a database.


Webpages --> Web Scraping --> (XML, CSV, Database)

**Steps of Web Scraping**


---




1.   Document Load
2.   Parsing
3.   Extraction
4.   Transformation
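
The four steps can be illustrated with a minimal sketch. It uses only the standard library's `html.parser` rather than any of the packages named in this document, and a literal string stands in for the loaded page so that it runs offline. The HTML snippet, its `name` and `price` class names, the `FieldParser` class, and the product names are all made up for illustration.

```python
from html.parser import HTMLParser

# 1. Document load: a literal string stands in for a downloaded page (assumption).
PAGE_HTML = """
<div class="product"><div class="name">Phone A</div><div class="price">₹37,999</div></div>
<div class="product"><div class="name">Phone B</div><div class="price">₹61,999</div></div>
"""


class FieldParser(HTMLParser):
    """2. Parsing: walk the tags and remember which wanted field is open."""

    def __init__(self):
        super().__init__()
        self.current = None   # class of the div being read, if it is a wanted field
        self.records = []     # 3. Extraction: raw (field, text) pairs

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "div" and cls in ("name", "price"):
            self.current = cls

    def handle_data(self, data):
        # Keep the text only when it sits inside a wanted field.
        if self.current and data.strip():
            self.records.append((self.current, data.strip()))
            self.current = None


parser = FieldParser()
parser.feed(PAGE_HTML)

# 4. Transformation: pair names with prices and convert prices to integers.
names = [text for field, text in parser.records if field == "name"]
prices = [
    int(text.lstrip("₹").replace(",", ""))
    for field, text in parser.records
    if field == "price"
]
products = list(zip(names, prices))
print(products)
```

A library such as Beautiful Soup replaces the hand-written parser class with a single search call, which is why the demo later in this document is much shorter.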



**Python Packages for Web Scraping**


---




*   Pattern
*   Scrapy
*   Mechanize
*   Beautiful Soup
*   Requests







# Demo - Scrape Flipkart

***Problem Statement***


---


Scrape the Flipkart website and fetch the **name**, **price**, and **rating** of each product returned for a search term.

```python
# import the modules needed for web scraping
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq

# web page to scrape
my_url = 'https://www.flipkart.com/search?q=iphone&otracker=start&as-show=on&as=off'
# open a connection and fetch the page
uClient = uReq(my_url)
# read the page's HTML
page_html = uClient.read()
# close the connection
uClient.close()
# parse the HTML
page_soup = soup(page_html, 'html.parser')

# the page is very large, so keep only the parts that are needed:
# the div tags whose class is 'col col-7-12'
containers = page_soup.find_all('div', {'class': 'col col-7-12'})
# print("Total containers :", len(containers))

# prettify() returns the HTML in an indented, readable form
# print(containers[0].prettify())

# --------------------------------------------------
# the class names below are specific to the page as scraped
# and may differ if the site changes

# all div tags that contain a product name
names = page_soup.find_all('div', {'class': '_4rR01T'})
# get_text() returns the text content of a tag
print('Name :', names[0].get_text())

# all div tags that contain a price
prices = page_soup.find_all('div', {'class': '_30jeq3 _1_WHN1'})
# .text is equivalent to get_text()
print('Price :', prices[0].text)

# all div tags that contain a rating
ratings = page_soup.find_all('div', {'class': '_3LWZlK'})
print('Rating :', ratings[0].text)
```

```
Name : APPLE iPhone 12 mini (Black, 64 GB)
Price : ₹37,999
Rating : 4.5
```
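
The name, price, and rating printed above are all strings. As a hedged sketch of the transformation step, the snippet below converts the price and rating to numbers and lets the standard `csv` module quote any field that contains a comma, so the product name can be kept unchanged. The helper names `clean_price` and `to_csv` are hypothetical, and the sketch assumes prices are whole rupee amounts.

```python
import csv
import io


def clean_price(text):
    """Turn a price string such as '₹37,999' into the integer 37999.

    Assumes a whole-number price: every non-digit character is dropped.
    """
    digits = "".join(ch for ch in text if ch.isdigit())
    return int(digits)


def to_csv(rows):
    """Serialize (name, price, rating) string tuples to CSV text.

    csv.writer quotes any field that contains a comma, so names stay intact.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Product_Name", "Pricing", "Ratings"])
    for name, price, rating in rows:
        writer.writerow([name, clean_price(price), float(rating)])
    return buffer.getvalue()


# Values taken from the output shown above.
rows = [("APPLE iPhone 12 mini (Black, 64 GB)", "₹37,999", "4.5")]
csv_text = to_csv(rows)
print(csv_text)
```

Writing to an in-memory `io.StringIO` keeps the sketch free of side effects. To write a real file, the same writer can be given a file opened with `newline=''` and `encoding='utf-8'`.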


```python
# store all details in a CSV file
iphones_file = 'iphones_details.csv'

# utf-8 so that the ₹ symbol can be written whatever the platform default is
with open(iphones_file, 'w', encoding='utf-8') as f:
    # header row of the file
    f.write('Product_Name,Pricing,Ratings\n')

    # zip stops at the shortest list, so a missing element cannot raise IndexError
    for name, price, rating in zip(names, prices, ratings):
        # a comma separates columns and \n ends a row,
        # so remove the commas inside the product name and the price
        row = name.text.replace(', ', '|') + ',' + price.text.replace(',', '') + ',' + rating.text
        f.write(row + '\n')
        print(row)
```

APPLE iPhone 12 mini (Black|64 GB),₹37999,4.5
APPLE iPhone 13 ((PRODUCT)RED|128 GB),₹61999,4.7
APPLE iPhone 13 (Midnight|128 GB),₹61999,4.7
APPLE iPhone 11 (Red|64 GB),₹37999,4.6
APPLE iPhone 11 (Black|64 GB),₹38999,4.6
APPLE iPhone 13 (Green|128 GB),₹61999,4.7
APPLE iPhone 12 mini (White|64 GB),₹37999,4.5
APPLE iPhone 11 (White|64 GB),₹38999,4.6
APPLE iPhone 11 (Red|128 GB),₹43999,4.6
APPLE iPhone 11 (White|128 GB),₹45999,4.6
APPLE iPhone 11 (Black|128 GB),₹45999,4.6
APPLE iPhone 11 (Purple|64 GB),₹39999,4.6
APPLE iPhone 11 (Purple|128 GB),₹45999,4.6
APPLE iPhone 11 (Green|64 GB),₹39999,4.6
APPLE iPhone 13 (Blue|128 GB),₹61999,4.7
APPLE iPhone 12 mini (Black|128 GB),₹42999,4.5
APPLE iPhone 12 mini (Blue|64 GB),₹37999,4.5
APPLE iPhone 13 Pro (Graphite|256 GB),₹129900,4.6
APPLE iPhone 14 Pro Max (Deep Purple|128 GB),₹139900,4.7
APPLE iPhone 13 (Blue|512 GB),₹83999,4.7
APPLE iPhone 14 Plus (Starlight|128 GB),₹83990,4.5
APPLE iPhone 13 (Pink|128 GB),₹61999,4.7
APPLE iPhone 13 (Midnight|51