## Python Scraping Project - Flipkart Product Review


1. Go to the website
2. Search for the desired product
3. Locate the review section
4. Scrape all the review data and store it on the local system



In [1]:
pip install flask_cors

Note: you may need to restart the kernel to use updated packages.


In [2]:
# import libraries
from flask import Flask, render_template, request, jsonify
from flask_cors import CORS, cross_origin
import requests
# BeautifulSoup is a Python library used in web scraping to pull data out of HTML and XML files
from bs4 import BeautifulSoup as bs
# urlopen (from urllib.request) opens a URL and returns the server's response
from urllib.request import urlopen as uReq

In [3]:
# Search term for the product, e.g. samsung or iphone
# change it to suit your needs; here we choose iphone
flipkart_url = "https://www.flipkart.com/search?q=" + "iphone"
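Concatenating the raw search term only works for single-word queries; a term with spaces or special characters has to be URL-encoded first. A minimal sketch using the standard library's `urllib.parse.quote_plus` (the helper name `build_search_url` is hypothetical):

```python
from urllib.parse import quote_plus

BASE_SEARCH_URL = "https://www.flipkart.com/search?q="

def build_search_url(query: str) -> str:
    """Return a Flipkart search URL with the query safely URL-encoded."""
    # quote_plus turns spaces into '+' and percent-encodes reserved characters such as '&'
    return BASE_SEARCH_URL + quote_plus(query)

print(build_search_url("iphone"))          # https://www.flipkart.com/search?q=iphone
print(build_search_url("samsung galaxy"))  # https://www.flipkart.com/search?q=samsung+galaxy
```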

In [4]:
# Display the search URL built in the previous cell
flipkart_url

'https://www.flipkart.com/search?q=iphone'

In [5]:
# Open the URL (this sends an HTTP GET request)
uClient = uReq(flipkart_url)

In [6]:
# urlopen returns an http.client.HTTPResponse object
# if the URL is wrong or the server answers with an error status, urlopen raises an exception
uClient

<http.client.HTTPResponse at 0x14d76fc9790>

In [7]:
# Read the raw response body: the page's HTML as bytes
flipkartPage = uClient.read()

In [8]:
flipkartPage

b'<!doctype html><html lang="en"><head><link href="https://rukminim1.flixcart.com" rel="preconnect"/><link rel="stylesheet" href="//static-assets-web.flixcart.com/fk-p-linchpin-web/fk-cp-zion/css/app_modules.chunk.905c37.css"/><link rel="stylesheet" href="//static-assets-web.flixcart.com/fk-p-linchpin-web/fk-cp-zion/css/app.chunk.116d54.css"/><meta http-equiv="Content-type" content="text/html; charset=utf-8"/><meta http-equiv="X-UA-Compatible" content="IE=Edge"/><meta property="fb:page_id" content="102988293558"/><meta property="fb:admins" content="658873552,624500995,100000233612389"/><meta name="robots" content="noodp"/><link rel="shortcut icon" href="https://static-assets-web.flixcart.com/www/promos/new/20150528-140547-favicon-retina.ico"/><link type="application/opensearchdescription+xml" rel="search" href="/osdd.xml?v=2"/><meta property="og:type" content="website"/><meta name="og_site_name" property="og:site_name" content="Flipkart.com"/><link rel="apple-touch-icon" sizes="57x57" 

In [9]:
# Close the connection. The raw bytes above are hard to read because they are not structured
uClient.close()
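Instead of closing the response by hand, the object returned by `urlopen` can be used as a context manager, which also closes it when an exception is raised. A minimal sketch that combines this with error handling, assuming a browser-like `User-Agent` header (some sites reject the default `Python-urllib` agent; whether Flipkart does may change over time) and a 10-second timeout; `fetch_html` is a hypothetical helper name:

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

# Assumption: a browser-like User-Agent; the exact value is illustrative
HEADERS = {"User-Agent": "Mozilla/5.0"}

def fetch_html(url, timeout=10):
    """Return the raw page bytes, or None if the request fails."""
    req = Request(url, headers=HEADERS)
    try:
        # The with-block closes the response automatically, even if read() fails
        with urlopen(req, timeout=timeout) as response:
            return response.read()
    except HTTPError as err:
        # Caught first because HTTPError is a subclass of URLError:
        # the server answered, but with an error status such as 404 or 500
        print(f"HTTP error {err.code} for {url}")
    except URLError as err:
        # The resource could not be reached (bad host, no network, missing file, ...)
        print(f"Could not open {url}: {err.reason}")
    return None
```

With this helper, `flipkartPage = fetch_html(flipkart_url)` replaces the open/read/close sequence of the previous cells.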

In [10]:
# Parse the page with BeautifulSoup (bs) so the HTML can be searched and read more easily
flipkart_html = bs(flipkartPage, "html.parser")

In [11]:
# Compared with the raw bytes above, this output is a little more structured
flipkart_html

<!DOCTYPE html>
<html lang="en"><head><link href="https://rukminim1.flixcart.com" rel="preconnect"/><link href="//static-assets-web.flixcart.com/fk-p-linchpin-web/fk-cp-zion/css/app_modules.chunk.905c37.css" rel="stylesheet"/><link href="//static-assets-web.flixcart.com/fk-p-linchpin-web/fk-cp-zion/css/app.chunk.116d54.css" rel="stylesheet"/><meta content="text/html; charset=utf-8" http-equiv="Content-type"/><meta content="IE=Edge" http-equiv="X-UA-Compatible"/><meta content="102988293558" property="fb:page_id"/><meta content="658873552,624500995,100000233612389" property="fb:admins"/><meta content="noodp" name="robots"/><link href="https://static-assets-web.flixcart.com/www/promos/new/20150528-140547-favicon-retina.ico" rel="shortcut icon"/><link href="/osdd.xml?v=2" rel="search" type="application/opensearchdescription+xml"/><meta content="website" property="og:type"/><meta content="Flipkart.com" name="og_site_name" property="og:site_name"/><link href="/apple-touch-icon-57x57.png" rel
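Printing the soup object still shows the page as one long line. For a human-readable view, BeautifulSoup's `prettify()` indents the tree, and the same object can be queried directly instead of read by eye. A small sketch on a made-up HTML snippet (not Flipkart's real markup):

```python
from bs4 import BeautifulSoup as bs

# Hypothetical snippet standing in for a real page
raw = b'<div class="box"><p>iPhone 13</p><p>4.6 stars</p></div>'
soup = bs(raw, "html.parser")

# prettify() puts each tag and text string on its own indented line
print(soup.prettify())

# The parsed tree can also be searched directly
print(soup.find_all("p")[0].text)  # iPhone 13
```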

In [12]:
# find_all (findAll is its older alias) searches for every tag matching a name and attributes
# each product on the results page sits in a div with the same class
# on a given site that class is shared by every product box (Flipkart's generated class names may change over time)
# the matches are returned as a list
flipkart_html.find_all("div", {"class": "_1AtVbE col-12-12"})

[<div class="_1AtVbE col-12-12"><div class="_1KOcBL"><section class="JWMl0H _2hbLCH"><div class="_2ssEMF"><div class="_3V8rao"><span>Filters</span></div></div></section><div class="_2q_g77"><section class="_2aDURW"><div class="_2lfNTw"><span>CATEGORIES</span></div><div><div class="TB_InB"><span><svg class="_2Iqv73" height="10" viewbox="0 0 16 27" width="10" xmlns="http://www.w3.org/2000/svg"><path class="_3zK8He" d="M16 23.207L6.11 13.161 16 3.093 12.955 0 0 13.161l12.955 13.161z" fill="#fff"></path></svg></span><a class="_2qvBBJ _2Mji8F" href="/mobiles-accessories/pr?sid=tyy&amp;q=iphone&amp;otracker=categorytree" title="Mobiles &amp; Accessories">Mobiles &amp; Accessories</a></div></div><div><div class="TB_InB"><span><svg class="_2Iqv73" height="10" viewbox="0 0 16 27" width="10" xmlns="http://www.w3.org/2000/svg"><path class="" d="M16 23.207L6.11 13.161 16 3.093 12.955 0 0 13.161l12.955 13.161z" fill="#fff"></path></svg></span><a class="_1jJQdf _2Mji8F" href="/mobiles/pr?sid=tyy,4io

In [13]:
# Count the boxes: every box shares the same class, so the length is the number of boxes on the page (not all of them are products)
len(flipkart_html.find_all("div", {"class": "_1AtVbE col-12-12"}))

30
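Passing `{"class": "_1AtVbE col-12-12"}` matches the exact class attribute string, so the same classes in a different order, or with an extra class added, would not match. A CSS selector via `select()` matches on the individual classes instead. A small comparison on a made-up snippet:

```python
from bs4 import BeautifulSoup as bs

html = """
<div class="a b">first</div>
<div class="b a">second</div>
<div class="a b extra">third</div>
"""
soup = bs(html, "html.parser")

# Exact string match on the whole class attribute: only "a b" qualifies
exact = soup.find_all("div", {"class": "a b"})
print([d.text for d in exact])   # ['first']

# CSS selector: any div carrying both classes, in any order, extra classes allowed
by_css = soup.select("div.a.b")
print([d.text for d in by_css])  # ['first', 'second', 'third']
```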

In [14]:
# Store the 30 boxes in bigboxes
bigboxes = flipkart_html.find_all("div", {"class": "_1AtVbE col-12-12"})

In [15]:
# Loop through bigboxes and print each one to inspect it
for i in bigboxes:
    print(i)

<div class="_1AtVbE col-12-12"><div class="_1KOcBL"><section class="JWMl0H _2hbLCH"><div class="_2ssEMF"><div class="_3V8rao"><span>Filters</span></div></div></section><div class="_2q_g77"><section class="_2aDURW"><div class="_2lfNTw"><span>CATEGORIES</span></div><div><div class="TB_InB"><span><svg class="_2Iqv73" height="10" viewbox="0 0 16 27" width="10" xmlns="http://www.w3.org/2000/svg"><path class="_3zK8He" d="M16 23.207L6.11 13.161 16 3.093 12.955 0 0 13.161l12.955 13.161z" fill="#fff"></path></svg></span><a class="_2qvBBJ _2Mji8F" href="/mobiles-accessories/pr?sid=tyy&amp;q=iphone&amp;otracker=categorytree" title="Mobiles &amp; Accessories">Mobiles &amp; Accessories</a></div></div><div><div class="TB_InB"><span><svg class="_2Iqv73" height="10" viewbox="0 0 16 27" width="10" xmlns="http://www.w3.org/2000/svg"><path class="" d="M16 23.207L6.11 13.161 16 3.093 12.955 0 0 13.161l12.955 13.161z" fill="#fff"></path></svg></span><a class="_1jJQdf _2Mji8F" href="/mobiles/pr?sid=tyy,4io&

In [16]:
# Inspect the first box; it holds the filter sidebar, not a product
bigboxes[0]

<div class="_1AtVbE col-12-12"><div class="_1KOcBL"><section class="JWMl0H _2hbLCH"><div class="_2ssEMF"><div class="_3V8rao"><span>Filters</span></div></div></section><div class="_2q_g77"><section class="_2aDURW"><div class="_2lfNTw"><span>CATEGORIES</span></div><div><div class="TB_InB"><span><svg class="_2Iqv73" height="10" viewbox="0 0 16 27" width="10" xmlns="http://www.w3.org/2000/svg"><path class="_3zK8He" d="M16 23.207L6.11 13.161 16 3.093 12.955 0 0 13.161l12.955 13.161z" fill="#fff"></path></svg></span><a class="_2qvBBJ _2Mji8F" href="/mobiles-accessories/pr?sid=tyy&amp;q=iphone&amp;otracker=categorytree" title="Mobiles &amp; Accessories">Mobiles &amp; Accessories</a></div></div><div><div class="TB_InB"><span><svg class="_2Iqv73" height="10" viewbox="0 0 16 27" width="10" xmlns="http://www.w3.org/2000/svg"><path class="" d="M16 23.207L6.11 13.161 16 3.093 12.955 0 0 13.161l12.955 13.161z" fill="#fff"></path></svg></span><a class="_1jJQdf _2Mji8F" href="/mobiles/pr?sid=tyy,4io&

In [17]:
box = bigboxes[0]

In [18]:
box

<div class="_1AtVbE col-12-12"><div class="_1KOcBL"><section class="JWMl0H _2hbLCH"><div class="_2ssEMF"><div class="_3V8rao"><span>Filters</span></div></div></section><div class="_2q_g77"><section class="_2aDURW"><div class="_2lfNTw"><span>CATEGORIES</span></div><div><div class="TB_InB"><span><svg class="_2Iqv73" height="10" viewbox="0 0 16 27" width="10" xmlns="http://www.w3.org/2000/svg"><path class="_3zK8He" d="M16 23.207L6.11 13.161 16 3.093 12.955 0 0 13.161l12.955 13.161z" fill="#fff"></path></svg></span><a class="_2qvBBJ _2Mji8F" href="/mobiles-accessories/pr?sid=tyy&amp;q=iphone&amp;otracker=categorytree" title="Mobiles &amp; Accessories">Mobiles &amp; Accessories</a></div></div><div><div class="TB_InB"><span><svg class="_2Iqv73" height="10" viewbox="0 0 16 27" width="10" xmlns="http://www.w3.org/2000/svg"><path class="" d="M16 23.207L6.11 13.161 16 3.093 12.955 0 0 13.161l12.955 13.161z" fill="#fff"></path></svg></span><a class="_1jJQdf _2Mji8F" href="/mobiles/pr?sid=tyy,4io&

In [19]:
len(bigboxes)

30

In [20]:
# Delete the first 3 boxes; the first one holds the filter sidebar rather than a product listing
del bigboxes[0:3]

In [21]:
bigboxes[0]

<div class="_1AtVbE col-12-12"><div class="_13oc-S"><div data-id="MOBG6VF5SMXPNQHG" style="width:100%"><div class="_2kHMtA"><a class="_1fQZEK" href="/apple-iphone-13-blue-128-gb/p/itm6c601e0a58b3c?pid=MOBG6VF5SMXPNQHG&amp;lid=LSTMOBG6VF5SMXPNQHGL5FN51&amp;marketplace=FLIPKART&amp;q=iphone&amp;store=tyy%2F4io&amp;srno=s_1_2&amp;otracker=search&amp;fm=organic&amp;iid=bc5c7ee5-830b-43ce-9246-aa0029f99c76.MOBG6VF5SMXPNQHG.SEARCH&amp;ppt=None&amp;ppn=None&amp;ssid=bsxfyaju4g0000001659379587201&amp;qH=0b3f45b266a97d70" rel="noopener noreferrer" target="_blank"><div class="MIXNux"><div class="_2QcLo-"><div><div class="CXW8mj" style="height:200px;width:200px"><img alt="APPLE iPhone 13 (Blue, 128 GB)" class="_396cs4 _3exPp9" src="https://rukminim1.flixcart.com/image/312/312/ktketu80/mobile/2/y/o/iphone-13-mlpk3hn-a-apple-original-imag6vpyur6hjngg.jpeg?q=70"/></div></div></div><div class="_3wLduG"><div class="_3PzNI-"><span class="f3A4_V"><label class="_2iDkf8"><input class="_30VH1S" readonly=""
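Deleting the first three boxes by position assumes the page layout never changes. An alternative sketch keeps only the boxes that contain a product element; it assumes, based on the output above, that each product listing contains a `div` with a `data-id` attribute (Flipkart's markup may change). The helper `product_boxes` and the sample HTML with its `ID1`/`ID2` values are hypothetical:

```python
from bs4 import BeautifulSoup as bs

# Simplified, made-up results page: one filter box, two products, one pagination box
html = """
<div class="_1AtVbE col-12-12"><section>Filters</section></div>
<div class="_1AtVbE col-12-12"><div data-id="ID1"><a href="/p/one">iPhone 13</a></div></div>
<div class="_1AtVbE col-12-12"><div data-id="ID2"><a href="/p/two">iPhone 12</a></div></div>
<div class="_1AtVbE col-12-12"><nav>Page 1 of 10</nav></div>
"""

def product_boxes(soup):
    """Return only the boxes that hold a product listing (contain a div with data-id)."""
    boxes = soup.find_all("div", {"class": "_1AtVbE col-12-12"})
    # attrs={"data-id": True} matches any div that has a data-id attribute, whatever its value
    return [box for box in boxes if box.find("div", attrs={"data-id": True})]

soup = bs(html, "html.parser")
products = product_boxes(soup)
print(len(products))          # 2
print(products[0].a["href"])  # /p/one
```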

In [22]:
# Select the 4th original box, now at index 0 after the deletion
box = bigboxes[0]

In [23]:
# Extract the product URL
# tag attributes work like dictionary keys:
# indexing the <a> tag with 'href' returns the link as its value
box.div.div.div.a['href']

'/apple-iphone-13-blue-128-gb/p/itm6c601e0a58b3c?pid=MOBG6VF5SMXPNQHG&lid=LSTMOBG6VF5SMXPNQHGL5FN51&marketplace=FLIPKART&q=iphone&store=tyy%2F4io&srno=s_1_2&otracker=search&fm=organic&iid=bc5c7ee5-830b-43ce-9246-aa0029f99c76.MOBG6VF5SMXPNQHG.SEARCH&ppt=None&ppn=None&ssid=bsxfyaju4g0000001659379587201&qH=0b3f45b266a97d70'

In [24]:
# The href is a relative path, so prepend the website's base URL
productLink = "https://www.flipkart.com" + box.div.div.div.a['href']

In [25]:
# This link leads to the product page
productLink

'https://www.flipkart.com/apple-iphone-13-blue-128-gb/p/itm6c601e0a58b3c?pid=MOBG6VF5SMXPNQHG&lid=LSTMOBG6VF5SMXPNQHGL5FN51&marketplace=FLIPKART&q=iphone&store=tyy%2F4io&srno=s_1_2&otracker=search&fm=organic&iid=bc5c7ee5-830b-43ce-9246-aa0029f99c76.MOBG6VF5SMXPNQHG.SEARCH&ppt=None&ppn=None&ssid=bsxfyaju4g0000001659379587201&qH=0b3f45b266a97d70'
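Two fragile spots here are the long `box.div.div.div.a` chain, which breaks if a wrapper `div` is added, and string concatenation, which produces a wrong URL if the `href` is ever absolute. A hedged alternative searches for the first `<a>` that has an `href` and resolves it with `urllib.parse.urljoin`; the helper `product_link` is hypothetical and the sample box is simplified from the output above:

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup as bs

BASE_URL = "https://www.flipkart.com"

def product_link(box):
    """Return the absolute URL of the first link inside a product box, or None."""
    anchor = box.find("a", href=True)  # first <a> tag that has an href attribute
    if anchor is None:
        return None
    # urljoin resolves relative paths ("/...") and leaves absolute URLs untouched
    return urljoin(BASE_URL, anchor["href"])

box = bs(
    '<div><div><a href="/apple-iphone-13-blue-128-gb/p/itm6c601e0a58b3c?pid=MOBG6VF5SMXPNQHG">iPhone</a></div></div>',
    "html.parser",
)
print(product_link(box))
```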

In [26]:
# Request the product page, which contains the reviews
prodRes = requests.get(productLink)

In [27]:
prodRes

<Response [200]>
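`<Response [200]>` means this request succeeded, but a script should not assume that every request will. A minimal sketch of a safer request, assuming a browser-like `User-Agent` and a 10-second timeout (both illustrative choices); `get_product_page` is a hypothetical name:

```python
import requests

HEADERS = {"User-Agent": "Mozilla/5.0"}  # assumption: browser-like agent

def get_product_page(url, timeout=10):
    """Return the decoded HTML of a page, or None if the request fails."""
    try:
        response = requests.get(url, headers=HEADERS, timeout=timeout)
        # Turn 4xx/5xx status codes into an exception instead of parsing an error page
        response.raise_for_status()
    except requests.RequestException as err:
        # Base class for connection errors, timeouts, invalid URLs and HTTP error statuses
        print(f"Request failed: {err}")
        return None
    response.encoding = "utf-8"  # same decoding choice as the notebook
    return response.text
```

A call such as `get_product_page("not-a-url")` prints the error and returns `None` instead of stopping the notebook.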

In [28]:
# Decode the response body as UTF-8 so characters such as ’ display correctly
prodRes.encoding = 'utf-8'
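Setting `encoding` matters because `prodRes.text` decodes the raw bytes with whatever encoding requests picked; if the server's `Content-Type` header omits a charset, requests may fall back to ISO-8859-1 for text responses. The effect on a character that appears in the reviews, the curly apostrophe `’`, can be shown with plain Python:

```python
# The right single quotation mark used in the reviews ("It’s")
raw = "It’s".encode("utf-8")   # the UTF-8 bytes a server would send: b'It\xe2\x80\x99s'

wrong = raw.decode("latin-1")  # ISO-8859-1 maps each byte to one character
right = raw.decode("utf-8")

print(repr(wrong))  # 'Itâ\x80\x99s' (mojibake)
print(repr(right))  # 'It’s'
```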

In [29]:
# Parse the whole product page
prod_html = bs(prodRes.text, "html.parser")

In [30]:
prod_html

<!DOCTYPE html>
<html lang="en"><head><link href="https://rukminim1.flixcart.com" rel="preconnect"/><link href="//static-assets-web.flixcart.com/fk-p-linchpin-web/fk-cp-zion/css/app_modules.chunk.905c37.css" rel="stylesheet"/><link href="//static-assets-web.flixcart.com/fk-p-linchpin-web/fk-cp-zion/css/app.chunk.116d54.css" rel="stylesheet"/><meta content="text/html; charset=utf-8" http-equiv="Content-type"/><meta content="IE=Edge" http-equiv="X-UA-Compatible"/><meta content="102988293558" property="fb:page_id"/><meta content="658873552,624500995,100000233612389" property="fb:admins"/><meta content="noodp" name="robots"/><link href="https://static-assets-web.flixcart.com/www/promos/new/20150528-140547-favicon-retina.ico" rel="shortcut icon"/><link href="/osdd.xml?v=2" rel="search" type="application/opensearchdescription+xml"/><meta content="website" property="og:type"/><meta content="Flipkart.com" name="og_site_name" property="og:site_name"/><link href="/apple-touch-icon-57x57.png" rel

In [31]:
# Find the review boxes using the class responsible for reviews
commentboxes = prod_html.find_all('div', {'class': "_16PBlm"})

In [32]:
commentboxes

[<div class="_16PBlm"><div class="col"><div class="col _2wzgFH"><div class="row"><div class="_3LWZlK _1BLPMq">5</div><p class="_2-N8zT">Brilliant</p></div><div class="row"><div class="t-ZTKy"><div><div class="">I switch to ios from android after 10 years so this review might help for migrators<br/><br/>It’s been a month using the iPhone13 and this was my experience<br/><br/>1. Design - its simple and no nonsense design . Expect white and pink rest of the colours are fingerprint magnets.  I have seen all the colours and I highly recommend the pink . It’s so light pink which makes it not girlish. See it for yourself it really looks so premium in light pink colour. <br/><br/>For 

In [33]:
# number of comments
len(commentboxes)

11

In [34]:
# Inspect the first review box to understand its structure
commentboxes[0]

<div class="_16PBlm"><div class="col"><div class="col _2wzgFH"><div class="row"><div class="_3LWZlK _1BLPMq">5</div><p class="_2-N8zT">Brilliant</p></div><div class="row"><div class="t-ZTKy"><div><div class="">I switch to ios from android after 10 years so this review might help for migrators<br/><br/>It’s been a month using the iPhone13 and this was my experience<br/><br/>1. Design - its simple and no nonsense design . Expect white and pink rest of the colours are fingerprint magnets.  I have seen all the colours and I highly recommend the pink . It’s so light pink which makes it not girlish. See it for yourself it really looks so premium in light pink colour. <br/><br/>For r

In [35]:
# Drill down to the inner div that holds the rating, title, review text and reviewer name
commentboxes[0].div.div

<div class="col _2wzgFH"><div class="row"><div class="_3LWZlK _1BLPMq">5</div><p class="_2-N8zT">Brilliant</p></div><div class="row"><div class="t-ZTKy"><div><div class="">I switch to ios from android after 10 years so this review might help for migrators<br/><br/>It’s been a month using the iPhone13 and this was my experience<br/><br/>1. Design - its simple and no nonsense design . Expect white and pink rest of the colours are fingerprint magnets.  I have seen all the colours and I highly recommend the pink . It’s so light pink which makes it not girlish. See it for yourself it really looks so premium in light pink colour. <br/><br/>For rest of it except white the aluminium f

In [36]:
# Find the <p> tag with the reviewer-name class inside that div and extract the name
commentboxes[0].div.div.find_all('p', {'class': '_2sc7ZR _2V5EHH'})[0].text

'Mahim Chauhan'
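Navigating by position (`.div.div.div.div`) and indexing `[0]` raise errors as soon as one review is laid out differently. A sketch that looks each field up by class name instead, returning `None` for a missing field; the class names are copied from the HTML output above and will break whenever Flipkart regenerates them. The functions `text_of` and `parse_review` and the simplified sample HTML are hypothetical:

```python
from bs4 import BeautifulSoup as bs

def text_of(parent, name, cls):
    """Return the stripped text of the first <name> tag carrying class cls, or None."""
    tag = parent.find(name, class_=cls)  # class_ matches any one of a tag's classes
    # separator=" " keeps words apart where the page uses <br/> line breaks
    return tag.get_text(separator=" ", strip=True) if tag else None

def parse_review(box):
    """Extract one review from a review box (class names taken from the page; may change)."""
    return {
        "name": text_of(box, "p", "_2sc7ZR"),
        "rating": text_of(box, "div", "_3LWZlK"),
        "title": text_of(box, "p", "_2-N8zT"),
        "review": text_of(box, "div", "t-ZTKy"),
    }

# Simplified review box modelled on the output above
sample = bs(
    '<div class="_16PBlm"><div class="col"><div class="col _2wzgFH">'
    '<div class="row"><div class="_3LWZlK _1BLPMq">5</div><p class="_2-N8zT">Brilliant</p></div>'
    '<div class="row"><div class="t-ZTKy"><div><div class="">First line<br/>Second line</div></div></div></div>'
    '<div class="row"><p class="_2sc7ZR _2V5EHH">Mahim Chauhan</p></div>'
    '</div></div></div>',
    "html.parser",
)
print(parse_review(sample))
```

The separator also avoids run-together text such as "migratorsIt’s" that plain `.text` produces when `<br/>` tags are dropped.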

In [37]:
# Extract all the reviews (the last matching box is skipped)
for commentbox in commentboxes[:-1]:
    inner = commentbox.div.div
    # Reviewer name
    print(inner.find_all('p', {'class': '_2sc7ZR _2V5EHH'})[0].text)
    # Star rating
    print(inner.div.div.text)
    # Short opinion on the product (review title)
    print(inner.div.p.text)
    # Full review text
    comtag = inner.find_all('div', {'class': ''})
    print(comtag[0].div.text)
    print("\n")

Mahim Chauhan
5
Brilliant
I switch to ios from android after 10 years so this review might help for migratorsIt’s been a month using the iPhone13 and this was my experience1. Design - its simple and no nonsense design . Expect white and pink rest of the colours are fingerprint magnets.  I have seen all the colours and I highly recommend the pink . It’s so light pink which makes it not girlish. See it for yourself it really looks so premium in light pink colour. For rest of it except white the aluminium frame wil...


Vaibhav  Raj
5
Fabulous!
Amazing beast....As expected , didn't disappoint me,Had to sell hard chunk of kidneys to get it !!;pCamera quality is definitely a super upgradeBattery is super.. easily last throughout the day with heavy usage.Light weight looks stylish what else you need??Starlight color just wow!!!Apple it would have been better if you should  have given an adaptor. Increase 2k price and give it in box!!!Simple ....Edit 1 : After 14 days of usage highly satisfie
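Step 4 of the plan, storing the data on the local system, is not covered by the loop, which only prints. A minimal sketch that writes review dictionaries to a CSV file with the standard `csv` module; the file name `reviews.csv`, the column names and the helper `save_reviews` are assumptions, and the two rows are abbreviated from the output above:

```python
import csv

def save_reviews(reviews, path="reviews.csv"):
    """Write a list of review dicts to a CSV file and return the number of rows written."""
    fieldnames = ["name", "rating", "title", "review"]
    # newline="" prevents blank lines between rows on Windows;
    # utf-8 keeps characters such as ’ intact
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(reviews)
    return len(reviews)

reviews = [
    {"name": "Mahim Chauhan", "rating": "5", "title": "Brilliant",
     "review": "I switch to ios from android after 10 years ..."},
    {"name": "Vaibhav  Raj", "rating": "5", "title": "Fabulous!",
     "review": "Amazing beast....As expected , didn't disappoint me ..."},
]
print(save_reviews(reviews))  # 2
```

Collecting one dictionary per review (rather than printing) is what makes this step possible; the same list could also be loaded into a spreadsheet or a DataFrame.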