# Web Scraping Flipkart using Beautiful Soup

#### Reviews of the Samsung Galaxy A30 on Flipkart are scraped using Beautiful Soup

Importing all the necessary libraries

In [1]:
import pandas as pd
import requests
from bs4 import BeautifulSoup

The review page is requested with requests.get() and the HTTP response is stored in the variable response

In [2]:
response = requests.get("https://www.flipkart.com/samsung-galaxy-a30-black-64-gb/product-reviews/itmfec2hqbxcmbzn?pid=MOBFE4CSBDN9XETN&lid=LSTMOBFE4CSBDN9XETNK6F9XA&marketplace=FLIPKART&page=1")
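The review URL carries the page number in its page query parameter (page=1 above). Assuming Flipkart serves further review pages by changing only that parameter (not verified here), a minimal sketch for generating the URLs of pages 1 to N with the standard library could look like this. review_page_urls is a hypothetical helper name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def review_page_urls(first_page_url: str, last_page: int) -> list[str]:
    """Return URLs for review pages 1..last_page (hypothetical helper).

    Assumes the site paginates reviews through a "page" query parameter.
    """
    parts = urlsplit(first_page_url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    urls = []
    for page in range(1, last_page + 1):
        # Drop any existing "page" parameter and append the new one at the end
        new_params = [(k, v) for k, v in params if k != "page"] + [("page", str(page))]
        urls.append(urlunsplit(parts._replace(query=urlencode(new_params))))
    return urls


base = ("https://www.flipkart.com/samsung-galaxy-a30-black-64-gb/product-reviews/"
        "itmfec2hqbxcmbzn?pid=MOBFE4CSBDN9XETN&lid=LSTMOBFE4CSBDN9XETNK6F9XA"
        "&marketplace=FLIPKART&page=1")
urls = review_page_urls(base, 3)
for url in urls:
    print(url)
```

Each URL could then be fetched and parsed exactly like the first page, ideally with a short pause between requests.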

Checking the status code: 200 means the request succeeded, so we can go ahead with the scraping

In [3]:
response

<Response [200]>
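Displaying response only shows the status to a human reader. In a script, a minimal sketch of making the check explicit is to call raise_for_status(), which raises requests.HTTPError for 4xx and 5xx responses. The User-Agent value and the 10-second timeout are assumptions that the original notebook does not set, and fetch_review_page is a hypothetical helper name.

```python
import requests

# Assumed browser-like header; some sites may reject the default python-requests agent
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}


def fetch_review_page(url: str, timeout: float = 10.0) -> bytes:
    """Download a page and return its raw bytes, failing loudly on HTTP errors."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx status codes; a 200 passes silently
    response.raise_for_status()
    return response.content
```

The returned bytes can be passed straight to BeautifulSoup, just like response.content.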

Parsing the HTML with Beautiful Soup and viewing the content of the page. The prettify() method puts each tag on its own line and indents it by nesting depth, which improves readability.

In [4]:
soup = BeautifulSoup(response.content,"html.parser")
print(soup.prettify())

<!DOCTYPE html>
<html lang="en">
 <head>
  <link href="https://rukminim1.flixcart.com" rel="dns-prefetch"/>
  <link href="https://img1a.flixcart.com" rel="dns-prefetch"/>
  <link href="//img1a.flixcart.com/www/linchpin/fk-cp-zion/css/app.chunk.219133.css" rel="stylesheet"/>
  <link as="image" href="//img1a.flixcart.com/www/linchpin/fk-cp-zion/img/fk-logo_9fddff.png" rel="preload"/>
  <meta content="text/html; charset=utf-8" http-equiv="Content-type"/>
  <meta content="IE=Edge" http-equiv="X-UA-Compatible"/>
  <meta content="102988293558" property="fb:page_id"/>
  <meta content="658873552,624500995,100000233612389" property="fb:admins"/>
  <meta content="noodp" name="robots"/>
  <link href="https://img1a.flixcart.com/www/promos/new/20150528-140547-favicon-retina.ico" rel="shortcut icon">
   <link href="/osdd.xml?v=2" rel="search" type="application/opensearchdescription+xml"/>
   <meta content="website" property="og:type"/>
   <meta content="Flipkart.com" name="og_site_name" property="og
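Printing the whole prettified page produces a very long output (truncated above). To see what parsing and prettify() do on something small, here is a sketch on a tiny hand-written page; the markup and title are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for response.content: requests returns raw bytes,
# and Beautiful Soup detects the encoding itself when given bytes
html_bytes = (
    "<html><head><title>Galaxy A30 reviews</title></head>"
    "<body><div class='qwjRop'>good display</div></body></html>"
).encode("utf-8")

soup = BeautifulSoup(html_bytes, "html.parser")
pretty = soup.prettify()  # one tag or string per line, indented by nesting depth
print(pretty)

page_title = soup.title.get_text()
print(page_title)
```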

Getting the reviews available on the page. Each review body sits in a div with the class qwjRop.

In [5]:
review_content = soup.find_all("div",class_="qwjRop")
review_content

[<div class="qwjRop"><div><div class="">good display amazing samsung good job decent price better samsung</div><span class="_2jRR3v"><span>READ MORE</span></span></div></div>,
 <div class="qwjRop"><div><div class="">Awesome phone. which is great for this price. loved it's back cam❤❤, the phone is so cool and lightweight 😎</div><span class="_2jRR3v"><span>READ MORE</span></span></div></div>,
 <div class="qwjRop"><div><div class="">Best mid range smart phone <br/>camera -4/5<br/>display-4/5<br/>performance-4.5/5<br/>design-5/5<br/>battery-4.5/5 <br/>I bought it for my mom she is really happy about this product and she enjoy to play games like candy crush .</div><span class="_2jRR3v"><span>READ MORE</span></span></div></div>,
 <div class="qwjRop"><div><div class="">Bought this for my mother and she is loving it. Tested Camera and performance. It's good for the price. Display is just mind blowing. Super AMLOED is awesome.</div><span class="_2jRR3v"><span>READ MORE</span></span></div></div>
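The class_ argument (with a trailing underscore, because class is a Python keyword) filters tags by CSS class. The sketch below runs the same lookup offline on hand-written markup shaped like the output above, and compares it with the equivalent CSS selector. Class names such as qwjRop look auto-generated by the site's front end, so they may change over time; it is worth re-checking them in the browser's developer tools before rerunning the notebook.

```python
from bs4 import BeautifulSoup

# Hand-written markup mimicking the structure of the review divs shown above
html = """
<div class="qwjRop"><div><div class="">good display amazing samsung</div>
<span class="_2jRR3v"><span>READ MORE</span></span></div></div>
<div class="qwjRop"><div><div class="">Awesome phone</div>
<span class="_2jRR3v"><span>READ MORE</span></span></div></div>
<div class="other">not a review</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Keyword-argument filter on the class attribute
by_find_all = soup.find_all("div", class_="qwjRop")
# Equivalent CSS selector
by_select = soup.select("div.qwjRop")
print(len(by_find_all), len(by_select))  # 2 2
```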

Extracting the text of each review (without the tags) and storing it in a list

In [6]:
# Visible text of each review div, with the HTML tags dropped
review_body = [div.get_text() for div in review_content]
review_body

['good display amazing samsung good job decent price better samsungREAD MORE',
 "Awesome phone. which is great for this price. loved it's back cam❤❤, the phone is so cool and lightweight 😎READ MORE",
 'Best mid range smart phone camera -4/5display-4/5performance-4.5/5design-5/5battery-4.5/5 I bought it for my mom she is really happy about this product and she enjoy to play games like candy crush .READ MORE',
 "Bought this for my mother and she is loving it. Tested Camera and performance. It's good for the price. Display is just mind blowing. Super AMLOED is awesome.READ MORE",
 'Good mobile for this price.Battery drains just like that. Not even stood for a single day, even I use less apps and not playing any games. Sound is very low while playing music or other sounds in loudspeaker but on Bluetooth headset or earphones Dolby Atmos plays superb role. Camera quality is awesome. They given many options like Pro mode, Panaroma, live shoot which can blur the background while shooting and a
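The extracted text shows two side effects of the default get_text(): the line breaks from the br tags vanish ("camera -4/5display-4/5") and the READ MORE button text is appended. A sketch of an alternative, assuming the button is always the span with class _2jRR3v as in the output above, removes that span with decompose() and joins the text fragments with a space.

```python
from bs4 import BeautifulSoup

# One review div, shaped like the output above (text shortened)
html = ('<div class="qwjRop"><div><div class="">camera -4/5<br/>display-4/5</div>'
        '<span class="_2jRR3v"><span>READ MORE</span></span></div></div>')
review = BeautifulSoup(html, "html.parser").find("div", class_="qwjRop")

# Default: text fragments are glued together with no separator
raw_text = review.get_text()
print(raw_text)  # camera -4/5display-4/5READ MORE

# Remove the "READ MORE" button (and everything inside it) from the tree
for button in review.find_all("span", class_="_2jRR3v"):
    button.decompose()

# Join the remaining fragments with a space and trim whitespace around each one
clean_text = review.get_text(separator=" ", strip=True)
print(clean_text)  # camera -4/5 display-4/5
```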

Removing the trailing "READ MORE" label from each review

In [7]:
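# str.removesuffix (Python 3.9+) drops the exact "READ MORE" suffix; rstrip('READ MORE')
# would instead strip any trailing run of the characters R, E, A, D, M, O and space
review_body = [body.removesuffix("READ MORE") for body in review_body]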
review_body

['good display amazing samsung good job decent price better samsung',
 "Awesome phone. which is great for this price. loved it's back cam❤❤, the phone is so cool and lightweight 😎",
 'Best mid range smart phone camera -4/5display-4/5performance-4.5/5design-5/5battery-4.5/5 I bought it for my mom she is really happy about this product and she enjoy to play games like candy crush .',
 "Bought this for my mother and she is loving it. Tested Camera and performance. It's good for the price. Display is just mind blowing. Super AMLOED is awesome.",
 'Good mobile for this price.Battery drains just like that. Not even stood for a single day, even I use less apps and not playing any games. Sound is very low while playing music or other sounds in loudspeaker but on Bluetooth headset or earphones Dolby Atmos plays superb role. Camera quality is awesome. They given many options like Pro mode, Panaroma, live shoot which can blur the background while shooting and after the shoot. Attached some pics. 
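Stripping the label with str.rstrip('READ MORE') is fragile: rstrip does not remove a suffix, it keeps removing trailing characters as long as they belong to the set {R, E, A, D, M, O, space}. It works for the reviews shown because they end in lowercase letters or punctuation, but a made-up review ending in capital letters shows the difference.

```python
suffix = "READ MORE"
review = "Value for money. GOOD" + suffix  # made-up review ending in capitals

# rstrip treats its argument as a set of characters, so "OOD" is eaten as well
stripped = review.rstrip(suffix)
print(stripped)  # Value for money. G

# removesuffix (Python 3.9+) removes the exact suffix once, if present
removed = review.removesuffix(suffix)
print(removed)  # Value for money. GOOD
```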

Creating a DataFrame to store the collected data

In [8]:
df = pd.DataFrame()
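Creating an empty DataFrame and then assigning a column works; a slightly more direct option is to pass a dict mapping the column name to the list of reviews straight to the constructor. The two strings below are shortened copies of reviews from above.

```python
import pandas as pd

# Shortened copies of two cleaned reviews from above
review_body = [
    "good display amazing samsung good job decent price better samsung",
    "Awesome phone. which is great for this price.",
]

# Build the DataFrame and its column in one call
df = pd.DataFrame({"ReviewContent": review_body})
print(df.shape)  # (2, 1)
```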

Creating a column in the DataFrame and viewing it

In [9]:
df["ReviewContent"] = review_body
df

|   | ReviewContent |
|---|---|
| 0 | good display amazing samsung good job decent p... |
| 1 | Awesome phone. which is great for this price. ... |
| 2 | Best mid range smart phone camera -4/5display-... |
| 3 | Bought this for my mother and she is loving it... |
| 4 | Good mobile for this price.Battery drains just... |
| 5 | Let me drill down into the Pros and Cons.Must ... |
| 6 | Excellent mobile from samsung .1.AMOLED  DISP... |
| 7 | Iam using 2 months complete mobile is awesome ... |
| 8 | SAMSUNG GALAXY A30 is Good Product and After 2... |
| 9 | very nic& excellent mobile..but one thing this... |


The DataFrame is written to a CSV file at the specified location.

In [10]:
df.to_csv(r"C:\Users\My PC\Desktop\Ratings.csv",index=True)
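Because index=True writes the row index as a first column without a header, reading the file back with a plain pd.read_csv() produces an extra column named Unnamed: 0. A sketch of the round trip, using a temporary directory in place of the Windows desktop path, shows how index_col=0 restores the original layout.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"ReviewContent": ["good display amazing samsung", "Awesome phone."]})

# A temporary directory stands in for the desktop path used above
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "Ratings.csv")
    df.to_csv(csv_path, index=True)  # index is written as an unnamed first column

    # Without index_col, that unnamed column comes back as "Unnamed: 0"
    naive = pd.read_csv(csv_path)
    # index_col=0 turns the first column back into the index
    restored = pd.read_csv(csv_path, index_col=0)

print(list(naive.columns))     # ['Unnamed: 0', 'ReviewContent']
print(list(restored.columns))  # ['ReviewContent']
```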