# Web Scraping

### Amazon product review
https://www.amazon.in/New-Apple-iPhone-XR-128GB/dp/B08L89NSQK/ref=cm_cr_srp_d_product_top?ie=UTF8


### Libraries for web scraping

In [1]:
from bs4 import BeautifulSoup 
import requests

In [2]:
url='https://www.amazon.in/New-Apple-iPhone-XR-128GB/product-reviews/B08L89NSQK/ref=cm_cr_getr_mb_paging_btm_4?ie=UTF8&reviewerType=all_reviews&pageNumber=4'
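The URL above asks for page 4 of the review listing through its `pageNumber` query parameter. One way to reach other pages is to rewrite only that parameter with the standard library's `urllib.parse`. This is a minimal sketch that assumes the remaining parameters can stay unchanged. It also leaves the `ref=..._4` path segment untouched, on the assumption that this segment does not decide which page is served. `with_page` is a hypothetical helper name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

URL = ('https://www.amazon.in/New-Apple-iPhone-XR-128GB/product-reviews/B08L89NSQK/'
       'ref=cm_cr_getr_mb_paging_btm_4?ie=UTF8&reviewerType=all_reviews&pageNumber=4')


def with_page(url: str, page: int) -> str:
    """Return `url` with its pageNumber query parameter set to `page` (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep every other query parameter in its original order, then set pageNumber
    query = [(key, value) for key, value in parse_qsl(parts.query) if key != 'pageNumber']
    query.append(('pageNumber', str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))


# URLs for the first three pages (assumption: the site pages by pageNumber only)
for page in range(1, 4):
    print(with_page(URL, page))
```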

### Sending Requests

In [3]:
link=requests.get(url)

# Check that the request succeeded
link.raise_for_status()  # raises requests.HTTPError for a 4xx/5xx status; no exception means the server answered successfully
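`raise_for_status()` only checks the HTTP status code. Amazon may answer automated clients with a robot-check page that still has status 200, and requests sent with the default `python-requests` User-Agent may be more likely to receive it. The sketch below reuses one `requests.Session` with browser-like headers and applies a timeout to every request. The header values are illustrative assumptions, not required values, and `make_session` and `fetch_html` are hypothetical names.

```python
import requests

# Illustrative browser-like headers (assumption: any realistic values will do)
HEADERS = {
    'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                   'AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/120.0 Safari/537.36'),
    'Accept-Language': 'en-IN,en;q=0.9',
}


def make_session() -> requests.Session:
    """Create a session that sends HEADERS with every request."""
    session = requests.Session()
    session.headers.update(HEADERS)
    return session


def fetch_html(session: requests.Session, url: str, timeout: float = 10.0) -> str:
    """GET `url` and return the body; raises requests.HTTPError on a 4xx/5xx status."""
    response = session.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text
```

A usage example would be `html = fetch_html(make_session(), url)`. Reusing the session across several pages also reuses the underlying connection.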

### Parsing the HTML with BeautifulSoup

In [4]:
# Parse the downloaded HTML with Python's built-in 'html.parser' backend
soup= BeautifulSoup(link.text,'html.parser')
soup


In [5]:
print(soup.prettify())


### Customer names

In [6]:
names= soup.find_all('span',class_='a-profile-name')

In [7]:
# Extract the text of each name tag
cust_name = [tag.get_text() for tag in names]

In [8]:
cust_name

['Darshan Sanghvi',
 'abrakca',
 'MK',
 'Nikhil Singh',
 'Amazon Customer',
 'Yugal Parmar',
 'sujesh',
 'Aditya Sharma',
 'Sunil',
 'Amazon Customer',
 'nusrat',
 'Ravee']

#### Removing the two extra names

In [9]:
cust_name.pop(4)

'Amazon Customer'

In [10]:
cust_name

['Darshan Sanghvi',
 'abrakca',
 'MK',
 'Nikhil Singh',
 'Yugal Parmar',
 'sujesh',
 'Aditya Sharma',
 'Sunil',
 'Amazon Customer',
 'nusrat',
 'Ravee']

In [11]:
cust_name.pop(8)

'Amazon Customer'
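The two `pop()` calls work only because the second index was adjusted by hand. After `pop(4)`, every later item shifts left by one, so `pop(8)` removes the item that was originally at index 9. The sketch below filters on the original indices in a single pass, which avoids that bookkeeping. `drop_indices` is a hypothetical helper, and the name list is copied from the output of `In [8]`.

```python
def drop_indices(items, indices):
    """Return a new list without the items at the given original positions."""
    drop = set(indices)
    return [item for pos, item in enumerate(items) if pos not in drop]


# Names as printed by In [8], before any pop() calls
names_before = ['Darshan Sanghvi', 'abrakca', 'MK', 'Nikhil Singh',
                'Amazon Customer', 'Yugal Parmar', 'sujesh', 'Aditya Sharma',
                'Sunil', 'Amazon Customer', 'nusrat', 'Ravee']

# Original positions 4 and 9 correspond to pop(4) followed by pop(8)
names_after = drop_indices(names_before, [4, 9])
print(names_after)
```

Calling it with `[4, 9]` on the `ratings` list would likewise reproduce the two `pop()` calls applied to it further below.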

### Review Title

In [12]:
title= soup.find_all('a',class_='review-title-content')
title

[<a class="a-size-base a-link-normal review-title a-color-base review-title-content a-text-bold" data-hook="review-title" href="/gp/customer-reviews/R34IG6M0V1260J?ASIN=B08L89NSQK">
 <span>Seems to be refurbished phone</span>
 </a>,
 <a class="a-size-base a-link-normal review-title a-color-base review-title-content a-text-bold" data-hook="review-title" href="/gp/customer-reviews/R1DWQBAK108IBN?ASIN=B08L89NSQK">
 <span>Defective product</span>
 </a>,
 <a class="a-size-base a-link-normal review-title a-color-base review-title-content a-text-bold" data-hook="review-title" href="/gp/customer-reviews/R2JAUEE8QP74TX?ASIN=B08L89NSQK">
 <span>Awesome Iphone XR - Worth buying 2021</span>
 </a>,
 <a class="a-size-base a-link-normal review-title a-color-base review-title-content a-text-bold" data-hook="review-title" href="/gp/customer-reviews/R2Z1LCMYQSOXOV?ASIN=B08L89NSQK">
 <span>I Phone XR - White</span>
 </a>,
 <a class="a-size-base a-link-normal review-title a-color-base review-title-content

In [13]:
# Extract the text of each review-title link
review_title = [tag.get_text() for tag in title]

In [14]:
review_title

['\nSeems to be refurbished phone\n',
 '\nDefective product\n',
 '\nAwesome Iphone XR - Worth buying 2021\n',
 '\nI Phone XR - White\n',
 '\nValue for money phone\n',
 '\nXR Tried and tested model\n',
 '\nBeautiful phone\n',
 '\nIphone XR in 2021\n',
 '\nBest purchase\n',
 '\nIt’s an iPhone and it lives up to its reputation.\n']

#### Trimming leading and trailing newlines

In [15]:
# Strip leading newlines
review_title[:]= [i.lstrip('\n') for i in review_title]

In [16]:
review_title

['Seems to be refurbished phone\n',
 'Defective product\n',
 'Awesome Iphone XR - Worth buying 2021\n',
 'I Phone XR - White\n',
 'Value for money phone\n',
 'XR Tried and tested model\n',
 'Beautiful phone\n',
 'Iphone XR in 2021\n',
 'Best purchase\n',
 'It’s an iPhone and it lives up to its reputation.\n']

In [17]:
review_title[:]= [i.rstrip('\n') for i in review_title]

In [18]:
review_title

['Seems to be refurbished phone',
 'Defective product',
 'Awesome Iphone XR - Worth buying 2021',
 'I Phone XR - White',
 'Value for money phone',
 'XR Tried and tested model',
 'Beautiful phone',
 'Iphone XR in 2021',
 'Best purchase',
 'It’s an iPhone and it lives up to its reputation.']
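`str.strip()` with no argument removes all leading and trailing whitespace in one call, including newlines, spaces, and tabs. That means the separate `lstrip('\n')` and `rstrip('\n')` passes can be merged into one. BeautifulSoup's `get_text(strip=True)` trims in a similar way at extraction time. Here is a small check on two titles copied from the output of `In [14]`, plus a shortened review body:

```python
raw = ['\nSeems to be refurbished phone\n',
       '\nDefective product\n',
       '\n\n  After use of 2 days i face issues in my iphone.\n\n']

# strip() removes every kind of leading/trailing whitespace, not just '\n'
clean = [s.strip() for s in raw]
print(clean)
```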

### Ratings given by customers

In [19]:
rating= soup.find_all('i',class_= 'review-rating')
rating

[<i class="a-icon a-icon-star a-star-5 review-rating" data-hook="review-star-rating-view-point"><span class="a-icon-alt">5.0 out of 5 stars</span></i>,
 <i class="a-icon a-icon-star a-star-1 review-rating" data-hook="review-star-rating-view-point"><span class="a-icon-alt">1.0 out of 5 stars</span></i>,
 <i class="a-icon a-icon-star a-star-1 review-rating" data-hook="review-star-rating"><span class="a-icon-alt">1.0 out of 5 stars</span></i>,
 <i class="a-icon a-icon-star a-star-1 review-rating" data-hook="review-star-rating"><span class="a-icon-alt">1.0 out of 5 stars</span></i>,
 <i class="a-icon a-icon-star a-star-4 review-rating" data-hook="review-star-rating"><span class="a-icon-alt">4.0 out of 5 stars</span></i>,
 <i class="a-icon a-icon-star a-star-5 review-rating" data-hook="review-star-rating"><span class="a-icon-alt">5.0 out of 5 stars</span></i>,
 <i class="a-icon a-icon-star a-star-5 review-rating" data-hook="review-star-rating"><span class="a-icon-alt">5.0 out of 5 stars</sp

In [20]:
# Extract the text of each rating icon, e.g. '5.0 out of 5 stars'
ratings = [tag.get_text() for tag in rating]

In [21]:
ratings

['5.0 out of 5 stars',
 '1.0 out of 5 stars',
 '1.0 out of 5 stars',
 '1.0 out of 5 stars',
 '4.0 out of 5 stars',
 '5.0 out of 5 stars',
 '5.0 out of 5 stars',
 '5.0 out of 5 stars',
 '5.0 out of 5 stars',
 '5.0 out of 5 stars',
 '4.0 out of 5 stars',
 '4.0 out of 5 stars']
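The ratings are strings such as `'5.0 out of 5 stars'`. For analysis it is usually handier to keep just the number. This minimal sketch assumes every rating string starts with the numeric score followed by a space, as all twelve values above do:

```python
ratings_text = ['5.0 out of 5 stars', '1.0 out of 5 stars', '4.0 out of 5 stars']

# The score is the first whitespace-separated token of each string
scores = [float(r.split()[0]) for r in ratings_text]
print(scores)  # [5.0, 1.0, 4.0]
```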

#### Removing the two extra ratings

In [22]:
ratings.pop(4)

'4.0 out of 5 stars'

In [23]:
ratings.pop(8)

'5.0 out of 5 stars'

In [24]:
ratings

['5.0 out of 5 stars',
 '1.0 out of 5 stars',
 '1.0 out of 5 stars',
 '1.0 out of 5 stars',
 '5.0 out of 5 stars',
 '5.0 out of 5 stars',
 '5.0 out of 5 stars',
 '5.0 out of 5 stars',
 '4.0 out of 5 stars',
 '4.0 out of 5 stars']
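The page yields 12 names and 12 ratings but only 10 titles and 10 bodies. The first two ratings carry `data-hook="review-star-rating-view-point"` instead of `review-star-rating`. That suggests they belong to the top positive / top critical highlights shown above the review list rather than to the list itself, and the first two names probably do as well. If so, dropping positions 4 and 9 keeps the highlight entries and shifts the remaining columns out of step. For example, in the final table the title "Seems to be refurbished phone", whose text is a complaint, is paired with 5.0 stars.

A more robust approach is to read all four fields from inside the same review block. The sketch below assumes each listed review sits in a `div` with `data-hook="review"`; this is an assumption about Amazon's markup, which changes over time. It runs on a small hand-written HTML snippet with made-up names rather than on the real page, and `parse_reviews` and `text_of` are hypothetical names.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the real page. Assumption: each listed review
# sits in <div data-hook="review">; the highlight block and all names and
# texts here are made up for the example.
SAMPLE_HTML = """
<div class="highlight">
  <span class="a-profile-name">Highlight Reviewer</span>
  <i class="a-icon review-rating" data-hook="review-star-rating-view-point">
    <span class="a-icon-alt">5.0 out of 5 stars</span></i>
</div>
<div data-hook="review">
  <span class="a-profile-name">Reviewer A</span>
  <a class="review-title review-title-content" data-hook="review-title">
    <span>Defective product</span></a>
  <i class="a-icon review-rating" data-hook="review-star-rating">
    <span class="a-icon-alt">1.0 out of 5 stars</span></i>
  <span class="review-text-content" data-hook="review-body">
    <span>Not happy.</span></span>
</div>
<div data-hook="review">
  <span class="a-profile-name">Reviewer B</span>
  <a class="review-title review-title-content" data-hook="review-title">
    <span>Best purchase</span></a>
  <i class="a-icon review-rating" data-hook="review-star-rating">
    <span class="a-icon-alt">4.0 out of 5 stars</span></i>
  <span class="review-text-content" data-hook="review-body">
    <span>Works well.</span></span>
</div>
"""


def text_of(tag):
    """Trimmed text of a tag, or None when the tag was not found."""
    return tag.get_text(' ', strip=True) if tag is not None else None


def parse_reviews(soup):
    """Read name, title, rating and body from inside each review block."""
    rows = []
    for block in soup.find_all('div', attrs={'data-hook': 'review'}):
        rows.append({
            'Customer name': text_of(block.find('span', class_='a-profile-name')),
            'Review': text_of(block.find('a', class_='review-title-content')),
            'Ratings': text_of(block.find('i', class_='review-rating')),
            'Review_text': text_of(block.find('span', attrs={'data-hook': 'review-body'})),
        })
    return rows


rows = parse_reviews(BeautifulSoup(SAMPLE_HTML, 'html.parser'))
for row in rows:
    print(row)
```

On the real page the call would be `parse_reviews(soup)`. `pd.DataFrame(rows)` then builds one row per review, so the columns cannot drift apart, and the highlight entries are skipped because they sit outside the review blocks.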

### Reviews written by customers

In [25]:
review_body= soup.find_all('span', {'data-hook':"review-body"})
review_body

[<span class="a-size-base review-text review-text-content" data-hook="review-body">
 <span>
   had purchased IPhone XR on 10th November ….<br/>The handset’s mic had the issue which was only solved by replacing the new IPhone XR with your excellent customer service.<br/><br/>To my regret the replaced IPhone XR is facing the same problem…<br/>Possible apple replaces with the refurbished phone by fooling the customers…customer support and service centre people both had asked from where you had purchase …thus proving replaced phones are refurbished …<br/><br/>They don’t have public complainants portals nor twitter …<br/><br/>This kind of issues are pretty common with everyone<br/><br/>Apples are fooling people
 </span>
 </span>,
 <span class="a-size-base review-text review-text-content" data-hook="review-body">
 <span>
   After use of 2 days i face issues in my iphone. Amazon send me defective product. When i tell for return he denied.Not happy with amazon.
 </span>
 </span>,
 <span class=

In [26]:
# Extract the text of each review body
review_text = [tag.get_text() for tag in review_body]

In [27]:
review_text

['\n\n  had purchased IPhone XR on 10th November ….The handset’s mic had the issue which was only solved by replacing the new IPhone XR with your excellent customer service.To my regret the replaced IPhone XR is facing the same problem…Possible apple replaces with the refurbished phone by fooling the customers…customer support and service centre people both had asked from where you had purchase …thus proving replaced phones are refurbished …They don’t have public complainants portals nor twitter …This kind of issues are pretty common with everyoneApples are fooling people\n\n',
 '\n\n  After use of 2 days i face issues in my iphone. Amazon send me defective product. When i tell for return he denied.Not happy with amazon.\n\n',
 '\n\n  Awesome Phone, Initially thought to buy Iphone 11 but due to budget constraint and comparing both the phones, i felt IphoneXR is worth the buy because there was no much difference between XR and 11 except the camera and processor..\n\n',
 '\n\n  i phone X

#### Trimming extra newlines and spaces

In [28]:
review_text[:]= [i.lstrip('\n') for i in review_text ]
review_text[:]= [i.lstrip(' ') for i in review_text ]
review_text[:]= [i.rstrip('\n') for i in review_text ]
review_text

['had purchased IPhone XR on 10th November ….The handset’s mic had the issue which was only solved by replacing the new IPhone XR with your excellent customer service.To my regret the replaced IPhone XR is facing the same problem…Possible apple replaces with the refurbished phone by fooling the customers…customer support and service centre people both had asked from where you had purchase …thus proving replaced phones are refurbished …They don’t have public complainants portals nor twitter …This kind of issues are pretty common with everyoneApples are fooling people',
 'After use of 2 days i face issues in my iphone. Amazon send me defective product. When i tell for return he denied.Not happy with amazon.',
 'Awesome Phone, Initially thought to buy Iphone 11 but due to budget constraint and comparing both the phones, i felt IphoneXR is worth the buy because there was no much difference between XR and 11 except the camera and processor..',
 'i phone XR is Super Smooth I phone like other
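In the first review, the `<br/>` line breaks disappeared without leaving a space, which fused text such as `everyoneApples` and `….The`. By default, `get_text()` joins the text nodes with an empty string. Passing a separator keeps the pieces apart, and `strip=True` drops the surrounding newlines. Here is a check on a shortened, hand-made copy of that markup:

```python
from bs4 import BeautifulSoup

html = ('<span data-hook="review-body"><span>\n  This kind of issues are pretty common'
        ' with everyone<br/><br/>Apples are fooling people\n</span></span>')
body = BeautifulSoup(html, 'html.parser').find('span', attrs={'data-hook': 'review-body'})

# Default: text nodes are concatenated, so 'everyone' and 'Apples' run together
print(repr(body.get_text()))

# Separator plus strip: each text node is trimmed and joined with a single space
print(repr(body.get_text(' ', strip=True)))
```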

### Checking the list lengths

In [29]:
print(len(cust_name))
print(len(review_title))
print(len(ratings))
print(len(review_text))

10
10
10
10
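Comparing four printed lengths by eye is easy to skip on a rerun. The sketch below instead fails loudly when the columns disagree. `check_same_length` is a hypothetical helper. Equal lengths are necessary but not sufficient: they do not prove that position i of every list describes the same review.

```python
def check_same_length(**columns):
    """Return the common length, or raise ValueError if the lengths differ."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f'column lengths differ: {lengths}')
    # All equal (or no columns given): report the shared length
    return next(iter(lengths.values()), 0)


print(check_same_length(names=['a', 'b'], titles=['t1', 't2']))  # 2
```

With the notebook's lists, the call would be `check_same_length(cust_name=cust_name, review_title=review_title, ratings=ratings, review_text=review_text)`.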


### Creating DataFrame

In [30]:
# Creating the dataframe
import pandas as pd
df= pd.DataFrame()


### Adding columns to the DataFrame

In [31]:
df['Customer name']= cust_name
df['Review']= review_title
df['Ratings']= ratings
df['Review_text']= review_text

In [32]:
df

| | Customer name | Review | Ratings | Review_text |
|---|---|---|---|---|
| 0 | Darshan Sanghvi | Seems to be refurbished phone | 5.0 out of 5 stars | had purchased IPhone XR on 10th November ….The... |
| 1 | abrakca | Defective product | 1.0 out of 5 stars | After use of 2 days i face issues in my iphone... |
| 2 | MK | Awesome Iphone XR - Worth buying 2021 | 1.0 out of 5 stars | Awesome Phone, Initially thought to buy Iphone... |
| 3 | Nikhil Singh | I Phone XR - White | 1.0 out of 5 stars | i phone XR is Super Smooth I phone like other ... |
| 4 | Yugal Parmar | Value for money phone | 5.0 out of 5 stars | Good option at 42k for 128gb variant.Fav featu... |
| 5 | sujesh | XR Tried and tested model | 5.0 out of 5 stars | Apple XR, it’s been around for a while, best d... |
| 6 | Aditya Sharma | Beautiful phone | 5.0 out of 5 stars | It’s an amazing phone |
| 7 | Sunil | Iphone XR in 2021 | 5.0 out of 5 stars | See, for short review.. it fits the profile fo... |
| 8 | nusrat | Best purchase | 4.0 out of 5 stars | I was really hesitant to buy any apple product... |
| 9 | Ravee | It’s an iPhone and it lives up to its reputation. | 4.0 out of 5 stars | Phone is certainly good as per its reputation.... |
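Creating an empty `pd.DataFrame()` and then assigning four columns can also be written as a single constructor call from a dict of lists. If the lists have different lengths, pandas raises a `ValueError`, which doubles as a consistency check. The example below uses two rows copied (the first one shortened) from the table above. With the notebook's own lists, the dict values would be `cust_name`, `review_title`, `ratings` and `review_text`.

```python
import pandas as pd

# Sample rows taken from the table above (first review text shortened)
names_sample = ['Yugal Parmar', 'Aditya Sharma']
titles_sample = ['Value for money phone', 'Beautiful phone']
ratings_sample = ['5.0 out of 5 stars', '5.0 out of 5 stars']
texts_sample = ['Good option at 42k for 128gb variant.', 'It’s an amazing phone']

# One call builds all four columns; mismatched lengths raise ValueError
df_sample = pd.DataFrame({
    'Customer name': names_sample,
    'Review': titles_sample,
    'Ratings': ratings_sample,
    'Review_text': texts_sample,
})
print(df_sample)
```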


### Saving the DataFrame to an Excel file

In [34]:
df.to_excel(r'Amazon_review.xlsx', index=False)
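`to_excel` needs an Excel writer engine to be installed. For `.xlsx` files pandas normally uses `openpyxl`, and without a suitable engine it raises an `ImportError`. If that dependency is unavailable, writing CSV with the `utf-8-sig` encoding is a lightweight alternative. The byte-order mark helps Excel recognise the file as UTF-8, which matters for characters such as the curly apostrophe in "It’s". The round-trip sketch below writes to a temporary directory; the file name is illustrative.

```python
import os
import tempfile

import pandas as pd

# One sample row taken from the table above
df_sample = pd.DataFrame({
    'Customer name': ['Aditya Sharma'],
    'Review': ['Beautiful phone'],
    'Ratings': ['5.0 out of 5 stars'],
    'Review_text': ['It’s an amazing phone'],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'Amazon_review.csv')
    # utf-8-sig writes a BOM so Excel detects UTF-8 and keeps ’ intact
    df_sample.to_csv(path, index=False, encoding='utf-8-sig')
    df_back = pd.read_csv(path, encoding='utf-8-sig')

print(df_back.to_dict('records') == df_sample.to_dict('records'))  # True
```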