## Web Scraping using BeautifulSoup 

The code below scrapes an Amazon product page. The URL points to the listing for a book sold on the Amazon website.
The code extracts book details such as 'Title', 'Price', 'Publisher', 'Language' and others. Finally, the details are stored in a '.csv' file.

In [218]:
# import libraries

from bs4 import BeautifulSoup
import re
import csv
import requests
import smtplib
import time
import datetime

In [219]:
# Connection URL
URL = 'https://www.amazon.in/Business-Intelligence-Analytics-Data-Science/dp/9353067022/ref=sr_1_1_sspa?keywords=data+analytics+books&qid=1652177305&sprefix=data+ana%2Caps%2C104&sr=8-1-spons&psc=1&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUFRNTRWS1IwSlI3M1ImZW5jcnlwdGVkSWQ9QTA4ODY1NzczU1FCVDFVRVJaT1czJmVuY3J5cHRlZEFkSWQ9QTEwMjU5ODkxM0NFN1hFUU9PWlROJndpZGdldE5hbWU9c3BfYXRmJmFjdGlvbj1jbGlja1JlZGlyZWN0JmRvTm90TG9nQ2xpY2s9dHJ1ZQ=='
# User-Agent from "https://httpbin.org/get"
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.41 Safari/537.36",
    "Accept-Encoding": "gzip, deflate",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "DNT": "1",
    "Connection": "close",
    "Upgrade-Insecure-Requests": "1",
}

# Sending request
page = requests.get(URL, headers=headers)

# Checking response
print(page)

<Response [200]>


In [220]:
soup = BeautifulSoup(page.content, "html.parser")
# print(soup.prettify())

<b>The content of the webpage can change over time, which can break the code. Hence, a local copy of the webpage is saved, and scraping is performed on that copy.</b>

In [221]:
# Writing HTML to a file (the with-block closes the file automatically)
with open("amazon_page.html", "w", encoding="utf-8") as out:
    out.write(str(soup))

In [222]:
# Reading from the local HTML file (the with-block closes the file after parsing)
with open("amazon_page.html", encoding="utf-8") as f:
    soup = BeautifulSoup(f, "html.parser")
# print(soup.prettify())
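
The save-then-read steps above can be folded into one helper that downloads the page only when no local copy exists yet. This is a minimal sketch, not part of the original notebook: `load_or_fetch` and its `fetch` parameter are hypothetical names, and `fetch` stands for any zero-argument function that returns the page HTML as a string.

```python
from pathlib import Path

def load_or_fetch(path, fetch):
    """Return the text stored at `path`, calling `fetch()` only if the file is missing."""
    cache = Path(path)
    if cache.exists():
        # Reuse the saved copy so repeated runs see the same HTML.
        return cache.read_text(encoding="utf-8")
    html = fetch()  # e.g. lambda: requests.get(URL, headers=headers).text
    cache.write_text(html, encoding="utf-8")
    return html
```

With this helper, rerunning the notebook would not send a new request unless `amazon_page.html` is deleted first.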

### Scraping Basic Details

In [223]:
title = soup.find(id = 'productTitle').get_text()             # Scraping Title
price = soup.find(id = 'price').get_text()                    # Scraping Price
# Keep only letters, digits, colons and spaces
title = re.sub(r"[^a-zA-Z0-9: ]","",title)
title = title.split('Fourth', 1)[0]                           # Drop everything from 'Fourth' onward
# Note: this pattern also removes '.', so any decimal point in the price is lost
price = re.sub(r"[^a-zA-Z0-9: ]","",price)
print("Title: ", title)
print("Price: ", price)

d = {'Title': title, 'Price': price}                          # Saving the details into a dictionary

Title:   Business Intelligence Analytics and Data Science: A Managerial Perspective  
Price:  64000
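
The cleaning pattern used for the price removes every character other than letters, digits, colons and spaces, including the decimal point, so a string like '640.00' would come out as '64000'. A minimal sketch of a price parser that keeps the fractional part is shown below. The helper name `parse_price` is hypothetical, and the sample strings are invented for illustration rather than taken from the page.

```python
import re
from decimal import Decimal

def parse_price(raw):
    """Return the first number in `raw` as a Decimal, or None if none is found."""
    # Drop thousands separators, then take the first run of digits
    # with an optional fractional part.
    match = re.search(r"\d+(?:\.\d+)?", raw.replace(",", ""))
    return Decimal(match.group()) if match else None

print(parse_price("₹640.00"))    # 640.00
print(parse_price("₹1,299.50"))  # 1299.50
print(parse_price("n/a"))        # None
```

`Decimal` is used here instead of `float` so that the value prints exactly as written on the page.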


In [224]:
describe = []
for element in soup.find(id = 'bookDescription_feature_div'):  # Scraping Description
    describe.append(element.get_text())                        # Text of each child node
description = describe[1]                                      # Second child is used as the description
description = re.sub(r"[^a-zA-Z0-9: ]","",description)         # Keep only letters, digits, colons and spaces
d['Description'] = description.strip()
print(description)

  Brchapter 1 An overview of business intelligence analytics and data science brChapter 2 descriptive analytics I: nature of data statistical modeling and visualization brChapter 3 descriptive analytics II: business intelligence and data warehousing brChapter 4 predictive analytics I: data mining process methods and algorithms brChapter 5 predictive analytics II: text web and social media brChapter 6 prescriptive analytics: optimization and simulation brChapter 7 big data concepts and tools brChapter 8 future trends privacy and managerial considerations in analytics  Read more 
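
The lookups above assume that every element exists on the page. `soup.find` returns `None` when nothing matches, so calling `.get_text()` on the result, or iterating over it, raises an error if Amazon changes the markup. A minimal sketch of a guarded lookup follows; `text_by_id` is a hypothetical helper, and the one-line HTML string is a stand-in for the real page.

```python
from bs4 import BeautifulSoup

def text_by_id(soup, element_id, default=""):
    """Return the stripped text of the element with this id, or `default` if it is absent."""
    node = soup.find(id=element_id)
    return node.get_text(strip=True) if node is not None else default

# Tiny stand-in document; the real page is far larger.
sample = BeautifulSoup('<span id="productTitle">  Example Book  </span>', "html.parser")
print(text_by_id(sample, "productTitle"))              # Example Book
print(text_by_id(sample, "price", default="unknown"))  # unknown
```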


### Scraping Further Details

In [225]:
# Locate the bullet list that holds the product details
details = soup.find('ul', attrs = {'class':'a-unordered-list a-nostyle a-vertical a-spacing-none detail-bullet-list'})
# Text of every child node, keeping only letters, digits, colons and spaces
info = [re.sub(r"[^a-zA-Z0-9: ]","",child.get_text()) for child in details.children]
info = list(filter(str.strip,info))                            # Drop blank entries

# Split each "key: value" entry at the first colon
keys = [i.split(':', 1)[0] for i in info]
values = [i.split(':', 1)[1] for i in info]

In [226]:
# Collapse runs of spaces into a single space
key = [re.sub(" +", " ",element) for element in keys]
value = [re.sub(" +", " ",element) for element in values]

In [227]:
for k, v in zip(key, value):                                   # Adding each detail to the dictionary
    d[k] = v

print(d)

{'Title': ' Business Intelligence Analytics and Data Science: A Managerial Perspective  ', 'Price': '64000', 'Description': 'Brchapter 1 An overview of business intelligence analytics and data science brChapter 2 descriptive analytics I: nature of data statistical modeling and visualization brChapter 3 descriptive analytics II: business intelligence and data warehousing brChapter 4 predictive analytics I: data mining process methods and algorithms brChapter 5 predictive analytics II: text web and social media brChapter 6 prescriptive analytics: optimization and simulation brChapter 7 big data concepts and tools brChapter 8 future trends privacy and managerial considerations in analytics  Read more', ' Publisher ': ' Pearson Education Fourth edition 25 March 2019 ', ' Language ': ' English ', ' Paperback ': ' 512 pages ', ' ISBN10 ': ' 9353067022 ', ' ISBN13 ': ' 9789353067021 ', ' Item Weight ': ' 748 g ', ' Dimensions ': ' 203 x 254 x 47 cm ', ' Country of Origin ': ' India '}
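
The printed dictionary shows keys such as ' Publisher ' that still carry surrounding spaces, and the split on ':' assumes every entry contains a colon (an entry without one would raise an `IndexError`). Both points can be handled in one step, as in the sketch below. `parse_detail` is a hypothetical helper, and the sample lines are made up to show each case.

```python
import re

def _clean(text):
    """Collapse runs of whitespace into one space and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()

def parse_detail(line):
    """Split 'key: value' into a cleaned (key, value) pair; return None if there is no colon."""
    key, sep, value = line.partition(":")
    if not sep:
        return None
    return _clean(key), _clean(value)

lines = ["  Language   :  English ", " ISBN13 : 9789353067021", "no colon here"]
pairs = [p for p in map(parse_detail, lines) if p is not None]
print(dict(pairs))  # {'Language': 'English', 'ISBN13': '9789353067021'}
```

`str.partition` splits at the first colon only, which matches the `split(':', 1)` calls used in the notebook.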


### Saving the data into a CSV file

In [228]:
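# csv.writer quotes values that contain commas; newline='' lets it control line endings
with open('book_data.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    for field, entry in d.items():
        writer.writerow([field, entry])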
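
The file written above holds one key-value pair per line, which suits a single book. If several books were scraped, a layout with one header row and one row per book may be easier to load later. The sketch below shows that alternative under stated assumptions: `write_books` is a hypothetical helper, the two sample records are invented, and an in-memory buffer replaces a real file.

```python
import csv
import io

def write_books(rows, fileobj):
    """Write a list of dicts as CSV: one header row, then one row per book."""
    # Union of keys in first-seen order, so books with different fields still fit.
    fieldnames = list(dict.fromkeys(k for row in rows for k in row))
    writer = csv.DictWriter(fileobj, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)

books = [
    {"Title": "Book A", "Price": "100"},
    {"Title": "Book B, 2nd ed.", "Language": "English"},
]
buffer = io.StringIO()
write_books(books, buffer)
print(buffer.getvalue())
```

When writing to disk instead, open the file with `newline=''`, as the `csv` module documentation recommends.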