## **Amazon Web Scraping**

**Web Scraping** is the process of automatically extracting data from websites.

Instead of manually copying details such as:

1. product name
2. price
3. rating
4. reviews

you can use code to collect this data automatically from HTML pages.

### **Description of Libraries**

**Requests :**

The requests library is used to send HTTP requests to websites and retrieve web page content for scraping.

**LXML :**

lxml is a fast and efficient HTML/XML parser used to parse and process the structure of web pages.

**BeautifulSoup (bs4) :**

BeautifulSoup helps in parsing HTML content and extracting specific data such as product names, prices, and ratings from web pages.

**CSV :**

The csv module is used to store the extracted data into a CSV (Comma-Separated Values) file format.

**Pandas :**

pandas organizes scraped data into a structured table called a DataFrame and can export it to CSV for analysis. This notebook does not import it; the CSV file at the end is written with the csv module.

In [None]:
# import libraries for web scraping

import requests
import lxml
from bs4 import BeautifulSoup as bs
import csv

In [None]:
# Amazon product page URL

url = "https://www.amazon.in/dp/B0FCMKSP7V?_encoding=UTF8&ref_=cct_cg_Budget_2a1&pf_rd_p=b1ea44b7-0053-4e59-8fc9-f3adda094cbb&pf_rd_r=V44ZA770474J6VMEHH3F&th=1"

In [None]:
# browser-like User-Agent header, so the request looks like it comes from a regular browser

header = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:147.0) Gecko/20100101 Firefox/147.0"}

In [None]:
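# send a GET request to the product page

response = requests.get(url, headers = header, timeout = 30)

if response.status_code == 200:
  html_content = response.text
else:
  # html_content is not set on failure, so the cells below will not run
  print("error occurred, status code:", response.status_code)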
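
A 200 status code does not guarantee that the product page itself was returned: a site may answer automated requests with a bot-check page instead. The check below is a minimal sketch of a sanity test on the returned HTML. The `looks_blocked` helper and the `BLOCK_MARKERS` phrases are hypothetical and are assumptions for illustration, not a documented list.

```python
# Hypothetical marker phrases; adjust them to whatever the block page actually contains.
BLOCK_MARKERS = ("captcha", "enter the characters you see below", "robot check")


def looks_blocked(html: str) -> bool:
    """Return True if the HTML looks like a bot-check page rather than a product page."""
    lowered = html.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)


# Two small stand-in documents (not real Amazon responses)
sample_ok = '<html><span id="productTitle">Some phone</span></html>'
sample_blocked = "<html><title>Robot Check</title><p>Enter the characters you see below</p></html>"

print(looks_blocked(sample_ok))       # False
print(looks_blocked(sample_blocked))  # True
```

In this notebook the helper could be called as `looks_blocked(html_content)` before the HTML is parsed.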

In [None]:
# display html content

html_content



In [None]:
soup = bs(html_content, 'lxml')

In [None]:
soup

<!DOCTYPE html>
<html class="a-no-js" data-19ax5a9jf="dingo" lang="en-in"><!-- sp:feature:head-start -->
<head><script>var aPageStart = (new Date()).getTime();</script><meta charset="utf-8"/>
<!-- sp:end-feature:head-start -->
<!-- sp:feature:csm:head-open-part1 -->
<script type="text/javascript">var ue_t0=ue_t0||+new Date();</script>
<!-- sp:end-feature:csm:head-open-part1 -->
<!-- sp:feature:cs-optimization -->
<meta content="on" http-equiv="x-dns-prefetch-control"/>
<link crossorigin="" href="https://images-eu.ssl-images-amazon.com" rel="preconnect"/>
<link crossorigin="" href="https://m.media-amazon.com" rel="preconnect"/>
<!-- sp:end-feature:cs-optimization -->
<!-- sp:feature:csm:head-open-part2 -->
<script type="text/javascript">
window.ue_ihb = (window.ue_ihb || window.ueinit || 0) + 1;
if (window.ue_ihb === 1) {

var ue_csm = window,
    ue_hob = +new Date();
(function(d){var e=d.ue=d.ue||{},f=Date.now||function(){return+new Date};e.d=function(b){return f()-(b?0:d.ue_t0)};e.st

In [None]:
# print(soup.prettify())  # pretty-printed HTML content

In [None]:
# display product name

product_name = soup.find("span", id = "productTitle").text.strip()

In [None]:
print(product_name)

OnePlus Nord 5 | Snapdragon 8s Gen 3 | Stable 144FPS Gaming | Dual 50MP Flagship Camera | Powered by OnePlus AI | 256GB 8GB | Dry Ice
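
`soup.find(...)` returns `None` when no element matches, so chaining `.text` directly raises `AttributeError` if the page layout changes. The sketch below shows one way to guard against that. The `text_or_none` helper is hypothetical, and the sample HTML is a stand-in rather than real page content.

```python
from bs4 import BeautifulSoup


def text_or_none(soup, name, **attrs):
    """Return the stripped text of the first matching tag, or None if nothing matches."""
    tag = soup.find(name, **attrs)
    return tag.text.strip() if tag is not None else None


# Stand-in HTML; the built-in "html.parser" is used so the sketch does not need lxml
sample = BeautifulSoup('<span id="productTitle">  Demo Phone  </span>', "html.parser")

print(text_or_none(sample, "span", id="productTitle"))  # Demo Phone
print(text_or_none(sample, "span", id="missing"))       # None
```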


In [None]:
# extract product price

product_price = soup.find("span", class_ = "a-price-whole").text.strip().rstrip(".") # drop surrounding whitespace and the trailing decimal point

In [None]:
print(product_price)

33,999
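
The scraped price is a string with thousands separators, so it cannot be used in arithmetic directly. The sketch below converts it to an integer; the `parse_price` helper is a hypothetical name introduced for illustration.

```python
def parse_price(text: str) -> int:
    """Convert a price string such as '33,999' to an integer by dropping the commas."""
    return int(text.replace(",", "").strip())


print(parse_price("33,999"))    # 33999
print(parse_price("1,29,999"))  # 129999
```

Removing every comma, rather than parsing a fixed grouping, also handles the Indian digit grouping shown in the second call.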


In [None]:
# display product rating

product_rating = soup.find("span", class_ = "a-size-small a-color-base").text.strip()

In [None]:
print(product_rating)

4.4


In [None]:
# extract the customer review summary (this class name looks auto-generated, so it may change over time)

product_review = soup.find("span", class_ = "aui-primitive __SAR2l0zNyyuZ").text.strip()

In [None]:
print(product_review)

Customers find this phone to be a good mid-range device with an awesome rear camera and fast charging capabilities. The display receives positive feedback, with one customer highlighting its stunning AMOLED 144Hz display, and customers appreciate its buttery smooth operation. While some customers report no heating issues, others mention slight heating problems during gaming at 90 fps.


In [None]:
# extract product description (find returns only the first matching bullet point)

product_description = soup.find("li", class_ = "a-spacing-mini").text.strip()

In [None]:
print(product_description)

 Flagship Performance with Snapdragon(TM) 8s Gen 3: Couple this with the latest LPDDR5X RAM and segment-leading VC cooling (7300mm2), enjoy unprecedented 144 FPS BGMI and CODM steady-smooth gaming for a cool 5 hours.  
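
The cell above captures only one bullet point, because `find` stops at the first match. To collect every bullet, `find_all` can be used instead. The sketch below runs on stand-in markup that mimics a bullet list; the structure of the real page may differ.

```python
from bs4 import BeautifulSoup

# Stand-in markup that mimics a bullet list; the real page structure may differ
sample_html = """
<ul>
  <li class="a-spacing-mini"><span> First feature </span></li>
  <li class="a-spacing-mini"><span> Second feature </span></li>
</ul>
"""
sample = BeautifulSoup(sample_html, "html.parser")

# find_all returns every matching tag, not just the first one
bullets = [li.text.strip() for li in sample.find_all("li", class_="a-spacing-mini")]

print(bullets)  # ['First feature', 'Second feature']
```

On a real page the same class may also appear outside the description list, so the results may need filtering.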


In [None]:
# display product top review

product_top_review = soup.find("div", class_ = "a-expander-content reviewText review-text-content a-expander-partial-collapse-content").text.strip()

In [None]:
product_top_review

'OnePlus Nord 5Purchased on: 20 January 2026Best Display:The display is excellent with vibrant colors and smooth performance. Great for watching videos and gaming.Best Camera:Camera quality is amazing. Photos are sharp, clear, and perform very well in both daylight and low-light.Best Battery Backup:Strong battery life that easily lasts a full day. Fast charging works perfectly and is very convenient.Smooth & Fast Performance:Very smooth overall performance with no lag. Handles multitasking and daily use effortlessly.Premium Design:Sleek, stylish, and comfortable to hold. Looks premium.Best Overall Phone:Excellent combination of display, camera, battery, and performance.Final Verdict:OnePlus Nord 5 is a complete all-rounder and totally worth buying. Very satisfied with this phone! ⭐⭐⭐⭐⭐'

In [None]:
# creating csv file

with open("Amazon Product.csv", mode = "w", newline = "", encoding = "utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(["product_name", "product_price", "product_rating", "product_review", "product_description", "product_top_review"])
    writer.writerow([product_name, product_price, product_rating, product_review, product_description, product_top_review])
print("data saved")

data saved
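
To confirm that a file was written as expected, it can be read back with `csv.DictReader`. The sketch below uses a hypothetical sample row and a temporary file, so it does not touch "Amazon Product.csv".

```python
import csv
import os
import tempfile

# Hypothetical sample row; in the notebook these values come from the scraped variables
fields = ["product_name", "product_price", "product_rating"]
row = ["Demo Phone", "33,999", "4.4"]

# Write to a temporary file so the sketch does not overwrite the real output
path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(path, mode="w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(fields)
    writer.writerow(row)

# DictReader maps each data row to the header names from the first line
with open(path, newline="", encoding="utf-8") as file:
    records = list(csv.DictReader(file))

print(records[0]["product_price"])  # 33,999
```

The price survives the round trip because the writer quotes any field that contains a comma.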


## **Conclusion**

**BeautifulSoup** is a powerful and easy-to-use Python library for web scraping that helps extract useful data from HTML and XML webpages. It allows users to parse webpage content, locate specific elements, and retrieve information like text, links, images, prices, and reviews efficiently.