<h1 align="center"> WEB SCRAPING USING PYTHON </h1>
<h4 align="center"> (Scraping Top 50 Best-Selling Electronic Items) </h4>

#### Importing Required Libraries

In [1]:
from bs4 import BeautifulSoup

import lxml

import requests

import pandas as pd
import numpy as np

#### Make HTTP requests to get the data from the website

In [2]:
html_text = requests.get('https://www.amazon.in/gp/bestsellers/electronics').text
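
A bare `requests.get` call does not raise when the server answers with an error page or a bot-check page, so the parsed HTML may contain none of the expected products. The sketch below adds a timeout, a browser-like `User-Agent` header, and a status check. The helper name `fetch_html` and the header values are illustrative assumptions, not something the site documents, and the request may still be refused.

```python
import requests

# Assumed, illustrative headers; the browser string is only an example value.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Accept-Language": "en-IN,en;q=0.9",
}


def fetch_html(url, timeout=10):
    """Return the page source for `url`, raising on HTTP errors or timeouts."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
    return response.text
```

With this helper, the cell above would read `html_text = fetch_html('https://www.amazon.in/gp/bestsellers/electronics')`.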

#### ***Beautiful Soup*** is a Python library for pulling data out of HTML and XML files. It builds a parse tree from the page source, which lets you extract data in a hierarchical and more readable manner.

In [3]:
soup = BeautifulSoup(html_text,'lxml')
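
To make the idea of a parse tree concrete, here is a minimal sketch on a small hand-written page. The HTML, the `id`, and the class names are made up for illustration, and `html.parser` (bundled with Python) is used only to keep the sketch free of extra dependencies; the `'lxml'` parser is passed in the same way.

```python
from bs4 import BeautifulSoup

# A tiny hand-written page standing in for real page source (illustrative only).
html = """
<html><body>
  <ul id="items">
    <li><span class="name">Phone A</span><span class="price">₹9,999.00</span></li>
    <li><span class="name">Phone B</span><span class="price">₹12,499.00</span></li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

items = soup.find("ul", id="items")            # one node in the tree
first_item = items.find("li")                  # search continues below that node
first_name = first_item.find("span", class_="name").text
parent_tag = first_item.parent.name            # walk back up the tree
item_count = len(items.find_all("li"))

print(first_name, parent_tag, item_count)      # Phone A ul 2
```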

#### *Get the product name, rating, number of ratings, current price, and URL of the product page using HTML tags and classes* (Try "find" first on a single product; once it returns exactly the data you want, use "find_all" to extract the entire data set)
Whenever you see ***soup.find***, we are looking for the first element on the page that matches an HTML tag (such as div or span) and/or attributes (name, id, class, etc.).
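
The difference between the two calls, and one detail of class matching, can be sketched on a toy snippet. The markup and class names below are invented for illustration. Note that a multi-class string passed to `class_` is compared against the whole class attribute in the order written, while a CSS selector passed to `select` matches the classes in any order.

```python
from bs4 import BeautifulSoup

# Invented markup; the class names do not come from any real site.
html = """
<div class="card">
  <div class="title clamp-3">First product</div>
  <span class="price">₹100.00</span>
</div>
<div class="card">
  <div class="clamp-3 title">Second product</div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

first_card = soup.find("div", class_="card")          # first match only (a Tag)
all_cards = soup.find_all("div", class_="card")       # every match (a list of Tags)
missing = first_card.find("span", class_="rating")    # no match -> None

# The multi-class string matches only the attribute written exactly this way...
exact = soup.find_all("div", class_="title clamp-3")
# ...whereas the CSS selector matches both orderings.
by_css = soup.select("div.title.clamp-3")

print(len(all_cards), missing, len(exact), len(by_css))  # 2 None 1 2
```

Because `find` returns `None` when nothing matches, calling `.text` on the result raises `AttributeError`, which is why the cells below wrap optional fields in `try`/`except`.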

In [42]:
## For trial: extract the fields for the first product only
electronic = soup.find('span', class_="a-list-item")
name_c = electronic.find('div', class_="p13n-sc-truncate p13n-sc-line-clamp-3").text.strip()

current_price_c = electronic.find('span', class_="p13n-sc-price").text
more_info_c = 'https://www.amazon.in' + electronic.a['href']

# Prevent the script from crashing when the product has no rating
try:
    rating_c = float(electronic.find('span', class_="a-icon-alt").text.split()[0])
except (AttributeError, IndexError, ValueError):
    rating_c = 'NA'
# Likewise, fall back to 0 when the number of ratings is missing
try:
    users_rated_c = int(electronic.find('a', class_="a-size-small a-link-normal").text.replace(",", ""))
except (AttributeError, ValueError):
    users_rated_c = 0

In [43]:
print(name_c)
print(rating_c)
print(current_price_c)
print(more_info_c)
print(users_rated_c)


OnePlus Nord 2 5G (Blue Haze, 8GB RAM, 128GB Storage) I Extra upto Rs.1000 off on Exchange
NA
₹29,999.00
https://www.amazon.in/OnePlus-128GB-Storage-Rs-1000-Exchange/dp/B097RD2JX8?_encoding=UTF8&psc=1
0


#### Extract the entire data set using 'find_all' and run a FOR LOOP to get the data for all the products

>***Final Execution:*** Since most of the values come out correctly for the first product, I can now run *find_all* to extract the tags of all the products in the Bestseller List.

In [90]:
electronics_dataset = soup.find_all('span', class_ ="a-list-item")

In [83]:
## Create an empty list for each variable we want to extract
Name = []
Rating = []
Price = []
Users_rated = []
More_info = []

###### Running FOR LOOP and appending the variables with the data

In [84]:

for i in electronics_dataset:
    # Titles are clamped to either three or two lines, so try both classes
    try:
        name = i.find('div', class_="p13n-sc-truncate p13n-sc-line-clamp-3").text.strip()
    except AttributeError:
        name = i.find('div', class_="p13n-sc-truncate p13n-sc-line-clamp-2").text.strip()
    # Drop the thousands separators and the leading currency symbol
    try:
        price = float(i.find('span', class_="p13n-sc-price").text.replace(",", "")[1:])
    except (AttributeError, ValueError):
        price = 'NA'
    more_info = 'https://www.amazon.in' + i.a['href']
    # The rating text starts with the numeric score
    try:
        rating = float(i.find('span', class_="a-icon-alt").text.split()[0])
    except (AttributeError, IndexError, ValueError):
        rating = 'NA'

    try:
        users_rated = int(i.find('a', class_="a-size-small a-link-normal").text.replace(",", ""))
    except (AttributeError, ValueError):
        users_rated = 0
    Name.append(name)
    Price.append(price)
    Rating.append(rating)
    Users_rated.append(users_rated)
    More_info.append(more_info)
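
The string-to-number conversions in the loop above could also be factored into small helpers that are easy to test on their own. This is a sketch, not the notebook's method: the names `parse_price` and `parse_rating` are hypothetical, the sample rating string is an assumed format, and the helpers return `None` instead of `'NA'` so that missing values stay distinguishable from text.

```python
def parse_price(text):
    """Convert a price string such as '₹29,999.00' to a float, or None if absent."""
    if not text:
        return None
    # Keep digits and the decimal point; drop the currency symbol and separators.
    cleaned = "".join(ch for ch in text if ch.isdigit() or ch == ".")
    try:
        return float(cleaned)
    except ValueError:
        return None


def parse_rating(text):
    """Take the leading number from a string like '4.1 out of 5 stars' (assumed format)."""
    try:
        return float(text.split()[0])
    except (AttributeError, IndexError, ValueError):
        return None


print(parse_price("₹29,999.00"), parse_rating("4.1 out of 5 stars"), parse_price(None))
# 29999.0 4.1 None
```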

###### Creating Dictionary out of the Variables

In [85]:
Electronic_dict = {'Name': Name, 'Price': Price, 'Ratings':Rating, 'No. of Users Rated': Users_rated, 'URL': More_info}
Electronic_dict

{'Name': ['OnePlus Nord 2 5G (Blue Haze, 8GB RAM, 128GB Storage) I Extra upto Rs.1000 off on Exchange',
  'OnePlus Nord 2 5G (Gray Sierra, 8GB RAM, 128GB Storage) I Extra upto Rs.1000 off on Exchange',
  'Redmi Note 10 (Aqua Green, 4GB RAM, 64GB Storage) -Amoled Dot Display | 48MP Sony Sensor IMX582 | Snapdragon 678 Processor',
  'Redmi 9 (Carbon Black, 4GB RAM, 64GB Storage) | Extra INR 200 Amazon Pay Cashback | 2.3GHz Mediatek Helio G35 Octa core Processor',
  'Redmi 9A (Nature Green, 2GB RAM, 32GB Storage) | 2GHz Octa-core Helio G25 Processor | 5000 mAh Battery',
  'Redmi 9A (Sea Blue 2GB RAM 32GB Storage) | 2GHz Octa-core Helio G25 Processor | 5000 mAh Battery',
  'Redmi 9A (Midnight Black 2GB RAM 32GB Storage) | 2GHz Octa-core Helio G25 Processor | 5000 mAh Battery',
  'Redmi Note 10S (Deep Sea Blue, 6GB RAM, 64GB Storage) -Super Amoled Display | 64 MP Quad Camera | 6 Month Free Screen Replacement (Prime only)|Extra INR 1000 Off Through Coupons',
  'Samsung Galaxy M31s (Mirage Blu

###### Creating Dataframe from the above dictionary

In [86]:
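# Number the rows from 1; sizing the index from the data avoids an error if fewer than 50 items were scraped
df = pd.DataFrame(Electronic_dict, index=range(1, len(Name) + 1))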
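
`pd.DataFrame` raises an error when the index length differs from the length of the lists, so an index tied to exactly 50 rows breaks if the page returns a different number of items. The sketch below sizes a 1-based index from the data and converts `'NA'` placeholders to `NaN` with `pd.to_numeric`. The rows are made-up sample data, and `df_sample` is a hypothetical name.

```python
import pandas as pd

# Made-up rows in the same shape as the scraped lists (illustrative data only).
sample = {
    "Name": ["Item A", "Item B", "Item C"],
    "Price": [29999.0, "NA", 3599.0],
    "Ratings": ["NA", 4.1, 4.4],
}

# Size the 1-based rank index from the data instead of hard-coding a row count.
n_rows = len(sample["Name"])
df_sample = pd.DataFrame(sample, index=range(1, n_rows + 1))

# Turn the 'NA' placeholders into NaN so the columns become numeric.
for col in ["Price", "Ratings"]:
    df_sample[col] = pd.to_numeric(df_sample[col], errors="coerce")

print(df_sample.dtypes)
```

Numeric columns allow operations such as `df_sample["Price"].mean()`, which skip `NaN` by default.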

#### TOP 50 Best-Selling Electronic Items

In [87]:
df

| | Name | Price | Ratings | No. of Users Rated | URL |
|---|---|---|---|---|---|
| 1 | OnePlus Nord 2 5G (Blue Haze, 8GB RAM, 128GB S... | 29999.0 | | 0 | https://www.amazon.in/OnePlus-128GB-Storage-Rs... |
| 2 | OnePlus Nord 2 5G (Gray Sierra, 8GB RAM, 128GB... | 29999.0 | | 0 | https://www.amazon.in/OnePlus-Sierra-Storage-R... |
| 3 | Redmi Note 10 (Aqua Green, 4GB RAM, 64GB Stora... | 12999.0 | 4.1 | 71842 | https://www.amazon.in/Test-Exclusive_2020_1140... |
| 4 | Redmi 9 (Carbon Black, 4GB RAM, 64GB Storage) ... | 8999.0 | 4.2 | 66529 | https://www.amazon.in/Redmi-Carbon-Black-64GB-... |
| 5 | Redmi 9A (Nature Green, 2GB RAM, 32GB Storage)... | 6799.0 | 4.2 | 69234 | https://www.amazon.in/Redmi-9A-2GB-32GB-Storag... |
| 6 | Redmi 9A (Sea Blue 2GB RAM 32GB Storage) \| 2GH... | 6799.0 | 4.2 | 69234 | https://www.amazon.in/Redmi-9A-2GB-32GB-Storag... |
| 7 | Redmi 9A (Midnight Black 2GB RAM 32GB Storage)... | 6799.0 | 4.2 | 69234 | https://www.amazon.in/Redmi-9A-Midnight-2GB-32... |
| 8 | Redmi Note 10S (Deep Sea Blue, 6GB RAM, 64GB S... | 14999.0 | 4.0 | 5413 | https://www.amazon.in/Redmi-Note-Deep-Blue-Sto... |
| 9 | Samsung Galaxy M31s (Mirage Blue, 6GB RAM, 128... | | 4.3 | 78853 | https://www.amazon.in/Samsung-Galaxy-Mirage-12... |
| 10 | Echo Dot (3rd Gen, Black) + Wipro 9W LED Smart... | 3599.0 | 4.4 | 62329 | https://www.amazon.in/Black-Wipro-Smart-Color-... |


###### Store Data into a ***.csv*** file

In [88]:
df.to_csv('Amazon_top50_BestSelling_Electronics_Items.csv')
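
By default `to_csv` writes the index as a first column with no header, which `read_csv` later displays as `Unnamed: 0`. A minimal sketch of naming that column on export and restoring it on import follows. The label `Rank` and the sample rows are assumptions for illustration, and an in-memory buffer stands in for the file on disk.

```python
import io

import pandas as pd

# Made-up rows with a 1-based index, standing in for the scraped table.
df_sample = pd.DataFrame(
    {"Name": ["Item A", "Item B"], "Price": [100.0, 200.0]},
    index=range(1, 3),
)

buffer = io.StringIO()                      # stands in for a file on disk
df_sample.to_csv(buffer, index_label="Rank")

buffer.seek(0)
restored = pd.read_csv(buffer, index_col="Rank")
print(restored)
```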

 <h3 align= "center"> END </h3>