# WEB SCRAPING WITH BEAUTIFUL SOUP

Web scraping is the process of extracting data from websites. Python provides powerful libraries 
such as Beautiful Soup 4, which makes it easy to parse HTML and XML content and pull data out of it.
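As a quick orientation, here is a minimal sketch of the core Beautiful Soup workflow: parse an HTML string, navigate to a tag, and read its text and an attribute. The HTML snippet and the `example.com` URL are made up for illustration; the same `find()` calls are used on the real pages later in this notebook.

```python
from bs4 import BeautifulSoup

# A tiny, made-up HTML snippet standing in for a downloaded page
html = """
<div class="product-wrapper">
  <h3><a href="https://example.com/p/1">Sample Mouse</a></h3>
</div>
"""

soup = BeautifulSoup(html, "html.parser")  # Python's built-in parser, no extra install
anchor = soup.find("h3").find("a")          # first <h3>, then the <a> inside it
print(anchor.text)     # the visible link text
print(anchor["href"])  # the value of the href attribute
```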

Imports

In [48]:
from bs4 import BeautifulSoup
import requests
import pandas as pd

We collect data on computer peripherals from the "MD Computers" website using Beautiful Soup 4.

Enter the item you want to collect information about.

In [49]:
search_item = input("Which item do you want to search? ")

The website's search URL is dynamic, so we insert the search term into it.

In [50]:
url = f"https://mdcomputers.in/index.php?submit_search=&route=product%2Fsearch&&search={search_item}"
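Inserting raw user input into a URL with an f-string can break when the term contains spaces or characters such as `&`. A minimal sketch of a safer approach, assuming the site accepts the same `route`, `search` and `page` parameters in any order: build the query with `urllib.parse.urlencode`, which percent-encodes each value. The helper name `search_url` is hypothetical. The `requests` library performs equivalent encoding if you pass the dictionary through its `params` argument instead.

```python
from urllib.parse import urlencode

BASE = "https://mdcomputers.in/index.php"

def search_url(term, page=1):
    # urlencode percent-encodes every value (spaces become "+", "&" becomes "%26"),
    # so the search term cannot break the query string
    query = {"route": "product/search", "search": term, "page": page}
    return f"{BASE}?{urlencode(query)}"

print(search_url("gaming mouse", page=2))
```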

Fetch the page and parse its HTML content.

In [51]:
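response = requests.get(url, timeout=30)  # give up instead of hanging forever
response.raise_for_status()               # stop early on HTTP errors such as 404 or 503
page_ = response.text
doc = BeautifulSoup(page_, "html.parser")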

Find the total number of result pages by locating the pagination element through its class name.

In [52]:
# Earlier selector, kept for reference: doc.find(class_="col-sm-6 text-right")
total_pages_x = doc.find(class_="col-md-6 text-md-end mb-2")
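Passing a space-separated string to `class_` only matches when the element's `class` attribute is exactly that string, so a site update that reorders or adds a class silently returns `None`. A CSS selector via `select_one()` (available because Beautiful Soup ships with the `soupsieve` package) matches any element carrying all the listed classes. The markup and its text below are made up for illustration.

```python
from bs4 import BeautifulSoup

# Made-up markup: the same three classes, in a different order and with an extra one
html = '<div class="mb-2 col-md-6 text-md-end extra">Showing 1 to 20 of 145 (8 Pages)</div>'
soup = BeautifulSoup(html, "html.parser")

# String match: only succeeds when the class attribute is exactly this string
exact = soup.find(class_="col-md-6 text-md-end mb-2")

# CSS selector: matches any element that has all three classes, in any order
css = soup.select_one(".col-md-6.text-md-end.mb-2")

print(exact)
print(css.text)
```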


In [53]:
total_pages_x.text.split("(")[1].split(" ")[0]

'8'

In [54]:
total_pages = int(total_pages_x.text.split("(")[1].split(" ")[0])
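Chained `split()` calls raise an `IndexError` if the pagination text changes shape or is missing (for example, when a search returns a single page). A minimal sketch of a more tolerant parser, assuming the text contains something like `(8 Pages)`: a regular expression extracts the number, and the function falls back to one page when no match is found. The function name and sample strings are hypothetical; the fallback is a design choice that trades loud failures for robustness.

```python
import re

def parse_total_pages(text):
    # Look for "(<digits> Page" or "(<digits> Pages)" anywhere in the text
    match = re.search(r"\((\d+)\s+Pages?\)", text)
    # No pagination text found: assume all results fit on a single page
    return int(match.group(1)) if match else 1

print(parse_total_pages("Showing 1 to 20 of 145 (8 Pages)"))
print(parse_total_pages("(1 Page)"))
```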

Total number of result pages obtained for the search:

In [56]:
total_pages 

8

Next, we gather each item's name, link, and price (old and new).

Create empty lists to hold the data.

In [111]:
link = []
item_name = []
new_price = []
old_price = []

Here is the main code, which scrapes the information from the HTML.

Using the paginated search URL, we loop over every result page, from page 1 up to the total page count obtained earlier, and extract the details of each product.

In [112]:
def price_text(item, wrapper_class):
    # Find <span class="del"> (old price) or <span class="ins"> (new price),
    # then the nested <span class="amount">; return None if either is missing
    wrapper = item.find("span", class_=wrapper_class)
    amount = wrapper.find("span", class_="amount") if wrapper is not None else None
    # [1:] drops the first character, as in the original scraper (intended to remove "₹")
    return amount.text[1:] if amount is not None else None

for page in range(1, total_pages + 1):
    url = f"https://mdcomputers.in/index.php?route=product/search&page={page}&search={search_item}"
    print(url)  # progress indicator
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    doc = BeautifulSoup(response.text, "html.parser")

    # Each product's data lives inside a <div class="product-wrapper">
    items = doc.find_all(class_="product-wrapper")
    for item in items:
        anchor = item.find("h3").find("a")  # the product title link
        link.append(anchor["href"])          # product URL from the href attribute
        item_name.append(anchor.text)        # product name from the link text
        # Products without a discount have no old price; price_text() returns None for them
        old_price.append(price_text(item, "del"))
        new_price.append(price_text(item, "ins"))

https://mdcomputers.in/index.php?route=product/search&page=1&search=mouse
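The loop above appends to four parallel lists, so a single missing tag can shift one list out of step with the others. A minimal sketch of an alternative, assuming the same tag structure the loop relies on (`div.product-wrapper`, `h3 a`, `span.del span.amount`, `span.ins span.amount`): a hypothetical `parse_products()` function returns one dictionary per product, which can be tested offline against saved or made-up HTML. Unlike the notebook, it keeps the `₹` symbol and leaves price cleaning to a later step.

```python
from bs4 import BeautifulSoup

def parse_products(html):
    """Return one dict per <div class="product-wrapper"> found in html."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    for item in soup.find_all(class_="product-wrapper"):
        anchor = item.select_one("h3 a")
        if anchor is None:
            continue  # skip wrappers without a title link
        old = item.select_one("span.del span.amount")
        new = item.select_one("span.ins span.amount")
        products.append({
            "item_name": anchor.get_text(strip=True),
            "link": anchor.get("href"),
            "old_price": old.get_text(strip=True) if old is not None else None,
            "new_price": new.get_text(strip=True) if new is not None else None,
        })
    return products

# Made-up markup mirroring the tags the scraper looks for
sample = """
<div class="product-wrapper">
  <h3><a href="https://example.com/p/1">Mouse A</a></h3>
  <span class="price"><span class="del"><span class="amount">₹999</span></span>
  <span class="ins"><span class="amount">₹799</span></span></span>
</div>
<div class="product-wrapper">
  <h3><a href="https://example.com/p/2">Mouse B</a></h3>
  <span class="price"><span class="ins"><span class="amount">₹500</span></span></span>
</div>
"""
for row in parse_products(sample):
    print(row)
```

In the page loop, `products.extend(parse_products(response.text))` would replace the four `append` calls.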

In [113]:
len(item_name)

145

In [114]:
len(link)

145

In [115]:
sorted(item_name)

['ANT ESPORTS KM550 Keyboard and Mouse Combo',
 'ANT ESPORTS Thunder 10 RGB Keyboard and Mouse Combo with Brown Switches',
 'ANT ESPORTS Thunder 30 RGB Keyboard and Mouse Combo',
 'ASUS ROG Keris II ACE Wireless RGB Gaming Mouse',
 'ASUS ROG Keris II ACE Wireless RGB Moonlight White Gaming Mouse',
 'ASUS ROG Strix Impact III Wireless RGB Gaming Mouse',
 'AVerMedia Elena Chan Mouse Pad (Large)',
 'Adata XPG Battleground L Exoskeleton Totem Edition Gaming Mouse Pad (Large)',
 'Adata XPG Battleground L Gaming Mouse Pad (Large)',
 'Adata XPG Infarex M20 RGB',
 'Adata XPG Primer RGB Gaming Mouse (Black)',
 'Adata XPG Slingshot RGB Gaming Mouse',
 'Ant Esports AEC410 Type-C To USB Hub',
 'Ant Esports Cherry Storm Gaming Mousepad - Cherry Red',
 'Ant Esports GM320 RGB Gaming Mouse (Black)',
 'Ant Esports GM325 PRO Mini Wireless Gaming Mouse',
 'Ant Esports GM330 RGB Gaming Mouse (Black)',
 'Ant Esports GM333 Gaming Mouse (Black-Red-Yellow)',
 'Ant Esports GM333 Gaming Mouse (Blue-White-Yellow
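Python's default string sort is case-sensitive, which is why `'ANT ESPORTS ...'` and `'ASUS ...'` appear before `'Adata ...'` in the output above. A short sketch using four names from that list: passing `key=str.casefold` gives a case-insensitive ordering.

```python
names = [
    "Adata XPG Slingshot RGB Gaming Mouse",
    "ASUS ROG Strix Impact III Wireless RGB Gaming Mouse",
    "Ant Esports GM320 RGB Gaming Mouse (Black)",
    "ANT ESPORTS KM550 Keyboard and Mouse Combo",
]

print(sorted(names))                    # default: uppercase letters sort before lowercase
print(sorted(names, key=str.casefold))  # case-insensitive ordering
```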

Convert the lists to a dictionary and then to a DataFrame.

In [126]:
my_dict = {'item_name': item_name , 'link': link , 'new_price': new_price , 'old_price': old_price}

In [123]:
len(old_price)

145

In [127]:
for key in my_dict:
    print(len(my_dict[key]))

145
145
145
145


In [128]:
df = pd.DataFrame(my_dict)
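The length check above matters because `pd.DataFrame` rejects a dictionary whose lists have different lengths. A minimal sketch with made-up values showing that error, and the alternative of building the DataFrame from a list of per-product dictionaries, where a missing field simply becomes `NaN` in that row.

```python
import pandas as pd

# Parallel lists must all have the same length...
try:
    pd.DataFrame({"item_name": ["A", "B"], "new_price": ["₹100"]})
except ValueError as err:
    print("ValueError:", err)  # pandas refuses columns of different lengths

# ...whereas one dict per product keeps each row's fields together
records = [
    {"item_name": "A", "new_price": "₹100", "old_price": "₹150"},
    {"item_name": "B", "new_price": "₹200"},  # no old price
]
df_records = pd.DataFrame(records)
print(df_records)
```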

Here is our DataFrame with the required data:

In [129]:
df

| | item_name | link | new_price | old_price |
|---|---|---|---|---|
| 0 | Asus TUF Gaming M4 Wireless Gaming Mouse (Black) | https://mdcomputers.in/product/asus-tuf-gaming... | ₹3,639 | 6999 |
| 1 | Razer Deathadder Essential White Wired Gaming ... | https://mdcomputers.in/product/razer-deathadde... | ₹1,199 | 4499 |
| 2 | Logitech G502 X Plus Lightspeed RGB Wireless G... | https://mdcomputers.in/product/logitech-g502-x... | ₹12,440 | 14700 |
| 3 | Logitech G502 X Gaming Mouse (White) | https://mdcomputers.in/product/logitech-g502-x... | ₹5,980 | 7995 |
| 4 | Logitech MK200 Keyboard and Mouse Combo | https://mdcomputers.in/product/logitech-mk200-... | ₹1,080 | 1600 |
| ... | ... | ... | ... | ... |
| 140 | Ant Esports KM550 Pro Wireless Keyboard and Mo... | https://mdcomputers.in/product/ant-esports-km5... | ₹1,480 | 3999 |
| 141 | HP M310 Dual Mode Wireless Mouse | https://mdcomputers.in/product/hp-m310-dual-mo... | ₹700 | 1799 |
| 142 | Cosmic Byte Atlas Tri-Mode Wireless Gaming Mouse | https://mdcomputers.in/product/cosmic-byte-atl... | ₹2,000 | 2999 |
| 143 | Cosmic Byte Atlas Tri-Mode White Wireless Gami... | https://mdcomputers.in/product/cosmic-byte-atl... | ₹2,000 | 2999 |
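The price columns above are still text, and they are inconsistent: `new_price` keeps the `₹` symbol and thousands separators, while `old_price` is plain digits. A minimal sketch of one way to turn both into numbers, using three price pairs copied from the table: strip every character that is not a digit or a decimal point, convert with `pd.to_numeric(..., errors="coerce")` so missing or unparsable values become `NaN`, and then compute a discount percentage. The `discount_pct` column is an illustrative addition.

```python
import pandas as pd

# Three price pairs copied from the table above
prices = pd.DataFrame({
    "new_price": ["₹3,639", "₹1,199", "₹700"],
    "old_price": ["6999", "4499", "1799"],
})

for col in ["new_price", "old_price"]:
    # Remove everything except digits and the decimal point, then convert
    cleaned = prices[col].astype(str).str.replace(r"[^\d.]", "", regex=True)
    prices[col] = pd.to_numeric(cleaned, errors="coerce")  # unparsable -> NaN

# Discount as a percentage of the old price, rounded to one decimal place
prices["discount_pct"] = ((prices["old_price"] - prices["new_price"]) / prices["old_price"] * 100).round(1)
print(prices)
```

On the real DataFrame, the same loop would run over `df` instead of `prices`.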


Convert the DataFrame to an Excel file named after the search term (pandas needs the openpyxl package installed to write .xlsx files).

In [130]:
df.to_excel(f'{search_item}_list_mdcomputers.xlsx')
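Because the file name comes straight from user input, a search such as `usb/c hub` would produce a path containing a directory separator. A minimal sketch of a hypothetical `export_filename()` helper that replaces anything other than letters, digits, `-` and `_` with an underscore, falling back to `search` for an empty term.

```python
import re

def export_filename(search_item, suffix="_list_mdcomputers.xlsx"):
    # Replace each run of characters other than letters, digits, "-" and "_" with "_"
    safe = re.sub(r"[^\w-]+", "_", search_item.strip()) or "search"
    return f"{safe}{suffix}"

print(export_filename("usb/c hub"))
```

It could then be used as `df.to_excel(export_filename(search_item), index=False)`, where `index=False` omits the 0 to 144 row index from the spreadsheet.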

With that, we have successfully scraped product data from the MD Computers website using Beautiful Soup 4.