# Objective:
**Scrape the laptops listed on the Flipkart search page, including their names, prices, RAM, secondary storage, and other important specifications. The scraped data is then converted into a pandas dataframe to make it easier to visualize and draw insights from.**


# Tools used: 
**Python, Beautiful Soup, Pandas, requests**

Import necessary libraries and packages

In [13]:
from bs4 import BeautifulSoup as bs         # for parsing and extracting the data from the web page
import requests                             # to make a get request to a web page
import pandas as pd                         # for creating and manipulating dataframes

Get the web page through the specified address

In [None]:
destUrl = "https://www.flipkart.com/search?q=laptop&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off"
# To route the request through a proxy (an intermediary server that forwards the
# client's request to the destination server and relays the response back), pass
# a `proxies` mapping, where `proxy` is the URL of the proxy server:
# webPage = requests.get(destUrl, proxies={"http": proxy, "https": proxy})
webPage = requests.get(destUrl)             # make a get request to the URL
webPage.content                             # get the content of the page
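
The search URL above is a base address followed by query parameters. As a minimal sketch, the same URL can be assembled from a dictionary with the standard library, which makes it easier to change the search term. The helper name `buildSearchUrl` and the variable `searchUrl` are hypothetical; the parameter names and values are copied from `destUrl`, and nothing here assumes what Flipkart does with them.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.flipkart.com/search"

def buildSearchUrl(query):
    """Build a search URL with the same query parameters as destUrl (hypothetical helper)."""
    params = {
        "q": query,                    # the search term
        "otracker": "search",
        "otracker1": "search",
        "marketplace": "FLIPKART",
        "as-show": "on",
        "as": "off",
    }
    # urlencode joins the pairs with '&' and escapes unsafe characters
    return BASE_URL + "?" + urlencode(params)

searchUrl = buildSearchUrl("laptop")
print(searchUrl)
```

Passing a different term, for example `buildSearchUrl("gaming laptop")`, encodes the space in the query automatically.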

Parse the data

In [None]:
parsedData = bs(webPage.content, 'html.parser')             # parse the data
parsedData.prettify()                                       # perform formatting of the data

Traverse the parsed data and get different elements based on HTML tags

In [18]:
products = []                      # to store the name of each product
prices = []                        # to store the price of each product
ratings = []                       # to store the rating of each product
processors = []                    # to store the processor of each product
RAMs = []                          # to store the RAM of each product
operatingSystems = []              # to store the operating system of each product
secondaryStorages = []             # to store the secondary storage of each product

# function to get all the products from the parsed data
def getProducts(parsedData):

    # traverse each product card in the parsed page
    for elements in parsedData.find_all('div', class_='_3pLy-c row'):
        name = elements.find('div', attrs={'class':'_4rR01T'})
        price = elements.find('div', attrs={'class':'_30jeq3 _1_WHN1'})
        rating = elements.find('div', attrs={'class':'_3LWZlK'})

        # get the specification block of the product
        specification = elements.find('div', attrs={'class':'fMghEO'})

        # read the specification items by position; this assumes every
        # listing presents them in the same order
        col = specification.find_all('li', attrs={'class':'rgWa7D'})
        processor = col[0].text
        RAM = col[1].text
        operatingSystem = col[2].text
        secondaryStorage = col[3].text

        # append each element into its list
        products.append(name.text)
        prices.append(price.text)
        ratings.append(rating.text if rating else None)   # a product may have no rating
        processors.append(processor)
        RAMs.append(RAM)
        operatingSystems.append(operatingSystem)
        secondaryStorages.append(secondaryStorage)

#print the first product details
def printFirstProduct():
    print("Product: ", products[0])
    print("Price: ", prices[0])
    print("Processor: ", processors[0])
    print("Operating System: ", operatingSystems[0])
    print("RAM: ", RAMs[0])
    print("Secondary Storage:", secondaryStorages[0])

getProducts(parsedData)
printFirstProduct()

Product:  acer Aspire 5 Core i5 11th Gen - (8 GB/1 TB HDD/256 GB SSD/Windows 10 Home) A515-56 Thin and Light Lap...
Price:  ₹49,999
Processor:  Intel Core i5 Processor (11th Gen)
Operating System:  64 bit Windows 10 Operating System
RAM:  8 GB DDR4 RAM
Secondary Storage: 1 TB HDD|256 GB SSD
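
The price is scraped as text (`₹49,999`), so it cannot be sorted or averaged numerically as is. The following is a minimal sketch of one way to convert it to an integer; the helper name `parsePrice` is hypothetical, and it assumes prices are whole rupee amounts with no decimal part.

```python
import re

def parsePrice(text):
    """Convert a price string such as '₹49,999' to an int (hypothetical helper).

    Assumes whole-number prices: every non-digit character is dropped.
    """
    digits = re.sub(r"[^\d]", "", text)
    return int(digits) if digits else None

print(parsePrice("₹49,999"))
```

Applied to the scraped list, this could look like `[parsePrice(price) for price in prices]`.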


Create pandas dataframe

In [19]:
# create data structure to convert the data into the pandas dataframe
data = {'Product': products,
        'Price': prices,
        'CPU': processors,
        'Operating System': operatingSystems,
        'RAM': RAMs,
        'Secondary Storage': secondaryStorages}

# convert into the pandas dataframe
dataframe = pd.DataFrame(data)

#print top 5 rows of the dataframe
dataframe.head()

| | Product | Price | CPU | Operating System | RAM | Secondary Storage |
|---|---|---|---|---|---|---|
| 0 | acer Aspire 5 Core i5 11th Gen - (8 GB/1 TB HD... | ₹49,999 | Intel Core i5 Processor (11th Gen) | 64 bit Windows 10 Operating System | 8 GB DDR4 RAM | 1 TB HDD\|256 GB SSD |
| 1 | MSI GF63 Thin Core i5 10th Gen - (8 GB/1 TB HD... | ₹49,990 | Intel Core i5 Processor (10th Gen) | 64 bit Windows 10 Operating System | 8 GB DDR4 RAM | 1 TB HDD |
| 2 | acer Aspire 7 Core i5 10th Gen - (8 GB/512 GB ... | ₹49,990 | Free upgrade to Windows 11 when available | 8 GB DDR4 RAM | Intel Core i5 Processor (10th Gen) | 64 bit Windows 10 Operating System |
| 3 | ASUS VivoBook 15 (2021) Core i3 11th Gen - (4 ... | ₹34,990 | Intel Core i3 Processor (11th Gen) | Windows 10 Operating System | 4 GB DDR4 RAM | 256 GB SSD |
| 4 | ASUS TUF Gaming F15 Core i5 10th Gen - (8 GB/5... | ₹59,990 | Intel Core i5 Processor (10th Gen) | 64 bit Windows 10 Operating System | 8 GB DDR4 RAM | 512 GB SSD |
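
In row 2 the values are shifted: the listing starts with an extra item ("Free upgrade to Windows 11 when available"), so reading the specification by position puts each value in the wrong column. One possible alternative, sketched below, is to assign each item by a keyword it contains. The helper name `classifySpecs` and the keyword rules are assumptions chosen to fit the five rows shown here; other listings may need different rules.

```python
def classifySpecs(items):
    """Assign specification strings to columns by keyword (hypothetical helper)."""
    specs = {"CPU": None, "RAM": None,
             "Operating System": None, "Secondary Storage": None}
    for item in items:
        lowered = item.lower()
        if "processor" in lowered:
            key = "CPU"
        elif "ram" in lowered:
            key = "RAM"
        elif "operating system" in lowered:
            key = "Operating System"
        elif "ssd" in lowered or "hdd" in lowered:
            key = "Secondary Storage"
        else:
            continue                   # unrecognised item, e.g. a promotional line
        if specs[key] is None:         # keep the first match for each column
            specs[key] = item
    return specs

# the four items that ended up in row 2, in the order they were scraped
row2 = ["Free upgrade to Windows 11 when available",
        "Intel Core i5 Processor (10th Gen)",
        "8 GB DDR4 RAM",
        "64 bit Windows 10 Operating System"]
print(classifySpecs(row2))
```

With this approach the promotional line is skipped, and a column with no matching item stays `None` instead of receiving a neighbouring value.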


Convert the dataframe into a CSV file, which acts as a simple database

In [20]:
dataframe.to_csv("/content/flipkartData.csv")
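
By default `to_csv` also writes the row index as an unnamed first column, which pandas names `Unnamed: 0` when the file is read back. The sketch below compares the default with `index=False` on a small stand-in dataframe; the sample row is hypothetical, and an in-memory buffer replaces the file path so the example runs anywhere.

```python
import io
import pandas as pd

# hypothetical stand-in for the scraped dataframe
sample = pd.DataFrame({"Product": ["Laptop A"], "Price": ["₹49,999"]})

# default behaviour: the index is written as an extra, unnamed column
withIndex = io.StringIO()
sample.to_csv(withIndex)
withIndex.seek(0)
print(list(pd.read_csv(withIndex).columns))

# index=False writes only the data columns
withoutIndex = io.StringIO()
sample.to_csv(withoutIndex, index=False)
withoutIndex.seek(0)
restored = pd.read_csv(withoutIndex)
print(list(restored.columns))
```

Whether to keep the index is a design choice: it is redundant for a default 0, 1, 2, ... index, but worth keeping if the index carries meaning.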