# Web Scraping
Web scraping is one of the most common ways to collect data. In simple terms, it means fetching the HTML of a webpage and scanning it for the data you need.

### Beautiful Soup is one of the most widely used packages for parsing and scraping HTML.

In this notebook we'll use Beautiful Soup to collect the laptop listings from Flipkart's search results, together with their ratings, specifications, and prices, and build a dataset from them.


## Let's get started

### Importing modules
The requests module is used to fetch the HTML for a given URL; Beautiful Soup then parses it.

In [1]:
# libraries for web scraping
import requests
from bs4 import BeautifulSoup

# library of common string constants and helpers
import string

# library for data manipulation
import pandas as pd

# library for numerical operations
import numpy as np

# importing os module
import os

# libraries for data visualization
import matplotlib.pyplot as plt
import seaborn as sns

## Extract the html data from the webpage

In [2]:
# extract the data from page number 1 only
page_number = 1

# extract the page as a whole; the {} placeholder receives the page number
page = requests.get("https://www.flipkart.com/search?q=laptop&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off&sort=popularity&page={}".format(page_number))

# parse the whole page using an HTML parser
soup = BeautifulSoup(page.content, 'html.parser')
# soup
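
The long search URL above is easier to read and to change when it is assembled from its query parameters. The following is a minimal sketch using only the standard library. It assumes the parameter names visible in the URL plus the `page` parameter used in the multi-page loop later in this notebook; the helper name `build_search_url` is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.flipkart.com/search"

def build_search_url(query, page=1):
    """Return the search URL for one results page (hypothetical helper)."""
    params = {
        "q": query,
        "otracker": "search",
        "otracker1": "search",
        "marketplace": "FLIPKART",
        "as-show": "on",
        "as": "off",
        "sort": "popularity",
        "page": page,  # assumed to select the results page
    }
    # urlencode keeps the dictionary order and escapes values where needed
    return BASE_URL + "?" + urlencode(params)

print(build_search_url("laptop", page=2))
```

The same dictionary could also be handed to `requests.get` through its `params` argument, which performs the encoding itself.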

## Find the main class that contains the group items

In [3]:
# find_all() searches for all the tags with the given class and
# returns a list of all the occurrences
all_laps = soup.find_all('a', href=True, attrs={'class': "_1fQZEK"})

# find the number of elements extracted
len(all_laps)

24
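
To make the class-based selection concrete without contacting the live site, here is a small stand-in built on Python's `html.parser` and a made-up HTML snippet. It is not how Beautiful Soup is implemented; it only illustrates the idea of keeping `<a>` tags that have an `href` and carry a given class. The class `ClassLinkCollector` and the sample HTML are hypothetical. Class names such as `_1fQZEK` look machine-generated, so they may change when the site is updated.

```python
from html.parser import HTMLParser

class ClassLinkCollector(HTMLParser):
    """Collect href values of <a> tags that carry a given CSS class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # a class attribute may hold several space-separated class names
        classes = (attr_map.get("class") or "").split()
        if self.wanted_class in classes and attr_map.get("href"):
            self.links.append(attr_map["href"])

# made-up HTML for illustration; not copied from the real page
SAMPLE_HTML = """
<div>
  <a class="_1fQZEK" href="/laptop-a">Laptop A</a>
  <a class="other" href="/help">Help</a>
  <a class="_1fQZEK promo" href="/laptop-b">Laptop B</a>
  <a class="_1fQZEK">No link</a>
</div>
"""

collector = ClassLinkCollector("_1fQZEK")
collector.feed(SAMPLE_HTML)
print(collector.links)
```

Only the two anchors that have both the class and an `href` are kept, which mirrors the `href=True` filter in the cell above.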

In [4]:
# inspect the spec list of a single listing (index 7) before looping over all of them
specs = all_laps[7].find('ul', attrs={'class': '_1xgFaf'})
sp = specs.find_all(class_="rgWa7D")
for item in sp:
    text = item.text
    if 'Processor' in text and 'Processor:' not in text:
        processor = text
        #print(processor)

    elif 'RAM' in text and 'RAM &' not in text:
        ram = text
        #print(ram)

    elif 'Operating' in text:
        os_name = text  # named os_name so the imported os module is not shadowed
        #print(os_name)

    elif 'SSD' in text or 'HDD' in text:
        storage = text
        #print(storage)
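
The keyword checks above can be pulled into a small function so they can be tried on plain strings without fetching a page. This is a sketch that mirrors the same rules; the function names `classify_spec` and `parse_specs` are hypothetical, and the warranty line in the sample is made up to show an unmatched entry.

```python
def classify_spec(text):
    """Return the field name for one spec line, or None if nothing matches."""
    if 'Processor' in text and 'Processor:' not in text:
        return 'processor'
    if 'RAM' in text and 'RAM &' not in text:
        return 'ram'
    if 'Operating' in text:
        return 'os'
    if 'SSD' in text or 'HDD' in text:
        return 'storage'
    return None

def parse_specs(lines):
    """Map spec lines to fields; unmatched fields stay None."""
    fields = {'processor': None, 'ram': None, 'os': None, 'storage': None}
    for line in lines:
        field = classify_spec(line)
        if field is not None:
            fields[field] = line  # a later match overwrites an earlier one
    return fields

sample = [
    'Intel Core i3 Processor (10th Gen)',
    '8 GB DDR4 RAM',
    '64 bit Windows 11 Operating System',
    '512 GB SSD',
    '1 Year Onsite Warranty',  # made-up line that matches no rule
]
print(parse_specs(sample))
```

Starting every field at `None` also makes it explicit what happens when a listing lacks one of the spec lines.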
            
    

In [5]:
data = []

for a in all_laps:

    # reset the spec fields for every listing
    processor = None
    ram = None
    os_name = None  # not named os, to avoid shadowing the imported os module
    storage = None

    name = a.find('div', attrs={'class': '_4rR01T'})
    ratings = a.find('div', attrs={'class': '_3LWZlK'})
    num_ratings = a.find('span', attrs={'class': '_2_R_DZ'})
    print(name.text)
    price = a.find(class_='_30jeq3 _1_WHN1')
    org_price = a.find(class_='_3I9_wc _27UcVY')
    print(price.text)

    allspecs = a.find('ul', attrs={'class': '_1xgFaf'})
    sp = allspecs.find_all(class_="rgWa7D")

    # classify each spec line by the keyword it contains
    for item in sp:
        text = item.text
        if 'Processor' in text and 'Processor:' not in text:
            processor = text

        elif 'RAM' in text and 'RAM &' not in text:
            ram = text

        elif 'Operating' in text:
            os_name = text

        elif 'SSD' in text or 'HDD' in text:
            storage = text
            print(storage)

    # missing fields become empty strings instead of raising on None
    data.append({'name': name.text,
                 'ratings': ratings.text if ratings is not None else '',
                 'num_ratings': num_ratings.text if num_ratings is not None else '',
                 'processor': processor if processor is not None else '',
                 'ram': ram if ram is not None else '',
                 'os': os_name if os_name is not None else '',
                 'storage': storage if storage is not None else '',
                 'price': price.text,
                 'org_price': org_price.text if org_price is not None else ''})
    print('_______________')

ASUS VivoBook 15 (2022) Core i3 10th Gen - (8 GB/512 GB SSD/Windows 11 Home) X515JA-EJ362WS | X515JA-E...
₹32,990
512 GB SSD
_______________
HP Pavilion Ryzen 5 Hexa Core AMD R5-5600H - (8 GB/512 GB SSD/Windows 10/4 GB Graphics/NVIDIA GeForce ...
₹49,990
512 GB SSD
_______________
Lenovo Intel Celeron Dual Core - (8 GB/256 GB SSD/Windows 11 Home) 81WQ00MQIN|81WQ00NXIN Laptop
₹28,490
256 GB SSD
_______________
HP 14s Intel Core i3 11th Gen - (8 GB/256 GB SSD/Windows 11 Home) 14s - dy2507TU Thin and Light Laptop
₹35,990
256 GB SSD
_______________
acer Aspire 3 Ryzen 3 Dual Core 3250U - (8 GB/256 GB SSD/Windows 11 Home) A315-23 Laptop
₹26,990
256 GB SSD
_______________
Lenovo V15 G2 Core i3 11th Gen - (8 GB/512 GB SSD/Windows 11 Home) V15 ITL G2 Laptop
₹33,599
512 GB SSD
_______________
Lenovo V15 G2 Core i3 11th Gen - (8 GB/1 TB HDD/256 GB SSD/Windows 11 Home) V15 ITL G2 Laptop
₹35,499
1 TB HDD|256 GB SSD
_______________
DELL Vostro Core i3 11th Gen - (8 GB/1 TB HDD/256 GB SSD/Windows 11

In [6]:
data

[{'name': 'ASUS VivoBook 15 (2022) Core i3 10th Gen - (8 GB/512 GB SSD/Windows 11 Home) X515JA-EJ362WS | X515JA-E...',
  'ratings': '4.3',
  'num_ratings': '9,965 Ratings\xa0&\xa01,000 Reviews',
  'processor': 'Intel Core i3 Processor (10th Gen)',
  'ram': '8 GB DDR4 RAM',
  'os': '64 bit Windows 11 Operating System',
  'storage': '512 GB SSD',
  'price': '₹32,990',
  'org_price': '₹45,990'},
 {'name': 'HP Pavilion Ryzen 5 Hexa Core AMD R5-5600H - (8 GB/512 GB SSD/Windows 10/4 GB Graphics/NVIDIA GeForce ...',
  'ratings': '4.4',
  'num_ratings': '11,810 Ratings\xa0&\xa01,198 Reviews',
  'processor': 'AMD Ryzen 5 Hexa Core Processor',
  'ram': '8 GB DDR4 RAM',
  'os': '64 bit Windows 10 Operating System',
  'storage': '512 GB SSD',
  'price': '₹49,990',
  'org_price': '₹63,539'},
 {'name': 'Lenovo Intel Celeron Dual Core - (8 GB/256 GB SSD/Windows 11 Home) 81WQ00MQIN|81WQ00NXIN Laptop',
  'ratings': '4.1',
  'num_ratings': '366 Ratings\xa0&\xa035 Reviews',
  'processor': 'Intel Celeron 

In [7]:
df = pd.DataFrame(data)
df

| | name | ratings | num_ratings | processor | ram | os | storage | price | org_price |
|---|---|---|---|---|---|---|---|---|---|
| 0 | ASUS VivoBook 15 (2022) Core i3 10th Gen - (8 ... | 4.3 | 9,965 Ratings & 1,000 Reviews | Intel Core i3 Processor (10th Gen) | 8 GB DDR4 RAM | 64 bit Windows 11 Operating System | 512 GB SSD | ₹32,990 | ₹45,990 |
| 1 | HP Pavilion Ryzen 5 Hexa Core AMD R5-5600H - (... | 4.4 | 11,810 Ratings & 1,198 Reviews | AMD Ryzen 5 Hexa Core Processor | 8 GB DDR4 RAM | 64 bit Windows 10 Operating System | 512 GB SSD | ₹49,990 | ₹63,539 |
| 2 | Lenovo Intel Celeron Dual Core - (8 GB/256 GB ... | 4.1 | 366 Ratings & 35 Reviews | Intel Celeron Dual Core Processor | 8 GB DDR4 RAM | 64 bit Windows 11 Operating System | 256 GB SSD | ₹28,490 | ₹40,490 |
| 3 | HP 14s Intel Core i3 11th Gen - (8 GB/256 GB S... | 4.2 | 1,539 Ratings & 136 Reviews | Intel Core i3 Processor (11th Gen) | 8 GB DDR4 RAM | 64 bit Windows 11 Operating System | 256 GB SSD | ₹35,990 | ₹47,206 |
| 4 | acer Aspire 3 Ryzen 3 Dual Core 3250U - (8 GB/... | 4.2 | 247 Ratings & 53 Reviews | AMD Ryzen 3 Dual Core Processor | 8 GB DDR4 RAM | 64 bit Windows 11 Operating System | 256 GB SSD | ₹26,990 | ₹42,999 |
| 5 | Lenovo V15 G2 Core i3 11th Gen - (8 GB/512 GB ... | 4.1 | 14 Ratings & 1 Reviews | Intel Core i3 Processor (11th Gen) | 8 GB DDR4 RAM | 64 bit Windows 11 Operating System | 512 GB SSD | ₹33,599 | ₹59,760 |
| 6 | Lenovo V15 G2 Core i3 11th Gen - (8 GB/1 TB HD... | 3.9 | 10 Ratings & 1 Reviews | Intel Core i3 Processor (11th Gen) | 8 GB DDR4 RAM | 64 bit Windows 11 Operating System | 1 TB HDD\|256 GB SSD | ₹35,499 | ₹60,120 |
| 7 | DELL Vostro Core i3 11th Gen - (8 GB/1 TB HDD/... | 4.1 | 32 Ratings & 1 Reviews | Intel Core i3 Processor (11th Gen) | 8 GB DDR4 RAM | Windows 11 Operating System | 1 TB HDD\|256 GB SSD | ₹39,990 | ₹58,489 |
| 8 | ASUS TUF Gaming A17 with 90Whr Battery Ryzen 5... | 4.4 | 1,340 Ratings & 166 Reviews | AMD Ryzen 5 Hexa Core Processor | 8 GB DDR4 RAM | 64 bit Windows 11 Operating System | 512 GB SSD | ₹52,990 | ₹71,990 |
| 9 | ASUS ROG Strix G15 (2021) Ryzen 7 Octa Core AM... | 4.5 | 2,015 Ratings & 256 Reviews | AMD Ryzen 7 Octa Core Processor | 8 GB DDR4 RAM | 64 bit Windows 10 Operating System | 512 GB SSD | ₹75,490 | ₹90,990 |
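
Every scraped value is still a string, so prices and rating counts cannot be sorted or averaged yet. The sketch below shows one way to turn them into numbers with the standard library. The function names `parse_price` and `parse_counts` are hypothetical, and the patterns assume the formats seen in the output above, including the non-breaking spaces (`\xa0`) around the ampersand.

```python
import re

def parse_price(text):
    """Convert a price such as '₹32,990' to an int; None for an empty string."""
    digits = re.sub(r'[^\d]', '', text)
    return int(digits) if digits else None

def parse_counts(text):
    """Convert '9,965 Ratings & 1,000 Reviews' to (9965, 1000).

    Returns (None, None) when the text does not match, for example ''.
    """
    match = re.search(r'([\d,]+)\s*Ratings\D+([\d,]+)\s*Reviews', text)
    if match is None:
        return (None, None)
    ratings, reviews = (int(group.replace(',', '')) for group in match.groups())
    return (ratings, reviews)

print(parse_price('₹32,990'))
print(parse_counts('9,965 Ratings\xa0&\xa01,000 Reviews'))
```

With pandas, functions like these could be applied column by column, for instance through `Series.apply`.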


In [8]:
# extract the data for pages 1 to 57 (range() excludes the upper bound)
page_number = 58
data = []

for page_no in range(1, page_number):

    # extract the page as a whole
    page = requests.get("https://www.flipkart.com/search?q=laptop&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off&sort=popularity&page={}".format(page_no))

    # parse the whole page using an HTML parser
    soup = BeautifulSoup(page.content, 'html.parser')

    # find_all() searches for all the tags with the given class and
    # returns a list of all the occurrences
    all_laps = soup.find_all('a', href=True, attrs={'class': "_1fQZEK"})

    for a in all_laps:

        # reset the spec fields for every listing
        processor = None
        ram = None
        os_name = None  # not named os, to avoid shadowing the imported os module
        storage = None

        name = a.find('div', attrs={'class': '_4rR01T'})
        ratings = a.find('div', attrs={'class': '_3LWZlK'})
        num_ratings = a.find('span', attrs={'class': '_2_R_DZ'})

        price = a.find(class_='_30jeq3 _1_WHN1')
        org_price = a.find(class_='_3I9_wc _27UcVY')

        allspecs = a.find('ul', attrs={'class': '_1xgFaf'})
        sp = allspecs.find_all(class_="rgWa7D")

        # classify each spec line by the keyword it contains; the loop
        # variable is not called i, so it cannot clash with the page counter
        for item in sp:
            text = item.text
            if 'Processor' in text and 'Processor:' not in text:
                processor = text

            elif 'RAM' in text and 'RAM &' not in text:
                ram = text

            elif 'Operating' in text:
                os_name = text

            elif 'SSD' in text or 'HDD' in text:
                storage = text

        data.append({'name': name.text,
                     'ratings': ratings.text if ratings is not None else '',
                     'num_ratings': num_ratings.text if num_ratings is not None else '',
                     'processor': processor,
                     'ram': ram,
                     'os': os_name,
                     'storage': storage if storage is not None else '',
                     'price': price.text,
                     'org_price': org_price.text if org_price is not None else ''})
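
A loop over dozens of pages can stop partway through if a single request fails. The notebook does not report such a failure, so treat this as a precaution and not a fix for an observed problem. The sketch below retries a fetch a few times with a growing pause. The helper `fetch_with_retries` and the simulated `flaky_fetch` are hypothetical; the fetch function is passed in so the example runs without network access.

```python
import time

def fetch_with_retries(fetch, url, retries=3, delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying on exceptions with a pause between attempts."""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except Exception as error:  # in real use, catch a narrower exception type
            last_error = error
            if attempt < retries:
                sleep(delay * attempt)  # wait a little longer after each failure
    raise last_error

# simulated fetch that fails twice and then succeeds (illustration only)
calls = []

def flaky_fetch(url):
    calls.append(url)
    if len(calls) < 3:
        raise ConnectionError("simulated failure")
    return "<html>ok</html>"

html = fetch_with_retries(flaky_fetch, "https://example.com/page", delay=0.0)
print(html, len(calls))
```

With requests, the `fetch` argument could be something like `lambda url: requests.get(url, timeout=10)`, where the timeout value is an arbitrary choice. A short pause between successful requests is also a common courtesy toward the server.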

In [9]:
data

[{'name': 'ASUS VivoBook 15 (2022) Core i3 10th Gen - (8 GB/512 GB SSD/Windows 11 Home) X515JA-EJ362WS | X515JA-E...',
  'ratings': '4.3',
  'num_ratings': '9,965 Ratings\xa0&\xa01,000 Reviews',
  'processor': 'Intel Core i3 Processor (10th Gen)',
  'ram': '8 GB DDR4 RAM',
  'os': '64 bit Windows 11 Operating System',
  'storage': '512 GB SSD',
  'price': '₹32,990',
  'org_price': '₹45,990'},
 {'name': 'HP Pavilion Ryzen 5 Hexa Core AMD R5-5600H - (8 GB/512 GB SSD/Windows 10/4 GB Graphics/NVIDIA GeForce ...',
  'ratings': '4.4',
  'num_ratings': '11,810 Ratings\xa0&\xa01,198 Reviews',
  'processor': 'AMD Ryzen 5 Hexa Core Processor',
  'ram': '8 GB DDR4 RAM',
  'os': '64 bit Windows 10 Operating System',
  'storage': '512 GB SSD',
  'price': '₹49,990',
  'org_price': '₹63,539'},
 {'name': 'Lenovo Intel Celeron Dual Core - (8 GB/256 GB SSD/Windows 11 Home) 81WQ00MQIN|81WQ00NXIN Laptop',
  'ratings': '4.1',
  'num_ratings': '366 Ratings\xa0&\xa035 Reviews',
  'processor': 'Intel Celeron 

In [10]:
df = pd.DataFrame(data)
df

| | name | ratings | num_ratings | processor | ram | os | storage | price | org_price |
|---|---|---|---|---|---|---|---|---|---|
| 0 | ASUS VivoBook 15 (2022) Core i3 10th Gen - (8 ... | 4.3 | 9,965 Ratings & 1,000 Reviews | Intel Core i3 Processor (10th Gen) | 8 GB DDR4 RAM | 64 bit Windows 11 Operating System | 512 GB SSD | ₹32,990 | ₹45,990 |
| 1 | HP Pavilion Ryzen 5 Hexa Core AMD R5-5600H - (... | 4.4 | 11,810 Ratings & 1,198 Reviews | AMD Ryzen 5 Hexa Core Processor | 8 GB DDR4 RAM | 64 bit Windows 10 Operating System | 512 GB SSD | ₹49,990 | ₹63,539 |
| 2 | Lenovo Intel Celeron Dual Core - (8 GB/256 GB ... | 4.1 | 366 Ratings & 35 Reviews | Intel Celeron Dual Core Processor | 8 GB DDR4 RAM | 64 bit Windows 11 Operating System | 256 GB SSD | ₹28,490 | ₹40,490 |
| 3 | HP 14s Intel Core i3 11th Gen - (8 GB/256 GB S... | 4.2 | 1,539 Ratings & 136 Reviews | Intel Core i3 Processor (11th Gen) | 8 GB DDR4 RAM | 64 bit Windows 11 Operating System | 256 GB SSD | ₹35,990 | ₹47,206 |
| 4 | acer Aspire 3 Ryzen 3 Dual Core 3250U - (8 GB/... | 4.2 | 247 Ratings & 53 Reviews | AMD Ryzen 3 Dual Core Processor | 8 GB DDR4 RAM | 64 bit Windows 11 Operating System | 256 GB SSD | ₹26,990 | ₹42,999 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 931 | Avita SATUS ULTIMUS Celeron Dual Core - (4 GB/... | 3.9 | 646 Ratings & 89 Reviews | Intel Celeron Dual Core Processor | 4 GB LPDDR4 RAM | 64 bit Windows 11 Operating System | 128 GB SSD | ₹17,490 | ₹29,990 |
| 932 | DELL Core i3 11th Gen - (8 GB/512 GB SSD/Windo... | | | Intel Core i3 Processor (11th Gen) | 8 GB DDR4 RAM | Windows 11 Operating System | 512 GB SSD | ₹61,202 | |
| 933 | acer Predator Helios 300 Core i9 12th Gen - (1... | 4.6 | 105 Ratings & 16 Reviews | Intel Core i9 Processor (12th Gen) | 16 GB DDR5 RAM | 64 bit Windows 11 Operating System | 1 TB SSD | ₹1,59,990 | ₹1,89,990 |
| 934 | LG Gram Core i5 12th Gen - (8 GB/512 GB SSD/Wi... | | | Intel Core i5 Processor (12th Gen) | 8 GB DDR5 RAM | 64 bit Windows 11 Operating System | 512 GB SSD | ₹89,990 | ₹1,27,000 |


In [12]:
df.to_csv('flipkart_laptop.csv', index=False)