# Web scraping - hp Laptops 

![alt text](laptop_EN.jpg)

**Web Scraping means extracting information from websites by parsing the HTML of the webpage.**

website used for scraping : https://store.hp.com/in-en/default/laptops-tablets.html

Libraries used:

- **requests** <br>
  Requests gets the web page for you

- **BeautifulSoup**<br>
  you need to parse the HTML from the page to retrieve the data. That is done by BeautifulSoup.

- **re**<br>
  regular expressions module used to make manipulations on data we got.

- **pandas**<br>
  To create the dataframe from the data we got by scraping and creating
  

  
### Importing the above modules

In [1]:
import requests
from bs4 import BeautifulSoup
import re
import pandas as pd

### Let's get started

Each Laptop product is rendered in the webpage as follow

![alt text](lappyt.jpeg)

In [2]:
prod_list=[]
def getData(products):
    '''
    Takes a List of products and scrapes the required specifications of each product and
    append them to the prod_list defined above.
    '''
    
    for product in products:
        
        #name 
        name=product.find('a',{"class":"product-item-link"}).text.strip()

        #rating of product is in div tag with attribute data-by-average-overall-rating
        rating=product.find('div',attrs={'data-bv-average-overall-rating' : True})['data-bv-average-overall-rating']

        #processor family
        processor=product.find('li',{'class':'processorfamily'})
        if processor is not None:
            processor=processor.text.strip()
        else:
            processor=None
            

        # processor name, generation, type
        if processor is not None:
            try:
                if processor.find('Intel')!=-1:
                    proc_company='Intel'
                    if processor.find('Core')!=-1:  # processor type and generation for Intel core processors
                        generation=re.findall(r'\d',processor)[0] 
                        proc_type=re.findall(r'i\d',processor)[0]
                    else:
                        temp=processor.split()
                        proc_type=' '.join(temp[1:-1]) # processor type for Intel pentium and other series processors

                elif processor.find('AMD')!=-1: # processor type for AMD processors
                    proc_company='AMD' 
                    temp=processor.split()
                    proc_type=temp[1]+' '+temp[2]
                    generation=None
                else:                     # processor type for other processors
                    proc_company=None
                    proc_type=None
                    generation=None
            except:
                print('something went wrong')
        else:
            proc_company=None
            proc_type=None
            generation=None
            

        # Os installed
        os_installed=product.find('li',{'class':'osinstalled'})
        if os_installed is not None:
            os_installed=os_installed.text
        else:
            os_installed=None
            

        # RAM
        ram=product.find('li',{'class':'memstdes_01'})
        if ram is not None:
            ram=ram.text
        else:
            ram=None

        #hard disk
        hd=product.find('li',{'class':'hd_01des'})
        if hd is not None:
            hd=hd.text
        else:
            hd=None

        
        # graphic card information is stored in <li> element with two different classes so multiple classes are used below
        if product.find('li',{'class':['graphicseg_01card_01','graphicseg_02card_01']}) is not None:
            graphic_card=product.find('li',{'class':['graphicseg_01card_01','graphicseg_02card_01']}).text
        else:
            graphic_card=None
            

        #display-type
        display_type=product.find('li',{'class':['display-displaydes']})
        if display_type is None:
            display_type=None
        else:
            display_type=display_type.text


        #price
        
        # actual price, discount and final price all stored in span tag with class price hence we get a list
        prices=product.find_all('span',{'class':'price'})    
        if len(prices)==3:
            aprice=prices[0].text[1:]
            fprice=prices[1].text[1:]
            dprice=prices[2].text[1:]
        else:
            aprice=fprice=prices[0].text[1:] 
            dprice='0'

        #included items
        inc_items=product.find_all('ul',{'class':'included'})

        items_list=[]

        if inc_items is not None:
            for i in inc_items:
                items_list=[item.text for item in i.find_all('li')]
                
        inc_items=','.join(items_list) # converting the list of included items into string seperated by commas

        print('Name:',name)
        print('rating:',rating)
        print("processor:",processor)
        print("processor_company:",proc_company)
        print("processor_type:",proc_type)
        print("generation:",generation)
        print("os_installed:",os_installed)
        print("ram:",ram)
        print("hard_disk:",hd)
        print("graphic_card:",graphic_card)
        print("display:",display_type)
        print("Actual_price:",aprice)
        print("final_price:",fprice)
        print("Discount:",dprice)
        print("included_items:",inc_items)
        
        
        # creating a dictionary with keys as specification names and values as their respective information
        prod_data={
            'Name':name,
            'rating':rating,
            'processor':processor,
            'processor_company':proc_company,
            'processor_type':proc_type,
            'generation':generation,
            'os_installed':os_installed,
            'ram':ram,
            'hard_disk':hd,
            'graphic_card':graphic_card,
            'display':display_type,
            'actual_price':aprice,
            'final_price':fprice,
            'discout':dprice,
            'included_items':inc_items
        }
        
        #Appending the each product to list
        prod_list.append(prod_data)
        
        # end of the product
        print('-'*50)
        
    # end of all products
    print('*'*50)

All the products in website is rendered across 6 pages so we set a loop and get the all products of each page at time
and call the getData function defined above

In [3]:
for page_count in range(1,7):
    url='https://store.hp.com/in-en/default/laptops-tablets.html?p='+str(page_count)+'&product_list_limit=30'
    data=requests.get(url) 
    data_soup=BeautifulSoup(data.text,'lxml')
    products=data_soup.find_all("div", {"class": "product-item-details"})
    getData(products)

Name: HP ENVY x360 - 13-ag0035au
rating: 3.6667
processor: AMD Ryzen™ 5 processor
processor_company: AMD
processor_type: Ryzen™ 5
generation: None
os_installed: Windows 10 Home Single Language 64
ram: 8 GB DDR4-2400 SDRAM (onboard)
hard_disk: 256 GB SSD
graphic_card: AMD Radeon™ Vega 8 Graphics
display: 13.3" FHD multitouch-enabled edge-to-edge glass (1920 x 1080)
Actual_price: 83,496
final_price: 72,990
Discount: 10,506
included_items: 
--------------------------------------------------
Name: HP Gaming Pavilion - 15-cx0140tx
rating: 5.0000
processor: 8th Generation Intel® Core™ i5 processor
processor_company: Intel
processor_type: i5
generation: 8
os_installed: Windows 10 Home Single Language 64
ram: 8 GB DDR4-2666 SDRAM (1 x 8 GB)
hard_disk: 1 TB 7200 rpm SATA
graphic_card: NVIDIA® GeForce® GTX 1050 (4 GB GDDR5 dedicated)
display: None
Actual_price: 86,476
final_price: 72,990
Discount: 13,486
included_items: HP Odyssey backpack (Worth ₹3,499),Microsoft Office Home and Student
-------

--------------------------------------------------
Name: HP EliteBook x360 1030 G3 Notebook PC
rating: 0.0000
processor: 8th Generation Intel® Core™ i7 processor
processor_company: Intel
processor_type: i7
generation: 8
os_installed: Windows 10 Pro 64
ram: 8 GB DDR4-2133 SDRAM (onboard)
hard_disk: 512 GB SSD
graphic_card: Intel® UHD Graphics 620
display: None
Actual_price: 176,627
final_price: 175,084
Discount: 1,543
included_items: HP Active Pen (Worth ₹3,145),HP Original Bag  (Worth ₹1,499) (#5DD44PA)
--------------------------------------------------
Name: HP Notebook - 15-db0186au
rating: 3.6667
processor: AMD Ryzen™ 3 processor
processor_company: AMD
processor_type: Ryzen™ 3
generation: None
os_installed: Windows 10 Home Single Language 64
ram: 4 GB DDR4-2400 SDRAM (1 x 4 GB)
hard_disk: 1 TB 5400 rpm SATA
graphic_card: AMD Radeon™ Vega 3 Graphics
display: None
Actual_price: 38,622
final_price: 33,080
Discount: 5,542
included_items: HP Original Laptop Bag (Worth ₹1,123),Microsoft O

Name: HP ProBook 430 G6 Notebook PC
rating: 0.0000
processor: 8th Generation Intel® Core™ i7 processor (i7-8565U)
processor_company: Intel
processor_type: i7
generation: 8
os_installed: Windows 10 Pro 64
ram: 16 GB DDR4-2400 SDRAM (1 x 16 GB)
hard_disk: 1 TB 5400 rpm SATA
graphic_card: Intel® UHD Graphics 620
display: None
Actual_price: 106,227
final_price: 103,082
Discount: 3,145
included_items: HP Overnighter Backpack (Worth ₹2,499)
--------------------------------------------------
Name: HP Pavilion x360 14-dh0101tu
rating: 0.0000
processor: 8th Generation Intel® Core™ i3 processor
processor_company: Intel
processor_type: i3
generation: 8
os_installed: Windows 10 Home Single Language 64
ram: 4 GB DDR4-2400 SDRAM (1 x 4 GB)
hard_disk: 256 GB PCIe® NVMe™ M.2 SSD
graphic_card: Intel® UHD Graphics 620
display: None
Actual_price: 51,826
final_price: 49,990
Discount: 1,836
included_items: HP Active Pen (Worth ₹3,145)
--------------------------------------------------
Name: HP Pavilion x36

final_price: 60,812
Discount: 3,175
included_items: HP Original Bag  (Worth ₹1,499) (#5DD44PA)
--------------------------------------------------
Name: HP 348 G4 Notebook PC (ENERGY STAR)
rating: 0.0000
processor: 7th Generation Intel® Core™ i3 processor (i3-7020U)
processor_company: Intel
processor_type: i3
generation: 7
os_installed: Windows 10 Pro 64
ram: 4 GB DDR4-2133 SDRAM (1 x 4 GB)
hard_disk: 1 TB 7200 rpm SATA
graphic_card: Intel® HD Graphics 620
display: None
Actual_price: 51,827
final_price: 49,256
Discount: 2,571
included_items: HP Original Bag  (Worth ₹1,499) (#5DD44PA)
--------------------------------------------------
Name: HP ZBook 17 G5 Mobile Workstation
rating: 0.0000
processor: Intel® Xeon® processor
processor_company: Intel
processor_type: Xeon®
generation: 7
os_installed: Windows 10 Pro 64
ram: 32 GB DDR4-2666 ECC SDRAM (2 X 16 GB)
hard_disk: 512 GB PCIe® NVMe™ SSD
graphic_card: NVIDIA® Quadro® P4200 (8 GB GDDR5 dedicated)
display: None
Actual_price: 667,152
final

Name: HP ENVY - 13-ah0042tu
rating: 0.0000
processor: 8th Generation Intel® Core™ i3 processor
processor_company: Intel
processor_type: i3
generation: 8
os_installed: Windows 10 Home Single Language 64
ram: 4 GB LPDDR3-1866 SDRAM (onboard)
hard_disk: 128 GB SSD
graphic_card: Intel® UHD Graphics 620
display: None
Actual_price: 64,011
final_price: 60,380
Discount: 3,631
included_items: 1 Year On-site Warranty
--------------------------------------------------
Name: HP EliteBook 830 G5 Notebook PC
rating: 0.0000
processor: 8th Generation Intel® Core™ i5 processor
processor_company: Intel
processor_type: i5
generation: 8
os_installed: Windows 10 Pro 64
ram: 8 GB DDR4-2400 SDRAM (1 x 8 GB)
hard_disk: 256 GB SSD
graphic_card: Intel® UHD Graphics 620
display: None
Actual_price: 107,507
final_price: 106,568
Discount: 939
included_items: 
--------------------------------------------------
Name: HP ProBook 640 G4 Notebook PC
rating: 0.0000
processor: None
processor_company: None
processor_type: 

Name: HP ZBook 15u G5 Mobile Workstation
rating: 0.0000
processor: None
processor_company: None
processor_type: None
generation: None
os_installed: Windows 10 Pro 64
ram: 16 GB DDR4-2400 SDRAM (1 x 16 GB)
hard_disk: 512 GB SSD
graphic_card: AMD Radeon™ Pro WX 3100 Graphics (2 GB GDDR5 dedicated)
display: None
Actual_price: 264,534
final_price: 161,517
Discount: 103,017
included_items: HP Business Backpack (Worth ₹4,200)
--------------------------------------------------
Name: HP EliteBook x360 1030 G3 Notebook PC
rating: 0.0000
processor: 8th Generation Intel® Core™ i7 processor
processor_company: Intel
processor_type: i7
generation: 8
os_installed: Windows 10 Pro 64
ram: 16 GB LPDDR3-2133 SDRAM (onboard)
hard_disk: 1 TB SSD
graphic_card: Intel® UHD Graphics 620
display: HP Sure View, 13.3" FHD touch screen
Actual_price: 208,728
final_price: 206,418
Discount: 2,310
included_items: HP Overnighter Backpack (Worth ₹2,499)
--------------------------------------------------
Name: HP Elite x

List of products we got after scraping

In [4]:
prod_list

[{'Name': 'HP ENVY x360 - 13-ag0035au',
  'actual_price': '83,496',
  'discout': '10,506',
  'display': '13.3" FHD multitouch-enabled edge-to-edge glass (1920 x 1080)',
  'final_price': '72,990',
  'generation': None,
  'graphic_card': 'AMD Radeon™ Vega 8 Graphics',
  'hard_disk': '256 GB SSD',
  'included_items': '',
  'os_installed': 'Windows 10 Home Single Language 64',
  'processor': 'AMD Ryzen™ 5 processor',
  'processor_company': 'AMD',
  'processor_type': 'Ryzen™ 5',
  'ram': '8 GB DDR4-2400 SDRAM (onboard)',
  'rating': '3.6667'},
 {'Name': 'HP Gaming Pavilion - 15-cx0140tx',
  'actual_price': '86,476',
  'discout': '13,486',
  'display': None,
  'final_price': '72,990',
  'generation': '8',
  'graphic_card': 'NVIDIA® GeForce® GTX 1050 (4 GB GDDR5 dedicated)',
  'hard_disk': '1 TB 7200 rpm SATA',
  'included_items': 'HP Odyssey backpack (Worth ₹3,499),Microsoft Office Home and Student',
  'os_installed': 'Windows 10 Home Single Language 64',
  'processor': '8th Generation Intel

creating a dataframe with the List of products

In [5]:
df=pd.DataFrame(prod_list)
df.head()

Unnamed: 0,Name,actual_price,discout,display,final_price,generation,graphic_card,hard_disk,included_items,os_installed,processor,processor_company,processor_type,ram,rating
0,HP ENVY x360 - 13-ag0035au,83496,10506,"13.3"" FHD multitouch-enabled edge-to-edge glas...",72990,,AMD Radeon™ Vega 8 Graphics,256 GB SSD,,Windows 10 Home Single Language 64,AMD Ryzen™ 5 processor,AMD,Ryzen™ 5,8 GB DDR4-2400 SDRAM (onboard),3.6667
1,HP Gaming Pavilion - 15-cx0140tx,86476,13486,,72990,8.0,NVIDIA® GeForce® GTX 1050 (4 GB GDDR5 dedicated),1 TB 7200 rpm SATA,"HP Odyssey backpack (Worth ₹3,499),Microsoft O...",Windows 10 Home Single Language 64,8th Generation Intel® Core™ i5 processor,Intel,i5,8 GB DDR4-2666 SDRAM (1 x 8 GB),5.0
2,HP Notebook - 15-da0435tx,50292,5712,,44580,7.0,NVIDIA® GeForce® MX110 (2 GB DDR3 dedicated),1 TB 5400 rpm SATA,,Windows 10 Home Single Language 64,7th Generation Intel® Core™ i3 processor,Intel,i3,8 GB DDR4-2133 SDRAM (1 x 8 GB),4.0
3,HP Notebook - 15g-dr0006tx,66137,7146,,58991,8.0,NVIDIA® GeForce® MX110 (2 GB DDR3 dedicated),1 TB 5400 rpm SATA,"HP Original Laptop Bag (Worth ₹1,123),1 Year O...",Windows 10 Home Single Language 64,8th Generation Intel® Core™ i5 processor,Intel,i5,8 GB DDR4-2400 SDRAM (1 x 8 GB),4.0625
4,HP Notebook 15-da1030tu,50720,3730,,46990,8.0,,1 TB 5400 rpm SATA,"HP Original Laptop Bag (Worth ₹1,123),Microsof...",Windows 10 Home 64,8th Generation Intel® Core™ i5 processor,Intel,i5,4 GB DDR4-2400 SDRAM,2.0


saving the dataframe into Hp_laptops.csv

In [6]:
df.to_csv("Hp_laptops.csv")