# Lab Exercise 1. Scraping Static Websites


This is the warm-up task for the first laboratory exercise. It consists of scraping static websites with BeautifulSoup.

It should be completed at home and presented at the laboratory.

**Total points: 2**

### Task Description

Scrape the information about the products on the following page:
https://clevershop.mk/product-category/mobilni-laptopi-i-tableti/

For each product scrape:


*   Product title (selector `'.wd-entities-title'`)
*   Product regular price (selector `'.woocommerce-Price-amount'`)
*   Product discount price (if available), same selector as regular price
*   URL to the product page
*   Add to cart button URL

***Hint: There are multiple product pages. You need to send a separate request for each page.***


Save the results as a DataFrame object

You can add as many code cells as you need.

________________________________________________________________

### Requirements

Import libraries and modules that you are going to use

In [6]:
import pandas as pd
from bs4 import BeautifulSoup
import numpy as np
import requests

### Send HTTP request to the target Website

In [7]:
url="https://clevershop.mk/product-category/mobilni-laptopi-i-tableti/"
response=requests.get(url)

Check the response status code

In [8]:
response

<Response [200]>
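
Inspecting the response by eye works in a notebook, but an explicit check fails early when a request goes wrong. The following is a minimal sketch: `ensure_ok` is a hypothetical helper (not part of `requests`), and the `SimpleNamespace` object only stands in for a real response so the example runs offline.

```python
from types import SimpleNamespace


def ensure_ok(response):
    """Return the response unchanged if its status code is 200, else raise."""
    if response.status_code != 200:
        raise RuntimeError(f"Unexpected status code: {response.status_code}")
    return response


# Stand-in with the same attributes we read from a requests response
# (illustrative only; in the notebook you would pass the real `response`).
fake_response = SimpleNamespace(status_code=200, text="<html></html>")
checked = ensure_ok(fake_response)
print(checked.status_code)
```

With a real `requests` response, calling `response.raise_for_status()` is a built-in alternative that raises an exception for 4xx and 5xx status codes.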

### Parse the response content with BeautifulSoup

In [1]:
soup=BeautifulSoup(response.text,"html.parser")
print(soup.prettify())

### Extract data from the BeautifulSoup object using any selectors, attribute identifiers, etc.

* Product title (selector `'.wd-entities-title'`)
* Product regular price (selector `'.woocommerce-Price-amount'`)
* Product discount price (if available), same selector as regular price
* URL to the product page
* Add to cart button URL
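
To illustrate how these selectors behave before running them on the live page, here is a small offline sketch. The HTML snippet is a hypothetical, heavily simplified product card (the real page markup is more complex, and the `example.com` URL is a placeholder). It shows `select_one` returning a tag for a matching selector and `None` when nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real page structure may differ.
html = """
<div class="product-wrapper">
  <h3 class="wd-entities-title">
    <a href="https://example.com/product/demo/">Demo laptop</a>
  </h3>
  <span class="price">
    <span class="woocommerce-Price-amount"><bdi>17.590</bdi></span>
  </span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
card = soup.select_one(".product-wrapper")

# The title link carries both the product name and the product page URL.
title_link = card.select_one(".wd-entities-title a")
title = title_link.get_text(strip=True)
product_url = title_link.get("href")

# The first matching price element inside this card.
price = card.select_one(".woocommerce-Price-amount").get_text(strip=True)

# select_one returns None when the selector matches nothing,
# which is how a missing sale badge can be detected.
sale_badge = card.select_one(".onsale")

print(title, product_url, price, sale_badge)
```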

In [10]:
products=soup.select(".product-wrapper")

In [11]:

title=products[0].select_one(".wd-entities-title a")
regular_price=products[0].select_one("div.product-wrapper > span > span > bdi").text.split("\xa0")[0]
isOnSale=products[2].select_one(".onsale").text
discount_price=products[2].select(".price span bdi")[1].text.split("\xa0")[0]
url_product_page=products[0].select_one(".hover-img a").get("href")
url_cart_button=products[0].select_one("div.wd-add-btn.wd-add-btn-replace a").get("href")
title.text,regular_price,isOnSale,discount_price,url_product_page,url_product_page+url_cart_button

('Acer A315-23-A7KD',
 '17.590',
 '-16%',
 '15.999',
 'https://clevershop.mk/product/acer-a315-23-a7kd/',
 'https://clevershop.mk/product/acer-a315-23-a7kd/?add-to-cart=21494')
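
The add-to-cart link is a relative, query-only `href`, which is why concatenating it to the product URL works here. As a possibly more robust alternative, `urllib.parse.urljoin` from the standard library resolves a relative link against a base URL and leaves an already absolute link untouched. A minimal sketch using the values from the output above (the absolute cart URL is a hypothetical example):

```python
from urllib.parse import urljoin

product_url = "https://clevershop.mk/product/acer-a315-23-a7kd/"
cart_href = "?add-to-cart=21494"  # relative href as found on the button

# Resolve the relative href against the product page URL.
cart_url = urljoin(product_url, cart_href)
print(cart_url)

# An href that is already absolute is returned unchanged.
absolute_href = "https://clevershop.mk/cart/?add-to-cart=21494"  # hypothetical
same_url = urljoin(product_url, absolute_href)
print(same_url)
```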

In [12]:
all_products_on_page=[]
for product in products:
    title=product.select_one(".wd-entities-title a").text
    regular_price=product.select_one(".woocommerce-Price-amount").text.split("\xa0")[0]
    isOnSale=product.select_one(".onsale")
    if isOnSale is not None:
        discount_price=product.select(".price span bdi")[1].text.split("\xa0")[0] 
    else:
        discount_price=None
    url_product_page=product.select_one(".hover-img a").get("href")
    url_cart_button=product.select_one("div.wd-add-btn.wd-add-btn-replace a").get("href")
    product_dict={
        "Title":title,
        "Regular price":regular_price,
        "Discount price":discount_price,
        "Url product page":url_product_page,
        "Url cart page": url_product_page+url_cart_button
    }
    all_products_on_page.append(product_dict)
len(all_products_on_page)

24

Repeat the extraction process for each page of products

In [4]:
def return_product_dict(product):
    title=product.select_one(".wd-entities-title a").text
    regular_price=product.select_one(".woocommerce-Price-amount").text.split("\xa0")[0].replace(".","")
    isOnSale=product.select_one(".onsale")
    if isOnSale is not None:
        discount_price=product.select(".price span bdi")[1].text.split("\xa0")[0].replace(".","")
    else:
        discount_price=None
    url_product_page=product.select_one(".product-image-link").get("href")
    url_cart_button=product.select_one("div.wd-add-btn.wd-add-btn-replace a").get("href")
    product_dict={
        "Title":title,
        "Regular price":regular_price,
        "Discount price":discount_price,
        "Url product page":url_product_page,
        "Url cart page": url_product_page+url_cart_button
    }
    return product_dict
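
The function above keeps prices as strings such as `'17590'`. If numeric values are preferred, the same cleaning steps can be wrapped in a small helper that returns an integer. This is only a sketch: `parse_price` is a hypothetical name, and the `MKD` suffix is a placeholder for whatever currency text follows the non-breaking space on the page.

```python
from typing import Optional


def parse_price(raw_text: Optional[str]) -> Optional[int]:
    """Convert text like '17.590\\xa0MKD' to the integer 17590."""
    if raw_text is None:
        return None
    # Keep the part before the non-breaking space, then drop the
    # dots used as thousands separators.
    amount = raw_text.split("\xa0")[0].replace(".", "").strip()
    return int(amount) if amount else None


print(parse_price("17.590\xa0MKD"))
print(parse_price(None))
```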

In [7]:
base_url="https://clevershop.mk/product-category/mobilni-laptopi-i-tableti/page/"
products_on_all_pages=[]
for i in range(1,15):
    url=base_url+str(i)
    response=requests.get(url)
    soup=BeautifulSoup(response.text,"html.parser")
    products=soup.select(".product-wrapper")
    for product in products:
        result=return_product_dict(product)
        products_on_all_pages.append(result)
len(products_on_all_pages)
products_on_all_pages

[{'Title': 'Acer A315-23-A7KD',
  'Regular price': '17590',
  'Discount price': None,
  'Url product page': 'https://clevershop.mk/product/acer-a315-23-a7kd/',
  'Url cart page': 'https://clevershop.mk/product/acer-a315-23-a7kd/?add-to-cart=21494'},
 {'Title': 'Acer A315-23-R5P2',
  'Regular price': '27490',
  'Discount price': None,
  'Url product page': 'https://clevershop.mk/product/acer-a315-23-r5p2/',
  'Url cart page': 'https://clevershop.mk/product/acer-a315-23-r5p2/?add-to-cart=21510'},
 {'Title': 'ACER Aspire 1 A115-22',
  'Regular price': '18999',
  'Discount price': '15999',
  'Url product page': 'https://clevershop.mk/product/acer-aspire-1-nx-a7pex-001/',
  'Url cart page': 'https://clevershop.mk/product/acer-aspire-1-nx-a7pex-001/?add-to-cart=20826'},
 {'Title': 'Acer Aspire 3 A315-23-R26A',
  'Regular price': '29990',
  'Discount price': None,
  'Url product page': 'https://clevershop.mk/product/acer-aspire-3-a315-23-r26a/',
  'Url cart page': 'https://clevershop.mk/produ
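
The loop above assumes a fixed number of pages (`range(1,15)`). One alternative, sketched below under assumptions, is to keep requesting pages until one of them yields no products. `collect_until_empty` and `scrape_page` are hypothetical names, and the dictionary `fake_site` only simulates a two-page site so the example runs without network access.

```python
from typing import Callable, List, Optional


def collect_until_empty(
    scrape_page: Callable[[int], Optional[List[dict]]],
    max_pages: int = 100,
) -> List[dict]:
    """Call scrape_page(1), scrape_page(2), ... until a page yields nothing."""
    collected: List[dict] = []
    for page_number in range(1, max_pages + 1):
        page_products = scrape_page(page_number)
        if not page_products:  # None or an empty list marks the end
            break
        collected.extend(page_products)
    return collected


# Offline stand-in for a real scraper: two "pages", then nothing.
fake_site = {
    1: [{"Title": "A"}, {"Title": "B"}],
    2: [{"Title": "C"}],
}
all_products = collect_until_empty(lambda n: fake_site.get(n))
print(len(all_products))
```

In the notebook, `scrape_page` could request `base_url+str(n)`, parse the response, and return the list of `return_product_dict(product)` results, returning `None` when the status code is not 200. The `max_pages` cap is a safety limit so the loop cannot run forever.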

### Create a pandas DataFrame with the scraped products

In [15]:
df=pd.DataFrame(products_on_all_pages)
df

| | Title | Regular price | Discount price | Url product page | Url cart page |
|---|---|---|---|---|---|
| 0 | Acer A315-23-A7KD | 17590 | | https://clevershop.mk/product/acer-a315-23-a7kd/ | https://clevershop.mk/product/acer-a315-23-a7k... |
| 1 | Acer A315-23-R5P2 | 27490 | | https://clevershop.mk/product/acer-a315-23-r5p2/ | https://clevershop.mk/product/acer-a315-23-r5p... |
| 2 | ACER Aspire 1 A115-22 | 18999 | 15999 | https://clevershop.mk/product/acer-aspire-1-nx... | https://clevershop.mk/product/acer-aspire-1-nx... |
| 3 | Acer Aspire 3 A315-23-R26A | 29990 | | https://clevershop.mk/product/acer-aspire-3-a3... | https://clevershop.mk/product/acer-aspire-3-a3... |
| 4 | Acer Aspire 3 A315-58-33WK | 24490 | | https://clevershop.mk/product/21498/ | https://clevershop.mk/product/21498/?add-to-ca... |
| ... | ... | ... | ... | ... | ... |
| 315 | Monitor 27 Philips 272E1GAJ/00 VA 1ms 144Hz | 12890 | | https://clevershop.mk/product/monitor-27-phili... | https://clevershop.mk/product/monitor-27-phili... |
| 316 | Philips 24″ 243V7QDSB | 8390 | | https://clevershop.mk/product/philips-24%e2%80... | https://clevershop.mk/product/philips-24%e2%80... |
| 317 | Philips 27″ 278E1A/00 4K UHD IPS | 18990 | | https://clevershop.mk/product/hp-27%e2%80%b3-2... | https://clevershop.mk/product/hp-27%e2%80%b3-2... |
| 318 | Philips 279C9-00 MON LED 27″ 3840 x 2160 5Ms 6... | 26990 | | https://clevershop.mk/product/philips-279c9-00... | https://clevershop.mk/product/philips-279c9-00... |
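
Because the scraped prices are stored as strings, the price columns have an object dtype. If numeric comparisons or sorting are needed, the columns can be converted with `pd.to_numeric`. The sketch below uses two rows taken from the output above so that it runs on its own. Passing `errors="coerce"` is a choice made here so that any unparseable value becomes `NaN` instead of raising, and a missing discount also ends up as `NaN`.

```python
import pandas as pd

# Two rows copied from the scraped output, enough to show the conversion.
rows = [
    {"Title": "Acer A315-23-A7KD", "Regular price": "17590", "Discount price": None},
    {"Title": "ACER Aspire 1 A115-22", "Regular price": "18999", "Discount price": "15999"},
]
df = pd.DataFrame(rows)

# Convert the string prices to numbers; missing discounts become NaN.
for column in ["Regular price", "Discount price"]:
    df[column] = pd.to_numeric(df[column], errors="coerce")

# With numeric columns, filtering for discounted products is straightforward.
on_sale = df[df["Discount price"].notna()]
print(on_sale["Title"].tolist())
```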


Save the dataframe as `.csv`

In [355]:
df.to_csv("products.csv",index=False)