# Lab Exercise 1. Scraping Static Websites


This is the warm-up task for the first laboratory exercise. It consists of scraping static websites with BeautifulSoup.

It should be completed at home and presented in the laboratory.

**Total points: 2**

### Task Description

Scrape the information about the products on the following page:
https://clevershop.mk/product-category/mobilni-laptopi-i-tableti/

For each product scrape:


*   Product title (selector `'.wd-entities-title'`)
*   Product regular price (selector `'.woocommerce-Price-amount'`)
*   Product discount price (if available), same selector as regular price
*   URL to the product page
*   Add to cart button URL

***Hint: The products are spread over multiple pages; send a separate request for each page.***


Save the results as a pandas DataFrame object.

You can add as many code cells as you need.

________________________________________________________________

### Requirements

Import the libraries and modules that you are going to use.

In [6]:
!pip install pandas

Collecting pandas
  Downloading pandas-2.2.3-cp310-cp310-win_amd64.whl (11.6 MB)
Collecting pytz>=2020.1
  Using cached pytz-2024.2-py2.py3-none-any.whl (508 kB)
Installing collected packages: pytz, pandas
Successfully installed pandas-2.2.3 pytz-2024.2


You should consider upgrading via the 'C:\Users\User-PC\AppData\Local\Programs\Python\Python310\python.exe -m pip install --upgrade pip' command.


In [6]:
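import warnings

import pandas as pd
import requests as req
import urllib3
from bs4 import BeautifulSoup

# Silence TLS and library warnings so they do not clutter the notebook output.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
warnings.filterwarnings("ignore")


def get_soup(snapshot_url):
    """Fetch a page and return its parsed BeautifulSoup tree."""
    # A timeout keeps the notebook from hanging on an unresponsive server.
    response = req.get(snapshot_url, timeout=30)

    if response.status_code != 200:
        raise Exception(f"Request to {snapshot_url} failed with status {response.status_code}")

    return BeautifulSoup(response.text, "html.parser")


def print_elements(collection):
    """Print each element of a collection on its own line."""
    for element in collection:
        print(element)


def format_white_space(string: str):
    """Collapse runs of whitespace into single spaces and trim both ends."""
    return " ".join(string.split())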
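As a quick illustration of what the whitespace helper does, the sketch below repeats its definition and applies it to two made-up strings. The sample inputs are hypothetical; they only mimic the kind of non-breaking spaces and line breaks that scraped price text often contains.

```python
def format_white_space(string: str) -> str:
    """Collapse runs of whitespace into single spaces and trim both ends."""
    return " ".join(string.split())


# Hypothetical raw strings, similar to text pulled out of HTML nodes.
raw_price = "17.590\xa0ден"            # non-breaking space between amount and currency
raw_title = "  Acer \n   A315-23-A7KD  "  # stray line break and padding

clean_price = format_white_space(raw_price)
clean_title = format_white_space(raw_title)

print(clean_price)
print(clean_title)
```

`str.split()` without arguments treats any Unicode whitespace (including the non-breaking space) as a separator, which is why a single `join` is enough here.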

### Send HTTP request to the target Website

In [3]:
snapshot_url = "https://clevershop.mk/product-category/mobilni-laptopi-i-tableti/"
response = req.get(snapshot_url)

Check the response status code.

In [4]:
response.status_code

200
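Inspecting `status_code` by hand works well for a single request. As an alternative, the sketch below lets `requests` raise an error for any 4xx or 5xx response via `raise_for_status()`. The function name `fetch_html` and the 10-second timeout are illustrative choices, not part of the exercise.

```python
import requests


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Return the body of `url`, raising requests.HTTPError on 4xx/5xx responses."""
    # The timeout (an assumed value) prevents the call from blocking indefinitely.
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


# Example call (performs a real network request, so it is left commented out):
# html = fetch_html("https://clevershop.mk/product-category/mobilni-laptopi-i-tableti/")
```

Note that `raise_for_status()` accepts any non-error status, whereas the helper used in this notebook insists on exactly 200; which one fits depends on how strict the check should be.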

### Parse the response content with BeautifulSoup

In [5]:
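try:
    soup = get_soup(snapshot_url)
except Exception as e:
    # Without a parsed first page the later cells cannot run, so report and re-raise.
    # (Calling exit() inside a notebook would shut down the kernel instead.)
    print(e)
    raise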

### Extract data from the BeautifulSoup object using any selectors, attribute identifiers, etc.

* Product title (selector `'.wd-entities-title'`)
* Product regular price (selector `'.woocommerce-Price-amount'`)
* Product discount price (if available), same selector as regular price
* URL to the product page
* Add to cart button URL
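Before running the selectors against the live site, it can help to try them on a small inline fragment. The sketch below does that; the HTML is invented for illustration (including the `<del>`/`<ins>` wrapping of the two prices), so the real page markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical product card; only the class names come from the task description.
html = """
<div class="product-wrapper">
  <h3 class="wd-entities-title">
    <a href="https://example.com/product/demo-laptop/">Demo Laptop</a>
  </h3>
  <span class="price">
    <del><span class="woocommerce-Price-amount"><bdi>18.999 ден</bdi></span></del>
    <ins><span class="woocommerce-Price-amount"><bdi>15.999 ден</bdi></span></ins>
  </span>
</div>
"""

card = BeautifulSoup(html, "html.parser").select_one(".product-wrapper")

title_link = card.select_one(".wd-entities-title a")
title = title_link.get_text(strip=True)
product_url = title_link.get("href")

# Both prices share one selector, so the discount is simply the second match, if any.
amounts = [el.get_text(strip=True) for el in card.select(".woocommerce-Price-amount")]
regular_price = amounts[0]
discount_price = amounts[1] if len(amounts) > 1 else None

print(title, regular_price, discount_price, product_url)
```

Using `None` for a missing discount is one possible convention; the notebook cell below uses the placeholder string `"0 ден"` instead.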

In [7]:
# The second-to-last pagination item is taken as the last page number
# (the final item is assumed to be the "next page" link).
page_numbers = soup.select(".page-numbers li")
max_pages = int(page_numbers[-2].text)

products_page_url = "https://clevershop.mk/product-category/mobilni-laptopi-i-tableti/page/"
products = []
for page_number in range(1, max_pages + 1):
    fetch_url = f"{products_page_url}{page_number}/"
    try:
        page_soup = get_soup(fetch_url)
    except Exception as e:
        # Report the failure and move on instead of silently dropping the page.
        print(f"Skipping {fetch_url}: {e}")
        continue

    for product_element in page_soup.select(".product-wrapper"):
        title = product_element.select_one(".wd-entities-title a").text
        children = list(product_element.select_one(".price").children)

        if len(children) == 1:
            # A single price node means there is no discount;
            # "0 ден" is used as the placeholder for "no discounted price".
            regular_price = children[0].text
            discounted_price = "0 ден"
        else:
            # Two amounts: the first is the regular price, the second the discounted one.
            amounts = product_element.select(".woocommerce-Price-amount")
            regular_price = amounts[0].select_one("bdi").text
            discounted_price = amounts[1].select_one("bdi").text
        regular_price = format_white_space(regular_price)
        discounted_price = format_white_space(discounted_price)

        product_link = product_element.select_one(".product-image-link").get("href")

        # The button href is a relative query string such as "?add-to-cart=21494".
        query_string = product_element.select_one(".add_to_cart_button").get("href")
        if query_string and query_string.startswith("?"):
            add_to_cart_link = fetch_url + query_string
        else:
            add_to_cart_link = None

        products.append({
            "title": title,
            "price": regular_price,
            "discounted_price": discounted_price,
            "link": product_link,
            "add_to_cart": add_to_cart_link,
        })
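The cell above builds the add-to-cart URL by concatenating the page URL with the button's query string. A possible alternative, sketched here, is `urllib.parse.urljoin` from the standard library, which also copes with hrefs that are already absolute. The helper name `resolve_href` is hypothetical.

```python
from urllib.parse import urljoin


def resolve_href(page_url, href):
    """Resolve an href found on `page_url` to an absolute URL (None if it is missing)."""
    if not href:
        return None
    return urljoin(page_url, href)


page_url = "https://clevershop.mk/product-category/mobilni-laptopi-i-tableti/page/1/"

# A query-only href keeps the page path and replaces the query string.
relative = resolve_href(page_url, "?add-to-cart=21494")

# An absolute href is returned unchanged.
absolute = resolve_href(page_url, "https://clevershop.mk/product/acer-a315-23-a7kd/")

print(relative)
print(absolute)
```

Unlike the cell above, this variant keeps absolute hrefs rather than replacing them with `None`; whether that is desirable depends on what the button links to for products that cannot be added to the cart directly.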

Repeat the extraction process for each page of products
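One way to prepare that per-page loop is to derive the last page number from the pagination labels and then build one URL per page. The sketch below assumes the labels have already been extracted as plain strings; the sample labels and both function names are made up for illustration.

```python
def last_page_number(labels):
    """Return the largest numeric pagination label, or 1 if there is none."""
    numbers = [int(label.strip()) for label in labels if label.strip().isdecimal()]
    return max(numbers) if numbers else 1


def page_urls(category_url, max_pages):
    """Build the URL of every listing page, from page 1 up to max_pages."""
    base = category_url.rstrip("/")
    return [f"{base}/page/{n}/" for n in range(1, max_pages + 1)]


# Hypothetical pagination labels, e.g. the text of each ".page-numbers li" element.
labels = ["1", "2", "3", "…", "12", "→"]
category_url = "https://clevershop.mk/product-category/mobilni-laptopi-i-tableti/"

urls = page_urls(category_url, last_page_number(labels))
print(len(urls), urls[0], urls[-1])
```

Taking the maximum of the numeric labels avoids relying on the position of the "next" arrow, at the cost of assuming that the highest page number is always shown.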

In [8]:
products

[{'title': 'Acer A315-23-A7KD',
  'price': '17.590 ден',
  'discounted_price': '0 ден',
  'link': 'https://clevershop.mk/product/acer-a315-23-a7kd/',
  'add_to_cart': 'https://clevershop.mk/product-category/mobilni-laptopi-i-tableti/page/1/?add-to-cart=21494'},
 {'title': 'Acer A315-23-R5P2',
  'price': '27.490 ден',
  'discounted_price': '0 ден',
  'link': 'https://clevershop.mk/product/acer-a315-23-r5p2/',
  'add_to_cart': 'https://clevershop.mk/product-category/mobilni-laptopi-i-tableti/page/1/?add-to-cart=21510'},
 {'title': 'ACER Aspire 1 A115-22',
  'price': '18.999 ден',
  'discounted_price': '15.999 ден',
  'link': 'https://clevershop.mk/product/acer-aspire-1-nx-a7pex-001/',
  'add_to_cart': 'https://clevershop.mk/product-category/mobilni-laptopi-i-tableti/page/1/?add-to-cart=20826'},
 {'title': 'Acer Aspire 3 A315-23-R26A',
  'price': '29.990 ден',
  'discounted_price': '0 ден',
  'link': 'https://clevershop.mk/product/acer-aspire-3-a315-23-r26a/',
  'add_to_cart': 'https://cl

### Create a pandas DataFrame with the scraped products

In [10]:
df = pd.DataFrame(products)
print(df.head())

                        title       price discounted_price  \
0           Acer A315-23-A7KD  17.590 ден            0 ден   
1           Acer A315-23-R5P2  27.490 ден            0 ден   
2       ACER Aspire 1 A115-22  18.999 ден       15.999 ден   
3  Acer Aspire 3 A315-23-R26A  29.990 ден            0 ден   
4  Acer Aspire 3 A315-58-33WK  24.490 ден            0 ден   

                                                link  \
0   https://clevershop.mk/product/acer-a315-23-a7kd/   
1   https://clevershop.mk/product/acer-a315-23-r5p2/   
2  https://clevershop.mk/product/acer-aspire-1-nx...   
3  https://clevershop.mk/product/acer-aspire-3-a3...   
4               https://clevershop.mk/product/21498/   

                                         add_to_cart  
0  https://clevershop.mk/product-category/mobilni...  
1  https://clevershop.mk/product-category/mobilni...  
2  https://clevershop.mk/product-category/mobilni...  
3  https://clevershop.mk/product-category/mobilni...  
4  https://clev
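The prices in the table above are stored as strings such as `17.590 ден`, which cannot be sorted or averaged numerically. A minimal sketch of a conversion follows; it assumes the dot is a thousands separator and the amounts are whole denars, and the name `parse_price` is hypothetical.

```python
import re


def parse_price(text: str) -> int:
    """Convert a price string like '17.590 ден' to an integer number of denars."""
    # Assumption: every non-digit character (dot, spaces, currency) can be dropped.
    digits = re.sub(r"\D", "", text)
    if not digits:
        raise ValueError(f"No digits found in price: {text!r}")
    return int(digits)


print(parse_price("17.590 ден"))
print(parse_price("0 ден"))
```

With a DataFrame, the same function could be applied column-wise, for example `df["price"].map(parse_price)`.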

Save the DataFrame as a `.csv` file.

In [12]:
df.to_csv("./products.csv", index=False)
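Because the prices contain Cyrillic text, it can be worth checking that the file round-trips correctly. The sketch below writes a one-row DataFrame to a temporary file with the `utf-8-sig` encoding (an optional choice that some spreadsheet programs handle better) and reads it back; the sample row and the temporary path are illustrative only.

```python
import os
import tempfile

import pandas as pd

# Hypothetical single product, shaped like the scraped records.
sample = [{
    "title": "Demo Laptop",
    "price": "17.590 ден",
    "discounted_price": "0 ден",
    "link": "https://example.com/product/demo-laptop/",
}]
df_sample = pd.DataFrame(sample)

csv_path = os.path.join(tempfile.mkdtemp(), "products.csv")

# index=False omits the row index; utf-8-sig prepends a byte-order mark.
df_sample.to_csv(csv_path, index=False, encoding="utf-8-sig")

restored = pd.read_csv(csv_path, encoding="utf-8-sig")
print(restored.head())
```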