# Web Scraping Products with Beautiful Soup

## Introduction

This project uses a Python script to scrape whisky product data from a website. The script extracts product names and prices from multiple listing pages and collects them in a data structure for further processing. It relies on requests for sending HTTP requests, BeautifulSoup for parsing HTML, pandas for data manipulation, and tqdm for displaying a progress bar.


## Import Libraries

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
from tqdm.notebook import tqdm

## Initialize Variables and List

In [2]:
all_products = []  # List to store product data
base_url = 'https://minuman.com/id/collections/whisky?page='  # Base URL of the website

* Initialize an empty list all_products to store product data.
* Define the base URL of the website to be scraped.

## Loop Through Pages and Extract Product Data

In [4]:
for page_num in tqdm(range(1, 6), desc='Scraping pages'):
    page_url = base_url + str(page_num)  # Build the full URL for the current page
    response = requests.get(page_url)   # Send a GET request to the URL
    soup = BeautifulSoup(response.content, 'html.parser')  # Parse the content with BeautifulSoup
    product_containers = soup.find_all('div', class_='product-item')  # Find all product containers on the page

    for product in product_containers:
        # Extract the name and price text from each product container
        product_name = product.find('a', class_='product-item__title').get_text(strip=True)
        product_price = product.find('span', class_='price').get_text(strip=True)
        all_products.append({'name': product_name, 'price': product_price})


* Use a loop to fetch data from the first five pages.
* Build the complete URL for each page.
* Send a GET request and parse the content using BeautifulSoup.
* Find all product containers on the page with the class 'product-item'.
* Iterate through each product container and extract the product name and price.
* Append the product data to the all_products list in the form of a dictionary.
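The extraction steps above assume that every product container has both a title link and a price element. If either is missing, `find` returns `None` and the call to `get_text` raises an `AttributeError`. The following is a minimal sketch of a more defensive version, not the notebook's original code: the helper name `parse_products` and the sample markup are hypothetical and exist only for illustration, while the tag names and class names are taken from the loop above.

```python
from bs4 import BeautifulSoup


def parse_products(html):
    """Return a list of {'name', 'price'} dicts found in one page of HTML."""
    soup = BeautifulSoup(html, 'html.parser')
    products = []
    for container in soup.find_all('div', class_='product-item'):
        name_tag = container.find('a', class_='product-item__title')
        price_tag = container.find('span', class_='price')
        if name_tag is None:
            # Skip containers without a title instead of raising AttributeError
            continue
        products.append({
            'name': name_tag.get_text(strip=True),
            # Keep the row, but record a missing price as None
            'price': price_tag.get_text(strip=True) if price_tag is not None else None,
        })
    return products


# Hypothetical sample markup, made up for illustration only
sample_html = """
<div class="product-item">
  <a class="product-item__title">Example Whisky - 700ml</a>
  <span class="price">Sale priceIDR100,000</span>
</div>
<div class="product-item">
  <a class="product-item__title">No Price Whisky - 700ml</a>
</div>
"""
sample_products = parse_products(sample_html)
print(sample_products)
```

Keeping the parsing in a function that takes an HTML string also makes it possible to check the selectors against saved markup without sending any requests.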

## Convert to DataFrame

In [5]:
all_products_df = pd.DataFrame(all_products)


In [6]:
all_products_df

| | name | price |
|---|---|---|
| 0 | Label 5 - Classic Black - Blended Whisky - 700ml | Sale priceIDR390,000 |
| 1 | Bells Original - Blended Whisky - 700ml | Sale priceIDR399,000 |
| 2 | Glenfiddich 12yrs - Single Malt Whisky - 700ml | Sale priceIDR1,090,000 |
| 3 | Monkey Shoulder - Blended Whisky - 700ml | Sale priceIDR990,000 |
| 4 | Grants - Triple Wood - Blended Whisky - 700ml | Sale priceIDR469,000 |
| ... | ... | ... |
| 115 | Dewars - Japanese Smooth Cask - Blended Whisky... | Sale priceIDR690,000 |
| 116 | The Pogues - Blended Whiskey - 700ml | Sale priceIDR690,000 |
| 117 | The Macallan - Sir Peter Blake - Single Malt W... | Sale priceIDR41,000,000 |
| 118 | Dewars 25yrs - Blended Whisky - 750ml | Sale priceIDR3,490,000 |


* This script demonstrates how to extract data from a website using web scraping techniques.
* The retrieved information includes each product's name and price.
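
The scraped price values are still raw strings such as "Sale priceIDR390,000", which cannot be sorted or averaged numerically. As one possible follow-up step, the sketch below pulls the integer rupiah amount out of such a string. The function name `parse_price_idr` is hypothetical, and the pattern assumes every price follows the "IDR" plus comma-separated digits format seen in the output above.

```python
import re


def parse_price_idr(text):
    """Extract the integer rupiah amount from a string such as 'Sale priceIDR390,000'."""
    # Match the digits (with thousands separators) that follow the 'IDR' code
    match = re.search(r'IDR\s*(\d[\d,]*)', text)
    if match is None:
        return None  # No recognizable price in the text
    return int(match.group(1).replace(',', ''))


print(parse_price_idr('Sale priceIDR390,000'))     # 390000
print(parse_price_idr('Sale priceIDR41,000,000'))  # 41000000
```

To apply it to the DataFrame, something like `all_products_df['price_idr'] = all_products_df['price'].map(parse_price_idr)` should work, assuming the price column contains only strings.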
