<a href="https://colab.research.google.com/github/michaelchndra/Scraping-and-Make-Dataset-from-Tokopedia/blob/main/Scrape_%26_Make_Dataset_from_Tokopedia.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Scrape & Make a Dataset from the Tokopedia Website**

This notebook automates web scraping of the Tokopedia website: it collects product data and turns it into a dataframe. I made it for my learning project, an e-commerce AI chatbot.

The scraper is written in Python and relies on these libraries:
- **Selenium** to automate the scraping process,
- **Beautiful Soup** to scrape or collect data from websites, and
- **Pandas** to transform the data into a dataframe and save it to a CSV file.
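
The division of labour in the list above can be sketched on a static page. The snippet below is only an illustration: the HTML, the class names (`card`, `name`, `price`, `city`) and the helper `parse_cards` are made up for this example and do not match Tokopedia's real markup.

```python
from bs4 import BeautifulSoup

# Hypothetical product-card markup; the real Tokopedia HTML and class names differ.
SAMPLE_HTML = """
<div class="card">
  <img src="https://example.com/a.jpg">
  <span class="name">Laptop A</span>
  <span class="price">Rp1.000.000</span>
  <span class="city">Jakarta</span>
</div>
<div class="card">
  <img src="https://example.com/b.jpg">
  <span class="name">Laptop B</span>
  <span class="price">Rp2.500.000</span>
  <span class="city">Bandung</span>
</div>
"""


def parse_cards(html):
    """Return one dict per product card found in `html`."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select("div.card"):
        rows.append({
            "img": card.select_one("img")["src"],
            "name": card.select_one(".name").get_text(strip=True),
            "price": card.select_one(".price").get_text(strip=True),
            "city": card.select_one(".city").get_text(strip=True),
        })
    return rows


rows = parse_cards(SAMPLE_HTML)
print(rows)
```

With Selenium in the loop, the same function could be fed `driver.page_source`, and `pd.DataFrame(rows)` would turn the result into a dataframe.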

## **Web scraping result**
Columns in the `dataset-tokopedia.csv` file:

|Column Name|Definition                               |
|-----------|-----------------------------------------|
|`img`      |Image URL of the product                 |
|`name`     |Name of the product                      |
|`price`    |Price of the product (in IDR)            |
|`city`     |City where the shop/seller is located    |
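
The `price` column is stored as scraped text, so it usually needs converting before any numeric work. A minimal sketch, assuming prices look like `Rp1.250.000` (an `Rp` prefix with dots as thousands separators); `parse_idr_price` is a hypothetical helper, not part of the notebook.

```python
import re


def parse_idr_price(text):
    """Convert a price string such as 'Rp1.250.000' to an integer number of rupiah."""
    # Assumption: everything that is not a digit is a currency label or separator.
    digits = re.sub(r"[^0-9]", "", text)
    if not digits:
        raise ValueError(f"no digits found in price: {text!r}")
    return int(digits)


prices = ["Rp1.250.000", "Rp 75.000", "Rp999"]
print([parse_idr_price(p) for p in prices])  # [1250000, 75000, 999]
```

A price range such as `Rp10.000 - Rp20.000` would be merged into one number by this helper, so ranges need to be split before parsing.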


## **Reference**

- [**GitHub - Hannah2gah**](https://github.com/hannah2gah/web-scraping-tokopedia)
- [**YouTube - Yuk Nyistem**](https://youtu.be/ARJ3f0bbcqU?si=XlBnAP52DbSrJx1E)
- **My friends at Infinite Learning - IBM Advanced AI, who helped me improve this code - Filza Rizki Ramadhan**


Don't forget to check my [**GitHub repo**](https://github.com/michaelchndra) :)


**Note:**
I will keep updating this as long as I can; the latest version is always in my GitHub repository: [**Here**](https://github.com/michaelchndra/Scraping-and-Make-Dataset-from-Tokopedia).
If you run into a problem or bug, try to solve it yourself first; if that does not work, let me know via my [social media](https://linktree-mindra.vercel.app).

### **Install Library & Requirements**

In [None]:
!apt-get update
!pip install selenium
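import time
from urllib.parse import quote_plus

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By


class Scraper:
    def __init__(self):
        chrome_options = Options()
        # Colab has no display, so Chrome is assumed to need headless mode there.
        chrome_options.add_argument("--headless")
        chrome_options.add_argument("--no-sandbox")
        chrome_options.add_argument("--disable-dev-shm-usage")
        self.driver = webdriver.Chrome(options=chrome_options)

    def search_product(self, query):
        # URL-encode the query so spaces and symbols survive in the URL.
        url = f"https://www.tokopedia.com/search?navsource=&ob=5&srp_component_id=02.01.00.00&srp_page_id=&srp_page_title=&st=product&q={quote_plus(query)}"
        self.driver.get(url)
        # Scroll down in 500 px steps so lazily loaded products get rendered.
        for _ in range(0, 6500, 500):
            time.sleep(0.1)
            self.driver.execute_script("window.scrollBy(0,500)")

        # The css-* class names are generated by the site and may change over time.
        elements = self.driver.find_elements(by=By.XPATH, value="//img[@class='css-1q90pod']")
        datas = []
        for element in elements:
            img = element.get_attribute('src')
            # NOTE: `element` is the <img> itself, which has no child nodes. If these
            # lookups raise NoSuchElementException, select the enclosing product card.
            name = element.find_element(by=By.CLASS_NAME, value='css-3um8ox').text
            price = element.find_element(by=By.CLASS_NAME, value='css-h66vau').text
            # Two class names need a CSS selector; By.CLASS_NAME accepts only one.
            city = element.find_element(by=By.CSS_SELECTOR, value='.prd_link-shop-loc.css-1kdc32b').text
            datas.append({
                'img': img,
                'name': name,
                'price': price,
                'city': city
            })
        return datas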
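
The search URL used by `search_product` carries the search text in its `q` parameter. As a sketch, the same URL can be assembled with `urllib.parse.urlencode`, which escapes spaces and characters such as `&` in the query. The parameter names and values are copied from the URL above; `BASE_URL` and `build_search_url` are names introduced only for this example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://www.tokopedia.com/search"


def build_search_url(query):
    """Build the Tokopedia search URL with a safely encoded query string."""
    params = {
        "navsource": "",
        "ob": "5",
        "srp_component_id": "02.01.00.00",
        "srp_page_id": "",
        "srp_page_title": "",
        "st": "product",
        "q": query,
    }
    # urlencode escapes each value, so '&' in the query cannot split the parameter.
    return f"{BASE_URL}?{urlencode(params)}"


url = build_search_url("laptop gaming & mouse")
print(url)

# Decoding the URL again gives back the original search text.
decoded = parse_qs(urlsplit(url).query, keep_blank_values=True)
print(decoded["q"])  # ['laptop gaming & mouse']
```

Keeping the parameters in a dict also makes it easier to change a single value, such as `ob`, without editing a long string.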

### **Execute Scrape**

In [None]:

if __name__ == "__main__":
    query = input("Enter a product search query: ")
    scraper = Scraper()
    datas = scraper.search_product(query)
    print(datas)

### **Save Scrape Results to a Dataset**

In [None]:
# Default quoting keeps commas inside product names from breaking the CSV.
df = pd.DataFrame(datas)
df.to_csv('dataset-tokopedia.csv', index=False)

print("Data saved to 'dataset-tokopedia.csv'")
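
A quick way to check the saved file is to write the dataframe and read it back. The sketch below uses made-up rows and an in-memory buffer instead of `dataset-tokopedia.csv`, and it assumes that product names may contain commas, which is the case that matters for CSV quoting.

```python
import io

import pandas as pd

# Made-up sample rows in the same shape as the scraper's `datas` list.
datas = [
    {"img": "https://example.com/a.jpg", "name": "Laptop A, 16GB RAM",
     "price": "Rp1.000.000", "city": "Jakarta"},
    {"img": "https://example.com/b.jpg", "name": "Laptop B",
     "price": "Rp2.500.000", "city": "Bandung"},
]

df = pd.DataFrame(datas)

# Write with pandas' default quoting, which wraps fields containing commas in quotes.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Read the CSV text back and confirm that nothing was split or lost.
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored)
```

If `restored` has the same four columns and the comma-containing name comes back in one piece, the file is safe to load in later steps.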