
This project aims to scrape all product data from the Tiki website in June 2023 and store it in a MongoDB database. The collected data can then be analyzed and used for various purposes 🕸️


quannguyen0103/Tiki-product-scraper


Source

0. Set up

  • Install MongoDB & MySQL

1. Scrape data

Script: load_data.py

Description

Scrape information on all products listed on the Tiki website

Workflow

  • Send requests to Tiki's web APIs to retrieve product information
  • Use time.sleep to pause periodically, alternating between pauses after 50 and after 100 requests, to avoid IP blocking
  • Insert the scraped data directly into the product collection of the tiki MongoDB database
  • Output: sample_output
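
The workflow above can be sketched as a loop that fetches, inserts, and throttles. This is a minimal illustration, not the contents of load_data.py: `fetch` and `insert` are injected callables, and the pause lengths (5 s and 15 s) are hypothetical values, since the README does not state them. The throttling reads "alternate pauses after 50 and 100 requests" as a short pause every 50th request and a longer one every 100th, which is an assumption.

```python
import time
from typing import Any, Callable, Dict, Iterable, Optional


def scrape_products(
    product_ids: Iterable[int],
    fetch: Callable[[int], Optional[Dict[str, Any]]],
    insert: Callable[[Dict[str, Any]], Any],
    sleep: Callable[[float], Any] = time.sleep,
    short_every: int = 50,
    long_every: int = 100,
    short_pause: float = 5.0,   # hypothetical duration
    long_pause: float = 15.0,   # hypothetical duration
) -> int:
    """Fetch each product, insert it, and pause periodically.

    Returns the number of documents inserted. A fetch that returns None
    (e.g. a failed or empty response) is skipped but still counts as a request.
    """
    inserted = 0
    for count, product_id in enumerate(product_ids, start=1):
        doc = fetch(product_id)
        if doc is not None:
            insert(doc)
            inserted += 1
        # Every 100th request gets the long pause; every other 50th gets the short one,
        # so the pauses alternate short, long, short, long, ...
        if count % long_every == 0:
            sleep(long_pause)
        elif count % short_every == 0:
            sleep(short_pause)
    return inserted
```

With pymongo, `insert` could be `MongoClient()["tiki"]["product"].insert_one`, and `fetch` could wrap `requests.get(url).json()` for whichever Tiki endpoint the script actually calls (the endpoint is not given here). Injecting `sleep` keeps the throttling logic testable without waiting.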

2. Migrate data from MongoDB to MySQL

Script: migrate_data.py

Description

Migrate selected data fields from MongoDB to MySQL for further use and analysis

Workflow

  • Create the product_data table within the tiki_product database
  • Set up the metadata for the product_data table
  • Read the following fields from each document in the product collection and insert them into the product_data table in MySQL: id, name, category_id, category_name, subcategory_id, subcategory_name, short_description, description, url, price, rating, quantity_sold, origin
  • Use BeautifulSoup to strip HTML tags from the description field before inserting it into MySQL
  • Output: sample_output
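
A minimal sketch of the per-document projection, assuming each MongoDB document stores these fields under the same top-level keys (the real documents may nest them differently). To stay dependency-free it swaps in the standard library's `html.parser.HTMLParser` for BeautifulSoup; `strip_html`, `to_row`, and `INSERT_SQL` are hypothetical names, not taken from migrate_data.py.

```python
from html.parser import HTMLParser
from typing import Any, Dict, List, Tuple

# Columns listed in the README, in insertion order.
FIELDS = [
    "id", "name", "category_id", "category_name", "subcategory_id",
    "subcategory_name", "short_description", "description", "url",
    "price", "rating", "quantity_sold", "origin",
]

# Parameterized INSERT using %s placeholders (the style used by
# mysql-connector-python and PyMySQL).
INSERT_SQL = "INSERT INTO product_data ({}) VALUES ({})".format(
    ", ".join(FIELDS), ", ".join(["%s"] * len(FIELDS))
)


class _TextExtractor(HTMLParser):
    """Collects text nodes and drops tags (stdlib stand-in for BeautifulSoup)."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True turns &amp; into &
        self.parts: List[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_html(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Join text nodes with spaces, then collapse runs of whitespace.
    return " ".join(" ".join(parser.parts).split())


def to_row(doc: Dict[str, Any]) -> Tuple[Any, ...]:
    """Project one MongoDB document onto the product_data columns.

    Missing keys become None (SQL NULL); only description is cleaned.
    """
    row = {field: doc.get(field) for field in FIELDS}
    if row["description"] is not None:
        row["description"] = strip_html(row["description"])
    return tuple(row[field] for field in FIELDS)
```

With a DB-API cursor, the rows could then be written in one call, e.g. `cursor.executemany(INSERT_SQL, [to_row(d) for d in collection.find()])`, followed by a commit on the connection.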

3. Extract ingredient information

Script: extract_data.py

Description

Extract the product_id and ingredient information from each product's description for use by the product development team

Workflow

  • Find all documents whose description contains the string pattern thành phần: (Vietnamese for "ingredients:") and extract the ingredient text that follows it
  • Use BeautifulSoup to strip HTML tags from the description
  • Output: sample_output
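
One way to implement these steps is sketched below, under stated assumptions: the MongoDB query uses the standard `$regex`/`$options` operators, the product id lives under the `id` key, and the ingredient list is taken to run to the end of the line it starts on (or the next line if the heading stands alone). HTML is stripped first with the standard library's `HTMLParser` (swapped in for BeautifulSoup), inserting line breaks at block tags so "end of line" is meaningful. Text is NFC-normalized because Vietnamese diacritics can be stored in composed or decomposed form. All helper names are hypothetical.

```python
import re
import unicodedata
from html.parser import HTMLParser
from typing import Any, Dict, List, Optional

# Filter for pymongo's find(): case-insensitive match on the literal pattern.
MONGO_FILTER = {"description": {"$regex": "thành phần:", "$options": "i"}}

# "thành" + whitespace + "phần", optional spaces, colon; capture the rest of the line.
_PATTERN = re.compile(
    unicodedata.normalize("NFC", r"thành\s+phần\s*:\s*(.+)"), re.IGNORECASE
)


class _TextWithBreaks(HTMLParser):
    """Drops tags but emits a newline at block-level tags."""

    BLOCK_TAGS = {"br", "p", "div", "li", "tr"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []

    def handle_starttag(self, tag: str, attrs: Any) -> None:
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag: str) -> None:
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = _TextWithBreaks()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


def extract_ingredients(description: str) -> Optional[str]:
    text = unicodedata.normalize("NFC", html_to_text(description))
    match = _PATTERN.search(text)
    if not match:
        return None
    return match.group(1).strip() or None


def extract_record(doc: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return {"product_id", "ingredients"} or None if no ingredient text is found."""
    ingredients = extract_ingredients(doc.get("description") or "")
    if ingredients is None:
        return None
    return {"product_id": doc.get("id"), "ingredients": ingredients}
```

In use, the filter narrows the scan on the database side, e.g. `collection.find(MONGO_FILTER, {"id": 1, "description": 1})`, and `extract_record` handles each returned document.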

4. Analyze data

Script: analyze_data.py

Description

Create visualizations for a better understanding of the product data

