- Install MongoDB & MySQL
Script: load_data.py
Scrape information of all products on the Tiki website
- Send requests to the Tiki web APIs to get product information
- Use time.sleep to alternate pauses after 50 and 100 requests, avoiding IP blocking
- Insert the scraped data directly into the product collection within the tiki MongoDB database
- Output: sample_output
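A minimal sketch of the request loop described above, not the actual contents of load_data.py. The pause lengths, the API URL template, the User-Agent header, the local MongoDB address, and the helper names (pause_after, scrape, build_fetch_and_store) are assumptions made for illustration. The fetch and store steps are passed in as callables so the throttling logic can be read and tried on its own.

```python
import time

# Assumed values: the source only says pauses alternate after 50 and 100 requests.
SHORT_PAUSE_SECONDS = 5
LONG_PAUSE_SECONDS = 15

# Assumed endpoint template; check it against the API actually being scraped.
API_URL = "https://tiki.vn/api/v2/products/{product_id}"


def pause_after(request_count):
    """Return how many seconds to sleep after `request_count` completed requests."""
    if request_count > 0 and request_count % 100 == 0:
        return LONG_PAUSE_SECONDS
    if request_count > 0 and request_count % 50 == 0:
        return SHORT_PAUSE_SECONDS
    return 0


def scrape(product_ids, fetch, store, sleep=time.sleep):
    """Fetch each product, store it, and pause periodically. Returns the stored count."""
    stored = 0
    for count, product_id in enumerate(product_ids, start=1):
        product = fetch(product_id)
        if product is not None:
            store(product)
            stored += 1
        delay = pause_after(count)
        if delay:
            sleep(delay)
    return stored


def build_fetch_and_store():
    """Wire the loop to requests and pymongo (both must be installed)."""
    import requests
    from pymongo import MongoClient

    # Assumes a MongoDB server on the default local port.
    collection = MongoClient("mongodb://localhost:27017")["tiki"]["product"]

    def fetch(product_id):
        response = requests.get(
            API_URL.format(product_id=product_id),
            headers={"User-Agent": "Mozilla/5.0"},
            timeout=10,
        )
        # Skip products that the API does not return successfully.
        return response.json() if response.status_code == 200 else None

    return fetch, collection.insert_one
```

With both libraries installed, something like `fetch, store = build_fetch_and_store()` followed by `scrape(product_ids, fetch, store)` would run the whole step, where product_ids is whatever list of IDs the script has collected.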
Script: migrate_data.py
Migrate specific data fields from MongoDB to MySQL for further use and analysis
- Create the product_data table within the tiki_product database
- Set up the metadata for the product_data table
- Get these fields from each document in the product collection and insert them into the product_data table in MySQL: id, name, category_id, category_name, subcategory_id, subcategory_name, short_description, description, url, price, rating, quantity_sold, origin
- Use BeautifulSoup to remove HTML tags in the description field before inserting into MySQL
- Output: sample_output
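The migration steps above could look roughly like the sketch below. It is an illustration under assumptions, not the code in migrate_data.py: the column types are guesses, the MongoDB documents are assumed to expose the listed fields as top-level keys, and the names FIELDS, strip_html, and to_row are hypothetical.

```python
FIELDS = (
    "id", "name", "category_id", "category_name",
    "subcategory_id", "subcategory_name", "short_description",
    "description", "url", "price", "rating", "quantity_sold", "origin",
)

# Column types are assumptions; adjust them to the real data.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS product_data (
    id BIGINT PRIMARY KEY,
    name VARCHAR(512),
    category_id INT,
    category_name VARCHAR(255),
    subcategory_id INT,
    subcategory_name VARCHAR(255),
    short_description TEXT,
    description MEDIUMTEXT,
    url VARCHAR(1024),
    price DECIMAL(15, 2),
    rating FLOAT,
    quantity_sold INT,
    origin VARCHAR(255)
)
"""

# Parameterized insert using %s placeholders, one per field.
INSERT_SQL = "INSERT INTO product_data ({}) VALUES ({})".format(
    ", ".join(FIELDS), ", ".join(["%s"] * len(FIELDS))
)


def strip_html(html):
    """Remove HTML tags with BeautifulSoup, keeping only the visible text."""
    # Imported here so the rest of the sketch runs without bs4 installed.
    from bs4 import BeautifulSoup

    return BeautifulSoup(html or "", "html.parser").get_text(separator=" ", strip=True)


def to_row(doc, clean=strip_html):
    """Pick the selected fields from one MongoDB document, in FIELDS order."""
    row = {field: doc.get(field) for field in FIELDS}
    if row["description"] is not None:
        row["description"] = clean(row["description"])
    return tuple(row[field] for field in FIELDS)
```

Given a cursor from a MySQL driver that uses %s placeholders, the rows could then be written with `cursor.execute(CREATE_TABLE_SQL)` followed by `cursor.executemany(INSERT_SQL, rows)`, where rows is a list built by calling to_row on each document in the product collection.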
Script: extract_data.py
Extract product_id and ingredient information from the product's description for the product development team to use
- Find all documents whose description contains the string pattern thành phần: ("ingredients:" in Vietnamese) and extract the ingredient data that follows thành phần:
- Use BeautifulSoup to remove HTML tags in description
- Output: sample_output
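One possible way to express the extraction step is sketched below. It assumes the description has already had its HTML tags removed, that the ingredient text runs from the marker to the end of that line, and that matching should ignore case. The names MONGO_FILTER, extract_ingredients, and to_record are hypothetical and do not come from extract_data.py.

```python
import re
import unicodedata

# Normalize the marker so composed and decomposed Vietnamese accents compare equal.
MARKER = unicodedata.normalize("NFC", "thành phần:")

# MongoDB filter selecting documents whose description contains the marker.
MONGO_FILTER = {"description": {"$regex": MARKER, "$options": "i"}}

# Capture everything after the marker up to the end of the line (assumed boundary).
_PATTERN = re.compile(re.escape(MARKER) + r"\s*(.+)", re.IGNORECASE)


def extract_ingredients(text):
    """Return the text following the marker, or None if the marker is absent."""
    normalized = unicodedata.normalize("NFC", text)
    match = _PATTERN.search(normalized)
    return match.group(1).strip() if match else None


def to_record(doc):
    """Build a product_id / ingredient record from one document, or None."""
    ingredients = extract_ingredients(doc.get("description") or "")
    if ingredients is None:
        return None
    return {"product_id": doc.get("id"), "ingredient": ingredients}
```

With pymongo, MONGO_FILTER could be passed to the find method of the product collection, and each returned document handed to to_record after its description has been cleaned.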
Script: analyze_data.py
Create visualizations for better understanding of product