<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Capstone: Shopee Sales Analysis

## Executive Summary
[*Jump to table of contents*](#Table-of-Contents)

In this analysis, we will be using the pycaret library to get the best models to predict sales performance in terms of quantity on Shopee. This project be focused on the Beauty and Personal Care category. This will not only allow us to forecast future sales performances in terms of categories, and will help sellers remain competitive on the platform. This in turn, increases sellers' efficiency on the platform.

We will be scraping data using Selenium from shops mainly on Shopee Mall, as it is an area on Shopee that lists verified sellers - mainly authorised distributors, retailers and flagship stores. These sellers are all regulated and are required to sell authentic products, following the guidelines of the brands they are selling. We will focus mainly on Watsons Official Store as it has a huge variety of 3000 product listings from different brands and categories within the health and beauty category, which provides us with a substantial sample size. 

The provided data dictionary is reflected below, along with an additional column describing the corresponding data type.

**The main metric we will be using to assess the models is RMSE (root mean squared error) as it allows us to accurately assess how a model performs both on train and test data. Although we are optimizing for RMSE, we will also be looking at R2 to compare between models. 

This analysis will benefit retailers in the Health and Beauty industry to make better business decisions and pick more popular products to increase sales. On Shopee, we can see that there is a limit of 3000 product listings, and larger distributors and retailers will have more than 3000 products. Listing products online, apart from their brick and mortar stores requires additional costs (creation of listings, allocating stocks to e-commerce, online stock inventory system etc.). Hence this will help Health and Beauty retailers have a better selection of products from popular categories, meeting the needs of shoppers.

In this analysis, we will be looking to answer the following questions:
* What features are most relevant for sellers to increase sale quantity?
* In what ways do categories or sub categories impact sales quantity?
* What mechanics are ideal for their range of products? 
* What other insights can we get from the analysis? 


## Problem Statement
This project aims to help retailers in the Health and Beauty industry by finding out key features or criterias that affect sale quantity on e-commerce platform. This allows sellers to have better selections of products that better suit the needs of shoppers on Shopee. 

The model can also be used to forecast future sales, and hence allow brands to restock sufficient quantities to meet the volume on the platform. This prevents sellers and brands from missing out on sales.

The model will be trained using products from Watsons' official store on Shopee Mall, using the PyCaret which will determine the suitable models. 

## Table of Contents

* [i. Data Dictionary](#i.-Data-Dictionary)
    * [1. Data Scraping](#1.-Data-Scraping)
        * [1.1 Scraping of Shopee product listings](#1.1-Scraping-of-Shopee-product-listings)
            * [1.1.1 Import Libraries](#1.1.1-Import-Libraries)
            * [1.1.2 Scraping individual product links from Watsons Homepage](#1.1.2-Scraping-individual-product-links-from-Watsons-Homepage)
            * [1.1.3 Scraping Information from each individual product links](#1.1.3-Scraping-Information-from-each-individual-product-links)
            * [1.1.4 First look at scraped data](#1.1.4-First-look-at-scraped-data)
            * [1.1.5 Scraping earliest review dates from each product link](#1.1.5-Scraping-earliest-review-dates-from-each-product-link)

## 1. Data Scraping
### 1.1 Scraping of Shopee product listings
[*Jump to table of contents*](#Table-of-Contents)

In this section we will scrape the information for each product listing on the Watsons Official store ([source](https://shopee.sg/shop/195238920/search)). The data is scraped 4 December 2021.
The following features will be scraped from the individual product listings: 
**Product information**
- Product Link
- Product Name
- Original Price
- Sale Price
- Discount
- Free Shpping
- Vouchers
- Product Category
- Product Description

**Ratings**
- Ratings
- Number Of Ratings
- Number of 5 star ratings
- Number of 4 star ratings
- Number of 3 star ratings
- Number of 2 star ratings
- Number of 1 star ratings
- Number of Ratings with Comments
- Number of Ratings with Media

**Stocks**
- Brand
- Quantity Sold
- Stocks available
- Number of users who Favourited the product

**Shop Description**
- Shop
- Total Ratings
- Total Products
- Total Followers


#### 1.1.1 Import Libraries

In [1]:
# Import packages
import pandas as pd
import csv
import sys
import re

import time
from datetime import date, timedelta

from getpass import getpass
from time import sleep

from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.action_chains import ActionChains
from tqdm import tqdm

In [2]:
# To install webdriver package, only to run once.
# !pip install webdriver-manager

In [32]:
# import driver manager to manage chrome driver and chrome versions
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(ChromeDriverManager().install())



Current google-chrome version is 96.0.4664
Get LATEST chromedriver version for 96.0.4664 google-chrome
Trying to download new driver from https://chromedriver.storage.googleapis.com/96.0.4664.45/chromedriver_mac64.zip
Driver has been saved in cache [/Users/juneteo/.wdm/drivers/chromedriver/mac64/96.0.4664.45]
  driver = webdriver.Chrome(ChromeDriverManager().install())


#### 1.1.2 Scraping individual product links from Watsons Homepage

In [3]:
# Watsons ShopeeMall link
driver.get("https://shopee.sg/shop/195238920/search")
sleep(5)

In [19]:
# Scrape total number of pages to ensure that driver scrapes data from all pages on Watsons homepage.
pages = driver.find_element(By.XPATH, '/html/body/div[1]/div/div[3]/div/div/div[2]/div/div[1]/div[2]/div/span[2]').text
n = int(pages)
n

100

In [20]:
# Creating a for loop to scrape all the products links of watsons.
link_list = []
for pages in range(1,n+1):
    for i in range(1,41):
        try:
            link_list.append(driver.find_element(By.XPATH, '/html/body/div[1]/div/div[3]/div/div/div[2]/div/div[2]/div/div['+str(i)+']/a').get_attribute('href'))
        except:
            pass
    next_page_button = driver.find_element(By.XPATH, '/html/body/div[1]/div/div[3]/div/div/div[2]/div/div[1]/div[2]/button[2]')
    next_page_button.click()      # For driver to click next page
    print(pages)                  # To show which page it has scraped
    driver.implicitly_wait(10)    # Allow driver to wait to allow page to load before scraping
    sleep(1)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100


In [21]:
# Check for duplicated links.
print("Unique Links:", len(set(link_list)))
print("Total Links Scraped:", len(link_list))

Unique Links: 3000
Total Links Scraped: 3000


In [23]:
# Convert to dataframe and save it to CSV to reload later.
df = pd.DataFrame(link_list, columns = ['links'])
df.to_csv('watsons_links1.csv')

In [22]:
# Load links in CSV file
df= pd.read_csv('watsons_links1.csv')
# Convert dataframe to list.
link = df['links'].tolist()

In [18]:
# To check links in list
link[-9:-1]

['https://shopee.sg/DUMEX-MAMIL-GOLD-Dumex-Mamil-Gold-Stage-5-Growing-Up-Kid-Milk-Formula-850g-i.195238920.8527415194?sp_atk=9d84ec1f-e1d1-4c54-a843-b45f3c598cc8',
 'https://shopee.sg/Durex-Together-3s-i.195238920.8517634703?sp_atk=b0ec8a64-60b4-4828-8249-1fb88ce0bd44',
 'https://shopee.sg/Natures-Way-Adult-Vita-Gummies-Vitamin-C-120S-i.195238920.8517619729?sp_atk=c99af5d8-fdcd-4894-b50b-eb4d27d59fa6',
 'https://shopee.sg/Kose-Cosmeport-Clear-Turn-Premium-Royal-Jelly-Mask-Highly-Concentrated-Hyaluronic-Acid-4S-i.195238920.8516869727?sp_atk=1ba7296e-dc20-40aa-b3e8-1dfec18ed29c',
 'https://shopee.sg/Durex-Extra-Safe-3s-i.195238920.8417623651?sp_atk=e203cdf1-80e6-419e-9b63-300c85e8025c',
 "https://shopee.sg/NATURE'S-ESSENCE-Brite-Eyes-Vegetarian-Caplets-50s-i.195238920.8331566354?sp_atk=fb8de16f-279a-4d6c-a979-37ceb80fcadc",
 "https://shopee.sg/Kinohimitsu-D'Tox-Plum-Juice-(Flush-Out-Toxins)-30ml-x-6s-i.195238920.8117641315?sp_atk=4bb9624a-32e7-44f0-a59b-e2c8cd7c1074"]

#### 1.1.3 Scraping Information from each individual product links

In [11]:
# Creating a loop to scrape data from individual product links and adding it into a dictionary
# TQDM is used to show the percentage of products scraped.
allproducts=[]
for i in tqdm(link):
    driver.get(i)
    html = driver.find_element(By.TAG_NAME, 'html')
    html.send_keys(Keys.PAGE_DOWN)
    driver.implicitly_wait(12)
    html.send_keys(Keys.PAGE_DOWN)
    driver.implicitly_wait(12)
    html.send_keys(Keys.PAGE_DOWN)
   # For loop to extract information.

#   Product Details
    ProductName = driver.find_elements(By.XPATH, '/html/body/div[1]/div/div[2]/div[2]/div/div[2]/div[3]/div/div[1]/span')
    OriginalPrice = driver.find_elements(By.XPATH, '/html/body/div[1]/div/div[2]/div[2]/div/div[2]/div[3]/div/div[3]/div/div/div[1]/div/div[1]')
    SalePrice = driver.find_elements(By.XPATH, '/html/body/div[1]/div/div[2]/div[2]/div/div[2]/div[3]/div/div[3]/div/div/div[1]/div/div[2]/div[1]')
    Discount = driver.find_elements(By.XPATH, '/html/body/div[1]/div/div[2]/div[2]/div/div[2]/div[3]/div/div[3]/div/div/div[1]/div/div[2]/div[2]')
    FreeShipping = driver.find_elements(By.XPATH, '/html/body/div[1]/div/div[2]/div[2]/div/div[2]/div[3]/div/div[4]/div/div[2]/div/div[1]/div[2]')
    Vouchers = driver.find_elements(By.XPATH, '/html/body/div[1]/div/div[2]/div[2]/div/div[2]/div[3]/div/div[4]/div/div[1]/div/div[2]/div')                

#   Ratings
    Ratings = driver.find_elements(By.XPATH,'/html/body/div[1]/div/div[2]/div[2]/div/div[2]/div[3]/div/div[2]/div[1]/div[1]')
    NoOfRatings = driver.find_elements(By.XPATH,'/html/body/div[1]/div/div[2]/div[2]/div/div[2]/div[3]/div/div[2]/div[2]/div[1]')
                                                /html/body/div[1]/div/div[2]/div[2]/div/div[2]/div[3]/div/div[2]/div[2]/div[1]
                                                /html/body/div[1]/div/div[2]/div[2]/div/div[2]/div[3]/div/div[2]/div[2]/div[1]
    Star5 = driver.find_elements(By.XPATH,'/html/body/div[1]/div/div[2]/div[2]/div/div[3]/div[2]/div[1]/div[2]/div/div[2]/div[2]/div[2]')
    Star4 = driver.find_elements(By.XPATH,'/html/body/div[1]/div/div[2]/div[2]/div/div[3]/div[2]/div[1]/div[2]/div/div[2]/div[2]/div[3]')
    Star3 = driver.find_elements(By.XPATH,'/html/body/div[1]/div/div[2]/div[2]/div/div[3]/div[2]/div[1]/div[2]/div/div[2]/div[2]/div[4]')
    Star2 = driver.find_elements(By.XPATH,'/html/body/div[1]/div/div[2]/div[2]/div/div[3]/div[2]/div[1]/div[2]/div/div[2]/div[2]/div[5]')
    Star1 = driver.find_elements(By.XPATH,'/html/body/div[1]/div/div[2]/div[2]/div/div[3]/div[2]/div[1]/div[2]/div/div[2]/div[2]/div[6]')
    Comments = driver.find_elements(By.XPATH,'/html/body/div[1]/div/div[2]/div[2]/div/div[3]/div[2]/div[1]/div[2]/div/div[2]/div[2]/div[7]')
    Media = driver.find_elements(By.XPATH,'/html/body/div[1]/div/div[2]/div[2]/div/div[3]/div[2]/div[1]/div[2]/div/div[2]/div[2]/div[8]')
#   Stocks
    Brand = driver.find_elements(By.XPATH,'/html/body/div[1]/div/div[2]/div[2]/div/div[3]/div[2]/div[1]/div[1]/div[1]/div[2]/div[2]/a')
    QtySold = driver.find_elements(By.XPATH,'/html/body/div[1]/div/div[2]/div[2]/div/div[2]/div[3]/div/div[2]/div[3]/div[1]')
    Stocks = driver.find_elements(By.XPATH,'/html/body/div[1]/div/div[2]/div[2]/div/div[3]/div[2]/div[1]/div[1]/div[1]/div[2]/div[3]/div')
    Favourites = driver.find_elements(By.XPATH,'/html/body/div[1]/div/div[2]/div[2]/div/div[2]/div[2]/div[2]/div[2]/div')
#   Shop description
    Shop = driver.find_elements(By.XPATH,'/html/body/div[1]/div/div[2]/div[2]/div/div[3]/div[1]/div[1]/div/div[1]')
    TotalRatings = driver.find_elements(By.XPATH,'/html/body/div[1]/div/div[2]/div[2]/div/div[3]/div[1]/div[2]/div[1]/div/span')
    TotalProducts = driver.find_elements(By.XPATH,'/html/body/div[1]/div/div[2]/div[2]/div/div[3]/div[1]/div[2]/div[1]/div/span')
    TotalFollowers = driver.find_elements(By.XPATH,'/html/body/div[1]/div/div[2]/div[2]/div/div[3]/div[1]/div[2]/div[3]/div[2]/span') 
    Category = driver.find_elements(By.XPATH,'/html/body/div[1]/div/div[2]/div[2]/div/div[3]/div[2]/div[1]/div[1]/div[1]/div[2]/div[1]/div')
    ProductDescription = driver.find_elements(By.XPATH,'/html/body/div[1]/div/div[2]/div[2]/div/div[3]/div[2]/div[1]/div[1]/div[2]/div[2]/div/span')

#   If information is not found, indicate as none instead of throwing an error.
    if not ProductName: ProductName = "None"
    else: ProductName = ProductName[0].text

    if not OriginalPrice: OriginalPrice = "None"
    else: OriginalPrice = OriginalPrice[0].text
        
    if not SalePrice: SalePrice = "None"
    else: SalePrice = SalePrice[0].text

    if not Discount: Discount = "None"
    else: Discount = Discount[0].text

    if not FreeShipping: FreeShipping = "None"
    else: FreeShipping = FreeShipping[0].text

    if not Vouchers: Vouchers = "None"
    else: Vouchers = Vouchers[0].text
        
    if not Ratings: Ratings = "None"
    else: Ratings = Ratings[0].text    

    if not NoOfRatings: NoOfRatings = "None"
    else: NoOfRatings = NoOfRatings[0].text    

    if not Brand: Brand = "None"
    else: Brand = Brand[0].text
        
    if not Favourites: Favourites = "None"
    else: Favourites = Favourites[0].text   
        
    if not Star5: Star5 = "None"
    else: Star5 = Star5[0].text  
       
    if not Star4: Star4 = "None"
    else: Star4 = Star4[0].text  
        
    if not Star3: Star3 = "None"
    else: Star3 = Star3[0].text  
        
    if not Star2: Star2 = "None"
    else: Star2 = Star2[0].text  
        
    if not Star1: Star1 = "None"
    else: Star1 = Star1[0].text  
      
    if not Comments: Comments = "None"
    else: Comments = Comments[0].text 
        
    if not Media: Media = "None"
    else: Media = Media[0].text

    if not QtySold: QtySold = "None"
    else: QtySold = QtySold[0].text 
    
    if not Stocks: Stocks = "None"
    else: Stocks = Stocks[0].text 
        
    if not ProductDescription: ProductDescription = "None"
    else: ProductDescription = ProductDescription[0].text 

    if not Shop: Shop = "None"
    else: Shop = Shop[0].text  
      
    if not TotalRatings: TotalRatings = "None"
    else: TotalRatings = TotalRatings[0].text 
        
    if not TotalProducts: TotalProducts = "None"
    else: TotalProducts = TotalProducts[0].text

    if not TotalFollowers: TotalFollowers = "None"
    else: TotalFollowers = TotalFollowers[0].text 
    
    if not Category: Category = "None"
    else: Category = Category[0].text 

    dic = {
#   Product Details
    'ProductName':ProductName,
    'OriginalPrice':OriginalPrice,
    'SalePrice':SalePrice,
    'Discount':Discount,
    'FreeShpping':FreeShipping,
    'Vouchers':Vouchers,
#   Ratings
    'Ratings':Ratings,
    'NoOfRatings':NoOfRatings,
    'Star5':Star5,
    'Star4':Star4,
    'Star3':Star3,
    'Star2':Star2,
    'Star1':Star1,
    'Comments':Comments,
    'Media':Media,
#   Stocks
    'Brand':Brand,
    'QtySold':QtySold,
    'Stocks':Stocks,
    'Favourites':Favourites,
#   Shop Description
    'Shop':Shop,
    'TotalRatings':TotalRatings,
    'TotalProducts':TotalProducts,
    'TotalFollowers':TotalFollowers,
    'Category':Category,
    'ProductDescription':ProductDescription,
    'link':i}
    allproducts.append(dic)


100%|██████████| 3000/3000 [28:27:41<00:00, 34.15s/it]    


'    x = True\n    while x: \n        try:\n            driver.implicitly_wait(10)\n            next_page_reviews_button = driver.find_element(By.CSS_SELECTOR,\'button.shopee-icon-button.shopee-icon-button--right \')\n            next_page_reviews_button.click()\n            driver.implicitly_wait(30)\n        except:\n            review_dates = driver.find_elements(By.CLASS_NAME,\'shopee-product-rating__time\') # Find all dates in reviews\n            reviews = [] # Creates list\n            for dates in review_dates:\n                date = dates.text # convert Web Elements in list to .text\n                reviews.append(date)\n                reviews.sort() \n#             reviews = reviews[0] \n            if not reviews: reviews = "None"\n            else: reviews = reviews[0] # Pick the oldest date as a proxy.\n            x = False'

In [14]:
# Save all scraped data into a dataframe and csv file.
df = pd.DataFrame(allproducts)
df.to_csv('watsons_test1.csv')

In [15]:
# load CSV file as df
df = pd.read_csv('watsons_test1.csv')

#### 1.1.4 First look at scraped data

In [16]:
df.T

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,2990,2991,2992,2993,2994,2995,2996,2997,2998,2999
Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,2990,2991,2992,2993,2994,2995,2996,2997,2998,2999
ProductName,SD BIOSENSOR Standard Q Covid-19 AG Home Test ...,Kodomo Baby Wipes 70'S,Dr Tung's Smart Dental Floss 27M,Watsons Goat Milk Cream Bath 1L,WATSONS Square Puffs Facial Cotton - 3x160s (E...,ORITA Dehumidifier Charcoal 650ml x 48s (Per C...,Watsons Rose Cream Bath 1L,WATSONS Sugar Free Vitamin C + Zinc Orange Fla...,Rohto Cool Eye Drops 13Ml,Icm Pharma Icm Pharma Hygin-X Antiseptic Handr...,...,Herbal Essences Bio Renew Smooth Golden Moring...,DUMEX MAMIL GOLD Dumex Mamil Gold Stage 5 Grow...,Durex Together 3s,Natures Way Adult Vita Gummies Vitamin C 120S,Kose Cosmeport Clear Turn Premium Royal Jelly ...,Durex Extra Safe 3s,NATURE'S ESSENCE Brite Eyes Vegetarian Caplets...,Kinohimitsu D'Tox Plum Juice (Flush Out Toxins...,Leaders Insolution Mediu Amino Lifting Mask 5S,Kose Cosmeport Clear Turn Premium Royal Gelee ...
OriginalPrice,,$2.00,$5.10,$6.90,$4.90,$88.00,$6.90,$19.90,$7.90,$9.90,...,$12.90,$31.90,$4.35,$29.90,$15.90,$6.25,$36.45,$39.90,$12.00,$15.90
SalePrice,$7.30,,$4.80,$2.45,,$64.70,$2.45,$15.90,$7.00,,...,,,,$25.35,$13.45,,,$33.85,$10.20,$13.45
Discount,44% OFF,,6% OFF,64% OFF,,26% OFF,64% OFF,20% OFF,11% OFF,,...,,,,15% OFF,15% OFF,,,15% OFF,15% OFF,15% OFF
FreeShpping,62 piece available,63 piece available,Free shipping for orders over $40.00,Free shipping for orders over $40.00,Free shipping for orders over $40.00,27 piece available,Free shipping for orders over $40.00,Free shipping for orders over $40.00,Free shipping for orders over $40.00,Free shipping for orders over $40.00,...,Free shipping for orders over $40.00,7 piece available,Free shipping for orders over $40.00,Free shipping for orders over $40.00,Free shipping for orders over $40.00,Free shipping for orders over $40.00,Free shipping for orders over $40.00,Free shipping for orders over $40.00,Free shipping for orders over $40.00,Free shipping for orders over $40.00
Vouchers,,,$6 OFF\n$12 OFF\n$25 OFF,$6 OFF\n$12 OFF\n$25 OFF,$6 OFF\n$12 OFF\n$25 OFF,,$6 OFF\n$12 OFF\n$25 OFF,$6 OFF\n$12 OFF\n$25 OFF,$6 OFF\n$12 OFF\n$25 OFF,$6 OFF\n$12 OFF\n$25 OFF,...,$6 OFF\n$12 OFF\n$25 OFF,,$6 OFF\n$12 OFF\n$25 OFF,$6 OFF\n$12 OFF\n$25 OFF,$6 OFF\n$12 OFF\n$25 OFF,$6 OFF\n$12 OFF\n$25 OFF,$6 OFF\n$12 OFF\n$25 OFF,$6 OFF\n$12 OFF\n$25 OFF,$6 OFF\n$12 OFF\n$25 OFF,$6 OFF\n$12 OFF\n$25 OFF
Ratings,4.9,5.0,5.0,5.0,5.0,5.0,4.9,5.0,5.0,4.9,...,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0
NoOfRatings,830,251,70,154,143,100,122,96,66,90,...,1,1,1,1,1,1,1,1,1,1
Star5,5 Star (792),5 Star (245),5 Star (68),5 Star (151),5 Star (140),5 Star (96),5 Star (118),5 Star (95),5 Star (66),5 Star (87),...,5 Star (1),5 Star (1),5 Star (1),5 Star (1),5 Star (1),5 Star (1),5 Star (1),5 Star (1),5 Star (1),5 Star (1)


#### 1.1.5 Scraping earliest review dates from each product link [To be improved]

In [33]:
# Creating a loop to scrape review dates from all links
review_dates=[]
for i in tqdm(link):
    driver.get(i)
    html = driver.find_element(By.TAG_NAME, 'html')
    html.send_keys(Keys.PAGE_DOWN)
    driver.implicitly_wait(30)
    html.send_keys(Keys.PAGE_DOWN)
    driver.implicitly_wait(30)
    html.send_keys(Keys.PAGE_DOWN)
    driver.implicitly_wait(30)
    html.send_keys(Keys.PAGE_DOWN)
   # For loop to extract information 
        
#Scrape earliest review date
#     next_page_reviews_button = driver.find_element(By.CSS_SELECTOR,'button.shopee-icon-button.shopee-icon-button--right ')
    #     next_page_reviews_button = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'button.shopee-icon-button.shopee-icon-button--right ')))
    #     if 'inactive' in next_page_reviews_button.get_attribute('class'):
    x = True
    while x: 
        try:
            driver.implicitly_wait(10)
            next_page_reviews_button = driver.find_element(By.CSS_SELECTOR,'button.shopee-icon-button.shopee-icon-button--right ')
            next_page_reviews_button.click()
            driver.implicitly_wait(30)
        except:
            html = driver.find_element(By.TAG_NAME, 'html')
            html.send_keys(Keys.PAGE_DOWN)
            review_dates = driver.find_elements(By.CLASS_NAME,'shopee-product-rating__time') # Find all dates in reviews
            reviews = [] # Creates list
            for dates in review_dates:
                date = dates.text # convert Web Elements in list to .text
                reviews.append(date)
                reviews.sort() 
            if not reviews: reviews = "None"
            else: reviews = reviews[0] # Pick the oldest date as a proxy.
            x = False
            dic = {
        #   Review dates
            'reviews':reviews,
            'link':i}
            review_dates.append(dic)


100%|██████████| 3000/3000 [57:17:49<00:00, 68.76s/it]       


In [34]:
# Save all scraped data into a dataframe and csv file.
df = pd.DataFrame(review_dates)
df.to_csv('watsons_dates.csv')

In [37]:
df

Unnamed: 0,0
0,<selenium.webdriver.remote.webelement.WebEleme...
1,"{'reviews': '2021-06-13 12:32', 'link': 'https..."


In [39]:
review_dates

[<selenium.webdriver.remote.webelement.WebElement (session="dfa76599fe35e970be2f42d4742f1eb5", element="00447cb8-577b-40e7-8011-cec347762585")>,
 {'reviews': '2021-06-13 12:32',
  'link': 'https://shopee.sg/Kose-Cosmeport-Clear-Turn-Premium-Royal-Gelee-Mask-Vitamin-C-i.195238920.7879773923?sp_atk=8286cc5b-28a1-494d-b0d3-c0b23bd256e7'}]

\- **END OF DATA SCRAPING** -