> # Scrape Nike Reviews

In this notebook, our goal is to download all customer reviews related to a selected product from Nike's website.

To achieve this, we will create a function named `scrape()` with the following functionalities:

1. Accept a query (a word or short phrase) as a parameter.


2. Use Selenium to perform the following tasks:
   - Submit the query to the website's search box.
   - Retrieve the list of matching products.
   - Access the first product on the list.
   - Download all its reviews into a CSV file.


3. For each review, the function should extract and store the following information:
   - Text of the review.
   - Rating given by the customer.
   - Date of the review.

The resulting CSV file will have one line per review, with three fields per line.
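
As a rough illustration of that layout, the sketch below writes two made-up reviews to an in-memory buffer with the standard `csv` module and reads them back. The sample rows are invented placeholders, and the helper name `write_reviews` is hypothetical; the column names `rating`, `content` and `review_date` follow the header that `scrape()` writes. Treat it as a sketch of the file format, not of the scraper itself.

```python
import csv
import io

# Hypothetical sample rows: (rating, content, review_date)
sample_reviews = [
    (5, "Best shoes ever.", "16 Nov 2022"),
    (4, "Comfortable, but they run a little large.", "17 Nov 2022"),
]

def write_reviews(rows, handle):
    """Write a header plus one line per review with three fields each."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["rating", "content", "review_date"])
    writer.writerows(rows)

buffer = io.StringIO()
write_reviews(sample_reviews, buffer)

# Read the data back to confirm the structure survives a round trip,
# including the comma inside the second review's text.
buffer.seek(0)
rows = list(csv.reader(buffer))
print(rows[0])
print(len(rows) - 1, "reviews")
```

Because the `csv` module quotes fields that contain commas, review text with punctuation still comes back as a single field.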


**Import Libraries**

In [1]:
#import libraries
import pandas as pd
import numpy as np

#!pip3 install -U selenium
from selenium import webdriver #allows you to launch/initiate a browser
from selenium.webdriver.support.ui import WebDriverWait #allows you to wait for a page to load
from selenium.webdriver.support import expected_conditions as EC #specify what you are looking for on a specific page in order to determine that the webpage has loaded.
from selenium.webdriver.common.by import By #allows you to search for things using specific parameters
from selenium.common.exceptions import NoSuchElementException #raised when an element does not exist on the page
from selenium.webdriver.chrome.service import Service

#!pip3 install webdriver-manager
from webdriver_manager.chrome import ChromeDriverManager

from datetime import datetime
#!pip install googletrans 
from googletrans import Translator
import re, time,csv
import urllib
import requests
from termcolor import colored

import tkinter as tk
from tkinter import simpledialog

from PIL import Image
from io import BytesIO

**Select a Nike product**

In [2]:
#create the Tk root window and hide it, so only the input dialog is shown
root = tk.Tk()
root.withdraw()
    
#product selected from input box
product = simpledialog.askstring(title = 'Nike product',prompt = "Type a Nike product") 
product

'airforce 1 07'
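
The query typed here later becomes part of the output file name (`nike <query> reviews.csv`). A minimal sketch of building that name in a portable way is shown below. The helper name `reviews_csv_path` and the character clean-up rule are assumptions made for illustration; they are not part of the notebook.

```python
from pathlib import Path
import re

def reviews_csv_path(product: str, folder: str = ".") -> Path:
    """Build the CSV path for a query (hypothetical helper)."""
    # Assumption: drop characters that are not allowed in file names on
    # common operating systems, then collapse repeated whitespace.
    cleaned = re.sub(r'[\\/:*?"<>|]', "", product)
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return Path(folder) / f"nike {cleaned} reviews.csv"

path = reviews_csv_path("airforce 1 07")
print(path.name)
```

Using `pathlib` avoids hard-coding a Windows-style `.\` prefix, so the same name can be used for writing and reading the file on any operating system.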

**Download reviews for selected product**

In [3]:
def scrape(product:str #query of the chosen product
          ,delay:int = 5 #number of seconds to wait
          ):
    
    '''
    Input: 
    A query containing the product we have chosen

    Function: 
    Opens a web driver, goes to Nike's website and finds the reviews for the first product based on the query.
    Next, downloads all the reviews into a CSV file by scanning all review pages
    
    Output:
    A CSV file containing all the reviews for the given product
    '''    
    #select webdriver
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
    
    #maximize window
    driver.maximize_window()
    
    #create the url
    url = 'https://www.nike.com/gr/'
    driver.get(url)

    #accept cookies
    cookies = driver.find_element(By.ID,"hf_cookie_text_cookieAccept")
    cookies.click()
    
    #find search box
    search_box = driver.find_element(By.ID,"VisualSearchInput")
    
    #insert product into search box
    search_box.send_keys(product)
    
    #find search button
    search_button = driver.find_element(by=By.CSS_SELECTOR, value='[data-var="vsButton"]')
    
    #click search button
    search_button.click()
    
    #access the first product of the list
    first_prod = driver.find_element(by=By.CSS_SELECTOR, value='[data-product-position="1"]')
    
    #click the first product of the list
    first_prod.click()
    
    #open reviews ribbon
    open_reviews_ribbon = driver.find_element(by=By.CSS_SELECTOR, value='[data-test="reviewsAccordionClick"]')
    
    #click reviews ribbon
    open_reviews_ribbon.click()
    
    #find reviews button
    more_reviews_button = driver.find_element(by=By.CSS_SELECTOR, value='[data-test="more-reviews"]')

    #click reviews button
    more_reviews_button.click()    
    
    #wait for a few seconds to load page
    time.sleep(10)
    
    #save screenshot of entire page in order to create a picture with photo, product name & price
    png = driver.get_screenshot_as_png() 
    
    #open image in memory with PIL library
    im = Image.open(BytesIO(png))
    
    #define crop points
    left, top, right, bottom = 0, 0, 300, 150
    
    #crop the image to create a new picture with photo, product name & price
    im = im.crop((left, top, right, bottom))
    
    #save new cropped image
    im.save('./images_for_summary/screenshot.png')
        
    #open a new csv writer
    fw = open('nike '+ product + ' reviews.csv','w',encoding='utf8')
    writer = csv.writer(fw,lineterminator='\n')
    writer.writerow(['rating','content','review_date'])    
           
    while True: #keep going until there are no more review pages
    
        #scroll down
        driver.execute_script('window.scrollTo(0,document.body.scrollHeight)')
        
        #get all the reviews in the page
        reviews = driver.find_elements(by=By.CSS_SELECTOR, value='[class="tt-c-review"]')
            
        for review in reviews:
                
            #initialize key attributes
            rating, content,  review_date = 'NA', 'NA', 'NA'
                
            ######  ratings  #####

            #try to find the rating box
            try: 
                ratingBox = review.find_element(by=By.CSS_SELECTOR, value='[class="tt-u-clip-hide"]')
                                
            except NoSuchElementException:
                ratingBox = None
            
            #box found
            if ratingBox: 
                
                #find all the non-empty stars, indicating the rating
                ratings = review.find_elements(by=By.CSS_SELECTOR, \
                          value='[class="tt-o-icon tt-o-icon--star--full tt-o-icon--lg tt-c-rating__icon tt-c-rating__icon"]')
            
                #count the non-empty stars
                rating = len(ratings)              
                
            ##### content #####
                
            try: # try to find the content box
                contentBox = review.find_element(by=By.CSS_SELECTOR, value='[class="tt-c-review__text-content"]')
                
            except NoSuchElementException:
                contentBox = None
                
            #box found, extract text
            if contentBox: content = contentBox.text
                
            ##### review_date #####
                
            try: # try to find the review_date box
                review_dateBox = review.find_element(by=By.CSS_SELECTOR, value='[class="tt-c-review__date tt-u-mb--sm"]')
                
            except NoSuchElementException:
                review_dateBox = None
                
            #box found, extract text
            if review_dateBox: review_date = review_dateBox.text
                
            #write a new row
            writer.writerow([rating, content,  review_date])
                
        #find the 'next' button; its aria-label is the Greek site's text for moving to the next reviews
        nextButton = driver.find_element(by=By.CSS_SELECTOR, value='[aria-label="Μετάβαση σε reviews μετά"]')
        
        if 'disabled' in nextButton.get_attribute('class'): # final page reached, 'next' button is disabled on this page
            break    
            
        #click on the next Button
        nextButton.click()
        
        #wait for a few seconds
        time.sleep(3)
         
    fw.close()
    
    print(colored('Reviews have been successfully downloaded', 'green'))
    
    #import csv (same file name that was written above)
    reviews_eng = pd.read_csv(f'nike {product} reviews.csv')
    
    #define translator
    translator = Translator()
    
    #function to translate all the reviews into English
    def translate_reviews_in_english(reviews_eng):
    
        #define function for translation
        def en_translator(content):

            #translate each review
            a = translator.translate(content , dest ='en').text

            return a

        #translate reviews into English
        reviews_eng['content_translated'] = reviews_eng.content.apply(en_translator)
        
        #keep wanted columns
        reviews_eng = reviews_eng.loc[:,['rating','content_translated','review_date']]
        
        #rename columns
        reviews_eng.columns = ['rating','content','review_date']
        
        #export to csv
        reviews_eng.to_csv(f'nike {product} reviews.csv', index = False)

        print(colored('Reviews have been translated in english successfully', 'green'))

        #return the translated DataFrame instead of the None returned by print()
        return reviews_eng
    
    #run function for translation
    reviews_eng = translate_reviews_in_english(reviews_eng)
    
    return 

#start time
start = datetime.now()

#scraping
_ = scrape(product)

#end time
end = datetime.now()

#total execution time
execution_time = end-start
print('The total execution time was:',execution_time)

[WDM] - Downloading: 100%|████████████████████████████████████████████████████████| 6.58M/6.58M [00:00<00:00, 13.0MB/s]


Reviews have been successfully downloaded
Reviews have been translated in english successfully
The total execution time was: 0:06:53.678566
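
The page-by-page loop inside `scrape()` can be hard to follow among the Selenium calls. The sketch below isolates the same control flow with plain Python stand-ins. `FakeSite`, its methods and `collect_reviews` are hypothetical; they only imitate a paginated review list whose "next" button becomes disabled on the last page. It is meant to show the loop structure, not how Nike's site behaves.

```python
class FakeSite:
    """Hypothetical stand-in for a paginated review widget."""

    def __init__(self, pages):
        self.pages = pages      # list of pages, each a list of review dicts
        self.current = 0        # index of the page currently shown

    def reviews_on_page(self):
        # An empty site simply has no reviews to show.
        return self.pages[self.current] if self.pages else []

    def next_disabled(self):
        # The last page has nothing after it, so 'next' is disabled there.
        return self.current >= len(self.pages) - 1

    def click_next(self):
        self.current += 1


def collect_reviews(site):
    """Walk every page and return rows of (rating, content, review_date)."""
    rows = []
    while True:
        for review in site.reviews_on_page():
            # Missing fields fall back to 'NA', as in the scraper.
            rows.append((
                review.get("rating", "NA"),
                review.get("content", "NA"),
                review.get("review_date", "NA"),
            ))
        if site.next_disabled():
            break               # final page reached
        site.click_next()
    return rows


site = FakeSite([
    [{"rating": 5, "content": "Best shoes ever.", "review_date": "16 Nov 2022"}],
    [{"rating": 4, "content": "An iconic pair"}],   # no date on this one
])
rows = collect_reviews(site)
print(len(rows))
```

Checking the "next" control only after the current page has been processed is what ensures the final page's reviews are collected before the loop exits.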


**Preview downloaded reviews**

In [4]:
#import csv
reviews = pd.read_csv(f'nike {product} reviews.csv')
reviews

|  | rating | content | review_date |
|---|---|---|---|
| 0 | 5 | These are simply the best shoes of all time, e... | 19 Νοε 2022 |
| 1 | 4 | An iconic pair | 17 Νοε 2022 |
| 2 | 5 | I like the shoes because they are comfortable ... | 16 Νοε 2022 |
| 3 | 5 | Best shoes ever. | 16 Νοε 2022 |
| 4 | 1 | best shoe ever but i need size mens 19s | 7 Νοε 2022 |
| ... | ... | ... | ... |
| 832 | 4 | The shoes look cool and are extremely comforta... | 22 Ιουν 2016 |
| 833 | 2 | The sole inside gets crashed after 5 months. I... | 6 Ιουν 2016 |
| 834 | 5 | I'm a mother of 4 boys who are older now (24-3... | 6 Ιουν 2016 |
| 835 | 5 | One of the best shoes nike has ever made. A cl... | 20 Μαΐ 2016 |
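
The `review_date` column keeps the Greek month abbreviations shown on the site (for example `Νοε`, `Ιουν`, `Μαΐ`). If real dates are needed for sorting or filtering, one option is a small lookup table, sketched below. Only the three abbreviations visible in the preview come from the data; the remaining entries in `GREEK_MONTHS` and the helper name `parse_greek_date` are assumptions that should be checked against the full CSV.

```python
import unicodedata
from datetime import date

# Greek month abbreviations mapped to month numbers.
# Assumption: only 'Νοε', 'Ιουν' and 'Μαΐ' are confirmed by the preview;
# the other spellings are guesses and may differ on the site.
_RAW_MONTHS = {
    "Ιαν": 1, "Φεβ": 2, "Μαρ": 3, "Απρ": 4, "Μαΐ": 5, "Ιουν": 6,
    "Ιουλ": 7, "Αυγ": 8, "Σεπ": 9, "Οκτ": 10, "Νοε": 11, "Δεκ": 12,
}
# Normalize keys so accented letters compare equal however they are encoded.
GREEK_MONTHS = {unicodedata.normalize("NFC", k): v for k, v in _RAW_MONTHS.items()}

def parse_greek_date(text):
    """Turn a string such as '19 Νοε 2022' into a date, or None if unknown."""
    if not isinstance(text, str):
        return None             # e.g. a missing value read back as NaN
    parts = text.split()
    if len(parts) != 3:
        return None
    day, month_name, year = parts
    month = GREEK_MONTHS.get(unicodedata.normalize("NFC", month_name))
    if month is None or not (day.isdigit() and year.isdigit()):
        return None
    try:
        return date(int(year), month, int(day))
    except ValueError:
        return None             # impossible calendar date

print(parse_greek_date("19 Νοε 2022"))
print(parse_greek_date("20 Μαΐ 2016"))
```

With the preview DataFrame loaded, something like `reviews['review_date'].apply(parse_greek_date)` would apply the same conversion row by row; values the table does not recognise come back as `None` rather than raising an error.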
