# Review Summarization

1. Select an English-language website that hosts customer reviews on products (or services, businesses, movies, events, etc.).

2. Make sure that the website includes a free-text search box that users can use to search for products.

3. Email me your selection at ted@aueb.gr. Each student should work on a different website, so I will maintain the list of selected websites at the top of our Wiki. First come, first served.

4. Create a first Python Notebook with a function called scrape( ). The function should accept a query (a word or short phrase) as a parameter. It should then use Selenium to:

   * submit the query to the website's search box and retrieve the list of matching products.
   * access the first product on the list and download all its reviews into a csv file. For each review, the function should get the text, the rating, and the date. One line per review, 3 fields per line.

5. Create a second Python Notebook with a function called summarize( ). The function should accept as a parameter the path to a csv file created by the first Notebook. It should then create a 1-page pdf file that includes a summary of all the reviews in the csv.

The nature of the summary is entirely up to you. It can be text-based, visual-based, or a combination of both.
It is also up to you to define what is important enough to be included in the summary.
Focus on creating a summary that you think would be the most informative for customers.
The creation of the pdf should be done through the notebook.
You can use any Python-based library you want.
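
As one possible starting point for step 5, the sketch below reads the three-column csv and computes a few numbers that a summary page could report. It uses only the standard library and stops short of producing the pdf. The function name `basic_stats`, the header names `text`, `rating`, `date`, and the assumption that ratings are stored as plain numbers are illustrative choices, not requirements of the assignment.

```python
import csv
from collections import Counter
from pathlib import Path
from statistics import mean


def basic_stats(csv_path: str) -> dict:
    """Read a reviews csv (header: text, rating, date) and return simple statistics."""
    with Path(csv_path).open(newline="", encoding="utf8") as f:
        rows = list(csv.DictReader(f))

    # Assumption: the rating column holds plain numbers such as "4" or "4.0".
    ratings = [float(row["rating"]) for row in rows]

    return {
        "n_reviews": len(rows),
        "mean_rating": round(mean(ratings), 2) if ratings else None,
        # how many reviews gave each score
        "rating_counts": dict(Counter(ratings)),
        # average review length in whitespace-separated words
        "mean_words": round(mean(len(row["text"].split()) for row in rows), 1) if rows else None,
    }
```

The returned dictionary could then feed whatever text or plots end up on the one-page pdf; which statistics matter most to customers remains a design decision.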


---

> Chalkiopoulos Georgios, Electrical and Computer Engineer NTUA <br />
> Data Science postgraduate Student <br />
> gchalkiopoulos@aueb.gr

## Install Libraries

In [None]:
"""
!pip install -U selenium
!pip install webdriver-manager
"""

## Imports

In [36]:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.webdriver import WebDriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common import NoSuchElementException
from selenium.webdriver.support import expected_conditions as EC

from pathlib import Path
import csv
from typing import List, TextIO

In [55]:



class AmazonScrapper:
    """Scrapes Amazon reviews: runs a user-defined query, takes the first product and saves the review text, the rating score and the date"""
    website: str = "https://www.amazon.co.uk/"


    def __init__(self,
                 query: str,
                 driver: WebDriver,
                 output_file: str = "amazon_reviews.csv",
                 wait: int = 5
                 ):
        self.query = query
        self.driver = driver
        self.output_file: Path = Path(output_file)
        self.wait = wait

    def _writer(self) -> csv.writer:
        """Opens the output file, writes the header row and returns the csv writer"""

        # open the output file; the handle stays open so the caller can keep writing rows
        fw: TextIO = self.output_file.open(mode="w",encoding="utf8")
        writer = csv.writer(fw,lineterminator="\n")
        writer.writerow(["text", "rating", "date"])
        return writer


    def get_reviews(self) -> None:
        """Main method that performs needed steps to get the reviews"""
        self._load_main_page()
        self._accept_cookies()
        self._apply_query()

        self._find_product()

        # self.driver.quit()


    def _load_main_page(self) -> None:
        """Loads main page"""
        print(f"Initialize website: {self.website}.")
        self.driver.maximize_window()
        self.driver.get(self.website)


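    def _accept_cookies(self) -> None:
        """Wait for the cookie banner and accept it"""
        # wait on this instance's driver (not the WebDriver class) until the button is clickable
        accept_box = WebDriverWait(self.driver, self.wait).until(
            EC.element_to_be_clickable((By.ID, "sp-cc-accept"))
        )
        accept_box.click()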


    def _apply_query(self) -> None:
        """Find the search box and apply the query"""

        # find search box
        search_box = self.driver.find_element(by=By.ID, value="twotabsearchtextbox")
        search_box.send_keys(self.query)

        # press search button
        search = self.driver.find_element(by=By.ID, value="nav-search-submit-button")
        search.click()

    def _find_product(self) -> None:
        """Finds the first non-sponsored product"""
        # use the instance's driver rather than the notebook-level global variable
        items = WebDriverWait(self.driver, self.wait).until(
            EC.presence_of_all_elements_located(
                (By.XPATH, '//div[contains(@class, "s-result-item s-asin")]')
            )
        )
        # products = self.driver.find_elements(by=By.CSS_SELECTOR, value="[data-component-type='s-search-result']")
        for product in items:

            try:
                # the leading "." makes the XPath relative to this product instead of the whole page
                product.find_element(by=By.XPATH, value='.//div/div/div/div/div/div/div/div/div/h2/a/span[1]')
                print(product.text)
            except NoSuchElementException:
                print("not found class_name")





In [None]:
query: str = "adidas"
driver: WebDriver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

AmazonScrapper(query=query, driver=driver).get_reviews()

Initialize website: https://www.amazon.co.uk/.
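
The class above opens a csv writer but does not yet write any review rows. A minimal sketch of that last step follows, assuming the reviews have already been collected as (text, rating, date) tuples. `save_reviews` is a hypothetical helper and not part of the notebook. Opening the file in a `with` block closes it even if writing fails, which `_writer` does not guarantee.

```python
import csv
from pathlib import Path
from typing import Iterable, Tuple


def save_reviews(reviews: Iterable[Tuple[str, str, str]], output_file: str) -> int:
    """Write one review per row (text, rating, date) and return the number of rows written."""
    count = 0
    # newline="" lets the csv module control line endings itself
    with Path(output_file).open(mode="w", encoding="utf8", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(["text", "rating", "date"])  # same header as in _writer
        for text, rating, date in reviews:
            # fields containing commas, quotes or line breaks are quoted automatically
            writer.writerow([text, rating, date])
            count += 1
    return count
```

Because review text often contains commas and line breaks, reading the file back with `csv.reader` (rather than splitting on commas) keeps each review in a single three-field record.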
