# Web Scraping an Amazon Product for Sentiment Analysis


For this project, I will scrape the reviews of an Amazon product for use in sentiment analysis. The scraped data will be turned into a dataset consisting primarily of each user's profile name, star rating, review text, and review summary. Using the VADER and RoBERTa models, I should be able to analyze the sentiment of each user's review and compare the reviewer's star rating with the sentiment of the review.
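The comparison between star ratings and review sentiment can be sketched as below. It assumes each review already has a VADER `compound` score in [-1, 1] (for example, the `"compound"` value returned by NLTK's `SentimentIntensityAnalyzer().polarity_scores(text)`) and buckets it with the commonly used ±0.05 cutoffs. The star-to-label mapping (4-5 positive, 3 neutral, 1-2 negative) and the helper names are hypothetical choices for illustration, not part of the original project.

```python
# Sketch: check how often a review's star rating agrees with its VADER sentiment.
# Assumption: `compound` is VADER's compound score, a float in [-1, 1].

def vader_label(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score to a label using +/-0.05 cutoffs."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def star_label(stars: float) -> str:
    """Hypothetical bucketing: 4-5 stars positive, 3 neutral, 1-2 negative."""
    if stars >= 4:
        return "positive"
    if stars <= 2:
        return "negative"
    return "neutral"


def agreement_rate(pairs) -> float:
    """Fraction of (stars, compound) pairs whose two labels match."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    matches = sum(star_label(s) == vader_label(c) for s, c in pairs)
    return matches / len(pairs)
```

A RoBERTa-based classifier could be compared the same way by replacing `vader_label` with a function that maps the model's predicted class to the same three labels.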


For this project, I will focus primarily on one product, [COSRX Snail Mucin](https://www.amazon.com/COSRX-Repairing-Hydrating-Secretion-Phthalates/dp/B00PBX3L7K/ref=cm_cr_arp_d_product_top?ie=UTF8), and perform web scraping and sentiment analysis on its reviews.

In [1]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np

In [4]:
# Functions to extract data

# Function to extract Product Title
def get_title(soup):
    try:
        # Outer Tag object
        title = soup.find('span', attrs={'class': 'a-size-large product-title-word-break'})

        # Text inside the tag (the attribute is .text / get_text(), not .txt)
        title_value = title.get_text()

        # Title as a clean string value
        title_string = title_value.strip()

    except AttributeError:
        # find() returned None because the element is not on the page
        title_string = ""

    return title_string

# Function to extract Profile Name
def get_profile_name(soup):
    try:
        # Outer Tag object
        profile = soup.find('span', attrs={'class': 'a-profile-name'})

        # Text inside the tag (.string is a property, not a method)
        profile_value = profile.get_text()

        # Profile name as a clean string value
        profile_string = profile_value.strip()

    except AttributeError:
        profile_string = ""

    return profile_string

# Function to extract Reviewer Star Rating
def get_rating(soup):
    try:
        # The rating text typically reads like "5.0 out of 5 stars"
        rating = soup.find('i', attrs={'data-hook': 'review-star-rating'}).get_text().strip()

    except AttributeError:
        rating = ""

    return rating

# Function to extract Review
def get_review(soup):
    try:
        # Outer Tag object
        review = soup.find('span', attrs={'class': 'a-size-base review-text review-text-content'})

        # Text inside the tag; get_text() also handles nested tags,
        # where .string would return None
        review_value = review.get_text()

        # Review as a clean string value
        review_string = review_value.strip()

    except AttributeError:
        review_string = ""

    return review_string

# Function to extract Review Summary
def get_review_summary(soup):
    try:
        # Outer Tag object (verify this selector against the live page;
        # Amazon's markup changes frequently)
        summary = soup.find('span', attrs={'class': 'a-letter-space'})

        # Text inside the tag
        summary_value = summary.get_text()

        # Summary as a clean string value
        summary_string = summary_value.strip()

    except AttributeError:
        summary_string = ""

    return summary_string
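
The functions above call `soup.find`, which returns only the first matching element, so on a reviews page they would return the first review only. A minimal sketch of per-review extraction follows: it scopes the lookups to each review block with `find_all` and converts the rating text to a number. It reuses the class and `data-hook` selectors from the functions above; the `data-hook="review"` container selector and the helper names are assumptions to verify against the live page.

```python
import re

from bs4 import BeautifulSoup


def parse_stars(text):
    """Pull the leading number out of text like '5.0 out of 5 stars'."""
    match = re.search(r"(\d+(?:\.\d+)?)\s+out of", text or "")
    return float(match.group(1)) if match else None


def text_of(tag):
    """Stripped text of a tag, or '' when find() returned None."""
    return tag.get_text(strip=True) if tag else ""


def extract_reviews(soup):
    """Return one dict per review block on a reviews page.

    Assumption: each review sits in an element with data-hook="review".
    """
    rows = []
    for block in soup.find_all(attrs={"data-hook": "review"}):
        profile = block.find("span", attrs={"class": "a-profile-name"})
        rating = block.find("i", attrs={"data-hook": "review-star-rating"})
        body = block.find(
            "span", attrs={"class": "a-size-base review-text review-text-content"}
        )
        rows.append({
            "profile_name": text_of(profile),
            "rating": parse_stars(text_of(rating)),
            "review": text_of(body),
        })
    return rows
```

The review summary can be added to each row the same way once its selector has been confirmed.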

# Extracting Data

Here, the data is extracted by iterating through each page of reviews, parsing every review on the page, and collecting the results in a pandas DataFrame.
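The page-by-page loop can be sketched as follows. It assumes, based on `REVIEWS_URL` below, that the review page is selected by the `pageNumber` query parameter, and it stops at the first page that yields no reviews. `page_url`, `scrape_all_pages`, and the `fetch_rows` callable (which would download one page and return a list of review dicts) are hypothetical names, and `max_pages=10` is an arbitrary safety limit.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def page_url(base_url, page):
    """Return base_url with its pageNumber query parameter set to `page`."""
    parts = urlparse(base_url)
    query = parse_qs(parts.query)
    query["pageNumber"] = [str(page)]
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))


def scrape_all_pages(base_url, fetch_rows, max_pages=10):
    """Collect review rows page by page until a page returns no reviews."""
    rows = []
    for page in range(1, max_pages + 1):
        page_rows = fetch_rows(page_url(base_url, page))
        if not page_rows:
            # An empty page usually means we ran past the last page of reviews
            break
        rows.extend(page_rows)
    return rows
```

The collected list of dicts can then be turned into a dataset with `pd.DataFrame(rows)`.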


In [None]:
if __name__ == '__main__':

    # Request headers: Amazon often serves a robot check to requests without a
    # browser-like User-Agent, so fill in your browser's User-Agent string here
    HEADERS = {'User-Agent': '', 'Accept-Language': 'en-US, en;q=0.5'}

    # Webpage URLs
    URL = "https://www.amazon.com/COSRX-Repairing-Hydrating-Secretion-Phthalates/dp/B00PBX3L7K/ref=cm_cr_arp_d_product_top?ie=UTF8"
    REVIEWS_URL = "https://www.amazon.com/COSRX-Repairing-Hydrating-Secretion-Phthalates/product-reviews/B00PBX3L7K/ref=cm_cr_arp_d_paging_btm_next_2?ie=UTF8&reviewerType=all_reviews&pageNumber=1"

    # HTTP request (a timeout keeps the cell from hanging indefinitely)
    webpage = requests.get(URL, headers=HEADERS, timeout=10)

    # Soup object containing all data
    soup = BeautifulSoup(webpage.content, "html.parser")
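
The single `requests.get` call above does not check whether the request succeeded. A minimal retry sketch follows, assuming any non-200 status should be retried; the function name `fetch_with_retries`, the retry count, the delay, and the 10-second timeout are hypothetical choices. Passing the getter in as `get` (for example `requests.get`) keeps the helper easy to test without network access.

```python
import time


def fetch_with_retries(url, get, headers=None, retries=3, delay=2.0):
    """Call get(url, headers=..., timeout=10) until it returns HTTP 200.

    `get` is expected to behave like requests.get.
    """
    for attempt in range(retries):
        response = get(url, headers=headers, timeout=10)
        if response.status_code == 200:
            return response
        if attempt < retries - 1:
            # Wait a little longer after each failed attempt
            time.sleep(delay * (attempt + 1))
    raise RuntimeError(f"Failed to fetch {url} after {retries} attempts")
```

In the cell above, this would replace the direct call with `webpage = fetch_with_retries(URL, requests.get, headers=HEADERS)`.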