This is the repository for the Amazon Review Sentiment Analysis project that I did as my fifth-semester minor project. The project includes a scraper that collects the Amazon reviews of any desired product using its ASIN, and performs sentiment analysis on the scraped reviews.

harshtandiya/amazon-review-sentiment-project

Amazon Review Sentiment Analysis

November 2022 | Harsh Tandiya

In this project, the idea is to extract the reviews of any Amazon product and perform sentiment analysis on them. This can help determine how customers feel about the product.

This project has 2 parts to it:

  • Scraping Reviews
  • Sentiment Analysis

1. Scraping Reviews

The project has a function to scrape the reviews of any Amazon product. For this, the program requires the product's ASIN (Amazon Standard Identification Number), a unique 10-character ID that Amazon assigns to each product. For printed books with a 10-digit ISBN, the ASIN is the same as the ISBN. The ASIN is usually found in the "Additional Information" section of a product page, as shown below: image

It can also be found in the product's URL on the Amazon website: image
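
As a rough illustration of reading the ASIN from a product URL, here is a minimal sketch. The helper name `extract_asin`, the example URL, and the assumption that the ASIN follows a `/dp/` or `/gp/product/` path segment are illustrative and not taken from the project's code.

```python
import re

# Hypothetical helper, not part of the project: pull a 10-character ASIN
# out of a product URL. Assumes the ASIN follows "/dp/" or "/gp/product/".
ASIN_PATTERN = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?=[/?#]|$)")


def extract_asin(url):
    """Return the ASIN found in the URL, or None if there is none."""
    match = ASIN_PATTERN.search(url)
    return match.group(1) if match else None


# Made-up URL used only for demonstration.
example_url = "https://www.amazon.com/Example-Product/dp/B0EXAMPLE1/ref=sr_1_1"
print(extract_asin(example_url))
```

URLs that use a different layout would need an extra pattern, so the result should be treated as a best guess.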

We used the BeautifulSoup library to parse the HTML of the review pages, which lets us extract the review text directly from the page markup. The Requests module is used to send HTTP requests to the target URLs. Pandas is used throughout the project for standard dataframe operations.
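
The parsing step could look roughly like the sketch below. It works on a small hand-written HTML snippet instead of a live page, and the `data-hook="review-body"` selector, the sample markup, and the function name `parse_reviews` are assumptions made for illustration. The project's actual selectors may differ.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for a downloaded review page. In real use the HTML
# would come from an HTTP response, e.g. requests.get(url).text.
SAMPLE_HTML = """
<div data-hook="review">
  <span data-hook="review-body"><span>Works great, battery lasts all day.</span></span>
</div>
<div data-hook="review">
  <span data-hook="review-body"><span>Stopped working after a week.</span></span>
</div>
"""


def parse_reviews(html):
    """Return the text of every element marked as a review body."""
    soup = BeautifulSoup(html, "html.parser")
    # Assumed selector: review text sits in <span data-hook="review-body">.
    bodies = soup.find_all("span", attrs={"data-hook": "review-body"})
    return [body.get_text(strip=True) for body in bodies]


reviews = parse_reviews(SAMPLE_HTML)
for text in reviews:
    print(text)
```

The resulting list can then be loaded into a Pandas dataframe and written to a .csv or .xlsx file.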

2. Sentiment Analysis

After scraping all the reviews for a particular product, we store them in .csv or .xlsx format. We then read this file into a dataframe and perform sentiment analysis on the extracted review data.

We performed sentiment analysis using a pretrained RoBERTa model available through Hugging Face. The project uses the pipelines that Hugging Face provides for sentiment analysis. The output of the analysis is then visualized using the seaborn and matplotlib libraries.

How does it work?

The workflow is straightforward:

  • Import the pretrained RoBERTa model to be used.
  • Load the model weights that were pretrained on Twitter sentiment data.
  • Read the Excel data that was extracted with the help of the scraper.
  • Run each review through a scoring function, which performs sentiment analysis on it and returns its positive, neutral and negative scores along with a 'POSITIVE' or 'NEGATIVE' remark.
  • Store the review scores in a dataframe, and merge it with the review dataframe using ID as the common column.
  • Remove the inconsistent results, such as 1-star reviews scored as positive and 5-star reviews scored as negative.
  • Visualise the remaining data to show meaningful results.
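
The scoring, merging and filtering steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: the logits are made-up numbers standing in for model output, the label order (negative, neutral, positive) is assumed, the remark rule simply compares the positive and negative scores, and the column names `stars` and `review` are hypothetical.

```python
import math

import pandas as pd

# Assumed order of the model's three outputs.
LABELS = ["negative", "neutral", "positive"]


def scores_from_logits(logits):
    """Turn three raw logits into softmax scores plus a remark."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    scores = {label: e / total for label, e in zip(LABELS, exps)}
    # Assumed rule: the remark follows the larger of positive and negative.
    is_positive = scores["positive"] >= scores["negative"]
    scores["remark"] = "POSITIVE" if is_positive else "NEGATIVE"
    return scores


# Made-up reviews and logits, standing in for scraper and model output.
reviews = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "stars": [5, 1, 1, 5],
    "review": ["Love it", "Broke in a day", "Great value", "Terrible"],
})
logits = {
    1: [-2.0, 0.1, 3.0],
    2: [2.5, 0.3, -1.8],
    3: [-1.5, 0.0, 2.2],
    4: [2.8, -0.2, -2.0],
}

# Score every review and merge the scores back on the shared ID column.
score_rows = [{"ID": rid, **scores_from_logits(vals)} for rid, vals in logits.items()]
scored = reviews.merge(pd.DataFrame(score_rows), on="ID")

# Drop rows where the star rating contradicts the predicted sentiment.
inconsistent = (
    ((scored["stars"] == 1) & (scored["remark"] == "POSITIVE"))
    | ((scored["stars"] == 5) & (scored["remark"] == "NEGATIVE"))
)
cleaned = scored[~inconsistent].reset_index(drop=True)
print(cleaned[["ID", "stars", "remark"]])
```

In this toy data, reviews 3 and 4 are dropped because their star ratings disagree with the predicted remark. The `cleaned` dataframe is what would then be passed to seaborn or matplotlib for plotting.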
