# AMAZON CUSTOMER REVIEW

## 1.0 BUSINESS UNDERSTANDING

In today's online marketplace, customer reviews play an essential role in purchasing decisions. Amazon, as one of the largest online retailers, collects millions of product reviews that reflect customer satisfaction, product quality, and overall user experience. Processing such a volume of data manually, however, is inefficient and time-consuming.

Sentiment analysis enables companies to analyze customer feedback automatically, extract meaningful information, and make informed decisions.

## 1.1 PROBLEM STATEMENT

Amazon receives millions of reviews, far too many to read manually. An automated sentiment analysis system is needed to categorize reviews as positive, negative, or neutral and to extract actionable insights from them.

## 1.2 OBJECTIVES

### 1.2.1 Main Objectives

To accurately determine the overall emotional tone (positive, negative, or neutral) of customer reviews.
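When reviews carry a star rating, a common way to obtain these three labels is to derive them from the rating. The sketch below assumes the `reviews.rating` column described in Section 2.0 is used as a proxy label. The cut-offs and the helper name `rating_to_sentiment` are assumptions for illustration; the source does not specify them.

```python
def rating_to_sentiment(rating: float) -> str:
    """Map a 1-5 star rating to a sentiment label.

    The cut-offs are an assumption, not taken from the source:
    4 or more stars -> positive, 3 to under 4 -> neutral, below 3 -> negative.
    """
    # Reject values outside the 1-5 star scale (NaN also fails this check)
    if not 1 <= rating <= 5:
        raise ValueError(f"rating must be between 1 and 5, got {rating}")
    if rating >= 4:
        return "positive"
    if rating >= 3:
        return "neutral"
    return "negative"


print([rating_to_sentiment(r) for r in [5, 4, 3, 2, 1]])
# ['positive', 'positive', 'neutral', 'negative', 'negative']
```

On a DataFrame, this could be applied with `df["reviews.rating"].apply(rating_to_sentiment)` after dropping rows with a missing rating.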

### 1.2.2 Specific Objectives

* Identify trends in customer satisfaction.

* Improve customer experience by addressing negative feedback.

* Help businesses optimize their product offerings based on user sentiment.
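Identifying trends in customer satisfaction could start from a month-by-month view of the ratings. The sketch below assumes the `reviews.date` and `reviews.rating` columns described in Section 2.0. `monthly_rating_trend` is a hypothetical helper, and the sample rows are invented for illustration.

```python
import pandas as pd


def monthly_rating_trend(df: pd.DataFrame) -> pd.Series:
    """Average star rating per calendar month, ordered by month."""
    # Unparseable dates become NaT and are dropped by groupby
    dates = pd.to_datetime(df["reviews.date"], errors="coerce", utc=True)
    # Drop the UTC timezone before converting to monthly periods
    months = dates.dt.tz_localize(None).dt.to_period("M")
    return df["reviews.rating"].groupby(months).mean()


# Hypothetical rows, not drawn from the dataset
sample = pd.DataFrame({
    "reviews.date": ["2017-01-05T00:00:00.000Z", "2017-01-20T00:00:00.000Z",
                     "2017-02-03T00:00:00.000Z", "not a date"],
    "reviews.rating": [5, 3, 2, 4],
})
print(monthly_rating_trend(sample))
```

Plotting the resulting series, for example with `trend.plot()`, would show whether satisfaction is rising or falling over time.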


## 1.3 Business Questions

* What percentage of customer reviews are positive, negative, or neutral?
* Are there specific features or keywords associated with positive or negative reviews?
* Can sentiment analysis help predict potential customer dissatisfaction?
* Can the business use sentiment insights to improve product quality and customer support?
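Once each review has a sentiment label, the first question reduces to a normalized frequency count. The following is a minimal sketch using pandas `value_counts(normalize=True)`. `sentiment_percentages` is a hypothetical helper, and the labels are invented for illustration.

```python
import pandas as pd


def sentiment_percentages(labels: pd.Series) -> pd.Series:
    """Share of each sentiment label as a percentage, rounded to one decimal."""
    return (labels.value_counts(normalize=True) * 100).round(1)


# Hypothetical labels for illustration only; not drawn from the dataset
labels = pd.Series(["positive", "positive", "positive", "neutral", "negative"])
print(sentiment_percentages(labels))
```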


## 1.4 Metric of Success

## 2.0 DATA UNDERSTANDING

The dataset used for this sentiment analysis project consists of Amazon product reviews, which provide insights into customer opinions about various products. It contains 1,597 records with 27 columns, capturing details about the product, review content and user feedback.


The main columns are described below:

* id → Unique identifier for each review.
* asins → Amazon Standard Identification Number (ASIN) of the product.
* brand → Brand of the product.
* categories → Product categories (e.g., "Amazon Devices").
* colors → Available colors of the product (often missing).
* dateAdded → Date the review was added to the dataset.
* dateUpdated → Date the review was last updated.
* dimension → Physical dimensions of the product.
* manufacturer → Manufacturer of the product.
* manufacturerNumber → Manufacturer’s product number.
* name → Product name.
* prices → Pricing details of the product.
* reviews.date → Date when the review was posted.
* reviews.doRecommend → Whether the reviewer recommends the product (yes/no).
* reviews.numHelpful → Number of users who found the review helpful.
* reviews.rating → Star rating given by the reviewer (1 to 5).
* reviews.sourceURLs → URL of the original review page.
* reviews.text → Full text of the review (the main feature for sentiment analysis).
* reviews.title → Title of the review (a summary of the review).
* reviews.username → Username of the reviewer.
* reviews.userCity → City of the reviewer (mostly missing).
* reviews.userProvince → Province of the reviewer (mostly missing).
* sizes → Available sizes of the product (mostly empty).
* upc → Universal Product Code (UPC).
* weight → Weight of the product.
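Several of these columns are described as often or mostly missing. A per-column missing-value summary can confirm this and, together with `df.shape`, check that the loaded frame matches the stated 1,597 records and 27 columns. The following is a minimal sketch. `missing_summary` is a hypothetical helper, and the four-row frame is invented to mimic a few of the columns above.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, most-missing first."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(1),
    })
    return summary.sort_values("percent", ascending=False)


# Hypothetical mini-frame; values are invented, column names follow the list above
sample = pd.DataFrame({
    "reviews.text": ["Great tablet", "Stopped charging", "Works fine", "Okay"],
    "reviews.rating": [5, 1, 4, 3],
    "reviews.userCity": [np.nan, np.nan, np.nan, "Seattle"],
    "sizes": [np.nan, np.nan, np.nan, np.nan],
})
print(sample.shape)
print(missing_summary(sample))
```

On the real data, passing the DataFrame read from the review CSV should show which columns are too sparse to use, and its `shape` should be `(1597, 27)`.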

### 2.1 Exploring The Dataset

```python
# Import the relevant libraries
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# NLP preprocessing: stop-word list and Porter stemmer
# (stopwords.words("english") needs a one-time nltk.download("stopwords"))
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer

# Feature extraction, modelling and evaluation
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, auc, roc_curve
from sklearn.model_selection import train_test_split

# Silence library warnings and show full review text in DataFrame output
warnings.filterwarnings("ignore")
pd.set_option("display.max_colwidth", None)
```
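These imports appear to set up a text pipeline that removes stop words, applies Porter stemming, builds TF-IDF features, and fits a logistic regression. The following is a minimal sketch of that pipeline under several assumptions. The toy reviews and binary labels are hypothetical. scikit-learn's built-in `ENGLISH_STOP_WORDS` replaces NLTK's stop-word corpus so that no download is needed. The negations "no", "nor", and "not" are kept because they often flip sentiment.

```python
import re

from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Assumption: scikit-learn's stop-word list stands in for NLTK's corpus;
# negations are kept because they often reverse sentiment ("not good").
STOP_WORDS = set(ENGLISH_STOP_WORDS) - {"no", "nor", "not"}
stemmer = PorterStemmer()


def clean_review(text: str) -> str:
    """Lowercase, keep letters only, drop stop words, and Porter-stem each token."""
    tokens = re.sub(r"[^a-z\s]", " ", text.lower()).split()
    return " ".join(stemmer.stem(tok) for tok in tokens if tok not in STOP_WORDS)


# Hypothetical toy reviews and labels; on the real data these would come from
# reviews.text and a derived sentiment column.
reviews = [
    "Great tablet, love the screen and battery life",
    "Excellent speaker, works perfectly, highly recommend",
    "Love it, great value for the price",
    "Terrible, stopped charging after a week",
    "Not worth the money, very slow and buggy",
    "Awful battery, returned it",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = positive, 0 = negative

# Unigram and bigram TF-IDF features on the cleaned text
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform([clean_review(r) for r in reviews])

model = LogisticRegression(max_iter=1000)
model.fit(X, labels)

new_reviews = ["love the screen", "stopped charging, terrible"]
preds = model.predict(vectorizer.transform([clean_review(r) for r in new_reviews]))
print(preds)
```

On the full dataset, the cleaned text would first be split with `train_test_split`. The model would then be scored on the held-out part with `accuracy_score`, and `roc_curve` and `auc` would be applied to `model.predict_proba(...)[:, 1]`.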