# 📝 Customer Review Analysis
This notebook analyzes customer reviews from a women's clothing e-commerce store to extract sentiment and other insights using Python.

In [1]:
# 📦 Import libraries

# Data handling
import pandas as pd
import numpy as np

# NLP and Text Processing
import re
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from textblob import TextBlob
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from nrclex import NRCLex  
from transformers import pipeline

# Vectorization and Topic Modeling 
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation, PCA


# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
from wordcloud import WordCloud


In [2]:
# 📥 Load dataset
df = pd.read_csv('/Users/olgabencomo/Desktop/Proyectos Portafolio/Womens Clothing E-Commerce Reviews.csv')
df.head()

Unnamed: 0.1,Unnamed: 0,Clothing ID,Age,Title,Review Text,Rating,Recommended IND,Positive Feedback Count,Division Name,Department Name,Class Name
0,0,767,33,,Absolutely wonderful - silky and sexy and comf...,4,1,0,Initmates,Intimate,Intimates
1,1,1080,34,,Love this dress! it's sooo pretty. i happene...,5,1,4,General,Dresses,Dresses
2,2,1077,60,Some major design flaws,I had such high hopes for this dress and reall...,3,0,0,General,Dresses,Dresses
3,3,1049,50,My favorite buy!,"I love, love, love this jumpsuit. it's fun, fl...",5,1,0,General Petite,Bottoms,Pants
4,4,847,47,Flattering shirt,This shirt is very flattering to all due to th...,5,1,6,General,Tops,Blouses


## 🧹 Data Cleaning

In [3]:
# Drop the leftover index column from the CSV export
df = df.drop(columns='Unnamed: 0')

In [4]:
# Count missing values per column
df.isnull().sum()

Clothing ID                   0
Age                           0
Title                      3810
Review Text                 845
Rating                        0
Recommended IND               0
Positive Feedback Count       0
Division Name                14
Department Name              14
Class Name                   14
dtype: int64

In [5]:
# Remove rows with missing or blank reviews
df = df.dropna(subset=["Review Text"])
df = df[df["Review Text"].str.strip() != ""]
print("Dataset shape after removing blank reviews:", df.shape)

# Remove rows with missing Division/Department/Class
# (reassigning instead of inplace=True avoids a possible SettingWithCopyWarning on the filtered frame)
df = df.dropna(subset=["Division Name", "Department Name", "Class Name"])

Dataset shape after removing blank reviews: (22641, 10)


Some rows in the dataset have missing values in the key categorical columns (14 missing values per column in the raw data):

- `Division Name`  
- `Department Name`  
- `Class Name`  

These columns are important for grouping and segmenting reviews by product category. When performing sentiment analysis or building dashboards in Power BI, missing values in these columns would:

- Prevent proper aggregation by category.  
- Lead to incomplete or misleading visualizations.  
- Make filters and segmentations inconsistent.

Because these rows carry no category metadata at all for their `Clothing ID`, the missing values cannot be reliably filled in, and the rows cannot be used for category-level analysis.  

**Therefore, we remove them** so that the dataset used for analysis contains only reviews with complete category metadata, improving data quality and the reliability of our results.
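
Before dropping anything, it can help to confirm that the three category columns really are missing together on the same rows. The sketch below checks this on a small made-up frame (`toy` and its values are illustrative, not the review data) and assumes pandas is installed.

```python
import pandas as pd

# Toy frame standing in for the review data (values are made up).
toy = pd.DataFrame({
    "Clothing ID": [1, 2, 3, 4],
    "Division Name": ["General", None, "General Petite", None],
    "Department Name": ["Tops", None, "Dresses", None],
    "Class Name": ["Knits", None, "Dresses", None],
})

category_cols = ["Division Name", "Department Name", "Class Name"]

# Number of missing category fields per row (0 to 3).
missing_per_row = toy[category_cols].isnull().sum(axis=1)

# Rows missing all three fields vs. rows missing only some of them.
all_missing = int((missing_per_row == len(category_cols)).sum())
partly_missing = int(((missing_per_row > 0) & (missing_per_row < len(category_cols))).sum())

print("rows missing every category field:", all_missing)
print("rows missing only some category fields:", partly_missing)

# Drop the rows without category metadata.
toy_clean = toy.dropna(subset=category_cols)
```

If `partly_missing` came out above zero on the real data, some of those rows might be recoverable from other reviews of the same `Clothing ID` rather than dropped.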

In [6]:
# Re-check missing values after cleaning
df.isnull().sum()

Clothing ID                   0
Age                           0
Title                      2966
Review Text                   0
Rating                        0
Recommended IND               0
Positive Feedback Count       0
Division Name                 0
Department Name               0
Class Name                    0
dtype: int64

In [7]:
# Clean the review text
# Requires the NLTK data packages 'stopwords' and 'punkt'
# (newer NLTK releases may ask for 'punkt_tab' instead), e.g. nltk.download('stopwords')
stop_words = set(stopwords.words('english'))

def clean_text(text):
    """Lowercase, strip URLs and non-letters, then drop English stop words."""
    text = str(text).lower()                      # lowercase
    text = re.sub(r"http\S+|www\S+", '', text)    # remove URLs (http\S+ also covers https)
    text = re.sub(r"[^a-z\s]", '', text)          # remove punctuation/numbers
    text = re.sub(r"\s+", ' ', text).strip()      # collapse extra spaces

    tokens = word_tokenize(text)
    tokens = [word for word in tokens if word not in stop_words]
    return ' '.join(tokens)

df['Clean_Review'] = df['Review Text'].apply(clean_text)
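
One side effect of this cleaning step is that NLTK's English stop-word list contains negation words such as "not" and "no", so "not flattering" is reduced to "flattering" before any sentiment scoring. The following is a minimal sketch of a variant that keeps negations. The function name `clean_text_keep_negations`, the short `STOP_WORDS` set, and the `NEGATIONS` set are all illustrative assumptions, and it splits on whitespace instead of calling `word_tokenize` so that it runs without NLTK data.

```python
import re

# Small illustrative stop-word set; the notebook uses NLTK's full English list.
STOP_WORDS = {"i", "the", "this", "is", "it", "a", "and", "do", "not", "no", "was", "but"}

# Negation words kept so that "not flattering" does not become "flattering".
NEGATIONS = {"not", "no", "nor", "never"}

def clean_text_keep_negations(text, stop_words=STOP_WORDS, keep=NEGATIONS):
    """Same cleaning steps as clean_text, but negation words survive stop-word removal."""
    text = str(text).lower()
    text = re.sub(r"http\S+|www\S+", "", text)   # remove URLs
    text = re.sub(r"[^a-z\s]", "", text)         # remove punctuation/numbers
    tokens = text.split()                        # whitespace split (assumption; not word_tokenize)
    return " ".join(t for t in tokens if t not in stop_words or t in keep)

cleaned = clean_text_keep_negations("This dress is NOT flattering, but I love the color!")
print(cleaned)  # dress not flattering love color
```

Whether keeping negations improves the scores on this dataset is something to verify empirically, for example by comparing labels on a hand-checked sample.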

In [8]:
df

Unnamed: 0,Clothing ID,Age,Title,Review Text,Rating,Recommended IND,Positive Feedback Count,Division Name,Department Name,Class Name,Clean_Review
0,767,33,,Absolutely wonderful - silky and sexy and comf...,4,1,0,Initmates,Intimate,Intimates,absolutely wonderful silky sexy comfortable
1,1080,34,,Love this dress! it's sooo pretty. i happene...,5,1,4,General,Dresses,Dresses,love dress sooo pretty happened find store im ...
2,1077,60,Some major design flaws,I had such high hopes for this dress and reall...,3,0,0,General,Dresses,Dresses,high hopes dress really wanted work initially ...
3,1049,50,My favorite buy!,"I love, love, love this jumpsuit. it's fun, fl...",5,1,0,General Petite,Bottoms,Pants,love love love jumpsuit fun flirty fabulous ev...
4,847,47,Flattering shirt,This shirt is very flattering to all due to th...,5,1,6,General,Tops,Blouses,shirt flattering due adjustable front tie perf...
...,...,...,...,...,...,...,...,...,...,...,...
23481,1104,34,Great dress for many occasions,I was very happy to snag this dress at such a ...,5,1,0,General Petite,Dresses,Dresses,happy snag dress great price easy slip flatter...
23482,862,48,Wish it was made of cotton,"It reminds me of maternity clothes. soft, stre...",3,1,0,General Petite,Tops,Knits,reminds maternity clothes soft stretchy shiny ...
23483,1104,31,"Cute, but see through","This fit well, but the top was very see throug...",3,0,1,General Petite,Dresses,Dresses,fit well top see never would worked im glad ab...
23484,1084,28,"Very cute dress, perfect for summer parties an...",I bought this dress for a wedding i have this ...,3,1,2,General,Dresses,Dresses,bought dress wedding summer cute unfortunately...


## 📈 Sentiment Analysis

VADER Sentiment

In [9]:
analyzer = SentimentIntensityAnalyzer()

df[['vader_neg','vader_neu','vader_pos','vader_compound']] = df['Clean_Review'].apply(lambda x: pd.Series(analyzer.polarity_scores(str(x))))

In [10]:
df.head()

Unnamed: 0,Clothing ID,Age,Title,Review Text,Rating,Recommended IND,Positive Feedback Count,Division Name,Department Name,Class Name,Clean_Review,vader_neg,vader_neu,vader_pos,vader_compound
0,767,33,,Absolutely wonderful - silky and sexy and comf...,4,1,0,Initmates,Intimate,Intimates,absolutely wonderful silky sexy comfortable,0.0,0.154,0.846,0.8991
1,1080,34,,Love this dress! it's sooo pretty. i happene...,5,1,4,General,Dresses,Dresses,love dress sooo pretty happened find store im ...,0.0,0.503,0.497,0.971
2,1077,60,Some major design flaws,I had such high hopes for this dress and reall...,3,0,0,General,Dresses,Dresses,high hopes dress really wanted work initially ...,0.037,0.698,0.264,0.9062
3,1049,50,My favorite buy!,"I love, love, love this jumpsuit. it's fun, fl...",5,1,0,General Petite,Bottoms,Pants,love love love jumpsuit fun flirty fabulous ev...,0.171,0.185,0.644,0.9464
4,847,47,Flattering shirt,This shirt is very flattering to all due to th...,5,1,6,General,Tops,Blouses,shirt flattering due adjustable front tie perf...,0.0,0.494,0.506,0.9062


In [11]:
# Define VADER labels from the compound score

def get_vader_label(score, pos_thresh=0.05, neg_thresh=-0.05):
    """Map a VADER compound score to a label using the ±0.05 cut-offs."""
    if score >= pos_thresh:
        return "positive"
    elif score <= neg_thresh:
        return "negative"
    else:
        return "neutral"

# vader_compound was already computed in the previous cell, so reuse it instead of re-scoring each review
df["vader_score"] = df["vader_compound"]
df["vader_label"] = df["vader_score"].apply(get_vader_label)

Transformers Pipeline for sentiment prediction

In [12]:
# Hugging Face sentiment pipeline
sent_pipeline = pipeline(
    "sentiment-analysis",
    model="cardiffnlp/twitter-roberta-base-sentiment",  # three-class model (includes neutral)
    framework="pt"
)

df["hf_sentiment"] = df["Clean_Review"].apply(lambda x: sent_pipeline(str(x))[0])
df["hf_label"] = df["hf_sentiment"].apply(lambda x: x['label'])


Xformers is not installed correctly. If you want to use memorry_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.
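
This model returns generic label ids (`LABEL_0`, `LABEL_1`, `LABEL_2`) rather than names. According to the model card for `cardiffnlp/twitter-roberta-base-sentiment`, these correspond to negative, neutral, and positive. A small sketch of mapping them to readable names follows; `HF_LABEL_MAP` and `readable_label` are hypothetical helper names, not part of the notebook.

```python
# Label ids as documented on the model card for cardiffnlp/twitter-roberta-base-sentiment.
HF_LABEL_MAP = {"LABEL_0": "negative", "LABEL_1": "neutral", "LABEL_2": "positive"}

def readable_label(result):
    """Map one pipeline result dict, e.g. {'label': 'LABEL_2', 'score': 0.97}, to a name."""
    # Unknown ids fall through unchanged rather than raising.
    return HF_LABEL_MAP.get(result["label"], result["label"])

example = {"label": "LABEL_2", "score": 0.9709229469299316}
print(readable_label(example))  # positive
```

In the notebook this could be applied as `df["hf_sentiment"].apply(readable_label)`, which would make the column directly comparable with `vader_label`.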


TextBlob: Subjectivity

In [13]:

df['Subjectivity'] = df['Clean_Review'].apply(lambda x: TextBlob(x).sentiment.subjectivity)

# Subjectivity label: TextBlob subjectivity ranges from 0 (objective) to 1 (subjective)
def subjectivity_label(s):
    if s > 0.5:
        return "Subjective"
    else:
        return "Objective"

df['Subjectivity_Label'] = df['Subjectivity'].apply(subjectivity_label)

In [14]:
df.head()

Unnamed: 0,Clothing ID,Age,Title,Review Text,Rating,Recommended IND,Positive Feedback Count,Division Name,Department Name,Class Name,...,vader_neg,vader_neu,vader_pos,vader_compound,vader_score,vader_label,hf_sentiment,hf_label,Subjectivity,Subjectivity_Label
0,767,33,,Absolutely wonderful - silky and sexy and comf...,4,1,0,Initmates,Intimate,Intimates,...,0.0,0.154,0.846,0.8991,0.8991,positive,"{'label': 'LABEL_2', 'score': 0.9709229469299316}",LABEL_2,0.933333,Subjective
1,1080,34,,Love this dress! it's sooo pretty. i happene...,5,1,4,General,Dresses,Dresses,...,0.0,0.503,0.497,0.971,0.971,positive,"{'label': 'LABEL_2', 'score': 0.9810240268707275}",LABEL_2,0.725,Subjective
2,1077,60,Some major design flaws,I had such high hopes for this dress and reall...,3,0,0,General,Dresses,Dresses,...,0.037,0.698,0.264,0.9062,0.9062,positive,"{'label': 'LABEL_2', 'score': 0.5937286615371704}",LABEL_2,0.345866,Objective
3,1049,50,My favorite buy!,"I love, love, love this jumpsuit. it's fun, fl...",5,1,0,General Petite,Bottoms,Pants,...,0.171,0.185,0.644,0.9464,0.9464,positive,"{'label': 'LABEL_2', 'score': 0.9854844212532043}",LABEL_2,0.625,Subjective
4,847,47,Flattering shirt,This shirt is very flattering to all due to th...,5,1,6,General,Tops,Blouses,...,0.0,0.494,0.506,0.9062,0.9062,positive,"{'label': 'LABEL_2', 'score': 0.9338776469230652}",LABEL_2,0.658333,Subjective


Get Emotion with NRCLex

In [15]:
# Get the top-scoring emotions (ties included)
def get_emotions_only(text):
    emotion_obj = NRCLex(text)
    return [e[0] for e in emotion_obj.top_emotions] if emotion_obj.top_emotions else []

# Get the primary emotion
def top_emotion(text):
    emotion_obj = NRCLex(text)
    if emotion_obj.raw_emotion_scores:
        return max(emotion_obj.raw_emotion_scores, key=emotion_obj.raw_emotion_scores.get)
    else:
        return None


df['Emotions'] = df['Clean_Review'].apply(get_emotions_only)

df['Primary_Emotion'] = df['Clean_Review'].apply(top_emotion)
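
The NRC lexicon includes the polarity categories "positive" and "negative" alongside the emotions, which is why `Primary_Emotion` can come out as `positive` or `negative` in the output. If only actual emotions are wanted, the polarity categories can be filtered out before taking the maximum. This is a minimal sketch working on a plain dict in the shape of `raw_emotion_scores`; the function name and the sample counts are made up for illustration.

```python
# "positive" and "negative" are polarity categories in the NRC lexicon, not emotions.
POLARITY = {"positive", "negative"}

def top_emotion_excluding_polarity(raw_scores):
    """Return the highest-scoring emotion, ignoring the polarity categories."""
    emotions = {k: v for k, v in raw_scores.items() if k not in POLARITY}
    if not emotions:
        return None
    return max(emotions, key=emotions.get)

# Hand-written counts in the shape of NRCLex's raw_emotion_scores (values are made up).
scores = {"positive": 3, "joy": 2, "trust": 1}
print(top_emotion_excluding_polarity(scores))  # joy
```

Reviews whose only matches are polarity words would then get `None`, which keeps them distinguishable from reviews with a detected emotion.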


In [16]:
df

Unnamed: 0,Clothing ID,Age,Title,Review Text,Rating,Recommended IND,Positive Feedback Count,Division Name,Department Name,Class Name,...,vader_pos,vader_compound,vader_score,vader_label,hf_sentiment,hf_label,Subjectivity,Subjectivity_Label,Emotions,Primary_Emotion
0,767,33,,Absolutely wonderful - silky and sexy and comf...,4,1,0,Initmates,Intimate,Intimates,...,0.846,0.8991,0.8991,positive,"{'label': 'LABEL_2', 'score': 0.9709229469299316}",LABEL_2,0.933333,Subjective,"[trust, surprise, positive, joy]",joy
1,1080,34,,Love this dress! it's sooo pretty. i happene...,5,1,4,General,Dresses,Dresses,...,0.497,0.9710,0.9710,positive,"{'label': 'LABEL_2', 'score': 0.9810240268707275}",LABEL_2,0.725000,Subjective,[positive],positive
2,1077,60,Some major design flaws,I had such high hopes for this dress and reall...,3,0,0,General,Dresses,Dresses,...,0.264,0.9062,0.9062,positive,"{'label': 'LABEL_2', 'score': 0.5937286615371704}",LABEL_2,0.345866,Objective,[negative],negative
3,1049,50,My favorite buy!,"I love, love, love this jumpsuit. it's fun, fl...",5,1,0,General Petite,Bottoms,Pants,...,0.644,0.9464,0.9464,positive,"{'label': 'LABEL_2', 'score': 0.9854844212532043}",LABEL_2,0.625000,Subjective,"[positive, joy]",joy
4,847,47,Flattering shirt,This shirt is very flattering to all due to th...,5,1,6,General,Tops,Blouses,...,0.506,0.9062,0.9062,positive,"{'label': 'LABEL_2', 'score': 0.9338776469230652}",LABEL_2,0.658333,Subjective,"[positive, joy]",joy
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
23481,1104,34,Great dress for many occasions,I was very happy to snag this dress at such a ...,5,1,0,General Petite,Dresses,Dresses,...,0.616,0.8979,0.8979,positive,"{'label': 'LABEL_2', 'score': 0.9387944340705872}",LABEL_2,0.861111,Subjective,"[surprise, positive, negative, joy]",joy
23482,862,48,Wish it was made of cotton,"It reminds me of maternity clothes. soft, stre...",3,1,0,General Petite,Tops,Knits,...,0.320,0.7579,0.7579,positive,"{'label': 'LABEL_2', 'score': 0.8785752058029175}",LABEL_2,0.708333,Subjective,"[positive, joy]",joy
23483,1104,31,"Cute, but see through","This fit well, but the top was very see throug...",3,0,1,General Petite,Dresses,Dresses,...,0.491,0.9100,0.9100,positive,"{'label': 'LABEL_2', 'score': 0.7970527410507202}",LABEL_2,0.645833,Subjective,"[positive, anticipation]",anticipation
23484,1084,28,"Very cute dress, perfect for summer parties an...",I bought this dress for a wedding i have this ...,3,1,2,General,Dresses,Dresses,...,0.279,0.8272,0.8272,positive,"{'label': 'LABEL_1', 'score': 0.4779038429260254}",LABEL_1,0.525000,Subjective,[positive],positive
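
The sample rows above include a 3-star, not-recommended review that VADER still labels positive, so it may be worth checking how often the lexicon label agrees with the star rating. The sketch below does this with the standard library on the (Rating, vader_label) pairs of rows 0 to 4 from the output. The rating buckets (4 to 5 positive, 3 neutral, 1 to 2 negative) are an assumption made for illustration, not something the notebook defines.

```python
from collections import Counter

def rating_bucket(rating):
    """Bucket a 1-5 star rating (assumed cut-offs, for illustration only)."""
    if rating >= 4:
        return "positive"
    if rating == 3:
        return "neutral"
    return "negative"

# (Rating, vader_label) for rows 0-4 of the output above.
rows = [(4, "positive"), (5, "positive"), (3, "positive"), (5, "positive"), (5, "positive")]

# Count each (rating bucket, VADER label) combination.
pairs = Counter((rating_bucket(r), label) for r, label in rows)

# Share of rows where the rating bucket and the VADER label match.
agreement = sum(n for (bucket, label), n in pairs.items() if bucket == label) / len(rows)

print(pairs)
print(f"agreement: {agreement:.0%}")
```

On the full frame, `pd.crosstab(df["Rating"], df["vader_label"])` gives the same kind of comparison in one call.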


In [17]:
df.to_excel("reviews_sentiment_analysis.xlsx", index=False)