# What Makes A Good Product?

E-commerce has grown considerably, and shoppers increasingly rely on it over brick-and-mortar stores, mainly because ordering is easy and takes very little time. The main drawback of buying online is the gap between the customer's perception of a product and the actual item, which is why people lean on reviews to decide whether something is worth buying. Reviews also matter to the business: they show whether a product is good enough and what can be improved. They can indirectly affect sales too, since products with more negative reviews are less likely to sell, and the same logic applies in reverse to positive reviews.

This analysis revolves around the performance of the products and how other factors, such as age group and product category, affect how their quality is rated.

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

#data visualization
import matplotlib.pyplot as plt 
import matplotlib.patches as mpatches #create custom legends
plt.style.use("ggplot")

#text pre-processing
from nltk.corpus import stopwords
from nltk.tokenize import WhitespaceTokenizer, word_tokenize #tokenization of words
from nltk.stem import WordNetLemmatizer 
stop = stopwords.words("english")

#text visualization
from wordcloud import WordCloud, STOPWORDS #wordcloud generator for reviews
from nltk import pos_tag #Part of Speech tagging which will be used alongside the wordcloud

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
        
        
np.random.seed(92384) #for repeatability
        
df = pd.read_csv("../input/womens-ecommerce-clothing-reviews/Womens Clothing E-Commerce Reviews.csv", index_col = 0 )

In [None]:
df.columns

In [None]:
df.head()

## Does the data make any sense?

Before any analysis, let's check the quality of the data. This section covers the following steps:

1. Check for any NaN values in the dataset.
2. Decide whether to fill in or remove the missing values.
3. Pre-process the text for visual analysis.
4. Engineer a Net Promoter Score feature, which will be used to understand the distribution of the ratings.

In [None]:
df.isnull().sum()/df.shape[0]*100

The plan here is to concatenate the Title and Review Text columns and use the combined column for the review analysis. Both columns contain NaN values, and concatenating a string with NaN yields NaN, so the missing values are filled with empty strings first.

In [None]:
#fill
df["Title"] = df["Title"].fillna("")
df["Review Text"] = df["Review Text"].fillna("")

#concatenate the title and review text columns
df["Product Review"] = df["Title"] + " " + df["Review Text"]

In [None]:
df.isnull().sum()/df.shape[0]*100

In [None]:
df[df.isnull().any(axis = 1)].head()

Some rows still have missing values, but I decided to keep them: deleting those rows would also delete their review text, and it is important to retain as much text data as possible. These rows will simply be ignored in any analysis that uses Division Name, Department Name, or Class Name.
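
One way to keep every review for the text analysis while excluding incomplete rows from the category analysis is to work on a filtered view. The sketch below uses a made-up toy frame and a hypothetical `category_cols` list; it is only an illustration, not a step the notebook itself runs.

```python
import pandas as pd

# Toy frame standing in for the review data (all values are made up).
toy = pd.DataFrame({
    "Division Name": ["General", None, "General Petite"],
    "Department Name": ["Tops", None, "Dresses"],
    "Class Name": ["Knits", None, "Dresses"],
    "Review Text": ["soft fabric", "runs small", "lovely color"],
})

# Hypothetical helper list naming the category columns.
category_cols = ["Division Name", "Department Name", "Class Name"]

# The full frame keeps all review text; the filtered view is for category analysis only.
toy_categories = toy.dropna(subset=category_cols)

print(len(toy), "rows with text,", len(toy_categories), "rows with complete categories")
```

Note that `groupby` drops NaN group keys by default, so grouped summaries already skip these rows even without the explicit filter.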

In [None]:
#text preprocessing
df["Product Review"] = df["Product Review"].str.lower() #lowercase text - normalization
df["Product Review"] = df["Product Review"].str.replace(r"[.!?\\,-]", "", regex = True) #strip punctuation, which adds noise to the tokens; apostrophes are kept. regex = True is needed because newer pandas versions treat the pattern as a literal string by default

#tokenize using whitespace and lemmatize for normalization
def lemmatize_text(text):
    lemmatizer = WordNetLemmatizer()
    w_tokenizer = WhitespaceTokenizer() #originally used a word tokenizer, but it split up the many contractions in a way that would affect POS tagging
    return [lemmatizer.lemmatize(w) for w in w_tokenizer.tokenize(text)] 

df["Product Review"] = df["Product Review"].apply(lemmatize_text)
df["Product Review"] = df["Product Review"].apply(lambda x: [word for word in x if word not in stop])

The text data will be visualized with word clouds. Pre-processing matters here mainly for normalization: words with the same meaning get grouped together, noise such as punctuation marks is removed, and the result serves as input for the visualization.

The steps are as follows:
1. Lowercase the corpus - Normalizes the data so that differently cased forms of a word are counted together.
2. Remove punctuation marks - They don't add any value to the analysis.
3. Tokenization - Whitespace tokenization is used because the reviews contain a lot of contractions.
4. Lemmatization - Reduces each word to its base form (lemma). Lemmatization is used here instead of stemming because the dataset is small.
5. Stop word removal - Stop words add little meaning to a sentence.
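
The steps above can be sketched on a single sentence. This is a minimal illustration, not the notebook's pipeline: it uses a tiny hypothetical stop word set (`TOY_STOP_WORDS`) instead of NLTK's English list, and it leaves out lemmatization so that it runs without downloading any NLTK data.

```python
import re

# Hypothetical, tiny stop word set for illustration only;
# the notebook uses NLTK's English stop word list instead.
TOY_STOP_WORDS = {"the", "is", "but", "it", "a", "and"}

def preprocess(text, stop_words=TOY_STOP_WORDS):
    text = text.lower()                    # 1. lowercase
    text = re.sub(r"[.!?\\,-]", "", text)  # 2. strip punctuation (apostrophes are kept)
    tokens = text.split()                  # 3. whitespace tokenization
    # 4. lemmatization is skipped in this sketch
    return [t for t in tokens if t not in stop_words]  # 5. stop word removal

tokens = preprocess("The fabric is thin, but it doesn't wrinkle!")
print(tokens)
```

Because the text is split on whitespace only, a contraction such as "doesn't" stays a single token.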

In [None]:
#this will be used to determine score of the product
condlist = [df["Rating"] >= 4, df["Rating"] == 3, df["Rating"] <= 2]
choicelist = ["Positive", "Neutral", "Negative"]

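df["Net Promoter Score"] = np.select(condlist, choicelist, default = "Unknown") #explicit string default; newer NumPy versions reject mixing the implicit default of 0 with string choices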

Net promoter score is a measure of customer satisfaction with a product. The ratings run from 1 through 5, with 1 being the lowest. Grouping them makes the data easier to visualize.

In this case, here are the groups:
1. Positive - Scores that are greater than or equal to 4
2. Neutral - Scores that are equal to 3.
3. Negative - Any rating that is less than or equal to 2.


Link to [Net Promoter Score](https://www.hotjar.com/net-promoter-score/)
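
The column above is a three-way label rather than a single number. The standard Net Promoter Score is computed from a 0 to 10 "likelihood to recommend" question as the percentage of promoters minus the percentage of detractors. A rough analogue on the labels used here, assuming Positive stands in for promoters and Negative for detractors, could look like the sketch below. The toy labels are made up, and the notebook itself does not compute this number.

```python
import pandas as pd

# Made-up labels standing in for the "Net Promoter Score" column.
labels = pd.Series(["Positive", "Positive", "Negative", "Neutral", "Positive"])

# Share of each label, as fractions that sum to 1.
shares = labels.value_counts(normalize=True)

# Assumed analogue of NPS: % Positive minus % Negative.
net_score = (shares.get("Positive", 0) - shares.get("Negative", 0)) * 100
print(round(net_score, 1))
```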

In [None]:
condlist = [df["Age"] <= 25, ((df["Age"] > 25) & (df["Age"] <= 40)), df["Age"] >= 41]
choicelist = ["Gen Z", "Millenials", "Baby Boomers"]

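df["Age Group"] = np.select(condlist, choicelist, default = "Unknown") #explicit string default; newer NumPy versions reject mixing the implicit default of 0 with string choices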

The Age Group column is used to see which segment leaves the most reviews and how each segment's reviews split between negative and positive, which could show that some outfits are better suited to a certain age group.
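
The same binning can also be written with `pd.cut`, which makes the bin edges explicit. The sketch below is an alternative under the same cut-offs, with group labels copied from the notebook and a made-up list of ages; it is not what the notebook runs.

```python
import numpy as np
import pandas as pd

# Made-up ages, chosen to sit on and around the bin edges.
ages = pd.Series([19, 25, 26, 40, 41, 67])

# Bins are right-inclusive by default: (0, 25], (25, 40], (40, inf).
age_group = pd.cut(
    ages,
    bins=[0, 25, 40, np.inf],
    labels=["Gen Z", "Millenials", "Baby Boomers"],
)
print(age_group.tolist())
```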

In [None]:
df.head()

Now that we have cleaned up and processed the dataset and added new features to it, data exploration is next.

# Do you like your clothes?

It is time to see how these items are perceived by the consumers. The following questions will be asked:

1. Which clothing division has the most reviews?
2. Do people generally prefer one division over the others?
3. Does age affect review ratings?
4. Which clothing department has the best and worst reviews?
5. Which clothing ID has the best and worst reviews?

In [None]:
#create a table summarizing the reviews per item, then find the median review count and how many items fall at or below it
summ_dict = {"Clothing ID":"count", "Rating":"mean", "Recommended IND":"sum", "Positive Feedback Count":"sum"}
clothing_summary = df.groupby("Clothing ID").agg(summ_dict).rename(columns={"Clothing ID":"Review counts"}).reset_index()

median_clothing_rev = clothing_summary["Review counts"].median()
clothing_items_less_med = clothing_summary[clothing_summary["Review counts"] <= median_clothing_rev].shape[0] #<= so the count matches the printed message

print("The median number of reviews per clothing item is {:.0f}, and {} items have {:.0f} reviews or fewer.".format(median_clothing_rev, clothing_items_less_med, median_clothing_rev))

## Which Clothing Division Has The Most Reviews?

In [None]:
color_division = ["crimson", "forestgreen", "dodgerblue"]

df["Division Name"].value_counts().plot(kind = "pie", autopct = "%.2f", figsize = (5,5), colors = color_division, labeldistance = None)
plt.title("Number of Reviews by Division", fontsize = 15)  
plt.legend(bbox_to_anchor = (1.05, 1))
plt.ylabel("");

The most reviewed items come from the General division, followed by General Petite; only a small share of the reviews fall under Intimates.

## How Does Each Clothing Group Perform Against One Another?

In [None]:
fig, (ax1, ax2, ax3) = plt.subplots(ncols = 3, figsize = (10, 5))

df[df["Division Name"] == "General"]["Net Promoter Score"].value_counts().plot(kind = "pie", ax = ax1, autopct = "%.2f", colors = color_division, labeldistance = None)
df[df["Division Name"] == "General Petite"]["Net Promoter Score"].value_counts().plot(kind = "pie", ax = ax2, autopct = "%.2f", colors = color_division, labeldistance = None)
df[df["Division Name"] == "Initmates"]["Net Promoter Score"].value_counts().plot(kind = "pie", ax = ax3, autopct = "%.2f", colors = color_division, labeldistance = None)

ax1.set_title("General")
ax2.set_title("General Petite")
ax3.set_title("Intimates")

ax1.set_ylabel("")
ax2.set_ylabel("")
ax3.set_ylabel("")

plt.suptitle("Division Group Ratings", fontsize = 15, y = 0.9)
plt.legend(bbox_to_anchor = (1.05, 1));

General and General Petite have nearly the same distribution, while Intimates has a slight lead in positive reviews.

## Does Age Group Affect Ratings?

In [None]:
age_group_score = df.groupby(["Age Group", "Net Promoter Score"]).agg({"Net Promoter Score":"count"}).rename(columns = {"Net Promoter Score":"Review Count"}).unstack().reset_index()
age_summ = df.groupby("Age Group")["Net Promoter Score"].count().reset_index() #Net Promoter Score is the total number of reviews

age_group_score = age_group_score.merge(age_summ, on = "Age Group")

age_group_score["Negative %"] = age_group_score[("Review Count", "Negative")]/age_group_score["Net Promoter Score"]*100
age_group_score["Neutral %"] = age_group_score[("Review Count", "Neutral")]/age_group_score["Net Promoter Score"]*100
age_group_score["Positive %"] = age_group_score[("Review Count", "Positive")]/age_group_score["Net Promoter Score"]*100

In [None]:
fig, (ax1, ax2) = plt.subplots(ncols = 2, figsize = (10, 6))

age_group_score[["Age Group", "Net Promoter Score"]].plot(kind = "bar", ax = ax1, legend = False, color = "darkorange")
ax1.set_title("Total Number of Reviews by Age Group", fontsize = 15)
ax1.set_xticks(np.arange(3))
ax1.set_xticklabels(["Baby Boomers", "Gen Z", "Millenials"])


age_group_score[["Age Group", "Negative %", "Neutral %", "Positive %"]].plot(kind = "barh", ax = ax2, legend = False, color = ["tomato", "gold", "mediumseagreen"])
ax2.set_title("Review Distribution by Age Group", fontsize = 15)
ax2.set_yticks(np.arange(3))
ax2.set_yticklabels(["Baby Boomers", "Gen Z", "Millenials"])
ax2.set_xlabel("Percentage (%)")

plt.tight_layout()
plt.legend(bbox_to_anchor = (1.05, 1));

The most outspoken group is the Baby Boomers, and Gen Z leaves the fewest reviews, with fewer than 2,000. However, the distribution of reviews per age group is roughly the same throughout, which suggests that age doesn't affect the reviews much.
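
The per-group percentages can also be produced in one step with `pd.crosstab` and `normalize="index"`, which divides each row by its own total. The sketch below runs on a made-up toy frame with the same column names; it is offered as a compact alternative, not as the method used above.

```python
import pandas as pd

# Made-up rows standing in for the review data.
toy = pd.DataFrame({
    "Age Group": ["Gen Z", "Gen Z", "Millenials", "Millenials", "Millenials", "Millenials"],
    "Net Promoter Score": ["Positive", "Negative", "Positive", "Positive", "Neutral", "Negative"],
})

# Each row sums to 100, so groups of different sizes can be compared directly.
share = pd.crosstab(toy["Age Group"], toy["Net Promoter Score"], normalize="index") * 100
print(share.round(1))
```

Normalizing by row is what makes a small group like Gen Z comparable with a much larger one.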

In [None]:
df[["Age", "Rating"]].corr()

The Pearson correlation between age and rating supports this: age has little linear relationship with the rating of the clothing.

## Which Clothing Department Has The Best And Worst Reviews?

In [None]:
#a table used to identify the number of negative and positive reviews based off of their groups
review_scores = df.groupby(["Division Name", "Department Name", "Net Promoter Score"])["Net Promoter Score"].count().unstack(fill_value = 0).reset_index()

#used percentage-based columns to normalize the data, since some clothing groups have more reviews than others
review_scores["Total Reviews"] = review_scores["Negative"] + review_scores["Neutral"] + review_scores["Positive"]
review_scores["Negative Reviews %"] = review_scores["Negative"]/review_scores["Total Reviews"]*100
review_scores["Neutral Reviews %"] = review_scores["Neutral"]/review_scores["Total Reviews"]*100
review_scores["Positive Reviews %"] = review_scores["Positive"]/review_scores["Total Reviews"]*100

In [None]:
review_scores.head()

In [None]:
#a function that will be used to create a color scheme dependent on the Division Name
def enum_list(dataframe):

    list_name = dataframe["Division Name"].tolist()
    color_list = []

    for index in list_name:
        if index == "General": color_list.append("crimson")
        elif index == "General Petite": color_list.append("forestgreen")
        else: color_list.append("dodgerblue")
    return color_list

#create a legend that would be dependent on the Division Name
color_dict = {"General":"crimson", "General Petite":"forestgreen", "Intimates":"dodgerblue"}
handles = []

for key, value in color_dict.items():
    patch = mpatches.Patch(color=value, label=key) # manually define a new patch 
    handles.append(patch) # handles is a list, so append manual patch

In [None]:
worst_reviews = review_scores.sort_values("Negative Reviews %", ascending = False)
best_reviews = review_scores.sort_values("Positive Reviews %", ascending = False)

fig, (ax1, ax2) = plt.subplots(ncols =2, figsize = (10,5))

worst_reviews.plot(kind = "bar", x = "Department Name",  y = "Negative Reviews %", ax=ax1, legend = False, color = enum_list(worst_reviews))
ax1.axhline(y = review_scores["Negative Reviews %"].median(), color = "gray", linestyle = ":")
ax1.set_title("Negative Reviews", fontsize = 15)
ax1.set_xlabel("")

best_reviews.plot(kind = "bar", x = "Department Name", y = "Positive Reviews %", ax=ax2, legend = False, color = enum_list(best_reviews))
ax2.axhline(y = review_scores["Positive Reviews %"].median(), color = "gray", linestyle = ":")
ax2.set_title("Positive Reviews", fontsize = 15)
ax2.set_xlabel("")

plt.legend(handles = handles, bbox_to_anchor = (1.05, 1))

Insights:

1. Generally, bottoms and intimates have the lowest percentage of negative reviews, which lines up with their high percentage of positive reviews.
2. Women's tops, namely the Trend, Dresses, Tops, and Jackets departments, are above the median negative %. We'll look into this further with word clouds and POS tagging to find out which characteristics of these items draw so many negative reviews.

## Which Clothing ID Has The Best And Worst Reviews?

In [None]:
#cross-tabulate the number of reviews per Net Promoter Score label for every clothing item, keeping its division, department, and class
clothing_df = pd.crosstab([df["Division Name"], df["Department Name"], df["Class Name"], df["Clothing ID"]], df["Net Promoter Score"]).reset_index()

clothing_df["Total Reviews"] = clothing_df["Negative"] + clothing_df["Neutral"] + clothing_df["Positive"] 
clothing_df["Negative Reviews %"] = clothing_df["Negative"]/clothing_df["Total Reviews"] *100
clothing_df["Neutral Reviews %"] = clothing_df["Neutral"]/clothing_df["Total Reviews"] *100
clothing_df["Positive Reviews %"] = clothing_df["Positive"]/clothing_df["Total Reviews"] *100

In [None]:
worst_items = clothing_df[(clothing_df["Total Reviews"] >= 15)].sort_values("Negative Reviews %", ascending = False)
best_items = clothing_df[(clothing_df["Total Reviews"] >= 15)].sort_values("Positive Reviews %", ascending = False)

fig, (ax1, ax2) = plt.subplots(ncols = 2, figsize = (10, 5))

worst_items.head(10).plot(kind = "bar", x = "Clothing ID", y = "Negative Reviews %", color = enum_list(worst_items), ax = ax1, legend = False)
ax1.axhline(y = review_scores["Negative Reviews %"].median(), color = "gray", linestyle = ":") #baseline
ax1.set_title("Worst Reviewed Items", fontsize = 15)
ax1.set_ylabel("Percentage")

best_items.head(10).plot(kind = "bar", x = "Clothing ID", y = "Positive Reviews %", color = enum_list(best_items.head(10)), ax = ax2) #colors must come from best_items, not worst_items
ax2.axhline(y = review_scores["Positive Reviews %"].median(), color = "gray", linestyle = ":") #baseline
ax2.set_title("Best Reviewed Items", fontsize = 15)
ax2.set_ylabel("Percentage")

plt.tight_layout()
plt.legend(handles = handles, bbox_to_anchor = (1.05, 1))

In [None]:
clothing_df[clothing_df["Total Reviews"] >= 15].sort_values("Negative Reviews %", ascending = False).head(10)

Insights:

1. Over 70% of the most negatively reviewed items come from the General division, followed by General Petite.
2. More than 3 items have a 100% positive rating, all from the General division. On closer inspection, they fall under the Outerwear, Fine Gauge, and Knits classes.

Note: Only Clothing IDs with at least 15 reviews are included, so that each item has enough data for analysis.
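
To see why the minimum review count matters, consider the toy table below (all numbers are made up). Without a threshold, an item with a single negative review tops the "worst" ranking at 100%; with the threshold, the ranking reflects items that have enough reviews to be meaningful. `MIN_REVIEWS` is a name introduced only for this sketch.

```python
import pandas as pd

# Made-up per-item counts.
toy_items = pd.DataFrame({
    "Clothing ID": [1, 2, 3],
    "Negative": [1, 6, 2],
    "Total Reviews": [1, 20, 40],
})
toy_items["Negative Reviews %"] = toy_items["Negative"] / toy_items["Total Reviews"] * 100

MIN_REVIEWS = 15  # same cut-off as the filter used above

ranked_all = toy_items.sort_values("Negative Reviews %", ascending=False)
ranked_filtered = toy_items[toy_items["Total Reviews"] >= MIN_REVIEWS].sort_values(
    "Negative Reviews %", ascending=False
)

print("Worst item without threshold:", ranked_all.iloc[0]["Clothing ID"])
print("Worst item with threshold:", ranked_filtered.iloc[0]["Clothing ID"])
```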

All these items will be further explored by taking a look at their reviews.

# What makes a good/bad product?

From the previous section, we have learned the following:

1. Broken down by Division Name, the score distributions are practically the same, except that Intimates has a very slight advantage in positive reviews.
2. The distribution of Net Promoter Score labels is practically the same across age groups, which means that an item's review rating and a person's age show very little correlation.
3. The negative reviews mostly come from women's tops such as Trend, Dresses, Tops, and Jackets.
4. A lot of these items are positively reviewed. However, some particular products have drawn a lot of negative reviews, which pulls down the overall ratings of their category group.
5. We have also identified the worst and best performing items.
6. Most reviews come from the Baby Boomers, and there's hardly any interaction from Gen Z.

Let's dig a little deeper now by looking at the review text. The text will be analyzed visually with word clouds. However, since we want to know **WHY** the products are doing well and **WHAT** it is in the product that satisfies people, we need a way to separate the words out. POS tagging will be used: all nouns are collected into one list to understand the what, while the adjectives go into another list to understand the why.

In [None]:
#a function that filters the review words by POS tag and draws them as a word cloud on the given axis
def create_wordcloud(filt_df, pos, axis):
    
    filt_df = filt_df.apply(pos_tag)
    flatlist = []
    
    for index in filt_df:
        for df_index in index:
            if pos == "adjective":
                if df_index[1] in ("JJ", "JJR", "JJS"):
                    flatlist.append(df_index[0])
            elif pos == "noun":
                if df_index[1] in ("NN", "NNS", "NNP", "NNPS", "PRP", "PRP$"):
                    flatlist.append(df_index[0])
    
    wordcloud = WordCloud(
        max_words = 30, width = 3000, height = 2000, background_color = "white").generate(" ".join(flatlist))
    axis.set_xticks([])
    axis.set_yticks([])
    axis.imshow(wordcloud, interpolation = "bilinear")

Description of the function:

1. filt_df is the filtered column of tokenized reviews, pos is the POS tag group to keep for the word cloud (only "noun" and "adjective" are supported), and axis is the matplotlib axis to draw on.
2. An empty list is instantiated, and words are appended to it depending on their POS tag.
3. The list is then joined into a single string and used as the input for the word cloud.
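
The filtering step can be seen in isolation in the sketch below. To keep it runnable without downloading a tagger model, the tokens are pre-tagged by hand in the `(word, tag)` shape that `pos_tag` returns; the tags, the tag sets, and the helper name `words_with_tags` are all written for this illustration.

```python
# Hand-written (word, tag) pairs in the shape returned by nltk.pos_tag.
# These tags are assumptions for illustration, not real tagger output.
tagged_review = [
    ("fabric", "NN"), ("thin", "JJ"), ("cheap", "JJ"),
    ("fit", "NN"), ("runs", "VBZ"), ("smaller", "JJR"),
]

# Same tag groups as in create_wordcloud above.
ADJECTIVE_TAGS = {"JJ", "JJR", "JJS"}
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS", "PRP", "PRP$"}

def words_with_tags(tagged_tokens, wanted_tags):
    """Keep only the words whose POS tag is in wanted_tags."""
    return [word for word, tag in tagged_tokens if tag in wanted_tags]

nouns = words_with_tags(tagged_review, NOUN_TAGS)
adjectives = words_with_tags(tagged_review, ADJECTIVE_TAGS)
print(nouns, adjectives)
```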

## Worst items

In [None]:
fig, (ax1, ax2) = plt.subplots(ncols = 2, figsize = (12,10), facecolor = "w", edgecolor = "k")


create_wordcloud(df[(df["Clothing ID"].isin(worst_items.head(10)["Clothing ID"].tolist())) & (df["Net Promoter Score"] == "Negative")]["Product Review"], "noun", ax1)
create_wordcloud(df[(df["Clothing ID"].isin(worst_items.head(10)["Clothing ID"].tolist())) & (df["Net Promoter Score"] == "Negative")]["Product Review"], "adjective", ax2)

ax1.set_title("What Makes It Bad?")
ax2.set_title("Why Is It Bad?")

plt.suptitle("Worst Items", fontsize = 20, y = 0.8)
plt.tight_layout()
plt.show()


In [None]:
df[(df["Clothing ID"].isin(worst_items.head(10)["Clothing ID"].tolist())) & (df["Net Promoter Score"] == "Negative")]["Review Text"].sample(5)

The worst reviewed items have issues with overall quality and fit. The materials reportedly feel cheap and are very thin, and the poor fit is probably down to bad proportions.

## Best Items

In [None]:
fig, (ax1, ax2) = plt.subplots(ncols = 2, figsize = (12,10), facecolor = "w", edgecolor = "k")

create_wordcloud(df[(df["Clothing ID"].isin(best_items.head(10)["Clothing ID"].tolist())) & (df["Net Promoter Score"] == "Positive")]["Product Review"], "noun", ax1)
create_wordcloud(df[(df["Clothing ID"].isin(best_items.head(10)["Clothing ID"].tolist())) & (df["Net Promoter Score"] == "Positive")]["Product Review"], "adjective", ax2)

ax1.set_title("What Makes It Good?")
ax2.set_title("Why Is It Good?")

plt.suptitle("Best Items", fontsize = 20, y = 0.75)
plt.tight_layout()
plt.show()


In [None]:
df[(df["Clothing ID"].isin(best_items.head(10)["Clothing ID"].tolist())) & (df["Net Promoter Score"] == "Positive")]["Review Text"].sample(5)

Apparently, the very things people complain about in the worst reviewed products are the same things people praise in the most positively reviewed items. This could mean that the raw materials of the negatively reviewed items are of poor overall quality.
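
A simple way to check this overlap beyond eyeballing the word clouds is to compare the most frequent words on each side. The sketch below does that with `collections.Counter` on two made-up token lists; `top_words` is a helper introduced only for this example, and real input would be the flattened noun or adjective lists from the reviews.

```python
from collections import Counter

# Made-up tokens standing in for words from negative and positive reviews.
negative_tokens = ["fabric", "thin", "fit", "cheap", "fabric", "fit", "size"]
positive_tokens = ["fabric", "soft", "fit", "perfect", "fit", "color", "fabric"]

def top_words(tokens, n):
    """Return the n most frequent words as a set."""
    return {word for word, _ in Counter(tokens).most_common(n)}

# Words that rank in the top n on both sides.
shared = top_words(negative_tokens, 2) & top_words(positive_tokens, 2)
print(sorted(shared))
```

A large shared set would back up the observation that customers talk about the same attributes in both groups, just with opposite verdicts.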

Note: I originally included a word cloud of the worst performing departments here, but I took it out, mainly because individual Clothing IDs, rather than whole departments, are the reason for the bad ratings.

# What's Next?

The main focus of this analysis was to understand how different customers affect clothing ratings. We learned that age doesn't noticeably affect the review ratings, that breaking down the Net Promoter Score by division shows hardly any difference, and that the most outspoken group is the Baby Boomers. We also identified the best and worst performing Clothing IDs. Going further, we saw that the attributes people dislike in the worst reviewed items are the same attributes people love in the best reviewed ones. This leads me to believe that the bad products trace back to the source of the raw materials as well as the assembly plant of the clothes. Those sources should be identified and changed, or the specific Clothing IDs should be dropped altogether, since negative reviews can directly affect the sales of a product.


