<h1 style="color:white; background-color:black; text-align:center; padding:10px; border-radius:10px;">
<b>Milestone 4 Project: Customer Sentiment Analysis</b>
</h1>

<h2 style="color:orange">Table of Contents</h2>

1. [Objective](#objective)
2. [Libraries and Tools](#Libraries-and-Tools)
3. [Data Collection](#Data-Collection)
4. [Data Cleaning and Preprocessing](#Data-Cleaning-and-Preprocessing)
5. [Sentiment Analysis](#Sentiment-Analysis)
6. [Data Visualization & Insights](#Data-Visualization-&-Insights)
7. [Reporting](#Reporting)

<h2 id="objective" style="color:orange">Objective</h2>

As a Data Analyst at **[FlowerAura](https://www.floweraura.com/)**, you have been tasked with gauging customer sentiment towards flowers and gifts. The goal of this project is to evaluate customer reactions by performing sentiment analysis on user-posted reviews. By extracting and processing these reviews, you will derive insights into the overall sentiment (positive or negative) surrounding the products. These insights will inform decision-making, improve the customer experience, and identify key areas for product improvement.


<h2 id="Libraries-and-Tools" style="color:orange">Libraries and Tools</h2>

<b>Requests:</b> For fetching the review pages over HTTP.

<b>BeautifulSoup:</b> For parsing HTML and extracting review details.

<b>Pandas:</b> For data cleaning, processing, and analysis.

<b>TextBlob:</b> For performing sentiment analysis on the review text.

<b>Matplotlib/Seaborn:</b> For visualizations such as the sentiment distribution.

<b>WordCloud:</b> For generating word clouds from the review text.

<h2 id="Data-Collection" style="color:orange">Data Collection</h2>
   
**Tool:** BeautifulSoup
   
**Task:** *Scrape customer reviews from FlowerAura’s product pages for any flower or gift item.*

### Import <span style="background-color:yellow;">all the libraries</span> that are needed.

In [132]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from textblob import TextBlob
from wordcloud import WordCloud, STOPWORDS

In [133]:
# Empty lists that will store the scraped data.

Names = []
Ratings = []
Reviews = []
Cities = []
PostedOn = []
Occassions = []

## Scraping <span style="background-color:yellow;">Customer Reviews</span> from FlowerAura Using Requests and BeautifulSoup

In [None]:
url = "https://www.floweraura.com/reviews/p/6617/10-red-roses-bouquet?page="

# Loop over the first 50 review pages.
for page in range(1, 51):
    r = requests.get(url + str(page))
    soup = BeautifulSoup(r.text, "html.parser")

    # Container holding all review cards; skip the page if it is missing.
    main = soup.find("div", {"class": "review-left-container"})
    if main is None:
        continue

    cards = main.find_all("div", {"class": "new-review-card-container"})

    for card in cards:

        # Name of the reviewer
        name = card.find("span", {"class": "review-author-name"})
        Names.append(name.text.title())

        # City of the reviewer (first meta span) and occasion (second, if present)
        meta = card.find_all("span", {"class": "review-meta-details"})
        Cities.append(meta[0].text.title())

        try:
            Occassions.append(meta[1].text.title())
        except IndexError:
            Occassions.append(np.nan)

        # Date of the review
        spans = card.find_all("span")
        try:
            PostedOn.append(spans[4].text)
        except IndexError:
            PostedOn.append(np.nan)

        # Rating given by the reviewer
        rating = card.find("span", {"class": "star-count-container"})
        Ratings.append(rating.text)

        # Review text written by the reviewer
        review = card.find_all("div")
        Reviews.append(review[-1].text)
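
To check the parsing logic without requesting the live site, the same selectors can be run against a small hand-written HTML fragment. This is only a sketch: the markup below is invented for illustration and assumes nothing beyond the class names used in the loop above, so the real page structure may differ. The helper name `parse_cards` is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the class names are taken from the scraping loop.
sample_html = """
<div class="review-left-container">
  <div class="new-review-card-container">
    <span class="review-author-name">asha k</span>
    <span class="review-meta-details">pune</span>
    <span class="review-meta-details">Occasion: birthday</span>
    <span class="star-count-container">5</span>
    <div>Fresh flowers, delivered on time.</div>
  </div>
  <div class="new-review-card-container">
    <span class="review-author-name">ravi</span>
    <span class="review-meta-details">delhi</span>
    <span class="star-count-container">2</span>
    <div>Bouquet arrived late.</div>
  </div>
</div>
"""

def parse_cards(html):
    """Return one dict per review card found in the HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", {"class": "review-left-container"})
    if container is None:  # page without a review container
        return []
    rows = []
    for card in container.find_all("div", {"class": "new-review-card-container"}):
        meta = card.find_all("span", {"class": "review-meta-details"})
        rows.append({
            "name": card.find("span", {"class": "review-author-name"}).text.title(),
            "city": meta[0].text.title() if meta else None,
            "occasion": meta[1].text.title() if len(meta) > 1 else None,
            "rating": card.find("span", {"class": "star-count-container"}).text.strip(),
            "review": card.find_all("div")[-1].text.strip(),
        })
    return rows

parsed = parse_cards(sample_html)
print(parsed)
```

Returning a list of dicts rather than appending to six separate lists keeps each review's fields together, so a missing field cannot shift the columns out of alignment.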

## <span style="background-color:yellow;">Creating a DataFrame</span> of Scraped FlowerAura Reviews

In [None]:
df = pd.DataFrame({'Names':Names , 'Cities':Cities , 'Posted_On':PostedOn , 'Occasions':Occassions , 'Rating': Ratings , 'Reviews': Reviews })
df

<h2 id="Data-Cleaning-and-Preprocessing" style="color:orange">Data Cleaning and Preprocessing</h2>
   
**Tool:** Pandas
   
**Task:** *Clean and preprocess the scraped data for analysis.*

### 1. Extracting and Cleaning the <span style="background-color:yellow;">Posted_On</span> and <span style="background-color:yellow;">Occasions</span> Columns

In [None]:
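def extract(value):
    # Keep only the text after the "label: " prefix.
    # Return NaN when there is no colon or the value is not a string.
    try:
        x = value.index(':')
        return value[x+2:]
    except (AttributeError, ValueError):
        return np.nan

df['Posted_On'] = df['Posted_On'].apply(extract)
df['Occasions'] = df['Occasions'].apply(extract)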
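
As a quick illustration of what `extract` does, the sketch below applies the same logic to a few made-up strings. The sample values are assumptions about how the scraped text looks (a label, a colon, a space, then the value); the text on the site may be formatted differently. The function is renamed `extract_after_colon` here only to keep the example separate from the notebook's own cell.

```python
import numpy as np

def extract_after_colon(value):
    """Return the text after the first ': ', or NaN if there is no colon."""
    try:
        x = value.index(':')
        return value[x + 2:]
    except (AttributeError, ValueError):
        # AttributeError: value is not a string (e.g. NaN)
        # ValueError: the string contains no colon
        return np.nan

# Made-up inputs for illustration only.
samples = ["Posted On: 12th August 2024", "Occasion: Birthday", "no label", np.nan]
cleaned = [extract_after_colon(s) for s in samples]
print(cleaned)
```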

### 2. Removing <span style="background-color:yellow;">(th, rd, st, nd)</span> from the <span style="background-color:yellow;">Posted_On</span> Column

In [None]:
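# Strip ordinal suffixes only when they follow a digit, so that month names
# such as "August" are left intact.
df['Posted_On'] = df['Posted_On'].str.replace(r'(?<=\d)(?:th|rd|st|nd)', '', regex=True)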
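
One pitfall when stripping ordinal suffixes is that a plain substring replacement of `st` also alters month names such as "August". The sketch below compares substring removal with a regex anchored to a preceding digit. The dates are made up, and the day-month-year layout is an assumption about the scraped format.

```python
import pandas as pd

# Made-up dates for illustration only.
dates = pd.Series(["1st August 2024", "22nd March 2024", "3rd May 2024", "15th June 2024"])

# Plain substring removal, suffix by suffix: also hits the "st" in "August".
naive = dates.copy()
for suffix in ["th", "rd", "st", "nd"]:
    naive = naive.str.replace(suffix, "", regex=False)

# Regex anchored to a preceding digit: only ordinal suffixes are removed.
anchored = dates.str.replace(r"(?<=\d)(?:st|nd|rd|th)", "", regex=True)

print(naive.tolist())
print(anchored.tolist())
```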

In [None]:
df

### 3. Checking the <span style="background-color:yellow;">datatype</span> of each column.

In [None]:
df.info()

### 4. Converting <span style="background-color:yellow;">Posted_On</span> to datetime and <span style="background-color:yellow;">Rating</span> to int.

In [None]:
df['Posted_On'] = pd.to_datetime(df['Posted_On'])
# Convert the Rating column itself (scraped as text) to integers.
df['Rating'] = df['Rating'].astype("int")
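
The conversion can be tried on a toy frame first. This sketch assumes the cleaned dates look like "12 August 2024"; the `format` string and the sample rows are illustrative, not taken from the scraped data. Passing `errors="coerce"` is optional and turns unparseable strings into `NaT` instead of raising.

```python
import pandas as pd

# Toy rows for illustration only.
toy = pd.DataFrame({
    "Posted_On": ["12 August 2024", "3 May 2024", "not a date"],
    "Rating": ["5", "4", "1"],
})

# errors="coerce" turns unparseable strings into NaT instead of raising.
toy["Posted_On"] = pd.to_datetime(toy["Posted_On"], format="%d %B %Y", errors="coerce")

# Ratings are scraped as text, so the Rating column itself is cast to integers.
toy["Rating"] = toy["Rating"].astype("int")

print(toy.dtypes)
```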

In [None]:
df.info()

<h2 id="Sentiment-Analysis" style="color:orange">Sentiment Analysis</h2>
   
**Tool:** TextBlob
   
**Task:** *Analyze the sentiment of each review to classify them as either positive or negative.*

### <span style="background-color:yellow;">Polarity Score</span>

In [None]:
df['Polarity'] = [TextBlob(i).sentiment.polarity for i in df['Reviews']]
df['Polarity'] = df['Polarity'].round(2)
df

### <span style="background-color:yellow;">Subjectivity Score</span>

In [None]:
df['Subjectivity'] = [TextBlob(i).sentiment.subjectivity for i in df['Reviews']]
df['Subjectivity'] = df['Subjectivity'].round(2)
df

<h2 id="Data-Visualization-&-Insights" style="color:orange">Data Visualization & Insights</h2>
   
**Tool:** Pandas and Matplotlib/Seaborn for visualization
   
**Task:** *Perform an analysis on the sentiment of reviews and extract actionable insights.*

In [None]:
def score(value):
    # Polarity at or below -0.3 counts as negative; everything else as positive.
    if value <= -0.3:
        return "Negative"
    else:
        return "Positive"

df['Score'] = df["Polarity"].apply(score)
df
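
To make the cut-off concrete, the sketch below applies the same rule to a few hand-picked polarity values (not taken from the scraped data). With a single threshold at -0.3, neutral and mildly negative scores are labelled Positive.

```python
def score(value):
    """Label a polarity score using the notebook's single -0.3 threshold."""
    return "Negative" if value <= -0.3 else "Positive"

# Hand-picked polarity values for illustration only.
examples = [-1.0, -0.3, -0.29, 0.0, 0.5]
labels = {p: score(p) for p in examples}
print(labels)
```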

## Plotting the <span style="background-color:yellow;">Sentiment Distribution</span> by Sentiment Category

In [None]:
ax = sns.countplot(data=df, x='Score', color='orange')

ax.bar_label(container = ax.containers[0])
plt.title('Sentiment Distribution')
plt.xlabel('Sentiment Category')
plt.ylabel('Frequency')
plt.show()

## Visualizing <span style="background-color:yellow;">Positive Customer</span> Reviews Using WordCloud 

In [None]:
df_pos = df.loc[df["Score"] == "Positive"]
all_text = " ".join(text for text in df_pos["Reviews"])

wordcloud = WordCloud(width=800, height=400, background_color='white', colormap="Greens").generate(all_text)

plt.figure(figsize=(10, 5))
plt.title('Positive Reviews')
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()

## Visualizing <span style="background-color:yellow;">Negative Customer</span> Reviews Using WordCloud

In [None]:
df_neg = df.loc[df["Score"] == "Negative"]
all_text = " ".join(text for text in df_neg["Reviews"])

wordcloud = WordCloud(width=800, height=400, background_color='white', colormap="Reds").generate(all_text)

plt.figure(figsize=(10, 5))
plt.title('Negative Reviews')
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()
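
A word cloud sizes words by frequency, so the same information can also be read as a table of counts. The sketch below counts words in a few made-up reviews using only the standard library. The tiny stop-word set is a placeholder, not the `STOPWORDS` set from the `wordcloud` package, and `top_words` is a hypothetical helper.

```python
import re
from collections import Counter

# Made-up reviews for illustration; not from the scraped data.
reviews = [
    "Fresh flowers and good service",
    "Good bouquet, fresh and beautiful",
    "The delivery was late",
]

stop_words = {"and", "the", "was"}  # placeholder stop-word list

def top_words(texts, n=3):
    """Return the n most common non-stop-words across the texts."""
    words = re.findall(r"[a-z']+", " ".join(texts).lower())
    return Counter(w for w in words if w not in stop_words).most_common(n)

print(top_words(reviews, n=2))
```

Running the same function on `df_pos["Reviews"]` and `df_neg["Reviews"]` would give a list of common words for each sentiment group.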

## <span style="background-color:yellow;">Average Rating </span> vs <span style="background-color:yellow;">Sentiment Polarity</span>

In [None]:
# Average polarity for each star rating
rating_sentiment = df.groupby('Rating')['Polarity'].mean().reset_index()
rating_sentiment['Polarity'] = rating_sentiment['Polarity'].round(2)

# One bar per rating; a box plot needs more than one value per group.
plt.figure(figsize=(12,6))
sns.barplot(data=rating_sentiment, x='Rating', y='Polarity', color='orange')
plt.title('Rating vs Average Polarity')
plt.xlabel('Rating (1-5 stars)')
plt.ylabel('Average Polarity')
plt.show()
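
The aggregation step can be checked on a toy frame before plotting. The ratings and polarity values below are invented for illustration; the point is only that `groupby(...).mean()` yields one average polarity per star rating.

```python
import pandas as pd

# Invented values for illustration only.
toy = pd.DataFrame({
    "Rating": [5, 5, 4, 1, 1],
    "Polarity": [0.8, 0.6, 0.4, -0.5, -0.1],
})

# One row per rating, holding the mean polarity of its reviews.
rating_sentiment = toy.groupby("Rating")["Polarity"].mean().round(2).reset_index()
print(rating_sentiment)
```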

## <span style="background-color:yellow;">Review Length</span> vs <span style="background-color:yellow;">Sentiment Polarity</span>

In [None]:
df['review_length'] = df['Reviews'].apply(lambda x: len(str(x).split()))

plt.figure(figsize=(12,5))
sns.boxplot(data=df, x='review_length', y='Polarity', hue = 'Score', palette='Set2')
plt.title('Review Length vs Sentiment Polarity')
plt.xlabel('Review Length (Word Count)')
plt.ylabel('Sentiment Polarity')
plt.show()

## Correlation between <span style="background-color:yellow;"> Review Length</span> and <span style="background-color:yellow;">Sentiment Polarity</span>

In [None]:
length_correlation = df['review_length'].corr(df['Polarity'])
print(f"Correlation between Review Length and Sentiment Polarity: {length_correlation:.2f}")

<b>The correlation value is -0.14:</b> the negative sign means that as review length increases, polarity decreases slightly.

<b>The magnitude is small (0.14):</b> the relationship is very weak.

<b>Overall:</b> longer reviews are marginally less positive, but review length has little effect on sentiment.
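
For reference, `Series.corr` returns the Pearson coefficient by default: the sum of the products of the deviations from the mean, divided by the square root of the product of the summed squared deviations. The sketch below computes it both by hand and with pandas on toy numbers (not the scraped reviews) to show that the two agree.

```python
import numpy as np
import pandas as pd

# Toy numbers for illustration; not the scraped reviews.
length = pd.Series([3, 5, 8, 12, 20])
polarity = pd.Series([0.6, 0.7, 0.5, 0.4, 0.3])

# Pearson r computed directly from its definition.
dx = length - length.mean()
dy = polarity - polarity.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# pandas uses the Pearson method by default.
r_pandas = length.corr(polarity)

print(round(r_manual, 2), round(r_pandas, 2))
```

A value near -1 or 1 indicates a strong linear relationship, while a value near 0, such as the one reported above, indicates a weak one.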

<h2 id="Reporting" style="color:orange">Reporting</h2>

## <span style="background-color:yellow;">Overview: </span>

<p>Reviews were scraped from FlowerAura, cleaned with Pandas, and scored with TextBlob. Each review was then classified as positive or negative based on its polarity score.</p>

## <span style="background-color:yellow;">Results:</span>
<p>Most reviews were positive, showing high customer satisfaction. Negative reviews were mainly about delivery delays or packaging problems.</p>


## <span style="background-color:yellow;">Insights:</span>

<b>Positive Highlights:</b> 

1. Fast and timely delivery
2. Fresh and beautiful flowers
3. Good service and bouquet designs
4. <b>Common words:</b> "Good," "Thank," "fresh," "beautiful," "service"

<b>Common Issues:</b> 

1. Late deliveries
2. Weather-related delays
3. Poor flower quality in some cases
4. <b>Common words:</b> "bad," "weather," "despite," "delivering"


## <span style="background-color:yellow;">Recommendations:</span>

1. Ensure on-time delivery.
2. Prepare for weather-related delays.
3. Improve flower quality checks.
4. Communicate delays clearly.
5. Strengthen customer support.
6. Monitor and fix common issues.
7. Offer rewards for loyal customers.