<a href="https://colab.research.google.com/github/richardp123456/PersonalProjects/blob/main/IA4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
%pip install vaderSentiment


Collecting vaderSentiment
  Downloading vaderSentiment-3.3.2-py2.py3-none-any.whl.metadata (572 bytes)
Downloading vaderSentiment-3.3.2-py2.py3-none-any.whl (125 kB)
Installing collected packages: vaderSentiment
Successfully installed vaderSentiment-3.3.2


In [3]:
# Import necessary libraries
import pandas as pd
import numpy as np
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

# Load the CSV files
sales_data = pd.read_csv('/content/daily_sales (2).csv')  # Use your actual file path
reviews_data = pd.read_csv('/content/water_product_reviews_500_actual_names (1) (1).csv')  # Use your actual file path

### Task 1: Sales Revenue Analysis ###
# Clean sales data
sales_data_cleaned = sales_data.iloc[2:].reset_index(drop=True)
sales_data_cleaned.columns = ['Date', 'Daily Units Sold', 'Daily Unit Price', 'Daily Temperature (C)']

# Convert columns to numeric where applicable
sales_data_cleaned['Daily Units Sold'] = pd.to_numeric(sales_data_cleaned['Daily Units Sold'], errors='coerce')
sales_data_cleaned['Daily Unit Price'] = pd.to_numeric(sales_data_cleaned['Daily Unit Price'], errors='coerce')
sales_data_cleaned['Daily Temperature (C)'] = pd.to_numeric(sales_data_cleaned['Daily Temperature (C)'], errors='coerce')

# Add a new column for daily revenue: Daily Units Sold x Daily Unit Price
sales_data_cleaned['Daily Revenue'] = sales_data_cleaned['Daily Units Sold'] * sales_data_cleaned['Daily Unit Price']

# Parse the dates and derive a year-month key (YYYY-MM) used for grouping below
sales_data_cleaned['Date'] = pd.to_datetime(sales_data_cleaned['Date'])
sales_data_cleaned['Month'] = sales_data_cleaned['Date'].dt.strftime('%Y-%m')

# Calculate monthly revenue and average temperature
monthly_data = sales_data_cleaned.groupby('Month').agg(
    Monthly_Revenue=('Daily Revenue', 'sum'),
    Avg_Temperature=('Daily Temperature (C)', 'mean')
).reset_index()

# Identify months where revenue is below $40,000
below_target_months = monthly_data[monthly_data['Monthly_Revenue'] < 40000]

# Display below target months
print("Months where revenue is below $40,000 and their respective average temperature:")
print(below_target_months)

### Task 2: Customer Review Sentiment Analysis ###
# Initialize SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()

# Define a function to label sentiment based on compound score
def label_sentiment(review_text):
    compound_score = analyzer.polarity_scores(review_text)['compound']
    if compound_score >= 0.05:
        return 'Positive'
    elif compound_score <= -0.05:
        return 'Negative'
    else:
        return 'Neutral'

# Apply sentiment labeling to reviews
reviews_data['sentiment'] = reviews_data['Review Text'].apply(label_sentiment)

# Calculate the average rating for each sentiment category
avg_rating_by_sentiment = reviews_data.groupby('sentiment').agg(Avg_Rating=('Rating', 'mean'))
print("\nAverage rating by sentiment category:")
print(avg_rating_by_sentiment)

# Summarize total number of positive, neutral, and negative reviews for each month
reviews_data['Review Date'] = pd.to_datetime(reviews_data['Review Date'])
reviews_data['Month'] = reviews_data['Review Date'].dt.strftime('%Y-%m')
review_summary_by_month = reviews_data.groupby(['Month', 'sentiment']).agg(Review_Count=('Rating', 'size')).unstack(fill_value=0)
print("\nReview count by month and sentiment:")
print(review_summary_by_month)

# Calculate the average rating of reviews for each month
avg_rating_by_month = reviews_data.groupby('Month').agg(Avg_Rating=('Rating', 'mean'))
print("\nAverage rating by month:")
print(avg_rating_by_month)

# Save the dataframe with sentiment column to a CSV file
sentiment_file_path = 'Pilz_R_sentiment.csv'
reviews_data.to_csv(sentiment_file_path, index=False)
print(f"\nData with sentiment saved to {sentiment_file_path}")



Months where revenue is below $40,000 and their respective average temperature:
     Month  Monthly_Revenue  Avg_Temperature
5  2024-06          35793.0        15.433333
6  2024-07          35137.0        13.838710
7  2024-08          22200.0        16.225806

Average rating by sentiment category:
           Avg_Rating
sentiment            
Negative     1.451923
Neutral      3.000000
Positive     4.187500

Review count by month and sentiment:
          Review_Count                 
sentiment     Negative Neutral Positive
Month                                  
2023-01             13       3       15
2023-02             14       4       10
2023-03             10       3       18
2023-04              6       2       22
2023-05             19       0       12
2023-06             11       1       18
2023-07             17       3       11
2023-08             13       2       16
2023-09             19       0       11
2023-10              9       3       19
2023-11             11       1   

### **Summary** ###
- Monthly revenue fell below the $40,000 target in three months: June, July, and August 2024 ($35,793, $35,137, and $22,200), which are winter months in Australia.
- VADER sentiment analysis labeled each review as Positive, Neutral, or Negative from its compound score (at least 0.05 is Positive, at most -0.05 is Negative). Average ratings were about 4.19 for Positive, 3.00 for Neutral, and 1.45 for Negative reviews.
- Monthly review counts by sentiment show how customer sentiment varied over time, and the average rating was also calculated for each month.
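
Raw monthly counts can be hard to compare when the number of reviews differs from month to month. As a minimal sketch, assuming the `Month` and `sentiment` columns created in the code above, the counts could be converted to per-month shares with `pd.crosstab`. The helper name `sentiment_share_by_month` and the example rows are hypothetical and are not taken from the review file.

```python
import pandas as pd

def sentiment_share_by_month(reviews: pd.DataFrame) -> pd.DataFrame:
    """Return the fraction of reviews in each sentiment category per month."""
    # normalize='index' divides each row (month) by its total, so rows sum to 1.0
    return pd.crosstab(reviews['Month'], reviews['sentiment'], normalize='index')

# Placeholder rows for illustration only; NOT real review data
example_reviews = pd.DataFrame({
    'Month': ['2023-01', '2023-01', '2023-01', '2023-01', '2023-02', '2023-02'],
    'sentiment': ['Positive', 'Negative', 'Negative', 'Neutral', 'Positive', 'Positive'],
})

share = sentiment_share_by_month(example_reviews)
print(share.round(2))
```

With the notebook's own data, the call would be `sentiment_share_by_month(reviews_data)` after the `Month` column has been added.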

### **Interpretation** ###
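- All three below-target months fall in the Australian winter, with average temperatures of roughly 14 to 16 °C. This is consistent with the WaterCure product being less appealing in cooler months, although the analysis above does not compute a correlation between temperature and revenue.
- Customer sentiment is mixed: in the months shown, negative and positive reviews both appear in substantial numbers. By examining the sentiment trends, WaterPro could identify when satisfaction dips and act on that feedback.
- Average ratings track the sentiment labels (about 1.5 for Negative versus about 4.2 for Positive), so the negative reviews are a reasonable starting point for identifying product improvement areas.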
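
The relationship between temperature and revenue could be checked directly rather than inferred from the below-target months alone. The following is a minimal sketch, assuming a frame shaped like `monthly_data` with `Monthly_Revenue` and `Avg_Temperature` columns. The helper name `revenue_temperature_correlation` and the example values are hypothetical placeholders, not results from the sales file.

```python
import pandas as pd

def revenue_temperature_correlation(monthly: pd.DataFrame) -> float:
    """Pearson correlation between average temperature and monthly revenue."""
    # Series.corr uses the Pearson method by default
    return monthly['Avg_Temperature'].corr(monthly['Monthly_Revenue'])

# Placeholder values for illustration only; NOT taken from the sales data
example_monthly = pd.DataFrame({
    'Month': ['2024-01', '2024-02', '2024-03', '2024-04'],
    'Monthly_Revenue': [50000.0, 46000.0, 42000.0, 38000.0],
    'Avg_Temperature': [25.0, 23.0, 21.0, 19.0],
})

r = revenue_temperature_correlation(example_monthly)
print(f"Pearson r between temperature and revenue: {r:.2f}")
```

With the notebook's own data, the call would be `revenue_temperature_correlation(monthly_data)`. A value near +1 would support the cooler-months interpretation; with only about a year of monthly points, any such estimate should be read cautiously.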