# Sample Input/Output Demo - Customer Sentiment Analysis

This notebook runs the sentiment analysis system on a small set of sample reviews.
It does not require web scraping, so you can run it end to end and see what each stage of the system does.

## 1. Setup

In [None]:
import sys
import warnings
warnings.filterwarnings('ignore')

sys.path.append('../src')

import pandas as pd
from preprocessor import ReviewPreprocessor
from sentiment_analyzer import SentimentAnalyzer, print_sentiment_report
from visualizer import SentimentVisualizer

print("Libraries imported successfully!")

## 2. Sample Input Data

The cell below builds a DataFrame from 20 sample reviews, each with a username, a star rating (1.0 to 5.0), and the review text.

In [None]:
# Sample review data
sample_reviews = [
    {"username": "Rahul K", "rating": 5.0, "review_text": "Excellent phone! The camera quality is outstanding and battery life is amazing. Very happy with this purchase."},
    {"username": "Priya S", "rating": 5.0, "review_text": "Best iPhone ever! Super fast performance, beautiful display, and the build quality is premium. Highly recommended!"},
    {"username": "Amit P", "rating": 4.0, "review_text": "Good phone with great features. Camera is really good but the price is a bit high. Overall satisfied."},
    {"username": "Sneha M", "rating": 5.0, "review_text": "Love this phone! The design is sleek and elegant. Face ID works perfectly and iOS is very smooth."},
    {"username": "Vikram T", "rating": 2.0, "review_text": "Disappointed with battery backup. Phone heats up during gaming. Expected better for this price point."},
    {"username": "Anjali R", "rating": 5.0, "review_text": "Amazing product! Fast delivery, genuine product. Camera quality in low light is superb. Worth every penny."},
    {"username": "Karthik N", "rating": 4.0, "review_text": "Very good phone. Performance is excellent. Only issue is the battery drains quickly with heavy usage."},
    {"username": "Divya L", "rating": 5.0, "review_text": "Perfect phone! The A16 chip is lightning fast. Love the new camera features and the video quality is incredible."},
    {"username": "Rohan G", "rating": 3.0, "review_text": "Decent phone but overpriced. Battery life is average. Camera is good but not a huge upgrade from iPhone 14."},
    {"username": "Meera B", "rating": 5.0, "review_text": "Absolutely love it! The dynamic island is so cool and useful. Best purchase of the year!"},
    {"username": "Arjun D", "rating": 1.0, "review_text": "Very bad experience. Phone has heating issues and battery drains within 4-5 hours. Not worth the money."},
    {"username": "Pooja V", "rating": 4.0, "review_text": "Good phone overall. Display is vibrant, performance is smooth. Wish it had better battery backup."},
    {"username": "Sanjay K", "rating": 5.0, "review_text": "Excellent build quality and premium feel. Camera is top-notch. Very happy with the purchase!"},
    {"username": "Nisha A", "rating": 2.0, "review_text": "Not satisfied. Phone gets hot during video calls. Battery life is poor compared to android phones in same price."},
    {"username": "Aditya M", "rating": 5.0, "review_text": "Best smartphone I've owned! Super fast, amazing camera, and the ecosystem is unbeatable. Love it!"},
    {"username": "Kavya S", "rating": 4.0, "review_text": "Nice phone with good features. Camera quality is excellent. Storage could have been more for the price."},
    {"username": "Deepak R", "rating": 5.0, "review_text": "Outstanding phone! The display is gorgeous, performance is blazing fast. Totally worth the investment."},
    {"username": "Swati P", "rating": 3.0, "review_text": "Average experience. Phone is good but expected more. Battery backup is not great. Camera is the only saving grace."},
    {"username": "Nikhil J", "rating": 5.0, "review_text": "Superb phone! Everything works perfectly. Great camera, smooth performance, and elegant design. Highly recommend!"},
    {"username": "Riya C", "rating": 4.0, "review_text": "Very good phone. Love the camera and display. Battery could be better but overall a great purchase."}
]

# Create DataFrame
df = pd.DataFrame(sample_reviews)

print(f"Created sample dataset with {len(df)} reviews\n")
print("Sample Input Data:")
print("=" * 100)
df
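
The later steps read the `username`, `rating`, and `review_text` fields. If you replace the sample data with your own reviews, a quick check like the one below can catch malformed records before preprocessing. This is a minimal sketch: `validate_reviews` is a hypothetical helper, not part of the project's `src` modules, and the 1.0 to 5.0 rating range is an assumption taken from the sample data.

```python
# Hypothetical input check: assumes every review needs these three fields.
REQUIRED_FIELDS = {"username", "rating", "review_text"}


def validate_reviews(reviews, min_rating=1.0, max_rating=5.0):
    """Return a list of (index, problem) tuples; an empty list means every record passed."""
    problems = []
    for i, review in enumerate(reviews):
        missing = REQUIRED_FIELDS - set(review)
        if missing:
            problems.append((i, f"missing fields: {sorted(missing)}"))
            continue  # the remaining checks need all fields present
        if not min_rating <= review["rating"] <= max_rating:
            problems.append((i, f"rating out of range: {review['rating']}"))
        if not str(review["review_text"]).strip():
            problems.append((i, "empty review_text"))
    return problems
```

Calling `validate_reviews(sample_reviews)` on the data above returns an empty list.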

## 3. Data Preprocessing Demo

Clean and preprocess the review text with `ReviewPreprocessor`, then compare the original, cleaned, and preprocessed versions of a few reviews.

In [None]:
# Initialize preprocessor
preprocessor = ReviewPreprocessor()

# Preprocess the data
print("Preprocessing sample reviews...\n")
processed_df = preprocessor.preprocess_dataframe(df.copy())

print("\n" + "=" * 100)
print("PREPROCESSING OUTPUT EXAMPLES")
print("=" * 100)

# Show up to 3 examples of preprocessing
for i in range(min(3, len(processed_df))):
    print(f"\nExample {i+1}:")
    print("-" * 100)
    print(f"Original Text:\n{processed_df.iloc[i]['review_text']}")
    print(f"\nCleaned Text:\n{processed_df.iloc[i]['cleaned_text']}")
    print(f"\nPreprocessed (Tokenized, Stop words removed, Lemmatized):\n{processed_df.iloc[i]['preprocessed_text']}")
    print("-" * 100)

# Show summary
summary = preprocessor.get_preprocessing_summary(processed_df)
print("\n" + "=" * 100)
print("PREPROCESSING SUMMARY")
print("=" * 100)
for key, value in summary.items():
    print(f"{key}: {value}")
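
This notebook does not show how `ReviewPreprocessor` works internally. The sketch below only illustrates the kind of transformation behind the `cleaned_text` and `preprocessed_text` columns: cleaning, tokenization, stop-word removal, and a normalization step. The stop-word list is a tiny hypothetical one, and `naive_lemma` is a crude plural-stripping rule standing in for a real dictionary-based lemmatizer, so its output will not match `preprocessed_text` exactly.

```python
import re

# Tiny illustrative stop-word list (an assumption); real pipelines use a much larger one.
STOP_WORDS = {"the", "is", "and", "a", "an", "with", "this", "it", "very", "but", "for", "of", "in"}


def clean_text(text):
    """Lowercase, replace every non-letter character with a space, and collapse whitespace."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def naive_lemma(token):
    """Crude stand-in for lemmatization: drop a trailing plural 's' (but not 'ss')."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def preprocess(text):
    """Clean, tokenize on whitespace, remove stop words, and normalize each token."""
    tokens = clean_text(text).split()
    return " ".join(naive_lemma(t) for t in tokens if t not in STOP_WORDS)
```

For example, `preprocess("Excellent phone! The camera quality is outstanding.")` returns `'excellent phone camera quality outstanding'`. Many standard stop-word lists include negations such as "not", so preprocessed text can lose words that matter for sentiment; that is a plausible reason Section 4 scores the original `review_text` instead.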

## 4. Sentiment Analysis Demo

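Score each review with `SentimentAnalyzer`. For every review it adds a `polarity` score, a `subjectivity` score, a `sentiment` label, and a `sentiment_category`. Note that the analyzer scores the original `review_text`, not the preprocessed text.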

In [None]:
# Initialize sentiment analyzer
analyzer = SentimentAnalyzer(polarity_threshold=0.1)

# Perform sentiment analysis
print("Performing sentiment analysis...\n")
sentiment_df = analyzer.analyze_dataframe(processed_df, text_column='review_text')

print("=" * 100)
print("SENTIMENT ANALYSIS OUTPUT")
print("=" * 100)
print("\nReviews with Sentiment Scores:\n")
sentiment_df[['username', 'rating', 'review_text', 'polarity', 'subjectivity', 'sentiment', 'sentiment_category']].head(10)
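
TextBlob reports polarity in [-1.0, 1.0] and subjectivity in [0.0, 1.0]. How `SentimentAnalyzer` maps polarity to the `sentiment` label is not shown here. One common scheme, sketched below as an assumption, treats `polarity_threshold` as a symmetric neutral band around zero; whether the band's edges count as neutral is also an assumption.

```python
def classify_polarity(polarity, threshold=0.1):
    """Map a polarity score in [-1, 1] to a label using a symmetric neutral band.

    Scores strictly above +threshold are Positive, strictly below -threshold are
    Negative, and everything in between (edges included) is Neutral.
    """
    if polarity > threshold:
        return "Positive"
    if polarity < -threshold:
        return "Negative"
    return "Neutral"
```

For example, `[classify_polarity(p) for p in (0.65, 0.05, -0.3)]` gives `['Positive', 'Neutral', 'Negative']`.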

In [None]:
# Get sentiment summary (a separate name keeps the preprocessing summary intact)
sentiment_summary = analyzer.get_sentiment_summary(sentiment_df)
print_sentiment_report(sentiment_summary)

In [None]:
# Show sentiment by rating
print("\nSentiment Analysis by Rating:")
print("=" * 100)
sentiment_by_rating = analyzer.analyze_sentiment_by_rating(sentiment_df)
sentiment_by_rating
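
The table from `analyze_sentiment_by_rating` is computed inside the project code. A plain cross-tabulation of star rating against sentiment label is one way to sanity-check it; in pandas, `pd.crosstab(sentiment_df['rating'], sentiment_df['sentiment'])` builds such a table. The stdlib sketch below shows the underlying counting; `sentiment_counts_by_rating` is a hypothetical helper.

```python
from collections import Counter, defaultdict


def sentiment_counts_by_rating(rows):
    """Count sentiment labels per star rating.

    rows: iterable of (rating, sentiment) pairs.
    Returns {rating: Counter({label: count})}, ordered by ascending rating.
    """
    table = defaultdict(Counter)
    for rating, sentiment in rows:
        table[rating][sentiment] += 1
    return dict(sorted(table.items()))
```

With a DataFrame, the pairs can come from `zip(sentiment_df['rating'], sentiment_df['sentiment'])`. A Counter returns 0 for labels that never occur at a given rating, which makes empty cells easy to spot.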

In [None]:
# Show extreme reviews
extreme_reviews = analyzer.get_extreme_reviews(sentiment_df, n=3)

print("\n" + "=" * 100)
print("TOP 3 MOST POSITIVE REVIEWS")
print("=" * 100)
for idx, row in extreme_reviews['most_positive'].iterrows():
    print(f"\nUser: {row['username']}")
    print(f"Rating: {row['rating']} | Polarity: {row['polarity']:.3f} | Sentiment: {row['sentiment']}")
    print(f"Review: {row['review_text']}")
    print("-" * 100)

print("\n" + "=" * 100)
print("TOP 3 MOST NEGATIVE REVIEWS")
print("=" * 100)
for idx, row in extreme_reviews['most_negative'].iterrows():
    print(f"\nUser: {row['username']}")
    print(f"Rating: {row['rating']} | Polarity: {row['polarity']:.3f} | Sentiment: {row['sentiment']}")
    print(f"Review: {row['review_text']}")
    print("-" * 100)
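
`get_extreme_reviews` presumably ranks reviews by polarity. Assuming that, the same selection can be reproduced in pandas with `sentiment_df.nlargest(3, 'polarity')` and `sentiment_df.nsmallest(3, 'polarity')`, or outside pandas with `heapq`, as in this sketch (`pick_extremes` is a hypothetical name).

```python
import heapq
from operator import itemgetter


def pick_extremes(reviews, n=3):
    """Return the n highest- and n lowest-polarity reviews.

    reviews: list of dicts that each contain a 'polarity' key.
    """
    by_polarity = itemgetter("polarity")
    return {
        "most_positive": heapq.nlargest(n, reviews, key=by_polarity),
        "most_negative": heapq.nsmallest(n, reviews, key=by_polarity),
    }
```

With fewer than 2n reviews, the two lists necessarily share some reviews.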

## 5. Visualization Demo

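Plot the results with `SentimentVisualizer`: the sentiment distribution, sentiment by rating, the polarity distribution, and word clouds for positive and negative reviews.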

In [None]:
# Initialize visualizer
visualizer = SentimentVisualizer(figsize=(12, 5))

# Sentiment distribution
print("Creating sentiment distribution visualization...")
visualizer.plot_sentiment_distribution(sentiment_df)

In [None]:
# Sentiment by rating
print("Creating sentiment by rating visualization...")
visualizer.plot_sentiment_by_rating(sentiment_df)

In [None]:
# Polarity distribution
print("Creating polarity distribution visualization...")
visualizer.plot_polarity_distribution(sentiment_df)
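
A polarity distribution plot is essentially a histogram of the `polarity` column. The sketch below shows the binning step with equal-width bins over [-1, 1]. The bin count is an arbitrary choice, and `polarity_histogram` is a hypothetical helper, not a `SentimentVisualizer` method. Like `numpy.histogram`, it closes the last bin on the right so that a score of exactly 1.0 is counted.

```python
def polarity_histogram(polarities, bins=4, low=-1.0, high=1.0):
    """Count polarity scores into equal-width bins over [low, high].

    Returns a list of ((bin_start, bin_end), count) tuples. Values outside
    [low, high] are ignored; a value equal to `high` lands in the last bin.
    """
    width = (high - low) / bins
    counts = [0] * bins
    for p in polarities:
        if not low <= p <= high:
            continue  # skip out-of-range scores
        index = min(int((p - low) / width), bins - 1)
        counts[index] += 1
    return [((low + i * width, low + (i + 1) * width), c) for i, c in enumerate(counts)]
```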

In [None]:
# Word clouds
print("Creating word cloud for positive reviews...")
visualizer.create_wordcloud(sentiment_df, sentiment_type='Positive')

In [None]:
print("Creating word cloud for negative reviews...")
visualizer.create_wordcloud(sentiment_df, sentiment_type='Negative')

## 6. Key Insights from Sample Data

In [None]:
# Calculate insights
total_reviews = len(sentiment_df)
positive_count = (sentiment_df['sentiment'] == 'Positive').sum()
negative_count = (sentiment_df['sentiment'] == 'Negative').sum()
avg_rating = sentiment_df['rating'].mean()
avg_polarity = sentiment_df['polarity'].mean()
# Pearson correlation (the default method of Series.corr)
correlation = sentiment_df['rating'].corr(sentiment_df['polarity'])

print("=" * 100)
print("KEY INSIGHTS FROM SAMPLE DATA")
print("=" * 100)
print("\n1. Dataset Overview:")
print(f"   Total Reviews: {total_reviews}")
print(f"   Positive Reviews: {positive_count} ({positive_count/total_reviews*100:.1f}%)")
print(f"   Negative Reviews: {negative_count} ({negative_count/total_reviews*100:.1f}%)")

print("\n2. Average Metrics:")
print(f"   Average Rating: {avg_rating:.2f} / 5.0")
print(f"   Average Polarity: {avg_polarity:.3f}")

print("\n3. Rating-Sentiment Correlation:")
print(f"   Correlation Coefficient: {correlation:.3f}")
# Judge strength by |r| so a strong negative correlation is not reported as "weak"
direction = "positive" if correlation > 0 else "negative"
if pd.isna(correlation):
    print("   Correlation is undefined (ratings or polarities are constant)")
elif abs(correlation) > 0.7:
    print(f"   Strong {direction} correlation between ratings and sentiment")
elif abs(correlation) > 0.4:
    print(f"   Moderate {direction} correlation between ratings and sentiment")
else:
    print("   Weak correlation between ratings and sentiment")

print(f"\n4. Sentiment Categories:")
for category, count in sentiment_df['sentiment_category'].value_counts().items():
    print(f"   {category}: {count} ({count/total_reviews*100:.1f}%)")

print("\n" + "=" * 100)
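
`Series.corr` computes the Pearson coefficient by default, $r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}}$. The sketch below implements that formula directly and labels strength by $|r|$ with the 0.7 and 0.4 cut-offs used above, so a strong negative correlation is labeled as such. Both functions are illustrative helpers, not part of the project code.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Returns math.nan when either sequence has zero variance, since r is undefined then.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return math.nan
    return cov / math.sqrt(var_x * var_y)


def describe_strength(r):
    """Label a correlation by |r|: strong (> 0.7), moderate (> 0.4), otherwise weak."""
    if math.isnan(r):
        return "undefined"
    if abs(r) <= 0.4:
        return "weak"
    strength = "strong" if abs(r) > 0.7 else "moderate"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"
```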

## 7. Common Words Analysis

In [None]:
from collections import Counter

def top_words(texts, n=10):
    """Return the n most frequent tokens across a Series of preprocessed texts."""
    tokens = []
    for text in texts.dropna():
        # Handle both space-joined strings and token lists
        tokens.extend(text if isinstance(text, list) else str(text).split())
    return Counter(tokens).most_common(n)

# Word frequency for positive and negative reviews
for label in ['Positive', 'Negative']:
    texts = sentiment_df.loc[sentiment_df['sentiment'] == label, 'preprocessed_text']
    print(f"\nTop 10 Words in {label} Reviews:")
    print("=" * 60)
    if texts.empty:
        print("(no reviews in this category)")
        continue
    for word, freq in top_words(texts):
        print(f"{word:20s} : {freq:3d}")

## Conclusion

This demo covers:

1. **Input**: Sample customer reviews with username, rating, and review text
2. **Preprocessing**: Text cleaning, tokenization, stop word removal, and lemmatization
3. **Sentiment Analysis**: Polarity scoring and sentiment classification using TextBlob
4. **Output**: Sentiment scores, classifications, and detailed analytics
5. **Visualizations**: Multiple charts showing sentiment patterns and distributions
6. **Insights**: Summary statistics, the correlation between rating and polarity, and the most frequent words in positive and negative reviews

The same pipeline can be applied to the 300+ reviews collected by web scraping in the main notebook.