# Customer Sentiment Analysis - iPhone 15 128GB

## Project Overview

This notebook performs comprehensive sentiment analysis on customer reviews for the iPhone 15 128GB model from Flipkart. The analysis includes:

1. **Data Collection**: Web scraping using Selenium and BeautifulSoup
2. **Data Preprocessing**: Cleaning and preparing text data using Pandas and NLTK
3. **Sentiment Analysis**: Analyzing sentiment using TextBlob
4. **Data Visualization**: Creating insightful visualizations using Matplotlib, Seaborn, and WordCloud

**Author**: Data Analyst Team

**Date**: 2025

**Target**: Analyze 300+ customer reviews to understand public sentiment

## 1. Setup and Imports

Import all necessary libraries and custom modules.

In [None]:
# Standard library imports
import sys
import os
import warnings
warnings.filterwarnings('ignore')

# Add src directory to path
sys.path.append('../src')

# Data manipulation
import pandas as pd
import numpy as np

# Custom modules
from scraper import FlipkartReviewScraper, save_reviews, load_reviews
from preprocessor import ReviewPreprocessor, save_preprocessed_data, load_preprocessed_data
from sentiment_analyzer import SentimentAnalyzer, save_sentiment_results, print_sentiment_report
from visualizer import SentimentVisualizer

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', 100)

print("All libraries imported successfully!")

## 2. Data Collection - Web Scraping

Scrape customer reviews from Flipkart using Selenium and BeautifulSoup.

### 2.1 Configure Scraper

In [None]:
# Flipkart product URL for iPhone 15 128GB
PRODUCT_URL = "https://www.flipkart.com/apple-iphone-15-black-128-gb/p/itm6d16e1cf03604"

# Target number of reviews to scrape
TARGET_REVIEWS = 300

print(f"Target: Scrape {TARGET_REVIEWS} reviews from Flipkart")
print(f"Product URL: {PRODUCT_URL}")

### 2.2 Scrape Reviews

**Note**: This cell can take several minutes to complete because it pages through up to 30 review pages (`max_pages=30`) and extracts the reviews from each one.

**Alternative**: If you have already scraped reviews, skip this cell and load the saved data in the next section.
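
The skip-or-scrape choice described above can also be made explicit in code. The following is a minimal sketch, not part of the project's `scraper` module: `load_or_scrape` is a hypothetical helper, and the scrape and load steps are passed in as callables so the sketch assumes nothing about how the project implements them.

```python
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_scrape(csv_path: str, scrape: Callable[[], T], load: Callable[[str], T]) -> T:
    """Return cached reviews when the CSV exists, otherwise run the scraper.

    Hypothetical helper: `scrape` and `load` are supplied by the caller.
    """
    path = Path(csv_path)
    # Treat a missing or empty file as "no cache available".
    if path.is_file() and path.stat().st_size > 0:
        return load(csv_path)
    return scrape()
```

In this notebook it could be called as `load_or_scrape('../data/raw_reviews.csv', lambda: scraper.scrape_reviews(target_count=TARGET_REVIEWS, max_pages=30), load_reviews)`. Note that the sketch does not save freshly scraped data; `save_reviews` would still need to be called afterwards.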

In [None]:
# Initialize scraper (set headless=False to watch the browser in action)
scraper = FlipkartReviewScraper(PRODUCT_URL, headless=True)

# Scrape reviews
print("Starting web scraping process...\n")
reviews_df = scraper.scrape_reviews(target_count=TARGET_REVIEWS, max_pages=30)

# Save scraped data
if not reviews_df.empty:
    save_reviews(reviews_df, '../data/raw_reviews.csv')
    print(f"\nSuccessfully scraped {len(reviews_df)} reviews!")
else:
    print("\nNo reviews were scraped. Please check the scraper configuration.")

### 2.3 Load Scraped Data

In [None]:
# Load the scraped reviews
reviews_df = load_reviews('../data/raw_reviews.csv')

# Display basic information
print(f"\nDataset Shape: {reviews_df.shape}")
print(f"Columns: {list(reviews_df.columns)}")
print(f"\nFirst 5 reviews:")
reviews_df.head()

### 2.4 Initial Data Exploration

In [None]:
# Data summary
print("Dataset Information:")
print("=" * 60)
reviews_df.info()

print("\nMissing Values:")
print(reviews_df.isnull().sum())

print("\nRating Distribution:")
print(reviews_df['rating'].value_counts().sort_index())

print("\nBasic Statistics:")
reviews_df.describe()

## 3. Data Preprocessing

Clean and preprocess the review text for analysis.
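
The internals of `ReviewPreprocessor` are not shown in this notebook. As a rough illustration of what a text-cleaning step typically involves, the sketch below lowercases a review, removes URLs and non-letter characters, and collapses whitespace. The function name `clean_review_text` and the exact rules are assumptions made for illustration; the project's own pipeline may differ (for example, it may also remove stopwords with NLTK).

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")
WHITESPACE_RE = re.compile(r"\s+")


def clean_review_text(text: str) -> str:
    """Lowercase, drop URLs and non-letter characters, collapse whitespace."""
    text = text.lower()
    text = URL_RE.sub(" ", text)         # remove links before stripping punctuation
    text = NON_LETTER_RE.sub(" ", text)  # digits, punctuation, and emoji become spaces
    return WHITESPACE_RE.sub(" ", text).strip()


print(clean_review_text("Great phone!!  Camera is 10/10 :) https://example.com"))
# prints: great phone camera is
```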

### 3.1 Initialize Preprocessor

In [None]:
# Initialize preprocessor
preprocessor = ReviewPreprocessor()

print("Preprocessor initialized successfully!")

### 3.2 Apply Preprocessing Pipeline

In [None]:
# Preprocess the data
preprocessed_df = preprocessor.preprocess_dataframe(reviews_df)

# Display results
print("\nPreprocessed Dataset Shape:", preprocessed_df.shape)
preprocessed_df.head()

### 3.3 Compare Original vs Preprocessed Text

In [None]:
# Show examples of text preprocessing
print("Examples of Text Preprocessing:")
print("=" * 100)

for i in range(min(3, len(preprocessed_df))):  # avoid IndexError on very small datasets
    print(f"\nExample {i+1}:")
    print(f"Original: {preprocessed_df.iloc[i]['review_text'][:200]}...")
    print(f"Cleaned: {preprocessed_df.iloc[i]['cleaned_text'][:200]}...")
    print(f"Preprocessed: {preprocessed_df.iloc[i]['preprocessed_text'][:200]}...")
    print("-" * 100)

### 3.4 Preprocessing Summary

In [None]:
# Get preprocessing summary
summary = preprocessor.get_preprocessing_summary(preprocessed_df)

print("Preprocessing Summary:")
print("=" * 60)
for key, value in summary.items():
    print(f"{key}: {value}")

# Save preprocessed data
save_preprocessed_data(preprocessed_df, '../data/preprocessed_reviews.csv')

## 4. Sentiment Analysis

Perform sentiment analysis using TextBlob.

### 4.1 Initialize Sentiment Analyzer

In [None]:
# Initialize sentiment analyzer with threshold of 0.1
analyzer = SentimentAnalyzer(polarity_threshold=0.1)

print("Sentiment Analyzer initialized!")
print(f"Polarity threshold: {analyzer.polarity_threshold}")
print(f"Polarity >= {analyzer.polarity_threshold} → Positive")
print(f"Polarity < {analyzer.polarity_threshold} → Negative")
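
The thresholding rule printed above can be written as a one-line function. This is a sketch of the rule as stated, not the actual code inside `SentimentAnalyzer`; `classify_polarity` is a hypothetical name. TextBlob polarity scores fall in the range [-1.0, 1.0].

```python
def classify_polarity(polarity: float, threshold: float = 0.1) -> str:
    """Map a polarity score in [-1.0, 1.0] to a binary sentiment label."""
    return "Positive" if polarity >= threshold else "Negative"


for score in (0.5, 0.1, 0.05, -0.3):
    print(f"{score:+.2f} -> {classify_polarity(score)}")
```

One consequence of a binary rule with a 0.1 cutoff is that neutral or mildly positive text (polarity between 0.0 and 0.1) is labeled Negative, which is worth keeping in mind when reading the negative counts later in the notebook.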

### 4.2 Perform Sentiment Analysis

In [None]:
# Analyze sentiment on the original review text (the 'review_text' column),
# not on the preprocessed text
sentiment_df = analyzer.analyze_dataframe(preprocessed_df, text_column='review_text')

# Display results
print("\nSentiment Analysis Complete!")
print(f"Dataset Shape: {sentiment_df.shape}")
print(f"\nNew columns added: {['polarity', 'subjectivity', 'sentiment', 'sentiment_category']}")

# Show sample results
sentiment_df[['username', 'rating', 'review_text', 'polarity', 'sentiment']].head(10)

### 4.3 Sentiment Analysis Summary

In [None]:
# Get summary statistics
sentiment_summary = analyzer.get_sentiment_summary(sentiment_df)

# Print formatted report
print_sentiment_report(sentiment_summary)

### 4.4 Sentiment Analysis by Rating

In [None]:
# Analyze sentiment by rating
sentiment_by_rating = analyzer.analyze_sentiment_by_rating(sentiment_df)

print("Sentiment Analysis by Rating:")
print("=" * 80)
print(sentiment_by_rating)

# Correlation analysis
correlation = analyzer.analyze_correlation(sentiment_df)
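
The implementation of `analyze_correlation` lives in the project's `sentiment_analyzer` module and is not shown here. Assuming it reports a Pearson coefficient between `rating` and `polarity`, the computation can be sketched in plain Python as follows; `pearson_r` is a hypothetical name used only for illustration.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns NaN when either input has zero variance."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-deviations and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return math.nan  # undefined, e.g. when every review has the same rating
    return cov / math.sqrt(var_x * var_y)
```

The zero-variance case matters in practice: if nearly all scraped reviews carry the same rating, the coefficient is undefined, and any `correlation > 0.7` style comparison on a NaN value evaluates to `False`.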

### 4.5 Extreme Reviews (Most Positive and Negative)

In [None]:
# Get extreme reviews
extreme_reviews = analyzer.get_extreme_reviews(sentiment_df, n=5)

print("Top 5 Most Positive Reviews:")
print("=" * 100)
for idx, row in extreme_reviews['most_positive'].iterrows():
    print(f"\nUser: {row['username']} | Rating: {row['rating']} | Polarity: {row['polarity']:.3f}")
    print(f"Review: {row['review_text'][:200]}...")
    print("-" * 100)

print("\n\nTop 5 Most Negative Reviews:")
print("=" * 100)
for idx, row in extreme_reviews['most_negative'].iterrows():
    print(f"\nUser: {row['username']} | Rating: {row['rating']} | Polarity: {row['polarity']:.3f}")
    print(f"Review: {row['review_text'][:200]}...")
    print("-" * 100)

### 4.6 Save Sentiment Analysis Results

In [None]:
# Save results
save_sentiment_results(sentiment_df, '../data/sentiment_analysis_results.csv')

print("Sentiment analysis results saved successfully!")

## 5. Data Visualization

Create comprehensive visualizations to understand sentiment patterns.

### 5.1 Initialize Visualizer

In [None]:
# Initialize visualizer
visualizer = SentimentVisualizer(figsize=(14, 6))

print("Visualizer initialized successfully!")

### 5.2 Sentiment Distribution

In [None]:
# Plot sentiment distribution
visualizer.plot_sentiment_distribution(sentiment_df)

### 5.3 Sentiment by Rating

In [None]:
# Plot sentiment by rating
visualizer.plot_sentiment_by_rating(sentiment_df)

### 5.4 Polarity Distribution

In [None]:
# Plot polarity distribution
visualizer.plot_polarity_distribution(sentiment_df)

### 5.5 Review Length Analysis

In [None]:
# Plot review length analysis
visualizer.plot_review_length_analysis(sentiment_df)

### 5.6 Word Clouds

In [None]:
# Create word cloud for positive reviews
visualizer.create_wordcloud(sentiment_df, sentiment_type='Positive')

In [None]:
# Create word cloud for negative reviews
visualizer.create_wordcloud(sentiment_df, sentiment_type='Negative')

### 5.7 Detailed Sentiment Categories

In [None]:
# Plot sentiment categories
visualizer.plot_sentiment_categories(sentiment_df)

## 6. Key Insights and Findings

### 6.1 Overall Sentiment Analysis

In [None]:
# Calculate key metrics
total_reviews = len(sentiment_df)
positive_count = len(sentiment_df[sentiment_df['sentiment'] == 'Positive'])
negative_count = len(sentiment_df[sentiment_df['sentiment'] == 'Negative'])
avg_rating = sentiment_df['rating'].mean()
avg_polarity = sentiment_df['polarity'].mean()

print("KEY INSIGHTS - iPhone 15 128GB Customer Sentiment")
print("=" * 80)
print(f"\n1. Overall Sentiment Distribution:")
print(f"   - Total Reviews Analyzed: {total_reviews}")
print(f"   - Positive Reviews: {positive_count} ({positive_count/total_reviews*100:.1f}%)")
print(f"   - Negative Reviews: {negative_count} ({negative_count/total_reviews*100:.1f}%)")
print(f"\n2. Average Metrics:")
print(f"   - Average Rating: {avg_rating:.2f} / 5.0")
print(f"   - Average Sentiment Polarity: {avg_polarity:.3f}")
print(f"\n3. Rating-Sentiment Correlation:")
print(f"   - Correlation Coefficient: {correlation:.3f}")

if correlation > 0.7:
    print("   - Strong positive correlation: Higher ratings align with positive sentiment")
elif correlation > 0.4:
    print("   - Moderate positive correlation: Some alignment between ratings and sentiment")
else:
    print("   - Weak correlation: Ratings may not fully reflect sentiment")

### 6.2 Common Topics in Positive Reviews

In [None]:
# Extract common words from positive reviews
from collections import Counter

positive_reviews = sentiment_df[sentiment_df['sentiment'] == 'Positive']['preprocessed_text']
positive_words = ' '.join(positive_reviews.astype(str)).split()
positive_word_freq = Counter(positive_words).most_common(20)

print("Top 20 Words in Positive Reviews:")
print("=" * 60)
for word, freq in positive_word_freq:
    print(f"{word:20s} : {freq:4d}")

### 6.3 Common Topics in Negative Reviews

In [None]:
# Extract common words from negative reviews
from collections import Counter  # repeated so this cell also runs on its own
negative_reviews = sentiment_df[sentiment_df['sentiment'] == 'Negative']['preprocessed_text']
negative_words = ' '.join(negative_reviews.astype(str)).split()
negative_word_freq = Counter(negative_words).most_common(20)

print("Top 20 Words in Negative Reviews:")
print("=" * 60)
for word, freq in negative_word_freq:
    print(f"{word:20s} : {freq:4d}")
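
Raw top-20 lists for positive and negative reviews often share the same generic words (such as the product name). One possible refinement, offered here as a sketch rather than as part of the project, is to rank words by how much more often they occur in one group than in the other. `distinctive_words`, `top_n`, and `min_count` are hypothetical names introduced for this example.

```python
from collections import Counter
from typing import List


def distinctive_words(target: Counter, other: Counter,
                      top_n: int = 20, min_count: int = 2) -> List[str]:
    """Rank words by the ratio of their relative frequency in `target` vs `other`.

    Add-one smoothing on the `other` side avoids division by zero
    for words that never appear there.
    """
    target_total = sum(target.values())
    other_total = sum(other.values())
    scores = {}
    for word, count in target.items():
        if count < min_count:
            continue  # skip rare words, whose ratios are noisy
        target_rate = count / target_total
        other_rate = (other.get(word, 0) + 1) / (other_total + 1)
        scores[word] = target_rate / other_rate
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

With the variables from the cells above, a call would look like `distinctive_words(Counter(negative_words), Counter(positive_words))`.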

## 7. Recommendations

Based on the sentiment analysis results, here are actionable recommendations:

In [None]:
print("RECOMMENDATIONS FOR FLIPKART - iPhone 15 128GB")
print("=" * 80)

# Determine recommendation based on sentiment
positive_percentage = (positive_count / total_reviews) * 100

print("\n1. Product Performance:")
if positive_percentage > 70:
    print("   - Overall customer sentiment is HIGHLY POSITIVE")
    print("   - Product is well-received by customers")
    print("   - Continue emphasizing positive features in marketing")
elif positive_percentage > 50:
    print("   - Overall customer sentiment is MODERATELY POSITIVE")
    print("   - Product has good reception but room for improvement")
    print("   - Address common negative feedback points")
else:
    print("   - Overall customer sentiment is MIXED or NEGATIVE")
    print("   - Significant improvements needed")
    print("   - Investigate and address major customer concerns")

print("\n2. Marketing Strategy:")
print("   - Highlight most frequently mentioned positive features")
print("   - Use positive customer testimonials in campaigns")
print("   - Address common concerns proactively in product descriptions")

print("\n3. Customer Experience Improvements:")
print("   - Focus on issues mentioned in negative reviews")
print("   - Provide better product information to set accurate expectations")
print("   - Improve post-purchase support for identified pain points")

print("\n4. Product Page Optimization:")
print("   - Feature verified positive reviews prominently")
print("   - Create FAQ section addressing common concerns")
print("   - Use customer language in product descriptions")

print("\n5. Inventory and Pricing Strategy:")
if avg_rating >= 4.0:
    print("   - High customer satisfaction supports premium positioning")
    print("   - Ensure adequate inventory to meet demand")
else:
    print("   - Consider promotional strategies to address sentiment concerns")
    print("   - Bundle with accessories to enhance value perception")

print("\n" + "=" * 80)

## 8. Export Final Report

In [None]:
# Create comprehensive visualization dashboard
print("Creating comprehensive visualization dashboard...")
visualizer.create_comprehensive_dashboard(sentiment_df, output_dir='../data/visualizations')

print("\nAll analysis complete!")
print("\nGenerated Files:")
print("  - ../data/raw_reviews.csv")
print("  - ../data/preprocessed_reviews.csv")
print("  - ../data/sentiment_analysis_results.csv")
print("  - ../data/visualizations/ (all visualization files)")

## 9. Conclusion

This notebook analyzed Flipkart customer reviews of the iPhone 15 128GB to gauge customer perception and satisfaction. The analysis workflow included:

1. **Web Scraping**: Collected customer reviews (target: 300+) using Selenium and BeautifulSoup
2. **Data Preprocessing**: Cleaned and prepared review text using Pandas and NLTK
3. **Sentiment Analysis**: Applied TextBlob to classify reviews and extract sentiment scores
4. **Visualization**: Created comprehensive visual representations of sentiment patterns
5. **Insights Generation**: Derived actionable insights and recommendations

The results can be used by Flipkart's product and marketing teams to:
- Understand customer sentiment trends
- Identify product strengths and weaknesses
- Optimize product positioning and marketing strategies
- Improve customer experience based on feedback
- Make data-driven decisions about inventory and pricing

### Next Steps:
1. Monitor sentiment trends over time with periodic analysis
2. Compare sentiment across different iPhone models
3. Analyze sentiment by customer demographics if data available
4. Implement real-time sentiment monitoring dashboard
5. Integrate sentiment analysis into product decision-making process