# NLP Sentiment Analysis - Quick Start Guide

**Author:** Gabriel Demetrios Lafis

This notebook demonstrates how to use the sentiment analysis pipeline with transformer models.

## 1. Setup and Imports

In [None]:
import sys
sys.path.append('..')

from src.models.sentiment_analyzer import SentimentAnalyzer, BaselineSentimentAnalyzer
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Set style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

## 2. Initialize Models

In [None]:
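# Initialize transformer-based analyzer
# Note: this SST-2 checkpoint is trained on binary labels (positive/negative),
# so neutral-sounding texts are still assigned one of the two classes
# unless SentimentAnalyzer adds its own neutral handling.
analyzer = SentimentAnalyzer(model_name='distilbert-base-uncased-finetuned-sst-2-english')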

# Initialize baseline VADER analyzer
baseline = BaselineSentimentAnalyzer()

print(f"Transformer Model: {analyzer.model_name}")
print(f"Device: {analyzer.device}")

## 3. Single Text Analysis

In [None]:
# Example text
text = "This product exceeded all my expectations! The quality is outstanding and delivery was fast."

# Analyze with transformer model
result = analyzer.predict(text, return_all_scores=True)

print(f"Text: {text}\n")
print(f"Sentiment: {result['sentiment']}")
print(f"Confidence: {result['confidence']:.2%}")
print("\nAll Scores:")
for sentiment, score in result['scores'].items():
    print(f"  {sentiment}: {score:.4f}")
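
The `confidence` and `scores` values printed above are presumably class probabilities. The following is a minimal sketch of how such values are commonly derived, assuming the analyzer applies a softmax to the model's raw logits (the usual convention for classification heads, but not confirmed by this notebook). The `softmax` helper and the example logits are hypothetical.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the two SST-2 labels (not real model output).
labels = ["NEGATIVE", "POSITIVE"]
logits = [-2.0, 3.0]

probs = softmax(logits)
scores = dict(zip(labels, probs))

# The predicted sentiment is the label with the highest probability,
# and that probability is reported as the confidence.
sentiment = max(scores, key=scores.get)
confidence = scores[sentiment]

print(f"Sentiment: {sentiment}")
print(f"Confidence: {confidence:.2%}")
```

Under this assumption, a confidence near 50% on a two-class model means the logits were nearly equal, which is worth keeping in mind when reading the scores for mixed or neutral text.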

## 4. Batch Analysis

In [None]:
# Sample texts
texts = [
    "This is the best product I've ever bought!",
    "Terrible quality. Complete waste of money.",
    "It's okay, nothing special.",
    "Amazing customer service and fast shipping!",
    "Very disappointed with this purchase.",
    "Average product, meets basic expectations."
]

# Analyze batch
results = analyzer.predict(texts, return_all_scores=True)

# Create DataFrame
df = pd.DataFrame([
    {
        'text': text,
        'sentiment': result['sentiment'],
        'confidence': result['confidence']
    }
    for text, result in zip(texts, results)
])

print(df.to_string(index=False))
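
Batch inference generally works by splitting the input list into fixed-size chunks and running the model once per chunk. Below is a minimal sketch of that chunking step. `chunked` is a hypothetical helper, not part of `SentimentAnalyzer`, whose internals this notebook does not show.

```python
def chunked(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Seven hypothetical inputs split into batches of three.
sample_texts = [f"review {i}" for i in range(7)]
batches = list(chunked(sample_texts, batch_size=3))
batch_sizes = [len(b) for b in batches]
print(batch_sizes)  # prints [3, 3, 1]
```

The last batch is simply shorter when the input length is not a multiple of the batch size, so no padding of the input list is needed at this level.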

## 5. Model Comparison

In [None]:
# Compare transformer vs VADER
test_text = "The product is good but the delivery was slow."

transformer_result = analyzer.predict(test_text, return_all_scores=True)
vader_result = baseline.predict(test_text)

print(f"Text: {test_text}\n")
print("Transformer Model:")
print(f"  Sentiment: {transformer_result['sentiment']}")
print(f"  Confidence: {transformer_result['confidence']:.2%}")
print("\nVADER Baseline:")
print(f"  Sentiment: {vader_result['sentiment']}")
print(f"  Confidence: {vader_result['confidence']:.2%}")
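
One reason the two models can disagree on mixed text is that VADER has an explicit neutral band. VADER reports a `compound` score in [-1, 1], and its documentation suggests labeling scores of 0.05 or above as positive, -0.05 or below as negative, and anything in between as neutral. The sketch below shows that mapping. Whether `BaselineSentimentAnalyzer` uses these exact thresholds is an assumption, and `label_from_compound` is a hypothetical helper.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER-style compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Hypothetical compound scores, not real VADER output.
for score in (0.62, 0.01, -0.40):
    print(f"{score:+.2f} -> {label_from_compound(score)}")
```

A sentence with both praise and criticism can land near zero on this scale, while a binary transformer classifier has to commit to one side.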

## 6. Visualization

In [None]:
# Visualize sentiment distribution
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

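# Sentiment counts
sentiment_counts = df['sentiment'].value_counts()
# Color each bar by its label rather than by position: value_counts() orders by frequency,
# and a binary model may return only two labels. Label names are assumed; unknown labels fall back to blue.
palette = {'negative': 'red', 'neutral': 'gray', 'positive': 'green'}
bar_colors = [palette.get(str(label).lower(), 'steelblue') for label in sentiment_counts.index]
axes[0].bar(sentiment_counts.index, sentiment_counts.values, color=bar_colors)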
axes[0].set_title('Sentiment Distribution', fontsize=14, fontweight='bold')
axes[0].set_xlabel('Sentiment')
axes[0].set_ylabel('Count')

# Confidence distribution
axes[1].hist(df['confidence'], bins=10, color='skyblue', edgecolor='black')
axes[1].set_title('Confidence Score Distribution', fontsize=14, fontweight='bold')
axes[1].set_xlabel('Confidence')
axes[1].set_ylabel('Frequency')

plt.tight_layout()
plt.show()

## 7. Prediction Explanation

In [None]:
# Get explanation for a prediction
text_to_explain = "This product is absolutely fantastic and exceeded my expectations!"

explanation = analyzer.explain_prediction(text_to_explain)

print(f"Text: {text_to_explain}\n")
print(f"Predicted Sentiment: {explanation['sentiment']}")
print(f"Confidence: {explanation['confidence']:.2%}\n")

if 'top_tokens' in explanation:
    print("Most Important Tokens:")
    for i, token_info in enumerate(explanation['top_tokens'][:5], 1):
        print(f"  {i}. '{token_info['token']}' - Importance: {token_info['importance']:.4f}")
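
This notebook does not show how `explain_prediction` computes token importance; attention weights, gradients, and occlusion are all common choices. As an illustration of one of them, here is a minimal sketch of occlusion (leave-one-out) importance, where a token's importance is the drop in score when it is removed. The `toy_score` lexicon scorer is a hypothetical stand-in for a real model, and `occlusion_importance` is not part of `SentimentAnalyzer`.

```python
def toy_score(tokens):
    """Hypothetical stand-in for a model: sum of weights from a tiny lexicon."""
    weights = {"fantastic": 2.0, "exceeded": 1.0}
    return sum(weights.get(t.lower().strip("!.,"), 0.0) for t in tokens)

def occlusion_importance(tokens, score_fn):
    """Rank tokens by how much the score drops when each one is removed."""
    base = score_fn(tokens)
    importances = []
    for i, tok in enumerate(tokens):
        reduced = tokens[:i] + tokens[i + 1:]  # the input without token i
        importances.append({"token": tok, "importance": base - score_fn(reduced)})
    # Highest importance first, matching the 'top_tokens' layout used above.
    return sorted(importances, key=lambda d: d["importance"], reverse=True)

tokens = "This product is absolutely fantastic and exceeded my expectations!".split()
top_tokens = occlusion_importance(tokens, toy_score)

for i, info in enumerate(top_tokens[:3], 1):
    print(f"  {i}. '{info['token']}' - Importance: {info['importance']:.4f}")
```

Occlusion needs one extra model call per token, so it is simple and model-agnostic but slow for long inputs.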

## 8. Performance Benchmarking

In [None]:
import time

# Benchmark inference time
test_texts = ["This is a test sentence."] * 100

# Transformer model (perf_counter is a monotonic, high-resolution clock suited to timing)
start = time.perf_counter()
_ = analyzer.batch_predict(test_texts, batch_size=32, show_progress=False)
transformer_time = time.perf_counter() - start

# VADER baseline
start = time.perf_counter()
_ = baseline.predict(test_texts)
vader_time = time.perf_counter() - start

print("Processing 100 texts:")
print(f"  Transformer: {transformer_time:.3f}s ({transformer_time/100*1000:.2f}ms per text)")
print(f"  VADER: {vader_time:.3f}s ({vader_time/100*1000:.2f}ms per text)")
print(f"  Slowdown (Transformer vs VADER): {transformer_time/vader_time:.2f}x")
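
A single timed run can be skewed by one-off costs such as lazy initialization or caching. The sketch below shows a slightly more robust pattern: discard a warm-up run, repeat the measurement, and report the median. The `benchmark` helper and the `fake_predict` workload are hypothetical; in this notebook the callable would be something like `baseline.predict`.

```python
import statistics
import time

def benchmark(fn, *args, repeats=5, warmup=1):
    """Return the median wall-clock time in seconds of calling fn(*args)."""
    for _ in range(warmup):
        fn(*args)  # warm-up runs are executed but not timed
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)

# Hypothetical workload standing in for a model call.
def fake_predict(texts):
    return [len(t) for t in texts]

median_s = benchmark(fake_predict, ["This is a test sentence."] * 100)
print(f"Median: {median_s * 1000:.3f}ms for 100 texts")
```

The median is less sensitive than the mean to an occasional slow run, which matters when the per-call time is in the millisecond range.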

## Conclusion

This notebook demonstrated:
- Single and batch sentiment analysis
- Model comparison (Transformer vs VADER)
- Visualization of results
- Prediction explanation
- Performance benchmarking

Transformer-based models typically provide higher accuracy than lexicon-based methods like VADER, at the cost of increased inference time. Note that this notebook measures only inference time; accuracy is not evaluated here.