# CommentWorks Usage Demo

This notebook demonstrates how to use CommentWorks for theme detection and assignment with both Python lists and pandas DataFrames.

**What you'll learn:**
- Basic theme detection with Python lists
- Theme assignment (single and batch)
- Working with pandas DataFrames
- Best practices for large datasets

In [None]:
import commentworks as cw
import pandas as pd

# Initialize the model (downloads automatically on first use, ~500MB)
print("Initializing commentworks model...")
model = cw.commentworks()
print("Model loaded!")

## Part 1: Basic Usage with Python Lists

CommentWorks works with plain Python lists; pandas is not required.

### Example 1: Detect Themes

Let's analyze some restaurant reviews to discover common themes.

In [None]:
# Sample restaurant reviews
reviews = [
    "The pasta was perfectly cooked and the sauce was incredible. However, we waited nearly 40 minutes for our appetizers.",
    "Our waiter was attentive and gave great wine recommendations. The ambiance was romantic and cozy.",
    "Food was cold when it arrived at our table. Manager comped our meal and apologized profusely.",
    "Portions are huge, great value for money. The decor is a bit dated but who cares when the food is this good.",
    "Reservation system is a nightmare, tried calling for three days. Once we got in, the food was just okay.",
]

print(f"Analyzing {len(reviews)} reviews...\n")

# Detect themes across all reviews
themes = model.detect_themes(reviews)

print("Detected themes:")
for theme in themes:
    print(f"  - {theme}")

### Example 2: Assign Themes (Single Comment)

Once we have possible themes, we can tag individual comments.

In [None]:
# Define possible themes
possible_themes = ["food quality", "service", "ambiance", "pricing", "wait times"]

# Single comment
single_review = "Great food but service was slow and prices were high"

print(f"Review: {single_review}\n")
print(f"Possible themes: {', '.join(possible_themes)}\n")

assigned = model.assign_themes(single_review, possible_themes)

print(f"Assigned themes: {', '.join(assigned)}")

### Example 3: Batch Theme Assignment

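For efficiency, you can process multiple comments in one call by passing a list instead of a single string. The result contains one list of themes per comment, in the same order as the input.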

In [None]:
batch_reviews = [
    "Amazing atmosphere and friendly staff",
    "Food was overpriced for the quality",
    "Long wait time but worth it for the quality"
]

print(f"Processing {len(batch_reviews)} reviews...\n")

batch_results = model.assign_themes(batch_reviews, possible_themes)

for i, (review, assigned_themes) in enumerate(zip(batch_reviews, batch_results), 1):
    print(f"Review {i}: {review}")
    print(f"Themes: {', '.join(assigned_themes)}\n")
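
For lists too large to pass in a single call, one option is to split them into fixed-size chunks and make one batch call per chunk. The following is a minimal sketch, assuming only that a batch call takes a list of comments and returns one list of themes per comment, as in the example above. `chunked`, `assign_in_chunks`, and `fake_assign` are hypothetical names. `fake_assign` is a keyword-matching stand-in so the sketch runs without the model; in practice you would pass something like `lambda batch: model.assign_themes(batch, possible_themes)` instead.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def assign_in_chunks(
    comments: Sequence[str],
    assign_batch: Callable[[Sequence[str]], List[List[str]]],
    chunk_size: int = 100,
) -> List[List[str]]:
    """Call `assign_batch` once per chunk and concatenate results in order."""
    results: List[List[str]] = []
    for chunk in chunked(comments, chunk_size):
        results.extend(assign_batch(chunk))
    return results


# Hypothetical stand-in for the model: tags a comment with every theme
# whose name appears in the text. This is NOT how CommentWorks assigns themes.
def fake_assign(batch: Sequence[str]) -> List[List[str]]:
    demo_themes = ["food", "service", "price"]
    return [[t for t in demo_themes if t in text.lower()] for text in batch]


demo = ["Great food", "Slow service", "Food and price were fine"]
chunk_results = assign_in_chunks(demo, fake_assign, chunk_size=2)
print(chunk_results)
```

The chunk size of 100 is an arbitrary default for illustration, not a documented limit.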

## Part 2: Working with Pandas DataFrames

CommentWorks integrates seamlessly with pandas for analyzing larger datasets.

### Load Sample Data

In [None]:
# Load sample reviews
df = pd.read_csv('reviews.csv')

print(f"Loaded {len(df)} reviews\n")
print("Categories:")
print(df['category'].value_counts())
print("\nFirst few reviews:")
df.head()

### Detect Themes Across All Reviews

`detect_themes` takes a list, so convert the DataFrame column with `.tolist()`.

In [None]:
# Detect themes across all reviews
all_themes = model.detect_themes(df['review_text'].tolist())

print(f"Detected {len(all_themes)} unique themes:\n")
for theme in all_themes:
    print(f"  - {theme}")
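
Real-world text columns often contain missing or blank entries, and this notebook does not say how `detect_themes` handles them. A cautious sketch that drops them before calling `.tolist()` might look like the following. It uses a small made-up DataFrame in place of `reviews.csv`, and `demo_df`, `clean_texts`, and `clean_list` are illustrative names.

```python
import pandas as pd

demo_df = pd.DataFrame({
    "review_text": ["Great food", None, "   ", "Slow service ", "Great food"],
})

clean_texts = (
    demo_df["review_text"]
    .dropna()        # remove missing values (None / NaN)
    .astype(str)     # guard against non-string entries
    .str.strip()     # trim surrounding whitespace
)
clean_texts = clean_texts[clean_texts != ""]  # drop rows that were only whitespace
clean_list = clean_texts.tolist()
print(clean_list)
```

Duplicates are kept on purpose here, since repeated comments may be a meaningful signal of how common a theme is.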

### Detect Themes by Category

You can analyze different segments separately.

In [None]:
for category in df['category'].unique():
    category_reviews = df[df['category'] == category]['review_text'].tolist()
    themes = model.detect_themes(category_reviews)
    
    print(f"\n{category.title()} themes ({len(category_reviews)} reviews):")
    for theme in themes:
        print(f"  - {theme}")
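
An equivalent way to build the per-category lists is a single `groupby`, which avoids re-filtering the DataFrame once per category. The sketch below uses a made-up DataFrame with the same `category` and `review_text` columns; the model call is left as a comment so the snippet runs without it.

```python
import pandas as pd

demo_df = pd.DataFrame({
    "category": ["restaurant", "product", "restaurant"],
    "review_text": ["Great pasta", "Arrived broken", "Slow service"],
})

# Map each category to the list of its review texts
texts_by_category = (
    demo_df.groupby("category")["review_text"].apply(list).to_dict()
)

for category, texts in texts_by_category.items():
    # themes = model.detect_themes(texts)  # one call per category
    print(f"{category}: {len(texts)} reviews")
```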

### Assign Themes to Each Row

Use `.apply()` to tag each review with relevant themes.

In [None]:
# Define possible themes based on what we detected
possible_themes = [
    "food quality", "service", "wait times", "ambiance", "pricing",
    "product quality", "shipping", "customer service", "durability",
    "ease of use", "value for money"
]

print(f"Using {len(possible_themes)} possible themes...")
print("Processing reviews...\n")

# Assign themes to each review
df['assigned_themes'] = df['review_text'].apply(
    lambda x: model.assign_themes(x, possible_themes)
)

# Show results for a few reviews
print("Sample results:\n")
for idx in [i for i in (0, 5, 10) if i < len(df)]:  # skip positions past the end of df
    row = df.iloc[idx]
    print(f"Review {row['review_id']} ({row['category']}) - Rating: {row['rating']}/5")
    print(f"Text: {row['review_text'][:80]}...")
    print(f"Themes: {', '.join(row['assigned_themes'])}\n")

### Analyze Themes by Rating

See which themes appear most often at each rating level, from low-rated to high-rated reviews.

In [None]:
for rating in sorted(df['rating'].unique()):
    rating_df = df[df['rating'] == rating]
    
    # Flatten all themes for this rating
    all_themes_for_rating = [theme for themes in rating_df['assigned_themes'] for theme in themes]
    theme_counts = pd.Series(all_themes_for_rating).value_counts()
    
    print(f"\n{rating}-star reviews (n={len(rating_df)}):")
    print(theme_counts.head(3))
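
The same counts can also be produced as a single table with `explode` and `pd.crosstab`. This is a sketch on made-up data that reuses the `rating` and `assigned_themes` column names from above.

```python
import pandas as pd

demo_df = pd.DataFrame({
    "rating": [5, 5, 1],
    "assigned_themes": [
        ["food quality", "service"],
        ["food quality"],
        ["wait times", "service"],
    ],
})

# One row per (review, theme) pair; reviews with an empty theme list
# become NaN after explode and are dropped.
exploded = demo_df.explode("assigned_themes").dropna(subset=["assigned_themes"])

# Rows are ratings, columns are themes, cells are counts
theme_table = pd.crosstab(exploded["rating"], exploded["assigned_themes"])
print(theme_table)
```

A table like this makes it easy to compare one theme across ratings, whereas the loop above is handier for reading the top themes within one rating.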

### Export Results

Convert theme lists to comma-separated strings for CSV export.

In [None]:
# Convert list of themes to comma-separated string
df['themes_csv'] = df['assigned_themes'].apply(lambda x: ', '.join(x))

# Save to CSV
output_file = 'reviews_with_themes.csv'
df[['review_id', 'category', 'review_text', 'rating', 'themes_csv']].to_csv(
    output_file, index=False
)

print(f"Results saved to: {output_file}")
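
When the exported file is read back, the comma-separated strings need to be split into lists again. The sketch below uses an in-memory buffer instead of a file and assumes that theme names themselves contain no commas. A review with no themes is written as an empty field, which pandas reads back as NaN unless `keep_default_na=False` is set.

```python
import io

import pandas as pd

demo_df = pd.DataFrame({
    "review_id": [1, 2],
    "assigned_themes": [["service", "pricing"], []],
})
demo_df["themes_csv"] = demo_df["assigned_themes"].apply(lambda x: ", ".join(x))

# Write to an in-memory buffer rather than a file on disk
buffer = io.StringIO()
demo_df[["review_id", "themes_csv"]].to_csv(buffer, index=False)
buffer.seek(0)

# keep_default_na=False keeps empty fields as "" instead of NaN
restored = pd.read_csv(buffer, keep_default_na=False)
restored["assigned_themes"] = restored["themes_csv"].apply(
    lambda s: [t.strip() for t in s.split(",") if t.strip()]
)
print(restored["assigned_themes"].tolist())
```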

## Part 3: Best Practices for Large Datasets

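When working with 10,000+ comments, detect themes on a random sample (roughly 500-2000 comments) rather than on the full dataset, then assign the detected themes to every row.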

In [None]:
# Example with a simulated large dataset
# In practice, you'd have df with 10k+ rows

# For theme detection, use a random sample (500-2000 comments)
sample_size = min(1000, len(df))  # Adjust based on your dataset
sample = df['review_text'].sample(n=sample_size, random_state=42).tolist()

print(f"Using random sample of {sample_size} comments for theme detection...")
themes = model.detect_themes(sample)

print(f"\nDetected themes: {', '.join(themes)}")

# Then assign themes to ALL rows (assignment is fast)
print(f"\nAssigning themes to all {len(df)} reviews...")
df['themes'] = df['review_text'].apply(
    lambda x: model.assign_themes(x, themes)  # positional, as in the earlier calls
)

print("Done!")
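
If a large dataset contains many repeated comments, one way to reduce the number of model calls is to assign themes once per distinct text and map the results back to the rows. This sketch assumes that assignment depends only on the comment text. `fake_assign` is a hypothetical keyword-matching stand-in for `model.assign_themes`, used so the snippet runs without the model, and `call_log` exists only to show how many calls were made.

```python
import pandas as pd

call_log = []  # records each text passed to the stand-in


def fake_assign(text, possible_themes):
    """Hypothetical stand-in: returns themes whose name occurs in the text."""
    call_log.append(text)
    return [t for t in possible_themes if t in text.lower()]


demo_df = pd.DataFrame({
    "review_text": ["Great food", "Slow service", "Great food", "Great food"],
})
demo_themes = ["food", "service"]

# One call per distinct comment instead of one per row
unique_texts = demo_df["review_text"].unique()
theme_lookup = {text: fake_assign(text, demo_themes) for text in unique_texts}
demo_df["themes"] = demo_df["review_text"].map(theme_lookup)

print(f"{len(call_log)} calls for {len(demo_df)} rows")
```

Rows with identical text end up sharing the same list object, so copy the lists before modifying any of them in place.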

## Summary

You've learned:
- ✅ How to initialize the commentworks model
- ✅ How to detect themes with `model.detect_themes()`
- ✅ How to assign themes with `model.assign_themes()` (single and batch)
- ✅ How to use commentworks with pandas DataFrames
- ✅ Best practices for large datasets (use sampling)

**Key takeaway:** Initialize the model once, then reuse it for all operations for better performance!

**Next steps:**
- Try with your own comment data
- Experiment with different sample sizes
- Analyze themes by categories, time periods, or other dimensions