# Using ChatGPT for Amazon Review Summarization with Rating Sentiment

This notebook demonstrates how to use the OpenAI ChatGPT API to generate structured summaries of Amazon reviews using the `rating_sentiment` column from the dataset.

## 1. Install Required Libraries

In [None]:
# Install required packages
!pip install openai pandas tqdm matplotlib seaborn

## 2. Import Libraries

In [1]:
import pandas as pd
import numpy as np
import openai
from tqdm.notebook import tqdm
import time
import matplotlib.pyplot as plt
import seaborn as sns
from collections import Counter
import re
from getpass import getpass

## 3. Set Up OpenAI API Key

In [2]:
# Set your OpenAI API key
api_key = getpass("Enter your OpenAI API key: ")
openai.api_key = api_key

Enter your OpenAI API key: ··········


## 4. Load the Dataset

In [33]:
# Load the processed Amazon reviews dataset
df = pd.read_csv('/content/final_categorized_reviews.csv')
df.head()

Unnamed: 0,reviews.text,cleaned_text,name,categories,sentiment_label,sentiment_score,reviews.rating,rating_sentiment,mapped_sentiment,product_category
0,I order 3 of them and one of the item is bad quality. Is missing backup spring so I have to put a pcs of aluminum to make the battery work.,i order 3 of them and one of the item is bad quality. is missing backup spring so i have to put a pcs of aluminum to make the battery work.,AmazonBasics AAA Performance Alkaline Batteries (36 Count),"AA,AAA,Health,Electronics,Health & Household,Camcorder Batteries,Camera & Photo,Batteries,Household Batteries,Robot Check,Accessories,Camera Batteries,Health and Beauty,Household Supplies,Batteries & Chargers,Health, Household & Baby Care,Health Personal Care",LABEL_0,0.90238,3,Neutral,Negative,Accessories
1,Bulk is always the less expensive way to go for products like these,bulk is always the less expensive way to go for products like these,AmazonBasics AAA Performance Alkaline Batteries (36 Count),"AA,AAA,Health,Electronics,Health & Household,Camcorder Batteries,Camera & Photo,Batteries,Household Batteries,Robot Check,Accessories,Camera Batteries,Health and Beauty,Household Supplies,Batteries & Chargers,Health, Household & Baby Care,Health Personal Care",LABEL_1,0.697489,4,Positive,Neutral,Accessories
2,Well they are not Duracell but for the price i am happy.,well they are not duracell but for the price i am happy.,AmazonBasics AAA Performance Alkaline Batteries (36 Count),"AA,AAA,Health,Electronics,Health & Household,Camcorder Batteries,Camera & Photo,Batteries,Household Batteries,Robot Check,Accessories,Camera Batteries,Health and Beauty,Household Supplies,Batteries & Chargers,Health, Household & Baby Care,Health Personal Care",LABEL_2,0.890748,5,Positive,Positive,Accessories
3,Seem to work as well as name brand batteries at a much better price,seem to work as well as name brand batteries at a much better price,AmazonBasics AAA Performance Alkaline Batteries (36 Count),"AA,AAA,Health,Electronics,Health & Household,Camcorder Batteries,Camera & Photo,Batteries,Household Batteries,Robot Check,Accessories,Camera Batteries,Health and Beauty,Household Supplies,Batteries & Chargers,Health, Household & Baby Care,Health Personal Care",LABEL_2,0.938524,5,Positive,Positive,Accessories
4,These batteries are very long lasting the price is great.,these batteries are very long lasting the price is great.,AmazonBasics AAA Performance Alkaline Batteries (36 Count),"AA,AAA,Health,Electronics,Health & Household,Camcorder Batteries,Camera & Photo,Batteries,Household Batteries,Robot Check,Accessories,Camera Batteries,Health and Beauty,Household Supplies,Batteries & Chargers,Health, Household & Baby Care,Health Personal Care",LABEL_2,0.965138,5,Positive,Positive,Accessories


In [23]:
# Check basic information about the dataset
print(f"Dataset shape: {df.shape}")
print(f"Number of reviews: {len(df)}")

# Examine the rating_sentiment column
print("\nUnique values in rating_sentiment:")
print(df['rating_sentiment'].value_counts())

Dataset shape: (28332, 10)
Number of reviews: 28332

Unique values in rating_sentiment:
rating_sentiment
Positive    25545
Negative     1581
Neutral      1206
Name: count, dtype: int64


## 5. Define the Summarization Function Using ChatGPT

In [34]:
def generate_summary_with_chatgpt(text, sentiment, model="gpt-3.5-turbo", max_retries=3, retry_delay=5):
    """
    Generate a structured summary for the given review text using ChatGPT and existing sentiment.

    Parameters:
    -----------
    text : str
        The review text to summarize
    sentiment : str
        The existing sentiment label (Positive, Negative, or Neutral)
    model : str
        The OpenAI model to use (default: gpt-3.5-turbo)
    max_retries : int
        Maximum number of retries in case of API errors
    retry_delay : int
        Delay between retries in seconds

    Returns:
    --------
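    str
        The generated summary. Returns an empty string for missing or very
        short reviews (fewer than 5 words), and an error message string if
        all retries fail.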
    """
    # Skip empty or very short texts
    if pd.isna(text) or len(text.split()) < 5:
        return ""

    # Define the prompt with instructions and existing sentiment
    prompt = f"""
    You are tasked with summarizing Amazon product reviews. The sentiment of the review has already been determined as: {sentiment}.

    Your task is to summarize the main reason(s) behind this sentiment:
    - For positive reviews, highlight what features or qualities make the product good.
    - For negative reviews, focus on the issues that caused dissatisfaction. Deliver a list with the top 3 key areas of improvements.
    - For neutral reviews, highlight the balanced aspects mentioned.

    Provide the summary in a single sentence that starts with "{sentiment} review due to" and then explains the main reasons.

    Examples:
    Positive Review: "I love this blender! It's powerful, easy to use, and cleans up quickly."
    Summary: "Positive review due to power, ease of use, and easy cleanup."

    Negative Review: "The blender stopped working after two days, and it's too noisy."
    Summary: "Negative review due to malfunctioning and excessive noise; could be improved with better quality control and sound insulation."

    Neutral Review: "The blender works okay but is a bit loud; it's reasonably priced though."
    Summary: "Neutral review due to balance between acceptable functionality and reasonable price against noise issues."

    Now, summarize the following review:
    \"{text}\"

    Provide ONLY the summary sentence, nothing else. Your summary MUST start with "{sentiment} review due to".
    """

    # Initialize retry counter
    retries = 0

    while retries < max_retries:
        try:
            # Call the OpenAI API
            response = openai.chat.completions.create(
                model=model,
                messages=[
                    {"role": "system", "content": "You are a helpful assistant that summarizes product reviews using the provided sentiment label."},
                    {"role": "user", "content": prompt}
                ],
                temperature=0.3,  # Lower temperature for more consistent outputs
                max_tokens=100    # Limit response length
            )

            # Extract and return the summary
            summary = response.choices[0].message.content.strip()
            return summary

        except Exception as e:
            retries += 1
            if retries < max_retries:
                print(f"Error: {e}. Retrying in {retry_delay} seconds... (Attempt {retries}/{max_retries})")
                time.sleep(retry_delay)
            else:
                print(f"Failed after {max_retries} attempts: {e}")
                return f"Error generating summary: {str(e)}"
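
The function above waits a fixed `retry_delay` between attempts. A common alternative is exponential backoff, where the wait doubles after each failure. The following is a minimal sketch of that idea, not the notebook's method: `call_with_backoff`, `base_delay`, and the `sleep` parameter are hypothetical names introduced for illustration, and `flaky_call` is a stand-in for the real API call.

```python
import time

def call_with_backoff(fn, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); after each failure wait base_delay * 2**attempt, then retry."""
    last_error = None
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as e:  # broad on purpose, mirroring the notebook
            last_error = e
            if attempt < max_retries - 1:
                sleep(base_delay * 2 ** attempt)
    # All attempts failed: surface the last error to the caller
    raise last_error

# Demo with a stand-in for the API call that fails twice, then succeeds.
attempts = {"count": 0}

def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("simulated rate limit")
    return "Positive review due to price."

recorded_delays = []  # capture the delays instead of actually sleeping
result = call_with_backoff(flaky_call, sleep=recorded_delays.append)
print(result, recorded_delays)
```

Injecting `sleep` keeps the helper easy to test without real waiting; in the notebook it would default to `time.sleep`.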


## 6. Test the ChatGPT Summarization on a Few Examples

In [35]:
# Test the model on a few examples
test_examples = df.head(5)

for i, row in test_examples.iterrows():
    text = row['cleaned_text']
    sentiment = row['rating_sentiment']

    print(f"\nExample {i+1}:")
    print(f"Original text: {text}")
    print(f"Rating: {row['reviews.rating']}")
    print(f"Rating sentiment: {sentiment}")
    summary = generate_summary_with_chatgpt(text, sentiment)
    print(f"ChatGPT Summary: {summary}")


Example 1:
Original text: i order 3 of them and one of the item is bad quality. is missing backup spring so i have to put a pcs of aluminum to make the battery work.
Rating: 3
Rating sentiment: Neutral
ChatGPT Summary: Neutral review due to one item being of bad quality with missing backup spring, requiring a makeshift solution for the battery to work.

Example 2:
Original text: bulk is always the less expensive way to go for products like these
Rating: 4
Rating sentiment: Positive
ChatGPT Summary: Positive review due to cost-effectiveness of buying in bulk for products like these.

Example 3:
Original text: well they are not duracell but for the price i am happy.
Rating: 5
Rating sentiment: Positive
ChatGPT Summary: Positive review due to affordability and satisfaction despite not being Duracell batteries.

Example 4:
Original text: seem to work as well as name brand batteries at a much better price
Rating: 5
Rating sentiment: Positive
ChatGPT Summary: Positive review due to performa

## 7. Generate Summaries for a Sample of Reviews

In [36]:
# Create a sample of the dataset for testing
sample_size = 50  # Adjust based on your API quota and budget
df_sample = df.sample(sample_size, random_state=42)

print(f"Processing {len(df_sample)} reviews...")

Processing 50 reviews...


In [37]:
# Generate summaries for the sample reviews
summaries = []
start_time = time.time()
delay_between_calls = 1  # seconds

# Pass total= so tqdm can show a real progress bar (iterrows() has no length)
for i, (_, row) in enumerate(tqdm(df_sample.iterrows(), total=len(df_sample))):
    text = row['cleaned_text']
    sentiment = row['rating_sentiment']
    summary = generate_summary_with_chatgpt(text, sentiment)
    summaries.append(summary)

    # Print progress every 10 reviews
    if (i + 1) % 10 == 0:
        elapsed = time.time() - start_time
        print(f"Processed {i+1} reviews in {elapsed:.2f} seconds ({(i+1)/elapsed:.2f} reviews/second)")

    # Add delay to avoid rate limits
    if i < len(df_sample) - 1:  # Don't delay after the last item
        time.sleep(delay_between_calls)

# Add summaries to the dataframe
df_sample['chatgpt_summary'] = summaries

0it [00:00, ?it/s]

Processed 10 reviews in 14.47 seconds (0.69 reviews/second)
Processed 20 reviews in 29.79 seconds (0.67 reviews/second)
Processed 30 reviews in 45.73 seconds (0.66 reviews/second)
Processed 40 reviews in 60.97 seconds (0.66 reviews/second)
Processed 50 reviews in 74.89 seconds (0.67 reviews/second)


## 8. Analyze the Results

In [38]:
# Calculate summary statistics
df_sample['original_length'] = df_sample['cleaned_text'].apply(lambda x: len(x.split()))
df_sample['summary_length'] = df_sample['chatgpt_summary'].apply(lambda x: len(x.split()) if isinstance(x, str) else 0)
df_sample['compression_ratio'] = df_sample['summary_length'] / df_sample['original_length']

print(f"Average original text length: {df_sample['original_length'].mean():.2f} words")
print(f"Average summary length: {df_sample['summary_length'].mean():.2f} words")
print(f"Average compression ratio: {df_sample['compression_ratio'].mean():.2f}")

Average original text length: 27.68 words
Average summary length: 16.44 words
Average compression ratio: 0.72
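
Note that the reported average compression ratio (0.72) is the mean of the per-review ratios, which is not the same as dividing the average summary length by the average original length (16.44 / 27.68 ≈ 0.59). Short reviews whose summaries are as long as the original pull the mean of ratios upward. The toy numbers below are made up purely to illustrate the difference.

```python
# Two hypothetical reviews: one very short, one long
original_lengths = [5, 50]
summary_lengths = [5, 10]

# Mean of per-review ratios (what the notebook computes)
ratios = [s / o for s, o in zip(summary_lengths, original_lengths)]
mean_of_ratios = sum(ratios) / len(ratios)

# Ratio of the two averages (overall words out / words in)
mean_summary = sum(summary_lengths) / len(summary_lengths)
mean_original = sum(original_lengths) / len(original_lengths)
ratio_of_means = mean_summary / mean_original

print(f"mean of ratios: {mean_of_ratios:.2f}, ratio of means: {ratio_of_means:.2f}")
```

Which figure is more useful depends on the question: the mean of ratios describes a typical review, while the ratio of means describes total text volume.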


## 9. Analyze Summaries by Rating Sentiment

In [39]:
# Group by rating sentiment and analyze summary characteristics
sentiment_stats = df_sample.groupby('rating_sentiment').agg({
    'original_length': 'mean',
    'summary_length': 'mean',
    'compression_ratio': 'mean',
    'reviews.rating': 'mean',
    'rating_sentiment': 'count'
}).rename(columns={'rating_sentiment': 'count'})

sentiment_stats

rating_sentiment,original_length,summary_length,compression_ratio,reviews.rating,count
Negative,49.75,23.25,0.775603,1.0,4
Neutral,55.0,33.0,0.6,3.0,1
Positive,25.111111,15.466667,0.722296,4.755556,45


## 9. Extract Improvement Suggestions

This section extracts improvement suggestions from the summaries of negative reviews.

In [40]:
# Extract improvement suggestions from negative reviews
def extract_improvements(summary, sentiment):
    """Extract improvement suggestions from negative review summaries."""
    if not isinstance(summary, str) or not summary or sentiment != "Negative":
        return ""

    prefix = "Negative review due to"
    if prefix.lower() in summary.lower():
        # Find the prefix case-insensitively and remove it
        idx = summary.lower().find(prefix.lower()) + len(prefix)
        return summary[idx:].strip()
    else:
        # If the prefix isn't found, return the whole summary
        # (sentiment is already known to be "Negative" at this point)
        return summary

# Extract improvements from negative reviews only
df_sample['improvement_suggestions'] = df_sample.apply(
    lambda row: extract_improvements(row['chatgpt_summary'], row['rating_sentiment']), axis=1)

# Display examples of negative reviews with improvement suggestions
negative_examples = df_sample[df_sample['rating_sentiment'] == 'Negative']
if len(negative_examples) > 0:
    examples_with_improvements = negative_examples[['cleaned_text', 'chatgpt_summary', 'improvement_suggestions']].head(5)
    print(f"Found {len(negative_examples)} negative reviews with improvement suggestions")
    # A bare expression inside an if-block is not rendered, so display it explicitly
    display(examples_with_improvements)
else:
    print("No negative reviews found in the sample. Try increasing the sample size.")

Found 4 negative reviews with improvement suggestions
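
`extract_improvements` returns everything after the "Negative review due to" prefix, so the result mixes the reasons with the suggestions. When a summary follows the pattern from the prompt's example ("...; could be improved with ..."), the two parts can be separated. The sketch below assumes that phrasing; `split_reasons_and_suggestions` and its `marker` parameter are hypothetical, and summaries that use different wording fall through with an empty suggestion.

```python
def split_reasons_and_suggestions(summary, marker="could be improved"):
    """Split a negative summary into (reasons, suggestions) at the marker phrase."""
    prefix = "negative review due to"
    text = summary.strip()
    if text.lower().startswith(prefix):
        text = text[len(prefix):].strip()

    idx = text.lower().find(marker)
    if idx == -1:
        # No explicit suggestion found: treat the whole text as reasons
        return text.rstrip(" ;,."), ""

    reasons = text[:idx].rstrip(" ;,.")
    suggestions = text[idx:].strip()
    return reasons, suggestions

example = ("Negative review due to malfunctioning and excessive noise; "
           "could be improved with better quality control and sound insulation.")
reasons, suggestions = split_reasons_and_suggestions(example)
print("Reasons:", reasons)
print("Suggestions:", suggestions)
```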


In [15]:
# Display examples with extracted reasons
examples_with_reasons = df_sample[['cleaned_text', 'rating_sentiment', 'chatgpt_summary', 'improvement_suggestions']].sample(50, random_state=42)
examples_with_reasons

Unnamed: 0,cleaned_text,rating_sentiment,chatgpt_summary,improvement_suggestions
9809,plenty of batteries so worth it.,Positive,Positive review due to value for money with plenty of batteries included.,
6203,an alkaline battery is an alkaline battery no matter whose brand name is on it.,Positive,Positive review due to recognizing the consistent quality of alkaline batteries across different brands.,
8902,all good,Positive,,
5763,i seriously don't think i will buy these from the store ever again. these are half the price and work well in all of my electronics.,Positive,Positive review due to affordability and good performance in various electronics.,
10646,nice batteries will but again,Positive,Positive review due to quality batteries and intention to repurchase.,
7460,the longevity of these batteries is nothing special. they do not compare to duracell or energizer alkalines. after reading the reviews i was greatly disappointed. i used them among others for a wall clock and remote controller. their longevity is no better than alkaline batteries marketed and sold by walgreen's for example. take this into consideration before you buy them.,Negative,"Negative review due to lackluster longevity compared to top brands like Duracell or Energizer, leading to disappointment; could be improved by enhancing battery lifespan to match competitors.","lackluster longevity compared to top brands like Duracell or Energizer, leading to disappointment; could be improved by enhancing battery lifespan to match competitors."
5206,this is a great buy! i plan on purchasing amazon batteries from here out fast cheap and they last as expected.,Positive,"Positive review due to affordability, fast delivery, and expected battery life.",
1989,good product service and price,Positive,Positive review due to good product service and price.,
8882,these seem as good as top tier brands and models.using these on professional audio equipment... the equipment ran the same amount of time.i think the only difference is they are vastly less expensive.,Positive,"Positive review due to comparable quality to top tier brands and models, suitable for professional audio equipment, and significantly lower cost.",
10871,same as purchasing any other brand name batteries. they have lasted as long as any other brand. love receiving all these non perishable items in bulk at my front door at a better price than any store without having to carry them or leaving my house.,Positive,"Positive review due to long-lasting performance comparable to other brands, convenient bulk delivery at a better price, and the ease of not having to carry items from the store.",


## 10. Select Top Products by Category


In [27]:
def analyze_products_by_category(df):
    """
    Analyze products by category to find top-rated and lowest-rated products.
    Uses product_category column instead of categories.
    """
    # Calculate average rating by product name
    product_ratings = df.groupby('name')['reviews.rating'].agg(['mean', 'count']).reset_index()
    product_ratings.columns = ['name', 'avg_rating', 'review_count']
    
    # Filter products with at least 5 reviews for more reliable ratings
    product_ratings = product_ratings[product_ratings['review_count'] >= 5]
    
    # Get unique product categories
    if 'product_category' in df.columns:
        # Get all unique product categories
        all_categories = set(df['product_category'].dropna().unique())
        
        # Create a mapping of category to products
        category_products = {}
        for raw_category in all_categories:
            category = str(raw_category).strip()
            if not category:  # Skip empty categories
                continue

            # Find products in this category (match on the raw value, since the
            # stripped label may differ from what is stored in the column)
            products_in_category = set(df[df['product_category'] == raw_category]['name'].unique())
            
            # Get ratings for these products
            category_ratings = product_ratings[product_ratings['name'].isin(products_in_category)]
            
            if len(category_ratings) > 0:
                # Get top 3 products by rating
                top_products = category_ratings.sort_values(['avg_rating', 'review_count'], 
                                                           ascending=[False, False]).head(3)
                
                # Get lowest rated product
                lowest_product = category_ratings.sort_values(['avg_rating', 'review_count'], 
                                                            ascending=[True, False]).head(1)
                
                category_products[category] = (top_products, lowest_product)
    else:
        print("No 'product_category' column found. Please check your dataset.")
        return {}
    
    return category_products

## 11. Generate Improvement Suggestions

In [28]:
def generate_improvements_for_lowest_rated(df, category_products, max_reviews=5):
    """
    Generate improvement suggestions for the lowest-rated product in each category.
    """
    improvement_suggestions = {}

    for category, (top_products, lowest_product) in category_products.items():
        if len(lowest_product) == 0:
            continue

        product_name = lowest_product['name'].iloc[0]
        avg_rating = lowest_product['avg_rating'].iloc[0]

        print(f"\nProcessing lowest-rated product in {category}: {product_name} (Avg Rating: {avg_rating:.2f})")

        # Get negative reviews for this product
        product_reviews = df[(df['name'] == product_name) & (df['rating_sentiment'] == 'Negative')]

        if len(product_reviews) == 0:
            print(f"No negative reviews found for {product_name}")
            continue

        # Limit the number of reviews to process
        if len(product_reviews) > max_reviews:
            product_reviews = product_reviews.sample(max_reviews, random_state=42)

        print(f"Processing {len(product_reviews)} negative reviews for {product_name}...")

        # Generate summaries for each review
        summaries = []
        for _, row in tqdm(product_reviews.iterrows(), total=len(product_reviews)):
            summary = generate_summary_with_chatgpt(row['cleaned_text'], row['rating_sentiment'])
            summaries.append(summary)
            time.sleep(1)  # Add delay to avoid rate limits

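        # Extract improvement suggestions
        # (work on a copy so the new columns don't trigger SettingWithCopyWarning)
        product_reviews = product_reviews.copy()
        product_reviews['chatgpt_summary'] = summaries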
        product_reviews['improvement_suggestions'] = product_reviews.apply(
            lambda row: extract_improvements(row['chatgpt_summary'], row['rating_sentiment']), axis=1)

        # Store the improvement suggestions
        improvement_suggestions[product_name] = {
            'category': category,
            'avg_rating': avg_rating,
            'reviews': product_reviews[['cleaned_text', 'chatgpt_summary', 'improvement_suggestions']].to_dict('records')
        }

    return improvement_suggestions


In [29]:
def generate_executive_summary(improvement_suggestions):
    """
    Generate an executive summary of improvement suggestions for each product.
    """
    executive_summaries = {}

    for product_name, data in improvement_suggestions.items():
        print(f"\nGenerating executive summary for {product_name}...")

        # Combine all improvement suggestions
        all_suggestions = ' '.join([review['improvement_suggestions'] for review in data['reviews']
                                   if review['improvement_suggestions']])

        if not all_suggestions.strip():
            print(f"No improvement suggestions found for {product_name}")
            continue

        # Limit text length to avoid token limits
        truncated_suggestions = all_suggestions[:3000]

        # Prepare a prompt for ChatGPT
        improvement_prompt = f"""
        You are a product manager analyzing customer feedback for {product_name}.
        Below are extracted improvement suggestions from negative reviews:

        {truncated_suggestions}

        Based on these suggestions, provide:
        1. A concise executive summary of the top 3-5 areas for product improvement
        2. Specific actionable recommendations for each area
        3. Potential impact on customer satisfaction if implemented

        Format your response as bullet points.
        """

        try:
            # Call the OpenAI API
            response = openai.chat.completions.create(
                model="gpt-3.5-turbo",
                messages=[
                    {"role": "system", "content": "You are a helpful product manager that analyzes customer feedback."},
                    {"role": "user", "content": improvement_prompt}
                ],
                temperature=0.3,
                max_tokens=500
            )

            # Store the executive summary
            executive_summary = response.choices[0].message.content.strip()
            executive_summaries[product_name] = executive_summary

            # Display the executive summary
            print(f"\n## Executive Summary for {product_name} (Avg Rating: {data['avg_rating']:.2f})\n")
            print(executive_summary)

        except Exception as e:
            print(f"Error generating executive summary for {product_name}: {str(e)}")

    return executive_summaries


## 12. Generate the Article as a Markdown File

In [30]:
def generate_blog_posts(category_products, improvement_suggestions, executive_summaries, df):
    """
    Generate blog-style posts for each category with product ratings, improvement suggestions,
    and incorporating existing summarizations.
    """
    # Create output directory
    output_dir = 'product_blog_posts'
    os.makedirs(output_dir, exist_ok=True)
    
    # Initialize dictionary to store blog posts
    blog_posts = {}
    
    for category, (top_products, lowest_product) in category_products.items():
        print(f"Generating blog post for {category}...")
        
        # Skip if no products in this category
        if len(top_products) == 0 or len(lowest_product) == 0:
            continue
            
        # Get lowest product details
        lowest_product_name = lowest_product['name'].iloc[0]
        lowest_product_rating = lowest_product['avg_rating'].iloc[0]
        lowest_product_reviews = lowest_product['review_count'].iloc[0]
        
        # Get executive summary for lowest product if available
        executive_summary = executive_summaries.get(lowest_product_name, "No improvement suggestions available.")
        
        # Start building the blog post
        blog_post = f"# The Ultimate Guide to {category} Products\n\n"
        
        # Add introduction
        blog_post += f"""
## Introduction

Looking for the best {category} products? Our team has analyzed thousands of customer reviews to bring you the definitive guide to the top-rated options on the market today. We've also identified products you might want to avoid, along with detailed suggestions for how manufacturers could improve them.

This guide is based on real customer feedback and data-driven analysis, not paid promotions or sponsorships.

"""
        
        # Add top products section
        blog_post += f"## Top 3 {category} Products You Should Consider\n\n"
        blog_post += "Based on average customer ratings and review volume, these are the top three products in this category:\n\n"
        
        # Add each top product with actual review summaries
        for i, (_, row) in enumerate(top_products.iterrows()):
            product_name = row['name']
            avg_rating = row['avg_rating']
            review_count = row['review_count']
            
            blog_post += f"### {i+1}. {product_name}\n\n"
            blog_post += f"**Average Rating:** {'⭐' * int(avg_rating)}{' ½' if avg_rating % 1 >= 0.5 else ''} ({avg_rating:.2f}/5.0)\n\n"
            blog_post += f"**Number of Reviews:** {review_count}\n\n"
            
            # Get positive reviews for this product and their summaries
            positive_reviews = df[(df['name'] == product_name) & (df['rating_sentiment'] == 'Positive')].head(3)
            
            if len(positive_reviews) > 0:
                blog_post += "**What Customers Are Saying:**\n\n"
                for _, review in positive_reviews.iterrows():
                    # Generate summary if not already done
                    review_text = review['cleaned_text']
                    sentiment = review['rating_sentiment']
                    
                    # Check if we should generate a new summary
                    if 'chatgpt_summary' in review and pd.notna(review['chatgpt_summary']):
                        summary = review['chatgpt_summary']
                    else:
                        # Use the existing function to generate a summary
                        summary = generate_summary_with_chatgpt(review_text, sentiment)
                    
                    # Extract the key points from the summary (remove the "Positive review due to" prefix)
                    if summary.lower().startswith("positive review due to"):
                        key_points = summary[len("positive review due to"):].strip()
                    else:
                        key_points = summary
                        
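                    # Upper-case only the first letter; str.capitalize() would
                    # lower-case the rest (e.g. brand names)
                    blog_post += f"- {key_points[:1].upper() + key_points[1:]}\n"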
                
                blog_post += "\n"
            else:
                blog_post += f"**Why Customers Love It:** This product stands out for its exceptional quality and value. Customers particularly praise its durability, ease of use, and excellent performance.\n\n"
        
        # Add section for product to avoid
        blog_post += f"## {category} Product to Approach with Caution\n\n"
        blog_post += f"### {lowest_product_name}\n\n"
        blog_post += f"**Average Rating:** {'⭐' * int(lowest_product_rating)}{' ½' if lowest_product_rating % 1 >= 0.5 else ''} ({lowest_product_rating:.2f}/5.0)\n\n"
        blog_post += f"**Number of Reviews:** {lowest_product_reviews}\n\n"
        
        # Get negative reviews for this product and their summaries
        negative_reviews = df[(df['name'] == lowest_product_name) & (df['rating_sentiment'] == 'Negative')].head(3)
        
        if len(negative_reviews) > 0:
            blog_post += "**Common Customer Complaints:**\n\n"
            for _, review in negative_reviews.iterrows():
                # Generate summary if not already done
                review_text = review['cleaned_text']
                sentiment = review['rating_sentiment']
                
                # Check if we should generate a new summary
                if 'chatgpt_summary' in review and pd.notna(review['chatgpt_summary']):
                    summary = review['chatgpt_summary']
                else:
                    # Use the existing function to generate a summary
                    summary = generate_summary_with_chatgpt(review_text, sentiment)
                
                # Extract the key points from the summary
                improvement = extract_improvements(summary, sentiment)
                if improvement:
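                    # Upper-case only the first letter; str.capitalize() would
                    # lower-case the rest (e.g. brand names)
                    blog_post += f"- {improvement[:1].upper() + improvement[1:]}\n"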
                else:
                    blog_post += f"- {summary}\n"
            
            blog_post += "\n"
        
        # Add improvement suggestions section
        blog_post += "## How This Product Could Be Improved\n\n"
        blog_post += "Based on analysis of negative customer reviews, here are the key areas where this product needs improvement:\n\n"
        blog_post += f"{executive_summary}\n\n"
        
        # Add conclusion
        blog_post += "## Conclusion\n\n"
        blog_post += f"""
When shopping for {category} products, our analysis shows that the top three options offer excellent value and features. The {top_products['name'].iloc[0]} stands out as the best overall choice, while you may want to approach the {lowest_product_name} with caution due to the issues mentioned above.

Remember to consider your specific needs and budget when making your final decision. We hope this guide helps you find the perfect {category} product for your needs!
"""
        
        # Save the blog post
        filename = f"{output_dir}/{category.replace(' & ', '_').replace(' ', '_')}_guide.md"
        with open(filename, 'w', encoding='utf-8') as f:
            f.write(blog_post)
        
        print(f"Blog post saved to {filename}")
        
        # Store the blog post
        blog_posts[category] = blog_post
        
        # Display the first blog post in Jupyter notebook
        if len(blog_posts) == 1:
            display(Markdown(blog_post))
    
    return blog_posts

In [None]:
import os
from IPython.display import Markdown, display

# Run the analysis to get top products, improvement suggestions, and executive summaries
print("Analyzing products by category...")
category_products = analyze_products_by_category(df)

# Display top products by category
print("\nTop 3 Products by Category:")
for category, (top_products, _) in category_products.items():
    print(f"\nCategory: {category}")
    for i, (_, row) in enumerate(top_products.iterrows()):
        print(f"{i+1}. {row['name']} - Avg Rating: {row['avg_rating']:.2f} ({row['review_count']} reviews)")

# Generate improvement suggestions for lowest-rated products
print("\nGenerating improvement suggestions for lowest-rated products...")
improvement_suggestions = generate_improvements_for_lowest_rated(df, category_products, max_reviews=5)

# Generate executive summaries
print("\nGenerating executive summaries...")
executive_summaries = generate_executive_summary(improvement_suggestions)

# Generate blog posts
print("\nGenerating blog posts...")
blog_posts = generate_blog_posts(category_products, improvement_suggestions, executive_summaries, df)

# Display a summary of all generated reports
print(f"\nGenerated {len(blog_posts)} blog posts for the following categories:")
for i, category in enumerate(blog_posts.keys()):
    print(f"{i+1}. {category}")

print("\nAll blog posts have been saved to the 'product_blog_posts' directory.")

## 13. Conclusion

In this notebook, we've used the OpenAI ChatGPT API to generate structured summaries of Amazon product reviews, leveraging the existing `rating_sentiment` column from the dataset. This approach focuses on explaining the reasons behind the sentiment rather than determining the sentiment itself.

Key advantages of this approach:
- Uses the sentiment derived directly from the product ratings
- Focuses on extracting the key reasons behind the sentiment
- Provides structured output in a consistent format
- Allows for analysis of common themes in positive, negative, and neutral reviews
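
The "consistent format" claim can be checked rather than assumed, since the model may occasionally ignore the required prefix or wrap the sentence in quotes. Below is a minimal sketch of such a check; `follows_format` is a hypothetical helper, and the sample summaries are illustrative (the first is taken from the output above, the other two are made up).

```python
def follows_format(summary, sentiment):
    """Return True if the summary starts with '<sentiment> review due to'."""
    if not isinstance(summary, str):
        return False
    expected = f"{sentiment} review due to".lower()
    # Tolerate surrounding whitespace and quotation marks
    cleaned = summary.strip().strip('"').strip()
    return cleaned.lower().startswith(expected)

samples = [
    ("Positive review due to value for money with plenty of batteries included.", "Positive"),
    ('"Negative review due to short battery life."', "Negative"),
    ("The customer liked the price.", "Positive"),
]
checks = [follows_format(summary, sentiment) for summary, sentiment in samples]
share_ok = sum(checks) / len(checks)
print(checks, f"{share_ok:.0%} follow the expected format")
```

Applied to the `chatgpt_summary` column, a check like this would flag rows that need a retry or manual review before downstream parsing.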

Some potential applications:
- Identifying common reasons for positive and negative reviews
- Extracting actionable insights for product improvements
- Creating concise product descriptions highlighting key strengths and weaknesses
- Comparing customer sentiment across different product categories
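
As a rough illustration of the first application, the extracted reasons can be tallied with a simple word count. This is only a sketch: `top_terms` and the small `STOPWORDS` set are assumptions made for the example, and a real analysis would likely need better tokenization and phrase handling. The sample reasons are taken from the summaries shown earlier.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard one)
STOPWORDS = {"and", "the", "of", "to", "in", "for", "with", "a", "by", "be"}

def top_terms(reasons, n=3):
    """Count the most frequent non-stopword terms across reason strings."""
    counts = Counter()
    for reason in reasons:
        words = re.findall(r"[a-z']+", reason.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)

reasons = [
    "affordability and good performance in various electronics",
    "affordability, fast delivery, and expected battery life",
    "lackluster longevity compared to top brands",
]
most_common = top_terms(reasons, n=1)
print(most_common)
```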

Future work could include:
- Topic modeling on the extracted reasons to identify common themes
- Comparing reason extraction across different product categories
- Fine-tuning a model specifically for reason extraction from reviews