<a href="https://colab.research.google.com/github/kissflow/prompt2finetune/blob/main/amazon_review_mcp.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Amazon Review Summarizer

This notebook demonstrates how to use the Firecrawl MCP (Model Context Protocol) server to scrape Amazon product reviews, and OpenAI's GPT-4o-mini to summarize them with chain-of-thought prompting.

## What is Firecrawl?
Firecrawl is a web scraping and crawling service. Its MCP server, `firecrawl-mcp`, exposes those capabilities as tools, making it easy to extract content from websites such as Amazon product pages programmatically.


## Installation

First, install the required Python MCP SDK and OpenAI client:


In [None]:
%pip install mcp openai
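
The Firecrawl server is launched through `npx` later in this notebook, so Node.js needs to be on the `PATH` in addition to the Python packages. A minimal sketch for checking this up front; `find_missing_commands` is a hypothetical helper name, not part of any library:

```python
import shutil

def find_missing_commands(commands=("node", "npx")):
    """Return the entries of `commands` that cannot be found on PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]

missing = find_missing_commands()
if missing:
    print(f"Missing commands: {', '.join(missing)} (install Node.js first)")
else:
    print("node and npx are available")
```

If either command is reported missing, the MCP client will likely fail to start the server process.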

## Setup and Configuration

Import necessary libraries and set up the MCP client and OpenAI:


In [None]:
import os
import asyncio
import json
import re
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from openai import OpenAI

# Set your API keys
FIRECRAWL_API_KEY = "YOUR_FIRECRAWL_API_KEY"  # Replace with your actual Firecrawl API key
OPENAI_API_KEY = "YOUR_OPENAI_API_KEY"  # Replace with your actual OpenAI API key

os.environ["FIRECRAWL_API_KEY"] = FIRECRAWL_API_KEY
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

# Initialize OpenAI client
openai_client = OpenAI(api_key=OPENAI_API_KEY)
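
Hardcoding keys in a shared notebook makes them easy to leak. As an alternative, here is a small sketch that reads a key from an environment variable and rejects the placeholder values used above. `load_api_key` and its `prompt_if_missing` parameter are hypothetical names introduced for illustration:

```python
import os
from getpass import getpass

def load_api_key(name, prompt_if_missing=False):
    """Return the key stored in environment variable `name`.

    Optionally prompts for it when unset. Raises ValueError if the value is
    empty or still looks like a "YOUR_..." placeholder.
    """
    value = os.environ.get(name, "").strip()
    if not value and prompt_if_missing:
        value = getpass(f"Enter {name}: ").strip()
    if not value or value.startswith("YOUR_"):
        raise ValueError(f"{name} is not set to a real value")
    return value

# Possible usage in this notebook:
# FIRECRAWL_API_KEY = load_api_key("FIRECRAWL_API_KEY", prompt_if_missing=True)
# OPENAI_API_KEY = load_api_key("OPENAI_API_KEY", prompt_if_missing=True)
```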

## Firecrawl MCP Client

Create a helper class to manage the Firecrawl MCP client connection:


In [None]:
class FirecrawlClient:
    def __init__(self, api_key):
        self.api_key = api_key
        self.session = None
        self.client = None

    async def __aenter__(self):
        # Configure server parameters
        server_params = StdioServerParameters(
            command="npx",
            args=["-y", "firecrawl-mcp"],
            env={
                "FIRECRAWL_API_KEY": self.api_key
            }
        )

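        # Create the stdio transport. Keep the context manager itself in
        # self.client so __aexit__ can close it later; the value it yields
        # is a plain (read_stream, write_stream) tuple with no __aexit__.
        self.client = stdio_client(server_params)
        read_stream, write_stream = await self.client.__aenter__()

        # Create session
        self.session = await ClientSession(
            read_stream,
            write_stream
        ).__aenter__()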

        # Initialize the session
        await self.session.initialize()

        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        if self.session:
            await self.session.__aexit__(exc_type, exc_val, exc_tb)
        if self.client:
            await self.client.__aexit__(exc_type, exc_val, exc_tb)

    async def list_tools(self):
        """List available tools from the Firecrawl MCP server"""
        result = await self.session.list_tools()
        return result.tools

    async def call_tool(self, tool_name, arguments):
        """Call a tool with the given arguments"""
        result = await self.session.call_tool(tool_name, arguments)
        return result

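    async def scrape_url(self, url, extract_reviews=True):
        """Scrape a URL and return the raw tool result.

        `extract_reviews` is kept for compatibility with the callers in this
        notebook but is not forwarded: it is not a documented scrape option.
        """
        # The firecrawl-mcp server names its scrape tool "firecrawl_scrape";
        # run list_tools() (Example 1) to confirm the name for your server version.
        args = {"url": url, "formats": ["markdown"]}

        result = await self.call_tool("firecrawl_scrape", args)
        return result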

    def extract_amazon_reviews(self, scraped_content):
        """Extract review data from scraped Amazon content"""
        reviews = []

        # call_tool() returns a result object whose .content is a list of
        # content blocks, so join their text. A plain dict with a 'content'
        # string is accepted as well.
        if isinstance(scraped_content, dict):
            content = scraped_content.get('content', '')
        else:
            blocks = getattr(scraped_content, 'content', None) or []
            content = "\n".join(getattr(block, 'text', '') or '' for block in blocks)
        if not isinstance(content, str):
            return reviews

        # This is a simplified extraction - in practice, you'd use more sophisticated parsing.
        # It only matches JSON-like pairs such as "reviewText": "..." and captures the value;
        # reviews embedded in plain markdown will not match this pattern.
        reviews_text = re.findall(r'"(?:review|comment)[^"]*"\s*:\s*"(.*?)"', content, re.IGNORECASE)

        for i, review_text in enumerate(reviews_text[:10]):  # Limit to first 10 reviews
            if len(review_text.strip()) > 20:  # Only include substantial reviews
                reviews.append({
                    'text': review_text.strip(),
                    'rating': None,  # Would need more sophisticated parsing
                    'review_id': i
                })

        return reviews
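
`extract_amazon_reviews` leaves every `rating` as `None`. One possible way to recover ratings is sketched below. It assumes the scraped text contains phrases such as "4.0 out of 5 stars", which is an assumption about the page text rather than something Firecrawl guarantees. `STAR_PATTERN` and `find_star_ratings` are hypothetical names:

```python
import re

# Matches "5 out of 5 stars", "4.0 out of 5 stars", etc. (assumed phrasing).
STAR_PATTERN = re.compile(r"(\d(?:\.\d)?)\s+out of\s+5\s+stars", re.IGNORECASE)

def find_star_ratings(text):
    """Return all star ratings found in `text`, in order of appearance."""
    return [float(value) for value in STAR_PATTERN.findall(text)]

sample = "5.0 out of 5 stars Great sound\n3 out of 5 stars Stopped working"
print(find_star_ratings(sample))
```

A product page usually also shows an aggregate rating in the same phrasing, so the first match may not belong to an individual review.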

## OpenAI Integration for Review Summarization

Create functions for chain-of-thought prompting with OpenAI GPT-4o-mini:


In [None]:
def summarize_reviews_chain_of_thought(reviews, detail_level="brief"):
    """
    Use chain-of-thought prompting to summarize Amazon reviews
    """
    if not reviews:
        return "No reviews found to analyze."

    # Prepare review text for analysis
    review_texts = [review['text'] for review in reviews if review['text']]
    combined_reviews = "\n\n".join(review_texts[:20])  # Limit to first 20 reviews

    if detail_level == "brief":
        prompt = f"""
Let's analyze these Amazon product reviews step by step:

Step 1: Identify the main themes and topics mentioned
Step 2: Assess the overall sentiment (positive, negative, mixed)
Step 3: Extract key insights about the product

Reviews to analyze:
{combined_reviews}

Please provide a brief summary (2-3 sentences) following this chain of thought.
"""
    else:  # detailed
        prompt = f"""
Let's analyze these Amazon product reviews step by step:

Step 1: Extract and categorize all review ratings (if mentioned)
Step 2: Identify common positive themes and what customers love
Step 3: Identify common negative themes and complaints
Step 4: Analyze sentiment distribution
Step 5: Generate comprehensive summary with pros/cons breakdown

Reviews to analyze:
{combined_reviews}

Please provide a detailed analysis following this chain of thought, including:
- Overall sentiment and rating trends
- Key positive themes
- Main complaints or issues
- Final recommendation summary
"""

    try:
        response = openai_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": "You are an expert at analyzing product reviews and providing insightful summaries."},
                {"role": "user", "content": prompt}
            ],
            max_tokens=1000 if detail_level == "brief" else 2000,
            temperature=0.3
        )
        return response.choices[0].message.content
    except Exception as e:
        return f"Error generating summary: {str(e)}"

def compare_products_summary(product_summaries):
    """
    Compare multiple product reviews using chain-of-thought
    """
    prompt = f"""
Let's compare these product reviews step by step:

Step 1: Analyze each product's strengths and weaknesses
Step 2: Compare overall sentiment and customer satisfaction
Step 3: Identify which product performs better in different categories
Step 4: Provide a comparative summary

Product Summaries:
{product_summaries}

Please provide a comparative analysis following this chain of thought.
"""

    try:
        response = openai_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": "You are an expert at comparing products based on customer reviews."},
                {"role": "user", "content": prompt}
            ],
            max_tokens=1500,
            temperature=0.3
        )
        return response.choices[0].message.content
    except Exception as e:
        return f"Error generating comparison: {str(e)}"
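
`summarize_reviews_chain_of_thought` caps its input at 20 reviews, but review length varies widely, so the prompt size is still unpredictable. A sketch of capping by characters instead; `join_reviews_within_budget` and the default of 8000 characters are illustrative assumptions, not tuned values:

```python
def join_reviews_within_budget(reviews, max_chars=8000):
    """Join review texts with blank lines, stopping before max_chars is exceeded."""
    parts = []
    used = 0
    for review in reviews:
        text = (review.get("text") or "").strip()
        if not text:
            continue
        # Account for the "\n\n" separator between consecutive reviews.
        cost = len(text) + (2 if parts else 0)
        if used + cost > max_chars:
            break
        parts.append(text)
        used += cost
    return "\n\n".join(parts)

demo = [{"text": "a" * 10}, {"text": "b" * 10}, {"text": "c" * 10}]
print(len(join_reviews_within_budget(demo, max_chars=22)))
```

The result could be used in place of `combined_reviews` when building the prompt.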

## Example 1: List Available Firecrawl Tools

First, let's see what tools are available from the Firecrawl MCP server:


In [None]:
async def list_available_tools():
    async with FirecrawlClient(FIRECRAWL_API_KEY) as client:
        tools = await client.list_tools()
        print("Available Firecrawl Tools:")
        print("=" * 50)
        for tool in tools:
            print(f"\nTool: {tool.name}")
            print(f"Description: {tool.description}")
            if hasattr(tool, 'inputSchema'):
                print(f"Input Schema: {json.dumps(tool.inputSchema, indent=2)}")
        return tools

# Run the async function
tools = await list_available_tools()


## Example 2: Single Amazon Product Review Fetch

Scrape an Amazon product page and generate a brief summary using chain-of-thought prompting:


In [None]:
async def fetch_single_product_reviews():
    # Example Amazon product URL (replace with actual product URL)
    amazon_url = "https://www.amazon.com/dp/B08N5WRWNW"  # Example: Echo Dot
    
    async with FirecrawlClient(FIRECRAWL_API_KEY) as client:
        print(f"Scraping Amazon product: {amazon_url}")
        print("=" * 60)
        
        try:
            # Scrape the URL
            result = await client.scrape_url(amazon_url, extract_reviews=True)
            print("Raw scraped content preview:")
            print("-" * 40)
            if isinstance(result, dict) and 'content' in result:
                print(result['content'][:500] + "..." if len(result['content']) > 500 else result['content'])
            else:
                print(result)
            
            # Extract reviews from scraped content
            reviews = client.extract_amazon_reviews(result)
            print(f"\nExtracted {len(reviews)} reviews")
            print("-" * 40)
            
            # Display first few reviews
            for i, review in enumerate(reviews[:3]):
                print(f"\nReview {i+1}:")
                print(f"Text: {review['text'][:200]}...")
                if review['rating']:
                    print(f"Rating: {review['rating']}")
            
            # Generate brief summary using chain-of-thought
            print("\n" + "=" * 60)
            print("CHAIN-OF-THOUGHT SUMMARY:")
            print("=" * 60)
            summary = summarize_reviews_chain_of_thought(reviews, detail_level="brief")
            print(summary)
            
        except Exception as e:
            print(f"Error scraping or analyzing reviews: {e}")

# Run the async function
await fetch_single_product_reviews()

## Example 3: Detailed Review Analysis with Chain-of-Thought

Perform a comprehensive analysis of the same product using detailed chain-of-thought prompting:



In [None]:
async def fetch_detailed_product_analysis():
    # Same example product as in Example 2 (replace with an actual product URL)
    amazon_url = "https://www.amazon.com/dp/B08N5WRWNW"  # Example: Echo Dot

    async with FirecrawlClient(FIRECRAWL_API_KEY) as client:
        print(f"Scraping Amazon product: {amazon_url}")
        print("=" * 60)

        try:
            # Scrape the URL and extract reviews from the scraped content
            result = await client.scrape_url(amazon_url, extract_reviews=True)
            reviews = client.extract_amazon_reviews(result)
            print(f"Extracted {len(reviews)} reviews")

            # Generate the detailed analysis using chain-of-thought
            print("\n" + "=" * 60)
            print("DETAILED CHAIN-OF-THOUGHT ANALYSIS:")
            print("=" * 60)
            analysis = summarize_reviews_chain_of_thought(reviews, detail_level="detailed")
            print(analysis)

        except Exception as e:
            print(f"Error scraping or analyzing reviews: {e}")

# Run the async function
await fetch_detailed_product_analysis()


## Example 4: Multiple Products Comparison

Compare reviews from multiple Amazon products using chain-of-thought analysis:



In [None]:
async def compare_multiple_products():
    # Example Amazon product URLs (replace with actual product URLs)
    product_urls = [
        "https://www.amazon.com/dp/B08N5WRWNW",  # Echo Dot
        "https://www.amazon.com/dp/B07XJ8C8F7",  # Echo Show
        "https://www.amazon.com/dp/B07FZ8S74R"   # Echo Plus
    ]

    product_summaries = []

    async with FirecrawlClient(FIRECRAWL_API_KEY) as client:
        for i, url in enumerate(product_urls, 1):
            print(f"\n{'=' * 70}")
            print(f"Product {i}: {url}")
            print('=' * 70)

            try:
                # Scrape the URL
                result = await client.scrape_url(url, extract_reviews=True)
                reviews = client.extract_amazon_reviews(result)

                print(f"Found {len(reviews)} reviews")

                # Generate summary for this product
                summary = summarize_reviews_chain_of_thought(reviews, detail_level="brief")
                product_summaries.append(f"Product {i} Summary:\n{summary}")

                print(f"\nProduct {i} Summary:")
                print("-" * 50)
                print(summary)

            except Exception as e:
                print(f"Error analyzing product {i}: {e}")
                product_summaries.append(f"Product {i}: Error - {str(e)}")

        # Compare all products
        print("\n" + "=" * 70)
        print("COMPARATIVE ANALYSIS")
        print("=" * 70)

        comparison = compare_products_summary("\n\n".join(product_summaries))
        print(comparison)

# Run the async function
await compare_multiple_products()
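
The comparison loop issues several scrape calls in a row, so a transient failure on one product turns into an error entry in the comparison. A minimal retry sketch with exponential backoff follows; `retry_async`, `attempts`, and `base_delay` are hypothetical names, and the defaults are not tuned for Firecrawl's actual limits:

```python
import asyncio

async def retry_async(make_call, attempts=3, base_delay=1.0):
    """Await make_call() up to `attempts` times.

    `make_call` is a zero-argument callable returning a fresh awaitable.
    The delay doubles after each failure; the last failure is re-raised.
    """
    for attempt in range(attempts):
        try:
            return await make_call()
        except Exception:
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(base_delay * (2 ** attempt))

# Possible usage inside the loop above:
# result = await retry_async(lambda: client.scrape_url(url))
```

A lambda is passed rather than a coroutine object because a coroutine can only be awaited once, and each retry needs a new call.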


## Example 5: Interactive Review Fetcher

Create a reusable function to fetch and summarize reviews for any Amazon product:



In [None]:
async def fetch_and_summarize_reviews(amazon_url, detail_level="brief"):
    """
    Interactive function to fetch and summarize reviews for any Amazon product.

    Args:
        amazon_url (str): Amazon product URL
        detail_level (str): "brief" or "detailed" summary level

    Returns:
        dict: Contains reviews, summary, and metadata
    """
    print(f"🔍 Fetching reviews from: {amazon_url}")
    print(f"📊 Analysis level: {detail_level}")
    print("=" * 70)

    async with FirecrawlClient(FIRECRAWL_API_KEY) as client:
        try:
            # Scrape the URL
            result = await client.scrape_url(amazon_url, extract_reviews=True)
            reviews = client.extract_amazon_reviews(result)

            print(f"✅ Successfully extracted {len(reviews)} reviews")

            if not reviews:
                return {
                    "success": False,
                    "error": "No reviews found",
                    "reviews": [],
                    "summary": "No reviews available for analysis"
                }

            # Generate summary using chain-of-thought
            print(f"\n🤖 Generating {detail_level} summary using chain-of-thought...")
            summary = summarize_reviews_chain_of_thought(reviews, detail_level=detail_level)

            print(f"\n📝 Summary ({detail_level}):")
            print("-" * 50)
            print(summary)

            return {
                "success": True,
                "reviews": reviews,
                "summary": summary,
                "review_count": len(reviews),
                "detail_level": detail_level
            }

        except Exception as e:
            print(f"❌ Error: {e}")
            return {
                "success": False,
                "error": str(e),
                "reviews": [],
                "summary": None
            }

# Example usage:
# result = await fetch_and_summarize_reviews("https://www.amazon.com/dp/B08N5WRWNW", "brief")
# result = await fetch_and_summarize_reviews("https://www.amazon.com/dp/B08N5WRWNW", "detailed")


## Notes and Best Practices

1. **API Keys**: Make sure to replace `YOUR_FIRECRAWL_API_KEY` and `YOUR_OPENAI_API_KEY` with your actual API keys
2. **Error Handling**: The examples wrap scraping and summarization in `try`/`except` blocks to handle cases where scraping fails or no reviews are available
3. **Async/Await**: All MCP operations are asynchronous. Jupyter supports top-level `await`; in a regular Python script, wrap the call in `asyncio.run(...)` instead
4. **Resource Management**: The `FirecrawlClient` uses context managers to ensure proper cleanup of connections
5. **Chain-of-Thought Prompting**: The summarization uses structured prompting to ensure the AI follows a logical reasoning process
6. **Rate Limiting**: Be mindful of API rate limits for both Firecrawl and OpenAI services
7. **Amazon URL Format**: Ensure Amazon URLs are in the correct format (e.g., https://www.amazon.com/dp/PRODUCT_ID)
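
For the URL format in note 7, product links copied from a browser often carry a title slug and tracking parameters. A sketch for normalizing them to the `/dp/PRODUCT_ID` form is shown below. It assumes the product ID is a 10-character string of uppercase letters and digits, which matches the examples in this notebook but may not cover every listing. `extract_product_id` and `canonical_product_url` are hypothetical names:

```python
import re

# Assumed shape: ".../dp/<10 uppercase alphanumerics>" followed by "/", "?" or end.
PRODUCT_ID_PATTERN = re.compile(r"/dp/([A-Z0-9]{10})(?:[/?]|$)")

def extract_product_id(url):
    """Return the product ID from an Amazon URL, or None if none is found."""
    match = PRODUCT_ID_PATTERN.search(url)
    return match.group(1) if match else None

def canonical_product_url(url):
    """Return the short https://www.amazon.com/dp/<id> form, or None."""
    product_id = extract_product_id(url)
    return f"https://www.amazon.com/dp/{product_id}" if product_id else None

print(canonical_product_url("https://www.amazon.com/Some-Title/dp/B08N5WRWNW/ref=sr_1_1?keywords=x"))
```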

## Server Configuration

The Firecrawl MCP server is configured with:
```json
{
  "mcpServers": {
    "firecrawl-mcp": {
      "command": "npx",
      "args": ["-y", "firecrawl-mcp"],
      "env": {
        "FIRECRAWL_API_KEY": "YOUR-API-KEY"
      }
    }
  }
}
```

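The `FirecrawlClient` class in this notebook passes the same command, arguments, and environment variable to the server through `StdioServerParameters`, so no separate configuration file is needed here.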
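
To make the correspondence concrete, the sketch below reads a configuration of this shape and pulls out the three values that `FirecrawlClient` hands to `StdioServerParameters` (`command`, `args`, `env`). `server_launch_settings` is a hypothetical helper, and the snippet deliberately avoids importing `mcp` so it runs on its own:

```python
import json

CONFIG_JSON = """
{
  "mcpServers": {
    "firecrawl-mcp": {
      "command": "npx",
      "args": ["-y", "firecrawl-mcp"],
      "env": {"FIRECRAWL_API_KEY": "YOUR-API-KEY"}
    }
  }
}
"""

def server_launch_settings(config_text, server_name):
    """Return the command, args, and env for one server entry in the config."""
    entry = json.loads(config_text)["mcpServers"][server_name]
    return {
        "command": entry["command"],
        "args": list(entry.get("args", [])),
        "env": dict(entry.get("env", {})),
    }

settings = server_launch_settings(CONFIG_JSON, "firecrawl-mcp")
print(settings["command"], settings["args"])
```

The resulting dictionary uses the same keyword names as the `StdioServerParameters(...)` call in `FirecrawlClient.__aenter__`.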

## Chain-of-Thought Prompting Benefits

The chain-of-thought approach provides several advantages:
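- **Structured Analysis**: Prompts the model to follow a logical sequence of steps
- **Transparency**: Can surface the reasoning behind conclusions when the prompt asks for the intermediate steps to be written out
- **Better Accuracy**: Step-by-step analysis often produces more accurate results, although this is not guaranteed and is worth spot-checking against the source reviews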
- **Comprehensive Coverage**: Ensures all important aspects are considered
- **Debugging**: Easier to identify where analysis might have gone wrong
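
The prompts in this notebook all share the same shape: an introduction, numbered steps, the review text, and a closing instruction. A sketch of building that shape from a list of steps, so steps can be added or reordered without editing a long string; `build_step_prompt` is a hypothetical helper, not how the notebook's functions are written:

```python
def build_step_prompt(intro, steps, reviews_text, closing):
    """Assemble a chain-of-thought prompt with numbered 'Step N:' lines."""
    numbered = "\n".join(f"Step {i}: {step}" for i, step in enumerate(steps, 1))
    return (
        f"{intro}\n\n"
        f"{numbered}\n\n"
        f"Reviews to analyze:\n{reviews_text}\n\n"
        f"{closing}\n"
    )

prompt = build_step_prompt(
    intro="Let's analyze these Amazon product reviews step by step:",
    steps=[
        "Identify the main themes and topics mentioned",
        "Assess the overall sentiment (positive, negative, mixed)",
        "Extract key insights about the product",
    ],
    reviews_text="Works well, but setup took a while.",
    closing="Please provide a brief summary (2-3 sentences) following this chain of thought.",
)
print(prompt)
```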

## Usage Examples

```python
# Brief summary
result = await fetch_and_summarize_reviews("https://www.amazon.com/dp/B08N5WRWNW", "brief")

# Detailed analysis
result = await fetch_and_summarize_reviews("https://www.amazon.com/dp/B08N5WRWNW", "detailed")

# Compare multiple products
await compare_multiple_products()
```

