# Groq + Firecrawl MCP: AI-Powered Web Scraping & Data Extraction
This notebook demonstrates how to give a model served by Groq enterprise-grade web scraping capabilities through Firecrawl's Model Context Protocol (MCP) server, covering intelligent data extraction, structured parsing, and deep web research.

We will achieve this through three simple steps:
1. Set up **Groq MCP client** for fast inference.
2. Set up **Firecrawl MCP server** for enterprise web scraping.
3. Seamlessly **connect the client to the server** through the Responses API.

---


In [None]:
# install dependencies
%pip install openai python-dotenv ipython


## Getting Started

Follow these steps to set up:
1. **Sign up** for Groq at [console.groq.com](https://console.groq.com/keys) to get your free API key.
2. **Sign up** for Firecrawl at [firecrawl.dev/app/api-keys](https://firecrawl.dev/app/api-keys) to get your API key.
3. **Copy your API keys** from your Groq and Firecrawl account dashboards.
4. **Paste your API keys** into the cell below and run the cell.

*Note: do **not** run the cell below if your keys are already configured in a `.env` file in this directory. The commands append to the file, so running them again would add duplicate entries.*


In [None]:
# To export your API keys into a .env file, run the following cell (replace with your actual keys):
!echo "GROQ_API_KEY=<your-groq-api-key>" >> .env
!echo "FIRECRAWL_API_KEY=<your-firecrawl-api-key>" >> .env


In [None]:
import json
import os
import time
from datetime import datetime
from dotenv import load_dotenv

from openai import OpenAI
from openai.types import responses as openai_responses

load_dotenv()

GROQ_API_KEY = os.getenv("GROQ_API_KEY")
FIRECRAWL_API_KEY = os.getenv("FIRECRAWL_API_KEY")

if not GROQ_API_KEY:
    print("Please set your Groq API key")
else:
    print("Groq API key configured successfully!")
if not FIRECRAWL_API_KEY:
    print("Please set your Firecrawl API key")
else:
    print("Firecrawl API key configured successfully!")
    
MODEL = "openai/gpt-oss-120b"
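
The check above only prints a message when a key is missing, so a later cell would fail with a less obvious error. As a minimal sketch of a stricter alternative, the hypothetical helper `require_env` below (not part of the original notebook) raises immediately when a variable is unset or empty:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is missing or empty."""
    value = os.getenv(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Example usage (uncomment once your .env file has been loaded):
# GROQ_API_KEY = require_env("GROQ_API_KEY")
# FIRECRAWL_API_KEY = require_env("FIRECRAWL_API_KEY")
```

Failing early this way keeps the error close to its cause, which may be preferable once the notebook is turned into a script.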

## Step 1: Set up the Groq client


In [None]:

# Groq exposes an OpenAI-compatible API, so the standard OpenAI client works with a custom base URL
client = OpenAI(base_url="https://api.groq.com/openai/v1", api_key=GROQ_API_KEY)


## Step 2: Set up Firecrawl's remote MCP server


In [None]:
# set up Firecrawl MCP server
tools = [
    openai_responses.tool_param.Mcp(
        server_label="firecrawl",
        server_url=f"https://mcp.firecrawl.dev/{FIRECRAWL_API_KEY}/v2/mcp",
        type="mcp",
        require_approval="never",
    )
]
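
Because the Firecrawl API key is embedded in the server URL, printing or logging the tool configuration would expose it. The sketch below builds the same configuration as a plain dictionary (assuming `Mcp` is a typed dictionary, as it is in recent versions of the `openai` package) and adds a small redaction helper. Both function names are hypothetical and introduced only for illustration.

```python
def build_firecrawl_tool(api_key: str) -> dict:
    """Build the MCP tool entry for Firecrawl's remote server as a plain dict."""
    return {
        "type": "mcp",
        "server_label": "firecrawl",
        "server_url": f"https://mcp.firecrawl.dev/{api_key}/v2/mcp",
        "require_approval": "never",
    }


def redact_tool(tool: dict, api_key: str) -> dict:
    """Return a copy of the tool entry that is safe to print or log."""
    redacted = dict(tool)
    if api_key:
        redacted["server_url"] = tool["server_url"].replace(api_key, "***")
    return redacted


# Placeholder key used purely for demonstration
demo_tool = build_firecrawl_tool("fc-demo-key")
print(redact_tool(demo_tool, "fc-demo-key")["server_url"])
```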

## Step 3: Connect Groq to the Firecrawl MCP through Groq's Responses API


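This is the **main** function we will use to call the Groq API. It sends the query together with the MCP tool configuration, times the request, and collects any MCP tool calls found in the response.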

In [None]:
def call_groq_with_tools(client, tools, query):
    """Send a query to Groq with MCP tools; return the text, timing, and MCP calls made."""

    start_time = time.time()

    response = client.responses.create(
        model=MODEL,
        input=query,
        tools=tools,
        stream=False,
        temperature=0.1,
        top_p=0.4,
    )

    total_time = time.time() - start_time

    content = (
        response.output_text if hasattr(response, "output_text") else str(response)
    )

    executed_tools = []

    if hasattr(response, "output") and response.output:
        for output_item in response.output:
            if hasattr(output_item, "type") and output_item.type == "mcp_call":
                executed_tools.append(
                    {
                        "type": "mcp",
                        "arguments": getattr(output_item, "arguments", "{}"),
                        "output": getattr(output_item, "output", ""),
                        "name": getattr(output_item, "name", ""),
                        "server_label": getattr(output_item, "server_label", ""),
                    }
                )
    return {
        "content": content,
        "total_time": total_time,
        "mcp_calls_performed": executed_tools,
        "timestamp": datetime.now().isoformat(),
    }
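
To see how the `mcp_call` filtering behaves without making a network request, the sketch below applies the same idea to a hand-built stand-in for `response.output`. The helper `count_mcp_calls` and the fake output items are hypothetical and exist only for illustration; real response items carry more fields.

```python
from collections import Counter
from types import SimpleNamespace


def count_mcp_calls(output_items) -> Counter:
    """Count output items of type "mcp_call", grouped by tool name."""
    return Counter(
        getattr(item, "name", "") or "unknown"
        for item in output_items
        if getattr(item, "type", None) == "mcp_call"
    )


# Hypothetical stand-in for response.output
fake_output = [
    SimpleNamespace(type="mcp_call", name="firecrawl_scrape"),
    SimpleNamespace(type="message"),
    SimpleNamespace(type="mcp_call", name="firecrawl_scrape"),
    SimpleNamespace(type="mcp_call", name="firecrawl_search"),
]

counts = count_mcp_calls(fake_output)
print(dict(counts))
```

Items that are not MCP calls, such as the final message, are skipped, which mirrors the `output_item.type == "mcp_call"` check in the function above.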


Let's implement a helper function to display MCP tool calls and their results. This will provide transparency into which tools were called, their arguments, and outputs. 

(This is mainly useful for debugging, when you need to view the raw MCP outputs. It is not required for the rest of the notebook to work.)


In [None]:
def print_mcp_calls(mcp_calls):
    """Print each MCP tool call in a result dict from call_groq_with_tools, plus timing."""
    executed_tools = mcp_calls["mcp_calls_performed"]
    if executed_tools:
        print(f"\nFIRECRAWL MCP CALLS: Found {len(executed_tools)} tool call(s):")
        print("-" * 50)
        for i, tool in enumerate(executed_tools, 1):
            print(f"\nTool Call #{i}")
            print(f"   Type: {tool['type']}")
            print(f"   Tool Name: {tool['name']}")
            print(f"   Server: {tool['server_label']}")
            try:
                if tool["arguments"]:
                    args = (
                        json.loads(tool["arguments"])
                        if isinstance(tool["arguments"], str)
                        else tool["arguments"]
                    )
                    print(f"   Arguments: {args}")

                if tool["output"]:
                    output_data = (
                        json.loads(tool["output"])
                        if isinstance(tool["output"], str)
                        else tool["output"]
                    )
                    if isinstance(output_data, dict):
                        if "url" in output_data:
                            print(f"   URL Scraped: {output_data['url']}")
                        if "success" in output_data:
                            print(f"   Success: {output_data['success']}")
                        if "markdown" in output_data:
                            markdown = output_data["markdown"]
                            # Truncate long pages to a 200-character preview
                            content_preview = markdown[:200] + "..." if len(markdown) > 200 else markdown
                            print(f"   Content Preview: {content_preview}")
                    else:
                        print(f"   Output: {str(output_data)[:200]}...")
            except Exception as e:
                print(f"   Could not parse tool data: {e}")
    print(f"   Total time: {mcp_calls['total_time']:.2f} seconds")
    print(f"   Firecrawl MCP calls: {len(mcp_calls['mcp_calls_performed'])}")


# Examples

**Note:** Some queries consume more tokens than others, depending on the number of tool calls the model makes. If you run into rate limit errors, check the rate limits tied to your Groq and Firecrawl API keys.
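
One common way to cope with transient rate limit errors is to retry with exponential backoff. The sketch below is a generic, hypothetical helper (`call_with_retries` is not part of the original notebook). It retries on any exception for simplicity; in practice you would likely narrow this to the rate limit exception raised by your client library.

```python
import time


def call_with_retries(fn, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on failure, wait base_delay * 2**(attempt - 1) seconds and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            # Re-raise once the final attempt has failed
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


# Example usage with the notebook's main function:
# result = call_with_retries(lambda: call_groq_with_tools(client, tools, "your query"))
```

The `sleep` parameter is injectable so the helper can be exercised without actually waiting.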

---


## Demo 1: Website Analysis & Content Scraping

Let's build a web scraper that analyzes Groq's model documentation page and produces a comprehensive overview of the available models, using Firecrawl's intelligent scraping capabilities.


In [None]:
from IPython.display import Markdown

website_analysis = call_groq_with_tools(
    client,
    tools,
    "Scrape and analyze the website https://console.groq.com/docs/models. Provide a comprehensive overview of all available models.",
)


Let's display the agent's response in markdown format.


In [None]:
Markdown(website_analysis["content"])

Let's examine the agent's intermediate steps, including how it calls different Firecrawl tools and configures tool arguments.

In [None]:
print_mcp_calls(website_analysis)

## Demo 2: Structured Data Extraction

Now we'll create a competitive analysis tool that extracts structured pricing data from multiple AI companies (OpenAI, Anthropic, Groq) and formats it into one consistent JSON schema for easy comparison.


In [None]:
structured_extraction = call_groq_with_tools(
    client,
    tools,
    """Use the firecrawl_extract tool to extract structured pricing information from these AI company websites:
    
    URLs: https://openai.com, https://anthropic.com, https://groq.com
    
    Extract the following data for each company in JSON format:
    {
        "company_name": "string",
        "pricing_plans": [
            {
                "plan_name": "string",
                "price": "string",
                "features": ["string"]
            }
        ],
        "contact_info": "string",
        "main_product": "string"
    }
    
    Focus on finding current pricing information and structure it consistently across all companies.""",
)


Let's display the structured extraction results.


In [None]:
Markdown(structured_extraction["content"])
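
The model returns its answer as free-form text, so the JSON may be surrounded by prose or markdown. As a minimal sketch, the hypothetical helpers below pull every decodable JSON value out of a string and report which of the requested top-level keys are missing. The sample text is invented for illustration and is not real output.

```python
import json

# Top-level keys requested in the prompt above
REQUIRED_KEYS = {"company_name", "pricing_plans", "contact_info", "main_product"}


def extract_json_values(text: str) -> list:
    """Decode every JSON object or array embedded in free-form text."""
    decoder = json.JSONDecoder()
    values, pos = [], 0
    while pos < len(text):
        starts = [i for i in (text.find("{", pos), text.find("[", pos)) if i != -1]
        if not starts:
            break
        start = min(starts)
        try:
            value, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pos = start + 1  # not valid JSON here, keep scanning
            continue
        values.append(value)
        pos = end
    return values


def missing_keys(record: dict) -> set:
    """Return the requested keys that are absent from a record."""
    return REQUIRED_KEYS - record.keys()


# Invented sample text standing in for structured_extraction["content"]
sample = (
    'Here is the data: {"company_name": "Example", "pricing_plans": [], '
    '"main_product": "API"} See [docs](https://example.com).'
)
records = [v for v in extract_json_values(sample) if isinstance(v, dict)]
for record in records:
    print(record["company_name"], "missing:", sorted(missing_keys(record)))
```

With real output you would pass `structured_extraction["content"]` instead of `sample`.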


Let's examine the agent's intermediate steps.


In [None]:
print_mcp_calls(structured_extraction)


## Demo 3: Deep Research & Multi-hop Analysis

Here we'll build an AI research agent that conducts comprehensive multi-hop research on AI inference trends, automatically discovering and analyzing multiple sources to create a detailed research report with proper citations.


In [None]:
deep_research = call_groq_with_tools(
    client,
    tools,
    """Conduct comprehensive deep research on "latest trends in AI model inference speed and performance" using the firecrawl_deep_research tool.
    
    Research should include:
    1. Recent developments in AI inference optimization (2024-2025)
    2. Key companies and technologies leading this space
    3. Performance benchmarks and comparison data
    4. Future trends and implications
    
    Use deep research capabilities to find and analyze multiple authoritative sources. Provide a comprehensive report with proper citations.""",
)


Let's display the deep research report.


In [None]:
Markdown(deep_research["content"])
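
Since the report is expected to include citations, it can help to list the URLs it mentions so they can be spot-checked. The sketch below is a hypothetical helper that collects unique URLs from a text in order of first appearance. The regular expression is a simple approximation and may miss or over-capture unusual URLs.

```python
import re

# Approximate pattern: stops at whitespace, quotes, and closing brackets
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def cited_urls(report: str) -> list:
    """Return unique URLs found in the text, in order of first appearance."""
    seen = {}
    for match in URL_PATTERN.findall(report):
        url = match.rstrip(".,;:")  # drop trailing sentence punctuation
        seen.setdefault(url, None)
    return list(seen)


# Invented sample text standing in for deep_research["content"]
sample_report = (
    "See [Groq](https://groq.com/) and https://example.com/a. "
    "Also https://groq.com/ again."
)
print(cited_urls(sample_report))
```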


Let's examine the deep research process.


In [None]:
print_mcp_calls(deep_research)


## Demo 4: Try it Yourself

Now it's your turn! Create your own custom web intelligence agent by replacing the query below with your specific web scraping, data extraction, or research task.


In [None]:
your_query = "Your Query Here"  # Change this!

custom_response = call_groq_with_tools(client, tools, your_query)


In [None]:
Markdown(custom_response["content"])


In [None]:
print_mcp_calls(custom_response)


## Available Firecrawl MCP Tools

Firecrawl MCP provides several powerful tools for web scraping, data extraction, and research:

| Tool | Description |
|------|-------------|
| **`firecrawl_scrape`** | Scrape content from a single URL with advanced options and formatting |
| **`firecrawl_batch_scrape`** | Scrape multiple URLs efficiently with built-in rate limiting and parallel processing |
| **`firecrawl_check_batch_status`** | Check the status of a batch operation and retrieve results |
| **`firecrawl_search`** | Search the web and optionally extract content from search results |
| **`firecrawl_crawl`** | Start an asynchronous crawl with advanced options for depth and link following |
| **`firecrawl_extract`** | Extract structured information from web pages using LLM capabilities and JSON schemas |
| **`firecrawl_deep_research`** | Conduct comprehensive deep web research with intelligent crawling and LLM analysis |
| **`firecrawl_generate_llmstxt`** | Generate standardized llms.txt files that define how LLMs should interact with a site |
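
If you want the model to use only some of these tools, one option is to restrict the tool entry. The sketch below assumes the MCP tool configuration accepts an `allowed_tools` list of tool names, as the OpenAI SDK's MCP tool type does. Whether Groq's Responses API honors this field is an assumption you should verify against Groq's documentation. The helper name `restrict_tools` is hypothetical.

```python
# Tool names taken from the table above
KNOWN_FIRECRAWL_TOOLS = {
    "firecrawl_scrape",
    "firecrawl_batch_scrape",
    "firecrawl_check_batch_status",
    "firecrawl_search",
    "firecrawl_crawl",
    "firecrawl_extract",
    "firecrawl_deep_research",
    "firecrawl_generate_llmstxt",
}


def restrict_tools(tool: dict, allowed: list) -> dict:
    """Return a copy of an MCP tool entry limited to the given tool names."""
    unknown = set(allowed) - KNOWN_FIRECRAWL_TOOLS
    if unknown:
        raise ValueError(f"Unknown Firecrawl tool(s): {sorted(unknown)}")
    # "allowed_tools" support on Groq is an assumption; see the note above
    return {**tool, "allowed_tools": list(allowed)}


base_tool = {"type": "mcp", "server_label": "firecrawl", "require_approval": "never"}
scrape_only = restrict_tools(base_tool, ["firecrawl_scrape", "firecrawl_search"])
print(scrape_only["allowed_tools"])
```

Validating names against the table catches typos before a request is sent.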

### Key Benefits

1. **Enterprise-Grade Reliability**: Handles JavaScript, authentication, and dynamic content
2. **AI-Powered Intelligence**: Understands content semantically, not just structurally  
3. **Batch Processing**: Efficient parallel operations for production workloads
4. **Speed**: Sub-10 second responses when combined with Groq's fast inference
5. **Transparency**: Full visibility into tool calls and data sources

---

## Summary

You've just experienced fast AI-powered web intelligence that combines:
- **Fast responses** (3-10 seconds) via Groq inference
- **Enterprise web scraping** with Firecrawl's advanced capabilities  
- **Structured data extraction** using AI-powered parsing
- **Deep research** with multi-hop reasoning and source transparency

This approach enables you to build applications that need both speed and reliability for web data collection and analysis tasks.
