# Firecrawl Website Scraping Tool

This notebook demonstrates how to use the `FirecrawlScrapeWebsiteTool` to scrape websites and extract markdown content. The tool leverages the Firecrawl API to map and scrape URLs from a given base URL.

## Prerequisites

1. **Firecrawl API Key**: You need an API key to access the Firecrawl service. You can obtain this by signing up on the Firecrawl platform.

2. **Environment Setup**: Ensure that the `FIRECRAWL_API_KEY` environment variable is set with your API key. You can also pass the API key directly when initializing the tool.

3. **Python Environment**: Make sure you have Python installed along with the necessary packages. You can install the required packages using:

   ```bash
   pip install firecrawl-py langchain langchain-community langchain-core langchain-openai
   ```
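Before creating the tool, it can help to confirm that the key is actually available. The following is a minimal sketch, not part of the tool itself; the helper name `get_firecrawl_api_key` is hypothetical and only reads the `FIRECRAWL_API_KEY` variable described above.

```python
import os


def get_firecrawl_api_key(env=os.environ):
    """Return the Firecrawl API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; the tool does not require it.
    """
    api_key = env.get("FIRECRAWL_API_KEY", "").strip()
    if not api_key:
        raise EnvironmentError(
            "FIRECRAWL_API_KEY is not set; export it or pass api_key explicitly."
        )
    return api_key
```

Calling `get_firecrawl_api_key()` at the top of the notebook fails early with a readable message instead of surfacing a missing key later, during the first scrape.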

In [None]:
# Import Required Libraries

from firecrawl import FirecrawlApp
from langchain_community.tools.firecrawl.tool import FirecrawlScrapeWebsiteTool

## Setting Up the Tool

### Initialize the Tool

Create an instance of the `FirecrawlScrapeWebsiteTool`. You can specify the API key and whether to limit the scraping rate:

In [None]:
scrape_tool = FirecrawlScrapeWebsiteTool(api_key="your_api_key", limit_rate=True)

Alternatively, set the `FIRECRAWL_API_KEY` environment variable and initialize the tool without parameters:

In [None]:
import os

os.environ["FIRECRAWL_API_KEY"] = "your_api_key"
scrape_tool = FirecrawlScrapeWebsiteTool()

## Using the Tool

### Synchronous Scraping

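To scrape a website synchronously, call the `_run` method. Note that `_run` is the tool's internal implementation; LangChain's `BaseTool` exposes `run` and `invoke` as the public entry points.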

In [None]:
base_url = "https://example.com"
markdown_content = scrape_tool._run(base_url)
print(markdown_content)

### Asynchronous Scraping

For asynchronous scraping, call the `_arun` method, the asynchronous counterpart of `_run`:

In [None]:
import asyncio


async def scrape_website_async():
    base_url = "https://example.com"
    markdown_content = await scrape_tool._arun(base_url)
    print(markdown_content)


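# A notebook already runs an event loop, so await the coroutine directly.
# In a standalone script, call asyncio.run(scrape_website_async()) instead.
await scrape_website_async()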
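If several base URLs need to be scraped, the asynchronous method can be awaited concurrently. The sketch below assumes nothing about the tool beyond an awaitable that takes a URL; the helper name `scrape_many` is hypothetical, and whether concurrent calls are appropriate depends on your Firecrawl rate limits.

```python
import asyncio


async def scrape_many(arun, urls):
    """Await `arun(url)` for every URL concurrently and map each URL to its result.

    With return_exceptions=True, a failed scrape is stored as the exception
    object instead of cancelling the other scrapes.
    """
    results = await asyncio.gather(
        *(arun(url) for url in urls), return_exceptions=True
    )
    return dict(zip(urls, results))
```

In a notebook this could be used as `pages = await scrape_many(scrape_tool._arun, ["https://example.com"])`.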

## Using with Agents

The `FirecrawlScrapeWebsiteTool` can be integrated with agents to automate the scraping process as part of a larger workflow. Here's how you can set it up:


In [None]:
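# initialize_agent and AgentType live in langchain.agents, not langchain_core
from langchain.agents import AgentType, initialize_agent
from langchain_openai import OpenAI

# Initialize the language model (requires the OPENAI_API_KEY environment variable)
llm = OpenAI(temperature=0)

# Create an agent with the Firecrawl tool
agent = initialize_agent(
    tools=[scrape_tool],
    llm=llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,  # the parameter is `agent`, not `agent_type`
    verbose=True,
)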

# Run the agent with a task
result = agent.run(
    "Scrape the website at https://example.com and extract markdown content."
)
print(result)

## Understanding the Output

The tool returns markdown content extracted from the website. Each URL's content is prefixed with a header indicating the URL, followed by the markdown content, and separated by a line of dashes.
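To work with individual pages, the combined output can be split back into per-URL sections. The sketch below is only an illustration: the exact header text and the length of the dash separator are not specified here, so it assumes that each section starts with a line containing the URL and that sections are separated by a line of at least ten dashes. The function name `split_scrape_output` and the sample string are hypothetical.

```python
import re

# Assumption: a separator is a line made only of 10 or more dashes.
SEPARATOR = re.compile(r"^-{10,}[ \t]*$", flags=re.MULTILINE)
URL = re.compile(r"https?://\S+")


def split_scrape_output(text):
    """Split combined tool output into a {url: markdown} dictionary."""
    pages = {}
    for chunk in SEPARATOR.split(text):
        chunk = chunk.strip()
        if not chunk:
            continue
        # Assumption: the first line of each section is the URL header.
        header, _, body = chunk.partition("\n")
        match = URL.search(header)
        if match is None:
            continue  # no URL in the header line; skip this chunk
        pages[match.group(0)] = body.strip()
    return pages


# Made-up sample in the assumed format, not real tool output.
sample = (
    "URL: https://example.com\n"
    "# Example Domain\n"
    "----------\n"
    "URL: https://example.com/about\n"
    "About this site.\n"
)
pages = split_scrape_output(sample)
```

Adjust the two regular expressions once you have inspected the output the tool actually produces.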

## Error Handling

The tool includes error handling for mapping and scraping URLs. If an error occurs, it logs the error and raises a `RuntimeError`.
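Because failures surface as `RuntimeError`, calling code can catch that exception and decide how to proceed. The following is a minimal sketch; the wrapper name `scrape_or_none` is hypothetical, and it accepts any callable that takes a URL so that it does not depend on the tool's exact signature.

```python
import logging

logger = logging.getLogger(__name__)


def scrape_or_none(scrape, url):
    """Call `scrape(url)`; return its result, or None if it raises RuntimeError."""
    try:
        return scrape(url)
    except RuntimeError as exc:
        logger.warning("Scraping %s failed: %s", url, exc)
        return None
```

For example, `scrape_or_none(scrape_tool._run, "https://example.com")` returns the markdown on success and `None` on failure, which lets a batch job continue with the remaining URLs.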

## Logging

The tool uses Python's logging module to provide information about the scraping process, including rate limiting and any warnings or errors encountered.
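These messages only appear if logging is configured in the calling process. The sketch below enables `INFO`-level output on the root logger using the standard library; the name of the tool's own logger is not stated here, so configuring the root logger is an assumption that covers it regardless of name.

```python
import logging

# Install a console handler with a readable format. basicConfig does nothing
# if the root logger already has handlers, as in some notebook environments.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)

# Set the level explicitly so INFO messages pass even when basicConfig was a no-op.
logging.getLogger().setLevel(logging.INFO)
```

Run this once before initializing the tool so that rate-limiting notices and warnings are visible.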

## Conclusion

The `FirecrawlScrapeWebsiteTool` extracts markdown content from websites, either called directly (synchronously or asynchronously) or as a tool in an agent workflow. By following the steps in this guide, you can initialize the tool, scrape a base URL, and interpret the output, errors, and log messages it produces.