---
sidebar_label: Hyperbrowser Crawl
---


# Crawl Tool

[Hyperbrowser](https://hyperbrowser.ai) is a platform for running and scaling headless browsers. It lets you launch and manage browser sessions at scale and provides easy-to-use solutions for web scraping tasks, such as scraping a single page or crawling an entire site.

Key Features:
- Instant Scalability - Spin up hundreds of browser sessions in seconds without infrastructure headaches
- Simple Integration - Works seamlessly with popular tools like Puppeteer and Playwright
- Powerful APIs - Easy-to-use APIs for scraping or crawling any site, and much more
- Bypass Anti-Bot Measures - Built-in stealth mode, ad blocking, automatic CAPTCHA solving, and rotating proxies

This notebook provides a quick overview for getting started with the Hyperbrowser crawl [tool](https://python.langchain.com/docs/concepts/tools/).

For more information about Hyperbrowser, visit the [Hyperbrowser website](https://hyperbrowser.ai) or the [Hyperbrowser docs](https://docs.hyperbrowser.ai).

The `crawl_tool` crawls a website starting from a given URL. It supports a configurable page limit and scraping options, which makes it well suited to gathering content from multiple pages of a site.

## Overview

### Integration details

| Tool         | Package                | Local | Serializable | [JS support](https://js.langchain.com/docs/integrations/document_loaders/web_loaders/langchain_hyperbrowser_loader) |
| :----------- | :--------------------- | :---: | :----------: | :-----------------------------------------------------------------------------------------------------------------: |
| Crawl Tool   | langchain-hyperbrowser |  ❌   |      ❌      |                                                         ❌                                                          |

## Setup

To access the crawl tool, install the `langchain-hyperbrowser` integration package, create a Hyperbrowser account, and get an API key.

### Credentials

Head to [Hyperbrowser](https://app.hyperbrowser.ai/) to sign up and generate an API key. Once you've done this, set the `HYPERBROWSER_API_KEY` environment variable:

```bash
export HYPERBROWSER_API_KEY=<your-api-key>
```
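In a notebook, where exporting a shell variable is less convenient, the key can also be set from Python. The following is a minimal sketch using only the standard library. The helper name `ensure_api_key` is hypothetical and is not part of `langchain-hyperbrowser`.

```python
import getpass
import os


def ensure_api_key(name="HYPERBROWSER_API_KEY", prompt_fn=getpass.getpass):
    """Return the key from the environment, prompting once if it is unset."""
    # ensure_api_key is an illustrative helper, not a library function.
    if not os.environ.get(name):
        # getpass hides the typed value so the key does not end up in cell output.
        os.environ[name] = prompt_fn(f"Enter {name}: ")
    return os.environ[name]
```

Calling `ensure_api_key()` once before constructing the tool should be enough. If the variable is already set, nothing is prompted.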


### Installation

Install **langchain-hyperbrowser**.


```python
%pip install -qU langchain-hyperbrowser
```

## Basic Usage

### Simple Crawling


```python
from langchain_hyperbrowser import HyperbrowserCrawlTool

result = HyperbrowserCrawlTool().invoke(
    {
        "url": "https://example.com",
        "max_pages": 2,
        "scrape_options": {
            "formats": ["markdown"]
        }
    }
)
print(result)
```

## Advanced Usage

### With Custom Scraping Options


```python
result = HyperbrowserCrawlTool().run(
    {
        "url": "https://example.com",
        "max_pages": 10,
        "scrape_options": {
            "formats": ["markdown", "html"],
            "include_tags": ["h1", "h2", "p", "a"],
            "exclude_tags": ["script", "style"],
            "include_metadata": True
        }
    }
)
print(result)
```

### With Custom Session Options


```python
result = HyperbrowserCrawlTool().run(
    {
        "url": "https://example.com",
        "max_pages": 5,
        "scrape_options": {
            "formats": ["markdown"]
        },
        "session_options": {
            "use_proxy": True,
            "solve_captchas": True
        }
    }
)
print(result)
```
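The examples so far pass the same dictionary shape with different values. When several crawls are configured in one place, a small builder can keep that shape consistent. This is a sketch under stated assumptions: `build_crawl_input` is a hypothetical helper, and its checks are client-side conveniences rather than rules enforced by Hyperbrowser. Only the keys already shown in this notebook are used.

```python
def build_crawl_input(url, max_pages=5, formats=("markdown",), session_options=None):
    """Assemble a crawl input dict in the shape used by the examples (hypothetical helper)."""
    # These checks are assumptions made for illustration, not library requirements.
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"expected an absolute http(s) URL, got {url!r}")
    if max_pages < 1:
        raise ValueError("max_pages must be at least 1")
    payload = {
        "url": url,
        "max_pages": max_pages,
        "scrape_options": {"formats": list(formats)},
    }
    # Only include session options when the caller supplies them.
    if session_options:
        payload["session_options"] = dict(session_options)
    return payload


payload = build_crawl_input(
    "https://example.com",
    max_pages=5,
    session_options={"use_proxy": True, "solve_captchas": True},
)
print(payload)
```

The resulting `payload` can be passed to `HyperbrowserCrawlTool().invoke(payload)` in the same way as the literal dictionaries.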

### Using in an Agent


```python
from langchain.agents import AgentExecutor, create_openai_functions_agent
from langchain_openai import ChatOpenAI
from langchain_hyperbrowser import HyperbrowserCrawlTool
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder

# Initialize the crawl tool
crawl_tool = HyperbrowserCrawlTool()

# Create the agent with the crawl tool
llm = ChatOpenAI(temperature=0)
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant that crawls websites and extracts content.",
        ),
        ("human", "{input}"),
        MessagesPlaceholder(variable_name="agent_scratchpad"),
    ]
)
agent = create_openai_functions_agent(llm, [crawl_tool], prompt=prompt)
agent_executor = AgentExecutor(agent=agent, tools=[crawl_tool], verbose=True)

# Run the agent
result = agent_executor.invoke(
    {
        "input": "Crawl https://example.com and get content from up to 5 pages"
    }
)
print(result)
```

### Async Usage


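```python
from langchain_hyperbrowser import HyperbrowserCrawlTool


async def crawl_data():
    tool = HyperbrowserCrawlTool()
    result = await tool.arun(
        {
            "url": "https://example.com",
            "max_pages": 5,
            "scrape_options": {
                "formats": ["markdown"]
            }
        }
    )
    return result


# Top-level `await` works in a notebook; in a script, use asyncio.run(crawl_data()).
result = await crawl_data()
print(result)
```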
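The async interface is most useful when several crawls run at once. The sketch below fans out over a list of start URLs with `asyncio.gather`. The helper `crawl_many` is hypothetical, and it accepts any object with an async `arun(payload)` method, so it can be tried with a stand-in before spending API credits. Whether your Hyperbrowser plan permits this many concurrent sessions is an assumption to verify.

```python
import asyncio


async def crawl_many(tool, urls, max_pages=5):
    """Crawl several start URLs concurrently and map each URL to its result."""
    # `tool` is any object exposing an async `arun(payload)` method,
    # for example a HyperbrowserCrawlTool instance.
    payloads = [
        {"url": url, "max_pages": max_pages, "scrape_options": {"formats": ["markdown"]}}
        for url in urls
    ]
    # gather() returns results in the same order as the awaitables it was given.
    results = await asyncio.gather(*(tool.arun(p) for p in payloads))
    return dict(zip(urls, results))
```

In a notebook this would be called as `await crawl_many(HyperbrowserCrawlTool(), urls)`. In a script, wrap the call in `asyncio.run(...)`.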

## API reference

- [GitHub](https://github.com/hyperbrowserai/langchain-hyperbrowser/)
- [PyPI](https://pypi.org/project/langchain-hyperbrowser/)
- [Hyperbrowser Docs](https://docs.hyperbrowser.ai/)
