# ScrapingBee

The ScrapingBee web scraping API handles headless browsers, rotates proxies for you, and offers AI-powered data extraction.

This notebook provides a quick overview for getting started with the ScrapingBee tools.

## Overview

### Integration details


| Class | Package | Serializable | JS support | Package latest |
| :--- | :--- | :---: | :---: | :---: |
| [Scrapingbee](https://pypi.org/project/langchain-scrapingbee/) | [langchain-scrapingbee](https://pypi.org/project/langchain-scrapingbee/) | ✅ | ❌ | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-scrapingbee?style=flat-square&label=%20) |

### Tool features

* ScrapeUrlTool - Scrape the contents of any public website. You can also use it to extract data, capture screenshots, interact with the page before scraping, and capture the internal requests sent by the page.
* GoogleSearchTool - Search Google for the following types of results: regular search (classic), news, maps, and images.
* CheckUsageTool - Monitor your ScrapingBee credit and concurrency usage.

## Setup

```bash
pip install -U langchain-scrapingbee
```

### Credentials

Configure credentials by setting the following environment variable:

* `SCRAPINGBEE_API_KEY`

```python
import getpass
import os

# if not os.environ.get("SCRAPINGBEE_API_KEY"):
#     os.environ["SCRAPINGBEE_API_KEY"] = getpass.getpass("SCRAPINGBEE API key:\n")
```

It's also helpful (but not needed) to set up [LangSmith](https://smith.langchain.com/) for best-in-class observability:

```python
# os.environ["LANGSMITH_TRACING"] = "true"
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass()
```

## Instantiation

All of the ScrapingBee tools require only the API key at instantiation. If it is not set as an environment variable, you can pass it directly here.

Here we show how to instantiate the ScrapingBee tools:

```python
from langchain_scrapingbee import (
    ScrapeUrlTool,
    GoogleSearchTool,
    CheckUsageTool,
)

scrape_tool = ScrapeUrlTool(api_key=os.environ.get("SCRAPINGBEE_API_KEY"))
search_tool = GoogleSearchTool(api_key=os.environ.get("SCRAPINGBEE_API_KEY"))
usage_tool = CheckUsageTool(api_key=os.environ.get("SCRAPINGBEE_API_KEY"))
```
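
The choice between an environment variable and a directly supplied key can be sketched with a small helper. This is only an illustration: `resolve_api_key` is a hypothetical name, not part of `langchain-scrapingbee`, and it does nothing more than decide which string to pass as `api_key`.

```python
import os
from typing import Optional


def resolve_api_key(explicit_key: Optional[str] = None) -> str:
    """Return the key passed directly, else SCRAPINGBEE_API_KEY from the environment.

    Hypothetical helper for illustration; not part of langchain-scrapingbee.
    """
    key = explicit_key or os.environ.get("SCRAPINGBEE_API_KEY")
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise ValueError(
            "No ScrapingBee API key found: pass one directly or set SCRAPINGBEE_API_KEY."
        )
    return key
```

The result can then be passed at instantiation, for example `ScrapeUrlTool(api_key=resolve_api_key())`.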

## Invocation

### Invoke directly with args

**ScrapeUrlTool**

This tool accepts `url` (string) and `params` (dictionary) as arguments. `url` is required and `params` is optional. Use `params` to customise the request. For example, to disable JavaScript rendering, pass the following as `params`:

```
{'render_js': False}
```

For a complete list of acceptable parameters, please visit the [HTML API documentation](https://www.scrapingbee.com/documentation/).
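
As a rough sketch of how the `url` and `params` arguments fit together, the helper below builds the input dictionary for `ScrapeUrlTool` and disables JavaScript rendering when the URL looks like a static file such as a PDF. The helper name `build_scrape_input` and the extension list are assumptions made for illustration; only `url`, `params`, and `render_js` come from this notebook.

```python
from urllib.parse import urlparse

# Assumption: file extensions treated as static content that needs no JS rendering.
STATIC_EXTENSIONS = (".pdf", ".png", ".jpg", ".jpeg", ".gif", ".zip")


def build_scrape_input(url: str) -> dict:
    """Build the input dict for ScrapeUrlTool (hypothetical helper)."""
    # Look only at the path so query strings do not hide the extension.
    path = urlparse(url).path.lower()
    tool_input = {"url": url}
    if path.endswith(STATIC_EXTENSIONS):
        tool_input["params"] = {"render_js": False}
    return tool_input
```

The returned dictionary has the shape the tool expects, so it could be passed as `scrape_tool.invoke(build_scrape_input(url))`.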

**GoogleSearchTool**

This tool accepts `search` (string) and `params` (dictionary) as arguments. `search` is required and `params` is optional. Use `params` to customise the request. For example, to get news results, pass the following as `params`:

```
{'search_type': 'news'}
```

For a complete list of acceptable parameters, please visit the [Google Search API documentation](https://www.scrapingbee.com/documentation/google/).
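
A minimal sketch of building the search input with a checked `search_type` might look like the following. The helper name `build_search_input` is hypothetical, and the allowed values are simply the four result types this notebook names (classic, news, maps, images); the Google Search API documentation remains the authority on accepted values.

```python
# Result types named in this notebook; check the ScrapingBee docs for the full list.
SEARCH_TYPES = ("classic", "news", "maps", "images")


def build_search_input(search: str, search_type: str = "classic") -> dict:
    """Build the input dict for GoogleSearchTool (hypothetical helper)."""
    if search_type not in SEARCH_TYPES:
        raise ValueError(
            f"Unknown search_type {search_type!r}; expected one of {SEARCH_TYPES}."
        )
    return {"search": search, "params": {"search_type": search_type}}
```

For example, `search_tool.invoke(build_search_input("What is LangChain?", "news"))` would request news results.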

**CheckUsageTool**

This tool takes no arguments. Invoking it checks your ScrapingBee API usage and returns the following information:

* max_api_credit
* used_api_credit
* max_concurrency
* current_concurrency
* renewal_subscription_date
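
To show how these fields might be used, the sketch below computes the remaining credits and checks for a free concurrency slot. The return type of the tool is not specified here, so this assumes the result has already been parsed into a dictionary with the keys listed above (for instance with `json.loads`, if the tool returns a JSON string). The function names and sample values are made up for illustration.

```python
def remaining_credits(usage: dict) -> int:
    """Credits left in the current period (assumes a dict with the keys listed above)."""
    return usage["max_api_credit"] - usage["used_api_credit"]


def has_free_slot(usage: dict) -> bool:
    """True if another concurrent request fits under the plan's concurrency limit."""
    return usage["current_concurrency"] < usage["max_concurrency"]


# Illustrative values only, not real account data.
sample_usage = {
    "max_api_credit": 1000,
    "used_api_credit": 250,
    "max_concurrency": 5,
    "current_concurrency": 1,
}
print(remaining_credits(sample_usage), has_free_slot(sample_usage))
```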

```python
scrape_tool.invoke({
    'url': 'http://httpbin.org/html'
})

scrape_tool.invoke({
    'url': 'https://treaties.un.org/doc/publication/ctc/uncharter.pdf',
    'params': {'render_js': False}
})

search_result = search_tool.invoke({
    'search': 'What is LangChain?'
})

usage_tool.invoke({})
```

### Example using an agent

```python
import os
from langchain_scrapingbee import (
    ScrapeUrlTool,
    GoogleSearchTool,
    CheckUsageTool,
)
from langchain_google_genai import ChatGoogleGenerativeAI
from langgraph.prebuilt import create_react_agent

if not os.environ.get("GOOGLE_API_KEY") or not os.environ.get("SCRAPINGBEE_API_KEY"):
    raise ValueError("Google and ScrapingBee API keys must be set in environment variables.")

llm = ChatGoogleGenerativeAI(temperature=0, model="gemini-2.5-flash")
scrapingbee_api_key = os.environ.get("SCRAPINGBEE_API_KEY")

tools = [
    ScrapeUrlTool(api_key=scrapingbee_api_key),
    GoogleSearchTool(api_key=scrapingbee_api_key),
    CheckUsageTool(api_key=scrapingbee_api_key),
]

agent = create_react_agent(llm, tools)

user_input = "If I have enough API Credits, search for pdfs about langchain and save 3 pdfs."

# Stream the agent's output step-by-step
for step in agent.stream(
    {"messages": user_input},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()
```

## API reference

For detailed documentation of all ScrapingBee features and configurations, head to the API reference:
* [HTML API](https://www.scrapingbee.com/documentation/)
* [Google Search API](https://www.scrapingbee.com/documentation/google/)
* [Data Extraction](https://www.scrapingbee.com/documentation/data-extraction/)
* [JavaScript Scenario](https://www.scrapingbee.com/documentation/js-scenario/)