---
sidebar_label: Hyperbrowser Extract
---


# Extract Tool

The Hyperbrowser extract tool (`HyperbrowserExtractTool`) uses AI to extract structured data from web pages. It extracts information based on a natural-language prompt, a predefined schema, or both, which makes it useful for gathering specific data from websites.

## Overview

### Integration details

| Tool         | Package                | Local | Serializable | [JS support](https://js.langchain.com/docs/integrations/document_loaders/web_loaders/langchain_hyperbrowser_loader) |
| :----------- | :--------------------- | :---: | :----------: | :-----------------------------------------------------------------------------------------------------------------: |
| Extract Tool | langchain-hyperbrowser |  ❌   |      ❌      |                                                         ❌                                                          |

## Setup

To access the extract tool, install the `langchain-hyperbrowser` integration package, create a Hyperbrowser account, and get an API key.

### Credentials

Head to [Hyperbrowser](https://app.hyperbrowser.ai/) to sign up and generate an API key. Once you've done this, set the `HYPERBROWSER_API_KEY` environment variable:

```bash
export HYPERBROWSER_API_KEY=<your-api-key>
```
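The examples on this page construct the tool without arguments, so it presumably picks the key up from this environment variable. A minimal sketch for failing early with a clear message when the variable is missing; the helper name `require_api_key` is hypothetical and not part of `langchain-hyperbrowser`:

```python
import os


def require_api_key(env=os.environ, name="HYPERBROWSER_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; not part of langchain-hyperbrowser.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before creating the tool.")
    return key
```

Calling `require_api_key()` once at startup surfaces a missing key before any extraction request is made.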


### Installation

Install **langchain-hyperbrowser**.


In [None]:
%pip install -qU langchain-hyperbrowser

## Basic Usage

### Simple Extraction


In [4]:
from langchain_hyperbrowser import HyperbrowserExtractTool

result = HyperbrowserExtractTool().invoke(
    {
        "url": "https://example.com",
        "extraction_prompt": "Extract the title of the page only.",
        "json_schema": None,
    }
)
print(result)

{'data': {'title': 'Example Domain'}, 'error': None}
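The printed result is a dictionary with a `data` key and an `error` key. A small sketch for unwrapping it, assuming that shape holds for every call; `unwrap_extraction` is a hypothetical helper, and the sample dictionary is copied from the output above rather than fetched live:

```python
def unwrap_extraction(result):
    """Return result["data"], raising if the tool reported an error.

    Assumes the {'data': ..., 'error': ...} shape shown in the output above.
    """
    error = result.get("error")
    if error is not None:
        raise RuntimeError(f"Extraction failed: {error}")
    return result.get("data")


# Sample result copied from the output above (no network call is made here).
sample = {"data": {"title": "Example Domain"}, "error": None}
title = unwrap_extraction(sample)["title"]
print(title)  # Example Domain
```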


## Advanced Usage

### With Custom Schema


In [7]:
from pydantic import BaseModel
from typing import List


class ProductSchema(BaseModel):
    title: str
    price: float


class ProductsSchema(BaseModel):
    products: List[ProductSchema]


result = HyperbrowserExtractTool().run(
    {
        "url": "https://dummyjson.com/products?limit=10",
        "extraction_prompt": "Extract the product details",
        "json_schema": ProductsSchema,
    }
)
print(result)

{'data': {'products': [{'price': 9.99, 'title': 'Essence Mascara Lash Princess'}, {'price': 19.99, 'title': 'Eyeshadow Palette with Mirror'}, {'price': 14.99, 'title': 'Powder Canister'}, {'price': 12.99, 'title': 'Red Lipstick'}, {'price': 8.99, 'title': 'Red Nail Polish'}, {'price': 49.99, 'title': 'Calvin Klein CK One'}, {'price': 129.99, 'title': 'Chanel Coco Noir Eau De'}, {'price': 89.99, 'title': "Dior J'adore"}, {'price': 69.99, 'title': 'Dolce Shine Eau de'}, {'price': 79.99, 'title': 'Gucci Bloom Eau de'}]}, 'error': None}
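Judging by the printed output, `data` comes back as plain dictionaries rather than `ProductsSchema` instances. The sketch below converts such a result into typed objects. It uses standard-library dataclasses only to stay self-contained; with the Pydantic v2 models defined above, `ProductsSchema.model_validate(result["data"])` would serve the same purpose. The `Product` class and `parse_products` function are hypothetical names:

```python
from dataclasses import dataclass


@dataclass
class Product:
    title: str
    price: float


def parse_products(result):
    """Convert the tool's result dict into a list of Product objects."""
    if result.get("error") is not None:
        raise RuntimeError(f"Extraction failed: {result['error']}")
    return [
        Product(title=item["title"], price=float(item["price"]))
        for item in result["data"]["products"]
    ]


# First two entries copied from the output above (no network call is made here).
sample = {
    "data": {
        "products": [
            {"price": 9.99, "title": "Essence Mascara Lash Princess"},
            {"price": 19.99, "title": "Eyeshadow Palette with Mirror"},
        ]
    },
    "error": None,
}
products = parse_products(sample)
cheapest = min(products, key=lambda p: p.price)
print(cheapest.title)  # Essence Mascara Lash Princess
```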


### With Custom Session Options


In [9]:
result = HyperbrowserExtractTool().run(
    {
        "url": "https://dummyjson.com/products?limit=10",
        "extraction_prompt": "Extract the product details",
        "json_schema": ProductsSchema,
        "session_options": {"session_options": {"use_proxy": True}},
    }
)
print(result)

{'data': {'products': [{'price': 9.99, 'title': 'Essence Mascara Lash Princess'}, {'price': 19.99, 'title': 'Eyeshadow Palette with Mirror'}, {'price': 14.99, 'title': 'Powder Canister'}, {'price': 12.99, 'title': 'Red Lipstick'}, {'price': 8.99, 'title': 'Red Nail Polish'}, {'price': 49.99, 'title': 'Calvin Klein CK One'}, {'price': 129.99, 'title': 'Chanel Coco Noir Eau De'}, {'price': 89.99, 'title': "Dior J'adore"}, {'price': 69.99, 'title': 'Dolce Shine Eau de'}, {'price': 79.99, 'title': 'Gucci Bloom Eau de'}]}, 'error': None}


### Using in an Agent


In [None]:
from langchain.agents import AgentExecutor, create_openai_functions_agent
from langchain_openai import ChatOpenAI
from langchain_hyperbrowser import HyperbrowserExtractTool
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder


# Initialize the extract tool (the API key is read from HYPERBROWSER_API_KEY)
extract_tool = HyperbrowserExtractTool()
# Create the agent with the extract tool
llm = ChatOpenAI(temperature=0)
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant that extracts information from websites.",
        ),
        ("human", "{input}"),
        MessagesPlaceholder(variable_name="agent_scratchpad"),
    ]
)
agent = create_openai_functions_agent(llm, [extract_tool], prompt=prompt)
agent_executor = AgentExecutor(agent=agent, tools=[extract_tool], verbose=True)

# Run the agent
result = agent_executor.invoke(
    {
        "input": "Extract product information from https://dummyjson.com/products?limit=10"
    }
)
print(result)



> Entering new AgentExecutor chain...

Invoking: `hyperbrowser_extract_data` with `{'url': 'https://dummyjson.com/products?limit=10', 'extraction_prompt': 'Extract product information', 'json_schema': None}`


{'data': {'price': 9.99, 'images': ['https://cdn.dummyjson.com/products/images/beauty/Essence%20Mascara%20Lash%20Princess/1.png'], 'category': 'beauty', 'currency': 'USD', 'productID': 'RCH45Q1A', 'description': 'The Essence Mascara Lash Princess is a popular mascara known for its volumizing and lengthening effects. Achieve dramatic lashes with this long-lasting and cruelty-free formula.', 'productName': 'Essence Mascara Lash Princess', 'availability': True}, 'error': None}

I have extracted product information from the website:

- Product Name: Essence Mascara Lash Princess
- Category: Beauty
- Price: $9.99
- Currency: USD
- Product ID: RCH45Q1A
- Description: The Essence Mascara Lash Princess

### Async Usage


In [10]:
async def extract_data():
    tool = HyperbrowserExtractTool()
    result = await tool.arun(
        {
            "url": "https://example.com",
            "extraction_prompt": "Extract the main content",
            "json_schema": None,
        }
    )
    return result


result = await extract_data()

In [11]:
print(result)

{'data': {'mainContent': 'This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.'}, 'error': None}
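The async interface lets several extractions run at the same time. The sketch below fans out one call per URL with `asyncio.gather`. To stay runnable without network access it uses a stand-in coroutine, `fake_arun`; both `extract_many` and `fake_arun` are hypothetical names. Whether the service allows concurrent sessions on a given plan is an assumption that should be checked separately:

```python
import asyncio


async def extract_many(arun, urls, prompt):
    """Run one extraction per URL concurrently; results keep the order of `urls`."""
    payloads = [
        {"url": url, "extraction_prompt": prompt, "json_schema": None}
        for url in urls
    ]
    return await asyncio.gather(*(arun(payload) for payload in payloads))


async def fake_arun(payload):
    """Stand-in for the tool's `arun`, so the sketch runs without network access."""
    await asyncio.sleep(0)
    return {"data": {"url": payload["url"]}, "error": None}


urls = ["https://example.com", "https://example.org"]
results = asyncio.run(extract_many(fake_arun, urls, "Extract the main content"))
print([r["data"]["url"] for r in results])
```

In a notebook, where an event loop is already running, replace `asyncio.run(...)` with `await extract_many(HyperbrowserExtractTool().arun, urls, "Extract the main content")`.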


## API reference

- [GitHub](https://github.com/hyperbrowserai/langchain-hyperbrowser/)
- [PyPI](https://pypi.org/project/langchain-hyperbrowser/)
- [Hyperbrowser Docs](https://docs.hyperbrowser.ai/)
