# Steel Web Loader

This notebook demonstrates how to use Steel's browser automation capabilities with LangChain for web scraping and automation tasks.

Steel provides managed browser infrastructure with features like:
- Proxy network access
- Automated CAPTCHA solving
- Session management and debugging

## Installation

First, install the required packages:

In [None]:
!pip install langchain langchain-community langchain-openai playwright

You'll need a Steel API key to use this loader. You can get one at [steel.dev](https://steel.dev). Set it as an environment variable:

In [None]:
import os
os.environ["STEEL_API_KEY"] = "your-api-key"  # Replace with your key
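
Hard-coding a key in a notebook makes it easy to leak through version control. As an alternative, the minimal sketch below prompts for the key only when `STEEL_API_KEY` is not already set. The helper name `ensure_api_key` and its `prompt` parameter are illustrative and are not part of Steel or LangChain.

```python
import os
from getpass import getpass
from typing import Callable


def ensure_api_key(name: str = "STEEL_API_KEY",
                   prompt: Callable[[str], str] = getpass) -> str:
    """Return the key stored in the environment, prompting for it if missing."""
    key = os.environ.get(name, "").strip()
    if not key:
        # Nothing usable in the environment: ask the user without echoing input.
        key = prompt(f"Enter {name}: ").strip()
        if not key:
            raise ValueError(f"{name} must not be empty")
        os.environ[name] = key
    return key
```

Calling `ensure_api_key()` once at the top of the notebook leaves the environment variable set for every later cell.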

## Basic Usage

`SteelWebLoader` loads a web page and returns it as a list of LangChain `Document` objects:

In [None]:
from langchain_community.document_loaders import SteelWebLoader

# Create a loader for a specific URL
loader = SteelWebLoader(
    "https://example.com",
    extract_strategy="text"  # Can be 'text', 'markdown', or 'html'
)

# Load the page
documents = loader.load()

# Print the content
print(documents[0].page_content[:500])
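
To load several pages, one loader per URL can be created in a loop. The sketch below assumes only what the example above shows: a loader is built from a URL and exposes `load()`, which returns a list of documents. `load_many` and its `make_loader` parameter are hypothetical names; a factory is passed in so the function makes no assumptions about the loader's other arguments.

```python
from typing import Any, Callable, Iterable, List


def load_many(urls: Iterable[str],
              make_loader: Callable[[str], Any]) -> List[Any]:
    """Load every URL in order and return all documents in one flat list."""
    documents: List[Any] = []
    for url in urls:
        loader = make_loader(url)  # e.g. lambda u: SteelWebLoader(u)
        documents.extend(loader.load())
    return documents
```

With the loader from this notebook, a call might look like `load_many(urls, lambda u: SteelWebLoader(u, extract_strategy="markdown"))`.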

## Advanced Features

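The loader can also route traffic through Steel's proxy network, solve CAPTCHAs automatically, and wait longer for slow pages. Each loaded document's metadata records the session ID and a session viewer URL for debugging: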

In [None]:
# Configure proxy and CAPTCHA solving
loader = SteelWebLoader(
    "https://example.com",
    use_proxy=True,      # Use Steel's proxy network
    solve_captcha=True,  # Enable automated CAPTCHA solving
    timeout=60000        # Increase timeout for complex pages
)

documents = loader.load()

# Access session information for debugging
print(f"Session ID: {documents[0].metadata['steel_session_id']}")
print(f"Session Viewer: {documents[0].metadata['steel_session_viewer_url']}")
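
Indexing the metadata directly raises `KeyError` if a key is absent. A defensive variant, sketched below, reads the same two keys with `dict.get`. `session_summary` is an illustrative helper name, and whether these keys can ever be missing is an open assumption rather than documented behavior.

```python
from typing import Any, Mapping


def session_summary(metadata: Mapping[str, Any]) -> str:
    """Format Steel session details, tolerating missing metadata keys."""
    session_id = metadata.get("steel_session_id", "unknown")
    viewer_url = metadata.get("steel_session_viewer_url", "unavailable")
    return f"Session ID: {session_id}\nSession Viewer: {viewer_url}"
```

It can be called as `print(session_summary(documents[0].metadata))`.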

## Using with LangChain Agents

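Wrap the loader in a function and register it as a `Tool` so that a LangChain agent can fetch pages on demand. This example uses `ChatOpenAI`, which reads its key from the `OPENAI_API_KEY` environment variable: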

In [None]:
from langchain.agents import initialize_agent, Tool
from langchain_community.document_loaders import SteelWebLoader
from langchain_openai import ChatOpenAI

def scrape_webpage(url: str) -> str:
    """Scrape content from a webpage using Steel."""
    loader = SteelWebLoader(url)
    documents = loader.load()
    return documents[0].page_content if documents else ""

# Create a tool for the agent
tools = [
    Tool(
        name="SteelWebScraper",
        func=scrape_webpage,
        description="Useful for scraping content from webpages. Input should be a URL."
    )
]

# Initialize the agent
llm = ChatOpenAI(temperature=0)
agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)

# Use the agent
agent.invoke(
    "What is the main heading on example.com?"
)
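
A full page returned verbatim from a tool can exceed the model's context window. One option is to cap the text before handing it back to the agent. The sketch below is a plain string helper; `truncate_for_agent` and the 4,000-character default are arbitrary illustrative choices, not Steel or LangChain settings.

```python
def truncate_for_agent(text: str, max_chars: int = 4000) -> str:
    """Return text unchanged if short enough, else cut it and mark the cut."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if len(text) <= max_chars:
        return text
    # Keep the first max_chars characters and flag that content was dropped.
    return text[:max_chars] + "\n[truncated]"
```

Inside `scrape_webpage`, the return value could be wrapped as `truncate_for_agent(documents[0].page_content)`.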

## Error Handling

Wrap `load()` in a `try`/`except` block so that a failure, such as an unreachable host or a timeout, does not stop the rest of your code:

In [None]:
try:
    loader = SteelWebLoader(
        "https://non-existent-site.invalid",  # .invalid is a reserved TLD that never resolves
        timeout=5000  # Short timeout for demo
    )
    documents = loader.load()
except Exception as e:
    print(f"Error: {e}")
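
Transient failures often succeed on a second try. The sketch below retries a zero-argument callable with exponential backoff; it is generic Python rather than a Steel feature, and `load_with_retry`, `attempts`, `base_delay`, and `sleep` are hypothetical names chosen for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def load_with_retry(load: Callable[[], T],
                    attempts: int = 3,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call load() up to `attempts` times, doubling the wait after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return load()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

With the loader above, this could be used as `documents = load_with_retry(loader.load)`. Catching `Exception` keeps the sketch short; narrowing it to the errors the loader actually raises would avoid retrying on programming mistakes.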

## Best Practices

1. **Session Management**: Steel creates and cleans up sessions automatically
2. **Error Handling**: The loader handles and logs errors; still wrap `load()` in `try`/`except` so that one failed page does not stop your pipeline
3. **Timeouts**: Increase `timeout` for slow or script-heavy pages
4. **Extraction Strategy**: Choose the appropriate strategy for your use case:
   - `text`: Clean text content (default)
   - `markdown`: Structured content with basic formatting
   - `html`: Full HTML content
5. **Debugging**: Use the session viewer URL for debugging failed loads