The AI-Scraper is an experimental scraping tool by Oxylabs AI Studio that extracts data from a single webpage using AI. It identifies and parses relevant information based on a natural language prompt, then delivers results in either structured JSON (for automation and APIs) or Markdown format (best for readable outputs and AI workflows).
This AI scraper removes the need for CSS/XPath selectors or custom parsers, so it integrates seamlessly with various automation pipelines. Automatic schema generation and flexible output formats provide users with an easy way to extract clean, structured data without ever needing to maintain parsing logic.
- Natural language prompt-based extraction – Define your needs in plain English, and the scrape agent will retrieve the relevant information.
- Multiple output formats – Choose JSON for structured workflows or Markdown for human-readable results and AI workflows.
- Automatic schema generation – Generate a schema automatically from a prompt or define it manually for precise JSON parsing.
- Works on any public webpage – Extract from e-commerce, news, blogs, or any other accessible source.
To scrape a webpage with AI-Scraper, follow these steps:
- Provide the webpage URL you want to scrape.
- Describe the data to extract in natural language (e.g. “Get all product names and prices”).
- Select the output format – structured JSON or Markdown.
- (Optional) Define a schema – Let AI-Scraper generate one automatically, or provide your own OpenAPI schema for the exact structure you desire.
To begin, make sure you have an AI Studio API key (or get a free trial with 1,000 credits) and Python 3.10 or above installed. You can install the oxylabs-ai-studio package using pip:
pip install oxylabs-ai-studio
The following example shows how to use AiScraper to extract data from a sample page.
from oxylabs_ai_studio.apps.ai_scraper import AiScraper

# Initialize the AI Scraper with your API key
scraper = AiScraper(api_key="YOUR_API_KEY")

# Generate a schema automatically from natural language
schema = scraper.generate_schema(
    prompt="want to parse developer, platform, type, price, game title, and genre (array)"
)
print(f"Generated schema: {schema}")

# Scrape a specific webpage and extract structured data
url = "https://sandbox.oxylabs.io/products/3"
result = scraper.scrape(
    url=url,
    output_format="json",
    schema=schema,
    render_javascript=False,
    geo_location="US",
)

# Print the scrape output, one item per line
print("Results:")
for item in result.data['games']:
    print(item, "\n")
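If you would rather define the schema by hand than call generate_schema, a minimal sketch might look like the one below. The exact shape AiScraper accepts is not spelled out here, so this assumes a JSON-Schema-style dictionary (the style used in OpenAPI) with a top-level `games` array, mirroring the `result.data['games']` access in the example above. The name `game_schema` and the choice of required fields are illustrative assumptions; compare the structure with what generate_schema returns before relying on it.

```python
# Hand-written schema sketch. Assumption: AiScraper accepts a
# JSON-Schema/OpenAPI-style dictionary; the layout below is illustrative.
game_schema = {
    "type": "object",
    "properties": {
        "games": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "developer": {"type": "string"},
                    "platform": {"type": "string"},
                    "type": {"type": "string"},
                    "price": {"type": "number"},
                    "title": {"type": "string"},
                    "genre": {"type": "array", "items": {"type": "string"}},
                },
                # Assumed minimum; adjust to the fields you cannot do without.
                "required": ["title", "price"],
            },
        }
    },
    "required": ["games"],
}

# List the per-game field names the schema asks for.
game_fields = sorted(game_schema["properties"]["games"]["items"]["properties"])
print(game_fields)
```

Under that assumption, the dictionary would be passed as `schema=game_schema` in the `scraper.scrape(...)` call in place of the generated schema.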
Learn more about AI-Scraper and Oxylabs AI Studio Python SDK in our PyPI repository. You can also check out our AI Studio JavaScript SDK guide for JS users.
Parameter | Description | Default Value |
---|---|---|
url * | Target URL to scrape | – |
output_format | Output format (json, markdown) | markdown |
schema | OpenAPI schema for structured extraction (mandatory for JSON) | – |
render_javascript | Enable JavaScript rendering | False |
geo_location | Proxy location in ISO2 format | – |

\* – mandatory parameters
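The rules in the parameter table can be checked locally before a request is sent. The sketch below is not part of the SDK: `build_scrape_kwargs` is a hypothetical helper that only encodes what the table states (url is mandatory, output_format is json or markdown and defaults to markdown, and a schema is required for JSON output).

```python
ALLOWED_FORMATS = {"json", "markdown"}


def build_scrape_kwargs(url, output_format="markdown", schema=None,
                        render_javascript=False, geo_location=None):
    """Hypothetical helper: validate arguments against the parameter table."""
    if not url:
        raise ValueError("url is mandatory")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"output_format must be one of {sorted(ALLOWED_FORMATS)}")
    if output_format == "json" and schema is None:
        raise ValueError("schema is mandatory when output_format is 'json'")

    kwargs = {
        "url": url,
        "output_format": output_format,
        "render_javascript": render_javascript,
    }
    # Optional parameters are only included when the caller sets them.
    if schema is not None:
        kwargs["schema"] = schema
    if geo_location is not None:
        kwargs["geo_location"] = geo_location
    return kwargs


print(build_scrape_kwargs("https://sandbox.oxylabs.io/products/3", geo_location="US"))
```

The returned dictionary could then be unpacked into the call, for example `scraper.scrape(**kwargs)`.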
The AI-Scraper can return parsed, ready-to-use output that is easy to integrate into your applications.
This is a structured JSON of the response output:
Results:
{'developer': 'Nintendo EAD Tokyo', 'platform': 'wii', 'type': 'singleplayer', 'price': 91.99, 'title': 'Super Mario Galaxy 2', 'genre': ['Action', 'Platformer', '2D']}
{'developer': 'Nintendo', 'platform': 'wii', 'type': 'singleplayer', 'price': 88.99, 'title': 'Metroid Prime 3: Corruption', 'genre': ['Action', 'Shooter', 'First-Person', 'Sci-Fi', 'Arcade']}
{'developer': 'Nintendo', 'platform': 'wii', 'type': 'singleplayer', 'price': 83.99, 'title': 'Tomena Sanner', 'genre': ['Action', 'General']}
{'developer': 'Eidos Interactive', 'platform': 'wii', 'type': 'singleplayer', 'price': 80.99, 'title': 'Death Jr.: Root of Evil', 'genre': ['Action', 'Platformer', '3D']}
{'developer': 'Nintendo', 'platform': 'wii', 'type': 'singleplayer', 'price': 87.99, 'title': "Kirby's Return to Dream Land", 'genre': ['General', 'Action', 'Platformer', '2D']}
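Because each result is a plain Python dictionary, it can be post-processed with the standard library alone. The following sketch uses three of the records shown above as sample data; `games_to_csv` is a hypothetical helper name, and joining genres with "; " is just one possible way to flatten the list.

```python
import csv
import io

# Sample records copied from the output above.
games = [
    {'developer': 'Nintendo EAD Tokyo', 'platform': 'wii', 'type': 'singleplayer',
     'price': 91.99, 'title': 'Super Mario Galaxy 2',
     'genre': ['Action', 'Platformer', '2D']},
    {'developer': 'Nintendo', 'platform': 'wii', 'type': 'singleplayer',
     'price': 88.99, 'title': 'Metroid Prime 3: Corruption',
     'genre': ['Action', 'Shooter', 'First-Person', 'Sci-Fi', 'Arcade']},
    {'developer': 'Nintendo', 'platform': 'wii', 'type': 'singleplayer',
     'price': 83.99, 'title': 'Tomena Sanner',
     'genre': ['Action', 'General']},
]


def games_to_csv(items):
    """Serialize scraped game records to CSV text (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["title", "developer", "platform", "type", "price", "genre"],
    )
    writer.writeheader()
    for item in items:
        row = dict(item)
        # Flatten the genre list into a single cell.
        row["genre"] = "; ".join(item["genre"])
        writer.writerow(row)
    return buffer.getvalue()


# Find the lowest-priced game among the sample records.
cheapest = min(games, key=lambda game: game["price"])
print(cheapest["title"])
print(games_to_csv(games))
```

With live data, `games` would simply be replaced by `result.data['games']` from the scrape call.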
Alternatively, you can use output_format="markdown" to receive Markdown results instead of parsed JSON.
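A Markdown request could be wrapped as in the sketch below. It relies only on the `scrape(url=..., output_format=...)` call and the `result.data` attribute shown earlier; that `result.data` holds the Markdown text for this output format is an assumption, and `save_markdown` is a hypothetical helper rather than an SDK function.

```python
from pathlib import Path


def save_markdown(scraper, url, path):
    """Scrape `url` as Markdown and write the text to `path` (sketch)."""
    result = scraper.scrape(url=url, output_format="markdown")
    # Assumption: for Markdown output, result.data holds the Markdown text.
    text = str(result.data)
    Path(path).write_text(text, encoding="utf-8")
    return text
```

A call might look like `save_markdown(AiScraper(api_key="YOUR_API_KEY"), "https://sandbox.oxylabs.io/products/3", "product.md")`. No schema is passed, since the parameter table only requires one for JSON output.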
Oxylabs AI-Scraper can be applied to a wide variety of data collection tasks:
- Extract product details – Gather product names, descriptions, and prices from e-commerce sites.
- Parse news articles – Retrieve article titles, dates, authors, and body text.
- Scrape pricing pages – Collect structured pricing information for competitor or market research.
- Extract job postings – Capture job titles, locations, salaries, and posting dates from recruitment portals.
AI-Scraper doesn’t rely on CSS/XPath selectors or custom parsing logic. Instead, it uses natural language prompts and AI-powered extraction, making it more adaptable to layout changes and much easier to set up.
Yes, you can scrape any publicly accessible webpage. AI-Scraper also supports JavaScript rendering for dynamic pages. Private or login-protected content isn’t supported out of the box.
No, a schema is not mandatory for Markdown output, but it’s required if you want structured JSON output. If you don’t want to write one yourself, AI-Scraper can generate a schema automatically based on your prompt.
Unlike traditional scrapers, AI-Scraper is more resilient to layout changes because it interprets content with AI. However, major changes may require you to adjust either your prompt or the schema.
Oxylabs AI Studio AI-Scraper is free to try with a free trial that includes 1,000 credits. After the trial, monthly plans start at $12/month with 3,000 credits and 1 request/s, with higher plans offering more credits and higher request rates.
For a deeper dive into available parameters, advanced integrations, and additional examples, check out the AI Studio documentation.
If you have questions or need support, reach out to us at hello@oxylabs.io, through live chat, or join our Discord community.