Oxylabs’ Article Scraper is a data gathering solution that lets you extract real-time information from any website. This brief guide explains how Article Scraper works and provides code examples that show how to use it.
You can retrieve articles as HTML by submitting their URLs to our service.
The example below requests the HTML of an article from oxylabs.io.
import requests
from pprint import pprint

# Structure payload.
payload = {
    'source': 'universal',
    'url': 'https://oxylabs.io/blog/what-is-http-proxy'
}

# Get response. Replace 'user' and 'pass1' with your own credentials.
response = requests.request(
    'POST',
    'https://realtime.oxylabs.io/v1/queries',
    auth=('user', 'pass1'),
    json=payload,
)

# Instead of a response with job status and results URL, this will return the
# JSON response with the result.
pprint(response.json())
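The example above sends a single URL. For a list of URLs, one possible approach is to build one payload per URL and send them in turn. The sketch below assumes the service expects one `url` per payload, as in the example. `build_payloads` is a hypothetical helper name, not part of the Oxylabs API, and the second URL is a placeholder.

```python
def build_payloads(urls, source='universal'):
    """Return one request payload per URL, mirroring the payload shown above."""
    return [{'source': source, 'url': url} for url in urls]


# Placeholder list of article URLs (the second one is illustrative only).
urls = [
    'https://oxylabs.io/blog/what-is-http-proxy',
    'https://example.com/news/some-article',
]

payloads = build_payloads(urls)
```

Each payload can then be passed as `json=payload` to the same `requests.request` call used above.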
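Code examples for other programming languages are also available. The request to https://realtime.oxylabs.io/v1/queries returns a JSON response similar to this: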
{
    "results": [
        {
            "content": "<!DOCTYPE html><html lang=\"en\"><head><meta charSet=\"utf-8\" /><meta name=\"viewport\" content=\"width=de ... </html>",
            "created_at": "2023-12-18 11:37:50",
            "updated_at": "2023-12-18 11:38:00",
            "page": 1,
            "url": "https://oxylabs.io/blog/what-is-http-proxy",
            "job_id": "7142478062620273665",
            "status_code": 200
        }
    ]
}
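In this response, the article's HTML sits in the `content` field of each entry under `results`. Below is a minimal sketch of pulling it out, assuming the response has the shape shown above. `extract_html` is a hypothetical helper, and the sample dictionary is a shortened stand-in for a real response.

```python
def extract_html(response_json):
    """Return (url, html) pairs for results that report status code 200."""
    pages = []
    for result in response_json.get('results', []):
        # Skip entries where the target page was not fetched successfully.
        if result.get('status_code') == 200:
            pages.append((result['url'], result['content']))
    return pages


# Shortened stand-in for the JSON response shown above.
sample = {
    'results': [
        {
            'content': '<!DOCTYPE html><html lang="en"></html>',
            'url': 'https://oxylabs.io/blog/what-is-http-proxy',
            'status_code': 200,
        }
    ]
}

pages = extract_html(sample)
for url, html in pages:
    print(url, len(html))
```

With a live request, the equivalent call would be `extract_html(response.json())`.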
With our Article Scraper, you can seamlessly extract public data from any news or blog post. Gather critical information such as key facts, author attribution, or publication dates to expand your research and stay ahead of your competitors. If you have any queries or need assistance, don't hesitate to reach out to our support team through live chat or email us at hello@oxylabs.io.