v0.3.0
Added
- scrape_do.async_api sub-package: ScrapeDoAsyncAPIClient (backed by httpx.Client) and AsyncScrapeDoAsyncAPIClient (backed by httpx.AsyncClient) covering the full q.scrape.do surface (create_job, get_job, list_jobs, get_task, cancel_job, get_user_info), plus polling helpers wait_for_job and submit_and_wait. Typed status-code error routing with automatic retries on transient gateway errors (429/502/503/504) and per-request r_timeout / extensions escape hatches.
- Polling configuration: PollingStrategy (configurable exponential backoff with jitter, attempt count, and wall-clock budgets) and the PollingFunction type alias for fully-custom cadences. Both share the same (attempt, elapsed, job) -> float signature, so wait_for_job accepts either interchangeably.
- SDK-native event hooks for the Async API: AsyncAPIEventHooks (sync) and AsyncAPIAsyncEventHooks (async). Lifecycle covers request / response / retry / poll; the poll hook receives a parsed JobDetails snapshot on every non-terminal polling iteration.
- scrape_do.plugins sub-package: typed *Parameters models for the Amazon and Google plugin gateways with cross-field validation. Companion *AsyncPlugin adapters under scrape_do.async_api.models.plugins plug into JobCreationRequest.plugin via a discriminated union. Every adapter (and the AsyncPlugin union itself) is also re-exported from scrape_do.async_api, so the typical import pattern is two lines: from scrape_do.async_api import AsyncScrapeDoAsyncAPIClient, AmazonPdpAsyncPlugin plus from scrape_do.plugins import AmazonPdpParameters. Also adds public Google localization constants.
- Typed Async-API exception hierarchy: AsyncAPIError (base) and per-status-code subclasses, AsyncAPIUnparsableResponseError for 2xx bodies the SDK can't parse, JobFailedError / JobCanceledError / TaskFailedError / TaskCanceledError for terminal lifecycle states, and JobTimeoutError for exhausted polling budgets. AsyncScrapeDoErrorMessage parses the gateway's {Error, Code} envelope.
- ScrapeDoJSONErrorMessage: pydantic model for the structured JSON error envelope returned by the synchronous gateway. Exposes status_code / messages / url / possible_causes / error_type / error_code / contact, plus an is_auth_throttle property for detecting the auth-throttle case.
- ScrapeDoResponse ergonomics: __repr__ / __str__ for REPL inspection, to_dict() and to_json(**kwargs) for serialization, and a fixed json(raw_response=False) that extracts the content key from the Scrape.do JSON envelope when present.
- scrape_do.models.validators: public helpers for parameter cross-validation (check_geo_code, check_postal_code, check_geo_exclusion, screenshot / return-json / play-with-browser dependency rules, etc.), usable standalone without instantiating a parameters model.
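To illustrate the polling configuration entry, here is a minimal sketch of a custom cadence with the (attempt, elapsed, job) -> float shape described above. It is standalone Python and does not import the SDK. The names PollFn and make_backoff_poller, their parameters, and the assumption that attempt starts at 0 are all hypothetical, so check the SDK's PollingFunction alias before relying on it.

```python
import random
from typing import Any, Callable, Optional

# Hypothetical stand-in for the SDK's PollingFunction alias; the real alias
# may be typed more precisely (for example with JobDetails instead of Any).
PollFn = Callable[[int, float, Any], float]


def make_backoff_poller(
    base: float = 1.0,
    factor: float = 2.0,
    cap: float = 30.0,
    jitter: float = 0.25,
    rng: Optional[random.Random] = None,
) -> PollFn:
    """Build a callable that returns the next sleep interval in seconds."""
    if rng is None:
        rng = random.Random()

    def poll(attempt: int, elapsed: float, job: Any) -> float:
        # Assumption: attempt is 0-based. The exponent is clamped so very
        # large attempt counts cannot overflow the float power.
        nominal = min(cap, base * factor ** min(attempt, 64))
        # Spread the delay by +/- `jitter` (a fraction of the nominal delay)
        # so that many clients do not poll in lockstep.
        spread = nominal * jitter
        return max(0.0, nominal + rng.uniform(-spread, spread))

    return poll


if __name__ == "__main__":
    poller = make_backoff_poller(jitter=0.0)
    # elapsed and job are ignored by this sketch; a real cadence could use
    # them, e.g. to slow down once a job has been running for a while.
    print([poller(attempt, 0.0, None) for attempt in range(6)])
```

If the signature matches, a callable like this could presumably be passed wherever wait_for_job accepts a PollingFunction; the exact keyword for doing so is not stated in these notes.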
Changed
- APIResponseError now uses ScrapeDoJSONErrorMessage.try_from_response for body parsing instead of the legacy key-list lookup (detail, Error, errorMessage, message, Message). Error messages are richer and the "Unknown API Error" fallback prints status + body on separate lines.
- Added typing_extensions>=4.0 as a direct runtime dependency.
Fixed
- ScrapeDoFrame.url / ScrapeDoNetworkRequest.url relaxed from HttpUrl to str. Real-world iframes and network requests produce technically-valid but quirky URLs (e.g., ?feature=oembed?wmode=transparent) that pydantic-core's URL parser rejected, which blew up the whole response parse.
- ScrapeDoResponse.cookies regex no longer captures structural whitespace after ; separators. Second-and-later cookie names previously came back with a phantom leading space.
- ScrapeDoResponse constructor no longer crashes with JSONDecodeError when Scrape.do returns HTML instead of JSON under returnJSON=true; the failure is now properly routed through is_proxy_error.
- RequestParameters.to_proxy_url now double-encodes the param string so values with URL-reserved characters (notably the JSON-string playWithBrowser payload) survive httpx's transparent decode of the proxy password during Basic auth header construction.
- Python 3.9 / 3.10 compatibility restored. Source files importing Self / Unpack / TypeAlias from typing (Self and Unpack are only available in 3.11+, TypeAlias in 3.10+) now use typing_extensions. Previously the package raised ImportError at import time on 3.9 / 3.10 despite the trove classifiers claiming support.
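The double-encoding fix for to_proxy_url can be illustrated with the standard library alone. The sketch below is not the SDK's implementation: encode_once, encode_twice and simulate_gateway are hypothetical helpers, and the gateway is assumed to parse the decoded password as a query string. It only shows why one extra layer of percent-encoding lets a JSON value containing & survive a single transparent decode.

```python
import json
from urllib.parse import parse_qs, quote, unquote, urlencode


def encode_once(params: dict) -> str:
    # Single layer: each value is percent-encoded inside the param string.
    return urlencode(params)


def encode_twice(params: dict) -> str:
    # Second layer: the whole param string is encoded again, so one decode
    # by the HTTP client yields the single-encoded string, not raw values.
    return quote(urlencode(params), safe="")


def simulate_gateway(password_in_url: str) -> dict:
    # Assumption: the client decodes the proxy password exactly once before
    # building the auth header, and the gateway then parses the result as
    # a query string.
    seen_by_gateway = unquote(password_in_url)
    return {key: values[0] for key, values in parse_qs(seen_by_gateway).items()}


# Hypothetical payload; the "&" inside the selector is what breaks the
# single-encoded form once it has been decoded.
example_params = {
    "render": "true",
    "playWithBrowser": json.dumps(
        [{"Action": "Click", "Selector": "a[href*='x&y']"}]
    ),
}

if __name__ == "__main__":
    print("single:", simulate_gateway(encode_once(example_params)))
    print("double:", simulate_gateway(encode_twice(example_params)))
```

In this simulation the single-encoded form loses part of the playWithBrowser value at the raw &, while the double-encoded form round-trips unchanged.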
Internal
- New scrape_do.async_api and scrape_do.plugins sub-package layout. Async-API helpers (_raise_for_status, _parse_response, _build_job_creation_request) live as module-level functions in scrape_do.async_api.client and are shared by both client classes.
- New unit tests for scrape_do.async_api and models/response.py.
- Integration coverage expanded from 22 → ~120 tests across the Sync API, Proxy Mode, and Async API surfaces. The new tests/integration/async_api/ suite exercises every endpoint, both client classes, polling helpers, event hooks, the render envelope, a live PlayWithBrowser action sequence, the typed-exception hierarchy, and 12 of the 15 *AsyncPlugin variants. The remaining three (google/trends, walmart/store, lowes/store) are unit-only; they hit upstream- or engine-side failures regardless of input.
- Integration logging pipeline formalized around pytest.hookimpl-decorated setup / makereport / teardown hooks with per-test tokens stashed on item.stash; _validate_and_log_error_state consolidated into a response_trace fixture.
- Unit test fixtures consolidated; new shared tests/unit/async_api/conftest.py for the Async-API unit suite plus tests/integration/async_api/conftest.py exposing live client fixtures, a tight fast_polling_strategy, best-effort cancel helpers, and a type-dispatched async_api_response_trace.
- CI matrix expanded to Python 3.9 / 3.10 / 3.11 / 3.12 / 3.13 (fail-fast: false); lint job (ruff + mypy) split out and pinned to 3.13.
Full Changelog: v0.2.0...v0.3.0