v0.6.2
Added
- Custom engine exceptions: StealthTimeoutError, StealthConnectionError, StealthBrowserNotFoundError
All three engines previously swallowed non-standard library exceptions and returned None, silently
bypassing Scrapy's retry middleware. Three typed exceptions now replace raw library errors:

| Exception | Inherits from | Retried by Scrapy | Raised by |
|---|---|---|---|
| StealthTimeoutError | DownloadTimeoutError | ✅ (in default RETRY_EXCEPTIONS) | All engines on request timeout |
| StealthConnectionError | ConnectionError (→ OSError) | ✅ (OSError is in default RETRY_EXCEPTIONS) | BasicEngine/TurboEngine on DNS or network failure; BrowserEngine on OSError |
| StealthBrowserNotFoundError | StealthException only | ❌ (configuration error; retrying is pointless) | BrowserEngine when the Chrome/Chromium binary is missing |

Library-specific exceptions (curl_cffi.Timeout, curl_cffi.ConnectionError, curl_cffi.DNSError,
curl_cffi.ProxyError, wreq.TimeoutError, wreq.ConnectionError, wreq.ProxyConnectionError)
are caught and re-raised as the appropriate stealth exception, preserving the original as __cause__.
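A minimal, self-contained sketch of why the inheritance choices in the table decide what gets retried, and of the re-raise pattern. Everything here is a local stand-in, not the scrapy_stealth source: DownloadTimeoutError is a placeholder for Scrapy's class, LibraryTimeout is a hypothetical library timeout (like curl_cffi's Timeout, it is not a TimeoutError), RETRY_EXCEPTIONS is a simplified two-entry subset of Scrapy's default, and all three stealth exceptions are assumed to share a StealthException base. fetch_old() reproduces the swallowing behaviour described here and in the TurboEngine/BasicEngine timeout fix; fetch_new() shows the typed re-raise.

```python
"""Sketch only: local stand-ins, not the scrapy_stealth or Scrapy source."""


class DownloadTimeoutError(Exception):
    """Placeholder for Scrapy's DownloadTimeoutError (assumption)."""


class LibraryTimeout(Exception):
    """Hypothetical library timeout; like curl_cffi's Timeout it is NOT a TimeoutError."""


class StealthException(Exception):
    """Assumed common base class for all stealth exceptions."""


class StealthTimeoutError(StealthException, DownloadTimeoutError):
    pass


class StealthConnectionError(StealthException, ConnectionError):
    pass  # ConnectionError -> OSError, which the retry middleware retries


class StealthBrowserNotFoundError(StealthException):
    pass  # configuration error: deliberately not retryable


# Simplified subset of Scrapy's default RETRY_EXCEPTIONS (assumption).
RETRY_EXCEPTIONS = (DownloadTimeoutError, OSError)


def is_retried(exc: BaseException) -> bool:
    """Mimic the retry middleware's isinstance check."""
    return isinstance(exc, RETRY_EXCEPTIONS)


def send(url: str) -> str:
    """Hypothetical blocking call that always times out."""
    raise LibraryTimeout(f"operation timed out: {url}")


def fetch_old(url: str):
    """Old behaviour: the library timeout slips past the TimeoutError guard."""
    try:
        return send(url)
    except TimeoutError:
        raise
    except Exception:
        return None  # swallowed: the retry middleware never sees the failure


def fetch_new(url: str) -> str:
    """Fixed behaviour: translate the library error into a typed stealth exception."""
    try:
        return send(url)
    except LibraryTimeout as exc:
        # "from exc" stores the original exception as __cause__
        raise StealthTimeoutError(f"timeout fetching {url}") from exc


if __name__ == "__main__":
    print(fetch_old("https://example.com"))  # None
    try:
        fetch_new("https://example.com")
    except StealthException as err:
        print(type(err).__name__, is_retried(err), repr(err.__cause__))
```

Because the typed exceptions subclass classes already listed in RETRY_EXCEPTIONS, no retry settings need to change for timeouts and network failures to be retried.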
All three are exported from scrapy_stealth and can be caught in spider errback handlers.
- BrowserEngine: Chrome error page detection
When a target URL is unreachable (DNS failure, network down), Chrome silently navigates to
chrome-error://chromewebdata/ instead of the navigation call raising a Python exception. The engine
now evaluates window.location.href.startsWith('chrome-error://') immediately after navigation; if
it is true, StealthConnectionError is raised so that Scrapy's retry middleware handles the failure.
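The check is a prefix test on the URL the browser actually landed on. The sketch below assumes the href has already been read back from the page through whatever evaluate call the browser driver provides (not shown); check_navigation is a hypothetical helper name and StealthConnectionError is a local stand-in.

```python
CHROME_ERROR_PREFIX = "chrome-error://"


class StealthConnectionError(ConnectionError):
    """Local stand-in for scrapy_stealth.StealthConnectionError."""


def check_navigation(requested_url: str, final_href: str) -> None:
    """Raise if Chrome replaced the target page with its internal error page.

    final_href is assumed to be the result of evaluating window.location.href
    in the page immediately after navigation.
    """
    if final_href.startswith(CHROME_ERROR_PREFIX):
        # ConnectionError subclasses OSError, so Scrapy's default retry settings apply.
        raise StealthConnectionError(
            f"Chrome could not load {requested_url} (landed on {final_href})"
        )


if __name__ == "__main__":
    check_navigation("https://example.com/", "https://example.com/")  # passes silently
    try:
        check_navigation("https://unreachable.invalid/", "chrome-error://chromewebdata/")
    except StealthConnectionError as err:
        print(err)
```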
Fixed
- Zyte (ScrapyCloud): "FileException: download-error" on the scrapy:2.15 stack
BaseEngine.fetch() used Twisted's deferToThread to run blocking HTTP calls in a thread pool.
On Scrapy 2.15 / Python 3.14, the media pipeline's fully async architecture relies on native
asyncio awaiting; the Twisted→asyncio bridge no longer reliably resolved these Deferreds, causing
file/image downloads to fail with FileException("download-error").
BaseEngine.fetch() is now an async def and uses asyncio.get_running_loop().run_in_executor().
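A minimal sketch of an executor-backed async fetch, assuming a hypothetical blocking_get() in place of the real curl_cffi/wreq call. The fallback branch is a simplified version of the runner detection described in the unified async dispatch entry: it assumes that the absence of a running asyncio loop means the coroutine is Twisted-driven, and then awaits deferToThread instead. Twisted is imported lazily, so the asyncio path runs without it installed.

```python
import asyncio
import time


def blocking_get(url: str) -> str:
    """Hypothetical stand-in for a blocking HTTP call (e.g. a curl_cffi request)."""
    time.sleep(0.1)
    return f"200 {url}"


async def fetch(url: str) -> str:
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        loop = None  # assumed: not driven by asyncio (Twisted deferred_from_coro path)

    if loop is not None:
        # Default ThreadPoolExecutor; the returned asyncio Future is awaited natively.
        return await loop.run_in_executor(None, blocking_get, url)

    # Fallback for a Twisted-driven coroutine: await a Deferred instead.
    from twisted.internet.threads import deferToThread
    return await deferToThread(blocking_get, url)


async def main() -> list:
    urls = [f"https://example.com/{i}" for i in range(3)]
    # The blocking calls run concurrently in worker threads,
    # so the event loop itself is never blocked.
    return await asyncio.gather(*(fetch(u) for u in urls))


if __name__ == "__main__":
    start = time.perf_counter()
    print(asyncio.run(main()))
    print(f"elapsed: {time.perf_counter() - start:.2f}s")
```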
ScrapyEngine.fetch() and StealthDownloaderMiddleware.process_request() have also been made async.
- Zyte (ScrapyCloud): "ImportError: cannot import name 'request_fingerprint'" on the scrapy:2.11 stack
The previous scrapy>=2.15.2 constraint forced pip to upgrade Scrapy on Zyte's scrapy:2.11
stack, which broke Zyte's bundled sh_scrapy extension that still imports request_fingerprint
(removed in Scrapy 2.15). The constraint is now scrapy>=2.12.0,<3.0.
- All Scrapy versions: unified async dispatch in BaseEngine.fetch
Scrapy routes async downloader middlewares through different runners depending on the version:
ensure_awaitable (newer Scrapy / local) runs coroutines as asyncio Tasks and requires an
asyncio Future; deferred_from_coro (Zyte scrapy:2.11–2.12) drives them via Twisted's
_inlineCallbacks and requires a Twisted Deferred. BaseEngine.fetch now detects the active
runner via asyncio.get_running_loop() and dispatches to run_in_executor or deferToThread
accordingly, so stealth requests work correctly on all supported Scrapy versions.
- TurboEngine/BasicEngine: timeout exceptions silently swallowed
curl_cffi.requests.exceptions.Timeout and wreq.exceptions.TimeoutError are not subclasses
of Python's built-in TimeoutError, so the "except TimeoutError: raise" guard did not catch
them. Both fell through to the broad "except Exception" handler, which discarded them and
returned None, so Scrapy's retry middleware never saw a timeout. Both engines now raise
StealthTimeoutError for their respective library timeout types.
- RequestContext (ctx) moved before the try block in all engines
ctx = self._ctx(request) was inside the try block, causing IDE warnings that ctx might be
referenced before assignment when it was used in except-handler messages.
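The ordering change can be sketched as below. RequestContext's fields, SketchEngine and _send() are hypothetical names for illustration, and the simulated OSError stands in for a real network failure.

```python
from dataclasses import dataclass


@dataclass
class RequestContext:
    """Hypothetical per-request context; field names are illustrative."""
    url: str
    engine: str


class SketchEngine:
    name = "basic"

    def _ctx(self, url: str) -> RequestContext:
        return RequestContext(url=url, engine=self.name)

    def _send(self, url: str) -> str:
        raise OSError("network unreachable")  # simulated failure

    def fetch(self, url: str) -> str:
        # Assigned before the try block: ctx is always bound inside the
        # except handler, so there is no "referenced before assignment" warning.
        ctx = self._ctx(url)
        try:
            return self._send(url)
        except OSError as exc:
            raise ConnectionError(f"[{ctx.engine}] {ctx.url}: {exc}") from exc


if __name__ == "__main__":
    try:
        SketchEngine().fetch("https://example.com")
    except ConnectionError as err:
        print(err)
```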