v0.6.2

@fawadss1 released this 20 May 08:19

Added

  • Custom engine exceptions — StealthTimeoutError, StealthConnectionError, StealthBrowserNotFoundError
    Previously, all three engines swallowed exceptions from outside the standard library and
    returned None, so failed requests silently bypassed Scrapy's retry middleware. Three typed
    exceptions now replace the raw library errors:

    | Exception | Inherits from | Retried by Scrapy | Raised by |
    |---|---|---|---|
    | StealthTimeoutError | DownloadTimeoutError | ✅ (in default RETRY_EXCEPTIONS) | All engines on request timeout |
    | StealthConnectionError | ConnectionError → OSError | ✅ (OSError in default RETRY_EXCEPTIONS) | BasicEngine / TurboEngine on DNS or network failure; BrowserEngine on OSError |
    | StealthBrowserNotFoundError | StealthException only | ❌ (config error, retrying is pointless) | BrowserEngine when Chrome/Chromium binary is missing |

    Library-specific exceptions (curl_cffi.Timeout, wreq.TimeoutError, wreq.ConnectionError,
    curl_cffi.ConnectionError, curl_cffi.DNSError, curl_cffi.ProxyError, wreq.ProxyConnectionError)
    are caught and re-raised as the appropriate stealth exception, preserving the original as __cause__.
    All three are exported from scrapy_stealth and can be caught in spider errback handlers.
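
The catch-and-re-raise pattern described above might look roughly like the sketch below. It is an illustration only, not the package's code: the exception classes are local stand-ins for the ones exported from scrapy_stealth, and LibTimeout, LibDNSError, TRANSLATIONS, and call_translated are hypothetical names introduced here.

```python
class StealthException(Exception):
    """Stand-in for the package's base exception."""


class StealthTimeoutError(StealthException):
    """Stand-in; the real class also derives from DownloadTimeoutError."""


class StealthConnectionError(StealthException, ConnectionError):
    """Stand-in; built-in ConnectionError is a subclass of OSError."""


# Hypothetical third-party errors, standing in for library-specific types.
class LibTimeout(Exception):
    pass


class LibDNSError(Exception):
    pass


# Hypothetical mapping from library error types to stealth exceptions.
TRANSLATIONS = {
    LibTimeout: StealthTimeoutError,
    LibDNSError: StealthConnectionError,
}


def call_translated(func, *args):
    """Run func, re-raising known library errors as stealth exceptions."""
    try:
        return func(*args)
    except tuple(TRANSLATIONS) as exc:
        stealth_type = next(
            target for lib, target in TRANSLATIONS.items() if isinstance(exc, lib)
        )
        # "raise ... from exc" stores the original error on __cause__.
        raise stealth_type(str(exc)) from exc


def flaky_request(url):
    raise LibDNSError(f"could not resolve {url}")


try:
    call_translated(flaky_request, "https://example.invalid")
except StealthConnectionError as err:
    caught = err
    print(type(err.__cause__).__name__)  # LibDNSError
```

Because the stand-in StealthConnectionError derives from ConnectionError, a handler written for OSError also catches it, while the original library error stays reachable through __cause__ for logging.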

  • BrowserEngine — Chrome error page detection
    When a target URL is unreachable (DNS failure, network down), Chrome silently navigates to
    chrome-error://chromewebdata/ instead of raising a Python exception. The engine now evaluates
    window.location.href.startsWith('chrome-error://') immediately after navigation; if true,
    StealthConnectionError is raised so Scrapy's retry middleware handles it correctly.
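
A minimal sketch of this check, under stated assumptions: evaluate is a hypothetical callable that runs a JavaScript expression in the page and returns its result (the real browser API is not shown in these notes), StealthConnectionError is a local stand-in, and make_fake_evaluate only simulates a browser so the example runs on its own.

```python
class StealthConnectionError(ConnectionError):
    """Stand-in for the exception exported by scrapy_stealth."""


CHROME_ERROR_PREFIX = "chrome-error://"

# JavaScript expression quoted in the release notes.
ERROR_PAGE_CHECK_JS = "window.location.href.startsWith('chrome-error://')"


def raise_if_error_page(evaluate, url):
    """Raise if the browser landed on Chrome's internal error page."""
    if evaluate(ERROR_PAGE_CHECK_JS):
        raise StealthConnectionError(f"Chrome error page while loading {url}")


def make_fake_evaluate(current_href):
    """Build a fake evaluator that models only the single check used here."""
    def evaluate(expression):
        return current_href.startswith(CHROME_ERROR_PREFIX)
    return evaluate


# A successful navigation passes silently.
raise_if_error_page(make_fake_evaluate("https://example.com/"), "https://example.com/")

# A failed navigation ends up on chrome-error://chromewebdata/ and raises.
try:
    raise_if_error_page(
        make_fake_evaluate("chrome-error://chromewebdata/"),
        "https://unreachable.invalid/",
    )
except StealthConnectionError as err:
    detected = err
```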

Fixed

  • Zyte (ScrapyCloud) — FileException: download-error on scrapy:2.15 stack
    BaseEngine.fetch() used Twisted's deferToThread to run blocking HTTP calls in a thread pool.
    On Scrapy 2.15 / Python 3.14, the media pipeline's fully-async architecture relies on native
    asyncio awaiting; the Twisted→asyncio bridge no longer reliably resolved these Deferreds, causing
    file/image downloads to fail with FileException("download-error").
    BaseEngine.fetch() is now async def and uses asyncio.get_running_loop().run_in_executor().
    ScrapyEngine.fetch() and StealthDownloaderMiddleware.process_request() are also made async.

  • Zyte (ScrapyCloud) — ImportError: cannot import name 'request_fingerprint' on scrapy:2.11 stack
    The previous scrapy>=2.15.2 constraint forced pip to upgrade Scrapy on Zyte's scrapy:2.11
    stack, which broke Zyte's bundled sh_scrapy extension that still imports request_fingerprint
    (removed in Scrapy 2.15). The constraint is now scrapy>=2.12.0,<3.0.

  • All Scrapy versions — unified async dispatch in BaseEngine.fetch
    Scrapy routes async downloader middlewares through different runners depending on version:
    ensure_awaitable (newer Scrapy / local) runs coroutines as asyncio Tasks and requires an
    asyncio Future; deferred_from_coro (Zyte scrapy:2.11–2.12) drives them via Twisted
    _inlineCallbacks and requires a Twisted Deferred. BaseEngine.fetch now detects the active
    runner via asyncio.get_running_loop() and dispatches to run_in_executor or deferToThread
    accordingly, making stealth requests work correctly on all supported Scrapy versions.
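
The dispatch described above can be sketched as follows. This is an assumption-laden illustration rather than the actual BaseEngine.fetch: run_blocking and blocking_fetch are hypothetical names, and the Twisted branch is imported lazily so the asyncio path runs without Twisted installed.

```python
import asyncio
import time


async def run_blocking(func, *args):
    """Run a blocking callable off the main thread and await its result."""
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        # No asyncio loop is running in this thread: fall back to
        # Twisted's thread pool and await the resulting Deferred.
        from twisted.internet.threads import deferToThread
        return await deferToThread(func, *args)
    # asyncio path: run in the default executor and await the Future.
    return await loop.run_in_executor(None, func, *args)


def blocking_fetch(url):
    time.sleep(0.01)  # stands in for a blocking HTTP call
    return f"fetched {url}"


result = asyncio.run(run_blocking(blocking_fetch, "https://example.com"))
print(result)  # fetched https://example.com
```

Only the asyncio branch is exercised here; the Twisted branch applies when the coroutine is driven without a running asyncio loop.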

  • TurboEngine / BasicEngine — timeout exceptions silently swallowed
    curl_cffi.requests.exceptions.Timeout and wreq.exceptions.TimeoutError are not subclasses
    of Python's built-in TimeoutError, so the except TimeoutError: raise guard did not catch
    them. Both were swallowed by the broad except Exception handler and discarded as None,
    preventing Scrapy's retry middleware from ever seeing a timeout. Both engines now raise
    StealthTimeoutError for their respective library timeout types.
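
The failure mode can be reproduced in a few lines. In this sketch LibTimeout is a hypothetical stand-in for a library timeout type that does not derive from the built-in TimeoutError, StealthTimeoutError is a local stand-in, and fetch_old / fetch_new are illustrative names for the handler before and after the fix.

```python
class LibTimeout(Exception):
    """Hypothetical library timeout; not derived from built-in TimeoutError."""


class StealthTimeoutError(Exception):
    """Stand-in for the exception exported by scrapy_stealth."""


def slow_request():
    raise LibTimeout("timed out after 30s")


def fetch_old():
    try:
        return slow_request()
    except TimeoutError:
        raise  # never reached: LibTimeout is not a TimeoutError
    except Exception:
        return None  # the timeout is swallowed here


def fetch_new():
    try:
        return slow_request()
    except LibTimeout as exc:
        # Surface the timeout as a typed exception the caller can retry on.
        raise StealthTimeoutError(str(exc)) from exc
    except Exception:
        return None


print(issubclass(LibTimeout, TimeoutError))  # False
print(fetch_old())  # None
```

Calling fetch_new() raises StealthTimeoutError instead of returning None, which is what lets a retry layer see the timeout.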

  • RequestContext (ctx) moved before try block in all engines
    ctx = self._ctx(request) was assigned inside the try block, which triggered IDE warnings about
    a possible reference before assignment when ctx was used in except handler messages. The
    assignment now precedes the try block in every engine.