Releases · info-suvastutech/scrapy-stealth

18 Jun 13:40

fawadss1

v0.6.9a2

fddad80

v0.6.9a2 Pre-release

Fixed

Windows browser-restart log noise (WinError 995)
Suppressed benign Windows Proactor teardown errors that were logged when the event loop and proxy relay are torn down during a browser restart.
The loop exception handler now ignores WinError 995 (ERROR_OPERATION_ABORTED) and WinError 64 (ERROR_NETNAME_DELETED)
alongside the existing 10054 (WSAECONNRESET); genuine errors are still surfaced. The restart itself always succeeded; only
the spurious ERROR tracebacks are gone.
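
The filtering described above can be sketched with a custom asyncio loop exception handler. This is a minimal illustration, not the package's actual handler: the names BENIGN_WINERRORS, is_benign_teardown_error, and quiet_exception_handler are hypothetical, and only the three error codes are taken from the notes.

```python
import asyncio

# Windows error codes named in the release notes.
BENIGN_WINERRORS = {
    64,     # ERROR_NETNAME_DELETED
    995,    # ERROR_OPERATION_ABORTED
    10054,  # WSAECONNRESET
}


def is_benign_teardown_error(exc):
    """Return True for an OSError whose winerror marks a harmless teardown."""
    # OSError only carries a winerror attribute on Windows, hence getattr.
    return isinstance(exc, OSError) and getattr(exc, "winerror", None) in BENIGN_WINERRORS


def quiet_exception_handler(loop, context):
    """Drop benign teardown errors; hand everything else to the default handler."""
    if is_benign_teardown_error(context.get("exception")):
        return
    loop.default_exception_handler(context)


def install(loop):
    """Attach the filtering handler to an event loop."""
    loop.set_exception_handler(quiet_exception_handler)
```

Anything that is not one of the listed codes still reaches asyncio's default handler, which matches the "genuine errors are still surfaced" behaviour described above.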

18 Jun 12:12

fawadss1

v0.6.9a1

78a99f0

v0.6.9a1 Pre-release

Added

Proxy bypass list (BROWSER_PROXY_BYPASS_LIST)
Route chosen domains around the proxy in the browser engine. The user-supplied list is passed to Chrome's --proxy-bypass-list
launch flag, so requests to those domains connect to the origin directly instead of through the proxy relay. Supports the full Chrome
bypass syntax — bare hostnames, wildcards (*.example.com), IP/CIDR ranges, ports, and the <local> token. Configured globally via
config/settings; only takes effect when a proxy is in use.
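
As a rough sketch of how a user-supplied list could be turned into the launch flag: the notes do not say whether BROWSER_PROXY_BYPASS_LIST takes a list or a pre-joined string, so the list form below is an assumption, and build_proxy_bypass_flag is a hypothetical helper. Chrome accepts semicolon-separated rules for this flag.

```python
def build_proxy_bypass_flag(entries):
    """Join bypass rules into one --proxy-bypass-list launch argument.

    Returns None when there is nothing to bypass, so no flag is added.
    """
    cleaned = [entry.strip() for entry in entries if entry and entry.strip()]
    if not cleaned:
        return None
    return "--proxy-bypass-list=" + ";".join(cleaned)


# Assumed shape of the setting: a list of Chrome bypass rules.
BROWSER_PROXY_BYPASS_LIST = ["localhost", "*.example.com", "10.0.0.0/8", "<local>"]
flag = build_proxy_bypass_flag(BROWSER_PROXY_BYPASS_LIST)
```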

18 Jun 07:58

fawadss1

v0.6.8

219fbed

v0.6.8 Latest

Added

Intelligent content wait (_smart_wait)
Automatically detects JavaScript challenges, CAPTCHAs, and anti-bot interstitial pages and waits for meaningful page content before returning a response, improving success rates on protected websites.
Advanced challenge detection
Added comprehensive detection for Cloudflare, DataDome, Akamai, Kasada, and other common anti-bot challenge pages.
Randomized browser fingerprinting
Browser sessions now launch with realistic randomized window sizes and language configurations to reduce fingerprint consistency across sessions.
Intelligent browser restart (BROWSER_RESTART_AFTER_BANS)
Browser instances are now restarted only after a configurable number of consecutive bans or challenge responses, replacing the previous fixed-request restart strategy.
Static asset blocking (BROWSER_STATIC_ASSETS_BLOCK)
Optional blocking of images, fonts, stylesheets, and other non-essential assets via Chrome DevTools Protocol, reducing bandwidth usage and improving page load performance.
StealthDependencyError
New typed exception for optional dependency loading failures, providing platform-specific guidance for resolving missing native libraries and runtime dependencies.
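
As a rough illustration, the two opt-in settings introduced in this release might be set in a Scrapy project's settings.py as follows. The setting names come from the notes above; the value types (an integer threshold and a boolean switch) are assumptions, and 5 is the default quoted for BROWSER_RESTART_AFTER_BANS in the v0.6.8b1 notes.

```python
# settings.py (fragment)

# Restart Chrome only after this many consecutive banned/challenged responses.
BROWSER_RESTART_AFTER_BANS = 5

# Assumed to be a boolean: skip images, fonts, stylesheets and similar assets.
BROWSER_STATIC_ASSETS_BLOCK = True
```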

Fixed

Windows browser restart race condition
Resolved event-loop teardown and restart timing issues that could produce InvalidStateError exceptions during browser restarts.
Windows dependency loading failures
Improved handling of wreq and curl_cffi DLL loading errors with actionable error messages instead of opaque import tracebacks.
Deferred dependency loading
Optional browser-profile dependencies are now loaded lazily, preventing unrelated engines from failing when specific native dependencies are unavailable.
Browser response rendering
Improved response handling to ensure successful pages are fully rendered before being returned to Scrapy.

Changed

Browser restart strategy
Replaced the request-count-based restart mechanism with ban-aware restart logic, reducing unnecessary browser restarts during healthy crawls.
Test suite refactoring
Simplified browser-related test cases and reduced mock complexity for improved maintainability.

Performance

Reduced bandwidth consumption
Static asset blocking can significantly decrease network usage and page load times when visual assets are not required.
Improved browser stability
Smarter restart behavior reduces browser churn while maintaining long-running crawl reliability.

18 Jun 07:33

fawadss1

v0.6.8a2

b957631

v0.6.8a2 Pre-release

Added

StealthDependencyError — typed exception for compiled-dependency failures
New exception class in exceptions.py that inherits from both StealthException and
ImportError, fitting naturally into both the package exception hierarchy and standard
except ImportError handlers.
Raised whenever a compiled optional dependency (wreq, curl_cffi) fails to load —
typically because a required native DLL or shared library could not be found.

The exception provides a platform-aware, actionable message at raise time:
- Windows — instructs the user to install both x64 and x86 Visual C++ Redistributables
  (2015–2022) with direct download links.
- Linux — suggests the appropriate apt-get / yum packages for missing system
  libraries (libssl, libcurl).
StealthDependencyError is exported from the top-level package and added to __all__,
making it catchable in user code alongside the other stealth exceptions.
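
A minimal sketch of the dual-inheritance idea described above. The StealthException stand-in, the message wording, and the body of check() are assumptions for illustration; only the class names, the two base classes, and the check(package, exc) call shape appear in these notes.

```python
import sys


class StealthException(Exception):
    """Stand-in for the package's base exception (assumed shape)."""


class StealthDependencyError(StealthException, ImportError):
    """Raised when a compiled optional dependency fails to load."""

    @classmethod
    def check(cls, package, exc):
        """Re-raise an ImportError as a StealthDependencyError with a hint."""
        if sys.platform.startswith("win"):
            hint = "install the Visual C++ Redistributables (x64 and x86)"
        elif sys.platform.startswith("linux"):
            hint = "install the missing system libraries (e.g. libssl, libcurl)"
        else:
            hint = "check that the package's native libraries are installed"
        raise cls(f"{package} failed to load: {exc}. Hint: {hint}") from exc


# Because of the ImportError base, existing handlers keep working.
try:
    try:
        raise ImportError("DLL load failed while importing wreq")
    except ImportError as exc:
        StealthDependencyError.check("wreq", exc)
except ImportError as err:
    caught = err
```

Inheriting from both bases means the same exception is caught by package-level handlers and by generic except ImportError blocks.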

Fixed

engines/basic.py — ImportError: DLL load failed while importing wreq on fresh Windows
The bare from wreq.blocking import Client and from wreq.proxy import Proxy module-level
imports crashed immediately on machines without the Visual C++ Redistributable installed,
surfacing as an opaque DLL load failed traceback deep inside Scrapy's middleware loader.
Both imports are now wrapped in try/except ImportError and delegate to
StealthDependencyError.check("wreq", exc) for a clear, actionable error message.
engines/turbo.py — same DLL failure for curl_cffi on fresh Windows
from curl_cffi import CurlHttpVersion and from curl_cffi.requests import Session fail
the same way as the wreq imports when the Visual C++ runtime (VCRT) is absent.
Both imports are now guarded with StealthDependencyError.check("curl_cffi", exc).
utils/profiles.py — wreq.emulation crash at import time propagated silently
from wreq.emulation import Emulation, Profile was a module-level import, meaning the
entire profiles module — and by extension every engine that imports it — failed to load
on VCRT-missing machines, producing the same deep DLL load failed traceback.
The import is now guarded with a _WREQ_AVAILABLE flag; Emulation and Profile fall
back to None so the module loads cleanly. The private _require_wreq() helper raises
StealthDependencyError at the point of actual use (inside _resolve_basic), not at
import time, keeping the turbo and browser drivers unaffected on machines where
wreq is broken but curl_cffi loads fine.

16 Jun 07:38

fawadss1

v0.6.8b1

0225754

v0.6.8b1 Pre-release

Added

Intelligent browser restart (BROWSER_RESTART_AFTER_BANS)
The browser engine now restarts Chrome (fresh fingerprint, cookies, and CDP session) only
when it actually needs to — after BROWSER_RESTART_AFTER_BANS (default 5) consecutive
responses are classified as banned or challenged by AntiBotDetector. A single clean
response resets the streak to zero, so a browser sailing through cleanly is never restarted,
no matter how many requests it has served. Replaces the previous fixed-count
BROWSER_RESTART_EVERY restart, which fired blindly every N requests regardless of whether
anything was actually going wrong.
Implemented via a small BanStreakTracker helper in utils/browser.py.
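
A minimal sketch of the streak logic described above. The class name matches the helper mentioned in the notes, but its real interface is not shown there: the record() method, its return value, and resetting the streak once a restart is signalled are all assumptions.

```python
class BanStreakTracker:
    """Counts consecutive banned responses and signals when to restart."""

    def __init__(self, threshold=5):
        self.threshold = threshold
        self.streak = 0

    def record(self, banned):
        """Record one response; return True when a restart is due."""
        if not banned:
            self.streak = 0  # a single clean response resets the streak
            return False
        self.streak += 1
        if self.streak >= self.threshold:
            self.streak = 0  # assumed: start counting afresh after a restart
            return True
        return False


tracker = BanStreakTracker(threshold=3)
results = [tracker.record(b) for b in [True, True, False, True, True, True]]
```

In this run only the final response triggers a restart, because the clean third response resets the count.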

Fixed

Browser engine — restart/teardown race on Windows
_reset_browser() now waits for the old event loop's thread to fully stop (_stop_loop())
before starting a new loop and thread. Previously the old ProactorEventLoop could keep
polling its selector after the replacement loop was already running, surfacing as an
InvalidStateError crash or unretrieved OSError task exceptions on Windows.
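
The ordering fix can be illustrated with a small loop-in-a-thread wrapper. This is a sketch under stated assumptions, not the engine's code: LoopThread, stop(), and restart() are hypothetical names, and the real _reset_browser() and _stop_loop() are not shown in these notes.

```python
import asyncio
import threading


class LoopThread:
    """Runs an asyncio event loop in a background thread."""

    def __init__(self):
        self.loop = asyncio.new_event_loop()
        self.thread = threading.Thread(target=self.loop.run_forever, daemon=True)
        self.thread.start()

    def stop(self, timeout=5.0):
        """Ask the loop to stop, then block until its thread has exited."""
        self.loop.call_soon_threadsafe(self.loop.stop)
        self.thread.join(timeout)
        if not self.thread.is_alive():
            self.loop.close()


def restart(old):
    """Replace a running loop thread with a fresh one."""
    old.stop()           # wait for the old loop to finish first
    return LoopThread()  # only then start the replacement
```

Joining the old thread before creating the new loop is what prevents the two loops from running at the same time.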

12 Jun 09:39

fawadss1

v0.6.8a1

c45fe02

v0.6.8a1 Pre-release

Added

Intelligent content wait (_smart_wait)
The browser engine now detects whether a page is a JS challenge, CAPTCHA, or script-heavy stub
(e.g., Cloudflare, DataDome) and automatically waits for the real content to populate.
It uses a heuristic based on body length and tag structure to decide whether to wait,
improving success rates on protected sites while keeping normal pages fast.
JS challenge detection (_JS_IS_CHALLENGE)
A comprehensive JavaScript-based detector that identifies common anti-bot platforms
(Cloudflare, DataDome, Akamai, Kasada) and challenge states (Ray ID, "Checking your browser")
by scanning the DOM and window title.
Randomized browser fingerprinting
Chrome is now launched with randomized --window-size and --lang arguments selected from
a curated list of common configurations. This ensures that every browser session (and
every proxy-rotated request) presents a unique, realistic identity to anti-bot systems.
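
A sketch of how randomized launch arguments could be chosen. The --window-size and --lang flags are named in the notes; the candidate values below are illustrative only, not the package's curated list, and random_launch_args is a hypothetical helper.

```python
import random

# Illustrative candidates only; the package's actual list is not published here.
WINDOW_SIZES = [(1920, 1080), (1536, 864), (1366, 768), (1440, 900)]
LANGUAGES = ["en-US", "en-GB", "de-DE", "fr-FR"]


def random_launch_args(rng=random):
    """Pick one window size and one language for a new browser session."""
    width, height = rng.choice(WINDOW_SIZES)
    lang = rng.choice(LANGUAGES)
    return [f"--window-size={width},{height}", f"--lang={lang}"]


args = random_launch_args()
```

Passing a seeded random.Random instance as rng makes the choice reproducible in tests.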

Changed

Refactored test cases
Simplified fetch mocks in tests by removing the unnecessary proxy argument and
streamlining assertions.

Fixed

Browser engine — improved response handling
Integrated _smart_wait into the fetch pipeline, ensuring 2xx responses are fully
rendered before returning.

10 Jun 11:45

fawadss1

v0.6.7

2706c4c

v0.6.7

Changed

Browser engine — single persistent browser for both proxy and non-proxy modes
Previously, proxy mode spawned a fresh Chrome process for every request and tore it down
immediately after, making concurrent proxy crawls extremely expensive. The engine now runs
one persistent browser regardless of whether a proxy is configured.
A local auth-injecting relay (_start_proxy_relay) is started once at browser initialisation
and the browser is launched with --proxy-server=http://127.0.0.1:<relay_port> baked in.
Each request opens an isolated tab (via new_tab=True) and closes it when done — identical
to non-proxy mode. Proxy credentials are injected at the TCP level by the relay and never
touch the browser.
Impact: one Chrome process per spider instead of one per request; dramatically lower memory
and startup overhead on proxy-enabled crawls.
Browser engine — splash screen loaded once at startup, not per request
The project logo / chrome://welcome splash was previously loaded in every request tab as a
warm-up step before navigating to the real target. It is now loaded once on browser.main_tab
immediately after the browser starts (_start()), warming up the renderer, stealth patches,
and (when proxied) the relay tunnel — before any spider request arrives. Request tabs navigate
directly to the target URL with no splash overhead.
Browser engine — early return on non-2xx responses
_do_fetch now reads the HTTP status code before waiting for page content. Responses in the
2xx range receive the full _wait_for_content() + settle delay as before. Non-2xx responses
(4xx, 5xx) skip the content wait and return immediately with whatever the browser has already
rendered, avoiding up to 10 seconds of unnecessary polling on error pages.

Added

_wait_for_status(page, timeout=8.0) utility
The Navigation Timing API (performance.getEntriesByType('navigation')[0].responseStatus)
is written asynchronously by Chrome and can return 0 immediately after page.wait(),
especially through a proxy or after redirects. The new helper polls every 250 ms until a
non-zero status is available, then returns it. If the entry never populates within 8 seconds
(rare SPA edge case) it falls back to 200 — the safest assumption when the page loaded but
left no timing entry. _JS_STATUS default changed from ?? 200 to ?? 0 to expose the
"not ready" state to the poller rather than masking it.

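The polling behaviour of _wait_for_status described above can be sketched as follows. Here read_status is a hypothetical async callable standing in for evaluating the Navigation Timing expression on the page; the 250 ms interval, 8 second timeout, and fallback to 200 are taken from the notes.

```python
import asyncio
import time


async def wait_for_status(read_status, timeout=8.0, interval=0.25, fallback=200):
    """Poll until a non-zero HTTP status is available, else return the fallback."""
    deadline = time.monotonic() + timeout
    while True:
        status = await read_status()
        if status:  # 0 means the timing entry is not populated yet
            return status
        if time.monotonic() >= deadline:
            return fallback
        await asyncio.sleep(interval)


async def _demo():
    # Simulate Chrome reporting 0 twice before the real status appears.
    values = iter([0, 0, 302])

    async def read():
        return next(values)

    return await wait_for_status(read, timeout=1.0, interval=0.01)


demo_status = asyncio.run(_demo())
```

Returning 0 from the reader for "not ready" is what lets the poller distinguish a missing entry from a genuine 200.
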
Fixed

Browser engine — ConnectionResetError / BrokenPipeError log noise on Windows
On Windows with Python 3.13+, closing a Chrome tab or stopping the browser triggers
_ProactorBasePipeTransport._call_connection_lost(), which raises
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host.
This is harmless because the connection is already gone, but asyncio logged it as
an unhandled exception on every tab close. The loop exception handler now suppresses
ConnectionResetError, BrokenPipeError, and raw OSError with winerror == 10054
(the unwrapped variant seen on some Python 3.14 builds).
Browser engine — relay and tab-semaphore torn down correctly on browser restart
_reset_browser() now closes the proxy relay server and clears _relay_server /
_relay_port before spinning up a new event loop, so the restarted browser gets a fresh
relay rather than pointing at a dead port.

08 Jun 13:04

fawadss1

v0.6.6

4b7663f

v0.6.6

Added

BROWSER_EXECUTABLE_PATH configuration option
New setting allows specifying a custom Chrome/Chromium/Brave binary path for the browser engine.
Set via config.BROWSER_EXECUTABLE_PATH or BROWSER_EXECUTABLE_PATH in Scrapy settings.
Useful when Chrome is installed in a non-standard location or when using alternative browsers like Brave.
Proper error messages guide users to set the config if the binary is not found at the configured path.
Unified logger output for browser engine
Replaced direct console module usage with logger throughout the browser engine for consistent,
structured logging that integrates with Scrapy's logging system. All browser startup messages,
restarts, and warnings now appear in the standard [scrapy-stealth] log format.
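
For the BROWSER_EXECUTABLE_PATH option above, the path check might look like the sketch below. resolve_browser_binary is a hypothetical helper and the error wording is an assumption; the notes only state that a missing binary produces a message pointing users at the setting.

```python
import os


def resolve_browser_binary(configured_path):
    """Validate a configured browser path; None means use the default lookup."""
    if configured_path is None:
        return None
    if not os.path.isfile(configured_path):
        raise FileNotFoundError(
            f"No browser binary at {configured_path!r}; set "
            "BROWSER_EXECUTABLE_PATH to a valid Chrome/Chromium/Brave executable."
        )
    return configured_path
```

In a project this would be driven by a line such as BROWSER_EXECUTABLE_PATH = "/usr/bin/brave-browser" in settings.py, where the path shown is only an example.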

Changed

Browser engine — simplified stealth approach for improved detection evasion
The BrowserEngine has been streamlined to focus on real Chrome behavior without aggressive JavaScript injection.
Removed the _STEALTH_JS injection (which masked CDP fingerprints and spoofed Windows platform attributes)
because anti-bot systems increasingly detect the injections themselves rather than the CDP presence.

The engine now:
- Removes all custom user-agent forcing (uses Chrome's default)
- Eliminates JavaScript navigator property overrides (webdriver, platform, plugins, languages, WebGL, UAv4)
- Simplifies browser arguments to essential flags only (disables only AutomationControlled blink feature)
- Maintains Xvfb support for non-headless Chrome on Linux without $DISPLAY
- Keeps persistent browser reuse for performance
- Works identically in headless and non-headless modes
Result: headless=False with real display/Xvfb now evades detection more effectively because
the browser appears "normal" to anti-bot systems rather than heavily modified.

Fixed

Browser engine — bans when using headless=False with injection-based detection
Anti-bot systems like Akamai specifically scan for the telltale patterns in commonly-used CDP stealth scripts.
Removing the injection eliminates a major detection surface while maintaining the evasion benefits of running
a real browser process.

Optimized

Browser engine — code duplication eliminated
Extracted _start_browser() helper method that centralizes browser startup and BROWSER_EXECUTABLE_PATH
error handling. _start() (persistent browser) and _do_fetch() (per-proxy browser) now call the same
code path, reducing maintenance burden and ensuring consistent behavior across non-proxy and proxy modes.

04 Jun 11:04

fawadss1

v0.6.6a2

389c39a

v0.6.6a2 Pre-release

Added

Xvfb virtual display support for Docker / Zyte
On Linux without a $DISPLAY, the browser engine now automatically starts
Xvfb :99 before launching Chrome. This lets Chrome run in non-headless mode
against a virtual framebuffer — identical to a real desktop session — which is
significantly harder for anti-bot systems to detect than --headless=new.
Falls back to headless silently if Xvfb is not installed.
Requires apt-get install -y xvfb in your Docker image.
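
The detection and fallback described above could be sketched like this. The function names and the screen geometry are assumptions; the notes only specify display :99, the $DISPLAY check on Linux, and the silent fallback to headless when Xvfb is missing.

```python
import os
import shutil
import subprocess
import sys


def needs_virtual_display(platform=sys.platform, environ=os.environ):
    """True on Linux when no $DISPLAY is available."""
    return platform.startswith("linux") and not environ.get("DISPLAY")


def start_xvfb(display=":99"):
    """Start Xvfb if installed; return the process, or None to fall back to headless."""
    binary = shutil.which("Xvfb")
    if binary is None:
        return None
    proc = subprocess.Popen(
        [binary, display, "-screen", "0", "1920x1080x24"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    os.environ["DISPLAY"] = display  # Chrome picks the display up from here
    return proc
```

A caller would check needs_virtual_display() first and switch to headless mode only when start_xvfb() returns None.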

04 Jun 09:57

fawadss1

v0.6.6a1

6cc7b07

v0.6.6a1 Pre-release

Added

BROWSER_NO_SANDBOX config option
New BROWSER_NO_SANDBOX: bool | None setting controls Chrome's sandbox mode.
Defaults to None (auto-detect): sandbox is disabled automatically when the process runs
as root on Linux (e.g. Zyte, Docker). Set True to force no-sandbox, False to keep
sandbox even as root. Configurable via settings.py (BROWSER_NO_SANDBOX = True) or
the config object.

Fixed

Browser engine fails on Docker (running as root)
Chrome refuses to start without --no-sandbox when the process is root. The engine now
auto-detects root and adds both --no-sandbox and --disable-dev-shm-usage (required
in containers with limited /dev/shm).
headless=False crashes in display-less environments
When no $DISPLAY is set on Linux (Docker, CI), the engine now silently overrides
headless=False to headless=True, preventing Chrome from crashing on startup.
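
The three-state BROWSER_NO_SANDBOX logic described above can be sketched as a small function. running_as_root and sandbox_args are hypothetical names, and adding --disable-dev-shm-usage when the setting is forced to True (rather than only on auto-detection) is an assumption.

```python
import os


def running_as_root():
    """True when the process has effective UID 0 (POSIX only)."""
    return hasattr(os, "geteuid") and os.geteuid() == 0


def sandbox_args(no_sandbox=None, is_root=None):
    """Return extra Chrome flags for the given BROWSER_NO_SANDBOX value.

    None  -> auto-detect: disable the sandbox only when running as root
    True  -> always disable the sandbox
    False -> always keep the sandbox, even as root
    """
    if is_root is None:
        is_root = running_as_root()
    disable = is_root if no_sandbox is None else no_sandbox
    return ["--no-sandbox", "--disable-dev-shm-usage"] if disable else []
```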
