Replies: 1 comment
The new version will add a new driver for TLS impersonation.
the cloudflare 403s on the CDP scrapers (the CommunityScrapers cloudflare megaissue, and threads like #2003 noting it only worked ~1 in 5 without CDP) come down to the engine: pkg/scraper/url.go drives chrome through chromedp, and cloudflare scores headless chrome's fingerprint and the CDP attach surface regardless of the user-agent you set. setting GetScraperUserAgent() doesn't move it because the detection isn't the UA, it's the chromium-shape + CDP signature.
the engine that sidesteps that is a firefox build with the fingerprint patches at the c++ source level (canvas readback, webgl getParameter, font metrics, audio, navigator, system colors). no js shim, no CDP attach signature, non-chrome engine that cloudflare scores more leniently. it's feder-cr/invisible_firefox, MPL-2.
to be honest about the hard blocker here: stash drives the browser via chromedp, i.e. CDP, and firefox does not speak CDP. so unlike a chrome-to-chrome swap, pointing stash at a patched firefox would need a non-CDP driver path (playwright or webdriver bidi), which is real work in the Go scraper layer, not an ExecPath change. that's why this is a heads-up, not a PR.
mostly flagging the engine-level reason the UA workaround structurally can't fix the cloudflare scrapers, in case a non-CDP firefox driver is ever on the table for the scraper backend. happy to share the per-surface detail (what cloudflare reads on chromium vs a source-patched firefox) if it's useful for that decision.