Thanks for the detailed writeup and the codebase analysis; that made this easy to scope. We went with Option C and implemented the full interception API on the embeddable obscura crate's Page.
What's included:
• add_preload_script(&str) runs a script before any of the page's own <script> tags (the Page.addScriptToEvaluateOnNewDocument contract).
No more unsafe field access or [patch] forking.
To your questions:
One current limitation: resource_type reports Fetch for JS-initiated requests and doesn't yet split Xhr from Fetch, since the op backs both through the same path. Splitting them cleanly needs an extra op parameter and a snapshot rebuild, so I left it as a follow-up.
Everything else from the proposal is covered. There's an end-to-end test (crates/obscura/tests/interception.rs) covering preload scripts, the channel, the passive callbacks, and a Continue URL rewrite.
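For a feel of how the channel flow with a Continue URL rewrite fits together, here is a rough, self-contained model. It is not the crate's actual API: it uses std::sync::mpsc in place of the tokio channel, and the type shapes (an InterceptedRequest carrying a reply sender, an InterceptResolution with only a Continue variant) and the helper names resolve, spawn_consumer, and fetch_via_channel are assumptions made purely for illustration.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical shape of an intercepted request: the URL plus a reply
/// channel on which the consumer sends its decision.
#[derive(Debug)]
pub struct InterceptedRequest {
    pub url: String,
    pub reply: mpsc::Sender<InterceptResolution>,
}

/// Hypothetical resolution type; only the Continue case is modeled here.
#[derive(Debug, PartialEq)]
pub enum InterceptResolution {
    /// Continue the request, optionally with a rewritten URL.
    Continue { url: Option<String> },
}

/// Consumer policy: upgrade http:// URLs to https://, leave others alone.
pub fn resolve(url: &str) -> InterceptResolution {
    let rewritten = url
        .strip_prefix("http://")
        .map(|rest| format!("https://{}", rest));
    InterceptResolution::Continue { url: rewritten }
}

/// Starts a consumer thread that answers every intercepted request.
/// The thread exits once all request senders have been dropped.
pub fn spawn_consumer() -> (mpsc::Sender<InterceptedRequest>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel::<InterceptedRequest>();
    let handle = thread::spawn(move || {
        for request in rx {
            // Ignore send errors: the requester may have gone away.
            let _ = request.reply.send(resolve(&request.url));
        }
    });
    (tx, handle)
}

/// Engine-side stand-in: pauses a request, waits for the resolution, and
/// returns the URL that would actually be fetched.
pub fn fetch_via_channel(tx: &mpsc::Sender<InterceptedRequest>, url: &str) -> Option<String> {
    let (reply_tx, reply_rx) = mpsc::channel();
    tx.send(InterceptedRequest { url: url.to_owned(), reply: reply_tx }).ok()?;
    let InterceptResolution::Continue { url: rewritten } = reply_rx.recv().ok()?;
    Some(rewritten.unwrap_or_else(|| url.to_owned()))
}
```

In this model the request stays paused until the consumer replies, which is what makes a URL rewrite possible; the passive callbacks, by contrast, only observe.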
Proposal: Add a built-in HTTP request/response interception API to the Page type
Target repository: obscura
Topic: Feature request / API design discussion
Motivation
We are building a web crawler and security analysis tool on top of obscura, and a critical requirement is the ability to intercept, inspect, and optionally modify all HTTP requests and responses initiated by page-level JavaScript, including XMLHttpRequest and the fetch() API.
This capability is essential for:
• Xhr, Fetch, Document, Script, etc.) and timing.
Chrome DevTools Protocol (CDP) provides this via Fetch.enable and Network.requestIntercepted. obscura, being a standalone headless browser engine (not Chromium-based), needs its own first-class interception API.
Current workaround and its limitations
Since obscura does not yet expose a public interception API, we have been using two workarounds:
1. JavaScript injection via page.evaluate() after navigation
We inject a script that wraps XMLHttpRequest.prototype and window.fetch to collect request/response data into window.__xhr_records:
Problem: Scripts injected via page.evaluate() run after the page's own scripts have already executed. Any XHR/Fetch calls made during page initialization (e.g., SPA bootstrap requests) are missed entirely.
2. Using preload_scripts (via Cargo [patch])
We discovered that obscura_browser::Page has a preload_scripts: Vec<String> field that executes scripts before any of the page's own <script> tags, exactly matching the CDP Page.addScriptToEvaluateOnNewDocument contract. This is the ideal injection point for an XHR interceptor.
However, this field is private (declared without pub), so external crates cannot push to it. To work around this, we currently use [patch.crates-io] in Cargo.toml to fork the dependency:
This approach has three major drawbacks:
• unsafe pointer transmutation, which is fragile and could break with compiler optimizations or struct layout changes.
3. Existing interception infrastructure is internal-only
We found that obscura-js already contains an intercept_tx / intercept_enabled mechanism in op_fetch_url (ops.rs L536-L630) that can intercept JS-level fetch/XHR calls and route them through a tokio::sync::mpsc channel. However, intercept_tx is also a private field on Page, and the interception is only accessible from within the crate.
Similarly, ObscuraHttpClient already has on_request and on_response callback registries (client.rs L264-L266), but these callbacks are not invoked by op_fetch_url; they only fire for navigation-level requests issued via ObscuraHttpClient.fetch_with_method(). The JS runtime uses its own reqwest Client via cached_request_client() (ops.rs L632), completely bypassing these callbacks.
Suggested API design
We propose adding a public, stable interception API to the Page type (or to the obscura crate's public surface). The design should cover three dimensions:
Option A: Expose preload_scripts as a public method
The simplest change that would enable the most use cases:
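As a minimal sketch of what such a method could look like, the block below models a Page with a private preload_scripts field and a public add_preload_script method. The struct is a simplified stand-in rather than the real obscura_browser::Page, and the preload_scripts() accessor and execution_order helper are hypothetical names added only to show that preload scripts are ordered ahead of the page's own scripts.

```rust
/// Simplified stand-in for obscura_browser::Page (hypothetical; the real
/// struct has many more fields).
#[derive(Default)]
pub struct Page {
    // Stays private; only the methods below are exposed.
    preload_scripts: Vec<String>,
}

impl Page {
    /// Queues a script to run before any of the page's own <script> tags.
    pub fn add_preload_script(&mut self, source: &str) {
        self.preload_scripts.push(source.to_owned());
    }

    /// Read-only view of the queued scripts, in registration order.
    pub fn preload_scripts(&self) -> &[String] {
        &self.preload_scripts
    }

    /// Models script ordering on navigation: every preload script comes
    /// first, followed by the document's own scripts.
    pub fn execution_order<'a>(&'a self, page_scripts: &'a [String]) -> Vec<&'a str> {
        self.preload_scripts
            .iter()
            .chain(page_scripts.iter())
            .map(String::as_str)
            .collect()
    }
}
```

Keeping the field private and exposing a method leaves room to change the storage later without breaking callers, which a plain pub field would not.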
This is a minimal, non-breaking change. The internal preload_scripts: Vec<String> field already exists and is functional; only the visibility needs to change. It would allow external crates to inject XHR interceptors (or any initialization logic) without unsafe code or patching.
Option B: Add on_request / on_response callbacks to Page
A higher-level, more ergonomic API that avoids the need for JavaScript injection entirely:
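One possible shape for the request side of such an API is sketched below. Everything here is an assumption for illustration: RequestInfo, the local ResourceType enum, dispatch_request (standing in for the point where op_fetch_url would notify observers), and record_urls are hypothetical names, and the real types in obscura_js::ops and obscura_net::client may differ.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical mirror of the resource types named in the proposal.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResourceType {
    Document,
    Script,
    Xhr,
    Fetch,
}

/// Hypothetical request record handed to passive callbacks.
#[derive(Debug, Clone)]
pub struct RequestInfo {
    pub url: String,
    pub method: String,
    pub resource_type: ResourceType,
}

type RequestCallback = Box<dyn Fn(&RequestInfo) + Send + Sync>;

/// Simplified stand-in for Page holding the callback registry.
#[derive(Default)]
pub struct Page {
    on_request: Vec<RequestCallback>,
}

impl Page {
    /// Registers a passive observer for outgoing requests.
    pub fn on_request<F>(&mut self, callback: F)
    where
        F: Fn(&RequestInfo) + Send + Sync + 'static,
    {
        self.on_request.push(Box::new(callback));
    }

    /// Stand-in for the hook the fetch op would call before sending.
    pub fn dispatch_request(&self, request: &RequestInfo) {
        for callback in &self.on_request {
            callback(request);
        }
    }
}

/// Usage example: collect every requested URL into a shared list.
pub fn record_urls(page: &mut Page) -> Arc<Mutex<Vec<String>>> {
    let seen = Arc::new(Mutex::new(Vec::new()));
    let sink = Arc::clone(&seen);
    page.on_request(move |req| sink.lock().unwrap().push(req.url.clone()));
    seen
}
```

The Send + Sync bounds are a guess at what an async runtime would require; an on_response registry would follow the same pattern with a response record.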
This design leverages the existing InterceptedRequest / InterceptResolution types already defined in obscura_js::ops (ops.rs L16-L39). The key change needed is:
• op_fetch_url (the Deno op that backs JS fetch/XHR) invoke ObscuraHttpClient.on_request / on_response callbacks before/after the request, or route through the existing intercept_tx channel.
• intercept_tx setup from Page to external consumers.
The ResourceType::Xhr and ResourceType::Fetch enum variants already exist in obscura_net::client (client.rs L70-L71), which shows this was anticipated; they just need to be plumbed through op_fetch_url.
Option C: Combine both (public preload scripts + interception callbacks)
For maximum flexibility, we suggest implementing both Option A and Option B:
• preload_scripts accessible.
Why this matters for the obscura ecosystem
obscura is positioned as a modern, embeddable headless browser engine. Its key differentiators include being Rust-native (no Chrome/Chromium binary dependency) and providing a library-first API. A first-class request interception API is a cornerstone capability for any headless browser, and its absence forces downstream projects to either:
Adding this API would significantly increase obscura's adoption in the crawling, automation, and security tooling communities.
We are happy to help
We have already done extensive analysis of the codebase and can confirm that the infrastructure for interception already exists but is not yet exposed publicly. Specifically:
• preload_scripts execution is already in place (page.rs L574-L581)
• intercept_tx / InterceptedRequest / InterceptResolution types are defined (ops.rs)
• on_request / on_response callback slots are on ObscuraHttpClient (client.rs)
• ResourceType::Xhr / ResourceType::Fetch are already defined (client.rs)
We would be glad to:
• preload_scripts publicly accessible (with appropriate documentation and safety guarantees).
• on_request / on_response callback API, including wiring op_fetch_url to invoke these callbacks.
Discussion questions
• preload_scripts as a public method?
• on_request / on_response callback API (Option B) be something you'd accept a PR for?
Looking forward to the community's thoughts!