network interception with Fetch.enable breaks cloudflare #123
I can confirm this. However, I suspect this is a timing leak and Cloudflare therefore sends a 403 back, so there is not really a way to fix it. @milahu or anyone else, any thoughts or ideas on that? |
The problem is that one of Cloudflare's engineers is watching this repository... :) |
@juhacz Soo in case some @Cloudfare staff is reading this: why not hire me directly instead of needing someone to analyse & understand the code on here? :) |
@kaliiiiiiiiii Because we need people like you more :) I suggest creating a profile at https://www.buymeacoffee.com/ I think people will confirm my words :) |
you mean the python response handler is too slow? or maybe the continueResponse/fulfillRequest logic has a bug. but yeah, it seems to be a new problem
so far i used the "export HAR" function of chrome devtools network.
the exported HAR file does not include the bodies of binary responses.
chromium is open source, so it should be easy to find
an alternative would be a local http proxy.
in the long term, they will replace captchas with government ID logins |
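For the HAR route, here is a minimal sketch of checking which exported entries came out without a body. It assumes the standard HAR 1.2 layout, where a body lives in `log.entries[].response.content.text` and `encoding` is `base64` for binary data. `entries_missing_body` and `decode_body` are hypothetical helper names, not part of any tool mentioned in this thread.

```python
import base64


def entries_missing_body(har: dict) -> list:
    """Return the URLs of HAR entries whose response content has no body text.

    Note: a legitimately empty response body is also reported as missing.
    """
    missing = []
    for entry in har.get("log", {}).get("entries", []):
        content = entry.get("response", {}).get("content", {})
        if not content.get("text"):
            missing.append(entry["request"]["url"])
    return missing


def decode_body(content: dict) -> bytes:
    """Decode a HAR `content` object into raw bytes."""
    text = content.get("text", "")
    if content.get("encoding") == "base64":
        return base64.b64decode(text)
    return text.encode("utf-8")
```

Running `entries_missing_body(json.load(open("export.har")))` on an export (the file name is only an example) would list the responses that need to be captured some other way.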
|
chromium devtools sources
chromium/src/third_party/devtools-frontend/src/front_end/panels/network/network-meta.ts
UIStrings.recordNetworkLog
UI.ActionRegistration.registerActionExtension({
actionId: 'network.toggle-recording',
category: UI.ActionRegistration.ActionCategory.NETWORK,
iconClass: UI.ActionRegistration.IconClass.START_RECORDING,
toggleable: true,
toggledIconClass: UI.ActionRegistration.IconClass.STOP_RECORDING,
toggleWithRedColor: true,
contextTypes() {
return maybeRetrieveContextTypes(Network => [Network.NetworkPanel.NetworkPanel]);
},
async loadActionDelegate() {
const Network = await loadNetworkModule();
return new Network.NetworkPanel.ActionDelegate();
},
options: [
{
value: true,
title: i18nLazyString(UIStrings.recordNetworkLog),
},
{
value: false,
title: i18nLazyString(UIStrings.stopRecordingNetworkLog),
},
],
chromium/src/third_party/devtools-frontend/src/front_end/panels/network/NetworkPanel.ts
network.toggle-recording
export class ActionDelegate implements UI.ActionRegistration.ActionDelegate {
handleAction(context: UI.Context.Context, actionId: string): boolean {
const panel = context.flavor(NetworkPanel);
if (panel === null) {
return false;
}
switch (actionId) {
case 'network.toggle-recording': {
panel.toggleRecord(!panel.recordLogSetting.get());
return true;
}
panel.toggleRecord
toggleRecord(toggled: boolean): void {
this.toggleRecordAction.setToggled(toggled);
if (this.recordLogSetting.get() !== toggled) {
this.recordLogSetting.set(toggled);
}
this.networkLogView.setRecording(toggled);
if (!toggled && this.filmStripRecorder) {
this.filmStripRecorder.stopRecording(this.filmStripAvailable.bind(this));
}
}
this.filmStripRecorder
private willReloadPage(): void {
if (this.pendingStopTimer) {
clearTimeout(this.pendingStopTimer);
delete this.pendingStopTimer;
}
if (this.isShowing() && this.filmStripRecorder) {
this.filmStripRecorder.startRecording();
}
}
this.filmStripRecorder
this.filmStripRecorder = new FilmStripRecorder(this.networkLogView.timeCalculator(), this.filmStripView);
FilmStripRecorder
export class FilmStripRecorder implements TraceEngine.TracingManager.TracingManagerClient {
// ...
startRecording(): void {
// ...
const tracingManager =
SDK.TargetManager.TargetManager.instance().scopeTarget()?.model(TraceEngine.TracingManager.TracingManager);
// ...
this.tracingManager = tracingManager;
this.resourceTreeModel = this.tracingManager.target().model(SDK.ResourceTreeModel.ResourceTreeModel);
this.tracingModel = new TraceEngine.Legacy.TracingModel();
void this.tracingManager.start(this, '-*,disabled-by-default-devtools.screenshot', '');
// ...
}
// ...
stopRecording(callback: (filmStrip: TraceEngine.Extras.FilmStrip.Data) => void): void {
// ...
this.tracingManager.stop();
// ...
}
}
→ SDK.TargetManager.TargetManager.instance
import * as SDK from '../../core/sdk/sdk.js';
chromium/src/third_party/devtools-frontend/src/front_end/core/sdk/sdk.ts
import * as TargetManager from './TargetManager.js';
chromium/src/third_party/devtools-frontend/src/front_end/core/sdk/TargetManager.ts
TraceEngine.TracingManager.TracingManager
chromium/src/third_party/devtools-frontend/src/front_end/models/trace/TracingManager.ts
export class TracingManager extends SDK.SDKModel.SDKModel<void> {
readonly #tracingAgent: ProtocolProxyApi.TracingApi;
// ...
async start(client: TracingManagerClient, categoryFilter: string, options: string):
Promise<Protocol.ProtocolResponseWithError> {
// ...
const args = {
bufferUsageReportingInterval: bufferUsageReportingIntervalMs,
categories: categoryFilter,
options: options,
transferMode: Protocol.Tracing.StartRequestTransferMode.ReportEvents,
};
const response = await this.#tracingAgent.invoke_start(args);
// ...
}
chromium/src/third_party/devtools-frontend/src/front_end/generated/protocol-proxy-api.d.ts
/**
* API generated from Protocol commands and events.
*/
declare namespace ProtocolProxyApi {
// ...
export interface TracingApi {
// ...
invoke_start(params: Protocol.Tracing.StartRequest): Promise<Protocol.ProtocolResponseWithError>;
bufferUsageReportingInterval
chromium sources
chromium/src/out/Debug/gen/content/browser/devtools/protocol/tracing.cc
bufferUsageReportingInterval
struct startParams : public crdtp::DeserializableProtocolObject<startParams> {
Maybe<String> categories;
Maybe<String> options;
Maybe<double> bufferUsageReportingInterval;
Maybe<String> transferMode;
Maybe<String> streamFormat;
Maybe<String> streamCompression;
Maybe<protocol::Tracing::TraceConfig> traceConfig;
Maybe<Binary> perfettoConfig;
Maybe<String> tracingBackend;
DECLARE_DESERIALIZATION_SUPPORT();
};
startParams
void DomainDispatcherImpl::start(const crdtp::Dispatchable& dispatchable)
{
    // Prepare input parameters.
    auto deserializer = crdtp::DeferredMessage::FromSpan(dispatchable.Params())->MakeDeserializer();
    startParams params;
    if (!startParams::Deserialize(&deserializer, &params)) {
        ReportInvalidParams(dispatchable, deserializer);
        return;
    }
    m_backend->Start(std::move(params.categories), std::move(params.options), std::move(params.bufferUsageReportingInterval), std::move(params.transferMode), std::move(params.streamFormat), std::move(params.streamCompression), std::move(params.traceConfig), std::move(params.perfettoConfig), std::move(params.tracingBackend), std::make_unique<StartCallbackImpl>(weakPtr(), dispatchable.CallId(), dispatchable.Serialized()));
}
or simply: Tracing.start |
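The call chain above ends in a plain CDP command, so the same request can be sketched as a raw websocket message. The `categories`, `options` and `transferMode` values are taken from the DevTools snippets above; the reporting interval is elided there, so the `500.0` default below is only a placeholder assumption, and `tracing_start_message` is a hypothetical helper name.

```python
import json
from itertools import count

# CDP messages need a unique integer id per command on a connection.
_message_ids = count(1)


def tracing_start_message(
    categories: str = "-*,disabled-by-default-devtools.screenshot",
    options: str = "",
    buffer_usage_reporting_interval_ms: float = 500.0,  # assumption, elided above
) -> str:
    """Build the JSON text of a Tracing.start command as DevTools sends it."""
    message = {
        "id": next(_message_ids),
        "method": "Tracing.start",
        "params": {
            "bufferUsageReportingInterval": buffer_usage_reporting_interval_ms,
            "categories": categories,
            "options": options,
            "transferMode": "ReportEvents",
        },
    }
    return json.dumps(message)
```

Note that this category filter only enables screenshots for the film strip, so by itself it says nothing about where the network log gets its data.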
yep, or maybe even the interception in Chromium's C++ is too slow over a single websocket.
Yep there for sure are some bugs. What I as well could think of is that maybe some iframes don't get intercepted correctly, and therefore have a detectable difference to the main frame.
Yep that works as well of course, however more a workaround:)
See another thing to try is
Soo feel free to share a POC & status if you try that |
That would then explain why disabling site isolation works |
for my use case, i dont need any active interception of requests/responses, so i will use Tracing.start
edit: no. the i still dont understand how devtools network log gets the live network traffic |
selenium-wire uses a patched version of mitmproxy as http proxy.
this also allows for active network interception |
still pretty sure the SSL/TLS fingerprint doesn't match Chrome's, as it doesn't use BoringSSL tho. see wkeeling/selenium-wire#215 (comment) |
Interesting note here that:
from cdp_socket.utils.utils import launch_chrome, random_port
from cdp_socket.socket import CDPSocket
import os
import asyncio

global sock1


async def on_resumed(params):
    global sock1
    await sock1.exec("Fetch.continueRequest", {"requestId": params['requestId']})
    print(params["request"]["url"])


async def main():
    global sock1
    PORT = random_port()
    process = launch_chrome(PORT)
    async with CDPSocket(PORT) as base_socket:
        targets = await base_socket.targets
        target = targets[0]
        sock1 = await base_socket.get_socket(target)
        await sock1.exec("Network.clearBrowserCookies")
        await sock1.exec("Fetch.enable")
        sock1.add_listener("Fetch.requestPaused", on_resumed)
        await sock1.exec("Page.navigate", {"url": "https://nowsecure.nl#relax"})
        await asyncio.sleep(5)
    os.kill(process.pid, 15)


asyncio.run(main())
works just fine |
this works for requests, but not for responses
test.py
#!/usr/bin/env python3
# https://github.com/kaliiiiiiiiii/Selenium-Driverless/issues/123#issuecomment-1858803756
from cdp_socket.utils.utils import launch_chrome, random_port
from cdp_socket.socket import CDPSocket
from cdp_socket.exceptions import CDPError
import os
import asyncio
import json
import base64
import sys
import time
import traceback

global sock1


async def on_request_paused(params):
    global sock1
    url = params["request"]["url"]
    url_clean = url.split("?")[0]
    if len(url_clean) > 60:
        url_clean = url_clean[:60] + "..."
    _params = {"requestId": params['requestId']}
    #if params.get('responseStatusCode') in [301, 302, 303, 307, 308]:
    #    # redirected request
    #    return await sock1.exec("Fetch.continueResponse", _params)
    try:
        #print("Fetch.getResponseBody ...", url_clean)
        body = await sock1.exec("Fetch.getResponseBody", _params, timeout=30)
    except CDPError as e:
        #print("Fetch.getResponseBody CDPError", url_clean)
        if e.code == -32000:
            # Can only get response body on HeadersReceived pattern matched requests.
            print("Fetch.getResponseBody CDPError -32000 -> Fetch.continueResponse", url_clean)
            #print("Fetch.continueResponse ...", url_clean)
            res = await sock1.exec("Fetch.continueResponse", _params, timeout=30)
            #print("Fetch.continueResponse done", url_clean)
            return res
        else:
            print("Fetch.getResponseBody CDPError raise", url_clean)
            raise e
    else:
        print("Fetch.getResponseBody done", url_clean)
        start = time.monotonic()
        body_decoded = base64.b64decode(body['body'])
        # modify body here
        body_modified = base64.b64encode(body_decoded).decode("ascii")
        fulfill_params = {"responseCode": 200, "body": body_modified}
        fulfill_params.update(_params)
        _time = time.monotonic() - start
        if _time > 0.01:
            print(f"decoding took long: {_time} s")
        print("Fetch.fulfillRequest ...")
        res = await sock1.exec("Fetch.fulfillRequest", fulfill_params, timeout=30)
        print("Fetch.fulfillRequest done", url_clean)
        print("Mocked response", url_clean)
        return res


async def main():
    global sock1
    PORT = random_port()
    process = launch_chrome(PORT)
    async with CDPSocket(PORT) as base_socket:
        targets = await base_socket.targets
        target = targets[0]
        sock1 = await base_socket.get_socket(target)
        await sock1.exec("Network.clearBrowserCookies")
        await sock1.exec("Fetch.enable")
        sock1.add_listener("Fetch.requestPaused", on_request_paused)
        # timeout: fix: asyncio.exceptions.TimeoutError
        await sock1.exec("Page.navigate", {"url": "https://nowsecure.nl#relax"}, timeout=30)
        print("waiting after Page.navigate")
        await asyncio.sleep(5)
    os.kill(process.pid, 30)


asyncio.run(main())
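One possible reading of the -32000 error in the script above: `Fetch.enable` without `patterns` pauses requests at the request stage only, where no response body exists yet. Per the CDP docs, a `Fetch.requestPaused` event is at the response stage if `responseStatusCode` or `responseErrorReason` is present. A minimal sketch of that decision, with hypothetical helper names (`fetch_enable_params`, `is_response_stage`, `choose_command`); using `Fetch.continueResponse` for redirects follows the commented-out lines in the script and is an assumption, not a verified fix:

```python
REDIRECT_CODES = {301, 302, 303, 307, 308}


def fetch_enable_params(url_pattern: str = "*") -> dict:
    """Params for Fetch.enable that pause matching requests at the response stage."""
    return {"patterns": [{"urlPattern": url_pattern, "requestStage": "Response"}]}


def is_response_stage(params: dict) -> bool:
    """True if a Fetch.requestPaused event was raised at the response stage."""
    return "responseStatusCode" in params or "responseErrorReason" in params


def choose_command(params: dict) -> str:
    """Pick the CDP command to answer a Fetch.requestPaused event with."""
    status = params.get("responseStatusCode")
    if status is None:
        # Request stage, or the response failed: nothing to read, just continue.
        return "Fetch.continueRequest"
    if status in REDIRECT_CODES:
        # Redirect responses carry no body worth fetching (assumption, see above).
        return "Fetch.continueResponse"
    return "Fetch.getResponseBody"
```

With this, the handler would call `Fetch.getResponseBody` only for events where `choose_command` says so, instead of relying on the CDPError to tell the stages apart.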
|
via |
i guess it uses a local https proxy with a self-signed certificate but still, this fails to bypass cloudflare
|
Also interesting here, that local overrides with the chrome devtools just work fine:
ahh yep, that makes sense
maybe there's a way to detect self-signed certificate usage? If not, it's probably timing or SSL/TLS fingerprinting, I guess.
I see 2 possible approaches here:
|
for now i gave up on intercepting requests... probably i would try the |
Well yeah, even though I assume that the memory manipulation/DLL hooking solutions are specific to:
|
At
Uhh I think passive capturing works as well with |
Even changing request headers works just fine
import asyncio
import base64
import sys
import time
import traceback
from cdp_socket.exceptions import CDPError
from selenium_driverless import webdriver


async def on_request(params, global_conn):
    url = params["request"]["url"]
    _params = {"interceptionId": params['interceptionId']}
    if params.get('responseStatusCode') in [301, 302, 303, 307, 308]:
        # redirected request
        return await global_conn.execute_cdp_cmd("Network.continueInterceptedRequest", _params)
    else:
        # continue the request with one extra header added
        fulfill_params = {"headers": params["request"]["headers"]}
        fulfill_params["headers"]["test"] = "Hello World!"
        fulfill_params.update(_params)
        await global_conn.execute_cdp_cmd("Network.continueInterceptedRequest", fulfill_params)
        print(url)


async def main():
    options = webdriver.ChromeOptions()
    async with webdriver.Chrome(options=options, max_ws_size=2 ** 30) as driver:
        driver.base_target.socket.on_closed.append(lambda code, reason: print("chrome exited"))
        global_conn = driver.current_target
        await driver.get("about:blank")
        await global_conn.execute_cdp_cmd("Network.enable", {"maxTotalBufferSize": 1_000_000,  # 1 MB
                                                             "maxResourceBufferSize": 1_000_000,
                                                             "maxPostDataSize": 1_000_000
                                                             })
        await global_conn.execute_cdp_cmd("Network.setRequestInterception",
                                          {"patterns": [{"urlPattern": "*",
                                                         # "interceptionStage": "HeadersReceived"
                                                         }]})
        await global_conn.add_cdp_listener("Network.requestIntercepted", lambda data: on_request(data, global_conn))
        await driver.get(
            'https://nowsecure.nl',
            timeout=60, wait_load=False)
        while True:
            await asyncio.sleep(10)


asyncio.run(main()) |
and where is the response body? |
#!/usr/bin/env python3
import asyncio
from selenium_driverless import webdriver
from selenium_driverless.types.by import By
import base64


async def main():
    driver = await webdriver.Chrome()
    #await asyncio.sleep(1)
    target = None

    async def requestWillBeSent(args):
        #print("requestWillBeSent", args)
        print("requestWillBeSent", args["request"]["url"])

    async def requestWillBeSentExtraInfo(args):
        print("requestWillBeSentExtraInfo", args)

    async def responseReceived(args):
        # TODO better. get target of this response
        nonlocal target
        #print("responseReceived", args)
        status = args["response"]["status"]
        url = args["response"]["url"]
        # header names can differ in case, so avoid a KeyError here
        _type = args["response"]["headers"].get("Content-Type")
        # TODO better. detect when response data is ready
        # fix: No data found for resource with given identifier
        await asyncio.sleep(1)
        args = {
            "requestId": args["requestId"],
        }
        body = await target.execute_cdp_cmd("Network.getResponseBody", args)
        body = base64.b64decode(body["body"]) if body["base64Encoded"] else body["body"]
        print("responseReceived", status, url, _type, repr(body[:20]) + "...")

    async def responseReceivedExtraInfo(args):
        print("responseReceivedExtraInfo", args)

    async def targetCreated(args):
        print("targetCreated", args)

    async def targetInfoChanged(args):
        #print("targetInfoChanged", args)
        print("targetInfoChanged")

    target = await driver.current_target
    #print("target.id", target.id)
    # enable Target events
    args = {
        "discover": True,
        #"filter": ...
    }
    await target.execute_cdp_cmd("Target.setDiscoverTargets", args)
    await target.add_cdp_listener("Target.targetCreated", targetCreated)
    await target.add_cdp_listener("Target.targetInfoChanged", targetInfoChanged)
    #print("driver.targets", await driver.targets)
    # enable Network events
    args = {
        "maxTotalBufferSize": 1_000_000,  # 1 MB
        "maxResourceBufferSize": 1_000_000,
        "maxPostDataSize": 1_000_000
    }
    await target.execute_cdp_cmd("Network.enable", args)
    await target.add_cdp_listener("Network.requestWillBeSent", requestWillBeSent)
    #await target.add_cdp_listener("Network.requestWillBeSentExtraInfo", requestWillBeSentExtraInfo)
    await target.add_cdp_listener("Network.responseReceived", responseReceived)
    #await target.add_cdp_listener("Network.responseReceivedExtraInfo", responseReceivedExtraInfo)
    #await asyncio.sleep(1)
    url = "http://httpbin.org/get"
    print("driver.get", url)
    await driver.get(url)
    await asyncio.sleep(3)
    #print("driver.targets", await driver.targets)
    """
    print("hit enter to close")
    input()
    """
    await driver.close()


asyncio.run(main())
example output
|
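On the "detect when response data is ready" TODO in the script above: CDP emits `Network.loadingFinished` for a `requestId` once the resource has fully loaded, which is probably a more reliable trigger for `Network.getResponseBody` than a fixed `asyncio.sleep(1)`. A minimal sketch of the bookkeeping, with a hypothetical `ResponseTracker` class that is not part of selenium_driverless; wiring it to `add_cdp_listener` is left out:

```python
class ResponseTracker:
    """Remember response metadata until the matching loadingFinished event."""

    def __init__(self):
        # requestId -> "response" object from Network.responseReceived
        self._pending = {}

    def on_response_received(self, params: dict) -> None:
        """Handle Network.responseReceived: headers are in, body may not be."""
        self._pending[params["requestId"]] = params["response"]

    def on_loading_finished(self, params: dict):
        """Handle Network.loadingFinished.

        Returns the stored response metadata if the body can be requested now,
        or None if no response was seen for this requestId.
        """
        return self._pending.pop(params["requestId"], None)

    def on_loading_failed(self, params: dict) -> None:
        """Handle Network.loadingFailed: drop the entry, there is no body."""
        self._pending.pop(params["requestId"], None)
```

A listener for `Network.loadingFinished` would then call `Network.getResponseBody` only when `on_loading_finished` returns something.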
this error appears when
fix:
async def requestPaused(args):
    # ...
    body = base64.b64encode(body).decode("ascii")
    _args = {
        "requestId": args["requestId"],
        "responseCode": args["responseStatusCode"],
        # fix: Please unblock challenges.cloudflare.com to proceed.
        "responseHeaders": args["responseHeaders"],
        "body": body,
    }
    if args["responseStatusText"] != "":
        # empty string throws "Invalid http status code or phrase"
        _args["responsePhrase"] = args["responseStatusText"]
    await target.execute_cdp_cmd("Fetch.fulfillRequest", _args)
im looking for a generic solution, based on streams
see also https://github.com/milahu/aiohttp_chromium/tree/main/test/stream-response
feel free to copy/paste/modify these scripts to |
ah yep, thanks. Might be nice if you can keep it up long-term somewhere in your repo for reference
ah heck, well then
Uh nice that we've finally got it working! Great job! And also, are we sure that
I remember there being If some still don't get intercepted, maybe target-interception might be considerable, see https://github.com/kaliiiiiiiiii/Selenium-Driverless/blob/4b71a5ab59a193d41eab80ed8f68a66e8ad5c230/tests/target_interception.py . I'm however not sure how reliable it is and how bad the timing leaks are. |
Network.streamResourceContent and Network.dataReceived
im afraid no... i also would prefer a binary protocol, no base64, no json
base64 is needed for Fetch.fulfillRequest
when i pass the body as
per CDP docs, the only non-JSON endpoint is
no idea, i dont need these targets
in Fetch.requestPaused.py im calling
target = await driver.current_target
# ...
await target.execute_cdp_cmd("Fetch.enable", args)
await target.add_cdp_listener("Fetch.requestPaused", requestPaused)
but this also works with
await driver.execute_cdp_cmd("Fetch.enable", args)
await driver.add_cdp_listener("Fetch.requestPaused", requestPaused)
then |
Also, I'm just thinking about it: if we can't stream the responses when intercepting the requests, there's technically a way to detect the timing (if the server responds in chunks), right? And even if it would be possible, I suppose there could be a way to set up a server with specific chunk timing & size + detect that in JavaScript. See http://scatter.cowchimp.com/ for a PoC on scattering the chunk timing |
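To make the chunk-timing idea concrete: if a server deliberately sends chunks a fixed interval apart, a client that streams them sees roughly that interval between arrivals, while a response that was buffered by an interceptor and fulfilled in one piece arrives with the gaps collapsed. This is only an illustration of the arithmetic in Python; a real check would run in page JavaScript, and `inter_chunk_gaps`, `looks_buffered` and the `tolerance` value are assumptions, not anything Cloudflare is known to do.

```python
def inter_chunk_gaps(arrival_times: list) -> list:
    """Seconds between consecutive chunk arrivals."""
    return [later - earlier for earlier, later in zip(arrival_times, arrival_times[1:])]


def looks_buffered(arrival_times: list, expected_gap: float, tolerance: float = 0.5) -> bool:
    """Guess whether a chunked response was buffered before delivery.

    expected_gap is the delay the server put between chunks; the response is
    flagged if every observed gap is below tolerance * expected_gap.
    """
    gaps = inter_chunk_gaps(arrival_times)
    if not gaps:
        # A single chunk carries no timing information.
        return False
    return max(gaps) < expected_gap * tolerance
```

For example, with a server delay of 0.2 s, arrivals at `[0.0, 0.2, 0.4, 0.6]` look streamed, while arrivals bunched around 0.61 s look buffered.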
aah, now i understand your question
so ideally, all targets should be intercepted.
practically, i would avoid this premature optimization.
maybe put this on a todo list / future work list / debug ideas list |
yeah ofc - as this will execute cdp on the same target. I'm not sure if/how |
bad news: this also fails with chromium 120
maybe this is a bug in |
mhh maybe try with bare CDP-socket. Wouldn't know why driverless could break this. Unless it's some chrome flag which gets applied by default |
not possible, because chromedriver does not support the Network.streamResourceContent command
so there is no
await session.execute(devtools.network.stream_resource_content(request_id))
# or
driver.execute("Network.streamResourceContent", {"requestId": request_id})
there is only network.take_response_body_for_interception_as_stream
await session.execute(devtools.network.take_response_body_for_interception_as_stream(interception_id))
... but that requires an
see also Selenium 4: how add event listeners in CDP
CDP is broken by design?
i have the impression that this feature (reading and writing of streams)
see also Fetch.fulfillRequest and (very) long body
yeah, totally "unfortunately" and totally "at the moment"
no, i guess this is very deliberate sabotage, to prevent "abusing" chromium as a generic http client
dynamic analysis
so... i really tried to avoid this part (because i have zero experience here)
lets see what tomorrow will bring ; ) |
Yeah, that might indeed be the case. Also for security reasons, such as streaming all stuff encrypted through a proxy.
yeah, I guess so
well, have fun haha 👀 gonna be a pain. Pretty sure Chrome has stuff against that implemented. |
not resolved yet lol |
well... the original issue is fixed by sending
currently i dont have time to implement reading and writing of streams |
Hmm does https://bugs.chromium.org/p/chromium/issues/detail?id=1138839 still apply tho? Maybe using |
Maybe https://github.com/tomer8007/chromium-ipc-sniffer could be worth considering 👀 |
i would be surprised if that works
... so the raw HTTP traffic is in shared memory
the most promising method is running chromium in a debugger, either gdb or lldb
but all these are workarounds
effectively, this would allow inserting an http proxy
currently this has zero priority for me, i just dont need it |
will be fixed with https://github.com/kaliiiiiiiiii/Selenium-Driverless/blob/dev/src/selenium_driverless/scripts/network_interceptor.py I'll close this issue when it's released & the documentation is complete |
nothing new from google
just another feature request |
im trying to capture all responses as described in readme#use-events
cloudflare says
chrome shows a warning in the address bar
fixed by adding
options.add_argument("--disable-web-security")
so that the same-origin policy is not enforced
test_selenium_driverless.py