network interception with `Fetch.enable` breaks cloudflare #123

milahu · 2023-11-30T12:09:53Z

im trying to capture all responses as described in readme#use-events

cloudflare says

Please unblock challenges.cloudflare.com to proceed.

chrome shows a warning in the address bar

your connection to this site is not secure

fixed by adding options.add_argument("--disable-web-security")
so that chrome does not enforce the same-origin policy

test_selenium_driverless.py

#!/usr/bin/env python3

import asyncio
import base64
import sys
import time
import traceback

from cdp_socket.exceptions import CDPError

from selenium_driverless import webdriver


async def on_request(params, global_conn):

    url = params["request"]["url"]
    _params = {"requestId": params['requestId']}
    if params.get('responseStatusCode') in [301, 302, 303, 307, 308]:
        # redirected request
        return await global_conn.execute_cdp_cmd("Fetch.continueResponse", _params)
    else:
        try:
            body = await global_conn.execute_cdp_cmd("Fetch.getResponseBody", _params, timeout=1)
        except CDPError as e:
            if e.code == -32000 and e.message == 'Can only get response body on requests captured after headers received.':
                print(params, "\n", file=sys.stderr)
                traceback.print_exc()
                await global_conn.execute_cdp_cmd("Fetch.continueResponse", _params)
            else:
                raise e
        else:
            start = time.monotonic()
            body_decoded = base64.b64decode(body['body'])

            # modify body here

            body_modified = base64.b64encode(body_decoded).decode("ascii")
            fulfill_params = {"responseCode": 200, "body": body_modified}
            fulfill_params.update(_params)
            _time = time.monotonic() - start
            if _time > 0.01:
                print(f"decoding took long: {_time} s")
            await global_conn.execute_cdp_cmd("Fetch.fulfillRequest", fulfill_params)
            print("Mocked response", url)


async def main():
    options = webdriver.ChromeOptions()
    options.add_argument("--window-size=500,900")
    # fix: please unblock challenges.cloudflare.com to proceed
    # Don't enforce the same-origin policy
    options.add_argument("--disable-web-security")
    async with webdriver.Chrome(options=options, max_ws_size=2 ** 30) as driver:
        driver.base_target.socket.on_closed.append(lambda code, reason: print(f"chrome exited"))
        global_conn = driver.base_target
        await driver.get("about:blank")
        await global_conn.execute_cdp_cmd("Fetch.enable", cmd_args={"patterns": [{"requestStage": "Response", "urlPattern":"*"}]})
        await global_conn.add_cdp_listener("Fetch.requestPaused", lambda data: on_request(data, global_conn))
        await driver.get(
            #'https://wikipedia.org',
            "https://nowsecure.nl/#relax", # test cloudflare
            timeout=60, wait_load=False)
        while True:
            #time.sleep(10) # no. cloudflare would hang
            await asyncio.sleep(10)


asyncio.run(main())


kaliiiiiiiiii · 2023-11-30T13:31:09Z

I can confirm this. However, I suspect this is a timing leak, with Cloudflare therefore sending a 403 back => not really a way to fix it.

@milahu or any other thoughts//ideas on that?

juhacz · 2023-11-30T14:20:02Z

The problem is that one of Cloudflare's engineers is watching this repository... :)

kaliiiiiiiiii · 2023-11-30T16:47:43Z

The problem is that one of Cloudflare's engineers is watching this repository... :)

@juhacz
Likely, yes.

Soo in case some @Cloudfare staff is reading this:

Why not hire me directly instead of needing someone to analyse & understand the code on here ? :)

juhacz · 2023-11-30T16:51:07Z

@kaliiiiiiiiii Because we need people like you more :) I Suggest creating a profile at https://www.buymeacoffee.com/ I think people will confirm my words :)

milahu · 2023-12-01T07:40:04Z

I suspect this to be a timing leak

you mean the python response handler is too slow?

or maybe the continueResponse/fulfillRequest logic has a bug
(note: continueResponse is experimental)
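
One way to check the timing theory would be to measure how long each handler call keeps a request paused. A minimal sketch, assuming the handler is an async callable that takes the `Fetch.requestPaused` params as its first argument; `timed` and `durations` are hypothetical names, not part of selenium_driverless or cdp_socket:

```python
import asyncio
import functools
import time


def timed(handler, durations):
    """Wrap an async event handler and record how long each call takes.

    `durations` is a plain list that receives (request_id, seconds) tuples.
    Both names are illustrative helpers, not part of any library.
    """
    @functools.wraps(handler)
    async def wrapper(params, *args, **kwargs):
        start = time.monotonic()
        try:
            return await handler(params, *args, **kwargs)
        finally:
            # Recorded even when the handler raises, so slow failures show up too.
            durations.append((params.get("requestId"), time.monotonic() - start))
    return wrapper
```

With the script above, the listener could be registered as `lambda data: timed(on_request, durations)(data, global_conn)`, and `durations` inspected after the page has loaded. This only measures the Python side; it says nothing about latency inside chromium or on the websocket.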

but yeah, it seems to be a new problem
with the error message
"Please unblock challenges.cloudflare.com to proceed."
i only find a tapatalk.com thread from 2023-10-30 with no solution

any other thoughts//ideas on that?

so far i used the "export HAR" function of chrome devtools network
but that is slower than capturing the live traffic

the exported HAR file does not include the bodies of binary responses
which is actually good for large binaries
i dont want to store a 1GB response body in RAM
but let chrome write it to the filesystem
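
For reference, an exported HAR file can be post-processed with a few lines of Python. A minimal sketch, assuming the standard HAR 1.2 layout (`log.entries[].response.content.text`, with an optional `encoding` of `base64`); `har_bodies` is a made-up helper name:

```python
import base64


def har_bodies(har):
    """Yield (url, status, body) for each entry of a parsed HAR dict.

    body is bytes, or None when the export did not store the body.
    """
    for entry in har["log"]["entries"]:
        content = entry["response"].get("content", {})
        text = content.get("text")
        if text is None:
            body = None  # body was not stored in the export
        elif content.get("encoding") == "base64":
            body = base64.b64decode(text)
        else:
            body = text.encode("utf-8")
        yield entry["request"]["url"], entry["response"]["status"], body
```

Typical use would be `har_bodies(json.load(open("export.har")))`, where `export.har` is a placeholder path.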

chromium is open source, so it should be easy to find
how the "record network log" command works

an alternative would be a local http proxy
i guess Fetch.enable also works with a http proxy inside of chrome
and maybe that proxy is visible to cloudflare

in the long term, they will replace captchas with government ID logins
and to bypass that, we will need p2p scraping tools...

kaliiiiiiiiii · 2023-12-01T07:52:16Z

@kaliiiiiiiiii Because we need people like you more :) I Suggest creating a profile at https://www.buymeacoffee.com/ I think people will confirm my words :)

@juhacz
added:) https://github.com/kaliiiiiiiiii#support-me

milahu · 2023-12-01T08:08:42Z

chromium is open source, so it should be easy to find
how the "record network log" command works

chromium devtools sources

chromium/src/third_party/devtools-frontend/src/front_end/panels/network/network-meta.ts

UIStrings.recordNetworkLog

UI.ActionRegistration.registerActionExtension({
  actionId: 'network.toggle-recording',
  category: UI.ActionRegistration.ActionCategory.NETWORK,
  iconClass: UI.ActionRegistration.IconClass.START_RECORDING,
  toggleable: true,
  toggledIconClass: UI.ActionRegistration.IconClass.STOP_RECORDING,
  toggleWithRedColor: true,
  contextTypes() {
    return maybeRetrieveContextTypes(Network => [Network.NetworkPanel.NetworkPanel]);
  },
  async loadActionDelegate() {
    const Network = await loadNetworkModule();
    return new Network.NetworkPanel.ActionDelegate();
  },
  options: [
    {
      value: true,
      title: i18nLazyString(UIStrings.recordNetworkLog),
    },
    {
      value: false,
      title: i18nLazyString(UIStrings.stopRecordingNetworkLog),
    },
  ],

chromium/src/third_party/devtools-frontend/src/front_end/panels/network/NetworkPanel.ts

network.toggle-recording

export class ActionDelegate implements UI.ActionRegistration.ActionDelegate {
  handleAction(context: UI.Context.Context, actionId: string): boolean {
    const panel = context.flavor(NetworkPanel);
    if (panel === null) {
      return false;
    }
    switch (actionId) {
      case 'network.toggle-recording': {
        panel.toggleRecord(!panel.recordLogSetting.get());
        return true;
      }

panel.toggleRecord

  toggleRecord(toggled: boolean): void {
    this.toggleRecordAction.setToggled(toggled);
    if (this.recordLogSetting.get() !== toggled) {
      this.recordLogSetting.set(toggled);
    }

    this.networkLogView.setRecording(toggled);
    if (!toggled && this.filmStripRecorder) {
      this.filmStripRecorder.stopRecording(this.filmStripAvailable.bind(this));
    }
  }

this.filmStripRecorder

  private willReloadPage(): void {
    if (this.pendingStopTimer) {
      clearTimeout(this.pendingStopTimer);
      delete this.pendingStopTimer;
    }
    if (this.isShowing() && this.filmStripRecorder) {
      this.filmStripRecorder.startRecording();
    }
  }

this.filmStripRecorder

      this.filmStripRecorder = new FilmStripRecorder(this.networkLogView.timeCalculator(), this.filmStripView);

FilmStripRecorder

export class FilmStripRecorder implements TraceEngine.TracingManager.TracingManagerClient {
  // ...
  startRecording(): void {
    // ...
    const tracingManager =
        SDK.TargetManager.TargetManager.instance().scopeTarget()?.model(TraceEngine.TracingManager.TracingManager);
    // ...
    this.tracingManager = tracingManager;
    this.resourceTreeModel = this.tracingManager.target().model(SDK.ResourceTreeModel.ResourceTreeModel);
    this.tracingModel = new TraceEngine.Legacy.TracingModel();
    void this.tracingManager.start(this, '-*,disabled-by-default-devtools.screenshot', '');
    // ...
  }
  // ...
  stopRecording(callback: (filmStrip: TraceEngine.Extras.FilmStrip.Data) => void): void {
    // ...
    this.tracingManager.stop();
    // ...
  }
}

→ FilmStripRecorder implements TraceEngine.TracingManager.TracingManagerClient

SDK.TargetManager.TargetManager.instance

import * as SDK from '../../core/sdk/sdk.js';

chromium/src/third_party/devtools-frontend/src/front_end/core/sdk/sdk.ts

import * as TargetManager from './TargetManager.js';

chromium/src/third_party/devtools-frontend/src/front_end/core/sdk/TargetManager.ts

TraceEngine.TracingManager.TracingManager

chromium/src/third_party/devtools-frontend/src/front_end/models/trace/TracingManager.ts

export class TracingManager extends SDK.SDKModel.SDKModel<void> {
  readonly #tracingAgent: ProtocolProxyApi.TracingApi;
  // ...
  async start(client: TracingManagerClient, categoryFilter: string, options: string):
      Promise<Protocol.ProtocolResponseWithError> {
    // ...
    const args = {
      bufferUsageReportingInterval: bufferUsageReportingIntervalMs,
      categories: categoryFilter,
      options: options,
      transferMode: Protocol.Tracing.StartRequestTransferMode.ReportEvents,
    };
    const response = await this.#tracingAgent.invoke_start(args);
    // ...
  }

chromium/src/third_party/devtools-frontend/src/front_end/generated/protocol-proxy-api.d.ts

/**
 * API generated from Protocol commands and events.
 */
declare namespace ProtocolProxyApi {
  // ...
  export interface TracingApi {
    // ...
    invoke_start(params: Protocol.Tracing.StartRequest): Promise<Protocol.ProtocolResponseWithError>;

bufferUsageReportingInterval

chromium sources

chromium/src/out/Debug/gen/content/browser/devtools/protocol/tracing.cc

bufferUsageReportingInterval

struct startParams : public crdtp::DeserializableProtocolObject<startParams> {
    Maybe<String> categories;
    Maybe<String> options;
    Maybe<double> bufferUsageReportingInterval;
    Maybe<String> transferMode;
    Maybe<String> streamFormat;
    Maybe<String> streamCompression;
    Maybe<protocol::Tracing::TraceConfig> traceConfig;
    Maybe<Binary> perfettoConfig;
    Maybe<String> tracingBackend;
    DECLARE_DESERIALIZATION_SUPPORT();
};

startParams

void DomainDispatcherImpl::start(const crdtp::Dispatchable& dispatchable)
{
    // Prepare input parameters.
    auto deserializer = crdtp::DeferredMessage::FromSpan(dispatchable.Params())->MakeDeserializer();
    startParams params;
    if (!startParams::Deserialize(&deserializer, &params)) {
      ReportInvalidParams(dispatchable, deserializer);
      return;
    }

    m_backend->Start(std::move(params.categories), std::move(params.options), std::move(params.bufferUsageReportingInterval), std::move(params.transferMode), std::move(params.streamFormat), std::move(params.streamCompression), std::move(params.traceConfig), std::move(params.perfettoConfig), std::move(params.tracingBackend), std::make_unique<StartCallbackImpl>(weakPtr(), dispatchable.CallId(), dispatchable.Serialized()));
}

or simply: Tracing.start

kaliiiiiiiiii · 2023-12-01T08:09:04Z

@milahu

you mean the python response handler is too slow?

yep or maybe even the interception at C++ Chromium is too slow over a single websocket.

A long-term workaround here would be using something like selenium-wire. This however requires some development to fix the SSL pinning.

or maybe the continueResponse/fulfillRequest logic has a bug (note: continueResponse is experimental)

Yep there for sure are some bugs. What I as well could think of is that maybe some iframes don't get intercepted correctly, and therefore have a detectable difference to the main frame.

so far i used the "export HAR" function of chrome devtools network but that is slower than capturing the live traffic

Yep that works as well of course, however more a workaround:)

an alternative would be a local http proxy i guess Fetch.enable also works with a http proxy inside of chrome and maybe that proxy is visible to cloudflare

See 1.
I assumed chrome intercepts directly between frames | boringssl and doesn't tunnel it through a proxy after boringssl.
Maybe we can find some source-code on that?

another thing to try is

Network.setRequestInterception (deprecated tho).

Soo feel free to share a POC & status if you try that

kaliiiiiiiiii · 2023-12-01T08:11:35Z

Yep there for sure are some bugs. What I as well could think of is that maybe some iframes don't get intercepted correctly, and therefore have a detectable difference to the main frame.

That would then explain why disabling site isolation works

milahu · 2023-12-01T08:14:49Z

interception

for my use case, i dont need any active interception of requests/responses
i just need a passive live-stream of http traffic

so i will use Tracing.start

edit: no. the Tracing.dataCollected events are only sent after Tracing.end
and the Tracing.dataCollected events dont contain http traffic 0__o

i still dont understand how devtools network log gets the live network traffic
the network log uses Tracing.start only to get the trace categories
"-*,disabled-by-default-devtools.screenshot"

milahu · 2023-12-04T08:51:21Z

an alternative would be a local http proxy

selenium-wire uses a patched version of mitmproxy as http proxy

this also allows for active network interception
without chromium --disable-web-security
because we can tell chromium to trust the proxy's certificate

kaliiiiiiiiii · 2023-12-04T09:34:55Z

an alternative would be a local http proxy

selenium-wire uses a patched version of mitmproxy as http proxy

this also allows for active network interception without chromium --disable-web-security because we can tell chromium to trust the proxy's certificate

still pretty sure the SSL/TLS fingerprint doesn't match to chrome as it doesn't use boringssl tho. see wkeeling/selenium-wire#215 (comment)

kaliiiiiiiiii · 2023-12-16T12:11:10Z

Interesting note here that:

from cdp_socket.utils.utils import launch_chrome, random_port
from cdp_socket.socket import CDPSocket
import os
import asyncio

global sock1


async def on_resumed(params):
    global sock1
    await sock1.exec("Fetch.continueRequest", {"requestId": params['requestId']})
    print(params["request"]["url"])


async def main():
    global sock1
    PORT = random_port()
    process = launch_chrome(PORT)

    async with CDPSocket(PORT) as base_socket:
        targets = await base_socket.targets
        target = targets[0]
        sock1 = await base_socket.get_socket(target)
        await sock1.exec("Network.clearBrowserCookies")
        await sock1.exec("Fetch.enable")
        sock1.add_listener("Fetch.requestPaused", on_resumed)
        await sock1.exec("Page.navigate", {"url": "https://nowsecure.nl#relax"})
        await asyncio.sleep(5)

    os.kill(process.pid, 15)


asyncio.run(main())

works just fine

milahu · 2023-12-18T15:11:03Z

works just fine

this works for requests, but not for responses
because Fetch.getResponseBody always throws CDPError -32000

test.py

#!/usr/bin/env python3

# https://github.com/kaliiiiiiiiii/Selenium-Driverless/issues/123#issuecomment-1858803756

from cdp_socket.utils.utils import launch_chrome, random_port
from cdp_socket.socket import CDPSocket
from cdp_socket.exceptions import CDPError

import os
import asyncio
import json
import base64
import sys
import time
import traceback

global sock1


async def on_request_paused(params):
    global sock1

    url = params["request"]["url"]
    url_clean = url.split("?")[0]
    if len(url_clean) > 60:
        url_clean = url_clean[:60] + "..."
    _params = {"requestId": params['requestId']}
    #if params.get('responseStatusCode') in [301, 302, 303, 307, 308]:
    #    # redirected request
    #    return await sock1.exec("Fetch.continueResponse", _params)
    try:
        #print("Fetch.getResponseBody ...", url_clean)
        body = await sock1.exec("Fetch.getResponseBody", _params, timeout=30)
    except CDPError as e:
        #print("Fetch.getResponseBody CDPError", url_clean)
        if e.code == -32000:
            # Can only get response body on HeadersReceived pattern matched requests.
            print("Fetch.getResponseBody CDPError -32000 -> Fetch.continueResponse", url_clean)
            #print("Fetch.continueResponse ...", url_clean)
            res = await sock1.exec("Fetch.continueResponse", _params, timeout=30)
            #print("Fetch.continueResponse done", url_clean)
            return res
        else:
            print("Fetch.getResponseBody CDPError raise", url_clean)
            raise e
    else:
        print("Fetch.getResponseBody done", url_clean)
        start = time.monotonic()
        body_decoded = base64.b64decode(body['body'])
        # modify body here
        body_modified = base64.b64encode(body_decoded).decode("ascii")
        fulfill_params = {"responseCode": 200, "body": body_modified}
        fulfill_params.update(_params)
        _time = time.monotonic() - start
        if _time > 0.01:
            print(f"decoding took long: {_time} s")
        print("Fetch.fulfillRequest ...")
        res = await sock1.exec("Fetch.fulfillRequest", fulfill_params, timeout=30)
        print("Fetch.fulfillRequest done", url_clean)
        print("Mocked response", url_clean)
        return res


async def main():
    global sock1
    PORT = random_port()
    process = launch_chrome(PORT)

    async with CDPSocket(PORT) as base_socket:
        targets = await base_socket.targets
        target = targets[0]
        sock1 = await base_socket.get_socket(target)
        await sock1.exec("Network.clearBrowserCookies")
        await sock1.exec("Fetch.enable")
        sock1.add_listener("Fetch.requestPaused", on_request_paused)
        # timeout: fix: asyncio.exceptions.TimeoutError
        await sock1.exec("Page.navigate", {"url": "https://nowsecure.nl#relax"}, timeout=30)
        print("waiting after Page.navigate")
        await asyncio.sleep(5)

    os.kill(process.pid, 15)  # 15 = SIGTERM


asyncio.run(main())

Fetch.getResponseBody CDPError -32000 -> Fetch.continueResponse https://nowsecure.nl/
waiting after Page.navigate
Fetch.getResponseBody CDPError -32000 -> Fetch.continueResponse https://nowsecure.nl/cdn-cgi/styles/challenges.css
Fetch.getResponseBody CDPError -32000 -> Fetch.continueResponse https://nowsecure.nl/cdn-cgi/challenge-platform/h/g/orchestr...
Fetch.getResponseBody CDPError -32000 -> Fetch.continueResponse https://challenges.cloudflare.com/turnstile/v0/g/74bd6362/ap...
Fetch.getResponseBody CDPError -32000 -> Fetch.continueResponse https://nowsecure.nl/favicon.ico
Fetch.getResponseBody CDPError -32000 -> Fetch.continueResponse https://nowsecure.nl/cdn-cgi/challenge-platform/h/g/flow/ov1...
Fetch.getResponseBody CDPError -32000 -> Fetch.continueResponse https://nowsecure.nl/favicon.ico
Fetch.getResponseBody CDPError -32000 -> Fetch.continueResponse https://challenges.cloudflare.com/cdn-cgi/challenge-platform...

similar...
https://github.com/cloud-browser/scrapy-cloud-browser/blob/main/scrapy_cloud_browser/scenarist/page.py
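
A possible explanation for the `-32000` errors above: `Fetch.enable` is called here without `patterns`, and the CDP documentation says a pattern's `requestStage` defaults to `Request`, so every pause happens before any response exists. A minimal sketch of the two pieces involved; the helper names are made up for illustration:

```python
def fetch_enable_params(url_pattern="*"):
    """Params for Fetch.enable that pause matching requests at the response stage."""
    return {"patterns": [{"urlPattern": url_pattern, "requestStage": "Response"}]}


def is_response_stage(event):
    """True if a Fetch.requestPaused event was raised at the response stage.

    Per the CDP docs, the response stage is indicated by the presence of
    responseStatusCode or responseErrorReason in the event.
    """
    return "responseStatusCode" in event or "responseErrorReason" in event
```

With these, the handler could call `Fetch.getResponseBody` only when `is_response_stage(params)` is true and fall back to `Fetch.continueRequest` otherwise, instead of relying on the error code.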

milahu · 2023-12-20T12:21:15Z

chrome://net-export/ could be useful for passive capturing of traffic

Click the button to start logging future network activity to a file on disk. The log includes details of network activity from all of Chrome, including incognito and non-incognito tabs, visited URLs, and information about the network configuration

via chrome://net-internals/
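
The same log can reportedly be written from startup with the `--log-net-log=<file>` command-line flag. A rough sketch of reading such a file, assuming the usual NetLog JSON layout (a `constants.logEventTypes` name-to-number map plus an `events` list) and assuming that `URL_REQUEST_START_JOB` events carry the URL in `params.url`; `netlog_urls` is a hypothetical helper, and it only recovers URLs, not bodies:

```python
def netlog_urls(netlog):
    """Return the distinct request URLs found in a parsed NetLog dict, in order."""
    type_id = netlog["constants"]["logEventTypes"].get("URL_REQUEST_START_JOB")
    if type_id is None:
        return []  # event type unknown in this log: nothing to match against
    seen = []
    for event in netlog.get("events", []):
        url = event.get("params", {}).get("url")
        if event.get("type") == type_id and url and url not in seen:
            seen.append(url)
    return seen
```

Note that a log written while chrome is still running may be truncated JSON, so it is safest to parse it after chrome has exited.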

kaliiiiiiiiii · 2023-12-31T17:27:01Z

Looks like Network.setRequestInterception has the same issues. I wonder though why it's flagged as "Insecure", even though the request is over HTTPS

import asyncio
import base64
import sys
import time
import traceback

from cdp_socket.exceptions import CDPError

from selenium_driverless import webdriver


async def on_request(params, global_conn):
    url = params["request"]["url"]
    _params = {"interceptionId": params['interceptionId']}
    if params.get('responseStatusCode') in [301, 302, 303, 307, 308]:
        # redirected request
        return await global_conn.execute_cdp_cmd("Network.continueInterceptedRequest", _params)
    else:
        try:
            body = await global_conn.execute_cdp_cmd("Network.getResponseBodyForInterception", _params, timeout=1)
        except CDPError as e:
            if e.code == -32000 and e.message == 'Can only get response body on requests captured after headers received.':
                print(params, "\n", file=sys.stderr)
                traceback.print_exc()
                # _params holds an interceptionId, so continue via the Network domain
                await global_conn.execute_cdp_cmd("Network.continueInterceptedRequest", _params)
            else:
                raise e
        else:
            start = time.monotonic()
            body_encoded = base64.b64decode(body['body'])

            # modify body here

            body_modified = base64.b64encode(body_encoded).decode()
            fulfill_params = {"rawResponse": body_modified}
            fulfill_params.update(_params)
            _time = time.monotonic() - start
            if _time > 0.01:
                print(f"decoding took long: {_time} s")
            await global_conn.execute_cdp_cmd("Network.continueInterceptedRequest", fulfill_params)
            print("Mocked response", url)


async def main():
    options = webdriver.ChromeOptions()
    async with webdriver.Chrome(options=options, max_ws_size=2 ** 30) as driver:
        driver.base_target.socket.on_closed.append(lambda code, reason: print(f"chrome exited"))
        global_conn = driver.current_target
        await driver.get("about:blank")
        await global_conn.execute_cdp_cmd("Network.enable", {"maxTotalBufferSize": 1_000_000,  # bytes: about 1 MB, not 1 GB
                                                             "maxResourceBufferSize":1_000_000,
                                                             "maxPostDataSize":1_000_000
                                                             })
        await global_conn.execute_cdp_cmd("Network.setRequestInterception", {"patterns":[{"urlPattern":"*", "interceptionStage":"HeadersReceived"}]})
        await global_conn.add_cdp_listener("Network.requestIntercepted", lambda data: on_request(data, global_conn))
        await driver.get(
            'https://nowsecure.nl',
            timeout=60, wait_load=False)
        while True:
            await asyncio.sleep(10)


asyncio.run(main())

milahu · 2023-12-31T17:52:22Z

wonder though why it's flagged as "Insecure", even though the request is over HTTPS

i guess it uses a local https proxy with a self-signed certificate
without adding that certificate as "trusted cert" to ~/.pki/nssdb/

but still, this fails to bypass cloudflare

Please unblock challenges.cloudflare.com to proceed.

kaliiiiiiiiii · 2023-12-31T17:59:25Z

Also interesting here, that local overrides with the chrome devtools just work fine:

i guess it uses a local https proxy with a self-signed certificate
without adding that certificate as "trusted cert" to ~/.pki/nssdb/

ahh yep, that makes sense

but still, this fails to bypass cloudflare

maybe there's a way to detect self-signed certificate usage? If not, it's probably timing or SSL/TLS fingerprinting I guess

I see 2 possible approaches here:

check if we can access that via a chrome extension (check if existing ones work) @milahu feel free to lmk if you find a working one. Getting the source code & analysing it shouldn't be that hard.
What if we, instead of modifying the body binary, point the url to a local webserver?

milahu · 2023-12-31T18:09:05Z

for now i gave up on intercepting requests...
chrome seems to make it really hard, also to provide security against MITM attacks

probably i would try the frida route
as i described in wkeeling/selenium-wire#656 (comment)

kaliiiiiiiiii · 2023-12-31T18:36:39Z

for now i gave up on intercepting requests... chrome seems to make it really hard, also to provide security against MITM attacks

probably i would try the frida route as i described in wkeeling/selenium-wire#656 (comment)

Well yeah, even though I assume that the memory manipulation/DLL hooking solutions are specific to:

chrome versions
OS
and therefore hard to maintain long-term:/

kaliiiiiiiiii · 2023-12-31T18:39:24Z

At

chrome://net-export/ could be useful for passive capturing of traffic

Click the button to start logging future network activity to a file on disk. The log includes details of network activity from all of Chrome, including incognito and non-incognito tabs, visited URLs, and information about the network configuration

via chrome://net-internals/

Uhh I think passive capturing works as well with Fetch.enable or Network.setRequestInterception as long as you don't modify the body btw

kaliiiiiiiiii · 2023-12-31T18:50:56Z

Even changing request headers works just fine

import asyncio
import base64
import sys
import time
import traceback

from cdp_socket.exceptions import CDPError

from selenium_driverless import webdriver


async def on_request(params, global_conn):
    url = params["request"]["url"]
    _params = {"interceptionId": params['interceptionId']}
    if params.get('responseStatusCode') in [301, 302, 303, 307, 308]:
        # redirected request
        return await global_conn.execute_cdp_cmd("Network.continueInterceptedRequest", _params)
    else:

        fulfill_params = {"headers":params["request"]["headers"]}
        fulfill_params["headers"]["test"] = "Hello World!"
        fulfill_params.update(_params)
        await global_conn.execute_cdp_cmd("Network.continueInterceptedRequest", fulfill_params)
        print(url)


async def main():
    options = webdriver.ChromeOptions()
    async with webdriver.Chrome(options=options, max_ws_size=2 ** 30) as driver:
        driver.base_target.socket.on_closed.append(lambda code, reason: print(f"chrome exited"))
        global_conn = driver.current_target
        await driver.get("about:blank")
        await global_conn.execute_cdp_cmd("Network.enable", {"maxTotalBufferSize": 1_000_000,  # bytes: about 1 MB, not 1 GB
                                                             "maxResourceBufferSize": 1_000_000,
                                                             "maxPostDataSize": 1_000_000
                                                             })
        await global_conn.execute_cdp_cmd("Network.setRequestInterception",
                                          {"patterns": [{"urlPattern": "*",
                                                         # "interceptionStage": "HeadersReceived"
                                                         }]})
        await global_conn.add_cdp_listener("Network.requestIntercepted", lambda data: on_request(data, global_conn))
        await driver.get(
            'https://nowsecure.nl',
            timeout=60, wait_load=False)
        while True:
            await asyncio.sleep(10)


asyncio.run(main())
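
The same header change could probably be done in the non-deprecated Fetch domain. One difference to watch for: `Fetch.continueRequest` expects `headers` as a list of `{"name", "value"}` entries, while `Fetch.requestPaused` delivers the request headers as a plain object. A minimal sketch of the conversion; both function names are made up for illustration:

```python
def headers_to_entries(headers):
    """Convert a {name: value} mapping into the list form Fetch.* commands expect."""
    return [{"name": str(name), "value": str(value)} for name, value in headers.items()]


def continue_request_params(event, extra_headers):
    """Build Fetch.continueRequest params that add or override request headers.

    `event` is the params dict of a Fetch.requestPaused event; it is not modified.
    """
    headers = dict(event["request"]["headers"])  # copy, keep the event intact
    headers.update(extra_headers)
    return {"requestId": event["requestId"], "headers": headers_to_entries(headers)}
```

The result would be passed as `execute_cdp_cmd("Fetch.continueRequest", continue_request_params(params, {"test": "Hello World!"}))`.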

milahu · 2023-12-31T19:20:11Z

print(url)

and where is the response body?

milahu · 2024-01-13T09:27:55Z

and where is the response body?

Network.getResponseBody

#!/usr/bin/env python3

import asyncio
from selenium_driverless import webdriver
from selenium_driverless.types.by import By
import base64

async def main():

    driver = await webdriver.Chrome()
    #await asyncio.sleep(1)

    target = None

    async def requestWillBeSent(args):
        #print("requestWillBeSent", args)
        print("requestWillBeSent", args["request"]["url"])

    async def requestWillBeSentExtraInfo(args):
        print("requestWillBeSentExtraInfo", args)

    async def responseReceived(args):
        # TODO better. get target of this response
        nonlocal target
        #print("responseReceived", args)
        status = args["response"]["status"]
        url = args["response"]["url"]
        # mimeType avoids a KeyError when the header name is lowercase ("content-type")
        _type = args["response"]["mimeType"]

        # TODO better. detect when response data is ready
        # fix: No data found for resource with given identifier
        await asyncio.sleep(1)

        args = {
            "requestId": args["requestId"],
        }
        body = await target.execute_cdp_cmd("Network.getResponseBody", args)
        body = base64.b64decode(body["body"]) if body["base64Encoded"] else body["body"]

        print("responseReceived", status, url, _type, repr(body[:20]) + "...")

    async def responseReceivedExtraInfo(args):
        print("responseReceivedExtraInfo", args)

    async def targetCreated(args):
        print("targetCreated", args)

    async def targetInfoChanged(args):
        #print("targetInfoChanged", args)
        print("targetInfoChanged")

    target = await driver.current_target
    #print("target.id", target.id)

    # enable Target events
    args = {
        "discover": True,
        #"filter": ...
    }
    await target.execute_cdp_cmd("Target.setDiscoverTargets", args)

    await target.add_cdp_listener("Target.targetCreated", targetCreated)
    await target.add_cdp_listener("Target.targetInfoChanged", targetInfoChanged)

    #print("driver.targets", await driver.targets)

    # enable Network events
    args = {
        "maxTotalBufferSize": 1_000_000,  # bytes: about 1 MB, not 1 GB
        "maxResourceBufferSize": 1_000_000,
        "maxPostDataSize": 1_000_000
    }
    await target.execute_cdp_cmd("Network.enable", args)

    await target.add_cdp_listener("Network.requestWillBeSent", requestWillBeSent)
    #await target.add_cdp_listener("Network.requestWillBeSentExtraInfo", requestWillBeSentExtraInfo)
    await target.add_cdp_listener("Network.responseReceived", responseReceived)
    #await target.add_cdp_listener("Network.responseReceivedExtraInfo", responseReceivedExtraInfo)



    #await asyncio.sleep(1)

    url = "http://httpbin.org/get"
    print("driver.get", url)
    await driver.get(url)
    await asyncio.sleep(3)

    #print("driver.targets", await driver.targets)

    """
    print("hit enter to close")
    input()
    """

    await driver.close()

asyncio.run(main())

example output

driver.get http://httpbin.org/get
requestWillBeSent http://httpbin.org/get
targetInfoChanged
requestWillBeSent http://httpbin.org/favicon.ico
responseReceived 200 http://httpbin.org/get application/json '{\n  "args": {}, \n  "'...
responseReceived 404 http://httpbin.org/favicon.ico text/html '<!DOCTYPE HTML PUBLI'...
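
Regarding the `asyncio.sleep(1)` TODO above: the body should be available once `Network.loadingFinished` has fired for the same `requestId`, so waiting for that event is one way to avoid the fixed delay. A minimal sketch, assuming listeners run as independent tasks so that one handler may wait on another; `BodyReadiness` and `decode_body` are hypothetical helpers:

```python
import asyncio
import base64


class BodyReadiness:
    """Track which requestIds have finished loading."""

    def __init__(self):
        self._events = {}

    def _event(self, request_id):
        # Created lazily so that wait() and the listener can arrive in any order.
        return self._events.setdefault(request_id, asyncio.Event())

    async def on_loading_finished(self, params):
        """Listener for Network.loadingFinished."""
        self._event(params["requestId"]).set()

    async def wait(self, request_id, timeout=30):
        """Block until the request has finished loading, or raise on timeout."""
        await asyncio.wait_for(self._event(request_id).wait(), timeout)
        self._events.pop(request_id, None)


def decode_body(result):
    """Return the bytes of a Network.getResponseBody result."""
    body = result["body"]
    return base64.b64decode(body) if result.get("base64Encoded") else body.encode("utf-8")
```

In the script above, `ready.on_loading_finished` would be registered for `Network.loadingFinished`, and `responseReceived` would call `await ready.wait(args["requestId"])` in place of the sleep. Requests that end in `Network.loadingFailed` never become ready, so the timeout matters.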

milahu · 2024-01-24T12:23:42Z

Please unblock challenges.cloudflare.com to proceed.

this error appears when Fetch.fulfillRequest has no response headers

fix:

    async def requestPaused(args):
        # ...
        body = base64.b64encode(body).decode("ascii")
        _args = {
            "requestId": args["requestId"],
            "responseCode": args["responseStatusCode"],
            # fix: Please unblock challenges.cloudflare.com to proceed.
            "responseHeaders": args["responseHeaders"],
            "body": body,
        }
        if args["responseStatusText"] != "":
            # empty string throws "Invalid http status code or phrase"
            _args["responsePhrase"] = args["responseStatusText"]
        await target.execute_cdp_cmd("Fetch.fulfillRequest", _args)
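
The parameter building in this fix can be pulled into a small pure function, which makes it easy to check without a browser. A minimal sketch under the same assumptions as the snippet above (`event` is the `Fetch.requestPaused` params at the response stage); `build_fulfill_params` is a made-up name:

```python
import base64


def build_fulfill_params(event, body):
    """Build Fetch.fulfillRequest params that keep the original status and headers.

    `body` is the (possibly modified) response body as bytes.
    """
    params = {
        "requestId": event["requestId"],
        "responseCode": event["responseStatusCode"],
        # Passing the original headers through is what avoids the
        # "Please unblock challenges.cloudflare.com" error described above.
        "responseHeaders": list(event.get("responseHeaders", [])),
        "body": base64.b64encode(body).decode("ascii"),
    }
    phrase = event.get("responseStatusText")
    if phrase:  # an empty phrase is rejected, so only send it when present
        params["responsePhrase"] = phrase
    return params
```

If the body is changed, headers such as Content-Length may no longer match; this sketch does not adjust them.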

passive capturing works as well with Fetch.enable or Network.setRequestInterception as long as you don't modify the body

im looking for a generic solution, based on streams
so i can handle infinite-size responses without storing the whole response in RAM
and so i can handle streams of events with low latency

working: Network.enable and Fetch.requestPaused and Fetch.takeResponseBodyAsStream and Fetch.fulfillRequest. this allows active interception of responses, but it is not perfect: i cannot send a stream back to chromium, because there is no Fetch.giveResponseBodyAsStream and no IO.write. i can abort the response with Fetch.failRequest, which works for file downloads, but not for content that is also needed in the html page
broken: Network.enable and Network.streamResourceContent and Network.dataReceived. this is broken in chromium 117, because data is always empty. it would be nice, because it allows passive capturing of infinite streams while the original response stream is still sent to the browser. todo: maybe this works in chromium 119
deprecated: Network.setRequestInterception and Network.requestIntercepted
deprecated: Network.takeResponseBodyForInterceptionAsStream is as good as deprecated, because Network.requestIntercepted is deprecated, which is needed to get interceptionId
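
The reading half of the "working" variant can be sketched as an async generator over IO.read. This is a minimal sketch, not the code from the linked repo: `cdp` is a hypothetical stand-in for any coroutine that sends one CDP command and returns its result dict, and the function name and chunk size are made up.

```python
import asyncio
import base64
from typing import AsyncIterator, Awaitable, Callable

# hypothetical type: sends one CDP command, returns its result dict
CdpCall = Callable[[str, dict], Awaitable[dict]]


async def iter_response_body(
    cdp: CdpCall, request_id: str, chunk_size: int = 65536
) -> AsyncIterator[bytes]:
    """Yield the body of a paused response chunk by chunk."""
    res = await cdp("Fetch.takeResponseBodyAsStream", {"requestId": request_id})
    handle = res["stream"]
    try:
        while True:
            chunk = await cdp("IO.read", {"handle": handle, "size": chunk_size})
            data = chunk["data"]
            # IO.read flags binary chunks with base64Encoded
            if chunk.get("base64Encoded"):
                yield base64.b64decode(data)
            else:
                yield data.encode("utf-8")
            if chunk["eof"]:
                break
    finally:
        # release the stream handle on the browser side
        await cdp("IO.close", {"handle": handle})
```

With selenium_driverless, `cdp` could presumably be `lambda method, params: target.execute_cdp_cmd(method, params)`. The paused request still has to be finished afterwards with Fetch.fulfillRequest or Fetch.failRequest, which is exactly where the missing write side hurts.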

see also https://github.com/milahu/aiohttp_chromium/tree/main/test/stream-response

feel free to copy/paste/modify these scripts to Selenium-Driverless/examples/

kaliiiiiiiiii · 2024-01-24T13:13:17Z

see also https://github.com/milahu/aiohttp_chromium/tree/main/test/stream-response

feel free to copy/paste/modify these scripts to Selenium-Driverless/examples/

ah yep, thanks. Might be nice if you can keep it up long-term somewhere in your repo for reference

broken: Network.enable and Network.streamResourceContent and Network.dataReceived - this is broken in chromium 117, because data is always empty.

ah heck, well then Network usage should probably be avoided as it's deprecated and more stuff might break in future chrome versions

Please unblock challenges.cloudflare.com to proceed.

this error appears when Fetch.fulfillRequest has no response headers

    async def requestPaused(args):
        # ...
        body = base64.b64encode(body).decode("ascii")
        _args = {
            "requestId": args["requestId"],
            "responseCode": args["responseStatusCode"],
            # fix: Please unblock challenges.cloudflare.com to proceed.
            "responseHeaders": args["responseHeaders"],
            "body": body,
        }
        if args["responseStatusText"] != "":
            # empty string throws "Invalid http status code or phrase"
            _args["responsePhrase"] = args["responseStatusText"]
        await target.execute_cdp_cmd("Fetch.fulfillRequest", _args)

Uh nice that we've finally got it working! Great job!
Wonder, is there any way to optimize base64.b64encode(body).decode("ascii") even more btw?

And also, are we sure that Fetch.enable intercepts these as well:

WebWorkers & service workers
cross/OOPIF iframes
background scripts in extensions

I remember there being Network.setBypassServiceWorker, however no idea if it affects Fetch.enable as well.

If some still don't get intercepted, maybe target-interception might be worth considering, see https://github.com/kaliiiiiiiiii/Selenium-Driverless/blob/4b71a5ab59a193d41eab80ed8f68a66e8ad5c230/tests/target_interception.py . I'm however not sure how reliable it is and how bad the timing leaks are.

milahu · 2024-01-24T13:40:49Z

then Network usage should probably be avoided as it's deprecated and more stuff might break in future chrome versions

Network.streamResourceContent and Network.dataReceived
are not deprecated, but experimental
so i expect them to work in newer versions

is there any way to optimize base64.b64encode(body).decode("ascii")

im afraid no... i also would prefer a binary protocol, no base64, no json

base64 is needed for Fetch.fulfillRequest

body: string: A response body. If absent, original response body will be used if the request is intercepted at the response stage and empty body will be used if the request is intercepted at the request stage. (Encoded as a base64 string when passed over JSON)

when i pass the body as bytes i get

TypeError: Object of type bytes is not JSON serializable

per CDP docs, the only non-JSON endpoint is

WebSocket /devtools/page/{targetId}
The WebSocket endpoint for the protocol.
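
To make the cost concrete, here is a small self-contained sketch of why the base64 step cannot be skipped and what it costs in size. The helper names are made up for illustration.

```python
import base64
import json


def encode_body(body: bytes) -> str:
    """Encode a response body the way Fetch.fulfillRequest expects it."""
    return base64.b64encode(body).decode("ascii")


def encoded_length(n: int) -> int:
    """Length of the padded base64 text for n input bytes."""
    return 4 * ((n + 2) // 3)


body = bytes(range(256)) * 4  # 1024 arbitrary bytes

try:
    json.dumps({"body": body})
except TypeError as exc:
    # same error as above: bytes are not JSON serializable
    print("raw bytes rejected:", exc)

# the encoded body serializes fine, at roughly 4/3 of the original size
message = json.dumps({"body": encode_body(body)})
print(len(body), "bytes ->", encoded_length(len(body)), "base64 characters")
```

The encoding itself runs in C in CPython, so the size overhead and the JSON serialization are probably the larger part of the cost.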

are we sure that Fetch.enable intercepts as well

no idea, i dont need these targets

in Fetch.requestPaused.py im calling

    target = await driver.current_target
    # ...
    await target.execute_cdp_cmd("Fetch.enable", args)
    await target.add_cdp_listener("Fetch.requestPaused", requestPaused)

but this also works with

    await driver.execute_cdp_cmd("Fetch.enable", args)
    await driver.add_cdp_listener("Fetch.requestPaused", requestPaused)

then requestPaused should be called for all targets
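
The `args` dict passed to Fetch.enable is not shown here. A plausible value for response-stage interception, as an assumption and not the actual script, could be built like this (the function name is hypothetical):

```python
def fetch_enable_args(url_pattern: str = "*", stage: str = "Response") -> dict:
    """Build params for Fetch.enable that pause matching requests at one stage.

    "Response" pauses after headers are received, which is required
    before the response body can be read.
    """
    if stage not in ("Request", "Response"):
        raise ValueError(f"unknown request stage: {stage!r}")
    return {"patterns": [{"urlPattern": url_pattern, "requestStage": stage}]}
```

It would then be passed as `await target.execute_cdp_cmd("Fetch.enable", fetch_enable_args())`.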

kaliiiiiiiiii · 2024-01-24T13:42:06Z

Also, I'm just thinking about it - if we can't stream the responses when intercepting the requests - there's technically a way to detect the timing (if the server responds in chunks), right?

And even if it would be possible, I suppose there could be a way to set up a server with specific chunk timing & size + detect that in JavaScript.

See http://scatter.cowchimp.com/ for a poc on scattering the chunk timing
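
The idea can be modeled with plain arithmetic. This is only an illustration with made-up numbers and a hypothetical threshold, not a real detector: when a response is fulfilled in one piece, the gaps between chunk arrivals collapse.

```python
def inter_chunk_gaps(arrivals: list[float]) -> list[float]:
    """Seconds between consecutive chunk arrival times."""
    return [later - earlier for earlier, later in zip(arrivals, arrivals[1:])]


def looks_buffered(arrivals: list[float], min_gap: float = 0.05) -> bool:
    """True if every gap is shorter than min_gap (hypothetical threshold)."""
    gaps = inter_chunk_gaps(arrivals)
    return bool(gaps) and max(gaps) < min_gap


# server sends one chunk every 0.5 s
streamed = [0.0, 0.5, 1.0]
# the same response after it was buffered and delivered in one go
buffered = [1.0, 1.001, 1.002]

print(looks_buffered(streamed), looks_buffered(buffered))
```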

milahu · 2024-01-24T14:03:41Z

aah, now i understand your question

are we sure that Fetch.enable intercepts as well

so ideally, all targets should be intercepted
to add the same latency to all requests

practically, i would avoid this premature optimization
because different latencies can have legitimate reasons
like different cpu loads on different cpu cores

maybe put this on a todo list / future work list / debug ideas list
in case cloudflare blocking becomes more aggressive

kaliiiiiiiiii · 2024-01-24T16:28:22Z

target = await driver.current_target
# ...
await target.execute_cdp_cmd("Fetch.

yeah ofc - as this executes CDP on the same target.

I'm not sure if/how driver.base_target behaves tbh. I could imagine that service-worker requests are only covered by base_target. At least for target interception, this is the case.

milahu · 2024-01-24T20:31:47Z

Network.streamResourceContent and Network.dataReceived
are not deprecated, but experimental
so i expect them to work in newer versions

bad news: this also fails with chromium 120

maybe this is a bug in selenium_driverless?
tomorrow i will port Network.dataReceived.py to selenium
i would be surprised if this is a chromium bug

kaliiiiiiiiii · 2024-01-24T22:05:46Z

Network.streamResourceContent and Network.dataReceived
are not deprecated, but experimental
so i expect them to work in newer versions

bad news: this also fails with chromium 120

maybe this is a bug in selenium_driverless? tomorrow i will port Network.dataReceived.py to selenium i would be surprised if this is a chromium bug

mhh maybe try with bare CDP-socket. Wouldn't know why driverless could break this. Unless it's some chrome flag which gets applied by default

milahu · 2024-01-26T18:53:58Z

tomorrow i will port Network.dataReceived.py to selenium

not possible, because chromedriver does not support the Network.streamResourceContent command

so there is no

await session.execute(devtools.network.stream_resource_content(request_id))
# or
driver.execute("Network.streamResourceContent", {"requestId": request_id})

there is only network.take_response_body_for_interception_as_stream

await session.execute(devtools.network.take_response_body_for_interception_as_stream(interception_id))

... but that requires an interception_id
and there is still no IO.write so i cannot send the stream to chromium

see also Selenium 4: how add event listeners in CDP

CDP is broken by design?

i have the impression that this feature (reading and writing of streams)
is deliberately not implemented by CDP

see also Fetch.fulfillRequest and (very) long body

Unfortunately, there's no streaming support for Fetch network interception at the moment

yeah, totally "unfortunately" and totally "at the moment"

no, i guess this is very deliberate sabotage, to prevent "abusing" chromium as a generic http client
which is pretty much what we are trying to do here...

dynamic analysis

so... i really tried to avoid this part (because i have zero experience here)
but i will have to use frida to insert hooks into the chromium binary

for now i gave up on intercepting requests... chrome seems to make it really hard, also to provide security against MITM attacks

probably i would try the frida route as i described in wkeeling/selenium-wire#656 (comment)

lets see what tomorrow will bring ; )

kaliiiiiiiiii · 2024-01-26T19:45:41Z

tomorrow i will port Network.dataReceived.py to selenium

not possible, because chromedriver does not support the Network.streamResourceContent command

so there is no
await session.execute(devtools.network.stream_resource_content(request_id))
# or
driver.execute("Network.streamResourceContent", {"requestId": request_id})
there is only network.take_response_body_for_interception_as_stream
await session.execute(devtools.network.take_response_body_for_interception_as_stream(interception_id))
... but that requires an interception_id and there is still no IO.write so i cannot send the stream to chromium

see also Selenium 4: how add event listeners in CDP

CDP is broken by design?

i have the impression that this feature (reading and writing of streams) is deliberately not implemented by CDP

Yeah that might indeed be the case. Also for security reasons, such as streaming all stuff encrypted through a proxy.

see also Fetch.fulfillRequest and (very) long body

Unfortunately, there's no streaming support for Fetch network interception at the moment

yeah, totally "unfortunately" and totally "at the moment"

no, i guess this is very deliberate sabotage, to prevent "abusing" chromium as a generic http client which is pretty much what we are trying to do here...

yeah, I guess so

dynamic analysis

so... i really tried to avoid this part (because i have zero experience here) but i will have to use frida to insert hooks into the chromium binary

for now i gave up on intercepting requests... chrome seems to make it really hard, also to provide security against MITM attacks
probably i would try the frida route as i described in wkeeling/selenium-wire#656 (comment)

lets see what tomorrow will bring ; )

well have fun haha 👀 gonna be a pain. Pretty sure Chrome has stuff against that implemented.

kaliiiiiiiiii · 2024-01-29T19:43:31Z

not resolved yet lol

milahu · 2024-01-29T20:30:05Z

well... the original issue is fixed by sending responseHeaders

currently i dont have time to implement reading and writing of streams
also i guess this is out-of-scope for selenium_driverless
because this is not possible with CDP

kaliiiiiiiiii · 2024-02-01T12:49:09Z

well... the original issue is fixed by sending responseHeaders

currently i dont have time to implement reading and writing of streams also i guess this is out-of-scope for selenium_driverless because this is not possible with CDP

Hmm does https://bugs.chromium.org/p/chromium/issues/detail?id=1138839 still apply tho?
Also, I'm not that sure if all headers have the correct order tbh

Maybe using binaryResponseHeaders for continuing the request would be more safe?
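
For reference, the CDP docs describe `binaryResponseHeaders` as a \0-separated series of `name: value` pairs, base64-encoded when passed over JSON. A minimal sketch of that encoding, with a made-up function name and assuming UTF-8 is acceptable for the header text:

```python
import base64


def to_binary_response_headers(headers: list[dict]) -> str:
    """Encode [{"name": ..., "value": ...}] as \\0-separated "name: value" pairs."""
    raw = b"\0".join(
        f'{header["name"]}: {header["value"]}'.encode("utf-8")  # assumption: UTF-8
        for header in headers
    )
    return base64.b64encode(raw).decode("ascii")
```

The result would replace `responseHeaders` in the Fetch.fulfillRequest or Fetch.continueResponse params. Whether it actually avoids the header-order concern is untested.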

kaliiiiiiiiii · 2024-02-01T17:00:32Z

Probably happens somewhere at https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/core/inspector/inspector_emulation_agent.cc;l=514;drc=d3d4ff28768842dd1ce94f408f89d1e2d31dd4fd

kaliiiiiiiiii · 2024-02-29T18:23:48Z

@milahu

probably i would try the frida route

Maybe https://github.com/tomer8007/chromium-ipc-sniffer could be worth considering 👀
screenshot below is 4 years old, some stuff might have changed ofc.

milahu · 2024-02-29T20:37:59Z

i would be surprised if that works
the raw HTTP traffic is hidden for better security

However, this project won't see anything that doesn't go over pipes, which is mostly shared memory IPC:

Mojo data pipe contents (raw networking buffers, audio, etc.)

... so the raw HTTP traffic is in shared memory

the most promising method is running chromium in a debugger, either gdb or lldb
but i have to disable sandboxing to set breakpoints on BIO_read and BIO_write
radare is too slow, frida fails to hook the functions
gdb works, but parsing its output is slow, and gdb in python is kinda broken
lldb would be better for interfacing with python (or native code), but its kinda broken...
see also chromium-capture-http

but all these are workarounds
and a proper fix would be to implement full http stream support
to fix either Fetch.requestPaused.py or Network.dataReceived.py

effectively, this would allow inserting an http proxy
with full control over request and response streams

its surprising that such a basic feature is missing

there is Fetch.takeResponseBodyAsStream and IO.read
but not Fetch.giveResponseBodyAsStream and IO.write

there is Network.takeResponseBodyForInterceptionAsStream and IO.read
but not Network.giveResponseBodyForInterceptionAsStream and IO.write

currently this has zero priority for me, i just dont need it

kaliiiiiiiiii · 2024-04-16T10:04:15Z

will be fixed with https://github.com/kaliiiiiiiiii/Selenium-Driverless/blob/dev/src/selenium_driverless/scripts/network_interceptor.py

I'll close this issue when it's released & the documentation is complete

kaliiiiiiiiii · 2024-04-17T11:33:23Z

resolved with https://kaliiiiiiiiii.github.io/Selenium-Driverless/api/RequestInterception/

milahu · 2024-05-27T20:57:36Z

a proper fix would be to implement full http stream support

nothing new from google
https://issues.chromium.org/issues/332570739

just another feature request
which would be easy to implement, but is ignored as "low priority"

kaliiiiiiiiii added the bug and enhancement labels Dec 2, 2023

kaliiiiiiiiii changed the title ~~network interception with Fetch.enable breaks cloudflare~~ network interception with Fetch.enable breaks cloudflare Dec 2, 2023

milahu mentioned this issue Dec 18, 2023

python site-packages should be treated as read-only kaliiiiiiiiii/CDP-Socket#17

Closed

milahu mentioned this issue Jan 12, 2024

driver.page_source fails on non-html pages: CDPError: Could not find node with given id #148

Closed

kaliiiiiiiiii closed this as completed in 98e0c12 Jan 29, 2024

kaliiiiiiiiii reopened this Jan 29, 2024

milahu mentioned this issue Mar 5, 2024

Should this handle packaged Chromium too? fkie-cad/friTap#17

Open

This was referenced Apr 16, 2024

Network module is missing a mechanism to alter incoming response body w3c/webdriver-bidi#541

Open

lets collaborate milahu/aiohttp_chromium#1

Open

kaliiiiiiiiii closed this as completed Apr 17, 2024

milahu mentioned this issue May 8, 2024

Rapidgator: CAPTCHA can't be shown due to (new?) X-Frame-Options header pyload/pyload#4456

Open
