
[Question] Inside a page.on("response") or page.on("requestfinished") handler I can't get the page body. Is it a bug? #945

@MauroMFerreira

Description

...
def request_finished_handler(request):
    response = request.response()
    ...
    if request.redirected_to is None and request.resource_type in ["document", "script"]:
        body = response.body()   # error => Response body is unavailable for redirect responses
        body = response.text()   # error => Response body is unavailable for redirect responses
        body = page.content()    # error => Execution context was destroyed, most likely because of a navigation.
        body = page.evaluate("document.body.outerHTML")  # error => Execution context was destroyed, most likely because of a navigation.
    ...
page.on("requestfinished", request_finished_handler)
page.goto(url)
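For context on these errors: Playwright does not return a body for 3xx redirect responses (hence "Response body is unavailable for redirect responses"), while page.content() and page.evaluate() run against the live page, whose execution context is torn down on every navigation along the way. A minimal sketch that reads only network-level bodies and skips redirect hops by status code, rather than relying on redirected_to, might look like the block below. The names `WANTED_TYPES`, `is_redirect` and `make_finished_handler` are hypothetical, and the code only duck-types Playwright's Request/Response objects so it can be exercised without a browser.

```python
from typing import Any, Callable, Dict

# Resource types the issue is interested in (assumption: documents and scripts only).
WANTED_TYPES = {"document", "script"}


def is_redirect(status: int) -> bool:
    """3xx responses are redirect hops; Playwright refuses to return their body."""
    return 300 <= status <= 399


def make_finished_handler(bodies: Dict[str, bytes]) -> Callable[[Any], None]:
    """Build a page.on("requestfinished", ...) handler that stores bodies by URL."""

    def on_request_finished(request: Any) -> None:
        if request.resource_type not in WANTED_TYPES:
            return
        response = request.response()
        # Defensive: response() may return None; redirect hops have no body.
        if response is None or is_redirect(response.status):
            return
        try:
            # Read the body inside the handler; reading it later, after the
            # page has navigated away, may fail.
            bodies[request.url] = response.body()
        except Exception as exc:  # playwright.sync_api.Error in practice
            print(f"could not read body of {request.url}: {exc}")

    return on_request_finished


# Wiring, assuming `page` and `url` exist as in the snippet above:
#   bodies = {}
#   page.on("requestfinished", make_finished_handler(bodies))
#   page.goto(url)
```

Catching a broad Exception keeps the sketch free of a Playwright import; real code would more likely catch playwright.sync_api.Error.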

I also tried with:

def response_handler(response):
    request = response.request
    ...
page.on("response", response_handler)
page.goto(url)
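With the response event, one option is to record every navigation response in order, keeping redirect hops with no body next to the documents that do have one, so the sequence from the starting URL to the final one is preserved. This sketch assumes `Request.is_navigation_request()` is a sufficient filter; it also matches iframe navigations, so comparing `response.frame` with `page.main_frame` may be needed in practice. `Hop` and `make_navigation_logger` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class Hop:
    url: str
    status: int
    body: Optional[bytes]  # None for redirect hops or unreadable bodies


def make_navigation_logger(log: List[Hop]) -> Callable[[Any], None]:
    """Build a page.on("response", ...) handler that records navigations in order."""

    def on_response(response: Any) -> None:
        request = response.request  # a property in the Python API, not a method
        if not request.is_navigation_request():
            return  # ignore images, scripts, XHR, ...
        body: Optional[bytes] = None
        if not (300 <= response.status <= 399):
            try:
                body = response.body()
            except Exception:  # playwright.sync_api.Error in practice
                body = None
        log.append(Hop(response.url, response.status, body))

    return on_response


# Wiring, assuming `page` and `url` as in the issue:
#   log = []
#   page.on("response", make_navigation_logger(log))
#   page.goto(url)
```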

The problem is that I don't need the body of the final loaded page. I need the full bodies of the documents and scripts loaded along the way, from the starting URL up to the last link before the final URL, so I can study them and later avoid or spoof fingerprinting.
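For the HTTP-redirect part of that chain, Playwright links requests through `Request.redirected_from`, so the hops can be recovered from the final response alone. A small sketch follows; `redirect_chain` is a hypothetical helper. It only sees server-side redirects: JavaScript or meta-refresh navigations start new requests rather than redirects, so they do not appear in this chain.

```python
from typing import Any, List


def redirect_chain(request: Any) -> List[str]:
    """Return the URLs of an HTTP redirect chain, starting URL first.

    Follows Request.redirected_from links backwards from the request that
    produced the final response.
    """
    urls: List[str] = []
    current = request
    while current is not None:
        urls.append(current.url)
        current = current.redirected_from
    urls.reverse()
    return urls


# Usage, assuming `page` and `url` as in the issue:
#   response = page.goto(url)
#   if response is not None:
#       print(redirect_chain(response.request))
```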

I can (and currently do) use requests.get() to fetch those bodies, but this has a major problem: because it runs outside Playwright, it can be detected and blocked as a scraper (no session, no referrer, etc.), so I want to avoid this hack.
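One possible in-Playwright replacement for requests.get(), assuming a Playwright release that provides `BrowserContext.request` (an `APIRequestContext`): requests sent through it use the context's cookies, and a Referer header can be set explicitly. This is a hedged sketch and `refetch_in_context` is a hypothetical helper. These requests are issued by the Playwright driver rather than by the browser itself and follow redirects by default, so they are still not identical to the browser's own traffic or guaranteed to match the original responses.

```python
from typing import Any, Dict, Iterable, Optional


def refetch_in_context(
    api_request: Any, urls: Iterable[str], referer: Optional[str] = None
) -> Dict[str, bytes]:
    """Re-download URLs through an APIRequestContext such as context.request.

    Only successful (2xx) responses are kept, keyed by the requested URL.
    """
    headers = {"referer": referer} if referer else {}
    bodies: Dict[str, bytes] = {}
    for url in urls:
        response = api_request.get(url, headers=headers)
        if response.ok:
            bodies[url] = response.body()
    return bodies


# Usage, assuming `context` is the BrowserContext that owns `page`:
#   bodies = refetch_in_context(context.request, urls, referer=start_url)
```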

Is this a bug, or is there a way to do this that I'm not aware of?

Thanks!
