Support playwright_stealth #109

tanghq33 · 2022-07-26T14:43:34Z

Integrated playwright_stealth and added PLAYWRIGHT_STEALTH_ENABLED as an optional setting.

Attached bot test results.

PLAYWRIGHT_STEALTH_ENABLED = True

PLAYWRIGHT_STEALTH_ENABLED = False

elacuesta · 2022-07-27T19:57:19Z

Thank you very much for the contribution, but I don't want to include any third-party dependency unless it's really necessary.
I've been thinking that one way to allow this functionality (and address #25 at the same time) would be to add a way to handle pages right after they are created (an idea I've already explored at #26 (comment)). I'm imagining something like the following:

from scrapy import Spider, Request
from playwright.async_api import Page

async def new_page_handler(page: Page) -> None:
    await page.add_init_script(path="/path/to/script")
    # more stuff

class AwesomeSpider(Spider):
    def start_requests(self):
        yield Request(
            url="https://httpbin.org/get",
            meta={"playwright": True, "playwright_configure_page": new_page_handler},
        )

elacuesta · 2022-10-09T21:53:35Z

For the record, this should be possible after #128.

nimish · 2022-11-01T15:04:31Z

It should be possible to include this as an optional pip dependency, e.g. scrapy-playwright[with_playwright_stealth], so that the dependency is not required but the integration still ships in the distribution.
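To illustrate the optional-dependency idea, here is a minimal sketch of how a package could detect whether the extra was installed. The helper name `has_optional_dependency` and the `extras_require` fragment in the comment are hypothetical and not part of scrapy-playwright; only the extra name comes from the suggestion above.

```python
import importlib.util

# Hypothetical packaging fragment (setup.py), shown only for context:
#   extras_require={"with_playwright_stealth": ["playwright-stealth"]}
# Users would then opt in with:
#   pip install "scrapy-playwright[with_playwright_stealth]"


def has_optional_dependency(module_name: str) -> bool:
    """Return True if a top-level module can be imported, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Assumption: the playwright-stealth distribution exposes the
# "playwright_stealth" module, as in the import used later in this thread.
STEALTH_AVAILABLE = has_optional_dependency("playwright_stealth")
```

Code that depends on the extra could check `STEALTH_AVAILABLE` before importing `stealth_async`, so installs without the extra would keep working.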

elacuesta · 2022-11-01T20:41:42Z

That's true, but it would still require changes to the main handler in order to support the integration - that's what I want to avoid.
It's possible to integrate with this after v0.0.22, by using the playwright_page_init_callback request meta key:

import scrapy
from playwright_stealth import stealth_async


async def init_page(page, request):
    # Called for each newly created page; applies the stealth patches to it
    await stealth_async(page)


class StealthSpider(scrapy.Spider):
    name = "stealth"

    def start_requests(self):
        yield scrapy.Request(
            url="https://example.org",
            meta={
                "playwright": True,
                "playwright_page_init_callback": init_page,
            },
        )
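The on/off switch proposed in the PR could also be kept on the spider side. The following is a rough sketch only: `playwright_meta` is a hypothetical helper, and PLAYWRIGHT_STEALTH_ENABLED is treated here as an ordinary project setting read by the spider, not something scrapy-playwright itself recognizes.

```python
from typing import Any, Dict


async def init_page(page: Any, request: Any) -> None:
    # Import lazily so playwright_stealth is only needed when stealth is used
    from playwright_stealth import stealth_async

    await stealth_async(page)


def playwright_meta(stealth_enabled: bool) -> Dict[str, Any]:
    """Build request meta, attaching the init callback only when enabled."""
    meta: Dict[str, Any] = {"playwright": True}
    if stealth_enabled:
        meta["playwright_page_init_callback"] = init_page
    return meta
```

Inside a spider this might be used as `scrapy.Request(url, meta=playwright_meta(self.settings.getbool("PLAYWRIGHT_STEALTH_ENABLED")))`, which leaves the download handler untouched.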

kinoute · 2023-03-01T15:53:37Z

@hqtang33 Were you able to find a solution? I tried the changes proposed here, along with your fork of the stealth plugin, but unfortunately even the "simple" removal of "Headless" from the user agent doesn't work.

tanghq33 added 2 commits July 26, 2022 22:25

Support playwright_stealth

dad8d18

added await for stealth_async

8e215ed

tanghq33 closed this Sep 27, 2022

elacuesta mentioned this pull request Jan 26, 2023

is it possible to use playwright-stealth with the scrapy-playwright integration? #160

Closed
