Blocking certain requests, e.g. images, css #26

Closed
pawelmhm opened this issue Sep 28, 2021 · 1 comment · Fixed by #63
Labels
enhancement New feature or request

Comments

@pawelmhm
Collaborator

pawelmhm commented Sep 28, 2021

It would be nice to be able to configure Playwright routes that block certain types of requests, e.g. CSS, images, ads, etc. This could be done with context.route. At the moment, a context in scrapy-playwright is configured through arguments, so you can only pass values such as strings. What do you think about letting users create their own context objects somewhere? For example, there could be a create_context method defined somewhere, e.g. in the settings or on the spider, that returns the Context object the Playwright download handler uses later. I'm not sure this is the best option, just looking for some solutions.
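For reference, a minimal sketch of the kind of blocking route meant here, written against Playwright's Python async API (`context.route`, `route.abort`, `route.continue_`, `request.resource_type`). The names `BLOCKED_RESOURCE_TYPES`, `block_static_resources` and `register_blocking` are made up for illustration and are not part of scrapy-playwright.

```python
import asyncio

# Resource types to drop; values follow Playwright's request.resource_type.
BLOCKED_RESOURCE_TYPES = {"image", "stylesheet", "font", "media"}


async def block_static_resources(route):
    """Route handler: abort blocked resource types, let everything else through."""
    if route.request.resource_type in BLOCKED_RESOURCE_TYPES:
        await route.abort()
    else:
        await route.continue_()


async def register_blocking(context):
    """Attach the handler to every request made from pages of this context.

    `context` is expected to be a Playwright BrowserContext (hypothetical helper).
    """
    await context.route("**/*", block_static_resources)
```

With a hook such as the proposed create_context, `register_blocking` could be called on the context before it is handed to the download handler.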

@Gallaecio Gallaecio added the enhancement New feature or request label Sep 28, 2021
@elacuesta
Member

Sounds reasonable. Please see 5bc2dd0 for a first, naive attempt. It kind of works for this use case, but the problem is that route("**") needs to be called later in order to intercept all requests (to set the correct HTTP method and headers, among other things). At least with the current design, this has to happen at the page level, not the context level, because multiple pages from the same context could be fetching resources at the same time, and the request handler that is currently attached does some checks against the current request for a given page.
In any case, this approach could be used to solve #25, although with a clear warning about not using route on the received context (I'm hesitant to add a feature with such a restriction, though).
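To make the page-level point concrete, here is a rough illustration of a per-page catch-all handler that rewrites the method and headers of the navigation request. It is not the actual scrapy-playwright handler. `make_page_handler` and `attach` are hypothetical names, and the navigation check is an assumption about the kind of "checks" mentioned above. The Playwright calls used (`page.route`, `route.continue_` with `method` and `headers`, `request.is_navigation_request`, `request.headers`) do exist in the Python API.

```python
import asyncio


def make_page_handler(method, headers):
    """Build a route handler bound to one page's target request (hypothetical helper)."""

    async def handler(route):
        request = route.request
        if request.is_navigation_request():
            # Apply the method and headers wanted for the main request.
            merged = {**request.headers, **headers}
            await route.continue_(method=method, headers=merged)
        else:
            # Sub-resources (images, CSS, XHR, ...) pass through unchanged.
            await route.continue_()

    return handler


async def attach(page, method, headers):
    """Register the catch-all handler on a single page, not on its context."""
    await page.route("**", make_page_handler(method, headers))
```

Because the handler closes over one request's method and headers, it only makes sense per page, which is the constraint described above.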

An alternative would be to add a PLAYWRIGHT_ABORT_ROUTE setting and a playwright_abort_route meta key that handle only the abort case, rather than arbitrary handling of intercepted requests. That would be less flexible, but it would solve this specific case.
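A sketch of how that alternative might look. The names PLAYWRIGHT_ABORT_ROUTE and playwright_abort_route come from the proposal above. Everything else is an assumption: that the value is a predicate taking the Playwright request, that the meta key overrides the setting, and the helper names `resolve_abort_predicate` and `make_abort_handler`.

```python
import asyncio
from typing import Any, Callable, Mapping, Optional

# Assumed shape: a callable that receives the Playwright request and
# returns True when the request should be aborted.
AbortPredicate = Callable[[Any], bool]


def resolve_abort_predicate(
    settings: Mapping[str, Any], meta: Mapping[str, Any]
) -> Optional[AbortPredicate]:
    """Pick the predicate: per-request meta key first, then the global setting."""
    if "playwright_abort_route" in meta:
        return meta["playwright_abort_route"]
    return settings.get("PLAYWRIGHT_ABORT_ROUTE")


def make_abort_handler(should_abort: Optional[AbortPredicate]):
    """Wrap the predicate in a route handler that only aborts or continues."""

    async def handler(route):
        if should_abort is not None and should_abort(route.request):
            await route.abort()
        else:
            await route.continue_()

    return handler
```

A predicate such as `lambda request: request.resource_type == "image"` would then cover the image-blocking case without exposing the route object to user code.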


3 participants