It would be nice to be able to configure Playwright routes that block certain types of requests, e.g. CSS, images, ads, etc. This could be done with context.route. At the moment, a context in scrapy-playwright is configured through arguments, so you can only pass values such as strings. What do you think about allowing users to create their own context objects? For example, there could be a create_context method defined somewhere, e.g. in the settings or on the spider, that returns the Context object the Playwright download handler uses later. I'm not sure this is the best option; I'm just looking for possible solutions.
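To make the request concrete, here is a minimal sketch of the kind of handler one might want to attach with context.route. The handler name and the set of blocked types are illustrative assumptions; the only Playwright calls it relies on are route.request.resource_type, route.abort() and route.continue_().

```python
# Resource types to drop. The values are the strings Playwright reports in
# Request.resource_type; the selection here is only an example.
BLOCKED_RESOURCE_TYPES = {"image", "stylesheet", "font", "media"}


async def block_static_resources(route):
    """Abort requests for blocked resource types and let everything else through."""
    if route.request.resource_type in BLOCKED_RESOURCE_TYPES:
        await route.abort()
    else:
        await route.continue_()


# With a real Playwright BrowserContext this would be registered as:
#     await context.route("**/*", block_static_resources)
```

Blocking ads would need a URL-based check instead of (or in addition to) the resource type, since ad requests do not have a resource type of their own.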
Sounds reasonable. Please see 5bc2dd0 for a first, naive attempt. It more or less works for this use case, but the problem is that route("**") still needs to be called later in order to intercept all requests (to set the correct HTTP method and headers, among other things). With the current design, at least, this has to happen at the page level rather than the context level, because multiple pages from the same context could be fetching resources at the same time, and the request handler that is currently attached runs some checks against the current request for a given page.
In any case, this approach could be used to solve #25, although it would need a clear warning not to call route on the received context. I'm hesitant to add a feature with such a restriction, though.
An alternative would be to add a PLAYWRIGHT_ABORT_ROUTE setting and a playwright_abort_route meta key that handle only the abort case, rather than arbitrary handling of intercepted requests. That would be less flexible, but it would solve this specific case.
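As a rough sketch of how an abort-only option could coexist with the page-level handler: the snippet below assumes the setting's value is a predicate that receives the intercepted request and returns True to abort. That shape, and the names make_page_handler, should_abort and inner_handler, are hypothetical and not taken from scrapy-playwright.

```python
def make_page_handler(should_abort, inner_handler):
    """Wrap a page-level route callback with a user-supplied abort check.

    should_abort: callable taking the intercepted request, returning bool
                  (hypothetical value of the proposed setting or meta key).
    inner_handler: the coroutine function that would otherwise process the route.
    """

    async def handle(route):
        # Abort first, so the inner handler never sees unwanted requests.
        if should_abort(route.request):
            await route.abort()
            return
        await inner_handler(route)

    return handle


def abort_images(request):
    """Example predicate: drop image requests only."""
    return request.resource_type == "image"
```

Because the user supplies only a predicate, the download handler keeps sole ownership of the route("**") registration, which sidesteps the restriction on calling route on the context.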