How should I use scrapy-playwright without downloading images? #51
Comments
This seems like a duplicate of #26; please reopen with more info if you don't agree.
I have a question on this. I followed the example you provided in the answer to that thread, and I initially expected that no images would be downloaded, i.e. that no requests would be sent and no responses received for .jpeg attachments. Here's the output I get:
Perhaps I have misunderstood what the context actually does. If I wanted no requests or responses sent or received for images, how would I accomplish this? Or is this not the same as saying 'without downloading images'?
Only #63 should be considered "in progress". That said, it's probably going to keep showing requests like that in the stats, because it relies on aborting requests caught by the
I checked the new books_block_request and it ran as expected! Does this mean that I cannot use:
to abort responses? I ask because this still fails to block responses even after I updated with the latest Edit:
When we want to block multiple resource types:
That's correct, the PLAYWRIGHT_ACCEPT_REQUEST_PREDICATE = lambda req: req.resource_type not in ("image", "script")
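A minimal sketch of the predicate approach mentioned in this comment, for blocking several resource types at once. The setting name `PLAYWRIGHT_ACCEPT_REQUEST_PREDICATE` and the `resource_type` attribute are taken from this thread, and the setting name may differ in other scrapy-playwright releases. The names `BLOCKED_RESOURCE_TYPES` and `accept_request`, and the `SimpleNamespace` stand-ins, are hypothetical and exist only so the snippet can run without a browser.

```python
from types import SimpleNamespace

# Hypothetical helper names; only the setting name and `resource_type`
# come from the thread above.
BLOCKED_RESOURCE_TYPES = ("image", "script")


def accept_request(request) -> bool:
    """Return True to let the request through, False to have it aborted."""
    return request.resource_type not in BLOCKED_RESOURCE_TYPES


# In settings.py (setting name as quoted in this thread; it may differ by version):
PLAYWRIGHT_ACCEPT_REQUEST_PREDICATE = accept_request

# Stand-in objects that mimic only the attribute the predicate reads.
image_request = SimpleNamespace(resource_type="image", url="https://example.com/a.jpeg")
document_request = SimpleNamespace(resource_type="document", url="https://example.com/")

print(accept_request(image_request))     # False
print(accept_request(document_request))  # True
```

A named function behaves the same as the lambda above, but it is easier to extend and to test in isolation.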
Thank you for the implementation. I agree, there are still more creative ideas for its implementation. An additional one may be the following:
For the following reason:
However, I'm not sure what will happen when the following is involved:
It may be useful to add a break on the previous occurring Whereas, position would denote the following:
The position of the
This makes things slightly more complex - however, this would represent the following: Then the following will denote:
All the resource types are included when Therefore we just have to include:
and control aborting in the script itself for further functionality.
In addition to the above, there's also a need to include the route handle, as such:
Where, the Alternatively, I was thinking how functional would Something like:
There's quite a lot to unpack here, but I hope it proves useful for the future development of @elacuesta your thoughts?
I'm sorry, that sounds overcomplicated. My aim is to keep things as simple as possible; I don't want to build a whole new API.
The idea is to abort requests before they're sent; if they already have responses, it's just not possible to abort them anymore. However, passing the
Hi,
Could anyone help me to use scrapy-playwright without downloading images?
Thanks for your support,