Receiving a 400 response after clicking "I agree" on the consent form on Google, but not when running through regular Playwright. #97
Comments
From a quick look, it seems like it might be due to the header processing done by scrapy-playwright. I'd suggest you look into the |
I have tried setting |
I'm not able to reproduce, the site does not reply to me with a response that matches your code, i.e. no |
I'm not sure the proxy is the issue. If I don't use a proxy and use a "normal" user agent, then I don't get the consent page. However, if I supply the default Scrapy user agent, then I do get hit with the consent page, and I still get the 400 response code after clicking "I agree". Perhaps this would allow you to reproduce the issue? Also, would I be correct in saying that by setting |
Indeed, seems like the site doesn't like Scrapy's user agent. Besides that, I can't reproduce, either with or without Regarding this:
Thanks! You just found a bug: #98 |
Thank you for fixing that bug! But this is very strange. I have even started the Scrapy project completely from scratch with a minimal script, and I still get either a 400 or a 405 response code depending on the type of consent page that I get. I have attached my logs and script from this minimal setup. As I said, clicking on this consent page works absolutely fine in vanilla Playwright on the same machine, so I'm struggling to wrap my head around why this isn't working.
Spider
Settings file
|
I've just tried this again and I still can't reproduce. With the code from this comment I get a captcha with a message about suspicious traffic. If I remove all query params from the querystring except for the actual search string ( |
Closing due to inactivity. |
Hi,
I have a strange issue where I am receiving a 400 response from Google after clicking on the "I agree" button on their consent form.
This issue however does not appear if I click on the "Customise" button, nor does it happen if I perform the request via regular Playwright. I thought at first that it may be the proxy I am using, but that also works via regular Playwright.
Playwright code:
scrapy-playwright code:
What could be a reason for this? There is probably something simple I am missing here.
OS: Ubuntu 22.04
Python: 3.8.10
scrapy-playwright: 0.0.17
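
For anyone landing here with the same symptoms, the two knobs discussed in the comments (the user agent and scrapy-playwright's header processing) could be sketched roughly as below. This is only an illustration, not the reporter's or the maintainer's code: the setting name in the comments is cut off, so using `PLAYWRIGHT_PROCESS_REQUEST_HEADERS` here is an assumption, and `build_settings` and the user agent string are hypothetical names and values made up for the example.

```python
# Hypothetical sketch: none of these values come from the thread itself.
# A browser-like user agent, since the site appeared to dislike Scrapy's default.
BROWSER_LIKE_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36"
)


def build_settings(use_playwright_headers: bool = True) -> dict:
    """Return a settings dict that could be used as a spider's custom_settings."""
    settings = {
        # Standard scrapy-playwright wiring.
        "DOWNLOAD_HANDLERS": {
            "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
            "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
        },
        "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
        # Replaces Scrapy's default user agent for requests that use Scrapy headers.
        "USER_AGENT": BROWSER_LIKE_UA,
    }
    if use_playwright_headers:
        # Assumption: None disables scrapy-playwright's header processing, so the
        # browser sends its own headers (USER_AGENT above then no longer applies
        # to browser requests).
        settings["PLAYWRIGHT_PROCESS_REQUEST_HEADERS"] = None
    return settings
```

In a spider this might be used as `custom_settings = build_settings()`. Passing `use_playwright_headers=False` keeps the default header processing, which makes it easier to compare the two behaviours against vanilla Playwright.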