[MRG+1] [settings/default_settings.py] dont retry 400 #1289
As the HTTP spec puts it (RFC 2616, section 10.4.1): "400 Bad Request: The request could not be understood by the server due to malformed syntax. The client SHOULD NOT repeat the request without modifications." So Scrapy should not retry 400 by default.
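To make the proposed default concrete, here is a minimal sketch of the check a status-based retry policy performs. The two lists are meant to mirror Scrapy's `RETRY_HTTP_CODES` default before and after this change (as we understand it). `should_retry` is a hypothetical helper for illustration, not the actual `RetryMiddleware` code.

```python
from typing import Sequence

# Assumed defaults, for illustration only: the old list includes 400,
# the proposed one drops it.
OLD_RETRY_HTTP_CODES = [500, 502, 503, 504, 400, 408]
NEW_RETRY_HTTP_CODES = [500, 502, 503, 504, 408]


def should_retry(status: int, retry_http_codes: Sequence[int]) -> bool:
    """Hypothetical helper: True if a response with this status would be retried."""
    return status in retry_http_codes


print(should_retry(400, OLD_RETRY_HTTP_CODES))  # True: malformed request re-sent
print(should_retry(400, NEW_RETRY_HTTP_CODES))  # False: 400 is treated as final
print(should_retry(503, NEW_RETRY_HTTP_CODES))  # True: server-side errors still retried
```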
Some servers don't follow this recommendation. For example, according to the Amazon DynamoDB docs, clients should retry some requests when the server returns an HTTP 400 error. To me, changing this to match the HTTP spec makes sense, so +1. But I wonder why 400 was in the list in the first place. It has been in Scrapy forever: HTTP 400 is already in the list in the initially imported code.
Interesting, but even there only some 400 errors should be retried, so retrying every 400 by default is not recommended even for DynamoDB: according to their docs, only 4 of the 12 types of 400 responses should be retried. I raised this because the current default seems dangerous. If you have a broken spider generating 100k requests that all return 400, with the current default Scrapy settings it will most likely send 300k bad requests...
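The 300k figure follows from Scrapy's default of two retries per failed request (`RETRY_TIMES = 2`), so every request that keeps returning 400 is sent three times in total. A quick sketch of that arithmetic, assuming every attempt fails with a retryable status; `total_requests` is a hypothetical helper:

```python
def total_requests(initial_requests: int, retry_times: int = 2) -> int:
    """Worst case: each request is sent once, then retried `retry_times` more times.

    The default of 2 assumes Scrapy's RETRY_TIMES default.
    """
    return initial_requests * (1 + retry_times)


print(total_requests(100_000))                 # 300000 with 400 in the retry list
print(total_requests(100_000, retry_times=0))  # 100000 if 400 is never retried
```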
You have my +1, but we need @pablohoffman or @dangra or @shaneaevans to confirm that removing HTTP 400 from the retry list is OK.
I can't think of any useful case for retrying 400, but it is clearly a backward-incompatible change. We should mention it in the 1.0 release notes.
+1, because people (scraper devs) should at least make a conscious decision, by overriding the defaults, before ignoring common standards. The current default looks prone to server bashing by uninformed users.
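For projects that really do need the old behaviour (for example an API that documents some 400 errors as retryable), the conscious opt-in could look like the `settings.py` fragment below. It is a sketch that assumes the post-change default list; check the defaults of your Scrapy version before copying it.

```python
# settings.py of a hypothetical project that knowingly retries HTTP 400.
# Assumed post-change default [500, 502, 503, 504, 408], with 400 added back.
RETRY_HTTP_CODES = [500, 502, 503, 504, 408, 400]

# Optionally limit the extra traffic a broken spider can cause by lowering
# the number of retries per request (Scrapy's default is 2).
RETRY_TIMES = 1
```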
[MRG+1] [settings/default_settings.py] dont retry 400
[MRG+1] DOC fix docs after GH-1289.