
[MRG+1] [settings/default_settings.py] dont retry 400 #1289

Merged
merged 1 commit into from Sep 2, 2015

Conversation

@pawelmhm (Contributor) commented Jun 8, 2015

As the HTTP spec says:

10.4.1 400 Bad Request

The request could not be understood by the server due to malformed
syntax. The client SHOULD NOT repeat the request without
modifications.

So IMO Scrapy should not retry 400 by default.
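Users who depend on the old behaviour can opt back in by overriding RETRY_HTTP_CODES in their project settings. A minimal sketch, assuming the post-change default list no longer contains 400 (the exact list of codes may vary between Scrapy versions):

```python
# settings.py (sketch): opt back in to retrying HTTP 400 responses.
# The other codes shown are illustrative defaults, not an exact copy of
# any particular Scrapy release.
RETRY_HTTP_CODES = [500, 502, 503, 504, 408, 400]
```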

@kmike (Member) commented Jun 8, 2015

Some servers don't follow this suggestion; for example, according to the Amazon DynamoDB docs, the client should retry some requests even when the server returns an HTTP 400 error.

For me changing it to match HTTP specs makes sense, +1 to change it.

But I wonder why 400 was in the list in the first place. It has been in Scrapy since forever; HTTP 400 is in the list in the initially imported code.

@pawelmhm (Contributor, Author) commented Jun 8, 2015

Some servers don't follow this suggestion; for example, according to the Amazon DynamoDB docs, the client should retry some requests even when the server returns an HTTP 400 error.

Interesting, but even for DynamoDB only some errors behind a 400 should be retried; according to their docs, only 4 of the 12 types of 400 responses are retryable. So retrying all 400s by default is not recommended even there.
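A hypothetical sketch of that selective-retry idea: decide per response whether a 400 signals a transient, retryable condition. The marker strings below are illustrative of DynamoDB-style throttling errors; in a real Scrapy project this check would live in a RetryMiddleware subclass's process_response() rather than a bare function.

```python
# Illustrative markers for transient error types that DynamoDB-style APIs
# report with HTTP 400; not an exhaustive or authoritative list.
RETRYABLE_400_MARKERS = (
    b"ProvisionedThroughputExceededException",
    b"ThrottlingException",
)

def should_retry_400(status, body):
    """Return True only for 400 responses whose body names a transient error."""
    return status == 400 and any(marker in body for marker in RETRYABLE_400_MARKERS)
```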

I raised this because it seems dangerous. If you have a broken spider generating 100k requests that all return 400, then with the current default Scrapy settings your spider will most likely send around 300k bad requests...
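The arithmetic behind that estimate, assuming Scrapy's default RETRY_TIMES of 2 (one original attempt plus two retries per failing request):

```python
# Each failing request is sent up to 1 + RETRY_TIMES times in total.
RETRY_TIMES = 2            # Scrapy's default number of retries
failing_requests = 100_000
total_attempts = failing_requests * (1 + RETRY_TIMES)
print(total_attempts)      # 300000
```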

@kmike kmike changed the title [settings/default_settings.py] dont retry 400 [MRG+1] [settings/default_settings.py] dont retry 400 Jun 9, 2015
@kmike (Member) commented Jun 9, 2015

You have my +1, but we need @pablohoffman or @dangra or @shaneaevans to confirm that removing HTTP 400 from the retry list is OK.

@dangra (Member) commented Jun 9, 2015

I can't think of any useful case, but it is clearly a backward-incompatible change. We should mention it in the 1.0 release notes.

@nyov (Contributor) commented Jul 21, 2015

+1, because people (scraper devs) should at least make a conscious decision to ignore common standards by overriding the defaults. The current default looks prone to server bashing by uninformed users.

dangra added a commit that referenced this pull request Sep 2, 2015
[MRG+1] [settings/default_settings.py] dont retry 400
@dangra dangra merged commit af08737 into scrapy:master Sep 2, 2015
1 check passed
continuous-integration/travis-ci/pr The Travis CI build passed
@dangra dangra added this to the Scrapy 1.1 milestone Sep 2, 2015
kmike added a commit that referenced this pull request Oct 6, 2015
kmike added a commit that referenced this pull request Oct 8, 2015
pablohoffman added a commit that referenced this pull request Oct 13, 2015
[MRG+1] DOC fix docs after GH-1289.

4 participants