Duplicate Content-Length header for POST requests with empty body #2677

Closed
redapple opened this issue Mar 23, 2017 · 3 comments

@redapple commented Mar 23, 2017

Originally reported on StackOverflow.

HTTP POST requests with no request body are sent with two Content-Length: 0 headers.

Reproducible with Scrapy 1.3.3 and Twisted 17.1:

$ scrapy version -v
Scrapy    : 1.3.3
lxml      : 3.7.3.0
libxml2   : 2.9.3
cssselect : 1.0.1
parsel    : 1.1.0
w3lib     : 1.17.0
Twisted   : 17.1.0
Python    : 2.7.12 (default, Nov 19 2016, 06:48:10) - [GCC 5.4.0 20160609]
pyOpenSSL : 16.2.0 (OpenSSL 1.0.2g  1 Mar 2016)
Platform  : Linux-4.8.0-41-generic-x86_64-with-Ubuntu-16.10-yakkety
$ scrapy shell
>>> fetch(scrapy.Request('http://httpbin.org/post', method='POST'))
2017-03-23 10:58:34 [scrapy.core.engine] DEBUG: Crawled (200) <POST http://httpbin.org/post> (referer: None)

(Note that the httpbin.org/post response body does not show the duplicate header, but the Wireshark capture does.)

Wireshark capture:

POST /post HTTP/1.1
Content-Length: 0
Content-Length: 0
Accept-Language: en
Accept-Encoding: gzip,deflate
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
User-Agent: Scrapy/1.3.3 (+http://scrapy.org)
Host: httpbin.org

HTTP/1.1 200 OK
Connection: keep-alive
Server: gunicorn/19.7.1
Date: Thu, 23 Mar 2017 09:58:34 GMT
Content-Type: application/json
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true
Content-Length: 458
Via: 1.1 vegur

{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", 
    "Accept-Encoding": "gzip,deflate", 
    "Accept-Language": "en", 
    "Connection": "close", 
    "Content-Length": "0", 
    "Host": "httpbin.org", 
    "User-Agent": "Scrapy/1.3.3 (+http://scrapy.org)"
  }, 
  "json": null, 
  "origin": "89.84.122.217", 
  "url": "http://httpbin.org/post"
}
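
The duplicate in the captured request can also be checked programmatically. Below is a minimal sketch, assuming the raw request bytes are already available (for example, saved from a capture); `duplicate_headers` is a hypothetical helper written for illustration, not part of Scrapy or Twisted, and the sample request is abridged from the capture above.

```python
from collections import Counter


def duplicate_headers(raw_request: bytes) -> dict:
    """Return {header-name: count} for header names seen more than once.

    Hypothetical helper for illustration; names are compared
    case-insensitively, as HTTP header names are case-insensitive.
    """
    # Keep only the header section (everything before the blank line).
    head, _, _ = raw_request.partition(b"\r\n\r\n")
    # Skip the request line ("POST /post HTTP/1.1").
    header_lines = head.split(b"\r\n")[1:]
    counts = Counter()
    for line in header_lines:
        name, sep, _ = line.partition(b":")
        if sep:  # ignore malformed lines without a colon
            counts[name.strip().lower().decode("ascii")] += 1
    return {name: n for name, n in counts.items() if n > 1}


# Abridged version of the request seen in the capture.
captured = (
    b"POST /post HTTP/1.1\r\n"
    b"Content-Length: 0\r\n"
    b"Content-Length: 0\r\n"
    b"Accept-Language: en\r\n"
    b"Host: httpbin.org\r\n"
    b"\r\n"
)

print(duplicate_headers(captured))  # {'content-length': 2}
```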

This is caused by two changes that each set the header: Twisted's twisted/twisted#670 (since Twisted v17.1) and Scrapy's workaround in #1800 (since Scrapy 1.1.0).
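
To illustrate how two independent changes can combine into a duplicate header, here is a toy model in plain Python. It is only a sketch under stated assumptions: `application_headers` and `serialize_request` are hypothetical stand-ins for "a layer that adds the header as a workaround" and "a lower layer that also adds it", and they do not reproduce the actual Scrapy or Twisted code.

```python
def application_headers(body: bytes) -> list:
    """Hypothetical upper layer: adds Content-Length for empty bodies
    as a workaround, unaware of what the lower layer will do."""
    headers = []
    if not body:
        headers.append(("Content-Length", "0"))
    return headers


def serialize_request(method: str, path: str, headers: list, body: bytes) -> str:
    """Hypothetical lower layer: always writes its own Content-Length
    without checking whether the caller already supplied one."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines.append(f"Content-Length: {len(body)}")
    lines.extend(f"{name}: {value}" for name, value in headers)
    return "\r\n".join(lines) + "\r\n\r\n"


body = b""
wire = serialize_request("POST", "/post", application_headers(body), body)
print(wire.count("Content-Length: 0"))  # 2
```

Each layer behaves reasonably on its own; the duplicate only appears when both are active at once.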

@redapple added the bug and http labels on Mar 23, 2017
@redapple self-assigned this on Mar 23, 2017
@redapple added this to the v1.4 milestone on Mar 28, 2017

@redapple commented Apr 12, 2017

A ticket has been opened upstream too: https://twistedmatrix.com/trac/ticket/9097

@hawkowl commented Apr 17, 2017

@redapple This doesn't seem like a Twisted bug? Shouldn't Scrapy just skip the workaround on fixed versions of Twisted?

@redapple commented Apr 18, 2017

Hi @hawkowl, @glyph asked me to open an issue on Twisted, so I did.
In Scrapy I'd rather not test for specific Twisted version numbers in code if we can avoid it, so I've prepared another workaround that works (I think) with all versions of Twisted >= 14: #2678
That said, Twisted could check whether it is about to add a duplicate header. Your call.
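
The idea of checking for an existing header before adding one can be sketched as below. This is only an illustration of that suggestion, assuming headers are held as a list of (name, value) pairs; `add_header_if_absent` is a hypothetical helper and is not taken from #2678 or from Twisted.

```python
def add_header_if_absent(headers: list, name: str, value: str) -> list:
    """Return a new header list with (name, value) appended only when no
    header with the same name (case-insensitive) is already present.

    Hypothetical helper for illustration only.
    """
    wanted = name.lower()
    if any(existing.lower() == wanted for existing, _ in headers):
        return list(headers)  # already set by another layer; leave as-is
    return list(headers) + [(name, value)]


# A header set earlier by another layer is not duplicated.
already_set = [("content-length", "0"), ("Host", "httpbin.org")]
print(add_header_if_absent(already_set, "Content-Length", "0"))

# When the header is missing, it is appended exactly once.
missing = [("Host", "httpbin.org")]
print(add_header_if_absent(missing, "Content-Length", "0"))
```

A guard like this does not depend on which library version is installed, which is the property asked for in the comment above.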

@dangra closed this in #2678 on Apr 26, 2017