Unhandled error caused by HTTP non-compliant headers #210
Use Case 1:
scrapy shell http://aaa.17domn.com/bt9/file.php/MERH77V.html
[ScrapyHTTPPageGetter,client] Unhandled Error
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-macosx-10.6-intel.egg/twisted/web/http.py", line 406, in extractHeader
Use Case 2:
scrapy shell http://www1.wkdown.info/fs3/file.php/M994ATR.html
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Scrapy-0.16.2-py2.7.egg/scrapy/core/downloader/webclient.py", line 122, in _build_response
ValueError: invalid literal for int() with base 10: 'html'
Thanks, the problem is clear and should be fixed.
I tried Chrome, and it renders the page fine, ignoring the bad headers and assuming a 200 status.
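To illustrate the lenient behavior described above, here is a minimal sketch of a status-line parser that falls back to 200 when the status field is not a number (the `int()` failure on `'html'` in Use Case 2 suggests that is what happened, though the exact bytes the servers sent are not shown here). The function name `parse_status_line` and its `default_status` parameter are hypothetical; this is not Scrapy's or Twisted's actual code, just one way the fallback could look.

```python
def parse_status_line(line, default_status=200):
    """Split a raw HTTP status line into (version, status, message).

    Hypothetical helper: if the status field is missing or non-numeric,
    fall back to default_status instead of raising ValueError.
    """
    # Split into at most three fields: version, status code, reason phrase.
    parts = line.strip().split(None, 2)
    version = parts[0] if parts else "HTTP/1.0"
    try:
        status = int(parts[1])
    except (IndexError, ValueError):
        # Non-compliant server: assume success, as a browser might.
        status = default_status
    message = parts[2] if len(parts) > 2 else ""
    return version, status, message


if __name__ == "__main__":
    print(parse_status_line("HTTP/1.1 404 Not Found"))
    print(parse_status_line("HTTP/1.1 html"))
```

A strict parser would call `int()` directly and let the `ValueError` propagate, which matches the traceback in Use Case 2. Whether silently assuming 200 is the right policy for a crawler is a design choice; logging the malformed line alongside the fallback may be safer.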
For later debugging, this is the curl output, including headers, for both URLs: