Big download is not being cancelled properly #1616

katcipis · 2015-11-25T12:46:06Z

I am running Scrapy 1.0.1 in my Python web crawler and setting the max download size like this:

DOWNLOAD_MAXSIZE = 41943040  # 40 MB

When I run into a bigger file, Scrapy keeps logging the error: ERROR: Received (51316256) bytes larger than download max size (41943040), but the download never stops. I looked into the code and saw a call to cancel the download here, but in my case it is not working and the download keeps going.

Any thoughts on why this would happen? It seems that something is wrong on the Twisted side and the download is not being cancelled.
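To make the expected behaviour concrete, here is a minimal sketch of what "cancel once the limit is exceeded" could look like. It is not Scrapy's or Twisted's actual code: the class `SizeLimitedReceiver`, its `data_received` method, and the `abort` callback are hypothetical names used only for illustration. The idea is that the first chunk that pushes the total over the limit triggers a single abort, the buffered data is dropped, and any chunks that still arrive afterwards are ignored rather than logged again.

```python
class SizeLimitedReceiver:
    """Hypothetical body receiver that enforces a maximum download size.

    Illustration only; this is not the class Scrapy uses internally.
    """

    def __init__(self, maxsize, abort):
        self.maxsize = maxsize  # limit in bytes; 0 disables the check
        self.abort = abort      # assumed callable that closes the connection
        self.received = 0       # bytes counted so far
        self.chunks = []        # buffered body data
        self.aborted = False

    def data_received(self, chunk):
        if self.aborted:
            # Chunks can still arrive after the abort was requested;
            # drop them instead of counting or logging them again.
            return
        self.received += len(chunk)
        if self.maxsize and self.received > self.maxsize:
            self.aborted = True
            self.chunks.clear()  # stop holding on to the oversized body
            self.abort()         # request the connection close exactly once
            return
        self.chunks.append(chunk)


if __name__ == "__main__":
    events = []
    receiver = SizeLimitedReceiver(10, lambda: events.append("abort"))
    for piece in (b"12345", b"1234567", b"more"):
        receiver.data_received(piece)
    print(events, receiver.received)
```

With a receiver like this, the error would be reported once per response instead of repeating for every chunk that arrives after the limit was crossed.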

Detailed version info:

scrapy version -v
2015-11-25 10:44:54 [scrapy] INFO: Scrapy 1.0.1 started (bot:)
2015-11-25 10:44:54 [scrapy] INFO: Optional features available: ssl, http11
2015-11-25 10:44:54 [scrapy] INFO: Overridden settings: {'COOKIES_DEBUG': True, 'DOWNLOAD_TIMEOUT': 600, 'SPIDER_MODULES': ['spiders'], 'CONCURRENT_REQUESTS': 10, 'RANDOMIZE_DOWNLOAD_DELAY': False, 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter', 'RETRY_TIMES': 10, 'BOT_NAME': 'bot', 'DOWNLOAD_MAXSIZE': 41943040, 'USER_AGENT': 'Mozilla/5.0 (X11; Linux i686; rv:23.0) Gecko/20100101 Firefox/23.0', 'NEWSPIDER_MODULE': 'spiders'}
Scrapy  : 1.0.1
lxml    : 3.5.0.0
libxml2 : 2.9.2
Twisted : 15.4.0
Python  : 2.7.10 (default, Aug 13 2015, 12:27:27) - [GCC 4.9.2]
Platform: Linux-3.16.0-38-generic-x86_64-with


aditya-K93 · 2016-02-04T10:40:24Z

How do I reproduce this error?

rmax · 2017-03-05T00:48:00Z

Fix #2622

redapple added the bug label Jan 25, 2016

redapple added this to the v1.4 milestone Mar 10, 2017

rmax mentioned this issue Apr 12, 2017

[MRG+1] Abort connection earlier and avoid to buffer data when max size limit is reached #2622

Merged

dangra closed this as completed in #2622 May 4, 2017
