A basic spider like this hangs, and the only way to close it is to press Ctrl-C twice:
import scrapy

class TestSpider1c(scrapy.Spider):
    name = "test1c"
    allowed_domains = ['productlibrary.brandbank.com']
    start_urls = [
        'https://productlibrary.brandbank.com/products/detail/949211',
    ]

    def parse(self, response):
        return []
$ scrapy crawl test1c
2014-12-16 11:18:18-0200 [scrapy] INFO: Scrapy 0.25.1 started (bot: hangtest)
2014-12-16 11:18:18-0200 [scrapy] INFO: Optional features available: ssl, http11
2014-12-16 11:18:18-0200 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'hangtest.spiders', 'SPIDER_MODULES': ['hangtest.spiders'], 'BOT_NAME': 'hangtest'}
2014-12-16 11:18:18-0200 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2014-12-16 11:18:18-0200 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2014-12-16 11:18:18-0200 [scrapy] INFO: Enabled item pipelines:
2014-12-16 11:18:18-0200 [test1c] INFO: Spider opened
2014-12-16 11:18:18-0200 [test1c] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2014-12-16 11:18:18-0200 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2014-12-16 11:18:18-0200 [scrapy] DEBUG: Web service listening on 127.0.0.1:6080
2014-12-16 11:18:19-0200 [test1c] DEBUG: Redirecting (302) to <GET https://secure.brandbank.com/users/issue.aspx?wa=wsignin1.0&wtrealm=https%3a%2f%2fproductlibrary.brandbank.com&wctx=https%3a%2f%2fproductlibrary.brandbank.com%2fproducts%2fdetail%2f949211> from <GET https://productlibrary.brandbank.com/products/detail/949211>
2014-12-16 11:18:21-0200 [test1c] DEBUG: Redirecting (302) to <GET https://secure.brandbank.com/login.aspx?ReturnUrl=%2fusers%2fissue.aspx%3fwa%3dwsignin1.0%26wtrealm%3dhttps%253a%252f%252fproductlibrary.brandbank.com%26wctx%3dhttps%253a%252f%252fproductlibrary.brandbank.com%252fproducts%252fdetail%252f949211&wa=wsignin1.0&wtrealm=https%3a%2f%2fproductlibrary.brandbank.com&wctx=https%3a%2f%2fproductlibrary.brandbank.com%2fproducts%2fdetail%2f949211> from <GET https://secure.brandbank.com/users/issue.aspx?wa=wsignin1.0&wtrealm=https%3a%2f%2fproductlibrary.brandbank.com&wctx=https%3a%2f%2fproductlibrary.brandbank.com%2fproducts%2fdetail%2f949211>
2014-12-16 11:18:21-0200 [test1c] DEBUG: Crawled (200) <GET https://secure.brandbank.com/login.aspx?ReturnUrl=%2fusers%2fissue.aspx%3fwa%3dwsignin1.0%26wtrealm%3dhttps%253a%252f%252fproductlibrary.brandbank.com%26wctx%3dhttps%253a%252f%252fproductlibrary.brandbank.com%252fproducts%252fdetail%252f949211&wa=wsignin1.0&wtrealm=https%3a%2f%2fproductlibrary.brandbank.com&wctx=https%3a%2f%2fproductlibrary.brandbank.com%2fproducts%2fdetail%2f949211> (referer: None)
2014-12-16 11:18:21-0200 [test1c] INFO: Closing spider (finished)
2014-12-16 11:18:21-0200 [test1c] INFO: Dumping Scrapy stats: {'downloader/request_bytes': 1201, 'downloader/request_count': 3, 'downloader/request_method_count/GET': 3, 'downloader/response_bytes': 6838, 'downloader/response_count': 3, 'downloader/response_status_count/200': 1, 'downloader/response_status_count/302': 2, 'finish_reason': 'finished', 'finish_time': datetime.datetime(2014, 12, 16, 13, 18, 21, 833571), 'log_count/DEBUG': 5, 'log_count/INFO': 9, 'response_received_count': 1, 'scheduler/dequeued': 3, 'scheduler/dequeued/memory': 3, 'scheduler/enqueued': 3, 'scheduler/enqueued/memory': 3, 'start_time': datetime.datetime(2014, 12, 16, 13, 18, 18, 551032)}
2014-12-16 11:18:21-0200 [test1c] INFO: Spider closed (finished)
^C^C
$
A git bisect between 0.25.0 and the master branch gives:
There are only 'skip'ped commits left to test.
The first bad commit could be any of:
39c6a80
d7038b2
3ae9714
980e30a
a995727
d402735
870438e
eb0253e
84fa004
d0edad4
89df18b
We cannot bisect more!
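The actual bisect commands aren't shown in the report. As a rough sketch of the workflow (in a throwaway repo with a made-up history, since rebuilding Scrapy at each step is out of scope here), `git bisect run` can drive the search automatically from a test command's exit status:

```shell
# Demo of the bisect workflow in a disposable repo. In the real run the repo
# was scrapy, good was the 0.25.0 tag, bad was master, and the "test" was
# running the spider and seeing whether it hangs.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q .
git config user.email demo@example.com
git config user.name demo
echo ok > status.txt && git add . && git commit -qm "good base"
git tag good-release                                          # stands in for 0.25.0
git commit -q --allow-empty -m "unrelated commit"
echo BAD > status.txt && git commit -qam "introduce hang"     # the regression
git commit -q --allow-empty -m "later commit"                 # stands in for master
git bisect start HEAD good-release > /dev/null
# bisect run marks each candidate good/bad from the command's exit status
result=$(git bisect run sh -c 'grep -qx ok status.txt')
echo "$result" | grep "first bad commit"
```

When the repro test can't give a clean exit status for some commits (as with the unbuildable commits skipped in the report above), exiting with status 125 from the run script tells bisect to `skip` that commit, which is how a run ends up with "only 'skip'ped commits left".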
@dangra #708
@nramirezuy But in this case the start_requests output should be a standard iterable based on start_urls. How is #708 supposed to fix the issue?
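For context, the default start_requests is effectively just a generator over start_urls. A minimal, dependency-free sketch of that behavior (SpiderSketch is a stand-in for illustration, not Scrapy's actual class):

```python
class SpiderSketch:
    """Stand-in showing what the default start_requests does: yield one
    request per entry in start_urls. In Scrapy these would be Request
    objects (via make_requests_from_url); plain URL strings stand in here."""
    start_urls = [
        'https://productlibrary.brandbank.com/products/detail/949211',
    ]

    def start_requests(self):
        for url in self.start_urls:
            yield url  # Scrapy: yield self.make_requests_from_url(url)

requests = list(SpiderSketch().start_requests())
print(requests)  # one entry per start URL
```

The point of the comment above: since this output is an ordinary generator, #708 (which concerned non-standard start_requests output) shouldn't be what fixes this hang.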
@dangra You are right, this is different.
I found this is not being used anywhere.