Scrapy 0.25 hangs closing spider #985

Closed
dangra opened this issue Dec 16, 2014 · 3 comments · Fixed by #999
Comments

dangra commented Dec 16, 2014

A basic spider like this hangs on shutdown; the only way to stop it is to hit Ctrl-C twice:

import scrapy


class TestSpider1c(scrapy.Spider):
    name = "test1c"
    allowed_domains = ['productlibrary.brandbank.com']

    start_urls = [
        'https://productlibrary.brandbank.com/products/detail/949211',
    ]

    def parse(self, response):
        return []

$ scrapy crawl test1c
2014-12-16 11:18:18-0200 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2014-12-16 11:18:18-0200 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2014-12-16 11:18:18-0200 [scrapy] INFO: Enabled item pipelines: 
2014-12-16 11:18:18-0200 [test1c] INFO: Spider opened
2014-12-16 11:18:18-0200 [test1c] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2014-12-16 11:18:18-0200 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2014-12-16 11:18:18-0200 [scrapy] DEBUG: Web service listening on 127.0.0.1:6080
2014-12-16 11:18:18-0200 [scrapy] INFO: Scrapy 0.25.1 started (bot: hangtest)
2014-12-16 11:18:18-0200 [scrapy] INFO: Optional features available: ssl, http11
2014-12-16 11:18:18-0200 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'hangtest.spiders', 'SPIDER_MODULES': ['hangtest.spiders'], 'BOT_NAME': 'hangtest'}
2014-12-16 11:18:19-0200 [test1c] DEBUG: Redirecting (302) to <GET https://secure.brandbank.com/users/issue.aspx?wa=wsignin1.0&wtrealm=https%3a%2f%2fproductlibrary.brandbank.com&wctx=https%3a%2f%2fproductlibrary.brandbank.com%2fproducts%2fdetail%2f949211> from <GET https://productlibrary.brandbank.com/products/detail/949211>
2014-12-16 11:18:21-0200 [test1c] DEBUG: Redirecting (302) to <GET https://secure.brandbank.com/login.aspx?ReturnUrl=%2fusers%2fissue.aspx%3fwa%3dwsignin1.0%26wtrealm%3dhttps%253a%252f%252fproductlibrary.brandbank.com%26wctx%3dhttps%253a%252f%252fproductlibrary.brandbank.com%252fproducts%252fdetail%252f949211&wa=wsignin1.0&wtrealm=https%3a%2f%2fproductlibrary.brandbank.com&wctx=https%3a%2f%2fproductlibrary.brandbank.com%2fproducts%2fdetail%2f949211> from <GET https://secure.brandbank.com/users/issue.aspx?wa=wsignin1.0&wtrealm=https%3a%2f%2fproductlibrary.brandbank.com&wctx=https%3a%2f%2fproductlibrary.brandbank.com%2fproducts%2fdetail%2f949211>
2014-12-16 11:18:21-0200 [test1c] DEBUG: Crawled (200) <GET https://secure.brandbank.com/login.aspx?ReturnUrl=%2fusers%2fissue.aspx%3fwa%3dwsignin1.0%26wtrealm%3dhttps%253a%252f%252fproductlibrary.brandbank.com%26wctx%3dhttps%253a%252f%252fproductlibrary.brandbank.com%252fproducts%252fdetail%252f949211&wa=wsignin1.0&wtrealm=https%3a%2f%2fproductlibrary.brandbank.com&wctx=https%3a%2f%2fproductlibrary.brandbank.com%2fproducts%2fdetail%2f949211> (referer: None)
2014-12-16 11:18:21-0200 [test1c] INFO: Closing spider (finished)
2014-12-16 11:18:21-0200 [test1c] INFO: Dumping Scrapy stats:
    {'downloader/request_bytes': 1201,
     'downloader/request_count': 3,
     'downloader/request_method_count/GET': 3,
     'downloader/response_bytes': 6838,
     'downloader/response_count': 3,
     'downloader/response_status_count/200': 1,
     'downloader/response_status_count/302': 2,
     'finish_reason': 'finished',
     'finish_time': datetime.datetime(2014, 12, 16, 13, 18, 21, 833571),
     'log_count/DEBUG': 5,
     'log_count/INFO': 9,
     'response_received_count': 1,
     'scheduler/dequeued': 3,
     'scheduler/dequeued/memory': 3,
     'scheduler/enqueued': 3,
     'scheduler/enqueued/memory': 3,
     'start_time': datetime.datetime(2014, 12, 16, 13, 18, 18, 551032)}
2014-12-16 11:18:21-0200 [test1c] INFO: Spider closed (finished)
^C^C
$
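
While it hangs, the telnet console from the log above (127.0.0.1:6023) is still reachable, and its est() shortcut prints the engine status report, which helps to see what is still pending. Alternatively, a small stdlib-only helper along these lines (illustrative, not part of Scrapy) dumps the stack of every live thread to show what keeps the process from exiting:

import sys
import threading
import traceback

def dump_threads():
    # sys._current_frames() maps thread ids to the frame each thread is
    # currently executing; pair it with threading.enumerate() to print a
    # stack trace for every live thread.
    frames = sys._current_frames()
    for thread in threading.enumerate():
        print('--- %s (daemon=%s) ---' % (thread.name, thread.daemon))
        frame = frames.get(thread.ident)
        if frame is not None:
            traceback.print_stack(frame)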

A git bisect between 0.25.0 and the master branch ends with:

There are only 'skip'ped commits left to test.
The first bad commit could be any of:
39c6a80
d7038b2
3ae9714
980e30a
a995727
d402735
870438e
eb0253e
84fa004
d0edad4
89df18b
We cannot bisect more!
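
Since plain bisecting dead-ends on the skipped commits, one option is to automate it with git bisect run and a script that classifies a hung crawl as bad. A minimal sketch (the file name bisect_hang.py and the 60-second timeout are illustrative; subprocess timeouts require Python 3, so an older interpreter would need a manual alarm instead):

import subprocess

# Exit codes follow the git bisect run convention:
#   0   = good (the crawl finished and the process exited)
#   1   = bad  (the crawl hung past the timeout)
#   125 = skip (this commit cannot run the crawl at all)
try:
    subprocess.check_call(['scrapy', 'crawl', 'test1c'], timeout=60)
except subprocess.TimeoutExpired:
    raise SystemExit(1)
except subprocess.CalledProcessError:
    raise SystemExit(125)
raise SystemExit(0)

With that in place, git bisect start master 0.25.0 followed by git bisect run python bisect_hang.py walks the range automatically.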

@nramirezuy

@dangra #708

dangra commented Dec 16, 2014

@nramirezuy but in this case the start_requests output should be a standard iterable based on start_urls. How is #708 supposed to fix the issue?
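
For context, the default start_requests in Scrapy of this era is essentially a plain generator over start_urls, roughly as follows (paraphrased from the Spider base class, so treat it as a sketch; Request here is scrapy.http.Request):

def start_requests(self):
    # Default behaviour: one Request per start URL.
    for url in self.start_urls:
        yield self.make_requests_from_url(url)

def make_requests_from_url(self, url):
    # dont_filter=True lets start URLs bypass the duplicates filter.
    return Request(url, dont_filter=True)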

@nramirezuy

@dangra You are right, this is different.

I found this is not being used anywhere.
