0 pages is being crawled #1

ummezafiirah · 2018-09-10T19:21:30Z

Hello,

I am new to Scrapy and have tried your code.
I tried to scrape the Donald Trump page, and this is what gets displayed:
[scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
I can't figure out where the problem actually is.

Here is the full output:
2018-09-10 23:14:01 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: fbcrawl)
2018-09-10 23:14:01 [scrapy.utils.log] INFO: Overridden settings: {'BOT_NAME': 'fbcrawl', 'FEED_EXPORT_ENCODING': 'utf-8', 'FEED_EXPORT_FIELDS': ['source', 'date', 'text', 'reactions', 'likes', 'ahah', 'love', 'wow', 'sigh', 'grrr', 'comments', 'url'], 'LOG_LEVEL': 'INFO', 'NEWSPIDER_MODULE': 'fbcrawl.spiders', 'SPIDER_MODULES': ['fbcrawl.spiders']}
2018-09-10 23:14:01 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.logstats.LogStats']
2018-09-10 23:14:02 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-09-10 23:14:02 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-09-10 23:14:02 [scrapy.middleware] INFO: Enabled item pipelines:
['fbcrawl.pipelines.FbcrawlPipeline']
2018-09-10 23:14:02 [scrapy.core.engine] INFO: Spider opened
2018-09-10 23:14:02 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-09-10 23:14:05 [fb] INFO: Parse function called on https://mbasic.facebook.com/DonaldTrump/?refid=46
2018-09-10 23:14:06 [scrapy.core.scraper] ERROR: Spider error processing <GET https://mbasic.facebook.com/DonaldTrump/?refid=46> (referer: https://mbasic.facebook.com/login/save-device/?login_source=login&refsrc=https%3A%2F%2Fmbasic.facebook.com%2F&refid=8&_rdr)

rugantio · 2018-10-21T21:13:17Z

Hi, welcome to the world of Scrapy, it's a fun journey!
The first thing I notice is that right before the last ERROR line you should see an INFO line like:

[fb] INFO: Parse function called on https://mbasic.facebook.com/DonaldTrump

So make sure you passed the correct page name. Also, the bot sometimes gets stuck because Facebook fingerprints the browser and tries to block the new Scrapy device. I've written some code in the parse_home function to bypass this behavior, but it doesn't always work. A simple workaround is to log in once through your regular web browser; after that everything should work fine.
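To check the page name, you can compare the URL in the "Parse function called on ..." line with the page name you gave the spider. The sketch below is a hypothetical helper (not part of fbcrawl) that uses only Python's standard library, and it assumes the page name is the first path segment of the mbasic.facebook.com URL.

```python
from urllib.parse import urlparse


def page_name_from_url(url: str) -> str:
    """Hypothetical helper (not part of fbcrawl): return the first path
    segment of a Facebook URL, assumed here to be the page name."""
    path = urlparse(url).path  # e.g. "/DonaldTrump/" for the URL in the log
    segments = [segment for segment in path.split("/") if segment]
    return segments[0] if segments else ""


if __name__ == "__main__":
    logged_url = "https://mbasic.facebook.com/DonaldTrump/?refid=46"
    expected_page = "DonaldTrump"  # the page name you gave the spider
    print(page_name_from_url(logged_url) == expected_page)  # True
```

For the log in the question this prints `True`, so in this case the page name appears to match.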
Also check your mailbox: Facebook sometimes sends you an email saying that an unknown device has tried to access your account without permission.
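The referer in the ERROR line of the question is a `/login/save-device/` URL, which hints that the session passed through Facebook's device prompt right before the failure. A minimal sketch, assuming such login/device-check pages can be recognized by their URL path, is a hypothetical helper you could call on a response URL to tell whether you landed on one of those pages instead of the target page. The prefixes are assumptions: `/login/save-device` comes from the log, while `/checkpoint` is a guess not confirmed by fbcrawl or Facebook documentation.

```python
from urllib.parse import urlparse

# Assumed path prefixes of Facebook login/device-check pages. "/login/save-device"
# is taken from the referer in the log above; "/checkpoint" is a guess.
INTERSTITIAL_PREFIXES = ("/login/save-device", "/checkpoint")


def is_device_interstitial(url: str) -> bool:
    """Hypothetical helper: True if the URL looks like a login/device-check
    page rather than the page you meant to crawl."""
    path = urlparse(url).path
    return any(path.startswith(prefix) for prefix in INTERSTITIAL_PREFIXES)


if __name__ == "__main__":
    referer = ("https://mbasic.facebook.com/login/save-device/?login_source=login"
               "&refsrc=https%3A%2F%2Fmbasic.facebook.com%2F&refid=8&_rdr")
    print(is_device_interstitial(referer))  # True
    print(is_device_interstitial("https://mbasic.facebook.com/DonaldTrump/?refid=46"))  # False
```

Inside a Scrapy callback you could, for example, check `is_device_interstitial(response.url)` and emit a warning with `self.logger.warning(...)` before trying to parse any posts.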

rugantio closed this as completed Oct 21, 2018

hohvn mentioned this issue on Nov 11, 2018 in #4, "cannot redirect to page" (closed)

erba994 pushed a commit to erba994/fbcrawl that referenced this issue on Oct 12, 2021:
93e19b6 Merge pull request rugantio#1 from jaewonha/master
likers - press the more button and collect all reactions (currently i…
