Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

0 pages is being crawled #1

Closed
ummezafiirah opened this issue Sep 10, 2018 · 1 comment
Closed

0 pages is being crawled #1

ummezafiirah opened this issue Sep 10, 2018 · 1 comment

Comments

@ummezafiirah
Copy link

ummezafiirah commented Sep 10, 2018

Hello,

I am new to scrapy and I have tried your codes.
I tried to scrap Donald Trump page
I have this being displayed:
[scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
I can't figure out where actually the problem is.

Please find below the entire message being output:
2018-09-10 23:14:01 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: fbcrawl)
2018-09-10 23:14:01 [scrapy.utils.log] INFO: Overridden settings: {'BOT_NAME': 'fbcrawl', 'FEED_EXPORT_ENCODING': 'utf-8', 'FEED_EXPORT_FIELDS': ['source', 'date', 'text', 'reactions', 'likes', 'ahah', 'love', 'wow', 'sigh', 'grrr', 'comments', 'url'], 'LOG_LEVEL': 'INFO', 'NEWSPIDER_MODULE': 'fbcrawl.spiders', 'SPIDER_MODULES': ['fbcrawl.spiders']}
2018-09-10 23:14:01 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.logstats.LogStats']
2018-09-10 23:14:02 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-09-10 23:14:02 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-09-10 23:14:02 [scrapy.middleware] INFO: Enabled item pipelines:
['fbcrawl.pipelines.FbcrawlPipeline']
2018-09-10 23:14:02 [scrapy.core.engine] INFO: Spider opened
2018-09-10 23:14:02 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-09-10 23:14:05 [fb] INFO: Parse function called on https://mbasic.facebook.com/DonaldTrump/?refid=46
2018-09-10 23:14:06 [scrapy.core.scraper] ERROR: Spider error processing <GET https://mbasic.facebook.com/DonaldTrump/?refid=46> (referer: https://mbasic.facebook.com/login/save-device/?login_source=login&refsrc=https%3A%2F%2Fmbasic.facebook.com%2F&refid=8&_rdr)

@rugantio
Copy link
Owner

Hi, welcome to the scrapy world, it's a fun journey!
First thing I notice is that right before the last ERROR line you should have the INFO like:

[fb] INFO: Parse function called on https://mbasic.facebook.com/DonaldTrump

So make sure you gave the appropriate page name. Also sometimes the bot gets stuck because Facebook fingerprints the browser and tries to block the new scrapy device. Although I've written some code in the parse_home function to bypass this behavior, sometimes it doesn't work well. A simple workaround is to log in via your traditional web browser once and everything should work fine.
Also check your mailbox, sometimes fb sends you an email saying that an unknown device has been trying to access without permission.

erba994 pushed a commit to erba994/fbcrawl that referenced this issue Oct 12, 2021
likers - press the more button and collect all reactions (currently i…
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants