I am new to Scrapy and I have tried your code.
I tried to scrape the Donald Trump page.
This is what gets displayed:
[scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
I can't figure out where the problem actually is.
Hi, welcome to the Scrapy world, it's a fun journey!
The first thing I notice is that right before the last ERROR line you should see an INFO line like:
[fb] INFO: Parse function called on https://mbasic.facebook.com/DonaldTrump
So make sure you passed the correct page name. Also, the bot sometimes gets stuck because Facebook fingerprints the browser and tries to block the new Scrapy device. I've written some code in the parse_home function to bypass this behavior, but it doesn't always work. A simple workaround is to log in once from your regular web browser; after that everything should work fine.
Also check your mailbox: Facebook sometimes sends an email saying that an unknown device has tried to access your account without permission.
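If the log is long, a small script can help check which URL the parse function was actually called on and which request raised the spider error. This is only a rough sketch: the helper name summarize_log and the two regular expressions are my own assumptions, based on the log lines quoted in this thread, and are not part of fbcrawl or Scrapy.

```python
import re

# Assumed pattern for the spider's own INFO line, as quoted in this thread.
PARSE_LINE = re.compile(r"\[fb\] INFO: Parse function called on (\S+)")
# Assumed pattern for Scrapy's "Spider error processing" line.
ERROR_LINE = re.compile(
    r"\[scrapy\.core\.scraper\] ERROR: Spider error processing <GET ([^>\s]+)>"
)


def summarize_log(lines):
    """Return (parsed_urls, error_urls) found in an iterable of log lines."""
    parsed, errors = [], []
    for line in lines:
        match = PARSE_LINE.search(line)
        if match:
            parsed.append(match.group(1))
        match = ERROR_LINE.search(line)
        if match:
            errors.append(match.group(1))
    return parsed, errors


# Two lines in the same shape as the log in this issue (referer left out).
sample = [
    "2018-09-10 23:14:05 [fb] INFO: Parse function called on "
    "https://mbasic.facebook.com/DonaldTrump/?refid=46",
    "2018-09-10 23:14:06 [scrapy.core.scraper] ERROR: Spider error processing "
    "<GET https://mbasic.facebook.com/DonaldTrump/?refid=46>",
]
parsed, errors = summarize_log(sample)
print("parse called on:", parsed)
print("errors on:", errors)
```

If the first list is empty, the spider probably never got past the login step. If the same URL shows up in both lists, the page name was reached and the failure is inside the parse function itself.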
Hello,
Please find below the entire message being output:
2018-09-10 23:14:01 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: fbcrawl)
2018-09-10 23:14:01 [scrapy.utils.log] INFO: Overridden settings: {'BOT_NAME': 'fbcrawl', 'FEED_EXPORT_ENCODING': 'utf-8', 'FEED_EXPORT_FIELDS': ['source', 'date', 'text', 'reactions', 'likes', 'ahah', 'love', 'wow', 'sigh', 'grrr', 'comments', 'url'], 'LOG_LEVEL': 'INFO', 'NEWSPIDER_MODULE': 'fbcrawl.spiders', 'SPIDER_MODULES': ['fbcrawl.spiders']}
2018-09-10 23:14:01 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.logstats.LogStats']
2018-09-10 23:14:02 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-09-10 23:14:02 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-09-10 23:14:02 [scrapy.middleware] INFO: Enabled item pipelines:
['fbcrawl.pipelines.FbcrawlPipeline']
2018-09-10 23:14:02 [scrapy.core.engine] INFO: Spider opened
2018-09-10 23:14:02 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-09-10 23:14:05 [fb] INFO: Parse function called on https://mbasic.facebook.com/DonaldTrump/?refid=46
2018-09-10 23:14:06 [scrapy.core.scraper] ERROR: Spider error processing <GET https://mbasic.facebook.com/DonaldTrump/?refid=46> (referer: https://mbasic.facebook.com/login/save-device/?login_source=login&refsrc=https%3A%2F%2Fmbasic.facebook.com%2F&refid=8&_rdr)