
Cannot fetch Weibo content #6

Closed
loksum opened this issue Oct 3, 2018 · 3 comments

Comments

@loksum

loksum commented Oct 3, 2018

Hello, I'm a beginner and would like to ask for some help. I tried your example, but why can't I get any Weibo posts?
Many thanks!

2018-10-03 02:05:33 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 2.7.10 (default, Oct  6 2017, 22:29:07) - [GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.31)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i  14 Aug 2018), cryptography 2.3.1, Platform Darwin-17.7.0-x86_64-i386-64bit
2018-10-03 02:05:33 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'sina.spiders', 'SPIDER_MODULES': ['sina.spiders'], 'DOWNLOAD_DELAY': 3, 'BOT_NAME': 'sina'}
2018-10-03 02:05:33 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats']
2018-10-03 02:05:33 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-10-03 02:05:33 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-10-03 02:05:33 [scrapy.middleware] INFO: Enabled item pipelines:
['sina.pipelines.MongoDBPipeline']
2018-10-03 02:05:33 [scrapy.core.engine] INFO: Spider opened
2018-10-03 02:05:33 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-10-03 02:05:33 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-10-03 02:05:34 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://weibo.cn/2803301701/info> (referer: None)
2018-10-03 02:05:36 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://weibo.cn/1699432410/info> (referer: None)
2018-10-03 02:05:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://weibo.cn/u/2803301701> (referer: https://weibo.cn/2803301701/info)
2018-10-03 02:05:40 [scrapy.core.scraper] DEBUG: Scraped from <200 https://weibo.cn/u/2803301701>
{'_id': '2803301701',
 'authentication': u'\u300a\u4eba\u6c11\u65e5\u62a5\u300b\u6cd5\u4eba\u5fae\u535a',
 'birthday': u'1948-06-15',
 'crawl_time': 1538546734,
 'fans_num': 72033515,
 'follows_num': 3033,
 'gender': u'\u7537',
 'nick_name': u'\u4eba\u6c11\u65e5\u62a5',
 'province': u'\u5317\u4eac',
 'tweets_num': 91312,
 'vip_level': u'6\u7ea7'}
2018-10-03 02:05:44 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://weibo.cn/u/1699432410> (referer: https://weibo.cn/1699432410/info)
2018-10-03 02:05:44 [scrapy.core.scraper] DEBUG: Scraped from <200 https://weibo.cn/u/1699432410>
{'_id': '1699432410',
 'authentication': u'\u65b0\u534e\u793e\u6cd5\u4eba\u5fae\u535a',
 'birthday': u'1931-11-07',
 'crawl_time': 1538546737,
 'fans_num': 42741520,
 'follows_num': 4242,
 'gender': u'\u7537',
 'nick_name': u'\u65b0\u534e\u89c6\u70b9',
 'province': u'\u5317\u4eac',
 'tweets_num': 100178,
 'vip_level': u'5\u7ea7'}
2018-10-03 02:05:48 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://weibo.cn/2803301701/profile?page=1> (referer: https://weibo.cn/u/2803301701)
2018-10-03 02:05:48 [weibo_spider] ERROR: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
(the line above repeats 10 times in the log)
@nghuyong
Owner

nghuyong commented Oct 3, 2018

It's probably a Chinese-encoding problem in your XPath expressions. You can try this approach: add a u prefix in front of the XPath string, see https://blog.csdn.net/zcc_0015/article/details/52274996
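A minimal sketch of the suggested fix (the XPath expression below is hypothetical, not taken from the project's code): under Python 2, a plain string literal containing Chinese text is a byte string, and lxml (which Scrapy's selectors use under the hood) rejects non-ASCII byte strings with exactly the "All strings must be XML compatible" error shown in the log. A u prefix makes the literal a unicode string, which lxml accepts on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# Sketch only; the XPath below is a hypothetical example.

# On Python 2 this literal is a byte string (str); when it contains
# Chinese text, lxml raises "All strings must be XML compatible".
byte_query = '//span[@class="ctt" and contains(text(), "微博")]'

# With the u prefix it is a unicode string on Python 2 as well,
# so lxml accepts it; on Python 3 the prefix is a harmless no-op.
unicode_query = u'//span[@class="ctt" and contains(text(), "微博")]'

# This check holds on both interpreters.
assert isinstance(unicode_query, type(u''))
```

On Python 3 the two literals are the same type, which is why the prefix is safe to add everywhere.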

@loksum
Author

loksum commented Oct 4, 2018

Just to ask: do I need to add it to every XPath? Right now I can only fetch information and relationships.

@loksum
Author

loksum commented Oct 4, 2018

I ran into this problem on Python 2.7. After switching to Python 3.6 the issue went away. Thanks for your answer.
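For anyone landing here later, a quick illustration of why switching to Python 3 makes the error disappear (assumption: run under Python 3): every string literal in Python 3 is already unicode, so XPath strings containing Chinese text are XML-compatible for lxml without any u prefix.

```python
# In Python 3, str literals are unicode by default, so an XPath
# containing Chinese text is already XML-compatible for lxml.
chinese = '微博'
assert isinstance(chinese, str)   # Python 3 str is unicode
assert chinese == u'微博'          # the u prefix is now redundant
print(len(chinese))               # 2 code points, not a byte count
```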

loksum closed this as completed Oct 4, 2018