
Cannot fetch Weibo content #6

Closed
loksum opened this issue Oct 3, 2018 · 3 comments

Comments

@loksum

loksum commented Oct 3, 2018

Hello, I'm a beginner and would like to ask for some help. I tried your example, but why can't I get any Weibo posts?
Many thanks!

2018-10-03 02:05:33 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 2.7.10 (default, Oct  6 2017, 22:29:07) - [GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.31)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i  14 Aug 2018), cryptography 2.3.1, Platform Darwin-17.7.0-x86_64-i386-64bit
2018-10-03 02:05:33 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'sina.spiders', 'SPIDER_MODULES': ['sina.spiders'], 'DOWNLOAD_DELAY': 3, 'BOT_NAME': 'sina'}
2018-10-03 02:05:33 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats']
2018-10-03 02:05:33 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-10-03 02:05:33 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-10-03 02:05:33 [scrapy.middleware] INFO: Enabled item pipelines:
['sina.pipelines.MongoDBPipeline']
2018-10-03 02:05:33 [scrapy.core.engine] INFO: Spider opened
2018-10-03 02:05:33 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-10-03 02:05:33 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-10-03 02:05:34 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://weibo.cn/2803301701/info> (referer: None)
2018-10-03 02:05:36 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://weibo.cn/1699432410/info> (referer: None)
2018-10-03 02:05:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://weibo.cn/u/2803301701> (referer: https://weibo.cn/2803301701/info)
2018-10-03 02:05:40 [scrapy.core.scraper] DEBUG: Scraped from <200 https://weibo.cn/u/2803301701>
{'_id': '2803301701',
 'authentication': u'\u300a\u4eba\u6c11\u65e5\u62a5\u300b\u6cd5\u4eba\u5fae\u535a',
 'birthday': u'1948-06-15',
 'crawl_time': 1538546734,
 'fans_num': 72033515,
 'follows_num': 3033,
 'gender': u'\u7537',
 'nick_name': u'\u4eba\u6c11\u65e5\u62a5',
 'province': u'\u5317\u4eac',
 'tweets_num': 91312,
 'vip_level': u'6\u7ea7'}
2018-10-03 02:05:44 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://weibo.cn/u/1699432410> (referer: https://weibo.cn/1699432410/info)
2018-10-03 02:05:44 [scrapy.core.scraper] DEBUG: Scraped from <200 https://weibo.cn/u/1699432410>
{'_id': '1699432410',
 'authentication': u'\u65b0\u534e\u793e\u6cd5\u4eba\u5fae\u535a',
 'birthday': u'1931-11-07',
 'crawl_time': 1538546737,
 'fans_num': 42741520,
 'follows_num': 4242,
 'gender': u'\u7537',
 'nick_name': u'\u65b0\u534e\u89c6\u70b9',
 'province': u'\u5317\u4eac',
 'tweets_num': 100178,
 'vip_level': u'5\u7ea7'}
2018-10-03 02:05:48 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://weibo.cn/2803301701/profile?page=1> (referer: https://weibo.cn/u/2803301701)
2018-10-03 02:05:48 [weibo_spider] ERROR: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
(the line above repeats 10 times in the log)
@nghuyong
Owner

nghuyong commented Oct 3, 2018

It's probably a Chinese-encoding problem in your XPath expressions. You can try this approach: add a u prefix in front of the XPath string, see https://blog.csdn.net/zcc_0015/article/details/52274996
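A minimal sketch of the suggested fix (the XPath expression below is hypothetical, not taken from the project's code): under Python 2, a plain string literal containing Chinese text is a byte string, and lxml (which Scrapy's selectors use under the hood) rejects non-ASCII byte strings with exactly the "All strings must be XML compatible" error shown in the log. A u prefix makes the literal a unicode string, which lxml accepts on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# Sketch only; the XPath below is a hypothetical example.

# On Python 2 this literal is a byte string (str); when it contains
# Chinese text, lxml raises "All strings must be XML compatible".
byte_query = '//span[@class="ctt" and contains(text(), "微博")]'

# With the u prefix it is a unicode string on Python 2 as well,
# so lxml accepts it; on Python 3 the prefix is a harmless no-op.
unicode_query = u'//span[@class="ctt" and contains(text(), "微博")]'

# This check holds on both interpreters.
assert isinstance(unicode_query, type(u''))
```

On Python 3 the two literals are the same type, which is why the prefix is safe to add everywhere.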

@loksum
Author

loksum commented Oct 4, 2018

Just to ask: do I need to add it to every XPath? Right now I can only fetch information and relationships.

@loksum
Author

loksum commented Oct 4, 2018

I ran into this problem on Python 2.7. After switching to Python 3.6 the issue went away. Thanks for your answer.
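For anyone landing here later, a quick illustration of why switching to Python 3 makes the error disappear (assumption: run under Python 3): every string literal in Python 3 is already unicode, so XPath strings containing Chinese text are XML-compatible for lxml without any u prefix.

```python
# In Python 3, str literals are unicode by default, so an XPath
# containing Chinese text is already XML-compatible for lxml.
chinese = '微博'
assert isinstance(chinese, str)   # Python 3 str is unicode
assert chinese == u'微博'          # the u prefix is now redundant
print(len(chinese))               # 2 code points, not a byte count
```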

loksum closed this as completed Oct 4, 2018