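I'm planning to crawl the comments on each Weibo post, so I need to modify the existing code. I have a question: contrib/sina/__init__.py defines a set of URL patterns, for example the one for micro blogs: Url(r'http://weibo.com/aj/mblog/mbloglist.*', 'micro_blog', MicroBlogParser), which matches requests to http://weibo.com/aj/mblog/mbloglist.*. How did you arrive at this page pattern?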
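The approach I'm familiar with is to first visit a Weibo page, such as http://weibo.com/p/1006061774908135/home?from=page_100606&mod=TAB#place, then inspect the page structure and extract the content with bs4 or lxml.
I'd appreciate any pointers!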
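cola already supports crawling Weibo comments. The configuration file has a comment: no setting; change it to yes.
P.S. For the standalone (single-machine) version, it's best to use the code on the develop branch.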
I may need to customize what gets crawled, for example the content of each post, so could you still tell me how you obtained patterns like http://weibo.com/aj/mblog/mbloglist.*?
Analyzing the page's AJAX requests should be enough to find these URLs.
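As a rough illustration of that workflow, the sketch below assumes you have copied a few request URLs by hand from the browser's network panel and want to check which of them the pattern from contrib/sina/__init__.py would pick up. The helper name `find_matching_requests`, the sample URLs other than the page URL quoted in the question, and the use of `re.match` are all assumptions made for illustration; how cola itself applies its `Url` patterns is not shown in this thread.

```python
import re

# Pattern string copied from contrib/sina/__init__.py as quoted in the question.
MICRO_BLOG_PATTERN = re.compile(r'http://weibo.com/aj/mblog/mbloglist.*')


def find_matching_requests(urls, pattern=MICRO_BLOG_PATTERN):
    """Return the URLs that the pattern matches from their first character.

    Hypothetical helper: it only mimics a prefix-style regex check and is
    not part of cola.
    """
    return [url for url in urls if pattern.match(url)]


# Request URLs as one might copy them from the browser's network panel.
# The first is the page URL from the question; the other two are made up.
captured = [
    'http://weibo.com/p/1006061774908135/home?from=page_100606&mod=TAB',
    'http://weibo.com/aj/mblog/mbloglist?page=1',  # hypothetical query string
    'http://weibo.com/static/app.js',              # hypothetical static asset
]

matches = find_matching_requests(captured)
print(matches)
```

Only the AJAX endpoint survives the filter, which is the kind of URL a parser such as MicroBlogParser would then be registered for. The same check can be repeated for whatever request loads comments once you have spotted it in the network panel.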