Is there a way to crawl second-level comments? #302

Closed
wyycommu opened this issue Sep 30, 2023 · 6 comments

Comments

@wyycommu

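First of all, thank you to the author for the work put into this project.

I tried the comment feature in practice and found that it cannot crawl second-level comments (replies to comments).

I only understand the basics, but when a comment has just a few second-level comments, they show up directly in data['comment'], as in the screenshot below:
image

When there are many second-level comments, however, they are collapsed, and I have no way to get at them, as in the screenshot below:

image

I would also like to report a bug: some official accounts do not show an IP location, so if item['ip_location'] = data['source'] finds nothing, the whole comment is skipped. This may be why some comments are never captured.
Wrapping the key fields in a try-except, or adding an if "" in data check, might improve this.

The main code related to the issues above: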

    def parse_comment(data):
        """
        Parse a comment
        """
        item = dict()
        item['created_at'] = parse_time(data['created_at'])
        item['_id'] = data['id']
        item['like_counts'] = data['like_counts']
        item['ip_location'] = data['source']
        item['content'] = data['text_raw']
        item['comment_user'] = parse_user_info(data['user'])
        return item
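The suggestion above (guarding the field that may be missing) could look roughly like the sketch below. It is only an illustration, not the fix that was later committed: the name parse_comment_tolerant is hypothetical, and parse_time and parse_user_info are replaced with pass-through stand-ins so the snippet runs on its own.

```python
def parse_time(raw):
    # Stand-in for the project's parse_time helper (assumption: pass-through).
    return raw


def parse_user_info(user):
    # Stand-in for the project's parse_user_info helper (assumption: pass-through).
    return user


def parse_comment_tolerant(data):
    """Parse a comment, tolerating a missing 'source' field."""
    item = dict()
    item['created_at'] = parse_time(data['created_at'])
    item['_id'] = data['id']
    item['like_counts'] = data['like_counts']
    # Some official accounts expose no IP location, so 'source' may be absent.
    # dict.get avoids the KeyError that would otherwise drop the whole comment.
    item['ip_location'] = data.get('source', '')
    item['content'] = data['text_raw']
    item['comment_user'] = parse_user_info(data['user'])
    return item
```

Using data.get with an empty-string default keeps the comment in the output and simply leaves ip_location blank when the field is absent.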
nghuyong added a commit that referenced this issue Dec 12, 2023
@nghuyong
Owner

ip_location has been fixed.

nghuyong added a commit that referenced this issue Dec 12, 2023
@nghuyong
Owner

This is now supported and is enabled by default.

Second-level comments carry one additional field, reply_comment.
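As a rough illustration of how that extra field could be used downstream, the sketch below separates crawled items into top-level and second-level comments. It assumes only that reply_comment is present (and non-empty) on second-level comments and absent otherwise; the contents of the field are not described in this thread, and the function name split_by_level is hypothetical.

```python
def split_by_level(comments):
    """Split crawled comment items into (top_level, second_level) lists.

    Assumption: an item is second-level exactly when it carries a
    non-empty 'reply_comment' field.
    """
    top_level = []
    second_level = []
    for item in comments:
        if item.get('reply_comment'):
            second_level.append(item)
        else:
            top_level.append(item)
    return top_level, second_level
```

Counting the second-level list for a given post and comparing it with what the Weibo page shows is one way to check whether some replies are being missed.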

@PengxiangZhou98Whu

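Have you solved the second-level comment problem on your side? I can capture some reply_comment entries, but not all of them. Capture fails both for comments with several second-level comments and for comments with only one.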

@nghuyong
Owner

Could you provide a specific Weibo ID (for debugging)?

@PengxiangZhou98Whu

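> Could you provide a specific Weibo ID (for debugging)?

Thank you for your reply!
https://weibo.com/3235040884/O1SHpzWoB?pagetype=profilefeed has relatively few second-level comments, so you could look at this one first.
This one as well: https://weibo.com/1749127163/O4cpumkS8, which has many second-level comments.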
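For debugging against links like these, the two path segments of a post URL can be pulled apart as sketched below. This assumes the usual weibo.com layout, where the first segment is the author's user ID and the second is the post's short identifier; whether the crawler accepts that short identifier directly is not confirmed in this thread, and split_weibo_url is a hypothetical helper name.

```python
from urllib.parse import urlparse


def split_weibo_url(url):
    """Return the first two path segments of a weibo.com post URL.

    Assumption: the path looks like /<user id>/<post identifier>;
    any query string (e.g. ?pagetype=profilefeed) is ignored.
    """
    parts = [p for p in urlparse(url).path.split('/') if p]
    if len(parts) < 2:
        raise ValueError(f"unexpected Weibo post URL: {url}")
    return parts[0], parts[1]
```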

@PengxiangZhou98Whu

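When crawling comments, what does "_id": 4826279188108038 in the example refer to? Also, does the output include the ID of the Weibo post that a comment belongs to, and how can I find a post's ID? I would appreciate your help with this.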
