Crawl gets interrupted when scraping a large number of comments #18

Closed
boan-anbo opened this issue Jun 19, 2020 · 2 comments
Comments

@boan-anbo

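Hello repo owner, this is a very powerful script, thank you! I have one small problem: when I crawl a Weibo post with a lot of comments, the crawl often stops partway through. It may be a network issue. It frequently breaks off after a few hundred comments, with the traceback ending at
KeyError: 'data'
But I am sure there is still data after that point. When I rerun it, sometimes it gets a few hundred comments and sometimes a few thousand, but once it breaks I can only start again from zero.
So my question is: if I want the script to keep retrying in this situation instead of breaking, which lines do I need to change?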

@inspurer
Member

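I think the crawl is probably running too fast. If there is still data left but the response has no `data` field, the server has recognized the crawler and stopped returning data. You could try adding a sleep after each page.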
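A minimal sketch of that suggestion, not the script's actual code. It assumes a hypothetical `fetch_page(max_id)` callable that returns the parsed JSON for one page of comments; the function name, delay values, and retry count are all illustrative. It pauses after every successful page, and when the `data` key is missing it waits longer and retries instead of failing with `KeyError`.

```python
import time


def fetch_with_retry(fetch_page, max_id, page_delay=1.0, max_retries=5, sleep=time.sleep):
    """Fetch one page of comments, retrying when the 'data' field is missing.

    fetch_page is a hypothetical callable: fetch_page(max_id) -> dict.
    """
    for attempt in range(max_retries):
        response = fetch_page(max_id)
        if "data" in response:
            # Pause after each successful page so requests are spaced out.
            sleep(page_delay)
            return response["data"]
        # No 'data' key: probably throttled, so wait longer before retrying.
        sleep(page_delay * 2 ** (attempt + 1))
    raise RuntimeError(f"no 'data' field after {max_retries} attempts (max_id={max_id})")
```

The delay doubles on each failed attempt, so a short block by the server is waited out without hammering it. The right base delay depends on how aggressively the server throttles and has to be found by trial.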

@boan-anbo
Author

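Thanks for the reply! What I ended up doing was not adding a sleep, but writing `max_id` to a separate file, so that even if the crawl breaks I can resume from that `max_id` instead of starting from scratch every time. Another possible cause of the interruptions is certain characters, such as emoji, which may cause input errors. These are all minor issues, though, and the script works very well. Thanks.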
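The checkpoint approach described above could look roughly like the sketch below. The file layout and the helper names `save_checkpoint`, `load_checkpoint`, and `append_comment` are hypothetical, not taken from the script. The last helper assumes the emoji failures come from writing output with a platform default encoding, which the thread does not confirm.

```python
import json
import os


def save_checkpoint(path, max_id):
    """Write the latest max_id to a temp file, then swap it into place."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump({"max_id": max_id}, f)
    os.replace(tmp_path, path)  # avoids leaving a half-written checkpoint


def load_checkpoint(path, default=0):
    """Return the saved max_id, or default when no checkpoint exists yet."""
    if not os.path.exists(path):
        return default
    with open(path, encoding="utf-8") as f:
        return json.load(f)["max_id"]


def append_comment(path, text):
    """Append one comment per line, using UTF-8 so emoji can be encoded."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(text + "\n")
```

Calling `save_checkpoint` after each page and `load_checkpoint` at startup lets a rerun continue from the last page that finished instead of from zero.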
