Movie pages currently cannot be scraped without being logged in #11
Comments
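@murdercdh Douban has not published any standard for what counts as restricted content. In practice, it is whichever pages show this login prompt and cannot be opened without signing in.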
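In my tests so far, after 200-300 consecutive requests Douban's security firewall blindly bans the scraping IP. It is pretty dumb: even pages requested after logging in get banned along with everything else. I only want to migrate my data, and this makes it a real hassle. I found a query endpoint that serves data they crawled themselves, but it offers no way to authenticate, which is frustrating.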
When exporting Douban data to CSV you can specify the start date of the export, so exporting in several small batches might work.
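The export is finished. I changed my program so that it no longer uses random proxies or random user agents (Douban's anti-scraping simply bans the IP for 24 hours even for logged-in users, which is rather pointless). It now just runs once a day until my whole watchlist has been processed. I have already migrated to Letterboxd and then imported into IMDb. Douban app uninstalled.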
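For anyone who wants to try the same "run once a day until done" approach, here is a minimal sketch, not the program described above. The function name `run_daily_batch`, the `fetch` callback, the JSON state file, and the default budget of 150 are all assumptions for illustration; the budget is only chosen to sit below the 200-300 request range reported earlier in this thread.

```python
import json
from pathlib import Path


def run_daily_batch(items, fetch, state_path, daily_budget=150):
    """Process at most `daily_budget` pending items and return how many remain.

    `fetch` is a hypothetical callback that handles one item (for example,
    one movie ID). Finished items are stored in a JSON file so that the
    next run picks up where this one stopped.
    """
    state_file = Path(state_path)
    # Load the set of items finished in earlier runs, if any.
    done = set(json.loads(state_file.read_text())) if state_file.exists() else set()

    pending = [item for item in items if item not in done]
    todays_batch = pending[:daily_budget]

    for item in todays_batch:
        fetch(item)
        done.add(item)
        # Save after every item so an interrupted run loses no progress.
        state_file.write_text(json.dumps(sorted(done)))

    return len(pending) - len(todays_batch)
```

Scheduling it once a day (cron or similar) and stopping when it returns 0 keeps each run under the assumed budget without needing proxies or rotated user agents.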
Does this mean the UA and cookies have to be forged for it to work?