Self-recommendation: HAipproxy, a distributed, highly available proxy crawler #23

Closed
ResolveWang opened this issue Mar 5, 2018 · 0 comments

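Thanks to the repo owner for including weibospider. After looking through so many awesome spiders, I think the list still lacks a foundational support program for crawlers, so I would like to recommend my own project, HAipproxy.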

HAipproxy is a highly available, low-latency distributed proxy program. High availability here covers two aspects:

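  • High availability of proxy resources (achieved through IP validation and filtering strategies)
  • High availability of each component (achieved through distributed deployment)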
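The first point can be illustrated with a minimal sketch of validating and filtering proxies. This is not HAipproxy's actual code: `filter_proxies`, the `probe` callback, and the `max_latency` threshold are all hypothetical names chosen for illustration, and the probe is injected so the example runs without network access.

```python
from typing import Callable, Iterable, List, Optional


def filter_proxies(
    proxies: Iterable[str],
    probe: Callable[[str], Optional[float]],
    max_latency: float = 2.0,
) -> List[str]:
    """Keep proxies that pass validation, ordered fastest first.

    `probe` is a hypothetical checker: it returns the measured latency in
    seconds, or None when the proxy fails validation.
    """
    scored = []
    for proxy in proxies:
        latency = probe(proxy)
        # Drop proxies that failed the check or responded too slowly.
        if latency is not None and latency <= max_latency:
            scored.append((latency, proxy))
    scored.sort()  # lowest latency first
    return [proxy for _, proxy in scored]


# Usage with canned measurements instead of real network probes.
measurements = {"10.0.0.1:8080": 0.5, "10.0.0.2:8080": None, "10.0.0.3:8080": 0.2}
usable = filter_proxies(measurements, measurements.get, max_latency=1.0)
```

In a real deployment the probe would issue a request through the proxy to the target site and time it; that part is left out here on purpose.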

In current tests, HAipproxy reaches a rate of 10,000+ requests/hour. Below are the results of a single-machine test with Zhihu as the target site:

| Requests | Time | Elapsed | IP load strategy | Client |
| --- | --- | --- | --- | --- |
| 0 | 2018/03/03 22:03 | 0 | greedy | py_cli |
| 10000 | 2018/03/03 11:03 | 1 hour | greedy | py_cli |
| 20000 | 2018/03/04 00:08 | 2 hours | greedy | py_cli |
| 30000 | 2018/03/04 01:02 | 3 hours | greedy | py_cli |
| 40000 | 2018/03/04 02:15 | 4 hours | greedy | py_cli |
| 50000 | 2018/03/04 03:03 | 5 hours | greedy | py_cli |
| 60000 | 2018/03/04 05:18 | 7 hours | greedy | py_cli |
| 70000 | 2018/03/04 07:11 | 9 hours | greedy | py_cli |
| 80000 | 2018/03/04 08:43 | 11 hours | greedy | py_cli |
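To read the table as a rate, the short sketch below divides each cumulative request count by its elapsed time. It uses only the figures listed above; the elapsed column is rounded to whole hours, so the results are approximate, and `average_rate` is just an illustrative helper name.

```python
# (cumulative requests, elapsed hours) copied from the test results above
samples = [
    (10000, 1), (20000, 2), (30000, 3), (40000, 4),
    (50000, 5), (60000, 7), (70000, 9), (80000, 11),
]


def average_rate(requests: int, hours: int) -> float:
    """Average requests per hour since the start of the test."""
    return requests / hours


rates = [round(average_rate(r, h)) for r, h in samples]
for (requests, hours), rate in zip(samples, rates):
    print(f"{requests:>6} requests in {hours:>2} h: about {rate} requests/hour")
```

By this measure the average stays at about 10,000 requests/hour for the first five hours and falls to roughly 7,300 requests/hour by the eleventh.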
Repository owner deleted a comment Mar 12, 2018