Previously, 海狗 had every partition concurrently request every shard. With limited machine resources, a large number of partitions produces a very large number of HTTP requests and puts excessive pressure on the merger server. For this reason, a single 海狗 scan in the adhoc project has always been capped at 1 billion rows, which clearly does not satisfy some requirements, hence this improvement.

The current approach is to split the work into multiple submissions, each covering only a fixed number of partitions (for example, 4 partitions per round). When each shard finishes computing, it dumps its data to HDFS. Finally, a single merger operation is submitted (its concurrency is determined by the number of hashes), which merges all of the data dumped to HDFS.
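A minimal sketch of this batched flow, for illustration only: the helper names (`scan_partition_on_shards`, `submit_merger`), the HDFS directory, and the parameter values are all hypothetical and are not the actual 海狗/adhoc APIs.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in: ask every shard to scan one partition and dump
# its result to HDFS; returns the HDFS path of the dump.
def scan_partition_on_shards(partition: int, hdfs_dir: str) -> str:
    path = f"{hdfs_dir}/partition_{partition}"
    # ... issue the per-shard HTTP requests for this partition ...
    return path

# Hypothetical stand-in: submit the final merge job over all dumps;
# its concurrency is bounded by the number of hashes.
def submit_merger(dump_paths: list[str], hash_count: int) -> None:
    # ... merge all dumped files, one worker per hash bucket ...
    pass

def batched_scan(partitions: list[int], batch_size: int,
                 hdfs_dir: str, hash_count: int) -> None:
    dump_paths: list[str] = []
    # Submit only `batch_size` partitions per round instead of all at once,
    # so the merger server sees a bounded number of concurrent requests.
    for i in range(0, len(partitions), batch_size):
        batch = partitions[i:i + batch_size]
        with ThreadPoolExecutor(max_workers=batch_size) as pool:
            dump_paths.extend(
                pool.map(lambda p: scan_partition_on_shards(p, hdfs_dir), batch))
    # One final merge over everything dumped to HDFS.
    submit_merger(dump_paths, hash_count)

if __name__ == "__main__":
    # e.g. 16 partitions scanned 4 at a time, matching the example above.
    batched_scan(list(range(16)), batch_size=4,
                 hdfs_dir="/tmp/haigou_dumps", hash_count=8)
```

The trade-off is that peak fan-out is bounded by the batch size rather than by the total partition count, at the cost of an extra round trip through HDFS before the final merge.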
This has been implemented.