
Optimize memory usage of bulkloader #5525


Conversation

xiangzhao632

@xiangzhao632 xiangzhao632 commented May 27, 2020

This PR optimizes memory usage in the reduce stage of the bulk loader without harming speed.
Related to #4529




@pawanrawal
Contributor

Hi @xiangzhao632

Thanks for the PR. Could you please merge dgraph:master into your fork so that the diff is cleaner? Also, could you share some benchmarking numbers that you observed before and after this change, such as the memory the bulk loader used and the time it took to load your dataset?

@xiangzhao632
Author

xiangzhao632 commented May 28, 2020

@pawanrawal Thanks very much for the reply! I have merged dgraph:master into my fork and finished my test. The numbers are:
Data: 300 million RDF triples (600 million edges); total machine memory is 192 GB. Before the change: load time 38m01s, memory usage 67 GB. After the change: load time 14m40s, memory usage 18 GB.
I noticed that since v20.03.1, the reduce stage of the bulk loader has changed from a min-heap style merge to a partition-key based method (done in f7d0371). My optimization is based on the min-heap version, so I switched back to it.
We have loaded about 50 billion RDF triples (about 83 billion edges) in 15 hours using the bulk loader on one 192 GB machine and ten 128 GB machines.
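
For context on the two approaches mentioned above, below is a minimal Go sketch of a min-heap style k-way merge over key-sorted map-shard streams. This is not Dgraph's actual reduce code; mapEntry, shardIterator, and merge are illustrative names. The point is that the heap holds only one pending entry per shard while still emitting entries in global key order, so memory scales with the number of shards rather than with the size of a buffered partition.

```go
package main

import (
	"bytes"
	"container/heap"
	"fmt"
)

// mapEntry is an illustrative stand-in for one record of map-phase output.
type mapEntry struct {
	key   []byte
	value []byte
}

// shardIterator streams key-sorted entries from one map shard.
type shardIterator interface {
	Next() (mapEntry, bool)
}

// sliceIterator backs a shard with an in-memory slice, just for this demo.
type sliceIterator struct {
	entries []mapEntry
	pos     int
}

func (s *sliceIterator) Next() (mapEntry, bool) {
	if s.pos >= len(s.entries) {
		return mapEntry{}, false
	}
	e := s.entries[s.pos]
	s.pos++
	return e, true
}

// heapItem pairs an entry with the iterator it came from.
type heapItem struct {
	entry mapEntry
	src   shardIterator
}

type entryHeap []heapItem

func (h entryHeap) Len() int { return len(h) }
func (h entryHeap) Less(i, j int) bool {
	return bytes.Compare(h[i].entry.key, h[j].entry.key) < 0
}
func (h entryHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *entryHeap) Push(x interface{}) { *h = append(*h, x.(heapItem)) }
func (h *entryHeap) Pop() interface{} {
	old := *h
	item := old[len(old)-1]
	*h = old[:len(old)-1]
	return item
}

// merge emits entries in global key order while holding only one entry per
// shard in the heap, rather than materializing a whole partition at once.
func merge(shards []shardIterator, emit func(mapEntry)) {
	h := &entryHeap{}
	for _, s := range shards {
		if e, ok := s.Next(); ok {
			heap.Push(h, heapItem{entry: e, src: s})
		}
	}
	for h.Len() > 0 {
		item := heap.Pop(h).(heapItem)
		emit(item.entry)
		if e, ok := item.src.Next(); ok {
			heap.Push(h, heapItem{entry: e, src: item.src})
		}
	}
}

func main() {
	shards := []shardIterator{
		&sliceIterator{entries: []mapEntry{{key: []byte("a")}, {key: []byte("c")}}},
		&sliceIterator{entries: []mapEntry{{key: []byte("b")}, {key: []byte("d")}}},
	}
	merge(shards, func(e mapEntry) { fmt.Println(string(e.key)) }) // prints a, b, c, d
}
```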

@xiangzhao632
Author

xiangzhao632 commented May 28, 2020

[Screenshot: memory usage before the change (total memory is 192 GB)]
[Screenshot: load time before the change]

@xiangzhao632
Author

[Screenshot: memory usage after the change]
[Screenshot: load time after the change]

@xiangzhao632 xiangzhao632 force-pushed the xiangzhao020/bulk-memory-reduce branch from 720aba0 to eb1521b on June 5, 2020 14:56
@awsl-dbq

awsl-dbq commented Jun 9, 2020

cool

@xiangzhao632
Author

xiangzhao632 commented Jun 9, 2020

Hi @pawanrawal, has anyone tested this? Could someone tell me whether this PR works or not?
OOM is a big issue, for two reasons:

  1. The largest machines I have are 192 GB (many other teams have 128 GB machines). That much memory can only cope with 3 billion RDF triples, while my team has more than 50 billion. I know teams that want to try Dgraph but, because of the OOM issue, cannot initialize a cluster with the bulk loader and turned to the live loader instead.
  2. Less memory usage in the reduce stage means that --reducer can be set to a larger value, which also increases loading speed (see the sketch after this list).
    Issue #5361 (Inconsistent bulk loader failures) is also related to this.
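
To make point 2 concrete, here is a minimal Go sketch of running reduce shards with bounded concurrency. It is not Dgraph's actual code; runReducers, reduceShard, and maxConcurrent are hypothetical names standing in for whatever the reducer-count setting controls. Peak memory is roughly the per-reducer footprint multiplied by the concurrency limit, so cutting per-reducer memory allows a higher limit before hitting OOM.

```go
package main

import (
	"fmt"
	"sync"
)

// runReducers processes reduce shards with at most maxConcurrent reducers
// running at once. Peak memory is roughly maxConcurrent times the memory a
// single reducer needs, so a smaller per-reducer footprint permits a larger
// maxConcurrent on the same machine.
func runReducers(shardIDs []int, maxConcurrent int, reduceShard func(id int)) {
	sem := make(chan struct{}, maxConcurrent) // counting semaphore
	var wg sync.WaitGroup
	for _, id := range shardIDs {
		wg.Add(1)
		sem <- struct{}{} // blocks while maxConcurrent reducers are running
		go func(id int) {
			defer wg.Done()
			defer func() { <-sem }()
			reduceShard(id)
		}(id)
	}
	wg.Wait()
}

func main() {
	runReducers([]int{0, 1, 2, 3}, 2, func(id int) {
		fmt.Println("reducing shard", id)
	})
}
```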

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@mangalaman93
Member

@xiangzhao632 Thank you for the PR. We are happy to look at the PR again if you can sign the CLA and raise it against the main branch. If you need any help, please let me know.


github-actions bot commented Aug 3, 2024

This PR has been stale for 60 days and will be closed automatically in 7 days. Comment to keep it open.

@github-actions github-actions bot added the Stale label Aug 3, 2024
@github-actions github-actions bot closed this Aug 11, 2024