Benchmark for src-d/Engine #33
The goal is to set expectations for performance: how much can be processed in 1 hour? Most common use case:
Using ~330 siva files from the 10k most relevant GitHub repositories, I got an out-of-space error after ~30 min of processing:
Before stopping the container:
After stopping the container:
We had ~263 GB of data in the shuffle operation!
Executed code: https://github.com/ajnavarro/spark-api/blob/b410479a55d1d47f142a69150269942d8826efe0/examples/Basic%2BExample.ipynb List of siva files used:
Thank you for the detailed report on the experiment! Very interesting. Do you think it would be worth repeating it without the .siva copy/unpack step, in the spirit of #36, to check whether that makes a difference in performance?
We should compare its performance against the Berserker baseline. I'm gathering more info with @eiso and @vmarkovtsev on what we should benchmark next.
Here are 100 siva files that we can test on: https://drive.google.com/open?id=0BxNVgwtOUkMUaG5wbmpnRklmMEk
Superseded by https://github.com/src-d/backlog/issues/1090.
@bzz please close it.
Try it with 100 and 1000 repos for our use cases.
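To answer "how much can be done in 1 hour?" at the 100 and 1000 repo sizes, a tiny timing harness can help. This is a minimal sketch, assuming a hypothetical `run_job(n)` callable that processes `n` repositories (for example, by wrapping the notebook linked above); it is not an engine API, and the throughput projection is a naive linear extrapolation.

```python
import time
from typing import Callable, Dict, List


def benchmark(run_job: Callable[[int], None], sizes: List[int]) -> Dict[int, float]:
    """Return the wall-clock seconds taken by run_job(n) for each n in sizes.

    `run_job` is a hypothetical placeholder for the real workload.
    """
    timings: Dict[int, float] = {}
    for n in sizes:
        start = time.perf_counter()
        run_job(n)
        timings[n] = time.perf_counter() - start
    return timings


def repos_per_hour(n: int, seconds: float) -> float:
    """Naively project measured throughput linearly onto a 1-hour budget."""
    return n * 3600.0 / seconds


if __name__ == "__main__":
    # Stand-in workload: sleeps proportionally to n instead of processing repos.
    def fake_job(n: int) -> None:
        time.sleep(n / 10000)

    for n, secs in benchmark(fake_job, [100, 1000]).items():
        print(f"{n} repos: {secs:.2f}s, ~{repos_per_hour(n, secs):.0f} repos/h (linear estimate)")
```

A linear projection ignores non-linear costs such as the shuffle volume and disk usage reported above, so each size is worth measuring directly rather than extrapolated from a smaller run.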