CPU performance on different CPUs and multi-socket (NUMA) servers #13
r4 is 20-40% slower than c5. The ranking of the libs stays the same for a given data size when changing CPU, that is lightgbm > xgboost > h2o > catboost (with a > b meaning a is better than b) for larger data and lightgbm > xgboost > catboost > h2o for very small data.
Going from a 1-socket to a 2-socket CPU:
RAM usage (for 10M data):
Note: this is just a very rough (and maybe even wrong) estimate. Tools might reduce their footprint while running if they approach the RAM ceiling (e.g. via free() or gc), and it also depends not only on the tool but on R, the data format, data reading, etc. One could also limit RAM with (a container-level sketch follows below):
h2o:
xgboost:
lightgbm:
catboost:
So the RAM (low) footprint ranking is the same as the speed ranking: lightgbm > xgboost > h2o > catboost (with a > b meaning a is better than b).
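The tool-specific RAM options referenced above are not reproduced here. As a rough, tool-agnostic sketch (assuming the benchmark runs in docker, as the Dockerfile notes below suggest; `gbm-perf-image` and `./run.sh` are hypothetical names), the container's memory can be capped and watched from the outside:

```bash
# Cap the container's RAM (value here is an arbitrary example); the run is
# OOM-killed rather than throttled if a tool exceeds the limit.
docker run --rm --memory=60g gbm-perf-image ./run.sh

# Watch per-container memory usage while the benchmark is running.
docker stats
```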
m5.12xlarge
Note: need to change the Dockerfile to use cores 0-23 instead of 0-15.
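The exact line that pins the cores in the repo's setup is not shown in this thread; as an illustrative sketch only (`gbm-perf-image`, `./run.sh` and `train.R` are hypothetical names), pinning to cores 0-23 instead of 0-15 could be expressed either at container start or with taskset:

```bash
# Restrict the container to cores 0-23 instead of the 0-15 used in the main benchmark
docker run --rm --cpuset-cpus="0-23" gbm-perf-image ./run.sh

# Same idea for a single command, pinned with taskset
taskset -c 0-23 Rscript train.R
```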
For large data, more cores (m5) are better than a higher-frequency CPU (c5); for small data it is the other way around. The ranking of the libs stays the same for a given data size when changing CPU, that is lightgbm > xgboost > h2o > catboost (with a > b meaning a is better than b) for larger data and lightgbm > xgboost > catboost > h2o for very small data, except on m5 it is lightgbm > h2o > xgboost > catboost for the largest size (h2o and xgboost swapped places).
2 sockets, run leaving 2 cores free for the VM host. Note: need to change the Dockerfile to use cores 0-33 instead of 0-15.
lightgbm improves a bit, but it is still worse on 2 sockets than on 1.
2020-09-09 UPDATE: xgboost and lightgbm have improved in multi-core scaling, and the NUMA slow-down has been mitigated.
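Not from the original thread: one way to check how much of the 2-socket penalty is NUMA-related (assuming numactl is available on the host; `./run.sh` is a hypothetical name) is to bind the run to a single socket's cores and memory and compare against an unbound run:

```bash
# Show the NUMA topology: how many nodes there are and which cores/memory belong to each socket
numactl --hardware

# Bind both CPU and memory allocation to node 0 only, i.e. effectively a 1-socket run
numactl --cpunodebind=0 --membind=0 ./run.sh
```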
c5.9xlarge:
This is a newer/faster CPU than the one used in the main benchmark.
Note: need to change the Dockerfile to use cores 0-17 instead of 0-15.
2-socket instance c5.18xlarge:
Note: need to change the Dockerfile to use cores 0-35 instead of 0-15.
For comparison, r4.8xlarge (main benchmark):