single-node xgboost-ray is 3.5x slower than single-node vanilla xgboost #230
Just to be sure, a few questions:
It would be great if you could post (parts of) your training script so we can look into how to optimize the data loading part of training. Just as a test for your next run, can you set the …
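(For reference, distributed data loading in xgboost-ray typically goes through `RayDMatrix`; a minimal sketch, with placeholder file paths and label column:)

```python
from xgboost_ray import RayDMatrix

# Placeholder paths and label column. With a list of parquet shards,
# each training actor loads its own partition directly instead of all
# data being read and distributed by the driver.
train_set = RayDMatrix(
    ["data/part-00.parquet", "data/part-01.parquet"],
    label="target",
)
```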
Quick update: I've run a benchmark on a single-node 16-CPU instance with 10 GB of data and can't find any difference in training time with a single worker.
Vanilla xgboost
XGBoost-Ray
Note that this is with local data. I'll benchmark with two nodes next (possibly only next week though).
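A comparison like this can be reproduced along the following lines (a sketch, not the exact benchmark script; the synthetic DataFrame stands in for the ~10 GB dataset):

```python
import time
import numpy as np
import pandas as pd
import xgboost as xgb
from xgboost_ray import RayDMatrix, RayParams, train

# Synthetic stand-in for the ~10 GB dataset used in the benchmark.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((100_000, 20)),
                  columns=[f"f{i}" for i in range(20)])
df["target"] = (rng.random(100_000) > 0.5).astype(int)

params = {"objective": "binary:logistic", "tree_method": "hist"}

# Vanilla xgboost baseline on the full local dataset.
dtrain = xgb.DMatrix(df.drop(columns=["target"]), label=df["target"])
start = time.perf_counter()
xgb.train(params, dtrain, num_boost_round=100)
print(f"vanilla xgboost: {time.perf_counter() - start:.1f}s")

# XGBoost-Ray with a single actor on the same 16-CPU node.
ray_dtrain = RayDMatrix(df, label="target")
start = time.perf_counter()
train(params, ray_dtrain, num_boost_round=100,
      ray_params=RayParams(num_actors=1, cpus_per_actor=16))
print(f"xgboost-ray:     {time.perf_counter() - start:.1f}s")
```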
I've walked through it with @faaany, and it seems that for their use case, running 5 actors with 30 CPUs each on a single node gives major speedups over 1 actor with 154 CPUs, though still not as fast as vanilla XGBoost. For 2 nodes, using 9 actors with 32 CPUs per actor is around 15% faster than vanilla XGBoost. It seems that there is an upper ceiling of CPUs per actor, and adding more beyond that gives diminishing returns compared to adding more actors. It's still not clear why vanilla XGBoost is so fast, though. @faaany, please confirm when you have a moment :)
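In code, the difference between these configurations is only the `RayParams` passed to training; a sketch of the three setups discussed:

```python
from xgboost_ray import RayParams

# One actor claiming nearly all cores of the 160-core node --
# the configuration that was reported as slow.
single_actor = RayParams(num_actors=1, cpus_per_actor=154)

# Five actors with 30 CPUs each -- reported to give major speedups
# on the same node.
multi_actor = RayParams(num_actors=5, cpus_per_actor=30)

# Two-node setup: nine actors with 32 CPUs each was ~15% faster
# than vanilla XGBoost.
two_node = RayParams(num_actors=9, cpus_per_actor=32)
```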
@Yard1 is correct. Regarding @krfricke's questions at the beginning, here are the answers:
Do you mean the …
@xwjiang2010 my code is actually very simple, not that different from the xgboost-ray example code. I agree with @Yard1's observation that there might be an upper ceiling of CPUs per actor. For a machine with 160 CPUs, it is not a good idea to set the number of actors to 1 and the number of CPUs per actor close to 160. On the other hand, the hardware I am using might not be typical for other xgboost-ray users, so feel free to close this issue.
As stated above, I need to tune the number of actors and the number of CPUs per actor to achieve runtime performance comparable to vanilla xgboost on my machine with 160 cores.
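A simple sweep makes it easy to find the sweet spot; here is a sketch (the synthetic DataFrame stands in for the real 18 GB dataset, and the actor counts are just illustrative):

```python
import time
import numpy as np
import pandas as pd
from xgboost_ray import RayDMatrix, RayParams, train

# Synthetic stand-in for the real 18 GB dataset.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((100_000, 20)),
                  columns=[f"f{i}" for i in range(20)])
df["target"] = (rng.random(100_000) > 0.5).astype(int)

for num_actors in (1, 2, 4, 5, 8, 10):
    cpus_per_actor = 160 // num_actors - 1  # leave a few CPUs for Ray itself
    train_set = RayDMatrix(df, label="target")
    start = time.perf_counter()
    train(
        {"objective": "binary:logistic", "tree_method": "hist"},
        train_set,
        num_boost_round=100,
        ray_params=RayParams(num_actors=num_actors,
                             cpus_per_actor=cpus_per_actor),
    )
    print(f"{num_actors:>2} actors x {cpus_per_actor:>3} CPUs: "
          f"{time.perf_counter() - start:.1f}s")
```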
Hi guys, I am trying to speed up my xgboost training with xgboost-ray because of its attractive features, like multi-node training and distributed data loading. However, my experiment shows that training with vanilla xgboost is 3.5x faster than with xgboost-ray on a single node.
My configuration is as follows:
XGBoost-Ray Version: 0.1.10
XGBoost Version: 1.6.1
ray: 1.13.0
Training data size: 18 GB, with 51,995,904 rows and 174 columns.
Cluster machine: 160 cores and 500 GB of RAM
Ray Config: 1 actor and 154 cpus_per_actor
Ray is initialized with ray.init(); the Ray cluster is set up manually, and the nodes are connected via Docker Swarm (which might not be relevant for this issue).
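The training code itself closely follows the xgboost-ray example; roughly like this (the file path and xgboost parameters below are illustrative, not the exact script):

```python
import ray
from xgboost_ray import RayDMatrix, RayParams, train

ray.init()  # the cluster itself is started manually beforehand

# Illustrative file path and label column.
dtrain = RayDMatrix("train_data.parquet", label="label")

bst = train(
    {"objective": "binary:logistic", "tree_method": "hist"},
    dtrain,
    num_boost_round=100,
    ray_params=RayParams(num_actors=1, cpus_per_actor=154),
)
```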
Please let me know if there is more info needed from your side. Thanks a lot!