Add feature weights support #265
Conversation
Thanks for the PR! Can you run format.sh in the root folder to make sure the code is properly formatted?
setup.py (outdated)
@@ -3,7 +3,7 @@
 setup(
     name="xgboost_ray",
     packages=find_packages(where=".", include="xgboost_ray*"),
-    version="0.1.14",
+    version="0.1.15",
There is no need to change the version in this PR.
Yeah, thanks for the comment. Will do!
@Yard1 I'm not sure how to proceed with the latest pytest failure. Maybe this is a temporary issue on the Azure repo side? The error message says
@mozjay0619 I just reran the job and it passed that point. It looks like a random failure not related to your changes.
Hmm, @mozjay0619, we seem to have some actual errors, can you check?
@Yard1 Sorry, let me explain the nature of the error, since we've had this multiple times. feature_weights leverages the colsample feature, so it is inherently stochastic. The weights are configured to be monotonically increasing, with the last (9th) feature having the biggest contribution. But due to the stochasticity, the second-to-last feature sometimes overpowers the last feature. I increased the boosting rounds for convergence and lowered the colsample rates as well. And this time, I tested it against 100 random seed values. I could take more drastic measures, but I don't want to alter the code too much from the original demo code provided in the official XGBoost docs. It is working properly in the sense that the feature with weight 0 is never selected at all. If it fails again this time, I will make the weight distribution extremely skewed.
@mozjay0619 Can we also set the seed as a parameter for xgboost? The numpy seed will not be carried over to the xgboost-ray Actors carrying out the training.
@Yard1 Yeah, I had entertained that exact idea. But I couldn't find an easy way to do it, since the xgboost.train() method does not have a seed parameter and relies on the numpy random state. I suspect this is why the demo code sets the numpy random state. One hacky way might be to create a seed parameter to The current tests all seem to have passed the feature weight part. I would suggest adding the random state support as a separate PR. But please let me know if you feel very strongly about this, and I can look into it more :)
I believe it has a
This PR adds support for the feature_weights parameter, which has been an xgb.DMatrix parameter since XGBoost version 1.3.0. This PR does not add this support for the RayDeviceQuantileDMatrix class.