Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add feature weights support #265

Merged
merged 9 commits into from
Feb 7, 2023

Conversation

mozjay0619
Copy link
Contributor

@mozjay0619 mozjay0619 commented Feb 4, 2023

This PR adds support for feature_weights parameter, which is xgb.DMatrix parameter since Xgboost version 1.3.0. This PR does not add this support for RayDeviceQuantileDMatrix class.

Copy link
Member

@Yard1 Yard1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Can you run format.sh in the root folder to make sure the code is properly formatted?

setup.py Outdated
@@ -3,7 +3,7 @@
setup(
name="xgboost_ray",
packages=find_packages(where=".", include="xgboost_ray*"),
version="0.1.14",
version="0.1.15",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need to change the version in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, thanks for the comment. Will do!

@Yard1 Yard1 self-assigned this Feb 5, 2023
@mozjay0619 mozjay0619 requested a review from Yard1 February 5, 2023 06:53
@mozjay0619
Copy link
Contributor Author

@Yard1 I'm not sure how to proceed with the latest pytest failure. May be this is a temporary issue on Azure repo side? The error message says Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?, but I am not sure how to do this. Could you please advise? Thanks!

@Yard1
Copy link
Member

Yard1 commented Feb 6, 2023

@mozjay0619 I just reran the job and it passed that point. It looks like a random failure not related to your changes.

@Yard1
Copy link
Member

Yard1 commented Feb 6, 2023

hmm @mozjay0619 we seem to have some actual errors, can you check?

@mozjay0619
Copy link
Contributor Author

mozjay0619 commented Feb 6, 2023

@Yard1 Sorry, let me explain the nature of the error since we've had this multiple times. The feature_weights leverages the colsample feature, so it is inherently stochastic. The weights are configured to be monotonically increasing with the last (9th) feature having the biggest contribution. But due to the stochasticity, the second to last feature sometimes overpowers the last feature. I increased the boosting rounds for convergence and lowered the colsample rates as well. And this time, I tested it against 100 random seed values. I can take more drastic measures but I don't want to alter the code too much from the original demo code provided by xgboost official doc... But it is working properly since the feature with 0 weight is not being selected at all. If it fails again this time, I will make the weight distribution extremely skewed.

@Yard1
Copy link
Member

Yard1 commented Feb 6, 2023

@mozjay0619 can we also set the seed as a parameter for xgboost? The numpy seed will not be carried over to the xgboost-ray Actors carrying out the training.

@mozjay0619
Copy link
Contributor Author

mozjay0619 commented Feb 6, 2023

@Yard1 Yeah, I had entertained that exact idea. But I couldn't find an easy way to do it since the xgboost.train() method does not have seed parameter, and it relies on numpy random state. I suspect this is why the demo code sets numpy random state. One hacky way might be to create a seed parameter to train method in xgboost_ray and use that number to set the local seed for each Actor? But I'm not sure what this would actually do in the context of distributed xgboost training...

The current tests all seem to have passed the feature weight part. I would suggest adding the random state support as a separate PR. But please let me know if you feel very strongly about this, I can look into it more :)

@Yard1
Copy link
Member

Yard1 commented Feb 7, 2023

I believe it has a seed learning parameter but if the test passes here then I am fine with that. Thanks for iterating on this!

@Yard1 Yard1 merged commit dcdc4b7 into ray-project:master Feb 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants