
sub-features for node level #2330

Merged: 10 commits merged into master from subcol-node on Sep 3, 2019
Conversation

guolinke (Collaborator)

Implements #2315.

@StrikerRUS (Collaborator) left a comment:


Please check a few comments about the parameter descriptions.

Review comments on include/LightGBM/config.h (outdated, resolved)
@guolinke (Collaborator, Author)

@BlindApe could you try this PR and provide some feedback?

@BlindApe commented Aug 30, 2019

Thank you @guolinke for this feature, and sorry for the delay. I was on holidays and disconnected.
I've tried to use this, but it seems I'm doing something wrong. The results are the same either way:

rm(list = ls(all = TRUE))
require(lightgbm)
require(data.table)
data(agaricus.train, package = "lightgbm")
train <- agaricus.train
dtrain <- lgb.Dataset(train$data, label = train$label)
data(agaricus.test, package = "lightgbm")
test <- agaricus.test
dtest <- lgb.Dataset.create.valid(dtrain, test$data, label = test$label)
params <- list(objective = "regression", metric = "l2")
valids <- list(test = dtest)
model <- lgb.train(params, dtrain, 5, valids, min_data = 1, learning_rate = 0.05, feature_fraction_bynode = FALSE)
[LightGBM] [Info] Total Bins 369
[LightGBM] [Info] Number of data: 6513, number of used features: 116
[LightGBM] [Info] Start training from score 0.482113
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[1]: test's l2:0.225323
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[2]: test's l2:0.203354
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[3]: test's l2:0.183527
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[4]: test's l2:0.165633
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[5]: test's l2:0.149484

model <- lgb.train(params, dtrain, 5, valids, min_data = 1, learning_rate = 0.05, feature_fraction_bynode = TRUE)
[LightGBM] [Info] Total Bins 369
[LightGBM] [Info] Number of data: 6513, number of used features: 116
[LightGBM] [Info] Start training from score 0.482113
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[1]: test's l2:0.225323
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[2]: test's l2:0.203354
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[3]: test's l2:0.183527
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[4]: test's l2:0.165633
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[5]: test's l2:0.149484

@guolinke (Collaborator, Author) commented Aug 30, 2019

@BlindApe it seems you forgot to set feature_fraction.

Refer to the test here:
https://github.com/microsoft/LightGBM/pull/2330/files#diff-904f28e01be14f69c39db2a8ba2474a9R1594-R1614
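
For example, a minimal sketch (not the test linked above) reusing the params, dtrain, and valids objects from the session above; node-level sampling only has something to sample once feature_fraction is below 1.0:

# illustrative values only; params, dtrain, and valids are defined earlier in this thread
model <- lgb.train(params, dtrain, 5, valids,
                   min_data = 1, learning_rate = 0.05,
                   feature_fraction = 0.5,          # sample 50% of the features
                   feature_fraction_bynode = TRUE)  # re-draw that subset at each node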

@BlindApe commented Aug 30, 2019

@guolinke

Perhaps I don't understand the relation between the two parameters.
Does feature_fraction_bynode choose whether the sampling is done at the tree level or the node level?

I've tested with feature_fraction 0.2 and 0.4: the results are the same when feature_fraction_bynode = TRUE, but different when feature_fraction_bynode = FALSE.

@guolinke (Collaborator, Author)

@BlindApe yes, feature_fraction_bynode is a flag that enables feature sub-sampling (subcol) per node.
feature_fraction is the ratio of features that are used.

I added a test case to check that different feature_fraction values give different results when subcol by node is enabled.
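
Roughly, the intended combination looks like this (an illustrative sketch, not code from the PR; the 0.5 value is arbitrary):

params <- list(objective = "regression",
               feature_fraction = 0.5,          # use 50% of the features at each sampling step
               feature_fraction_bynode = TRUE)  # draw that subset per node rather than once per tree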

@guolinke (Collaborator, Author)

@BlindApe the test passed, so the results should differ when different feature_fraction values are used.

@BlindApe commented Sep 2, 2019

OK, I understand now.
I've tested the functionality and it works.
Thank you!

rm(list = ls(all = TRUE))
require(lightgbm)
require(data.table)
data(agaricus.train, package = "lightgbm")
train <- agaricus.train
dtrain <- lgb.Dataset(train$data, label = train$label)
data(agaricus.test, package = "lightgbm")
test <- agaricus.test
dtest <- lgb.Dataset(test$data, label = test$label)
valids <- list(train = dtrain, test = dtest)
params <- list(objective = "binary", feature_fraction = 0.3, learning_rate = 0.05, feature_fraction_bynode = FALSE)
gbm1 <- lgb.train(params = params, data = dtrain, num_round = 5, valids = valids)
[LightGBM] [Warning] Starting from the 2.1.2 version, default value for the "boost_from_average" parameter in "binary" objective is true.
This may cause significantly different results comparing to the previous versions of LightGBM.
Try to set boost_from_average=false, if your old models produce bad results
[LightGBM] [Info] Number of positive: 3140, number of negative: 3373
[LightGBM] [Info] Total Bins 342
[LightGBM] [Info] Number of data: 6513, number of used features: 107
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.482113 -> initscore=-0.071580
[LightGBM] [Info] Start training from score -0.071580
[1]: train's binary_logloss:0.646381 test's binary_logloss:0.646845
[2]: train's binary_logloss:0.602521 test's binary_logloss:0.603166
[3]: train's binary_logloss:0.563393 test's binary_logloss:0.564101
[4]: train's binary_logloss:0.526795 test's binary_logloss:0.527475
[5]: train's binary_logloss:0.493253 test's binary_logloss:0.493936

params <- list(objective = "binary", feature_fraction = 0.3, learning_rate = 0.05, feature_fraction_bynode = TRUE)
gbm2 <- lgb.train(params = params, data = dtrain, num_round = 5, valids = valids)
[LightGBM] [Warning] Starting from the 2.1.2 version, default value for the "boost_from_average" parameter in "binary" objective is true.
This may cause significantly different results comparing to the previous versions of LightGBM.
Try to set boost_from_average=false, if your old models produce bad results
[LightGBM] [Info] Number of positive: 3140, number of negative: 3373
[LightGBM] [Info] Total Bins 342
[LightGBM] [Info] Number of data: 6513, number of used features: 107
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.482113 -> initscore=-0.071580
[LightGBM] [Info] Start training from score -0.071580
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[1]: train's binary_logloss:0.64433 test's binary_logloss:0.644204
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[2]: train's binary_logloss:0.600371 test's binary_logloss:0.600178
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[3]: train's binary_logloss:0.560376 test's binary_logloss:0.560181
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[4]: train's binary_logloss:0.523975 test's binary_logloss:0.523874
[5]: train's binary_logloss:0.490471 test's binary_logloss:0.490451

guolinke merged commit bbbad73 into master on Sep 3, 2019.
StrikerRUS deleted the subcol-node branch on September 3, 2019.