
sub-features for node level #2330

Merged: 10 commits merged into master from subcol-node on Sep 3, 2019
Conversation

guolinke (Collaborator)

Implements #2315.

@StrikerRUS (Collaborator) left a comment:


Please check a few comments about the parameter descriptions.

Review comments on include/LightGBM/config.h (outdated, resolved)
@guolinke (Collaborator, Author)

@BlindApe could you try this PR and provide some feedback?

@BlindApe commented Aug 30, 2019

Thank you @guolinke for this feature, and sorry for the delay. I was on holidays and disconnected.
I've tried to use this, but it seems I'm doing something wrong. The results are the same either way:

rm(list = ls(all = TRUE))
require(lightgbm)
require(data.table)
data(agaricus.train, package = "lightgbm")
train <- agaricus.train
dtrain <- lgb.Dataset(train$data, label = train$label)
data(agaricus.test, package = "lightgbm")
test <- agaricus.test
dtest <- lgb.Dataset.create.valid(dtrain, test$data, label = test$label)
params <- list(objective = "regression", metric = "l2")
valids <- list(test = dtest)
model <- lgb.train(params, dtrain, 5, valids, min_data = 1, learning_rate = 0.05, feature_fraction_bynode = FALSE)
[LightGBM] [Info] Total Bins 369
[LightGBM] [Info] Number of data: 6513, number of used features: 116
[LightGBM] [Info] Start training from score 0.482113
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[1]: test's l2:0.225323
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[2]: test's l2:0.203354
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[3]: test's l2:0.183527
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[4]: test's l2:0.165633
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[5]: test's l2:0.149484

model <- lgb.train(params, dtrain, 5, valids, min_data = 1, learning_rate = 0.05, feature_fraction_bynode = TRUE)
[LightGBM] [Info] Total Bins 369
[LightGBM] [Info] Number of data: 6513, number of used features: 116
[LightGBM] [Info] Start training from score 0.482113
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[1]: test's l2:0.225323
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[2]: test's l2:0.203354
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[3]: test's l2:0.183527
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[4]: test's l2:0.165633
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[5]: test's l2:0.149484

@guolinke (Collaborator, Author) commented Aug 30, 2019

@BlindApe it seems you forgot to set feature_fraction.

Refer to the test here:
https://github.com/microsoft/LightGBM/pull/2330/files#diff-904f28e01be14f69c39db2a8ba2474a9R1594-R1614
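
For example, a minimal sketch (not the test linked above) reusing the params, dtrain, and valids objects from the session above; node-level sampling only has something to sample once feature_fraction is below 1.0:

# illustrative values only; params, dtrain, and valids are defined earlier in this thread
model <- lgb.train(params, dtrain, 5, valids,
                   min_data = 1, learning_rate = 0.05,
                   feature_fraction = 0.5,          # sample 50% of the features
                   feature_fraction_bynode = TRUE)  # re-draw that subset at each node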

@BlindApe commented Aug 30, 2019

@guolinke

Perhaps I don't understand the relation between the two parameters.
Does feature_fraction_bynode choose whether the sampling is done at the tree level or the node level?

I've tested with feature_fraction 0.2 and 0.4: the results are the same when feature_fraction_bynode = TRUE, but different when feature_fraction_bynode = FALSE.

@guolinke (Collaborator, Author)

@BlindApe yes, feature_fraction_bynode is a flag that enables feature sub-sampling (subcol) per node.
feature_fraction is the ratio of features that are used.

I added a test case to check that different feature_fraction values give different results when subcol by node is enabled.
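
Roughly, the intended combination looks like this (an illustrative sketch, not code from the PR; the 0.5 value is arbitrary):

params <- list(objective = "regression",
               feature_fraction = 0.5,          # use 50% of the features at each sampling step
               feature_fraction_bynode = TRUE)  # draw that subset per node rather than once per tree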

@guolinke (Collaborator, Author)

@BlindApe the test passed, so the results should differ when different feature_fraction values are used.

@BlindApe commented Sep 2, 2019

OK, I understand now.
I've tested the functionality and it works.
Thank you!

rm(list = ls(all = TRUE))
require(lightgbm)
require(data.table)
data(agaricus.train, package = "lightgbm")
train <- agaricus.train
dtrain <- lgb.Dataset(train$data, label = train$label)
data(agaricus.test, package = "lightgbm")
test <- agaricus.test
dtest <- lgb.Dataset(test$data, label = test$label)
valids <- list(train = dtrain, test = dtest)
params <- list(objective = "binary", feature_fraction = 0.3, learning_rate = 0.05, feature_fraction_bynode = FALSE)
gbm1 <- lgb.train(params = params, data = dtrain, num_round = 5, valids = valids)
[LightGBM] [Warning] Starting from the 2.1.2 version, default value for the "boost_from_average" parameter in "binary" objective is true.
This may cause significantly different results comparing to the previous versions of LightGBM.
Try to set boost_from_average=false, if your old models produce bad results
[LightGBM] [Info] Number of positive: 3140, number of negative: 3373
[LightGBM] [Info] Total Bins 342
[LightGBM] [Info] Number of data: 6513, number of used features: 107
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.482113 -> initscore=-0.071580
[LightGBM] [Info] Start training from score -0.071580
[1]: train's binary_logloss:0.646381 test's binary_logloss:0.646845
[2]: train's binary_logloss:0.602521 test's binary_logloss:0.603166
[3]: train's binary_logloss:0.563393 test's binary_logloss:0.564101
[4]: train's binary_logloss:0.526795 test's binary_logloss:0.527475
[5]: train's binary_logloss:0.493253 test's binary_logloss:0.493936

params <- list(objective = "binary", feature_fraction = 0.3, learning_rate = 0.05, feature_fraction_bynode = TRUE)
gbm2 <- lgb.train(params = params, data = dtrain, num_round = 5, valids = valids)
[LightGBM] [Warning] Starting from the 2.1.2 version, default value for the "boost_from_average" parameter in "binary" objective is true.
This may cause significantly different results comparing to the previous versions of LightGBM.
Try to set boost_from_average=false, if your old models produce bad results
[LightGBM] [Info] Number of positive: 3140, number of negative: 3373
[LightGBM] [Info] Total Bins 342
[LightGBM] [Info] Number of data: 6513, number of used features: 107
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.482113 -> initscore=-0.071580
[LightGBM] [Info] Start training from score -0.071580
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[1]: train's binary_logloss:0.64433 test's binary_logloss:0.644204
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[2]: train's binary_logloss:0.600371 test's binary_logloss:0.600178
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[3]: train's binary_logloss:0.560376 test's binary_logloss:0.560181
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[4]: train's binary_logloss:0.523975 test's binary_logloss:0.523874
[5]: train's binary_logloss:0.490471 test's binary_logloss:0.490451

guolinke merged commit bbbad73 into master on Sep 3, 2019.
StrikerRUS deleted the subcol-node branch on September 3, 2019.