
update feature_fraction_bynode #2381

Merged: 4 commits merged into master from node-ff on Sep 12, 2019

Conversation

@guolinke (Collaborator) commented Sep 5, 2019

I just noticed that feature_fraction has an alias, colsample_bytree. Therefore, reusing it for column sampling by node is not straightforward.
The following is the new definition of feature_fraction_bynode:

// alias = sub_feature_bynode, colsample_bynode
// check = >0.0
// check = <=1.0
// desc = LightGBM will randomly select part of features on each tree node if feature_fraction smaller than 1.0. For example, if you set it to 0.8, LightGBM will select 80% of features at each tree node.
// desc = can be used to deal with over-fitting
// desc = Note: unlike feature_fraction, this cannot speed up training
// desc = Note: if both feature_fraction and feature_fraction_bynode are smaller than 1.0, the final fraction of each node is feature_fraction * feature_fraction_bynode
double feature_fraction_bynode = 1.0;
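
For illustration only (this snippet is not part of the PR diff), a minimal R sketch of how the new parameter could be set once this lands; the 0.8 values and the mtcars toy data are my own placeholders. Per the note above, the effective per-node fraction would be the product 0.8 * 0.8 = 0.64.

library(lightgbm)

# Toy data, purely for illustration.
X <- as.matrix(mtcars[, -1])
dtrain <- lgb.Dataset(X, label = mtcars$mpg)

params <- list(objective = "regression",
               feature_fraction = 0.8,         # column sampling per tree
               feature_fraction_bynode = 0.8)  # column sampling per tree node;
                                               # effective per-node fraction: 0.8 * 0.8 = 0.64

fit <- lgb.train(params, dtrain, nrounds = 10, verbose = 0)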

ping @BlindApe for the changes.

@guolinke (Collaborator, PR author) commented Sep 5, 2019

Now the behavior is the same as xgb's.

@StrikerRUS (Collaborator) left a review comment

As usual, minor style comments from me 😄

Two resolved review threads on include/LightGBM/config.h (outdated)
@StrikerRUS mentioned this pull request on Sep 10, 2019
@guolinke (Collaborator, PR author):

@StrikerRUS can this be merged?

@StrikerRUS (Collaborator):

@guolinke: "can this be merged?"

Yeah, I think so. I like the new way of setting feature_fraction_bynode more than the previous one and find it more intuitive.

@StrikerRUS merged commit ad8e8cc into master on Sep 12, 2019
@StrikerRUS deleted the node-ff branch on September 12, 2019 (12:53)
@mayer79 (Contributor) commented Sep 14, 2019

A great feature. Since per-node feature sampling is one of the core elements that make random forests shine, at least in high-dimensional settings, I played with the diamonds data set from ggplot2 in R and tried to predict log(price) from a handful of features.

The results are a bit unsettling:

  • a "real" RF implementation (ranger) with per-node feature sampling (mtry) -> test R-squared = 0.99

  • xgb in random forest mode, with and without colsample_bynode -> both R-squared 0.99

  • same for lgb -> without feature_fraction_bynode, R-squared is 0.99; with feature_fraction_bynode, only 0.91.

What could be the reason for this sudden drop in the lgb version? (Of course, we cannot compare the results directly due to the different parametrizations.)

library(tidyverse)
library(xgboost)
library(lightgbm)
library(ranger)

# Function to measure performance
perf <- function(y, pred) {
  res <- y - pred
  c(r2 = 1 - var(res) / var(y),
    rmse = sqrt(mean(res^2)),
    mae = mean(abs(res)))
}

#==============================
# DATA PREP
#==============================

diamonds <- diamonds %>% 
  mutate_if(is.ordered, as.numeric) %>% 
  mutate(log_price = log(price),
         log_carat = log(carat))

# Train/test split
set.seed(3928272)
.in <- sample(c(FALSE, TRUE), nrow(diamonds), replace = TRUE, prob = c(0.15, 0.85))

x <- c("log_carat", "cut", "color", "clarity", "depth", "table")
y <- "log_price"

train <- list(y = diamonds[[y]][.in], 
              X = as.matrix(diamonds[.in, x]))
test <- list(y = diamonds[[y]][!.in],
             X = as.matrix(diamonds[!.in, x]))
trainDF <- diamonds[.in, c(y, x)]
testDF <- diamonds[!.in, c(y, x)]

# For XGBoost
dtrain_xgb <- xgb.DMatrix(train$X, label = train$y)
watchlist <- list(train = dtrain_xgb)

# For lgb
dtrain_lgb <- lgb.Dataset(train$X, label = train$y)

#==============================
# MODELLING
#==============================
feature_fraction <- 1/3
mtry <- trunc(length(x) * feature_fraction)


#==============================
# A "real" rf
#==============================

system.time(fit_ranger <- ranger(reformulate(x, y), 
                                 data = trainDF, 
                                 num.trees = 100, 
                                 min.node.size = 5,
                                 mtry = mtry,
                                 seed = 837363)) # 1 sec
pred <- predict(fit_ranger, testDF)$predictions
perf(test$y, pred) # 0.989 R-squared

#==============================
# xgb without feature frac
#==============================

param_xgb <- list(max_depth = 20,
                  learning_rate = 1,
                  nthread = 4,
                  objective = "reg:squarederror",
                  eval_metric = "rmse",
                  subsample = 0.63,
            #      colsample_bynode = feature_fraction,
                  lambda = 0)

system.time(fit_xgb <- xgb.train(param_xgb,
                                 dtrain_xgb,
                                 watchlist = watchlist,
                                 nrounds = 1,
                                 num_parallel_tree = 100,
                                 verbose = 0)) # 5-8 sec
pred <- predict(fit_xgb, test$X)
perf(test$y, pred) # R-squared: 0.989 with feature frac, 0.989 without


#==============================
# lgb
#==============================

param_lgb <- list(max_depth = 20,
                  learning_rate = 1,
                  boosting = "rf",
                  nthread = 4,
                  min_data_in_leaf = 5,
                  num_leaves = 1000,
                  objective = "regression",
                  bagging_freq = 1,
           #       feature_fraction_bynode = feature_fraction,
                  bagging_fraction = 0.63,
                  metric = "rmse")

system.time(fit_lgb <- lgb.train(param_lgb,
                                 dtrain_lgb,
                                 nrounds = 100,
                                 verbose = 0)) # 2 sec
pred <- predict(fit_lgb, test$X)
perf(test$y, pred) # R-squared: 0.990 without feature frac, 0.913 with

@guolinke (Collaborator, PR author):

@mayer79 did you try different seeds?

@mayer79 (Contributor) commented Sep 14, 2019

Not yet. With this data set, even an R-squared of 0.98 would be quite bad, so a value of 0.91 is extreme.

@guolinke (Collaborator, PR author):

@mayer79 could you provide the data file? CSV format would be better.

@mayer79 (Contributor) commented Sep 14, 2019

It is shipped along with ggplot2 in R. The raw source is https://github.com/tidyverse/ggplot2/blob/master/data-raw/diamonds.csv

@guolinke (Collaborator, PR author):

@mayer79
I checked your data: there are only 6 features, so with a fraction of 0.33 only one feature is chosen at each node.
In that case tree growth stops easily, which causes the bad result.
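
To spell the arithmetic out (my own illustration; it assumes the per-node feature count is obtained by truncating num_features * fraction, which is what the reported behaviour suggests rather than anything confirmed here):

n_features <- 6
fraction   <- 0.33
trunc(n_features * fraction)  # = 1: a single random candidate feature per split,
                              # so the split variable is effectively chosen at random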

BTW, when I change xgb's column sample rate to 0.33 as well, its R-squared drops to roughly 0.87:

> param_xgb <- list(max_depth = 20,
+                   learning_rate = 1,
+                   nthread = 4,
+                   objective = "reg:squarederror",
+                   eval_metric = "rmse",
+                   subsample = 0.63,
+                   colsample_bynode = 0.33,
+                   lambda = 0)
>
> system.time(fit_xgb <- xgb.train(param_xgb,
+                                  dtrain_xgb,
+                                  watchlist = watchlist,
+                                  nrounds = 1,
+                                  num_parallel_tree = 100,
+                                  verbose = 0)) # 5-8 sec
   user  system elapsed
   2.88    0.26    1.66
> pred <- predict(fit_xgb, test$X)
> perf(test$y, pred) # R-squared: 0.989 with feature frac, 0.989 without
       r2      rmse       mae
0.8687868 0.3649375 0.2843431

@guolinke (Collaborator, PR author):

@mayer79
So I think a quick improvement would be to force at least 2 features to be chosen at each node. Otherwise it is like random learning, since the single feature used at each split is picked at random.

@mayer79 (Contributor) commented Sep 14, 2019

@guolinke: It indeed seems like a rounding issue. I was using the floating-point value 1/3 as the rate, which leads xgb to sample 6 * 1/3 = 2 features but lgb to sample only one. I would not force sampling of at least 2 features, but rather keep your implementation as it is.

@mayer79 (Contributor) commented Sep 15, 2019

Maybe rounding up the number of sampled columns would be an idea.
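
A small R sketch, purely to contrast the two remedies discussed above (forcing at least two candidate features vs. rounding up) for this 6-feature case; neither is claimed to be what LightGBM implements:

n_features <- 6
fraction   <- 0.33

trunc(n_features * fraction)           # 1: truncation, matching the observed behaviour
max(2L, trunc(n_features * fraction))  # 2: "force at least 2 features" idea
ceiling(n_features * fraction)         # 2: "round up" idea

Rounding up would also guarantee that at least one feature is always selected, even for very small fractions.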

The lock bot locked this conversation as resolved and limited it to collaborators on Mar 10, 2020