update feature_fraction_bynode #2381

Merged: 4 commits, Sep 12, 2019
18 changes: 10 additions & 8 deletions docs/Parameters.rst
@@ -254,22 +254,24 @@ Learning Control Parameters

   - random seed for bagging

-- ``feature_fraction_bynode`` :raw-html:`<a id="feature_fraction_bynode" title="Permalink to this parameter" href="#feature_fraction_bynode">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool, aliases: ``sub_feature_bynode``, ``colsample_bytree_bynode``
-
-  - set this to ``true`` to randomly select part of features for each node
-
-  - set this to ``false`` to randomly select part of features for each tree (use the same sub features for each tree)
-
-  - **Note**: set this to ``true`` cannot speed up the training, but set this to ``false`` can speed up the training linearly
-
 - ``feature_fraction`` :raw-html:`<a id="feature_fraction" title="Permalink to this parameter" href="#feature_fraction">&#x1F517;&#xFE0E;</a>`, default = ``1.0``, type = double, aliases: ``sub_feature``, ``colsample_bytree``, constraints: ``0.0 < feature_fraction <= 1.0``

-  - LightGBM will randomly select part of features on each iteration if ``feature_fraction`` smaller than ``1.0``. For example, if you set it to ``0.8``, LightGBM will select 80% of features before training each tree
+  - LightGBM will randomly select part of features on each iteration (tree) if ``feature_fraction`` smaller than ``1.0``. For example, if you set it to ``0.8``, LightGBM will select 80% of features before training each tree

   - can be used to speed up training

   - can be used to deal with over-fitting

+- ``feature_fraction_bynode`` :raw-html:`<a id="feature_fraction_bynode" title="Permalink to this parameter" href="#feature_fraction_bynode">&#x1F517;&#xFE0E;</a>`, default = ``1.0``, type = double, aliases: ``sub_feature_bynode``, ``colsample_bynode``, constraints: ``0.0 < feature_fraction_bynode <= 1.0``
+
+  - LightGBM will randomly select part of features on each tree node if ``feature_fraction_bynode`` smaller than ``1.0``. For example, if you set it to ``0.8``, LightGBM will select 80% of features at each tree node.
+
+  - can be used to deal with over-fitting
+
+  - **Note**: unlike ``feature_fraction``, this cannot speed up training
+
+  - **Note**: if both ``feature_fraction`` and ``feature_fraction_bynode`` are smaller than ``1.0``, the final fraction of each node is ``feature_fraction * feature_fraction_bynode``
+
 - ``feature_fraction_seed`` :raw-html:`<a id="feature_fraction_seed" title="Permalink to this parameter" href="#feature_fraction_seed">&#x1F517;&#xFE0E;</a>`, default = ``2``, type = int

   - random seed for ``feature_fraction``
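For readers of the updated docs above, a minimal usage sketch from the Python package (the dataset and values here are illustrative, not from the PR): with ``feature_fraction = 0.8`` and ``feature_fraction_bynode = 0.8``, each split effectively considers 0.8 * 0.8 = 64% of the features.

    import numpy as np
    import lightgbm as lgb

    # toy binary-classification data (illustrative only)
    X = np.random.rand(500, 10)
    y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

    params = {
        'objective': 'binary',
        'feature_fraction': 0.8,         # sampled once per tree
        'feature_fraction_bynode': 0.8,  # re-sampled at every node, from the tree-level subset
        'verbose': -1,
    }
    # effective per-node fraction: 0.8 * 0.8 = 0.64
    booster = lgb.train(params, lgb.Dataset(X, y), num_boost_round=10)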
17 changes: 10 additions & 7 deletions include/LightGBM/config.h
@@ -268,20 +268,23 @@ struct Config {
   // desc = random seed for bagging
   int bagging_seed = 3;

-  // alias = sub_feature_bynode, colsample_bytree_bynode
-  // desc = set this to ``true`` to randomly select part of features for each node
-  // desc = set this to ``false`` to randomly select part of features for each tree (use the same sub features for each tree)
-  // desc = **Note**: set this to ``true`` cannot speed up the training, but set this to ``false`` can speed up the training linearly
-  bool feature_fraction_bynode = false;
-
   // alias = sub_feature, colsample_bytree
   // check = >0.0
   // check = <=1.0
-  // desc = LightGBM will randomly select part of features on each iteration if ``feature_fraction`` smaller than ``1.0``. For example, if you set it to ``0.8``, LightGBM will select 80% of features before training each tree
+  // desc = LightGBM will randomly select part of features on each iteration (tree) if ``feature_fraction`` smaller than ``1.0``. For example, if you set it to ``0.8``, LightGBM will select 80% of features before training each tree
   // desc = can be used to speed up training
   // desc = can be used to deal with over-fitting
   double feature_fraction = 1.0;

+  // alias = sub_feature_bynode, colsample_bynode
+  // check = >0.0
+  // check = <=1.0
+  // desc = LightGBM will randomly select part of features on each tree node if ``feature_fraction_bynode`` smaller than ``1.0``. For example, if you set it to ``0.8``, LightGBM will select 80% of features at each tree node.
+  // desc = can be used to deal with over-fitting
+  // desc = **Note**: unlike ``feature_fraction``, this cannot speed up training
+  // desc = **Note**: if both ``feature_fraction`` and ``feature_fraction_bynode`` are smaller than ``1.0``, the final fraction of each node is ``feature_fraction * feature_fraction_bynode``
+  double feature_fraction_bynode = 1.0;
+
   // desc = random seed for ``feature_fraction``
   int feature_fraction_seed = 2;
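Because ``feature_fraction_bynode`` is now a constrained double instead of a bool, the ``check`` lines above reject out-of-range values at parameter-parsing time. A hedged sketch of how that surfaces in Python (assuming, as with other failed checks, it is raised as ``lightgbm.basic.LightGBMError``):

    import numpy as np
    import lightgbm as lgb

    data = lgb.Dataset(np.random.rand(100, 5), np.random.randint(0, 2, 100))
    try:
        # 0.0 violates the ``> 0.0`` check on feature_fraction_bynode
        lgb.train({'feature_fraction_bynode': 0.0, 'verbose': -1}, data, num_boost_round=1)
    except lgb.basic.LightGBMError as err:
        print('rejected:', err)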
14 changes: 8 additions & 6 deletions src/io/config_auto.cpp
@@ -66,10 +66,10 @@ std::unordered_map<std::string, std::string> Config::alias_table({
   {"neg_bagging", "neg_bagging_fraction"},
   {"subsample_freq", "bagging_freq"},
   {"bagging_fraction_seed", "bagging_seed"},
-  {"sub_feature_bynode", "feature_fraction_bynode"},
-  {"colsample_bytree_bynode", "feature_fraction_bynode"},
   {"sub_feature", "feature_fraction"},
   {"colsample_bytree", "feature_fraction"},
+  {"sub_feature_bynode", "feature_fraction_bynode"},
+  {"colsample_bynode", "feature_fraction_bynode"},
   {"early_stopping_rounds", "early_stopping_round"},
   {"early_stopping", "early_stopping_round"},
   {"max_tree_output", "max_delta_step"},
@@ -188,8 +188,8 @@ std::unordered_set<std::string> Config::parameter_set({
   "neg_bagging_fraction",
   "bagging_freq",
   "bagging_seed",
-  "feature_fraction_bynode",
   "feature_fraction",
+  "feature_fraction_bynode",
   "feature_fraction_seed",
   "early_stopping_round",
   "first_metric_only",
@@ -327,12 +327,14 @@ void Config::GetMembersFromString(const std::unordered_map<std::string, std::str

   GetInt(params, "bagging_seed", &bagging_seed);

-  GetBool(params, "feature_fraction_bynode", &feature_fraction_bynode);
-
   GetDouble(params, "feature_fraction", &feature_fraction);
   CHECK(feature_fraction >0.0);
   CHECK(feature_fraction <=1.0);

+  GetDouble(params, "feature_fraction_bynode", &feature_fraction_bynode);
+  CHECK(feature_fraction_bynode >0.0);
+  CHECK(feature_fraction_bynode <=1.0);
+
   GetInt(params, "feature_fraction_seed", &feature_fraction_seed);

   GetInt(params, "early_stopping_round", &early_stopping_round);
@@ -591,8 +593,8 @@ std::string Config::SaveMembersToString() const {
   str_buf << "[neg_bagging_fraction: " << neg_bagging_fraction << "]\n";
   str_buf << "[bagging_freq: " << bagging_freq << "]\n";
   str_buf << "[bagging_seed: " << bagging_seed << "]\n";
-  str_buf << "[feature_fraction_bynode: " << feature_fraction_bynode << "]\n";
   str_buf << "[feature_fraction: " << feature_fraction << "]\n";
+  str_buf << "[feature_fraction_bynode: " << feature_fraction_bynode << "]\n";
   str_buf << "[feature_fraction_seed: " << feature_fraction_seed << "]\n";
   str_buf << "[early_stopping_round: " << early_stopping_round << "]\n";
   str_buf << "[first_metric_only: " << first_metric_only << "]\n";
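Note the alias rename in ``alias_table``: ``colsample_bytree_bynode`` is replaced by ``colsample_bynode``, matching XGBoost's parameter name. Under the post-PR table, these two parameter dicts resolve to the same canonical setting:

    # equivalent ways to request 80% of features per node
    params_canonical = {'feature_fraction_bynode': 0.8}
    params_xgb_style = {'colsample_bynode': 0.8}  # resolved via Config::alias_table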
6 changes: 3 additions & 3 deletions src/treelearner/data_parallel_tree_learner.cpp
@@ -169,9 +169,9 @@ void DataParallelTreeLearner<TREELEARNER_T>::FindBestSplitsFromHistograms(const
   std::vector<SplitInfo> larger_bests_per_thread(this->num_threads_, SplitInfo());
   std::vector<int8_t> smaller_node_used_features(this->num_features_, 1);
   std::vector<int8_t> larger_node_used_features(this->num_features_, 1);
-  if (this->config_->feature_fraction_bynode) {
-    smaller_node_used_features = this->GetUsedFeatures();
-    larger_node_used_features = this->GetUsedFeatures();
+  if (this->config_->feature_fraction_bynode < 1.0f) {
+    smaller_node_used_features = this->GetUsedFeatures(false);
+    larger_node_used_features = this->GetUsedFeatures(false);
   }
   OMP_INIT_EX();
   #pragma omp parallel for schedule(static)
77 changes: 52 additions & 25 deletions src/treelearner/serial_tree_learner.cpp
@@ -104,8 +104,8 @@ void SerialTreeLearner::Init(const Dataset* train_data, bool is_constant_hessian
     }
   }
   Log::Info("Number of data: %d, number of used features: %d", num_data_, num_features_);
-  feature_used.clear();
-  feature_used.resize(train_data->num_features());
+  is_feature_used_in_split_.clear();
+  is_feature_used_in_split_.resize(train_data->num_features());

   if (!config_->cegb_penalty_feature_coupled.empty()) {
     CHECK(config_->cegb_penalty_feature_coupled.size() == static_cast<size_t>(train_data_->num_total_features()));
@@ -268,24 +268,51 @@ Tree* SerialTreeLearner::FitByExistingTree(const Tree* old_tree, const std::vect
   return FitByExistingTree(old_tree, gradients, hessians);
 }

-std::vector<int8_t> SerialTreeLearner::GetUsedFeatures() {
+std::vector<int8_t> SerialTreeLearner::GetUsedFeatures(bool is_tree_level) {
   std::vector<int8_t> ret(num_features_, 1);
-  if (config_->feature_fraction >= 1.0f) {
+  if (config_->feature_fraction >= 1.0f && is_tree_level) {
     return ret;
   }
+  if (config_->feature_fraction_bynode >= 1.0f && !is_tree_level) {
+    return ret;
+  }
-  int used_feature_cnt = static_cast<int>(valid_feature_indices_.size()*config_->feature_fraction);
-  // at least use one feature
-  used_feature_cnt = std::max(used_feature_cnt, 1);
   // initialize used features
   std::memset(ret.data(), 0, sizeof(int8_t) * num_features_);
-  auto sampled_indices = random_.Sample(static_cast<int>(valid_feature_indices_.size()), used_feature_cnt);
-  int omp_loop_size = static_cast<int>(sampled_indices.size());
-  #pragma omp parallel for schedule(static, 512) if (omp_loop_size >= 1024)
-  for (int i = 0; i < omp_loop_size; ++i) {
-    int used_feature = valid_feature_indices_[sampled_indices[i]];
-    int inner_feature_index = train_data_->InnerFeatureIndex(used_feature);
-    CHECK(inner_feature_index >= 0);
-    ret[inner_feature_index] = 1;
+  if (is_tree_level) {
+    int used_feature_cnt = static_cast<int>(valid_feature_indices_.size() * config_->feature_fraction);
+    used_feature_cnt = std::max(used_feature_cnt, 1);
+    used_feature_indices_ = random_.Sample(static_cast<int>(valid_feature_indices_.size()), used_feature_cnt);
+    int omp_loop_size = static_cast<int>(used_feature_indices_.size());
+    #pragma omp parallel for schedule(static, 512) if (omp_loop_size >= 1024)
+    for (int i = 0; i < omp_loop_size; ++i) {
+      int used_feature = valid_feature_indices_[used_feature_indices_[i]];
+      int inner_feature_index = train_data_->InnerFeatureIndex(used_feature);
+      CHECK(inner_feature_index >= 0);
+      ret[inner_feature_index] = 1;
+    }
+  } else if (used_feature_indices_.size() <= 0) {
+    int used_feature_cnt = static_cast<int>(valid_feature_indices_.size() * config_->feature_fraction_bynode);
+    used_feature_cnt = std::max(used_feature_cnt, 1);
+    auto sampled_indices = random_.Sample(static_cast<int>(valid_feature_indices_.size()), used_feature_cnt);
+    int omp_loop_size = static_cast<int>(sampled_indices.size());
+    #pragma omp parallel for schedule(static, 512) if (omp_loop_size >= 1024)
+    for (int i = 0; i < omp_loop_size; ++i) {
+      int used_feature = valid_feature_indices_[sampled_indices[i]];
+      int inner_feature_index = train_data_->InnerFeatureIndex(used_feature);
+      CHECK(inner_feature_index >= 0);
+      ret[inner_feature_index] = 1;
+    }
+  } else {
+    int used_feature_cnt = static_cast<int>(used_feature_indices_.size() * config_->feature_fraction_bynode);
+    used_feature_cnt = std::max(used_feature_cnt, 1);
+    auto sampled_indices = random_.Sample(static_cast<int>(used_feature_indices_.size()), used_feature_cnt);
+    int omp_loop_size = static_cast<int>(sampled_indices.size());
+    #pragma omp parallel for schedule(static, 512) if (omp_loop_size >= 1024)
+    for (int i = 0; i < omp_loop_size; ++i) {
+      int used_feature = valid_feature_indices_[used_feature_indices_[sampled_indices[i]]];
+      int inner_feature_index = train_data_->InnerFeatureIndex(used_feature);
+      CHECK(inner_feature_index >= 0);
+      ret[inner_feature_index] = 1;
+    }
   }
   return ret;
 }
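The three branches of the new ``GetUsedFeatures(bool is_tree_level)`` boil down to: (1) a tree-level call samples from all valid features and caches the drawn indices in ``used_feature_indices_``; (2) a node-level call with no cached tree subset (``feature_fraction`` is 1.0) samples from all valid features; (3) a node-level call on top of a tree-level draw samples from the cached subset, which yields the documented effective fraction ``feature_fraction * feature_fraction_bynode``. A minimal Python sketch of the same cascade (names are illustrative, not LightGBM API):

    import random

    def get_used_features(valid_features, state, feature_fraction,
                          feature_fraction_bynode, is_tree_level):
        # Returns sampled feature ids; the C++ code returns a 0/1 mask instead.
        if is_tree_level:
            # Per-tree draw; cache it so per-node draws can reuse the subset.
            cnt = max(int(len(valid_features) * feature_fraction), 1)
            state['tree_subset'] = random.sample(valid_features, cnt)
            return state['tree_subset']
        if not state.get('tree_subset'):
            # Per-node draw with no tree-level sampling: draw from all features.
            cnt = max(int(len(valid_features) * feature_fraction_bynode), 1)
            return random.sample(valid_features, cnt)
        # Per-node draw on top of a per-tree draw: sample from the cached subset,
        # so the effective fraction is feature_fraction * feature_fraction_bynode.
        subset = state['tree_subset']
        cnt = max(int(len(subset) * feature_fraction_bynode), 1)
        return random.sample(subset, cnt)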
@@ -294,8 +321,8 @@ void SerialTreeLearner::BeforeTrain() {
   // reset histogram pool
   histogram_pool_.ResetMap();

-  if (config_->feature_fraction < 1 && !config_->feature_fraction_bynode) {
-    is_feature_used_ = GetUsedFeatures();
+  if (config_->feature_fraction < 1.0f) {
+    is_feature_used_ = GetUsedFeatures(true);
   } else {
     #pragma omp parallel for schedule(static, 512) if (num_features_ >= 1024)
     for (int i = 0; i < num_features_; ++i) {
@@ -523,9 +550,9 @@ void SerialTreeLearner::FindBestSplitsFromHistograms(const std::vector<int8_t>&
   std::vector<SplitInfo> larger_best(num_threads_);
   std::vector<int8_t> smaller_node_used_features(num_features_, 1);
   std::vector<int8_t> larger_node_used_features(num_features_, 1);
-  if (config_->feature_fraction_bynode) {
-    smaller_node_used_features = GetUsedFeatures();
-    larger_node_used_features = GetUsedFeatures();
+  if (config_->feature_fraction_bynode < 1.0f) {
+    smaller_node_used_features = GetUsedFeatures(false);
+    larger_node_used_features = GetUsedFeatures(false);
   }
   OMP_INIT_EX();
   // find splits
@@ -549,7 +576,7 @@
                                                  &smaller_split);
       smaller_split.feature = real_fidx;
       smaller_split.gain -= config_->cegb_tradeoff * config_->cegb_penalty_split * smaller_leaf_splits_->num_data_in_leaf();
-      if (!config_->cegb_penalty_feature_coupled.empty() && !feature_used[feature_index]) {
+      if (!config_->cegb_penalty_feature_coupled.empty() && !is_feature_used_in_split_[feature_index]) {
         smaller_split.gain -= config_->cegb_tradeoff * config_->cegb_penalty_feature_coupled[real_fidx];
       }
       if (!config_->cegb_penalty_feature_lazy.empty()) {
@@ -580,7 +607,7 @@ void SerialTreeLearner::FindBestSplitsFromHistograms(const std::vector<int8_t>&
                                                  &larger_split);
       larger_split.feature = real_fidx;
       larger_split.gain -= config_->cegb_tradeoff * config_->cegb_penalty_split * larger_leaf_splits_->num_data_in_leaf();
-      if (!config_->cegb_penalty_feature_coupled.empty() && !feature_used[feature_index]) {
+      if (!config_->cegb_penalty_feature_coupled.empty() && !is_feature_used_in_split_[feature_index]) {
         larger_split.gain -= config_->cegb_tradeoff * config_->cegb_penalty_feature_coupled[real_fidx];
       }
       if (!config_->cegb_penalty_feature_lazy.empty()) {
@@ -775,8 +802,8 @@ int32_t SerialTreeLearner::ForceSplits(Tree* tree, Json& forced_split_json, int*
 void SerialTreeLearner::Split(Tree* tree, int best_leaf, int* left_leaf, int* right_leaf) {
   const SplitInfo& best_split_info = best_split_per_leaf_[best_leaf];
   const int inner_feature_index = train_data_->InnerFeatureIndex(best_split_info.feature);
-  if (!config_->cegb_penalty_feature_coupled.empty() && !feature_used[inner_feature_index]) {
-    feature_used[inner_feature_index] = true;
+  if (!config_->cegb_penalty_feature_coupled.empty() && !is_feature_used_in_split_[inner_feature_index]) {
+    is_feature_used_in_split_[inner_feature_index] = true;
     for (int i = 0; i < tree->num_leaves(); ++i) {
       if (i == best_leaf) continue;
       auto split = &splits_per_leaf_[i*train_data_->num_features() + inner_feature_index];
6 changes: 4 additions & 2 deletions src/treelearner/serial_tree_learner.h
@@ -79,7 +79,7 @@ class SerialTreeLearner: public TreeLearner {

  protected:

-  virtual std::vector<int8_t> GetUsedFeatures();
+  virtual std::vector<int8_t> GetUsedFeatures(bool is_tree_level);
   /*!
   * \brief Some initial works before training
   */
@@ -135,6 +135,8 @@ class SerialTreeLearner: public TreeLearner {
   Random random_;
   /*! \brief used for sub feature training, is_feature_used_[i] = false means don't used feature i */
   std::vector<int8_t> is_feature_used_;
+  /*! \brief used feature indices in current tree */
+  std::vector<int> used_feature_indices_;
   /*! \brief pointer to histograms array of parent of current leaves */
   FeatureHistogram* parent_leaf_histogram_array_;
   /*! \brief pointer to histograms array of smaller leaf */
@@ -179,7 +181,7 @@ class SerialTreeLearner: public TreeLearner {
   std::vector<int> ordered_bin_indices_;
   bool is_constant_hessian_;

-  std::vector<bool> feature_used;
+  std::vector<bool> is_feature_used_in_split_;
   std::vector<uint32_t> feature_used_in_data;
 };
6 changes: 3 additions & 3 deletions src/treelearner/voting_parallel_tree_learner.cpp
@@ -379,9 +379,9 @@ void VotingParallelTreeLearner<TREELEARNER_T>::FindBestSplitsFromHistograms(cons
   std::vector<SplitInfo> larger_best_per_thread(this->num_threads_);
   std::vector<int8_t> smaller_node_used_features(this->num_features_, 1);
   std::vector<int8_t> larger_node_used_features(this->num_features_, 1);
-  if (this->config_->feature_fraction_bynode) {
-    smaller_node_used_features = this->GetUsedFeatures();
-    larger_node_used_features = this->GetUsedFeatures();
+  if (this->config_->feature_fraction_bynode < 1.0f) {
+    smaller_node_used_features = this->GetUsedFeatures(false);
+    larger_node_used_features = this->GetUsedFeatures(false);
   }
   // find best split from local aggregated histograms
4 changes: 2 additions & 2 deletions tests/python_package_test/test_engine.py
@@ -1591,8 +1591,8 @@ def test_node_level_subcol(self):
         params = {
             'objective': 'binary',
             'metric': 'binary_logloss',
-            'feature_fraction': 0.8,
-            'feature_fraction_bynode': True,
+            'feature_fraction_bynode': 0.8,
+            'feature_fraction': 1.0,
             'verbose': -1
         }
         lgb_train = lgb.Dataset(X_train, y_train)