Add new task type: "save_binary". #3651
I've looked at this failed check, which is a compilation failure caused by missing
No, you can ignore that one, it's very unreliable. Sorry! I'll re-run manually.
Comment for some logging tweaks.
@@ -223,7 +224,7 @@ Dataset* DatasetLoader::LoadFromFile(const char* filename, int rank, int num_mac
       ConstructBinMappersFromTextData(rank, num_machines, sample_data, parser.get(), dataset.get());
       // initialize label
       dataset->metadata_.Init(dataset->num_data_, weight_idx_, group_idx_);
-      Log::Debug("Making second pass...");
+      Log::Info("Making second pass...");
This log helps confirm that the `two_round` parameter is in effect.
  auto t2 = std::chrono::high_resolution_clock::now();
  Log::Info("Construct bin mappers from text data time %.2f seconds",
            std::chrono::duration<double, std::milli>(t2 - t1) * 1e-3);
Just to add more timing information for the whole process.
@@ -27,7 +27,7 @@ namespace LightGBM {

 /*! \brief Types of tasks */
 enum TaskType {
-  kTrain, kPredict, kConvertModel, KRefitTree
+  kTrain, kPredict, kConvertModel, KRefitTree, kSaveBinary
@cyfdecyf Thanks for your contribution! Could you please resolve all conflicts and document the new enum option?
LightGBM/include/LightGBM/config.h
Lines 95 to 106 in ed651e8
  // [no-save]
  // [doc-only]
  // type = enum
  // default = train
  // options = train, predict, convert_model, refit
  // alias = task_type
  // desc = ``train``, for training, aliases: ``training``
  // desc = ``predict``, for prediction, aliases: ``prediction``, ``test``
  // desc = ``convert_model``, for converting model file into if-else format, see more information in `Convert Parameters <#convert-parameters>`__
  // desc = ``refit``, for refitting existing models with new data, aliases: ``refit_tree``
  // desc = **Note**: can be used only in CLI version; for language-specific packages you can use the correspondent functions
  TaskType task = TaskType::kTrain;
After adding a description of the `save_binary` task, run this script to regenerate the documentation: https://github.com/microsoft/LightGBM/blob/master/helpers/parameter_generator.py
Sorry for the late reply. I've resolved the conflicts and added documentation for the new enum option.
LGTM! Thanks for your contribution!
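For readers following the thread, the documented option strings and aliases map onto the enum roughly as sketched below. This is an illustration only: `ParseTask` is a hypothetical function, not LightGBM's actual config parser, and the alias list is taken from the `desc` lines quoted in this conversation.

```cpp
#include <stdexcept>
#include <string>

// Mirrors the enum from the diff in this PR (KRefitTree keeps its existing spelling).
enum TaskType { kTrain, kPredict, kConvertModel, KRefitTree, kSaveBinary };

// Hypothetical helper: maps a `task` value (or one of its documented aliases)
// to the corresponding enum member, and rejects anything else.
inline TaskType ParseTask(const std::string& value) {
  if (value == "train" || value == "training") return kTrain;
  if (value == "predict" || value == "prediction" || value == "test") return kPredict;
  if (value == "convert_model") return kConvertModel;
  if (value == "refit" || value == "refit_tree") return KRefitTree;
  if (value == "save_binary") return kSaveBinary;  // option added by this PR
  throw std::invalid_argument("Unknown task type: " + value);
}
```

Throwing on unknown values is a design choice of this sketch; the real parser may report errors differently.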
@@ -102,6 +102,7 @@ struct Config {
   // desc = ``predict``, for prediction, aliases: ``prediction``, ``test``
   // desc = ``convert_model``, for converting model file into if-else format, see more information in `Convert Parameters <#convert-parameters>`__
   // desc = ``refit``, for refitting existing models with new data, aliases: ``refit_tree``
+  // desc = ``save_binary``, load train (and validation) data then save dataset to binary file. Typical usage: ``save_binary`` first, then run multiple ``train`` tasks in parallel using the saved binary file
Many thanks for `Typical usage`! 👍
This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.
The main motivation for adding the `save_binary` task type is to save memory.
The Python function `Dataset.save_binary()` has two drawbacks:
[...] `two_round` parameter compared to the command line version to reduce memory usage
If you are interested in merging this PR, I'd also update the corresponding docs.