Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug fix in voting parallel learner #2154

Merged
merged 1 commit into from May 8, 2019

Conversation

y-lan
Copy link
Contributor

@y-lan y-lan commented May 8, 2019

In our environment, we met a bug that when process GlobalVoting function in voting parallel learner,
if the training data is very sparse and cause train_data_->num_total_features() to be different between workers, the local calculated top_k_splits (from MaxK function) can result in different order between workers, thus cause the upcoming ReduceScatter to hang permanently for incorrect send/recv data size.

Sort the calculated top_k_splits seems a quick solution for this.

@StrikerRUS StrikerRUS requested a review from guolinke May 8, 2019 10:59
@guolinke
Copy link
Collaborator

guolinke commented May 8, 2019

Thanks!

@guolinke guolinke merged commit 5d6513e into microsoft:master May 8, 2019
@StrikerRUS
Copy link
Collaborator

@guolinke Shouldn't stable_sort be used here? #1739

@guolinke
Copy link
Collaborator

guolinke commented May 9, 2019

thanks @StrikerRUS , stable_sort will be better, for there may exist some duplicated splits from multi-nodes.

@StrikerRUS
Copy link
Collaborator

@guolinke Oh, just noticed that 3 other files in the repo contain sort: https://github.com/microsoft/LightGBM/search?q=%22std%3A%3Asort%22&unscoped_q=%22std%3A%3Asort%22

@guolinke
Copy link
Collaborator

guolinke commented May 9, 2019

@StrikerRUS these sorts are okay, for:

  1. ParallelSort is only used for auc evaluation, it will not affect the model training.
  2. the rest of them sort the index, which will produce unique results.

@lock lock bot locked as resolved and limited conversation to collaborators Mar 11, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants