LightGBM model doesn't converge when SingleDatasetMode=True #1404
Comments
Found many warnings in the log:
I had a similar issue on a dataset with a few million rows and resolved it by setting chunkSize to a much greater value, e.g. 10,000 -> 100,000. In general, my observation is that chunkSize should not be less than dataset_size / (num_execs * cores_per_exec), assuming an equal number of rows in each partition. I also posted an issue on this topic: #1478
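The heuristic in the comment above can be sketched as a small helper. This is a minimal illustration, not part of SynapseML itself; the function name and the `safety_factor` knob are hypothetical.

```python
import math

def min_chunk_size(dataset_rows, num_executors, cores_per_executor, safety_factor=1.0):
    """Heuristic from the comment above: chunkSize should be at least
    dataset_rows / (num_executors * cores_per_executor), assuming rows
    are evenly distributed across partitions. safety_factor is a
    hypothetical knob to pad the estimate upward."""
    partitions = num_executors * cores_per_executor
    return math.ceil(dataset_rows / partitions * safety_factor)

# e.g. the 3.8M-row train set below on 10 executors with 4 cores each
print(min_chunk_size(3_800_000, 10, 4))  # → 95000
```

Under this rule of thumb, the default chunkSize would need to be raised whenever partitions hold more rows than one chunk.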
Closing, as this issue was likely related to the one above. You can try the build with the fix at: Maven Coordinates / Maven Resolver. The fix will be in the next release after the current 0.9.5 (I'm assuming 0.9.6).
Describe the bug
Hi, I tried to migrate from local Python lightgbm 3.2 to SynapseML LightGBM. It trained a model successfully, but the feature importances came out quite different.
To Reproduce
Train data: 3.8M rows × 758 columns
Eval data: 0.2M rows × 758 columns
Migrating to SynapseML LightGBM:
With the default setting (SingleDatasetMode=True), there is no obvious convergence of the eval metric l2, which stays around 0.021.
When setting SingleDatasetMode=False, we see the expected convergence from 0.021 to 0.017, and the feature importances also look reasonable.
I looked through the source code but found nothing that could explain this issue. Could you please give more clues?
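For reference, the workaround described above might look like the following in PySpark. This is a hedged sketch only: it assumes SynapseML is installed and that the setter names (`setUseSingleDatasetMode`, `setChunkSize`) match the installed version's API; the column names and chunk size are placeholders.

```python
# Assumption: synapseml is on the classpath/pip-installed and these
# setters exist in the version being used (check the SynapseML docs).
from synapse.ml.lightgbm import LightGBMRegressor

model = (
    LightGBMRegressor()
    .setLabelCol("label")          # placeholder column name
    .setFeaturesCol("features")    # placeholder column name
    .setUseSingleDatasetMode(False)  # workaround reported in this issue
    .setChunkSize(100_000)           # per the chunkSize comment above
)
# fitted = model.fit(train_df)
```

This is a configuration fragment, not a verified fix; per the maintainer's comment, the underlying bug is addressed in the release after 0.9.5.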
Info (please complete the following information):
SynapseML Version: 0.9.5
Spark Version: 3.2.1
Spark Platform: GCP Dataproc