[BUG] LightGBM | Required time to fit is too long #2226

Open
4 of 19 tasks
victor-cattani-beegol opened this issue May 14, 2024 · 0 comments

SynapseML version

1.0.4

System information

  • Language version (e.g. python 3.8, scala 2.12):
  • Spark Version (e.g. 3.5.0):
  • Spark Platform (e.g. Synapse, Databricks):

Describe the problem

Hello, folks!

I am having trouble training a LightGBM model. Training progresses up to a certain point and then stalls. No error is reported; the job simply stops making progress and stays in the same state indefinitely. As you can see in the code below, I have been using parameters such as numTasks, numThreads, numBatches, useSingleDatasetMode and useBarrierExecutionMode to try to improve fit performance.

My dataset has about 418 million rows for training and 18 million for validation. It has about 21 features: 10 are categorical and the rest are continuous.

Databricks cluster configuration:

--- Single node
--- 256 GB RAM | 32 cores

Do you have any idea why I am running into this issue?

Code to reproduce issue

from synapse.ml.lightgbm import LightGBMRegressor  # import assumed; not shown in the original snippet

# Tuned hyperparameters for the regressor
dic_params_reg_model_0 = {
    'learningRate': 0.10686341357711826,
    'featureFraction': 0.9064118023259887,
    'maxBin': 5,
    'minDataInLeaf': 6,
    'numIterations': 53,
    'numLeaves': 147,
    'lambdaL2': 45.405492626469716,
    'lambdaL1': 0.0015480184927416942,
}

# train_0 is the training DataFrame; 'validation_col' flags the validation rows
model_cluster_0 = LightGBMRegressor(
    metric='mae', earlyStoppingRound=1, labelCol='target',
    dataTransferMode='streaming', numTasks=32, numThreads=32,
    validationIndicatorCol='validation_col', numBatches=500,
    useSingleDatasetMode=True, useBarrierExecutionMode=True,
).setParams(**dic_params_reg_model_0).fit(train_0)

Other info / logs

Spark Configuration:

spark.master local[*, 8]
spark.databricks.cluster.profile singleNode
spark.driver.maxResultSize 150g
spark.jars.repositories https://mmlspark.azureedge.net/maven
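
For context, here is a rough back-of-envelope check of the settings above, written as plain Python with no SynapseML calls. It only restates the numbers reported in this issue. Two things are assumptions rather than confirmed behavior: that numBatches splits the training rows evenly, and that in the worst case every task runs its own numThreads native threads at the same time. The helper names are hypothetical and exist only for this sketch.

```python
# Back-of-envelope figures taken from the numbers reported in this issue.
# Plain arithmetic only; nothing here calls Spark or SynapseML.

TRAIN_ROWS = 418_000_000  # "about 418 million" training rows
NUM_BATCHES = 500         # numBatches passed to LightGBMRegressor
NUM_TASKS = 32            # numTasks
NUM_THREADS = 32          # numThreads
CORES = 32                # cores on the single node


def rows_per_batch(rows: int, batches: int) -> int:
    """Approximate rows per batch, ASSUMING an even split."""
    return rows // batches


def thread_oversubscription(tasks: int, threads: int, cores: int) -> float:
    """Requested threads divided by cores, ASSUMING the worst case where
    every task runs `threads` native threads concurrently."""
    return (tasks * threads) / cores


if __name__ == "__main__":
    print("rows per batch:", rows_per_batch(TRAIN_ROWS, NUM_BATCHES))
    print("threads per core (worst case):",
          thread_oversubscription(NUM_TASKS, NUM_THREADS, CORES))
```

If those assumptions hold, the run would involve 500 batches of roughly 836,000 rows each and could request many more threads than the node has cores. This is not a diagnosis, only a way to put the reported settings side by side.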

What component(s) does this bug affect?

  • area/cognitive: Cognitive project
  • area/core: Core project
  • area/deep-learning: DeepLearning project
  • area/lightgbm: Lightgbm project
  • area/opencv: Opencv project
  • area/vw: VW project
  • area/website: Website
  • area/build: Project build system
  • area/notebooks: Samples under notebooks folder
  • area/docker: Docker usage
  • area/models: models related issue

What language(s) does this bug affect?

  • language/scala: Scala source code
  • language/python: Pyspark APIs
  • language/r: R APIs
  • language/csharp: .NET APIs
  • language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • integrations/synapse: Azure Synapse integrations
  • integrations/azureml: Azure ML integrations
  • integrations/databricks: Databricks integrations