Collaborator

@SunsetWolf SunsetWolf commented Nov 12, 2025

Description

lgb.Dataset() never returns None or an empty object, but the data it wraps can be empty, which causes errors later during training. We therefore need to check whether the wrapped data is empty.

The num_data() method returns the number of rows (samples) in the dataset, so it can be used to detect empty data.

Calling num_data() on a Dataset that has not been constructed yet raises an error, so construct() must be called first.

By default, construct() frees the raw data, which then causes an error when lgb.train() runs. To avoid this, pass free_raw_data=False to lgb.Dataset().
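
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper names require_non_empty and build_train_set, the name argument, and the error message are assumptions made for the example. Only lgb.Dataset(), construct(), num_data(), and free_raw_data come from the description.

```python
def require_non_empty(dataset, name="dataset"):
    """Construct `dataset` and raise ValueError if it holds zero rows.

    `dataset` is expected to be a lightgbm.Dataset. The helper name and
    the error message are hypothetical, chosen for this sketch.
    """
    # num_data() raises if it is called before construct().
    dataset.construct()
    if dataset.num_data() == 0:
        raise ValueError(f"Empty data from {name}, please check your dataset config.")
    return dataset


def build_train_set(features, labels):
    """Build a lightgbm.Dataset that can be constructed early and still trained on."""
    # Imported here so the check above can be reused without importing lightgbm.
    import lightgbm as lgb

    # free_raw_data=False keeps the raw data available after construct().
    train_set = lgb.Dataset(features, label=labels, free_raw_data=False)
    return require_non_empty(train_set, name="train")
```

The Dataset returned by build_train_set could then be passed to lgb.train() as usual. Raising early gives a clearer message than a failure surfacing later inside training.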

Reference documentation for lgb.Dataset(): https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.Dataset.html#lightgbm-dataset

Motivation and Context

How Has This Been Tested?

  • Pass the test by running pytest qlib/tests/test_all_pipeline.py from the parent directory of qlib.
  • If you are adding a new feature, test on your own test scripts.

Screenshots of Test Results (if appropriate):

  1. Pipeline test:
  2. Your own tests:

Types of changes

  • Fix bugs
  • Add new feature
  • Update documentation

@SunsetWolf SunsetWolf merged commit 2b41782 into main Nov 13, 2025
97 of 103 checks passed
@SunsetWolf SunsetWolf deleted the fix/gbdt-finetune-dataset-tuple branch November 13, 2025 03:50
