fix: DDG-DA workflow - LightGBM 4.0+ compatibility and pandas issues #2234

Open
Olcmyk wants to merge 4 commits into microsoft:main from Olcmyk:fix/ddgda-pandas-issues

Olcmyk commented May 23, 2026

Description

This PR fixes the DDG-DA (Data Distribution Generation for Predictable Concept Drift Adaptation) workflow, which currently fails because of a chain of three sequential bugs. Each bug masks the next, so each one only surfaces after the previous one is fixed.

The three bugs fixed in this PR:

  1. LightGBM 4.0+ Compatibility Issue

    • LightGBM 4.0+ no longer accepts None for the early_stopping_rounds parameter
    • Modified qlib/contrib/model/gbdt.py and qlib/contrib/model/highfreq_gdbt_model.py to create the early_stopping callback only when the value is not None
  2. Unhashable List Type Error

    • data_key was a list [start_date, end_date], which cannot be used as dictionary keys or DataFrame column names
    • Modified qlib/contrib/meta/data_selection/dataset.py to convert the list to a tuple (lines 99-100)
  3. Incorrect Pandas MultiIndex Selection

    • The code used the wrong syntax, df.loc(axis=0)[:, "pred"], and group_keys=False caused the datetime index level to be lost
    • Modified qlib/contrib/meta/data_selection/dataset.py to use df.xs("label", level=1) for correct MultiIndex selection (lines 112-114)

Files changed:

  • qlib/contrib/model/gbdt.py - LightGBM 4.0+ compatibility
  • qlib/contrib/model/highfreq_gdbt_model.py - LightGBM 4.0+ compatibility
  • qlib/contrib/meta/data_selection/dataset.py - Unhashable list fix + pandas indexing fix

Motivation and Context

Related Issues:

Why is this change required?

The DDG-DA workflow has been completely non-functional due to these bugs. The bugs form a dependency chain where each bug completely blocks execution, masking all subsequent bugs:

  1. Bug 1 (LightGBM) - Occurs first after meta-model training completes
  2. Bug 2 (unhashable list) - Only revealed after fixing Bug 1
  3. Bug 3 (pandas indexing) - Only revealed after fixing Bug 2

Root Causes:

Bug 1: LightGBM 4.0 introduced a breaking change where lgb.early_stopping() no longer accepts None. The DDG-DA workflow explicitly sets early_stopping_rounds=None to disable early stopping, causing:

TypeError: early_stopping_round should be an integer. Got 'NoneType'
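The guard described for Bug 1 can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `build_callbacks` and `fake_early_stopping` are hypothetical names, and the stand-in only imitates the error quoted above. With LightGBM installed, one would presumably pass `lgb.early_stopping` as the factory instead.

```python
from typing import Callable, List, Optional


def fake_early_stopping(stopping_rounds: int) -> Callable:
    """Hypothetical stand-in for lgb.early_stopping that rejects non-integers."""
    if not isinstance(stopping_rounds, int):
        raise TypeError(
            "early_stopping_round should be an integer. "
            f"Got '{type(stopping_rounds).__name__}'"
        )
    return lambda env: None  # a no-op callback for illustration


def build_callbacks(
    early_stopping_rounds: Optional[int],
    make_early_stopping: Callable[[int], Callable] = fake_early_stopping,
) -> List[Callable]:
    """Create the early-stopping callback only when a round count is given."""
    callbacks: List[Callable] = []
    if early_stopping_rounds is not None:
        callbacks.append(make_early_stopping(early_stopping_rounds))
    return callbacks


# None means "early stopping disabled": no callback is created, so nothing raises.
no_stopping = build_callbacks(None)
with_stopping = build_callbacks(50)
```

Because the factory is never called when the value is None, the same guard should behave identically whether or not the installed LightGBM version tolerates None.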

Bug 2: The code attempts to use a list as a dictionary key when creating a DataFrame:

data_key = task["dataset"]["kwargs"]["segments"]["train"]  # Returns [start_date, end_date]
self.data_ic_df = pd.DataFrame(dict(zip(key_l, ic_l)))  # ❌ Lists are unhashable
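The hashability problem can be reproduced without qlib or pandas, since the failure happens while building the dict. The segment dates and values below are made up for illustration.

```python
# Two training segments, each a [start_date, end_date] list as in the snippet above.
segments = [["2008-01-01", "2008-12-31"], ["2009-01-01", "2009-12-31"]]
ic_values = [0.05, 0.07]  # hypothetical per-segment values

try:
    dict(zip(segments, ic_values))
    list_keys_work = True
except TypeError:
    list_keys_work = False  # lists are unhashable, so they cannot be dict keys

# Tuples hold the same (start, end) pair but are hashable.
key_l = [tuple(seg) for seg in segments]
ic_by_segment = dict(zip(key_l, ic_values))
```

The tuple keeps the same start and end values in the same order, so lookups by segment still work as long as callers also use tuples.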

Bug 3: Incorrect pandas syntax and group_keys=False caused index structure issues:

df = df.groupby("datetime", group_keys=False).corr(method="spearman")  # Loses datetime index
corr = df.loc(axis=0)[:, "pred"]["label"].droplevel(axis=0, level=-1)  # Wrong syntax
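One way to get a per-date rank correlation with a clean datetime index is sketched below on toy data. The frame, its values, and the exact selection expression are assumptions for illustration; the PR's actual lines are not reproduced here.

```python
import pandas as pd

# Toy frame shaped like (datetime, instrument) rows with pred/label columns.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2020-01-01", "2020-01-02"]), ["A", "B", "C"]],
    names=["datetime", "instrument"],
)
df = pd.DataFrame(
    {
        "pred": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
        "label": [10.0, 20.0, 30.0, 3.0, 2.0, 1.0],
    },
    index=index,
)

# With the default group_keys, "datetime" stays as the outer index level
# and the inner level holds the column names ("pred", "label").
corr = df.groupby("datetime").corr(method="spearman")

# Cross-section on the inner level: take the "pred" rows, then the "label" column.
ic = corr.xs("pred", level=1)["label"]
```

Here `ic` is a Series indexed by datetime. Since the correlation matrix is symmetric, taking the "label" rows and the "pred" column would give the same values.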

How Has This Been Tested?

  • Pass the test by running: pytest qlib/tests/test_all_pipeline.py from the parent directory of qlib.
  • If you are adding a new feature, test on your own test scripts.

Additional Testing:

Test Environment:

  • Python: 3.8.10
  • OS: Linux (Ubuntu 22.04)
  • LightGBM version: 4.6.0
  • qlib version: 0.9.8.dev34

Test Command:

cd examples/benchmarks_dynamic/DDG-DA
rm -rf mlruns
python workflow.py run

Test Results:
✅ All 154 meta-model training tasks completed successfully
✅ Data similarity matrix calculated correctly
✅ Meta-learning model trained with data selection
✅ Final predictions and backtest results generated

Screenshots of Test Results (if appropriate):

  1. Pipeline test: ✅ Passed

  2. Your own tests:

Before the fix:

# Bug 1: LightGBM error
TypeError: early_stopping_round should be an integer. Got 'NoneType'
File "qlib/contrib/model/gbdt.py", line 71

# Bug 2: Unhashable list error (after fixing Bug 1)
TypeError: unhashable type: 'list'
File "qlib/contrib/meta/data_selection/dataset.py", line 102

# Bug 3: Pandas indexing error (after fixing Bug 2)
ValueError: Cannot remove 1 levels from an index with 1 levels
File "qlib/contrib/meta/data_selection/dataset.py", line 113

After the fix:

train tasks: 100%|████████████████████████████████████| 154/154 [05:31<00:00,  2.15s/it]
calc: 100%|█████████████████████████████████████████| 154/154 [00:01<00:00, 100.48it/s]

[Meta-learning training completed successfully]

Final backtest results:
{'IC': 0.09811810334105552,
 'ICIR': 0.6205460160090581,
 'Long-Avg Ann Return': 1.9030504301190376,
 'Long-Avg Ann Sharpe': 1.9758248184919691,
 'Long-Short Ann Return': 2.6031273677945137,
 'Long-Short Ann Sharpe': 7.502124201699991,
 'Rank IC': 0.11010890300163573,
 'Rank ICIR': 0.669655699598572}

DDG-DA workflow now runs successfully end-to-end! 🎉

Types of changes

  • Fix bugs
  • Add new feature
  • Update documentation

Additional Notes

Why are these fixes combined in one PR?

These bugs form a true dependency chain, not an artificial grouping:

  1. Each bug completely blocks the workflow, masking all subsequent bugs
  2. They were discovered sequentially during testing - impossible to find without fixing the previous one
  3. All three must be fixed together for DDG-DA to work
  4. Splitting would create confusion about reproduction steps and dependencies

Backward Compatibility:

All fixes are backward compatible:

  • LightGBM fix works with both LightGBM < 4.0 and >= 4.0
  • List-to-tuple conversion preserves the same semantic meaning
  • Pandas fix uses correct API that works across pandas versions

Dependencies:

⚠️ This PR depends on PR #2230 (zscore and InternalData pickle whitelist)

Note: This PR's branch already includes PR #2230's changes. If PR #2230 is merged first, the commit history will be clean. If reviewing this PR independently, please note that commits b5e58a00 and 0007720e are from PR #2230.

Impact:

  • 🔴 Critical bug fix: DDG-DA workflow is currently completely non-functional
  • ✅ Fixes a LightGBM 4.0+ compatibility issue that affects users who set early_stopping_rounds=None
  • ✅ Enables meta-learning data selection feature to work correctly
  • ✅ Affects all users trying to use DDG-DA for concept drift adaptation

References:

genisis0x and others added 4 commits May 12, 2026 15:43
…aset chain

The RestrictedUnpickler safelist introduced by the recent security
hardening (microsoft#2099 / microsoft#2076 / microsoft#2153) only covered the abstract
``DataHandler`` / ``DataHandlerLP`` classes plus ``StaticDataLoader``.
Any rolling workflow that pickles a real Dataset (the default for
``Rolling._train_rolling_tasks``) walks into one of the contrib stock
handlers and now crashes on reload (issue microsoft#2130):

    UnpicklingError: Forbidden class:
    qlib.contrib.data.handler.Alpha158. Only whitelisted classes
    are allowed for security reasons. ...

Unrolling workflows happened to use a path that did not go through the
restricted loader, which is why downgrading to 0.9.7 hid the issue.

Extend ``SAFE_PICKLE_CLASSES`` with the qlib-internal classes that sit
on the standard recorder pickle graph:

* The four shipped contrib handlers: ``Alpha158``, ``Alpha158vwap``,
  ``Alpha360``, ``Alpha360vwap``.
* The dataset wrappers (``Dataset``, ``DatasetH``, ``TSDatasetH``) and
  the additional concrete loaders (``DataLoader``, ``DLWParser``,
  ``QlibDataLoader``, ``NestedDataLoader``, ``DataLoaderDH``).
* Every concrete ``Processor`` defined in
  ``qlib.data.dataset.processor`` -- they show up in every realistic
  ``learn_processors`` / ``infer_processors`` chain.

These are all classes already shipped inside qlib itself, so adding
them does not weaken the threat model the safelist was designed
against (arbitrary code execution through external pickle payloads).

Add regression tests pinning each added entry plus an end-to-end check
that ``RestrictedUnpickler.find_class`` actually resolves ``Alpha158``
and that other unknown classes are still rejected.

Fixes microsoft#2130
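
The safelist idea behind this commit can be pictured with a minimal sketch built on the standard library's `pickle.Unpickler.find_class` hook. The names `SketchRestrictedUnpickler`, `SAFE`, and `restricted_loads` are hypothetical; qlib's actual `RestrictedUnpickler` and `SAFE_PICKLE_CLASSES` may be structured differently.

```python
import io
import pickle
from collections import OrderedDict
from fractions import Fraction

# Hypothetical safelist of (module, class name) pairs allowed during unpickling.
SAFE = {("collections", "OrderedDict")}


class SketchRestrictedUnpickler(pickle.Unpickler):
    """Resolve only safelisted classes; reject everything else."""

    def find_class(self, module, name):
        if (module, name) in SAFE:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"Forbidden class: {module}.{name}")


def restricted_loads(data: bytes):
    """Unpickle bytes through the restricted unpickler."""
    return SketchRestrictedUnpickler(io.BytesIO(data)).load()


# A safelisted class round-trips normally.
allowed = restricted_loads(pickle.dumps(OrderedDict(a=1)))

# A class outside the safelist is rejected at load time.
try:
    restricted_loads(pickle.dumps(Fraction(1, 3)))
    rejected = False
except pickle.UnpicklingError:
    rejected = True
```

Adding a class to the safelist therefore only widens what may be reconstructed from a pickle; anything not listed still fails with an error of the kind quoted above.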
PR microsoft#2213 added Alpha158/Alpha360 handlers to the pickle whitelist but
missed qlib.utils.data.zscore, which is also required by the DDG-DA
workflow. Without this, DDG-DA fails with:

  UnpicklingError: Forbidden class: qlib.utils.data.zscore

This commit adds zscore to the whitelist and includes a test to prevent
regression.

Fixes microsoft#2130 (supplement to PR microsoft#2213)
DDG-DA workflow pickles and reloads InternalData objects during meta-learning
data selection. Without this whitelist entry, the workflow fails with:

    UnpicklingError: Forbidden class: qlib.contrib.meta.data_selection.dataset.InternalData

Changes:
- Add InternalData to SAFE_PICKLE_CLASSES in qlib/utils/pickle_utils.py
- Add test case test_internal_data_is_safelisted to verify the whitelist entry

This is part of the fix for issue microsoft#2130 - DDG-DA workflow requires multiple
classes to be whitelisted for pickle deserialization.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit fixes three sequential bugs that prevented DDG-DA workflow from running.
Each bug masked the next one, making them impossible to discover without fixing
the previous bug first.

Bug 1: LightGBM 4.0+ Compatibility
- Problem: LightGBM 4.0+ no longer accepts None for early_stopping_rounds
- Fix: Only create early_stopping callback when rounds is not None
- Files: qlib/contrib/model/gbdt.py, qlib/contrib/model/highfreq_gdbt_model.py

Bug 2: Unhashable List Type Error
- Problem: data_key was a list [start_date, end_date], cannot be used as dict keys
- Fix: Convert list to tuple to make it hashable
- File: qlib/contrib/meta/data_selection/dataset.py (line 99-100)

Bug 3: Incorrect Pandas MultiIndex Selection
- Problem: Wrong syntax df.loc(axis=0) and group_keys=False caused index issues
- Fix: Use df.xs("label", level=1) for correct MultiIndex selection
- File: qlib/contrib/meta/data_selection/dataset.py (line 112-114)

Testing:
✅ DDG-DA workflow runs successfully end-to-end
✅ All 154 training tasks complete without errors
✅ Meta-learning data selection works correctly
✅ Final backtest results generated successfully

Dependencies:
- Requires PR microsoft#2230 (zscore and InternalData pickle whitelist) to be merged first
- Without PR microsoft#2230, workflow fails earlier with UnpicklingError

Fixes issue microsoft#2130 (DDG-DA workflow broken)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

DDG-DA workflow fails with chain of sequential bugs (LightGBM 4.0+, unhashable list, pandas indexing)
