Fix/pickle whitelist zscore#2230

Open
Olcmyk wants to merge 3 commits into microsoft:main from Olcmyk:fix/pickle-whitelist-zscore


@Olcmyk Olcmyk commented May 23, 2026

Description

This PR supplements PR #2213 by adding qlib.utils.data.zscore to the pickle whitelist.

PR #2213 added the Alpha158 and Alpha360 handlers to fix issue #2130, but the whitelist is still incomplete. The DDG-DA workflow also requires zscore to be whitelisted; without it, the workflow fails immediately after the Alpha handlers are loaded.

Changes:

  • Add ("qlib.utils.data", "zscore") to SAFE_PICKLE_CLASSES in qlib/utils/pickle_utils.py
  • Add test case test_zscore_is_safelisted to prevent regression
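
For readers unfamiliar with the mechanism, the allow-list check can be sketched as below. This is a minimal, self-contained illustration and not the contents of qlib/utils/pickle_utils.py: the names SAFE_PICKLE_CLASSES and RestrictedUnpickler appear in this PR, but the set contents and the restricted_loads helper are assumptions, and collections.OrderedDict stands in for a whitelisted entry.

```python
import io
import pickle
from collections import OrderedDict

# Stand-in allow-list of (module, name) pairs. The real set in
# qlib/utils/pickle_utils.py is larger; this entry is illustrative only.
SAFE_PICKLE_CLASSES = {
    ("collections", "OrderedDict"),
}


class RestrictedUnpickler(pickle.Unpickler):
    """Resolve only globals that appear in SAFE_PICKLE_CLASSES."""

    def find_class(self, module, name):
        if (module, name) in SAFE_PICKLE_CLASSES:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"Forbidden class: {module}.{name}")


def restricted_loads(data):
    """Hypothetical helper: unpickle bytes through the restricted loader."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# A listed global is resolved as usual.
allowed = restricted_loads(pickle.dumps(OrderedDict(a=1)))

# An unlisted global (builtins.complex here) is rejected.
try:
    restricted_loads(pickle.dumps(complex))
    rejected = False
except pickle.UnpicklingError:
    rejected = True
```

Under this sketch, adding a tuple such as ("qlib.utils.data", "zscore") to the set is all it takes for that name to resolve.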

Motivation and Context

Fixes #2130 (supplement to PR #2213)

Problem: After applying PR #2213, the DDG-DA workflow still fails with:

UnpicklingError: Forbidden class: qlib.utils.data.zscore. 
Only whitelisted classes are allowed for security reasons.

Root cause: The zscore utility function is used in data processing and gets pickled with the dataset, but it was not included in PR #2213's whitelist additions.
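
The reason a plain function can trip a class whitelist is that pickle stores module-level functions by reference (module plus name) and resolves them through find_class on load, exactly as it does for classes. The sketch below records which globals a payload asks for; RecordingUnpickler is a hypothetical helper, statistics.mean stands in for qlib.utils.data.zscore, and the dictionary layout is an assumption rather than the DDG-DA dataset config. The recorder is unrestricted, so it should only be pointed at trusted pickles.

```python
import io
import pickle
import statistics


class RecordingUnpickler(pickle.Unpickler):
    """Resolve globals normally, but remember every (module, name) requested."""

    def __init__(self, file):
        super().__init__(file)
        self.requested = []

    def find_class(self, module, name):
        self.requested.append((module, name))
        return super().find_class(module, name)


# statistics.mean stands in for qlib.utils.data.zscore: a module-level
# function stored as a value inside an otherwise plain configuration dict.
payload = pickle.dumps({"transform": statistics.mean, "window": 20})

unpickler = RecordingUnpickler(io.BytesIO(payload))
restored = unpickler.load()

# The function was stored by reference, so it appears as a global lookup.
print(unpickler.requested)
```

Running a recorder like this over a trusted dataset pickle is one way to list every entry a whitelist would need before the restricted loader is applied.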

How to Reproduce the Issue (Before This Fix)

  1. Apply PR #2213 (fix(unpickler): allow Alpha158/Alpha360 handlers and the standard dataset chain)
  2. Run the DDG-DA example:
    cd examples/benchmarks_dynamic/DDG-DA
    rm -rf mlruns
    python workflow.py run

Expected error:

UnpicklingError: Forbidden class: qlib.utils.data.zscore
  File "qlib/contrib/rolling/ddgda.py", line 195, in _dump_data_for_proxy_model
    dataset = init_instance_by_config(task["dataset"])
  ...

How Has This Been Tested?

  • The new test passes when run with: python -m unittest tests.misc.test_pickle_safelist -v
  • Verified DDG-DA workflow proceeds past the zscore error after this fix

Testing environment:

  • Python: 3.8
  • OS: Linux (Ubuntu)

Test results:

test_zscore_is_safelisted ... ok

----------------------------------------------------------------------
Ran 10 tests in 0.412s

OK

Manual verification:
After applying this fix and reinstalling (pip install -e .), the DDG-DA workflow successfully loads the dataset and proceeds to the next stage (LightGBM training).

Types of changes

  • Fix bugs
  • Add new feature
  • Update documentation

Additional Notes

This is a minimal, focused fix that adds only the missing zscore entry to the whitelist extended by PR #2213. It follows the same pattern and includes a test case consistent with the existing test structure.

Relationship to PR #2213:

References

genisis0x and others added 2 commits May 12, 2026 15:43
…aset chain

The RestrictedUnpickler safelist introduced by the recent security
hardening (microsoft#2099 / microsoft#2076 / microsoft#2153) only covered the abstract
``DataHandler`` / ``DataHandlerLP`` classes plus ``StaticDataLoader``.
Any rolling workflow that pickles a real Dataset (the default for
``Rolling._train_rolling_tasks``) walks into one of the contrib stock
handlers and now crashes on reload (issue microsoft#2130):

    UnpicklingError: Forbidden class:
    qlib.contrib.data.handler.Alpha158. Only whitelisted classes
    are allowed for security reasons. ...

Non-rolling workflows happened to use a path that did not go through the
restricted loader, which is why downgrading to 0.9.7 hid the issue.

Extend ``SAFE_PICKLE_CLASSES`` with the qlib-internal classes that sit
on the standard recorder pickle graph:

* The four shipped contrib handlers: ``Alpha158``, ``Alpha158vwap``,
  ``Alpha360``, ``Alpha360vwap``.
* The dataset wrappers (``Dataset``, ``DatasetH``, ``TSDatasetH``) and
  the additional concrete loaders (``DataLoader``, ``DLWParser``,
  ``QlibDataLoader``, ``NestedDataLoader``, ``DataLoaderDH``).
* Every concrete ``Processor`` defined in
  ``qlib.data.dataset.processor`` -- they show up in every realistic
  ``learn_processors`` / ``infer_processors`` chain.

These are all classes already shipped inside qlib itself, so adding
them does not weaken the threat model the safelist was designed
against (arbitrary code execution through external pickle payloads).

Add regression tests pinning each added entry plus an end-to-end check
that ``RestrictedUnpickler.find_class`` actually resolves ``Alpha158``
and that other unknown classes are still rejected.

Fixes microsoft#2130
PR microsoft#2213 added Alpha158/Alpha360 handlers to the pickle whitelist but
missed qlib.utils.data.zscore, which is also required by the DDG-DA
workflow. Without this, DDG-DA fails with:

  UnpicklingError: Forbidden class: qlib.utils.data.zscore

This commit adds zscore to the whitelist and includes a test to prevent
regression.

Fixes microsoft#2130 (supplement to PR microsoft#2213)

Olcmyk commented May 23, 2026

@Olcmyk please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.

@microsoft-github-policy-service agree [company="{your company}"]

Options:

  • (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
@microsoft-github-policy-service agree
  • (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
@microsoft-github-policy-service agree company="Microsoft"

Contributor License Agreement

@microsoft-github-policy-service agree


Olcmyk commented May 23, 2026

@microsoft-github-policy-service agree

Olcmyk added a commit to Olcmyk/qlib that referenced this pull request May 23, 2026
…sues

This commit fixes three critical bugs that prevented DDG-DA workflow from running:

1. **Unhashable list type error in InternalData.setup()**
   - Problem: data_key was a list [start_date, end_date], which cannot be used
     as dictionary keys or DataFrame column names
   - Fix: Convert list to tuple to make it hashable (line 99-100)

2. **Incorrect pandas indexing in _calc_perf()**
   - Problem: Used wrong syntax df.loc(axis=0)[:, "pred"] and group_keys=False
     caused loss of datetime index, leading to droplevel error
   - Fix: Remove group_keys=False and use df.xs("label", level=1) to correctly
     select from MultiIndex (line 112-114)

3. **Missing InternalData in pickle whitelist**
   - Problem: InternalData class was not whitelisted, causing UnpicklingError
   - Fix: Add InternalData to SAFE_PICKLE_CLASSES (pickle_utils.py line 91)
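
The unhashable-key problem in item 1 can be reproduced in isolation. This is a toy sketch, not the code in InternalData.setup(): the variable names and date strings are assumptions chosen for illustration.

```python
start_date, end_date = "2020-01-01", "2020-12-31"

data_key = [start_date, end_date]  # a list is mutable and therefore unhashable
results = {}

try:
    results[data_key] = 0.1  # raises TypeError: unhashable type: 'list'
    list_key_failed = False
except TypeError:
    list_key_failed = True

# Converting to a tuple gives an immutable, hashable key with the same contents.
results[tuple(data_key)] = 0.1
```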

Changes:
- qlib/contrib/meta/data_selection/dataset.py:
  * Convert list to tuple for hashable dictionary keys
  * Fix _calc_perf to use correct pandas MultiIndex selection
- qlib/utils/pickle_utils.py:
  * Add InternalData to pickle whitelist
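
The MultiIndex selection mentioned for _calc_perf can be illustrated on toy data. This sketch only shows what df.xs("label", level=1) does in pandas; the frame layout, level names, and values are assumptions and do not reproduce the data that _calc_perf actually receives.

```python
import pandas as pd

# Toy frame indexed by (datetime, field); the real data layout is not shown here.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2020-01-02", "2020-01-03"]), ["pred", "label"]],
    names=["datetime", "field"],
)
df = pd.DataFrame({"value": [0.1, 0.2, 0.3, 0.4]}, index=index)

# Keep the rows whose second index level equals "label". xs drops the
# selected level by default, so the result is indexed by datetime only.
labels = df.xs("label", level=1)
```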

Testing:
✅ DDG-DA workflow now runs successfully to completion
✅ All 154 training tasks complete without errors
✅ Meta-learning data selection works correctly
✅ Final backtest results generated successfully

This is a WORKING VERSION - DDG-DA workflow runs end-to-end!

Related issues:
- Depends on PR microsoft#2230 (zscore whitelist)
- Depends on LightGBM 4.0+ compatibility fix

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
DDG-DA workflow pickles and reloads InternalData objects during meta-learning
data selection. Without this whitelist entry, the workflow fails with:

    UnpicklingError: Forbidden class: qlib.contrib.meta.data_selection.dataset.InternalData

Changes:
- Add InternalData to SAFE_PICKLE_CLASSES in qlib/utils/pickle_utils.py
- Add test case test_internal_data_is_safelisted to verify the whitelist entry

This is part of the fix for issue microsoft#2130 - DDG-DA workflow requires multiple
classes to be whitelisted for pickle deserialization.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

UnpicklingError: Forbidden class: qlib.contrib.data.handler.Alpha158
