DDG-DA: TypeError: unhashable type: 'list' in InternalData.setup() #2229

@Olcmyk

🐛 Bug Description

The DDG-DA workflow fails with TypeError: unhashable type: 'list' because InternalData.setup() uses a list as a dictionary key.

⚠️ IMPORTANT: This bug is currently masked by two other bugs. To reproduce it, you must first apply the code changes from PR #2213 and PR #2227. Without these fixes, the workflow fails earlier and this bug is never reached.

To Reproduce

Prerequisites: apply the fixes from both of the following PRs first.

  1. PR #2213, "fix(unpickler): allow Alpha158/Alpha360 handlers and the standard dataset chain" (fixes issue #2130, "UnpicklingError: Forbidden class: qlib.contrib.data.handler.Alpha158")
  2. PR #2227, "fix: LightGBM 4.0+ compatibility for early_stopping_rounds=None" (fixes issue #2226, "LightGBM 4.0+ Compatibility: TypeError when early_stopping_rounds=None")

Without both fixes, the DDG-DA workflow fails earlier with different errors, and this bug is not reached.

Steps to reproduce:

  1. Apply the code changes from PR #2213 and PR #2227
  2. Navigate to DDG-DA example:
    cd examples/benchmarks_dynamic/DDG-DA
  3. Clean previous runs:
    rm -rf mlruns
  4. Run the workflow:
    python workflow.py run

Error:

TypeError: unhashable type: 'list'
  File "qlib/contrib/meta/data_selection/dataset.py", line 102, in setup
    self.data_ic_df = pd.DataFrame(dict(zip(key_l, ic_l)))

Expected Behavior

The DDG-DA workflow should complete successfully without raising a TypeError.

Screenshot

N/A (Error is in terminal output)

Environment

  • Qlib version: d5379c5 (main branch)
  • Python version: 3.8
  • OS: Linux (Ubuntu, kernel 6.2.0-26-generic)
  • Hardware: NVIDIA GeForce RTX 4080 SUPER (AutoDL cloud instance)
  • Commit number: d5379c5

Additional Notes

Root Cause

File: qlib/contrib/meta/data_selection/dataset.py
Lines: 97-102

# Line 97: Extract train segment (which is a list)
data_key = task["dataset"]["kwargs"]["segments"]["train"]
# Example: [datetime.date(2008, 1, 1), datetime.date(2014, 12, 31)]

# Line 98: Append to key_l
key_l.append(data_key)

# Line 102: Try to use list as dictionary key - FAILS!
self.data_ic_df = pd.DataFrame(dict(zip(key_l, ic_l)))

Problem: Lists are mutable and therefore unhashable, so Python cannot use them as dictionary keys. The segments["train"] value from the config is a list [start_date, end_date], so dict(zip(key_l, ic_l)) raises TypeError as soon as it tries to insert the first key.

Solution: Convert the list to a tuple before using it as a key:

key_l.append(tuple(data_key))  # Convert list to tuple
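
The failure and the proposed fix can be reproduced without Qlib. The following is a minimal sketch, not the actual InternalData.setup() code: train_segment, ic_values, and build_mapping are hypothetical stand-ins for the values collected in key_l and ic_l and for the dict(zip(...)) call.

```python
import datetime

# Hypothetical stand-in for task["dataset"]["kwargs"]["segments"]["train"]
train_segment = [datetime.date(2008, 1, 1), datetime.date(2014, 12, 31)]
# Hypothetical placeholder for one entry of ic_l
ic_values = [0.01, 0.02]


def build_mapping(keys, values):
    """Mirror the dict(zip(key_l, ic_l)) call from the report."""
    return dict(zip(keys, values))


# A list key fails: dict() must hash each key, and lists are unhashable.
try:
    build_mapping([train_segment], [ic_values])
    list_key_error = None
except TypeError as exc:
    list_key_error = str(exc)

print(list_key_error)

# A tuple key works: tuples of hashable items are themselves hashable.
fixed = build_mapping([tuple(train_segment)], [ic_values])
print(list(fixed.keys()))
```

The error is raised by dict() itself, before pd.DataFrame is called, which is why converting the key at the point where it is appended to key_l is sufficient.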

Scope

This bug is isolated to this one location. I searched the entire codebase for similar dict(zip()) patterns and verified that only this occurrence has the issue.

Relationship to Other Issues

This is the third bug discovered in the DDG-DA workflow. These bugs are chained - each one masks the next:

  1. Issue #2130, "UnpicklingError: Forbidden class: qlib.contrib.data.handler.Alpha158" - Fixed by PR #2213, "fix(unpickler): allow Alpha158/Alpha360 handlers and the standard dataset chain"

  2. Issue #2226, "LightGBM 4.0+ Compatibility: TypeError when early_stopping_rounds=None" - Fixed by PR #2227, "fix: LightGBM 4.0+ compatibility for early_stopping_rounds=None"

  3. This issue (unhashable list) - Not yet fixed

Why the dependency? Each bug prevents the code from reaching the next stage. Without fixing the first bug, you never reach the second. Without fixing the first two, you never reach the third (this issue).

These are three independent bugs in different parts of the code, but they happen to execute in sequence in the DDG-DA workflow.

Impact

  • Severity: HIGH
  • Affected users: Anyone trying to run the DDG-DA workflow
  • Workaround: None (code must be fixed)
