
function load_dataset can't resolve folder paths containing regex characters like "[]" #7468

@Hpeox

Description


Describe the bug

When load_dataset is given a folder path that contains special characters such as "[]", it does not confine the search to that folder. The cause is how the path is handled in the resolve_pattern function: it passes the path unescaped to AbstractFileSystem.glob, which interprets glob wildcards (*, ?, and [...] character classes). As a result, a folder name like "[D_DATA]" is treated as a pattern rather than a literal name, and the glob traverses the entire disk partition instead of only the intended directory.
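The bracket behaviour can be reproduced with Python's standard library. This is a minimal sketch that assumes fsspec's glob follows the same shell-style bracket rules as the stdlib `fnmatch` module (fsspec has its own matcher, so this illustrates the semantics rather than its actual code path). Unescaped, `[D_DATA]` is a character class matching a single one of `D`, `_`, `A`, `T`, so the real folder name never matches itself; `glob.escape` turns the opening bracket back into a literal.

```python
import fnmatch
import glob

folder = "[D_DATA]"

# Unescaped, "[D_DATA]" is a character class: it matches exactly one of D, _, A, T.
print(fnmatch.fnmatchcase(folder, folder))  # False: the literal name does not match itself
print(fnmatch.fnmatchcase("D", folder))     # True: a single "D" matches the class

# glob.escape wraps "[" as "[[]", so the brackets are matched literally.
escaped = glob.escape(folder)
print(escaped)                               # [[]D_DATA]
print(fnmatch.fnmatchcase(folder, escaped))  # True
```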

Steps to reproduce the bug

Create a folder such as E:\[D_DATA]\koch_test, then call load_dataset("parquet", data_dir="E:\[D_DATA]\\test", split="train").
The call keeps searching the whole disk.
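Why the search escapes the intended folder: a glob can only start walking from the leading part of the pattern that contains no wildcards. The `static_prefix` helper below is hypothetical and only illustrates that idea; it is not fsspec's implementation. It assumes paths written with forward slashes and a `*.parquet` file pattern appended to `data_dir`.

```python
import re

# Glob "magic" characters; a path component containing any of them is a pattern.
_MAGIC = re.compile(r"[*?\[]")


def static_prefix(pattern: str, sep: str = "/") -> str:
    """Hypothetical helper: return the leading path components that contain
    no glob wildcards. A glob must walk everything below this prefix."""
    prefix = []
    for part in pattern.split(sep):
        if _MAGIC.search(part):
            break  # first wildcard component: walking has to start above it
        prefix.append(part)
    return sep.join(prefix) + sep if prefix else ""


print(static_prefix("E:/[D_DATA]/test/*.parquet"))  # E:/  (the whole drive)
print(static_prefix("E:/D_DATA/test/*.parquet"))    # E:/D_DATA/test/
```

With brackets in the folder name, the walk root collapses to the drive itself, which matches the whole-disk search described above.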

I added two print statements in glob and resolve_pattern to inspect the path.

Expected behavior

It should load the dataset just as it does for folders without special characters.
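One way to get that behaviour would be to escape the literal directory before appending the file pattern. The sketch below assumes that approach; `build_pattern` and the `train.parquet` file name are hypothetical, and it uses the standard-library `glob.escape` and `fnmatch` rather than the actual datasets or fsspec code.

```python
import fnmatch
import glob
import posixpath


def build_pattern(data_dir: str, file_glob: str = "*.parquet") -> str:
    """Hypothetical helper: escape data_dir so it is matched literally,
    leaving only file_glob to be interpreted as a wildcard."""
    return posixpath.join(glob.escape(data_dir), file_glob)


data_dir = "E:/[D_DATA]/test"
target = "E:/[D_DATA]/test/train.parquet"  # hypothetical file inside data_dir

naive = posixpath.join(data_dir, "*.parquet")  # brackets left as a character class
safe = build_pattern(data_dir)

print(safe)                                # E:/[[]D_DATA]/test/*.parquet
print(fnmatch.fnmatchcase(target, naive))  # False: "[D_DATA]" matches one character
print(fnmatch.fnmatchcase(target, safe))   # True
```

Only the directory part is escaped, so wildcards in the file pattern itself keep working.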

Environment info

  • datasets version: 3.3.2
  • Platform: Windows-10-10.0.22631-SP0
  • Python version: 3.10.16
  • huggingface_hub version: 0.29.1
  • PyArrow version: 19.0.1
  • Pandas version: 2.2.3
  • fsspec version: 2024.12.0

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
