Description
Describe the bug
When load_dataset is called with a folder path containing glob special characters (such as "[" and "]"), the path is mishandled in the resolve_pattern function. This function passes the unescaped path directly to AbstractFileSystem.glob, which treats the path as a glob pattern. As a result, the brackets are interpreted as a character class rather than as literal characters, and the search traverses the entire disk partition instead of staying within the intended directory.
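The bracket behaviour can be illustrated with Python's standard-library glob module, which follows the same character-class convention. This is only a minimal sketch of the mechanism, not the datasets or fsspec code path: the helper names find_parquet and demo and the file name part.parquet are hypothetical, and it is an assumption that escaping the directory part in the same way would be an appropriate fix inside resolve_pattern.

```python
import glob
import os
import tempfile


def find_parquet(data_dir, escape):
    """Glob for *.parquet under data_dir (hypothetical helper).

    With escape=False the directory name is used as-is, so "[...]" is
    read as a character class. With escape=True, glob.escape() makes
    the brackets literal before the wildcard part is appended.
    """
    base = glob.escape(data_dir) if escape else data_dir
    return sorted(glob.glob(os.path.join(base, "*.parquet")))


def demo():
    """Create a throwaway "[D_DATA]" folder and glob it both ways."""
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = os.path.join(tmp, "[D_DATA]")
        os.makedirs(data_dir)
        # Empty placeholder file; only its name matters for globbing.
        open(os.path.join(data_dir, "part.parquet"), "w").close()

        raw = find_parquet(data_dir, escape=False)
        escaped = find_parquet(data_dir, escape=True)
        return (
            [os.path.basename(p) for p in raw],
            [os.path.basename(p) for p in escaped],
        )


if __name__ == "__main__":
    raw, escaped = demo()
    print("unescaped:", raw)
    print("escaped:  ", escaped)
```

In the unescaped case the pattern "[D_DATA]" matches only a single-character directory name (D, _, A or T), so the real folder is never found. Escaping only the directory portion keeps the trailing wildcard usable.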
Steps to reproduce the bug
Create a folder such as E:\[D_DATA]\koch_test, then call load_dataset("parquet", data_dir="E:\[D_DATA]\\test", split="train").
It will keep searching the whole disk.
I added two print statements, one in glob and one in resolve_pattern, to see the path.
Expected behavior
It should load the dataset as it does from normal folders.
Environment info
- datasets version: 3.3.2
- Platform: Windows-10-10.0.22631-SP0
- Python version: 3.10.16
- huggingface_hub version: 0.29.1
- PyArrow version: 19.0.1
- Pandas version: 2.2.3
- fsspec version: 2024.12.0