Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LocalDatasetModuleFactoryWithoutScript extracts invalid builder name #4399

Closed
apohllo opened this issue May 24, 2022 · 5 comments
Closed

LocalDatasetModuleFactoryWithoutScript extracts invalid builder name #4399

apohllo opened this issue May 24, 2022 · 5 comments
Labels
bug Something isn't working good first issue Good for newcomers

Comments

@apohllo
Copy link
Contributor

apohllo commented May 24, 2022

Describe the bug

Trying to load a local dataset raises an error indicating that the config builder has to have a name.
No error should be reported, since the call is completly valid.

Steps to reproduce the bug

load_dataset("./data/some-dataset/", name="some-name")

Expected results

The dataset should be loaded.

Actual results

Traceback (most recent call last):
  File "train_lquad.py", line 19, in <module>
    load(tokenize_target_function, tokenize_target_function, {}, tokenizer)
  File "train_lquad.py", line 14, in load
    dataset = load_dataset("./data/lquad/", name="lquad")
  File "/net/pr2/scratch/people/plgapohl/python-3.8.6/lib/python3.8/site-packages/datasets/load.py", line 1708, in load_dataset                                                                           
    builder_instance = load_dataset_builder(
  File "/net/pr2/scratch/people/plgapohl/python-3.8.6/lib/python3.8/site-packages/datasets/load.py", line 1560, in load_dataset_builder                                                                   
    builder_instance: DatasetBuilder = builder_cls(
  File "/net/pr2/scratch/people/plgapohl/python-3.8.6/lib/python3.8/site-packages/datasets/builder.py", line 269, in __init__                                                                             
    self.config, self.config_id = self._create_builder_config(
  File "/net/pr2/scratch/people/plgapohl/python-3.8.6/lib/python3.8/site-packages/datasets/builder.py", line 403, in _create_builder_config                                                               
    raise ValueError(f"BuilderConfig must have a name, got {builder_config.name}")
ValueError: BuilderConfig must have a name, got

Environment info

  • datasets version: 2.2.2
  • Platform: Linux-4.18.0-348.20.1.el8_5.x86_64-x86_64-with-glibc2.2.5
  • Python version: 3.8.6
  • PyArrow version: 8.0.0
  • Pandas version: 1.4.2

The error is probably in line 795 in load.py:

 builder_kwargs = {                        
     "hash": hash,
     "data_files": data_files,
     "name": os.path.basename(self.path),
     "base_path": self.path,
     **builder_kwargs,
 }

os.path.basename for a directory returns an empty string, rather than the name of the directory.

@apohllo apohllo added the bug Something isn't working label May 24, 2022
@apohllo
Copy link
Contributor Author

apohllo commented May 24, 2022

Ok, so

os.path.basename("/home/user/")

gives '' while

os.path.basename("/home/user")

gives user.
The code should check if the last char is a slash.

@apohllo
Copy link
Contributor Author

apohllo commented May 24, 2022

The fix is:

"name": os.path.basename(self.path[:-1] if self.path[-1] == "/" else self.path)

@TheSeriousProgrammer
Copy link

I came through the same issue , just removing the last slash in the dataset path fixed it for me, may be this repo moderators could accept this as an accepted answer atleast if this could not be integrated

The fix is:

"name": os.path.basename(self.path[:-1] if self.path[-1] == "/" else self.path)

@apohllo consider making a pull request on this

Thanks for the amazing contributions from huggingface people !!

@mariosasko
Copy link
Collaborator

@apohllo Would you be interested in submitting a PR with the fix?

apohllo added a commit to apohllo/datasets that referenced this issue Sep 11, 2022
@apohllo
Copy link
Contributor Author

apohllo commented Sep 11, 2022

@mariosasko here we go:

#4967

TBH I haven't tested it yet, but should work, since this is a basic change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

3 participants